Hello,
Recently we upgraded our MongoDB cluster from 3.4 to 3.6. It is a sharded PSA (Primary, Secondary, Arbiter) setup with an extra hidden secondary for backups.
After about 2 months of running, secondaries (including the hidden secondary) started to fail with out-of-file-descriptors errors. A simple lsof on the mongod process shows it is at its limit of 64K open file descriptors.
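For reference, a lighter-weight way to watch the descriptor count on Linux (instead of running a full lsof) is to count entries under /proc. This is a minimal sketch; substitute the actual mongod PID (e.g. from pgrep mongod), and note that $$ is used here only so the snippet runs standalone:

```shell
# Count open file descriptors for a given PID via /proc (Linux-specific).
count_fds() {
  ls "/proc/$1/fd" | wc -l
}

# Demonstration against the current shell's own PID; replace with the mongod PID.
count_fds "$$"
```

Polling this in a loop makes the steady climb toward the 64K limit easy to see.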
When restarting the secondary nodes that have a high number of file descriptors, the logs fill with messages about dropping temporary map-reduce collections, like:
2020-12-04T18:55:49.537+0000 I REPL [signalProcessingThread] Completing collection drop for customerwtbtestcustomerorgid86.system.drop.1607080266i359t-1.tmp.mr.masters_28783 with drop optime { ts: Timestamp(1607080266, 359), t: -1 } (notification optime: { ts: Timestamp(4294967295, 4294967295), t: 9223372036854775807 })
2020-12-04T18:55:49.538+0000 I STORAGE [signalProcessingThread] Finishing collection drop for customerwtbtestcustomerorgid86.system.drop.1607080266i359t-1.tmp.mr.masters_28783 (no UUID).
Checking the optime timestamps, they go back the full 2 months since the upgrade.
We are able to monitor the file descriptor leak on our replicas, including the hidden secondaries, with the "files currently open" value under db.serverStatus().wiredTiger.connection.
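For anyone wanting to chart this over time, a hedged one-liner for scraping that value (assumes the legacy mongo shell is on PATH and a mongod is listening on the default local port; adjust host/auth options to your deployment):

```shell
# Print WiredTiger's count of currently open files from serverStatus.
mongo --quiet --eval 'print(db.serverStatus().wiredTiger.connection["files currently open"])'
```

Feeding this into your monitoring system alongside the OS-level descriptor count makes it clear whether WiredTiger itself is holding the files open.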