Mongo 3.6 WiredTiger with Map Reduce: High Number of Open Files on Replica Sets

Hello,

Recently we upgraded our MongoDB cluster from 3.4 to 3.6. It is a sharded PSA (Primary, Secondary, Arbiter) setup with an extra hidden secondary for backups.

After about 2 months of running, secondaries (including the hidden secondary) started to fail after running out of file descriptors. A simple lsof on the process shows it is at its maximum of 64K.
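
A rough way to spot-check the count (a minimal sketch, assuming a single mongod process per host; adjust the pid lookup otherwise):

# Rough count of files open by the mongod process (lsof also lists memory-mapped files);
# /proc gives the exact file-descriptor count
lsof -p "$(pidof mongod)" | wc -l
ls /proc/"$(pidof mongod)"/fd | wc -l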

When restarting the secondary nodes with a high number of open file descriptors, the logs fill up with messages about dropping the map-reduce temporary collections, like:

2020-12-04T18:55:49.537+0000 I REPL [signalProcessingThread] Completing collection drop for customerwtbtestcustomerorgid86.system.drop.1607080266i359t-1.tmp.mr.masters_28783 with drop optime { ts: Timestamp(1607080266, 359), t: -1 } (notification optime: { ts: Timestamp(4294967295, 4294967295), t: 9223372036854775807 })

2020-12-04T18:55:49.538+0000 I STORAGE [signalProcessingThread] Finishing collection drop for customerwtbtestcustomerorgid86.system.drop.1607080266i359t-1.tmp.mr.masters_28783 (no UUID).

Checking the optime timestamps, they go back the full 2 months since the upgrade.

We are able to monitor the leak of file descriptors on our replicas, including the hidden secondaries, with the db.serverStatus().wiredTiger.connection metric, specifically its "files currently open" field.
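
For anyone wanting to track the same counter, a minimal sketch of sampling it from the command line (assumes the legacy mongo shell shipped with 3.6 and a local, unauthenticated connection; add --host, -u and -p as needed):

# Print the WiredTiger "files currently open" counter
mongo --quiet --eval 'print(db.serverStatus().wiredTiger.connection["files currently open"])'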

Do you use the XFS file system or EXT4?

Check this out:

Thank you for the response, we are using XFS. I will check out the link.

Hi @Jonathan_Stairs, welcome to the community!

This may be a long shot, but there was an issue where a sharded map-reduce fails to clean up temporary collections in some cases (SERVER-36966). According to the ticket, this was fixed in MongoDB 4.0.5. Is it possible for you to upgrade to at least the latest release in the 4.0 series (4.0.21) and see if the issue persists? Note that the MongoDB 3.6 series will be out of support soon (April 2021).
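
In the meantime, a minimal sketch for checking how many of these temporary collections are still lingering, using the database name from your log lines and the tmp.mr.* naming that map-reduce uses for its temporary collections (add --host and credentials as needed):

# List leftover map-reduce temporary collections in one database
# (database name taken from the log lines above)
mongo --quiet customerwtbtestcustomerorgid86 --eval '
  printjson(db.getCollectionNames().filter(function (n) {
    return n.indexOf("tmp.mr.") === 0;
  }))'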

Best regards,
Kevin


Hello Kevin,

Thank you for the response. We are planning the next upgrade to 4.0 in February and will have to watch for this fix then.