MongoDB processes get hung up when trying to acquire lock

We have a three-node MongoDB replica set deployed in our Prod environment. The primary mongod process hangs after running for about 12 hours. When it does, we see a very large number of threads (around 15,000) stuck in the same stack:

#0 0x00007efd9c853c21 in do_futex_wait () from /lib64/
#1 0x00007efd9c853ce7 in __new_sem_wait_slow () from /lib64/
#2 0x00007efd9c853d85 in sem_timedwait () from /lib64/
#3 0x00005580f6d94c6c in mongo::TicketHolder::waitForTicketUntil(mongo::Date_t) ()
#4 0x00005580f681aedc in mongo::LockerImpl<false>::_lockGlobalBegin(mongo::LockMode, mongo::Duration<std::ratio<1l, 1000l> >) ()
#5 0x00005580f680a724 in mongo::Lock::GlobalLock::_enqueue(mongo::LockMode, unsigned int) ()
#6 0x00005580f680a79e in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, unsigned int, mongo::Lock::GlobalLock::EnqueueOnly) ()
#7 0x00005580f680a7e8 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, unsigned int) ()

Due to their sensitive nature, the db logs cannot be shared. They did contain entries showing that around 14484 connections were open:

2020-05-22T13:38:49.633+0000 I NETWORK [listener] connection accepted from #18988 (14484 connections now open)
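For anyone who wants to track how the connection count grows without sharing the full log, here is a minimal Python sketch that pulls the "connections now open" figure out of lines like the one above. The function names and the regular expression are my own illustration (not a MongoDB tool), and the pattern assumes the 3.6/4.0-style plain-text log format shown here.

```python
import re

# Matches e.g. "connection accepted from #18988 (14484 connections now open)".
# Real log lines usually have a client address between "from" and "#";
# the lazy ".*?" tolerates both forms.
CONN_RE = re.compile(
    r"connection accepted from .*?#(\d+) \((\d+) connections? now open\)"
)

def open_connections(line):
    """Return the open-connection count reported by a log line, or None."""
    match = CONN_RE.search(line)
    return int(match.group(2)) if match else None

def peak_open_connections(lines):
    """Return the highest open-connection count seen in an iterable of lines."""
    counts = [c for c in map(open_connections, lines) if c is not None]
    return max(counts, default=None)

sample = (
    "2020-05-22T13:38:49.633+0000 I NETWORK [listener] connection accepted "
    "from #18988 (14484 connections now open)"
)
print(open_connections(sample))  # 14484
```

Running `peak_open_connections` over the lines of the mongod log file would show whether the count climbs steadily toward the hang or jumps suddenly.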


Without logs, unfortunately it’s difficult to say what went wrong.

However, if you’re not using the latest 4.2 series of MongoDB, you may experience the issue described in SERVER-35770. Upgrading to the latest MongoDB 4.2 series should resolve this.

I can’t say for sure whether the number of connections you see is excessive or not. Are you running multiple copies of the app? Are they coded to use connection pooling properly (e.g. by not calling MongoClient() multiple times during the life of the app)?
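To illustrate what "not calling MongoClient() multiple times" can look like, here is a rough Python sketch of a process-wide client that is built once and then reused. The thread does not say which driver is in use, so PyMongo, the helper names `get_client` and `make_mongo_client`, and the placeholder URI are all assumptions on my part. `maxPoolSize` is the PyMongo option that caps the pool per server.

```python
import threading

_client = None
_client_lock = threading.Lock()

def get_client(factory):
    """Return the shared client, calling `factory` only on first use."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock so two threads cannot both create one.
            if _client is None:
                _client = factory()
    return _client

def make_mongo_client():
    """Hypothetical factory: assumes PyMongo is installed; the URI is a placeholder."""
    from pymongo import MongoClient
    return MongoClient("mongodb://localhost:27017", maxPoolSize=100)
```

Application code would then call `get_client(make_mongo_client)` wherever it needs the database, so each app process holds a single pool. Note that the server-side total still scales with the number of app processes, which is why the question about multiple copies matters.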

Best regards,

Hi @kevinadi,

Thanks for the reply.

We are not using transactions in our applications. Also, the number of connections is clearly excessive, because the application creates a single MongoClient configured with a maximum of 100 connections.

The issue occurs only when using WiredTiger on version 3.6.2. The same issue did not occur on 4.0.10.

Hi Raghu,

Since the issue in SERVER-35770 was fixed in MongoDB 4.0.2 and above (see the “fix version” entry in the ticket), it is plausible that you are hitting that issue.

Either way, I’m glad that you have this resolved. In the meantime, I would suggest exploring a move to the newest minor version of the 4.0 branch, which is currently 4.0.19. There may be additional issues you haven’t run into yet that were fixed in the latest version.

Best regards,
