We have a MongoDB replica set configured: primary, secondary, and arbiter. In the past few weeks one of the instances has crashed multiple times. The logs show the following:
2020-08-28T12:14:20.570+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.578+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.612+0000 I SHARDING [conn54569] Marking collection *****.componenttypes as collection version: <unsharded>
2020-08-28T12:14:20.616+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.623+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.623+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.625+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.629+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.630+0000 W NETWORK [listener] Error accepting new connection TooManyFilesOpen: error in creating eventfd: Too many open files
2020-08-28T12:14:20.644+0000 I NETWORK [listener] Error accepting new connection on 0.0.0.0:27017: Too many open files
2020-08-28T12:14:20.644+0000 I NETWORK [listener] Error accepting new connection on 0.0.0.0:27017: Too many open files
2020-08-28T12:14:20.644+0000 I NETWORK [listener] Error accepting new connection on 0.0.0.0:27017: Too many open files
2020-08-28T12:14:20.644+0000 I NETWORK [listener] Error accepting new connection on 0.0.0.0:27017: Too many open files
The limits are configured in the /lib/systemd/system/mongod.service file, and the running mongod process reports these limits:
Limit                     Soft Limit    Hard Limit    Units
Max cpu time              unlimited     unlimited     seconds
Max file size             unlimited     unlimited     bytes
Max data size             unlimited     unlimited     bytes
Max stack size            8388608       unlimited     bytes
Max core file size        0             unlimited     bytes
Max resident set          unlimited     unlimited     bytes
Max processes             64000         64000         processes
Max open files            64000         64000         files
Max locked memory         65536         65536         bytes
Max address space         unlimited     unlimited     bytes
Max file locks            unlimited     unlimited     locks
Max pending signals       64122         64122         signals
Max msgqueue size         819200        819200        bytes
Max nice priority         0             0
Max realtime priority     0             0
Max realtime timeout      unlimited     unlimited     us
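A table like the one above can be read straight from /proc, which shows the limits the kernel actually enforces on the running process (a sketch; $$ stands in for mongod's PID so the command works without a running mongod):

```shell
# Print the limits the kernel enforces on a process.
# $$ (the current shell) is used for illustration; for the database
# process substitute: PID=$(pidof mongod)
PID=$$
grep -E 'Max (open files|processes)' "/proc/${PID}/limits"
```

This is worth checking after every restart, because the enforced limits can differ from what the service file or limits.conf appears to set.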
You are hitting the operating system limit for the number of open file descriptors, which is not unusual on database servers. Please refer to your operating system documentation for how to increase this limit.
@Willy_Latorre Setting it with the ulimit command only affects the current shell session. This parameter needs to be set in the appropriate script (or init system) that starts mongod.
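For a mongod started by systemd, the unit file's limits are what count: systemd services do not go through a PAM login, so /etc/security/limits.conf does not apply to them. A drop-in override is the usual way to raise the limit without editing the packaged unit (a sketch; the drop-in path is an assumption, and systemctl daemon-reload plus a service restart are needed afterwards):

```ini
# /etc/systemd/system/mongod.service.d/limits.conf (hypothetical path)
[Service]
# Raise the open-file-descriptor limit for the mongod process
LimitNOFILE=64000
# Raise the max-processes/threads limit
LimitNPROC=64000
```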
Thanks for all the responses. I had already set the Mongo-recommended ulimit settings in the service file, and the proc limits file showed that the 64000 was set. However, when I ran ulimit -u as the Mongo user, it returned 1024. I increased this limit to 100000, but the instance has crashed again.
Steps to increase the limit to 100000:
Add fs.file-max = 100000 to /etc/sysctl.conf
Apply the new config: sudo sysctl -p
Edit the /etc/security/limits.conf file and add the following lines:
* soft nofile 100000
* hard nofile 100000
mongodb soft nofile 100000
mongodb hard nofile 100000
root soft nofile 100000
root hard nofile 100000
Add 'session required pam_limits.so' to /etc/pam.d/common-session
Log out and back in; 'ulimit -a' as the Mongo user now returns 100000
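The steps above can be verified in one pass (a sketch; run as the mongodb user after logging back in):

```shell
# System-wide file handle ceiling (equivalent to: sysctl fs.file-max)
cat /proc/sys/fs/file-max
# Per-process soft limit on open files for this session
ulimit -n
# Limit the kernel enforces on this shell
grep 'Max open files' /proc/$$/limits
```

Note that this checks an interactive session; a service started by systemd can still run with different limits.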
The config in the service file has been in place for a long time; I inherited the system like this. However, I recently upgraded this Mongo instance and issued systemctl daemon-reload at the Disable Transparent Huge Pages (THP) step, where I created the /etc/systemd/system/disable-transparent-huge-pages.service file.
Any other suggestions as to what I could be missing? I’m new to Mongo and your help is really appreciated.
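One thing worth watching between crashes is how many descriptors mongod actually holds compared to its limit, so the leak can be caught before the instance aborts (a sketch; $$ stands in for mongod's PID):

```shell
# Count open file descriptors versus the enforced limit.
PID=$$                              # substitute: PID=$(pidof mongod)
echo "open fds: $(ls "/proc/${PID}/fd" | wc -l)"
grep 'Max open files' "/proc/${PID}/limits"
```

If the count climbs steadily toward the limit, the problem is descriptor consumption (usually client connections), not the limit configuration itself.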
This Mongo instance has crashed a few more times since, with the following in the logs from the latest crash. Any other troubleshooting advice?
2020-09-30T11:05:40.172+0000 I ACCESS [conn94647] Successfully authenticated as principal DbUser on admin from client 13.0.1.189:34528
2020-09-30T11:05:40.172+0000 I ACCESS [conn94559] Successfully authenticated as principal DbUser on admin from client 13.0.1.189:34356
2020-09-30T11:05:40.180+0000 I ACCESS [conn94566] Successfully authenticated as principal DbUser on admin from client 13.0.1.189:34358
2020-09-30T11:05:40.181+0000 E - [conn95032] cannot open /dev/urandom Too many open files
2020-09-30T11:05:40.181+0000 F - [conn95032] Fatal Assertion 28839 at src/mongo/platform/random.cpp 159
2020-09-30T11:05:40.183+0000 F - [conn95032]
***aborting after fassert() failure
2020-09-30T11:05:40.183+0000 E - [conn95028] cannot open /dev/urandom Too many open files
2020-09-30T11:05:40.183+0000 F - [conn95028] Fatal Assertion 28839 at src/mongo/platform/random.cpp 159
2020-09-30T11:05:40.183+0000 F - [conn95028]
***aborting after fassert() failure
2020-09-30T11:05:40.183+0000 E - [conn95031] cannot open /dev/urandom Too many open files
2020-09-30T11:05:40.183+0000 F - [conn95031] Fatal Assertion 28839 at src/mongo/platform/random.cpp 159
2020-09-30T11:05:40.183+0000 F - [conn95031]
***aborting after fassert() failure
2020-09-30T11:05:40.183+0000 E - [conn95025] cannot open /dev/urandom Too many open files
2020-09-30T11:05:40.183+0000 F - [conn95025] Fatal Assertion 28839 at src/mongo/platform/random.cpp 159
2020-09-30T11:05:40.183+0000 F - [conn95025]
***aborting after fassert() failure
Thanks Chris. There seem to be thousands of open connections coming from the app. I looked into it further and can see the mongoose driver is outdated, so I'm getting that updated and hoping it helps. Thanks for the help with this.
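For anyone hitting the same symptom: grouping established connections to port 27017 by client IP quickly shows which app host is holding them open. A sketch of the pipeline, demonstrated on captured sample lines (the server address 10.0.0.5 is made up; 13.0.1.189 is the client from the logs above). For live data, pipe the output of `ss -tn state established '( sport = :27017 )' | tail -n +2` through the same awk/sort/uniq stage:

```shell
# Sample 'ss -tn' lines: Recv-Q Send-Q Local:Port Peer:Port
sample='0 0 10.0.0.5:27017 13.0.1.189:34528
0 0 10.0.0.5:27017 13.0.1.189:34356
0 0 10.0.0.5:27017 13.0.1.190:34400'
# Extract the peer IP (field 4, before the colon), then count per IP
echo "$sample" | awk '{split($4, a, ":"); print a[1]}' | sort | uniq -c | sort -rn
# → "2 13.0.1.189" and "1 13.0.1.190"
```

Inside mongod itself, db.serverStatus().connections reports current versus available connections, which is another way to confirm the app is exhausting the pool.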