MongoDB version: 3.2.8
The deployment is a replica set with an arbiter.
We observe the following issue during peak system utilization:
Primary MongoDB Server:
2020-03-25T00:06:18.640+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after repl: 310, after storageEngine: 490, after tcmalloc: 490, at end: 4970 }
2020-03-25T00:06:25.992+0000 I REPL [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
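To make the serverStatus line above easier to read, here is a minimal sketch that turns its timings into per-section durations. It assumes the "after X" values are cumulative milliseconds since the command started; the function name section_durations is only illustrative and is not part of any MongoDB tool.

```python
import re

# Timings copied from the serverStatus log line on the primary.
LINE = (
    "serverStatus was very slow: { after basic: 0, after asserts: 0, "
    "after backgroundFlushing: 0, after connections: 0, after dur: 0, "
    "after extra_info: 0, after globalLock: 0, after locks: 0, "
    "after network: 0, after opcounters: 0, after opcountersRepl: 0, "
    "after repl: 310, after storageEngine: 490, after tcmalloc: 490, "
    "at end: 4970 }"
)

def section_durations(line):
    """Return (section, milliseconds) pairs.

    Assumption: each logged value is the cumulative elapsed time, so the
    time spent in a section is the difference from the previous value.
    """
    durations = []
    previous = 0
    pattern = r"(?:after (\w+)|at (end)): (\d+)"
    for after_name, end_name, value in re.findall(pattern, line):
        elapsed = int(value)
        durations.append((after_name or end_name, elapsed - previous))
        previous = elapsed
    return durations

if __name__ == "__main__":
    # Print only the sections that took measurable time.
    for name, ms in section_durations(LINE):
        if ms > 0:
            print(f"{name}: {ms} ms")
```

Under that assumption, the repl section accounts for 310 ms, storageEngine for 180 ms, and the remaining 4480 ms is spent after the tcmalloc section and before the command finishes.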
Secondary MongoDB Server:
2020-03-25T00:06:26.657+0000 I REPL [ReplicationExecutor] Error in heartbeat request to rats2.sm2:33000; ExceededTimeLimit: Operation timed out
mongostat output for reference during the failure:
insert query update delete getmore command flushes mapped vsize res faults qr|qw ar|aw netIn netOut conn set repl time
*18 *0 *42 *2 0 2|0 0 7.75G 18.9G 205M 1 0|0 0|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:15Z
*16 *0 *143 *0 0 1|0 0 7.75G 18.9G 204M 2 0|0 0|0 79b 12.9k 25 TBFM SEC 2020-04-09T11:24:16Z
*11 *0 *130 *2 0 11|0 0 7.75G 18.9G 202M 1 0|0 0|0 805b 29.4k 25 TBFM SEC 2020-04-09T11:24:17Z
*4 *0 *233 *1 0 2|0 0 7.75G 18.9G 204M 2 0|0 0|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:18Z
*3 *0 *232 *1 0 4|0 0 7.75G 18.9G 203M 0 0|0 0|0 353b 14.2k 25 TBFM SEC 2020-04-09T11:24:19Z
*2 *0 *80 *0 0 5|0 0 7.75G 18.9G 203M 0 0|0 0|0 311b 14.5k 25 TBFM SEC 2020-04-09T11:24:20Z
*79 *0 *328 *0 0 3|0 0 7.75G 18.9G 205M 5 0|0 0|0 295b 13.8k 25 TBFM SEC 2020-04-09T11:24:21Z
*14 *0 *258 *0 0 6|0 0 7.75G 18.9G 202M 2 0|0 0|0 369b 14.9k 25 TBFM SEC 2020-04-09T11:24:22Z
*2 *0 *170 *0 0 18|0 0 7.75G 18.9G 189M 0 0|1 0|0 1.29k 32.0k 25 TBFM SEC 2020-04-09T11:24:23Z
*1 *0 *13 *0 0 2|0 0 7.75G 18.9G 185M 1 0|1 0|0 137b 13.3k 25 TBFM SEC 2020-04-09T11:24:24Z
insert query update delete getmore command flushes mapped vsize res faults qr|qw ar|aw netIn netOut conn set repl time
*8 *0 *413 *0 0 2|0 0 7.75G 18.9G 183M 0 0|0 1|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:25Z
*12 *0 *274 *0 0 1|0 0 7.75G 18.9G 184M 0 0|0 0|0 79b 12.9k 25 TBFM SEC 2020-04-09T11:24:26Z
*8 *0 *210 *0 0 2|0 0 7.75G 18.9G 181M 0 0|0 0|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:27Z
*3 *0 *224 *0 0 2|0 0 7.75G 18.9G 180M 0 0|0 0|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:28Z
*6 *0 *134 *0 0 6|0 0 7.75G 18.9G 181M 0 0|0 0|0 465b 14.6k 25 TBFM SEC 2020-04-09T11:24:29Z
*7 *0 *190 *0 0 4|0 0 7.75G 18.9G 181M 2 0|0 0|1 253b 14.1k 25 TBFM SEC 2020-04-09T11:24:30Z
*7 *0 *364 *0 0 3|0 0 7.75G 18.9G 181M 0 0|0 0|0 295b 13.8k 25 TBFM SEC 2020-04-09T11:24:31Z
*4 *0 *245 *0 0 8|0 0 7.75G 18.9G 179M 0 0|0 0|0 485b 15.7k 25 TBFM SEC 2020-04-09T11:24:32Z
*7 *0 *207 *0 0 10|0 0 7.75G 18.9G 176M 2 0|0 0|0 801b 16.5k 25 TBFM SEC 2020-04-09T11:24:33Z
*7 *0 *115 *0 0 8|0 0 7.75G 18.9G 175M 1 0|0 0|0 513b 28.0k 25 TBFM SEC 2020-04-09T11:24:34Z
insert query update delete getmore command flushes mapped vsize res faults qr|qw ar|aw netIn netOut conn set repl time
*6 *0 *181 *0 0 2|0 0 7.75G 18.9G 177M 2 0|0 0|0 237b 13.4k 25 TBFM SEC 2020-04-09T11:24:35Z
*12 *0 *329 *0 0 7|0 0 7.75G 18.9G 177M 0 0|0 0|0 455b 27.6k 26 TBFM SEC 2020-04-09T11:24:36Z
*13 *0 *119 *0 0 5|0 0 7.75G 18.9G 175M 0 0|0 0|0 429b 14.7k 25 TBFM SEC 2020-04-09T11:24:37Z