Mongos: Couldn't get a connection within the time limit

Hi,
From time to time, one or two of my mongos instances get into a state where they can’t connect to the replica set:

numYields:0 ok:0 errMsg:“Encountered non-retryable error during query :: caused by :: Couldn’t get a connection within the time limit” errName:NetworkInterfaceExceededTimeLimit errCode:202 reslen:342 protocol:op_msg 20038ms

After restarting the mongos, everything is fine again. Do you have any idea what may be causing this?

Here is the version of my MongoDB installation:
[mongosMain] mongos version v4.2.3
[mongosMain] db version v4.2.3
[mongosMain] git version: 6874650b362138df74be53d366bbefc321ea32d4
[mongosMain] OpenSSL version: OpenSSL 1.0.2j-fips 26
[mongosMain] allocator: tcmalloc
[mongosMain] modules: none
[mongosMain] build environment:
[mongosMain] distmod: suse12
[mongosMain] distarch: x86_64
[mongosMain] target_arch: x86_64

I encountered a similar issue on a couple of mongos instances after upgrading to 4.0. I would also like to know what causes this and how to fix it.

I still don’t know what is causing it. It doesn’t happen on the router that runs on the same machine as the primary node, so it is probably something network-related. I will try adjusting the ShardingTaskExecutorPoolMinSize and ShardingTaskExecutorPoolMaxConnecting parameters.

Here is a fragment of my log, with network debug logging enabled, from when the problem occurs:

2020-05-13T13:21:18.631+0200 D3 NETWORK [ReplicaSetMonitor-TaskExecutor] Updating 10.122.129.44:27018 lastWriteDate to 2020-05-13T13:21:16.000+0200
2020-05-13T13:21:18.631+0200 D3 NETWORK [ReplicaSetMonitor-TaskExecutor] Updating 10.122.129.44:27018 opTime to { ts: Timestamp(1589368876, 1), t: 3 }
2020-05-13T13:21:18.631+0200 D1 NETWORK [ReplicaSetMonitor-TaskExecutor] Refreshing replica set crkid took 0ms
2020-05-13T13:21:18.631+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Returning ready connection to 10.122.129.44:27018
2020-05-13T13:21:18.631+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.44:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:18.631+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.44:27018 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.45:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.45:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.44:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.44:27018 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.44:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.44:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.43:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:18.833+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.43:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:18.903+0200 D4 CONNPOOL [TaskExecutorPool-0] Updating controller for 10.122.129.44:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: true }
2020-05-13T13:21:18.903+0200 D4 CONNPOOL [TaskExecutorPool-0] Comparing connection state for 10.122.129.44:27018 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:19.071+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.43:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:19.071+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Comparing connection state for 10.122.129.43:27018 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Updating controller for 10.122.129.44:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Comparing connection state for 10.122.129.44:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Updating controller for 10.122.129.45:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Comparing connection state for 10.122.129.45:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Updating controller for 10.122.129.43:27019 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:19.107+0200 D4 CONNPOOL [ShardRegistry] Comparing connection state for 10.122.129.43:27019 to Controls: { maxPending: 2, target: 1, }
2020-05-13T13:21:19.252+0200 D2 ASIO [TaskExecutorPool-0] Failed to get connection from pool for request 19195893: NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit
2020-05-13T13:21:19.252+0200 D2 ASIO [TaskExecutorPool-0] Failed to get connection from pool for request 19195894: NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit
2020-05-13T13:21:19.252+0200 I NETWORK [conn1969] Marking host 10.122.129.43:27018 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit
2020-05-13T13:21:19.252+0200 I COMMAND [conn1969] command crkid-prod.crkid_dokument_status command: update { update: “crkid_dokument_status”, ordered: true, txnNumber: 4, $db: “crkid-prod”, $clusterTime: { clusterTime: Timestamp(1589368859, 22), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, lsid: { id: UUID(“f14f5162-2f43-4788-92b8-db2d0b11c46c”) } } nShards:1 nMatched:0 nModified:0 numYields:0 reslen:407 protocol:op_msg 19999ms
2020-05-13T13:21:19.252+0200 I COMMAND [conn3081] command crkid-prod.crkid_dokument_status command: findAndModify { findAndModify: “crkid_dokument_status”, query: { _id: “CRKID#WPL.2019.01.10.004869” }, new: false, update: { $set: { synced: true } }, txnNumber: 18, $db: “crkid-prod”, $clusterTime: { clusterTime: Timestamp(1589368859, 22), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, lsid: { id: UUID(“e791e4a7-afd3-4c7e-9144-fd5248c50047”) } } numYields:0 ok:0 errMsg:“Couldn’t get a connection within the time limit” errName:NetworkInterfaceExceededTimeLimit errCode:202 reslen:281 protocol:op_msg 19999ms
2020-05-13T13:21:19.253+0200 D2 ASIO [TaskExecutorPool-0] Failed to get connection from pool for request 19195895: NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit
2020-05-13T13:21:19.253+0200 D2 ASIO [TaskExecutorPool-0] Failed to get connection from pool for request 19195896: NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit
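
For anyone comparing their own logs against the excerpt above, a small script can make the pattern easier to see. The following is only a rough sketch in Python, assuming the 4.2-style text log format shown in this thread and one log entry per line (terminal-wrapped entries need to be joined first). The function name summarize and the sample variable are made up for illustration. It records the last reported pool state per (executor, host) pair and counts the connection acquisition failures.

```python
import re
from collections import Counter

# Matches the D4 CONNPOOL "Updating controller" entries (format assumed from the excerpt).
STATE_RE = re.compile(
    r"\[(?P<pool>[^\]]+)\] Updating controller for (?P<host>\S+) with State: "
    r"\{ requests: (?P<requests>\d+), ready: (?P<ready>\d+), "
    r"pending: (?P<pending>\d+), active: (?P<active>\d+), "
    r"isExpired: (?P<expired>true|false) \}"
)
FAIL_RE = re.compile(r"Failed to get connection from pool for request \d+")
MARK_RE = re.compile(r"Marking host (\S+) as failed")


def summarize(lines):
    """Return (last state per (executor, host), failed request count, hosts marked failed)."""
    last_state = {}
    failed_requests = 0
    marked_failed = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            last_state[(m["pool"], m["host"])] = {
                "requests": int(m["requests"]),
                "ready": int(m["ready"]),
                "pending": int(m["pending"]),
                "active": int(m["active"]),
                "isExpired": m["expired"] == "true",
            }
            continue
        if FAIL_RE.search(line):
            failed_requests += 1
            continue
        m = MARK_RE.search(line)
        if m:
            marked_failed[m[1]] += 1
    return last_state, failed_requests, marked_failed


# A few entries copied from the excerpt above (apostrophes straightened).
SAMPLE_LOG = """\
2020-05-13T13:21:18.903+0200 D4 CONNPOOL [TaskExecutorPool-0] Updating controller for 10.122.129.44:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: true }
2020-05-13T13:21:19.071+0200 D4 CONNPOOL [ReplicaSetMonitor-TaskExecutor] Updating controller for 10.122.129.43:27018 with State: { requests: 0, ready: 1, pending: 0, active: 0, isExpired: false }
2020-05-13T13:21:19.252+0200 D2 ASIO [TaskExecutorPool-0] Failed to get connection from pool for request 19195893: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2020-05-13T13:21:19.252+0200 I NETWORK [conn1969] Marking host 10.122.129.43:27018 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
"""

if __name__ == "__main__":
    states, failed, marked = summarize(SAMPLE_LOG.splitlines())
    for (pool, host), state in sorted(states.items()):
        print(pool, host, state)
    print("failed connection requests:", failed)
    print("hosts marked as failed:", dict(marked))
```

Keying the state by executor as well as host matters here because the same host appears under several executors in the excerpt (ReplicaSetMonitor-TaskExecutor, ShardRegistry, TaskExecutorPool-0), each with its own pool.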

I am also encountering the same error: errName:NetworkInterfaceExceededTimeLimit errCode:202

I have upgraded MongoDB to 4.2.6 and changed the connection pool size settings, and that seems to have helped:

taskExecutorPoolSize: 0
ShardingTaskExecutorPoolMinSize: 10
ShardingTaskExecutorPoolMaxConnecting: 5
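
The post does not say how these values were applied. One possibility, shown here purely as a sketch, is to pass them as startup parameters with --setParameter when launching mongos. The Python snippet below only builds and prints such a command line. The config file path is a hypothetical placeholder, the helper name mongos_args is made up, and the values are the ones listed above rather than a general recommendation.

```python
import shlex

# Values reported in the post above; whether they suit another cluster is an open question.
PARAMS = {
    "taskExecutorPoolSize": 0,
    "ShardingTaskExecutorPoolMinSize": 10,
    "ShardingTaskExecutorPoolMaxConnecting": 5,
}


def mongos_args(params, config_path="/etc/mongos.conf"):
    """Build a mongos argument list; config_path is a hypothetical placeholder."""
    args = ["mongos", "--config", config_path]
    for name, value in params.items():
        # Each parameter becomes its own --setParameter name=value pair.
        args += ["--setParameter", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(mongos_args(PARAMS)))
```

The same three names could instead go under the setParameter section of the mongos configuration file, which matches the name: value form used in the post.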