Typical log messages on our config servers (a replica set of 3 servers) report slow queries, but I am not sure whether those are the cause or a consequence of the problems:
(the node “mongop_rtr1” in the message below is one of our router servers)
{"t":{"$date":"2020-10-02T02:59:27.634+02:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn3845","msg":"Slow query","attr":{"type":"command","ns":"config.$cmd","command":{"update":"mongos","bypassDocumentValidation":false,"ordered":true,"updates":[{"q":{"_id":"mongop_rtr1:27017"},"u":{"$set":{"_id":"mongop_rtr1:27017","ping":{"$date":"2020-10-02T00:59:27.533Z"},"up":6361,"waiting":true,"mongoVersion":"4.4.1","advisoryHostFQDNs":["mongop_rtr1.c5qbig.net"]}},"multi":false,"upsert":true}],"writeConcern":{"w":"majority","wtimeout":60000},"maxTimeMS":30000,"$replData":1,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1601600365,"i":1}},"signature":{"hash":{"$binary":{"base64":"dlGdgrHNtIx9THiJXv/LoISsF64=","subType":"0"}},"keyId":6821045628173287455}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1601600365,"i":1}},"t":12}},"$db":"config"},"numYields":0,"reslen":619,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"r":2,"w":1}},"Database":{"acquireCount":{"r":1,"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":2}},"oplog":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":1},"writeConcern":{"w":"majority","wtimeout":60000},"storage":{},"protocol":"op_msg","durationMillis":100}}
Another one from the logs of a config server that might be relevant (or not, who can tell?):
{"t":{"$date":"2020-10-02T12:14:25.743+02:00"},"s":"I", "c":"SHARDING", "id":4608400, "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Skipping sharded index consistency check because feature compatibility version is not fully upgraded"}
Checking further on this “feature compatibility version” message, I ran the following command against the primary of the config server replica set:
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
which returned the following output:
connecting to: mongodb://mongop_cfg1:27019/cdrarch?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("6fa9ef61-2a3e-4c4e-b50f-616ab9631119") }
MongoDB server version: 4.4.1
{
	"featureCompatibilityVersion" : {
		"version" : "4.2"
	},
	"ok" : 1,
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("7fffffff000000000000000c")
	},
	"lastCommittedOpTime" : Timestamp(1601634848, 67),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1601634848, 67),
		"signature" : {
			"hash" : BinData(0,"7JA+LfjqaFLdPtYykM/0jUkVCUY="),
			"keyId" : NumberLong("6821045628173287455")
		}
	},
	"operationTime" : Timestamp(1601634848, 67)
}
Notice the version "4.2"; is this expected, given that we are running MongoDB binaries of version 4.4.1?
I have run this same command against the primaries of all my replica sets, and they all report version "4.2", while running MongoDB binaries of version 4.4.1.
-> is that OK, or not?
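I assume that, if the FCV is indeed supposed to follow the binaries, the fix would be to raise it once from a mongos (a sketch, to be run only when we are sure we will not downgrade back to 4.2):

// run against a mongos: raises the cluster-wide feature compatibility version
db.adminCommand( { setFeatureCompatibilityVersion: "4.4" } )

// then re-check on each replica-set primary
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )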
I raised the log verbosity on the config servers from level 0 to 1 for the “QUERY” component, and got very many of these messages:
{"t":{"$date":"2020-10-02T13:04:54.533+02:00"},"s":"D1", "c":"QUERY", "id":22790, "ctx":"conn463","msg":"Received interrupt request for unknown op","attr":{"opId":7396919}}
{"t":{"$date":"2020-10-02T13:04:54.615+02:00"},"s":"D1", "c":"QUERY", "id":22790, "ctx":"conn317","msg":"Received interrupt request for unknown op","attr":{"opId":7396921}}
{"t":{"$date":"2020-10-02T13:04:54.623+02:00"},"s":"D1", "c":"QUERY", "id":22790, "ctx":"conn20","msg":"Received interrupt request for unknown op","attr":{"opId":7396923}}
-> normal, or not?
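For completeness, the verbosity can be toggled with the standard shell helper, so it is easy to revert:

db.setLogLevel(1, "query")   // raise the QUERY component to debug level 1
db.setLogLevel(0, "query")   // back to the default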
And finally, the reports about terribly slow queries for the chunk moves; note the "durationMillis":921059 at the end, i.e. more than 15 minutes for a single chunk:
{"t":{"$date":"2020-10-02T13:07:26.576+02:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn3834","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"_configsvrMoveChunk":1,"ns":"cdrarch.cdr_mobi_20200830","min":{"SHARD_MINSEC":2015.0},"max":{"SHARD_MINSEC":2016.0},"shard":"db_rs002","lastmod":{"$timestamp":{"t":3692,"i":1}},"lastmodEpoch":{"$oid":"5f53fc15da305f92f9cee0be"},"toShard":"db_rs036","maxChunkSizeBytes":67108864,"secondaryThrottle":{},"waitForDelete":true,"forceJumbo":false,"writeConcern":{"w":"majority","wtimeout":15000},"lsid":{"id":{"$uuid":"236d3eab-d1d8-4757-a5f7-0bea4c46f8a3"},"uid":{"$binary":{"base64":"YtJ8CVGJPpojGlBhlVfpmkB+TWiGCwPUvkGEjp5tty0=","subType":"0"}}},"$replData":1,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1601635925,"i":17}},"signature":{"hash":{"$binary":{"base64":"dsDAd22QCBRQKtavuAirYaWi3bI=","subType":"0"}},"keyId":6821045628173287455}},"$audit":{"$impersonatedUsers":[{"user":"mongo-admin","db":"admin"}],"$impersonatedRoles":[{"role":"root","db":"admin"}]},"$client":{"application":{"name":"MongoDB Shell"},"driver":{"name":"MongoDB Internal Client","version":"4.4.1"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"},"mongos":{"host":"mongop_rtr2:27017","client":"10.100.22.99:14298","version":"4.4.1"}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1601635925,"i":17}},"t":12}},"$db":"admin"},"numYields":0,"reslen":537,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":11}},"ReplicationStateTransition":{"acquireCount":{"w":14}},"Global":{"acquireCount":{"r":11,"w":3}},"Database":{"acquireCount":{"r":6,"w":3}},"Collection":{"acquireCount":{"r":2,"w":3}},"Mutex":{"acquireCount":{"r":9}},"oplog":{"acquireCount":{"r":4}}},"flowControl":{"acquireCount":3,"timeAcquiringMicros":3},"writeConcern":{"w":"majority","wtimeout":15000},"storage":{},"protocol":"op_msg","durationMillis":921059}}
And to confirm once more that not a single document needs to be moved by these chunk migrations: there are 0 (zero) docs in this collection:
Totals
data : 0B docs : 0 chunks : 3601
Shard db_rs028 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs034 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs036 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs002 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs031 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs029 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs030 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs035 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs033 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
Shard db_rs032 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
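For reference, the totals above can be reproduced with the standard shell helper, run against the cdrarch database:

db.cdr_mobi_20200830.getShardDistribution()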