Lab: Shard a Collection [Unable to validate_lab_shard_collection]

Hi, I can't seem to get the validation working.
I've read the other threads, but running the validation tool multiple times didn't help.
I've also recreated the entire cluster (mongos and two replica sets).

I'm getting different errors from the validation tool (see below).
Any suggestions?


vagrant@m103:/shared$ validate_lab_shard_collection ; validate_lab_shard_collection

Client experienced a timeout when connecting to 'm103-repl-2' - check that mongod/mongos
processes are running on the correct ports, and that the 'm103-admin' user
authenticates against the admin database.

Replica set 'm103-repl-2' not configured correctly - make sure each node is started with
a wiredTiger cache size of 0.1 GB. Your cluster will crash in the following lab
if you don't do this!
vagrant@m103:/shared$ validate_lab_shard_collection ; validate_lab_shard_collection

No documents found in m103.products - make sure you import the dataset into the
'products' collection in the 'm103' database.

Client experienced a timeout when connecting to 'm103-repl-2' - check that mongod/mongos
processes are running on the correct ports, and that the 'm103-admin' user
authenticates against the admin database.
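For context, each node in both replica sets is already started with the 0.1 GB cache the validator asks for. The relevant part of my config files looks like this (the dbPath is just a placeholder for the node's actual data directory):

storage:
  dbPath: /var/mongodb/db/4        # placeholder - each node has its own path
  wiredTiger:
    engineConfig:
      cacheSizeGB: .1              # the 0.1 GB cache size the validator checks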
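The data is also there despite the 'No documents found' error. For reference, the import goes through mongos with mongoimport; a sketch assuming my setup's mongos port (26000) and dataset path:

mongoimport --port 26000 -u m103-admin -p m103-pass --authenticationDatabase admin \
  --db m103 --collection products --file /dataset/products.json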

First, try to connect to m103-repl-2 directly with the mongo shell, using the m103-admin user and the m103-pass password (example command after these steps).

If that fails, verify that the m103-repl-2 mongod processes were started correctly.

If you can connect directly to the m103-repl-2 replica set, start mongos and verify with sh.status() that both m103-repl and m103-repl-2 are registered as shards.
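For example, with the m103-repl-2 hosts and ports from this lab (27004-27006), the first check would look something like this:

mongo --host m103-repl-2/192.168.103.100:27004,192.168.103.100:27005,192.168.103.100:27006 \
  -u m103-admin -p m103-pass --authenticationDatabase admin

If the shell connects and shows a PRIMARY prompt, the replica set is reachable and the user authenticates correctly.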

I’m able to connect to m103-repl-2 with the shell, and the replica set is running.
I’ve put the rs.status() output below.

The shard status (sh.status()) from the mongos connection also seems fine.

-- Output of rs.status() for m103-repl-2: -----------------------
MongoDB Enterprise m103-repl-2:PRIMARY> rs.status()
{
    "set" : "m103-repl-2",
    "date" : ISODate("2018-11-02T14:20:21.459Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1541168416, 1),
            "t" : NumberLong(1)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1541168416, 1),
            "t" : NumberLong(1)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1541168416, 1),
            "t" : NumberLong(1)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1541168416, 1),
            "t" : NumberLong(1)
        }
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.103.100:27004",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 4020,
            "optime" : {
                "ts" : Timestamp(1541168416, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-11-02T14:20:16Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1541164449, 2),
            "electionDate" : ISODate("2018-11-02T13:14:09Z"),
            "configVersion" : 3,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 1,
            "name" : "192.168.103.100:27005",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 3920,
            "optime" : {
                "ts" : Timestamp(1541168416, 1),
                "t" : NumberLong(1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1541168416, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-11-02T14:20:16Z"),
            "optimeDurableDate" : ISODate("2018-11-02T14:20:16Z"),
            "lastHeartbeat" : ISODate("2018-11-02T14:20:21.439Z"),
            "lastHeartbeatRecv" : ISODate("2018-11-02T14:20:19.556Z"),
            "pingMs" : NumberLong(140),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "192.168.103.100:27004",
            "syncSourceHost" : "192.168.103.100:27004",
            "syncSourceId" : 0,
            "infoMessage" : "",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "192.168.103.100:27006",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 3917,
            "optime" : {
                "ts" : Timestamp(1541168416, 1),
                "t" : NumberLong(1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1541168416, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-11-02T14:20:16Z"),
            "optimeDurableDate" : ISODate("2018-11-02T14:20:16Z"),
            "lastHeartbeat" : ISODate("2018-11-02T14:20:19.220Z"),
            "lastHeartbeatRecv" : ISODate("2018-11-02T14:20:20.452Z"),
            "pingMs" : NumberLong(76),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "192.168.103.100:27004",
            "syncSourceHost" : "192.168.103.100:27004",
            "syncSourceId" : 0,
            "infoMessage" : "",
            "configVersion" : 3
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1541168416, 1),
    "$gleStats" : {
        "lastOpTime" : Timestamp(0, 0),
        "electionId" : ObjectId("7fffffff0000000000000001")
    },
    "$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1541168418, 1),
            "t" : NumberLong(2)
        }
    },
    "$clusterTime" : {
        "clusterTime" : Timestamp(1541168418, 1),
        "signature" : {
            "hash" : BinData(0,"aqNpWAl3EdI0Nd64inmneGI5l1M="),
            "keyId" : NumberLong("6619245881101123611")
        }
    }
}

-- Output of sh.status() from the mongos connection: ---------------------
MongoDB Enterprise mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "minCompatibleVersion" : 5,
      "currentVersion" : 6,
      "clusterId" : ObjectId("5bdc490f843b7b6216cff97d")
  }
  shards:
      { "_id" : "m103-repl", "host" : "m103-repl/192.168.103.100:27001,192.168.103.100:27002,192.168.103.100:27003", "state" : 1 }
      { "_id" : "m103-repl-2", "host" : "m103-repl-2/192.168.103.100:27004,192.168.103.100:27005,192.168.103.100:27006", "state" : 1 }
  active mongoses:
      "3.6.8" : 1
  autosplit:
      Currently enabled: yes
  balancer:
      Currently enabled: yes
      Currently running: no
      Failed balancer rounds in last 5 attempts: 0
      Migration Results for the last 24 hours:
          1 : Success
  databases:
      { "_id" : "config", "primary" : "config", "partitioned" : true }
          config.system.sessions
              shard key: { "_id" : 1 }
              unique: false
              balancing: true
              chunks:
                  m103-repl    1
              { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : m103-repl Timestamp(1, 0)
      { "_id" : "m103", "primary" : "m103-repl", "partitioned" : true }
          m103.products
              shard key: { "sku" : 1 }
              unique: false
              balancing: true
              chunks:
                  m103-repl    2
                  m103-repl-2    1
              { "sku" : { "$minKey" : 1 } } -->> { "sku" : 23153496 } on : m103-repl-2 Timestamp(2, 0)
              { "sku" : 23153496 } -->> { "sku" : 28928914 } on : m103-repl Timestamp(2, 1)
              { "sku" : 28928914 } -->> { "sku" : { "$maxKey" : 1 } } on : m103-repl Timestamp(1, 2)
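For reference, the collection was sharded from mongos with roughly these commands (shard key index first, then enable sharding and shard the collection); the status above is the result:

use m103
db.products.createIndex( { "sku" : 1 } )
sh.enableSharding( "m103" )
sh.shardCollection( "m103.products", { "sku" : 1 } )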

Indeed, this looks OK.

You may be experiencing real timeouts, as pointed out in the following thread: "(FIXED) Error in validating Lab Shard Collection - Use m103.products".

Thanks for the help.

Guess I'll just have to skip that lab.

I too experienced every single one of the errors you listed. I was not able to get the validation script to work on the computer I’d been using. The only way I was able to get the validation to work was to use a significantly more powerful computer. Even then, I still got the error one time.

I think the validation script is flawed, and I have yet to see any responses from MongoDB staff in this forum that would indicate that they are even looking into the problem, or have any plan to get it fixed.

Silly question, but did you adjust the memory settings that were specified? One of the validation script's errors suggests that you did not.

It's worth checking the memory settings, of course, if the validation script says they are wrong. In my case, though, I got that same memory error several times on my "slow" computer even though the memory settings and everything else were correctly specified. Re-running the script without any changes produced different errors on other runs, such as the 'No documents found' one. With the same config files, the validation script succeeded (most of the time) on my "fast" computer. This leads me to believe the validation script is flawed.

Oh I agree. I’m afraid that the validator doesn’t run well in situations where the VM is running on an underpowered host system.

FYI, I’ve applied the specified memory setting.

I've come to the conclusion that host CPU limitations are causing my issues.

I found the mongod processes going into a hung state during db.shutdownServer(), forcing me to kill them.
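In case it helps anyone else, cleaning up the hung processes from the vagrant shell looked roughly like this (plain ps/kill, nothing lab-specific):

ps -ef | grep [m]ongod     # find the PIDs of the hung mongod processes
kill <pid>                 # SIGTERM first, so mongod can shut down cleanly
kill -9 <pid>              # last resort only, if the process stays hung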

Thanks all.

For some unknown reason my 4th mongod died. The following command helped me:
mongod -f /shared/mongod-repl-4.conf
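After restarting it, I verified that the node had rejoined the replica set by connecting to it directly and checking rs.status() again (port 27004, assuming mongod-repl-4.conf is the first m103-repl-2 node as in my setup):

mongo --port 27004 -u m103-admin -p m103-pass --authenticationDatabase admin --eval "rs.status().members.forEach(function(m) { print(m.name, m.stateStr) })"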