vagrant@m103:~$ validate_lab_shard_collection
Incorrect number of documents imported - make sure you import the entire dataset

I am trying to validate my third lab, but I always get the same error, and I think it’s correct, because
the website says: "The collection products should contain exactly 516784 documents."

I checked, and I have:

MongoDB Enterprise mongos> db.products.count()
734669

I ran the import twice and got the same error.

Windows Terminal

vagrant@m103:~$ mongoimport --drop /dataset/products.json --port 26000 -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin" --db m103 --collection products

Any ideas?

Thanks.

Ten minutes later, I now get a new error message:

vagrant@m103:~$ validate_lab_shard_collection

Incorrect shard key selected - '_id' is not a good shard key because it
increases monotonically forever. The chunks that need splitting will always be
in the highest chunk range!

The first error just meant that the balancer had not finished distributing chunks across the shards, which is why you got a different message after 10 minutes. So next time, just wait a little while before validating.

The second message means you chose the wrong shard key. Try again.
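
To make the validator's complaint about '_id' more concrete, here is a minimal sketch of range-based chunking. It is a toy model and not MongoDB's actual split logic: the function name `fractionIntoTopChunk`, the median split rule, and the chunk size of 10 are all assumptions made for illustration. It measures what fraction of inserts land in the chunk with the highest range.

```javascript
// Toy model: chunks are sorted, disjoint key ranges [min, max).
// Returns the fraction of inserts that landed in the highest-range chunk.
function fractionIntoTopChunk(keys, maxDocsPerChunk) {
  const chunks = [{ min: -Infinity, max: Infinity, keys: [] }];
  let topHits = 0;
  for (const key of keys) {
    // Find the single chunk whose range contains this key.
    const i = chunks.findIndex(c => key >= c.min && key < c.max);
    if (i === chunks.length - 1) topHits++;
    const chunk = chunks[i];
    chunk.keys.push(key);
    if (chunk.keys.length > maxDocsPerChunk) {
      // Assumption: split an oversized chunk at its median key.
      chunk.keys.sort((a, b) => a - b);
      const mid = Math.floor(chunk.keys.length / 2);
      const splitPoint = chunk.keys[mid];
      const upper = { min: splitPoint, max: chunk.max, keys: chunk.keys.slice(mid) };
      chunk.keys = chunk.keys.slice(0, mid);
      chunk.max = splitPoint;
      chunks.splice(i + 1, 0, upper);
    }
  }
  return topHits / keys.length;
}

// Keys that only ever grow, like an ObjectId-style _id.
const ascending = Array.from({ length: 1000 }, (_, i) => i);
// The same 1000 distinct keys, arriving in a scattered order.
const scattered = ascending.map(i => (i * 7919) % 1000);

console.log(fractionIntoTopChunk(ascending, 10)); // 1: every insert hits the top chunk
console.log(fractionIntoTopChunk(scattered, 10)); // far lower: inserts spread across chunks
```

With ever-increasing keys, each new key is larger than every split point seen so far, so the highest chunk takes every write. That is the hotspot the validator message describes.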

I re-imported the dataset several times so that I could shard on a new key, but now I get an error:

MongoDB Enterprise mongos> db.products.createIndex({"sku": 1})
{
    "raw" : {
        "m103-repl/192.168.103.100:27001,192.168.103.100:27002,192.168.103.100:27003" : {
            "numIndexesBefore" : 5,
            "numIndexesAfter" : 5,
            "note" : "all indexes already exist",
            "ok" : 1
        }
    },
    "ok" : 1,
    "operationTime" : Timestamp(1572732377, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1572732384, 1),
        "signature" : {
            "hash" : BinData(0,"pqbv7gMNy2in5ZZVhIIek8Pf6tY="),
            "keyId" : NumberLong("6754783010169552923")
        }
    }
}
MongoDB Enterprise mongos> db.adminCommand( { shardCollection: "m103.products", key: { sku: 1 } } )
{
    "ok" : 0,
    "errmsg" : "Please create an index that starts with the proposed shard key before sharding the collection",
    "code" : 72,
    "codeName" : "InvalidOptions",
    "operationTime" : Timestamp(1572732404, 6),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1572732404, 6),
        "signature" : {
            "hash" : BinData(0,"Ggcat/pa5caYJElLN+oeiz8iDF4="),
            "keyId" : NumberLong("6754783010169552923")
        }
    }
}
MongoDB Enterprise mongos>

My sh.status() output:

MongoDB Enterprise mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "minCompatibleVersion" : 5,
      "currentVersion" : 6,
      "clusterId" : ObjectId("5dbdcf5cf1a1dd247b09a017")
  }
  shards:
      { "_id" : "m103-repl", "host" : "m103-repl/192.168.103.100:27001,192.168.103.100:27002,192.168.103.100:27003", "state" : 1 }
      { "_id" : "m103-repl-2", "host" : "m103-repl-2/192.168.103.100:27004,192.168.103.100:27005,192.168.103.100:27006", "state" : 1 }
  active mongoses:
      "3.6.14" : 1
  autosplit:
      Currently enabled: yes
  balancer:
      Currently enabled: yes
      Currently running: no
      Failed balancer rounds in last 5 attempts: 0
      Migration Results for the last 24 hours:
          2 : Success
  databases:
      { "_id" : "config", "primary" : "config", "partitioned" : true }
          config.system.sessions
              shard key: { "_id" : 1 }
              unique: false
              balancing: true
              chunks:
                  m103-repl 1
              { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : m103-repl Timestamp(1, 0)
      { "_id" : "m103", "primary" : "m103-repl", "partitioned" : true }
      { "_id" : "test", "primary" : "m103-repl", "partitioned" : false }

MongoDB Enterprise mongos>

So did the validation succeed?

Please make sure you are connected to the correct database (m103, the one you are sharding) before creating the index.

Hi @marco_05419,

For this, you need to choose the correct shard key, create an index on that key, and then shard your collection on the same key.
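
The order of operations described above can be sketched as a small helper. The name `indexThenShard` is hypothetical. `getSiblingDB`, `getCollection`, `createIndex`, and `adminCommand` with `shardCollection` are the mongo shell calls already used in this thread or their standard equivalents. The sketch assumes sharding is already enabled for the database, as the sh.status() output above shows for m103.

```javascript
// Hypothetical helper: create the index in the intended database first,
// then shard the collection on the same key.
// `conn` is anything shaped like the mongo shell's `db` object.
function indexThenShard(conn, dbName, collName, key) {
  // Switch to the target database explicitly, so the index is not
  // created on a same-named collection in another database.
  const target = conn.getSiblingDB(dbName);
  const indexResult = target.getCollection(collName).createIndex(key);
  if (!indexResult.ok) {
    return indexResult; // do not attempt to shard without the index
  }
  // The shard key must match (or be a prefix of) the index just created.
  return conn.adminCommand({ shardCollection: dbName + "." + collName, key: key });
}
```

In the mongo shell this might be called as `indexThenShard(db, "m103", "products", { sku: 1 })`, using the key attempted earlier in the thread. Whether that key satisfies the lab validator is not confirmed here.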

Also, the initial error, “Incorrect number of documents imported”, is a known issue with the validation script for this lab, and we are already working on it.

A possible workaround for now is to re-import your collection and try each of the candidate shard keys in turn; this will work.

You may also take a look at the threads in our forum to learn more about this issue, as below:

Thanks,
Muskan

Hi @marco_05419,
I ran into a similar issue.
My products count was also 734669, but 10 minutes later it was 516784 in my mongo shell on mongos.
If you follow the lab steps, you will get the right result :)
For reference, here is my environment:

1) mongos - products count is 516784

2) m103-repl-2 primary - products count still changing

3) m103-repl primary - products count still changing

I had the same thing. I suppose you just need to give it some time (~10 minutes) to split the chunks and balance the documents.

MongoDB Enterprise mongos> db.products.count()
708762
MongoDB Enterprise mongos> db.products.count()
516784

Given that there are known issues with the lab - either the validation script or the populating functionality - maybe it would be a great intermediate “fix” to at least make a note in the lab about this since it’s known to be an issue. I also spent quite a bit of time grinding on this because I thought I was the problem when that wasn’t the case.

It would also be great for someone to explain how this isn’t an issue with MongoDB. Why is it possible for a correct setup to import data and then get back incorrect results because the sharded cluster isn’t ready? To someone who’s using this to learn, it 100% looks like a MongoDB bug: I put in data, I get back different data. This is not what one expects from a properly functioning data store.

Hi @Michael_70789,

Thanks for your feedback. I will prioritise this and we will have it fixed as soon as possible.

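The count() function without any query predicate returns a result based on the collection’s metadata, so it can be an approximate value. On a sharded cluster it can over-count while chunks are being migrated between shards. This is what’s happening in this case.

To find the exact number of documents, you can iterate a cursor and count the results with itcount(), for example db.products.find().itcount().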
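
As a rough illustration of why the two numbers can differ, here is a toy model in plain JavaScript. It is not how MongoDB is implemented: `makeShard`, `metadataCount`, and `iteratedCount` are hypothetical names, and the 100-document data set is made up. It only shows the general idea that documents copied during a chunk migration can be counted twice when each shard simply reports how many documents it stores.

```javascript
// Toy model: a shard physically stores some keys, and `owns` says
// which of them its current chunk ranges actually cover.
function makeShard(storedKeys, owns) {
  return { storedKeys, owns };
}

// Metadata-style count: each shard reports everything it stores,
// including copies it does not currently own.
function metadataCount(shards) {
  return shards.reduce((total, s) => total + s.storedKeys.length, 0);
}

// Iterated count: walk the documents and keep only those the shard owns.
function iteratedCount(shards) {
  let total = 0;
  for (const s of shards) {
    for (const key of s.storedKeys) {
      if (s.owns(key)) total++;
    }
  }
  return total;
}

// 100 documents with keys 0..99. The chunk [50, 100) has been copied to
// shard B, but the old copies on shard A have not been cleaned up yet.
const range = (from, to) => Array.from({ length: to - from }, (_, i) => from + i);
const shardA = makeShard(range(0, 100), key => key < 50);
const shardB = makeShard(range(50, 100), key => key >= 50);

console.log(metadataCount([shardA, shardB])); // 150
console.log(iteratedCount([shardA, shardB])); // 100
```

Once the leftover copies are removed, both counts agree, which matches the way count() dropped to 516784 after a few minutes in the sessions above.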

Hope it helps!

Please feel free to get back to us if you have any other questions.

Thanks,
Shubham Ranjan
Curriculum Services Engineer

@Shubham_Ranjan

Thanks for the explanation of count() vs. itcount(). It’s definitely helpful.

I did a terrible job describing my thoughts on this. I actually meant for this information to be made visible somehow in the course for all students. You’re right that it’s pretty clearly documented that count() could definitely return a “wrong” answer, but I’m not sure that users would necessarily think to doubt count() in a way that might cause them to check the manual. Since the course at this point is doing less spoon feeding (which is great), maybe instead of flat out saying “do this instead of that because of that”, there might be a suggestion (called out fairly heavily) to take an opportunity to read the docs about count() and how it works with shards and how itcount() works as well. I spent a fair bit of time running count() doing things and running count() thinking that if the number didn’t come back correctly that it was a mistake I made in the lab. Checking the discussion (I was worried about accidentally exposing myself to an answer and tainting my lab work) was kinda my last resort before rebuilding the whole thing again.

Thanks!

Hi @Michael_70789,

Thank you so much for sharing your thoughts on this. I will sync with the team today and we will make the necessary changes in the course content :)

Thanks,
Shubham Ranjan
Curriculum Services Engineer