Incorrect number of documents imported - make sure you import the entire dataset

I imported the documents with this command:
mongoimport --drop /dataset/products.json --port 26000 -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin" --db m103 --collection products
db.products.count() showed 516784 documents. After doing the index procedure, I ran the validate_lab_shard_collection command and it showed me:
Incorrect number of documents imported - make sure you import the entire dataset.
This is because the count is now 708841. Why is it increasing automatically?

Use itcount() rather than count(). Refer to https://docs.mongodb.com/manual/reference/method/cursor.itcount/ for more info.

Throwing in some other quicker alternatives:

  1. db.products.aggregate([{$count: "rec count"}])
  2. db.products.count({_id: {$exists: true}})
NB: if you use count() on a sharded cluster, you must include a query to get an accurate result.

Hi @Milind_19368,

The db.collection.count() method without a query predicate returns results based on the collection's metadata, which may result in an approximate count. In particular, on a sharded cluster, the resulting count will not correctly filter out orphaned documents.

Orphaned documents are also being counted, which is why you are seeing this number. This is a known issue in our lab. I recommend waiting for some time so that the metadata is updated, and then running the validator again; it should then succeed. For more information, please refer to this post.
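To make the metadata point above concrete, here is a toy model in plain JavaScript (runnable with Node.js, not the mongo shell). It is only an illustration under assumed data: the shard names, key values, and chunk ranges are hypothetical, and a real cluster tracks chunk ownership in the config servers rather than in a simple array. It contrasts a metadata-style total, which sums every document each shard stores, with a filtered count that skips documents lying outside the chunks each shard owns.

```javascript
// Hypothetical two-shard cluster (illustration only, not real MongoDB data).
// Each shard stores documents, identified here by a numeric shard-key value,
// and owns half-open chunk ranges [min, max). A document stored on a shard
// outside the ranges that shard owns is an "orphan", e.g. a leftover copy
// from a chunk migration that has not been cleaned up yet.
const shards = [
  {
    name: "shardA",
    owned: [{ min: 0, max: 50 }],
    docs: [5, 10, 20, 60, 70], // 60 and 70 belong to shardB's range: orphans
  },
  {
    name: "shardB",
    owned: [{ min: 50, max: 100 }],
    docs: [55, 60, 70, 90],
  },
];

// True if the key falls inside one of the chunk ranges this shard owns.
function owns(shard, key) {
  return shard.owned.some((r) => key >= r.min && key < r.max);
}

// Metadata-style total: every stored document is counted, orphans included.
function metadataCount(cluster) {
  return cluster.reduce((sum, s) => sum + s.docs.length, 0);
}

// Filtered total: each shard counts only documents in chunks it owns.
function filteredCount(cluster) {
  return cluster.reduce(
    (sum, s) => sum + s.docs.filter((key) => owns(s, key)).length,
    0
  );
}

console.log("metadata count:", metadataCount(shards)); // 9
console.log("filtered count:", filteredCount(shards)); // 7
```

In this model, the two orphans on shardA (keys 60 and 70, whose chunk belongs to shardB) inflate the metadata-style total from 7 to 9, the same kind of overcount described in this thread.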

Hope it helps!

If you still have any questions, please feel free to get back to us.

Happy Learning 🙂

Thanks,
Shubham Ranjan
Curriculum Support Engineer

Same here with the "Incorrect number of documents imported" issue.
Does anyone have the answer? Thanks,

vagrant@m103:/data/conf$ validate_lab_shard_collection
Incorrect number of documents imported - make sure you import the entire
dataset.

I imported 2 files:

  1. /dataset/products.json
  2. /dataset/products.part2.json

Total imported according to count(): 1210945.
The shard balance looks good, but the total is not correct.

data : 177.38MiB docs : 1210945 chunks : 4
Shard m103-repl contains 64.4% data, 64.01% docs in cluster, avg obj size on shard : 154B
Shard m103-repl-2 contains 35.59% data, 35.98% docs in cluster, avg obj size on shard : 151B

The issue is fixed.
I loaded the data again, importing only the first file this time, and then:

  1. enabled sharding on the database
  2. created the index
  3. sharded the collection

Then I ran the validation again, and this time it worked.

Hi @suhank,

I’m glad your issue got resolved.

Just for the record: in this lab, you are supposed to import only the products.json dataset.

For any other users who might be getting the same error, please refer to this post.

Thanks,
Shubham Ranjan
Curriculum Support Engineer