Naive doubt regarding sharding

Hey, I am Kushal Shah, and I am new to MongoDB.
Could someone explain to me what happens if we insert a document without the shard key into a sharded collection?
For example, suppose my shard key looks like this:

{
   "Username": "Kushal"
}

I insert a document which looks like the following:

{
   "ID" : 3
}

Please tell me in which chunk the document would be stored.

If I query, would it need to scan through all the documents of the collection?

Hi @Shah_Kushal,

Welcome to the MongoDB community!

As is, this wouldn’t be a valid shard key. Trying to shard a collection with this would result in an error.

For example, using MongoDB 4.2:

Shard key { Username: "Kushal" } can contain either a single 'hashed' field or multiple numerical fields set to a value of 1. Failed to parse field Username

However, assuming you sharded on { Username: 1 }, let's see what happens when a document is inserted without the shard key:

mongos> db.users.insert({ID : 3})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 61,
		"errmsg" : "document { _id: ObjectId('5eb2713cf49a3bda89da595c'), ID: 3.0 } does not contain shard key for pattern { Username: 1.0 }"
	}
})

In MongoDB 4.2, every document inserted into a sharded collection must contain the shard key. A document is associated with a single shard based on its shard key value.
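As a rough illustration of the two points above, here is a toy model in plain JavaScript. It is not MongoDB's routing code: the chunk boundaries, the shard assignments, and the helper names `findChunk` and `routeInsert` are all invented for this sketch, and it only handles a single-field ranged shard key.

```javascript
// Toy model of routing an insert in a range-sharded collection.
// Chunk boundaries and shard assignments are made up for illustration.
const shardKeyPattern = { Username: 1 };

// Each chunk owns the half-open range [min, max) of shard key values.
// null stands in for MinKey / MaxKey in this sketch.
const chunks = [
  { min: null, max: "M", shard: "shard01" },
  { min: "M", max: null, shard: "shard02" },
];

// Find the chunk whose range contains the given shard key value.
function findChunk(value) {
  return chunks.find(
    (c) => (c.min === null || value >= c.min) && (c.max === null || value < c.max)
  );
}

// Reject documents that lack a shard key field, otherwise return the owning shard.
function routeInsert(doc) {
  for (const field of Object.keys(shardKeyPattern)) {
    if (!(field in doc)) {
      // Mirrors the MongoDB 4.2 behaviour shown above: the write is rejected.
      throw new Error(`document does not contain shard key field "${field}"`);
    }
  }
  return findChunk(doc.Username).shard;
}

console.log(routeInsert({ Username: "Kushal" })); // shard01

try {
  routeInsert({ ID: 3 });
} catch (err) {
  console.log(err.message); // the insert is rejected, so no chunk is chosen
}
```

In this model a document without Username never reaches the chunk lookup, which matches the error above: the insert fails, so there is no chunk for it to land in.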

Queries based on the shard key (or a prefix of a compound shard key) will only target the relevant shards. Queries on a sharded collection that don’t meet those criteria will be broadcast to all shards. For more information, see Targeted Operations vs Broadcast Operations in the MongoDB documentation.
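The rule in the previous paragraph can be sketched as a small classifier in plain JavaScript. This is a simplified model, not how mongos is implemented: `classifyQuery` is a hypothetical helper, the Country field is invented to show a compound key, and the sketch ignores hashed shard keys and query operators. Note that a "targeted" query may still reach more than one shard, for example when only a prefix is given.

```javascript
// Toy classifier: can a query be targeted, or must it be broadcast?
// A query counts as targetable here if it constrains the full shard key
// or a leading prefix of a compound shard key.
function classifyQuery(shardKeyPattern, query) {
  let matchedPrefix = 0;
  for (const field of Object.keys(shardKeyPattern)) {
    if (!(field in query)) break; // the prefix must be contiguous from the first field
    matchedPrefix += 1;
  }
  return matchedPrefix > 0 ? "targeted" : "broadcast";
}

// Single-field shard key, as in this thread.
console.log(classifyQuery({ Username: 1 }, { Username: "Kushal" })); // targeted
console.log(classifyQuery({ Username: 1 }, { ID: 3 }));              // broadcast

// Hypothetical compound shard key { Country: 1, Username: 1 }.
console.log(classifyQuery({ Country: 1, Username: 1 }, { Country: "IN" }));      // targeted
console.log(classifyQuery({ Country: 1, Username: 1 }, { Username: "Kushal" })); // broadcast
```

So a query on ID alone would be sent to every shard, and each shard would then answer it using whatever indexes it has locally.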

To learn more about MongoDB I would recommend taking the free online courses at MongoDB University and following one of the learning paths (DBA or Developer).

Regards,
Stennie

Dear @Stennie, I am immensely thankful for your answer. It clears up my doubt. I would like to ask a few more questions. I am writing a term paper on MongoDB, focusing mainly on replica sets and shards.
Apart from the MongoDB manual, can you suggest any intuitive material?
Thanks in advance.

Hi,

Can you clarify what sort of information you are looking for? It is probably more helpful to start with the documentation and explore specific questions that arise before getting into deeper implementation details that require more context.

The MongoDB Manual provides a very comprehensive end user guide including several categories of Frequently Asked Questions.

You can learn more about behaviour (such as the question you asked) by setting up a local test deployment or perhaps using MongoDB Atlas.

For local test deployments I recommend using mlaunch which is part of the mtools Python package (see: mtools installation guide).

I used mlaunch to quickly stand up a local sharded cluster to provide the example message in my original response to you. I already have MongoDB installed and in my path, so creating a cluster with 2 shards looks like:

$ mlaunch --shards 2 --repl
launching: "mongod" on port 27018
launching: "mongod" on port 27019
launching: "mongod" on port 27020
launching: "mongod" on port 27021
launching: "mongod" on port 27022
launching: "mongod" on port 27023
launching: config server on port 27024
replica set 'configRepl' initialized.
replica set 'shard01' initialized.
replica set 'shard02' initialized.
launching: mongos on port 27017
adding shards. can take up to 30 seconds...

The free M103: Basic Cluster Administration online course at MongoDB University would also give you more insight. You can log in using the same credentials as this forum.

Regards,
Stennie

I will certainly refer to those parts of the documentation.

I have one more question. Suppose a mongos is acting as the balancer, redistributing data across the shards, and we shut down that particular mongos instance. What would happen then? Would the balancing process roll back and stop, or would the mongos finish balancing and then stop?

Secondly, suppose a client performs an operation on a chunk while that chunk is being migrated by the balancer. How, under the hood, would the mongos reflect those changes?