Questions about transactions on sharded clusters

I have the following two questions about transactions on sharded clusters. Suppose we have a sharded cluster consisting of 10 shards running MongoDB 4.2, and a multi-document transaction that involves 7 of those shards.

(1) How is the coordinator selected? (Please answer in some detail.)
(2) After the coordinator is selected, how does it know which other shards are involved in the transaction, so that it can check their commit/abort status? (As far as I know, shards have no knowledge of the data distribution map.)

Thanks
-Yiding

Hey @YidingZhang

So I will do my best to answer your question.

I am not sure exactly what you mean by coordinator, as I could not find anything about it in the MongoDB docs.

However, there is the mongos, which the docs describe as follows:

MongoDB mongos instances route queries and write operations to shards in a sharded cluster. mongos provide the only interface to a sharded cluster from the perspective of applications. Applications never connect or communicate directly with the shards.

The mongos tracks what data is on which shard by caching the metadata from the config servers. The mongos uses the metadata to route operations from applications and clients to the mongod instances. A mongos has no persistent state and consumes minimal system resources.

So queries and writes get sent to the mongos, and it decides which shards to involve. It does this based on whether the shard key is included in the query.

Write Operations on Sharded Clusters

For sharded collections in a sharded cluster, the mongos directs write operations from applications to the shards that are responsible for the specific portion of the data set. The mongos uses the cluster metadata from the config database to route the write operation to the appropriate shards.

MongoDB partitions data in a sharded collection into ranges based on the values of the shard key. Then, MongoDB distributes these chunks to shards. The shard key determines the distribution of chunks to shards. This can affect the performance of write operations in the cluster.

IMPORTANT

Update operations that affect a single document must include the shard key or the _id field. Updates that affect multiple documents are more efficient in some situations if they have the shard key, but can be broadcast to all shards.

So from the docs that I have shared and your situation of 10 shards with 7 being affected by the update…

The mongos checks the metadata in the config servers for this information. There is a collection stored on the config servers called chunks, which records each chunk's range as an inclusive min and an exclusive max.
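
To make the min/max lookup concrete, here is a minimal sketch of how a router could map a shard key value to a shard. It is an illustration only, not the mongos code: the chunk table, the shard names, the `user_id` shard key, and the function names are all made up, and it only handles an equality match on a single numeric shard key.

```python
from bisect import bisect_right

# Hypothetical chunk table: (inclusive min, exclusive max, shard name),
# sorted by min and covering the key range 0..300 with no gaps.
CHUNKS = [
    (0, 100, "shardA"),
    (100, 200, "shardB"),
    (200, 300, "shardC"),
]


def shard_for_key(key, chunks=CHUNKS):
    """Return the shard whose chunk satisfies min <= key < max."""
    mins = [chunk[0] for chunk in chunks]
    # Index of the last chunk whose min is <= key.
    i = bisect_right(mins, key) - 1
    if i < 0:
        raise KeyError(f"no chunk covers key {key!r}")
    lo, hi, shard = chunks[i]
    if not (lo <= key < hi):
        raise KeyError(f"no chunk covers key {key!r}")
    return shard


def target_shards(query, shard_key="user_id", chunks=CHUNKS):
    """Target one shard if the query names the shard key, else broadcast."""
    if shard_key in query:
        return [shard_for_key(query[shard_key], chunks)]
    # Without the shard key the router cannot narrow the target set.
    return sorted({chunk[2] for chunk in chunks})
```

With this table, a query on `user_id` 100 lands on the second chunk because the min is inclusive, while a query that does not mention `user_id` goes to every shard.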

This is done by writeConcern

Write concern describes the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.

Correct, this info is stored on the config servers.

Here are the docs on Consideration for Transactions in a Sharded Cluster. However, please keep in mind that multi-document transactions are not the normal case for MongoDB, due to the rich document structure.

In MongoDB, an operation on a single document is atomic. Because you can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases

Lastly
Have you taken the M103 Basic of Cluster Administration course? It will help clear up some of your questions. The latest session started on Aug 20, but you can still sign up until Monday.

Thank you for that awesome refresher @natac13 and thank you for posting your question here @YidingZhang!

You’ve both reminded me that I need to go back over my notes from M103 Basic of Cluster Administration. It’s been about 8 months now and I have forgotten some things.

Maybe I’ll take it a second time for the sake of retaining much of what I have learned along the way. It’s a great course.

Cheers:-)

@juliettet

After going through the lineup of courses, I was most impressed by M103 and M121! Anything you forgot will come back, I am sure, just like riding a bike! And if not, the forum community is always here to help and support each other!

Truthfully all these courses are fantastic. And Free!

@natac13:

I should have said “coordinator shard” instead of “coordinator”. The term comes from Aly Cabral’s presentation at MongoDB World 2019. I was there and listened to the full talk. She used an example with two involved shards to explain how a transaction is committed or aborted: one of them, called the “coordinator shard”, checks the commit/abort status of the other involved shards and finally issues the acknowledgment. She didn’t mention how the coordinator is selected or how it knows which shards are involved in the transaction. That is where my questions came from.

-Yiding
(I couldn’t find a recording of her presentation on YouTube, and I may be wrong. If you get a chance to review the recording, please send me the URI.)

Hey @YidingZhang

That’s very cool you were there in person! I hope to be able to go one year.

Note: I am not sure if the video I am linking to is the presentation you saw but…

Down to the point: the coordinator is the shard that receives the first update in the transaction.
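
As a toy illustration of that rule (not MongoDB's actual code), the selection could be pictured like this. The class and method names are hypothetical, and the idea that the router also keeps a list of every shard it has touched is an assumption of the sketch, not something confirmed in this thread.

```python
class TransactionRouter:
    """Toy per-transaction bookkeeping on the router side (hypothetical)."""

    def __init__(self):
        # Assumption: shards touched so far, in order of first contact.
        self.participants = []
        self.coordinator = None

    def route(self, shard):
        """Record that a statement of the transaction was sent to `shard`."""
        if shard not in self.participants:
            self.participants.append(shard)
        # The first shard contacted in the transaction becomes the coordinator.
        if self.coordinator is None:
            self.coordinator = shard
        return shard
```

For example, routing updates to `shard3`, then `shard7`, then `shard3` again leaves `shard3` as the coordinator and two shards in the participant list.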

And just to stress the point made by every presenter from MongoDB on this topic

Just because we have transactions doesn’t mean you should model your data in MongoDB as you would in a relational database

And in that video Eliot mentions that this transaction thing only cost $30-50 Million!

@natac13:

Thanks for providing the links…
Now I understand how the coordinator is selected.
Back to my second question: in the video, near 1:46, Eliot said
“The client is going to say commit, coordinator’s going to say prepare, get ready. All the shards are going to be we are ready, let’s go, the coordinator says, commit, we are done”.

In that case, the coordinator needs to know which shards in the cluster are involved in the transaction, so that it can tell them to prepare and then trigger the commit. How does the coordinator know which shards are involved?
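
The exchange Eliot describes reads like a classic two-phase commit. Below is a minimal sketch of that pattern, under stated assumptions: the `Participant` class and `coordinate_commit` function are hypothetical names, there is no networking, logging, or failure recovery, and the participant list is simply handed to the function, which is exactly the part my question is about.

```python
class Participant:
    """Toy shard taking part in a transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes only if this shard is able to commit its part.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def coordinate_commit(participants):
    """Phase 1: collect prepare votes. Phase 2: commit only if all voted yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    # At least one shard could not prepare, so everyone rolls back.
    for p in participants:
        p.abort()
    return "aborted"
```

If every participant votes yes in the prepare phase, all of them commit; a single no vote aborts the transaction on every shard.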

-Yiding

Hey @YidingZhang

So Esha explains that the coordinator shard writes an entry to its oplog listing the shards involved in the transaction. How does it know which shards are involved? I honestly do not know for sure (someone more experienced can help me out). However, I would guess it is the same way the mongos knows which shard to query or write to: by getting that info from the config servers, which store data about the shards in the cluster.

If you go through, or have already been through, the Course mentioned by @juliettet (M103), it talks about how replication is done through the Oplog

MongoDB applies database operations on the primary and then records the operations on the primary’s oplog

And after watching that video, it seems that is how the shards ‘communicate’ when dealing with distributed transactions. And the only place where information like what you’re asking about is stored is the config servers.

@natac13:

Thanks for providing the link. I watched Esha’s demo and hope to see documentation about it.

-Yiding