Contradiction between courses M103 and M201 regarding skip()

Lecture: Queries in a Sharded Cluster, which is part of Chapter 3 of course M103, states at 2:13: “With skip(), mongos performs the skipping on the merged set of results, and doesn’t push anything down to the shard level.”

Furthermore, the notes below this lecture explain a specific case where skip() is used with limit(). They state: “When used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to ensure a sufficient number of documents are returned to the mongos to apply the final limit() and skip() successfully.”

Thus, as per course M103, skipping is not performed on the individual shards; it is performed only once, on the final merged results.

However, course M201 states otherwise.

Lecture: Performance Considerations in Distributed Systems Part 2, which is part of Chapter 5 of course M201, states at 4:40: “Mongos will select the designated shards. A local Skip and Limit will be performed, and once those results are done, a final merge of that result set is performed on the primary shard, and then the result is sent back to our client through the mongos.”

The following process flow is taken from this lecture:

image

Can the MongoDB University TA team please explain this contradiction between the two lectures from courses M103 and M201?

As per my understanding, if skip() is performed at the local shard level, i.e. before deriving the final set of merged documents, there is a high chance that the local skips will skip the wrong documents, so the final result set after the merge will be incorrect. However, I will let you guys clarify or confirm this, please. Thanks!
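To illustrate the concern, here is a toy sketch in Python (not MongoDB code). It assumes two hypothetical shards, modelled as sorted lists, and compares a skip applied on each shard with a skip applied once on the merged result. The names shard_a, shard_b, expected and local are made up for this example.

```python
# Toy model: each "shard" is a list of values already sorted ascending.
shard_a = [1, 2, 3, 4]
shard_b = [5, 6, 7, 8]
skip, limit = 2, 3

# What a single unsharded collection would return for sort, skip(2), limit(3).
expected = sorted(shard_a + shard_b)[skip:skip + limit]

# Hypothetical "local skip": every shard skips and limits on its own,
# then the partial results are merged and limited again.
local = sorted(shard_a[skip:skip + limit] + shard_b[skip:skip + limit])[:limit]

print(expected)  # [3, 4, 5]
print(local)     # [3, 4, 7]
```

In this model the local skip drops 5 and 6 on the second shard, even though they belong in the final page.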

Hi @PuneetC,

If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.

If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result.

When skip() is used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
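As a rough illustration of the behaviour described above, here is a small Python sketch. It is only a toy model under simplifying assumptions (shards are plain lists, the query sorts ascending), not how mongos is actually implemented. The function name mongos_skip_limit is hypothetical.

```python
import heapq

def mongos_skip_limit(shards, skip, limit):
    """Toy model of the described behaviour; not actual mongos code."""
    # Each shard returns at most skip + limit documents, with no skip applied.
    partials = [sorted(shard)[:skip + limit] for shard in shards]
    # Merge the sorted partial results, then skip and limit exactly once.
    merged = list(heapq.merge(*partials))
    return merged[skip:skip + limit]

shards = [
    [1, 4, 7, 10, 13, 16],
    [2, 5, 8, 11, 14, 17],
    [3, 6, 9, 12, 15, 18],
]
result = mongos_skip_limit(shards, skip=2, limit=2)
print(result)  # [3, 4]
```

Each shard sends only its first skip + limit documents. That is enough because any document in the first skip + limit positions of the overall result must also be within the first skip + limit positions on its own shard.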

I hope it helps!!

Please let me know if you have any further questions.

Thanks,
Sonali

Thanks Sonali, I agree with what you have updated.

Course M201, Chapter 5, lecture Performance Considerations in Distributed Systems Part 2 will need to be updated to correct this. Please let the curriculum team know.