C# driver Skip() function only accepts an integer

Hi,
I am using the latest MongoDB C# driver from NuGet.

The document count from CountDocuments() is returned as a long.

So ideally it should be possible to skip a long number of documents, but the Skip() function only accepts an int.

This restricts using the MongoDB C# driver with collections that have many records. I am trying to implement paging with the Skip() and Limit() functions, roughly as in the sketch below.
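Here is a minimal sketch of what I mean (the connection string and collection names are just placeholders); the narrowing cast is where the problem shows up:

```csharp
// Minimal repro sketch; connection string and names are placeholders.
using MongoDB.Bson;
using MongoDB.Driver;

var collection = new MongoClient("mongodb://localhost:27017")
    .GetDatabase("test")
    .GetCollection<BsonDocument>("items");

// CountDocuments returns a long...
long total = collection.CountDocuments(FilterDefinition<BsonDocument>.Empty);

const int pageSize = 100;
long pageNumber = 50_000_000;        // a deep page: the offset no longer fits in an int
long offset = pageNumber * pageSize; // 5,000,000,000 > int.MaxValue

var page = collection.Find(FilterDefinition<BsonDocument>.Empty)
    .Skip((int)offset)               // ...but Skip takes int?, so this cast silently wraps
    .Limit(pageSize)
    .ToList();
```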

With regards,
Nithin B.

Hi @Nithin_Bandaru,

You’ve stumbled upon a classic database performance trap: skip and limit for paging. The driver is doing the right thing by requiring an integer type. I strongly recommend that you do not skip large quantities of documents. And here’s why:

When you query documents like this: db.collection.find({ a: 1 }).limit(100).skip(0), the database happily finds 100 documents to return 100 documents. Similarly, when you query documents like this: db.collection.find({ a: 1 }).limit(100).skip(100), the database happily finds 200 documents to return 100 documents.

See the problem? To make it even more apparent, when you do this: db.collection.find({ a: 1 }).limit(100).skip(2147483600), where 2147483600 approaches the maximum value of an int, the database happily finds 2,147,483,700 documents to return 100 documents. That's a lot of work for any database! That's 2,147,483,600 documents the database must find and skip, potentially pulling them from disk, before it can even begin returning the 100 documents you're requesting!

Thankfully, because MongoDB's flexible schema allows for powerful design options, there are better ways of doing paging. I recommend you check out my blog post on the subject here:
Paging with the Bucket Pattern: Part 1
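To give a rough idea of the shape (the field names below are illustrative, not lifted from the post): each bucket document holds a page-sized batch of items plus a running count, and new items are upserted into the newest non-full bucket, so reading page N becomes a single indexed lookup instead of a giant skip.

```csharp
// Illustrative bucket-pattern write; field and collection names are my own.
using MongoDB.Bson;
using MongoDB.Driver;

var buckets = new MongoClient("mongodb://localhost:27017")
    .GetDatabase("test")
    .GetCollection<BsonDocument>("history_buckets");

const int bucketSize = 100; // one bucket == one page

void AddItem(string userId, BsonDocument item)
{
    // Match the newest bucket that still has room; if none matches,
    // the upsert creates a fresh bucket for this user.
    var filter = Builders<BsonDocument>.Filter.Eq("userId", userId)
               & Builders<BsonDocument>.Filter.Lt("count", bucketSize);
    var update = Builders<BsonDocument>.Update
        .Push("items", item)  // append the item to the bucket's array
        .Inc("count", 1);     // track how full the bucket is
    buckets.UpdateOne(filter, update, new UpdateOptions { Upsert = true });
}
```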

Thanks!

Justin


@Justin Thanks for the suggestion.

The bucket pattern looks interesting. It looks similar to Cassandra's storage model, where data gets grouped by partition columns. There the grouping happens automatically, but with this pattern it is manual.

To implement this kind of storage, I would need to keep in-memory queues, drain them at a specific time interval, and save to the DB only once per interval; doing it any other way, such as updating records in real time, would be very expensive.

What do you think?

I have also written another question that is very similar to this problem: https://www.mongodb.com/community/forums/t/how-to-loop-through-mongodbs-all-records-with-resume-from-a-particular-record/4621
There I was wondering if I can use the _id field to make jumps for the resume functionality of a migration, because it already has an index and, more importantly, it is unique. If I am able to apply $gt on this field, then this would not require any bucket pattern (hoping that MongoDB uses B+ trees for its indexes).

Basically, what I am saying is that we need one unique field that is comparable with greater-than and less-than. The _id field is one such field; I am just not sure whether it can be compared with $gt.

What do you think about this?

I was also wondering how database IDEs handle these scenarios with very large numbers of records.
IDEs for almost all databases show records in a paged format, right?
As far as I understand, all of them use cursors; that is just my assumption.
Generally, a cursor is like an Enumerator: you can read the next item, but you cannot skip a few and then read.
That is why they do not have the ability to jump from one page to a far-away page; they can only move to the next page.

Am I correct?

Hi @Nithin_Bandaru,

You definitely can use $gt with _id. That's a common and efficient strategy for paging. Just remember to sort by _id too, as in the sketch below.
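Here's a minimal sketch of that strategy with the C# driver, assuming default ObjectId _ids (collection and variable names are placeholders):

```csharp
// Sketch of _id-based "keyset" paging: filter past the last-seen _id
// instead of skipping, so each page is a single indexed range scan.
using MongoDB.Bson;
using MongoDB.Driver;

var collection = new MongoClient("mongodb://localhost:27017")
    .GetDatabase("test")
    .GetCollection<BsonDocument>("items");

const int pageSize = 100;
ObjectId? lastSeenId = null; // persist this between runs to get resume functionality

while (true)
{
    var filter = lastSeenId is null
        ? Builders<BsonDocument>.Filter.Empty
        : Builders<BsonDocument>.Filter.Gt("_id", lastSeenId.Value);

    var page = collection.Find(filter)
        .Sort(Builders<BsonDocument>.Sort.Ascending("_id")) // required for $gt paging
        .Limit(pageSize)
        .ToList();

    if (page.Count == 0) break;

    // ... process the page ...

    lastSeenId = page[page.Count - 1]["_id"].AsObjectId; // next page starts after this
}
```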

When you issue a find command on MongoDB, you get a cursor, just like with most other databases; it works the same way. Issuing a find statement and iterating through every document in the collection isn't paging (and thus doesn't use skip/limit); it's iteration, and how many documents come back per round trip is governed by the batch size. However, most applications and IDEs don't use this method; they use paging.

Remember that each time you issue a new find command (or select statement on a relational database), you're given a new cursor. Keeping a long-running cursor and iterating through results has completely different performance characteristics than iterating through the same result set using skip and limit (which generates a new cursor for every query). Almost all applications use skip and limit for paging. Keeping a cursor around for extended periods of time with any database is problematic for a variety of reasons that I won't go into now. The sketch below shows what the single-cursor style of iteration looks like with the C# driver.
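A minimal sketch, assuming a BsonDocument collection (names are placeholders); the driver fetches batches over one long-lived cursor under the hood:

```csharp
// One find command, one cursor, the whole collection:
// the driver pulls documents in batches over the same cursor.
using MongoDB.Bson;
using MongoDB.Driver;

var collection = new MongoClient("mongodb://localhost:27017")
    .GetDatabase("test")
    .GetCollection<BsonDocument>("items");

var options = new FindOptions<BsonDocument> { BatchSize = 100 };

using (var cursor = collection.FindSync(FilterDefinition<BsonDocument>.Empty, options))
{
    while (cursor.MoveNext())            // fetches the next batch on the same cursor
    {
        foreach (var doc in cursor.Current)
        {
            // ... process one document ...
        }
    }
}
```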

Thanks,

Justin
