Hello @karaimin
Yes, that should be avoid when you are acting in a realtime app.
Some thoughts
One thought on this, large arrays are almost always difficult to handle. Maybe you can change your schema and use embedding for $values? This would open completely different options.
You can check if a multikey Index on one or some fields in your array can help. In case you can find a good index you can walk along the index without the $unwind
I am not sure what you mean. When you deconstruct an array with $unwind you will get an document per field and element. You can add the option includeArrayIndex
to $unwind. This will add an field to the new document which just numbers the fields. This could also provide a path to go.
{ "_id" : 1, "arrayIndex" : NumberLong(0), "item" : "foo", values: "foo" }
{ "_id" : 1, "arrayIndex" : NumberLong(1), "item" : "foo", values: "foo2" }
{ "_id" : 1, "arrayIndex" : NumberLong(2), "item" : "foo", values: "foo3" }
{ "_id" : 2, "arrayIndex" : NumberLong(1), "item" : "bar", values: "bar" }
{ "_id" : 2, "arrayIndex" : NumberLong(2), "item" : "bar", values: "bar2" }
{ "_id" : 2, "arrayIndex" : NumberLong(3), "item" : "bar", values: "bar3" }
The common approch on pagination Using skip() and limit()
// Page 1
db.users.find().limit (10)
// Page 2
db.users.find().skip(10).limit(10)
// Page 3
db.users.find().skip(20).limit(10)
In general, to retrieve page ‘n’ the code looks like this:
db.users.find().skip(pagesize*(n-1)).limit(pagesize)
As the size of your data increases, this approach has performance problems. The reason is that every time the query is executed, the full result set is built up, then the server has to walk from the beginning of the collection to the specified offset. As your offset increases, this process gets slower. Also, this process does not make efficient use of the indexes. So typically the ‘skip()’ and ‘limit()’ approach is useful when you have small data sets,
Approach Using find() and limit()
The reason the previous approach does not scale well is the skip() command. Depending how you have build your _id you may can use the natural order in the stored data like a timestamp, or an index (pls s. above).
- Retrieve the _id of the last document in the current page
- Retrieve documents greater than this “_id” in the next page
// Page 1
db.users.find().limit(pageSize);
// Find the id of the last document in this page
last_id = ...
// Page 2
users = db.users.find({` `'_id'` `> last_id}). limit(10);
// Update the last id with the id of the last document in this page
last_id = ...
This approach leverages the inherent order that exists in the “_id” field. Also, since the “_id” field is indexed by default, the performance of the find operation is very good.
There are further options to do pagination:
- using
$slice
or
- using range queries
both I do not think will help here so I skip the examples.
Cheers,
Michael