Fetching the nth document based on an integer array

Hi, so let’s say I have a random array of n integers [23, 2, 78, 99, … ], where 0 < n < collection.count() and each element is also within that range.

In my collection, it is currently indexed by _id, and say it contains the following data:

{
    "_id" : "1293jnqke",
    "name" : "name_string",
    "other_data" : [{
        "data" : ...
    }]
}

How should I go about grabbing the data, in chunks of 50, corresponding to each element in the array? So with the example integer array above, I would slice it down to the first 50 elements and then run a find() on the collection to extract the names of the 23rd, 2nd, 78th, 99th, etc. documents, based on the natural ordering of the collection (no sorting beforehand). So the output should look like this:

[
    { "name" : "23rd name" },
    { "name" : "2nd name" },
    { "name" : "78th name" },
    { "name" : "99th name" },
    { "name" : "53rd name" },
    { "name" : "11th name" },
    ...
]

One way, of course, is to loop through the array with findOne() and skip(), but I don’t think that is very efficient. Is it possible to execute this in a single query?
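
For reference, this is roughly the per-position loop I’d like to avoid (a mongosh sketch; coll is a placeholder collection name, and it costs one round trip per element):

// Naive approach: one skip-based query per position in the array
const positions = [23, 2, 78, 99];
const names = positions.map(p =>
    db.coll.find({}, { _id: 0, name: 1 }).skip(p).limit(1).next()
);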

Hi @Ajay_Pillay,

To begin with, the optimal solution is to add a position field to each document and index it. This allows the documents to be fetched with an index scan; each new record should get the next value:

{
    "_id" : "1293jnqke",
    "name" : "name_string",
    "position" : 1,
    "other_data" : [{
        "data" : ...
    }]
},
{
    "_id" : "1294jnqke",
    "name" : "name_string",
    "position" : 2,
    "other_data" : [{
        "data" : ...
    }]
}
...

This way you can run:

db.coll.find({position : {$in : [23, 2, 78, 99, … ]}})
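Assuming the field is named position as above, the supporting index would be created with something like:

// Index on the added position field, so the $in query above
// can be served by an index scan instead of a collection scan
db.coll.createIndex({ position: 1 })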

If the above schema design is not possible, I would recommend creating a materialized view that has a calculated position field; you can use $merge or change stream logic to build it, for example as sketched below.
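
As a rough illustration (not a drop-in solution), on MongoDB 5.0+ the position could be computed with $setWindowFields and written out with $merge. Here coll_positions is a placeholder target name, and sorting by _id is an assumption, since the natural order cannot be referenced inside the pipeline:

// Build/refresh a materialized view with a computed position field
db.coll.aggregate([
  { $setWindowFields: {
      sortBy: { _id: 1 },
      // $documentNumber is 1-based, matching the schema example above
      output: { position: { $documentNumber: {} } }
  }},
  { $project: { name: 1, position: 1 } },
  { $merge: { into: "coll_positions", whenMatched: "replace", whenNotMatched: "insert" } }
])

You can then index position on the materialized collection and query it with the same $in filter as above.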

However, I managed to create a performance-limited aggregation that implements this logic. It uses a $limit stage set to the maximum value in the input search array plus one (e.g. 99 + 1 = 100), $facet to gather the documents into an array, and $unwind to generate an index for each element:

db.coll.aggregate([
  { $facet: {
      search: [ { $limit: 100 }, { $project: { _id: 0, name: 1 } } ]
  }},
  { $unwind: {
      path: "$search",
      includeArrayIndex: "index",
      preserveNullAndEmptyArrays: true
  }},
  { $match: {
      "index" : { $in: [23, 2, 78, 99] }
  }},
  { $replaceRoot: {
      newRoot: "$search"
  }}
]);

Please note that this query is suboptimal, as it cannot utilise an index, and $facet limits its output to a single document within the 16MB BSON size limit. Therefore use it only if the amount of data scanned is limited; otherwise it will error out.

Best regards,
Pavel


Hi @Pavel_Duchovny, thank you for the depth and clarity of your answer. I think in my scenario I will probably implement the first method, with an indexed position field, but I appreciate the thought put into the second method; I will definitely be testing that out as well. Thank you!

