How to search for objects within embedded arrays with paging?

I have a collection that looks sort of like this - each entry has an embedded array (and items of the array have an embedded array as well):

{
  "_id": ObjectId("..."),
  "name": "Abc",
  "items": [
    {
      "name": "123",
      "subItems": [
        {
          "data": "qwerty"
        },
        {
          "data": "qwezxc"
        }
      ]
    },
    {
      "name": "456",
      "subItems": [
        {
          "data": "qwerty"
        },
        {
          "data": "zxcvbm"
        }
      ]
    }
  ]
}

I need to run different kinds of requests for these objects - sometimes returning results as entire objects, but sometimes I need to search and return a result per item, or even per subItem (in an $unwind sort of way). For example, if I search subItems.data for qwe, I may need a result like this:

{ "data":
  [
    { "_id": ObjectId("..."), "name": "Abc", "items": { "name": "123", "subItems": { "data": "qwerty"} } },
    { "_id": ObjectId("..."), "name": "Abc", "items": { "name": "123", "subItems": { "data": "qwezxc"} } },
    { "_id": ObjectId("..."), "name": "Abc", "items": { "name": "456", "subItems": { "data": "qwerty"} } },
  ],
  "count": 30 // e.g. if $limit: 3, but the collection above had more matching results
}

And with paging, I'd preferably like the total count, or at least an indication that more results are available.

So far I've been using aggregations for this, something along the lines of:

$match -> $unwind -> $match -> $facet \
                                       |-> $skip -> $limit
                                       |-> $count
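
Concretely, the pipeline is roughly this (a simplified sketch of my actual query, using the search for qwe from the example above and coll as a placeholder collection name):

db.coll.aggregate([
  { $match: { "items.subItems.data": /qwe/ } },   // narrow down the documents first
  { $unwind: "$items" },
  { $unwind: "$items.subItems" },
  { $match: { "items.subItems.data": /qwe/ } },   // re-filter the unwound elements
  { $facet: {
      data:  [ { $skip: 0 }, { $limit: 3 } ],
      count: [ { $count: "count" } ]
  } }
])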

But even though I only have ~20k items in the collection, the query takes a good minute since, naturally, the $count after $unwind is incredibly heavy, as the query loses any and all caching at that point. Even when I tried replacing $count with a larger $limit, simply to know whether there are more results, the query was still too slow for practical use in e.g. an admin dashboard.

Are there any ways to achieve these kinds of results within a few seconds with just this collection, or do I more or less have to denormalize the data into a separate collection for each embedded level that I need to "unwind" by, so that I can use normal queries instead of aggregations (which are, admittedly, not intended for this purpose in the first place)?

I am already duplicating the data for some of the queries, but the number of these kinds of embedded arrays and queries keeps growing across the database, as does the number of items in general. So I'm apprehensive about excessive duplication, both because of the size of the database and because I'd have to be very careful about data consistency, since it's all handled by the application code and not the database.

I'd appreciate any ideas; I'm rather stuck due to my limited experience with the Mongo way of doing things. I do realize SQL is generally better suited for these sorts of queries, but we're too invested in Mongo overall.

The queries you intend to perform involve filtering specific elements in an array as well as projecting some fields in the query result. Secondly, you are looking for good performance from these queries.

Querying:

There are specific operators to filter and project array fields. You can use both the Aggregation Framework and the MongoDB Query Language (MQL) for these queries.

Aggregation has array operators, like $filter, $map, and $reduce, to filter and transform array data. With MQL, using the find method, you can use array query and projection operators.
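
For example, here is a sketch of both approaches, using the field names from your document and coll as a placeholder collection name ($regexMatch assumes MongoDB 4.2 or later):

// Aggregation: trim each item's subItems down to the matching elements
db.coll.aggregate([
  { $match: { "items.subItems.data": /qwe/ } },
  { $project: {
      name: 1,
      items: {
        $map: {
          input: "$items",
          as: "item",
          in: {
            name: "$$item.name",
            subItems: {
              $filter: {
                input: "$$item.subItems",
                as: "sub",
                cond: { $regexMatch: { input: "$$sub.data", regex: /qwe/ } }
              }
            }
          }
        }
      }
  } }
])
// Note: items whose subItems end up empty still appear; an extra
// $filter over items could drop them.

// MQL: match documents and project the first matching item
// (the positional $ returns that item with all of its subItems)
db.coll.find(
  { items: { $elemMatch: { "subItems.data": /qwe/ } } },
  { name: 1, "items.$": 1 }
)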


Query Performance:

The performance of queries can be improved using indexes. Both aggregation and MQL queries can use indexes to run fast, and both filtering and sorting operations can take advantage of them. You can create indexes on array fields too (these are called Multikey Indexes).
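
For example, using the field names from the question, an index covering the nested array path would be created like this (MongoDB marks it multikey automatically because the path crosses arrays):

db.coll.createIndex({ "items.subItems.data": 1 })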

You can also generate the query plans for both MQL queries and aggregations using the explain method, to verify index usage.
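
For example (again with the placeholder collection name coll):

// Plan for an MQL query; look for an IXSCAN stage in the winning plan
db.coll.find({ "items.subItems.data": "qwerty" }).explain("executionStats")

// Plan for an aggregation pipeline
db.coll.explain("executionStats").aggregate([
  { $match: { "items.subItems.data": "qwerty" } }
])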
