Why does $match on indexed variables in aggregation pipeline slow down the query

Dogunbound_hounds · June 10, 2021, 3:40pm

This is the pipline I’m trying to run:

   [{
      $match: {
        $and: [
          {
            epoch: {
              $gt: start,
              $lt: end
            }
          }, {
            site: "my site"
          }
        ]
      }
    },{"$group" : {_id:"$file", count:{$sum:1}}}]

The above is slower than:

{"$group" : {_id:"$file", count:{$sum:1}}}

The $match operator slows it down. To note: epoch, file, and site are all indexed descending. Maybe I am misunderstanding something, but intuitively, an indexed match before a group by operation should be faster than a singular group by.

Is this just a performance issue? Speed doesn’t matter a whole lot in my particular application. I just want to learn what MongoDB is doing underneath the hood.

MaBeuLux88 · June 10, 2021, 4:24pm

Hi @Dogunbound_hounds,

In this pipeline, only the match operation can use an index (singular). You can confirm which index is used by running an explain:

db.coll.aggregate([...], {explain: true})

See doc: https://docs.mongodb.com/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes

The best compound index you can use for this query is:

{site: 1, epoch: 1}

The order is important here because I’m respecting the ESR rule (Equality, Sort, Range).

Also, on a side note, that won’t make a difference in the performances, but the $and isn’t necessary here as it’s the default system in place already.

So your query is equivalent to:

[{
    $match: {
        epoch: {
            $gt: start,
            $lt: end
        },
        site: "my site"
    }
}, {
    "$group": {
        _id: "$file",
        count: {
            $sum: 1
        }
    }
}]

Cheers,
Maxime.