How to use multiple single-field indexes in an aggregation pipeline with a $sort?

yuda_zhang · March 12, 2021, 10:49am

I am having troubles with aggregation pipeline with $sort

The object I got is like this:

{
    "fieldA" : a,
    "fieldB" : b,
    "CreateTime" : timeStamp
}

and I created all three single-field index

then I have the query like this:

db.getCollection('test').aggregate(
    [
        {
            $match:{"fieldA":"a","fieldB":"b"}
        },
        {
            $sort:{"CreateTime":-1}
        }
    ]
)

When I check the explain(), It shows it only hit the “CreateTime” index, which doesn’t really help when the data volume is very large.

However if I try to do the find query like this:

db.getCollection('test').find(
    {
        "fieldA":"a",
        "fieldB":"b"
    }
).sort({"CreateTime":-1})

it shows all the indexes are hit and it does help with the performance a lot(finish query within 10 ms with 20 m data volume)

Is there anyway I can make my aggregation use more than one index while there is a “sort” in it?

I know that compound index is a choice, but I’d like to avoid using it since it would be a lot less reusable.

Thanks

MaBeuLux88 · March 12, 2021, 2:06pm

Hi @yuda_zhang and welcome in the MongoDB Community !

The compound index is the right way to do this. They are preferred to index intersections and more optimized.
Compound indexes are also reusable in the sense where if you have the index {a:1, b:1, c:1}, you also have “for free” all the following indexes:

{a:1}
{a:-1}
{a:1, b:1}
{a:-1, b:-1}
{a:-1, b:-1, c:-1}

Also, because your 2 first fields are equality matches, they are probably interchangeable (unless there is a big cardinality difference between these 2 fields and the first one is the most selective entry).

So for example, if in another query you need the index {b:1}, then you could consider creating {b:1, a:1, c:1} instead to “reuse” this index even more.

It would be also completely fine to have {a:1, b:1, c:1} and {b:1, c:1} if the selectivity of the field a is important and you have enough RAM to support these 2 indexes.

Indexes are worth it. Use the best one possible to achieve the best performances :-).

Cheers,
Maxime.

Ronan_Merrick · March 15, 2021, 7:16pm

Hi Yuda,

Welcome to the MongoDB Forums. Maxime has already give you some good advice regarding compound indexes. I would also like to mention the ESR rule in case you are not familiar with this.

ESR stands for Equality Sort Range.

The basic premise of this is that when creating your compound indexes, you should create them in the following order:

Equality Filters
Sort Fields
Range Filters

In this case the best index to support this query would be one that contains fieldA and FieldB followed by createTime.

Following this rule will reduce the number of documents scanned and help you to avoid blocking sorts.

Ronan

yuda_zhang · March 16, 2021, 9:04am

Thanks a lot for the answers above! The motivation of this post is that I want to be “lazy” and just create multiple single-field indexes and then not to worry about too much about the additional queries(just like Beugnet mentioned, sometimes the order in where clause matters). But seem like I should stick with the compound index.