Range query with sort optimization

Hi all!

Our application uses Mongo as journal (event sourcing), on which each document has an aggregate id field, a from and to fields

Our main query is, for example:

db.getCollection('akka_persistence_journal_CategoryTree').find({"pid":"CategoryTree-46", "from" : {$gte: 24291}, "to":{$lte: 24299}}).sort({"to":1})

We’ve already created an index as following

	"key" : {
			"pid" : 1,
			"to" : -1,
			"from" : -1
		}

But, when doing an explain over that query we’re not getting the best ratio on totalKeysExamined:totalDocsExamined:nReturned. For example, he explain for the above query is:

{
  "docsExamined": 9,
  "nreturned": 9,
  "keysExamined": 24300
}

Any suggestion in how to tune our index or our query in order to improve that ratio?

Thanks in advance

Hello @julian_zelayeta, welcome to the MongoDB Community forum!

The documenation topic Sort and Non-prefix Subset of an Index says that, the keys in a compound index are used in a query’s sort only if the query conditions before the sort have equality condition (not range conditions, like $gte, as in your query).

That said, please post the explain's output.

Hi @Prasad_Saya, thank for that explanation.

Here is the explain output:

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "catalog.akka_persistence_journal_CategoryTree",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [ 
                {
                    "pid" : {
                        "$eq" : "CategoryTree-46"
                    }
                }, 
                {
                    "to" : {
                        "$lte" : 27.0
                    }
                }, 
                {
                    "from" : {
                        "$gte" : 21.0
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "pid" : 1,
                    "to" : -1,
                    "from" : -1
                },
                "indexName" : "pid_1_to_-1_from_-1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "pid" : [],
                    "to" : [],
                    "from" : []
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "backward",
                "indexBounds" : {
                    "pid" : [ 
                        "[\"CategoryTree-46\", \"CategoryTree-46\"]"
                    ],
                    "to" : [ 
                        "[-inf.0, 27.0]"
                    ],
                    "from" : [ 
                        "[21.0, inf.0]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ 
            {
                "stage" : "FETCH",
                "filter" : {
                    "from" : {
                        "$gte" : 21.0
                    }
                },
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "keyPattern" : {
                        "pid" : 1,
                        "to" : -1
                    },
                    "indexName" : "pid_seq",
                    "isMultiKey" : false,
                    "multiKeyPaths" : {
                        "pid" : [],
                        "to" : []
                    },
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 2,
                    "direction" : "backward",
                    "indexBounds" : {
                        "pid" : [ 
                            "[\"CategoryTree-46\", \"CategoryTree-46\"]"
                        ],
                        "to" : [ 
                            "[-inf.0, 27.0]"
                        ]
                    }
                }
            }
        ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 7,
        "executionTimeMillis" : 1,
        "totalKeysExamined" : 28,
        "totalDocsExamined" : 7,
        "executionStages" : {
            "stage" : "FETCH",
            "nReturned" : 7,
            "executionTimeMillisEstimate" : 0,
            "works" : 29,
            "advanced" : 7,
            "needTime" : 20,
            "needYield" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "docsExamined" : 7,
            "alreadyHasObj" : 0,
            "inputStage" : {
                "stage" : "IXSCAN",
                "nReturned" : 7,
                "executionTimeMillisEstimate" : 0,
                "works" : 28,
                "advanced" : 7,
                "needTime" : 20,
                "needYield" : 0,
                "saveState" : 0,
                "restoreState" : 0,
                "isEOF" : 1,
                "keyPattern" : {
                    "pid" : 1,
                    "to" : -1,
                    "from" : -1
                },
                "indexName" : "pid_1_to_-1_from_-1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "pid" : [],
                    "to" : [],
                    "from" : []
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "backward",
                "indexBounds" : {
                    "pid" : [ 
                        "[\"CategoryTree-46\", \"CategoryTree-46\"]"
                    ],
                    "to" : [ 
                        "[-inf.0, 27.0]"
                    ],
                    "from" : [ 
                        "[21.0, inf.0]"
                    ]
                },
                "keysExamined" : 28,
                "seeks" : 21,
                "dupsTested" : 0,
                "dupsDropped" : 0
            }
        }
    },
    "serverInfo" : {
        "host" : "e2a568b9a111",
        "port" : 27017,
        "version" : "4.4.4",
        "gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"
    },
    "ok" : 1.0
}

Hi @julian_zelayeta,

Looks like you have the best possible index for this query because you don’t have a SORT stage in your winning plan so no in-memory sort which is usually way more costly than scanning index keys.

The only comment I can make is that you are reading the index backwards (see "direction" : "backward") instead of forward because of the sort {to:1} and the index {to:-1} but that really shouldn’t make any difference at this point.

You should get similar results with the index {pid:1, to:1, from:1}. It would just read forward instead of backward.

If you want to improve the performances on this query, you just need more RAM I guess. But it already executes in 1ms according to your explain plan so it will be hard to beat.

Cheers,
Maxime.

Hello @julian_zelayeta,

As @MaBeuLux88 mentioned the lack of SORT stage shows that the index is being used for the sort operation.

I would like to know, how many documents are there in the collection and also after the query how many are being returned, for a typical query run. In addition, I want you find out how the data is organized in your collection; see Create Queries that Ensure Selectivity. Selectivity can affect how efficiently the index is used for a given query and tells if there are possibilities of further optimization.

1 Like

@Prasad_Saya in that collection there is near 6 million documents at this moment. As we use event sourcing with snapshots, a typical query run returns always between 1 and 10 documents.
First our application scans for a snapshot document on our snapshot collections, and then it retrieves from our journal collection the N latest documents, that as I said, are between 1 and 10 documents.

A few things that just popped in my mind:

  1. You can do that with a single query with $lookup. Sending a single query instead of 2 sounds like a good idea at least.
  2. If the 10 docs you are fetching from the journal collection are always the same for each snapshot doc, you could also directly keep in an array of sub-documents in the snapshot doc the 10 first journal docs (just the fields you need). It’s the subset pattern mixed with the extended reference pattern. Sort of.

@MaBeuLux88 we cant achieve your suggestion, since we are using Akka. Akka forces us to implement those queries separately, one for snapshot collection, and other, for our journal collection. And it forces us to use pid (the aggregate id), from and to as bounds to seek along all the documents for a given pid.

If that is true then I guess Akka is clearly a “suboptimal” solution to say the least… :grimacing:

2 Likes