Optimization of $match+$sort on aggregation pipeline

Hello everyone,

I have a collection with 10 documents. This is the result of my query using aggregation

> db.system.profile.find({"op":"command"}).pretty()
{
	"op" : "command",
	"ns" : "test.bar3",
	"command" : {
		"aggregate" : "bar3",
		"pipeline" : [
			{
				"$match" : {
					"from" : "0x222222",
					"to" : "0x333333"
				}
			},
			{
				"$sort" : {
					"_id" : -1
				}
			}
		],
		"cursor" : {
			
		},
		"lsid" : {
			"id" : UUID("7712d372-c98a-472a-9020-94072f40dd84")
		},
		"$db" : "test"
	},
	"keysExamined" : 10,
	"docsExamined" : 10,
	"cursorExhausted" : true,
	"numYield" : 0,
	"nreturned" : 5,
	"queryHash" : "DE146C9F",
	"planCacheKey" : "DE146C9F",
	"locks" : {
		"ReplicationStateTransition" : {
			"acquireCount" : {
				"w" : NumberLong(1)
			}
		},
		"Global" : {
			"acquireCount" : {
				"r" : NumberLong(1)
			}
		},
		"Database" : {
			"acquireCount" : {
				"r" : NumberLong(1)
			}
		},
		"Collection" : {
			"acquireCount" : {
				"r" : NumberLong(1)
			}
		},
		"Mutex" : {
			"acquireCount" : {
				"r" : NumberLong(1)
			}
		}
	},
	"flowControl" : {
		
	},
	"responseLength" : 1028,
	"protocol" : "op_msg",
	"millis" : 0,
	"planSummary" : "IXSCAN { _id: 1 }",
	"ts" : ISODate("2020-03-19T08:31:11.480Z"),
	"client" : "127.0.0.1",
	"allUsers" : [ ],
	"user" : ""
}

On $match stage, there are 5 documents satisfy filter condition. The following $sort stage sorts the documents with "_id", so the "planSummary" is "IXSCAN { _id: 1 }".

Why the number of "keysExamined" is 10 but not 5?? In my opinion, $match will scan all documents in collections, and then get 5 documents. So $sort stage should sort those 5 documents.

I just want to know the stage execution sequence. Is $match before $sort or $sort before $match?

Thanks in advance!!!

Version: v4.2.3 Mac osx

1 Like

The execution sequence is always $match followed by $sort (irrespective of the order of these stages). The optimizer makes sure that the sort stage has documents after they are filtered (that would be less number of documents than the input to the match stage).

More details at: $sort + $match Sequence Optimization

There might be some index on the collection bar3, and this is affecting the $match stage, I think. You can post the details.

If possible, please run the explain for the aggregation query with “executionStats” mode.

To fully do match and sort from an index you would need an index on from:1,to:1,_id:1

What indexes do you have on this collection?

There is only one index _id_ on the collection bar3. So I suppose execution plan of $match is COLLSCAN and its keysExamined should be 0 and docsExamined should be 10.

> db.bar3.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.bar3"
	}
]
> db.bar3.find().count()
10
> 

Because of index _id_, I suppose execution plan of $sort is IXSCAN, and its keysExamined should be 5. Is that right?

1 Like

OK, the result of explain is following:

> db.bar3.explain("executionStats").aggregate([ { "$match" : { "from" : "0x222222", "to" : "0x333333" } }, { "$sort" : { "_id" : -1 } } ])
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.bar3",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"from" : {
						"$eq" : "0x222222"
					}
				},
				{
					"to" : {
						"$eq" : "0x333333"
					}
				}
			]
		},
		"optimizedPipeline" : true,
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"from" : {
							"$eq" : "0x222222"
						}
					},
					{
						"to" : {
							"$eq" : "0x333333"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"_id" : 1
				},
				"indexName" : "_id_",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"_id" : [ ]
				},
				"isUnique" : true,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"_id" : [
						"[MaxKey, MinKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 5,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 10,
		"totalDocsExamined" : 10,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"from" : {
							"$eq" : "0x222222"
						}
					},
					{
						"to" : {
							"$eq" : "0x333333"
						}
					}
				]
			},
			"nReturned" : 5,
			"executionTimeMillisEstimate" : 0,
			"works" : 11,
			"advanced" : 5,
			"needTime" : 5,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"docsExamined" : 10,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 10,
				"executionTimeMillisEstimate" : 0,
				"works" : 11,
				"advanced" : 10,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"keyPattern" : {
					"_id" : 1
				},
				"indexName" : "_id_",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"_id" : [ ]
				},
				"isUnique" : true,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"_id" : [
						"[MaxKey, MinKey]"
					]
				},
				"keysExamined" : 10,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "maxiaomindeMacBook-Pro.local",
		"port" : 27017,
		"version" : "4.2.3",
		"gitVersion" : "6874650b362138df74be53d366bbefc321ea32d4"
	},
	"ok" : 1
}
> 

It seems that $match stage queries from index. Really weird.

The execution sequence is always “$match” followed by “$sort”

This is exactly as expected. There is no index to use to do the $match “fast” but there is an index to do the $sort on _id and the way to do it is to look at every key in the _id index, to keep them in order and then to manually “filter” on the documents (that’s what the "FETCH" stage in explain is doing).

1 Like

Thank you for your reply. However, I still feel confused.

If I create index {from: 1, to : 1}, the totalKeysExamined is 10 but not 5. For $match, in fact, using index {from:1, to:1} to query is faster than using index {_id: 1}, however, we can see it dose not use index {from: 1, to:1} in explain. When this collection has more and more documents, this query plan will perform worse.

The command is following:

> db.bar3.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.bar3"
	},
	{
		"v" : 2,
		"key" : {
			"from" : 1,
			"to" : 1
		},
		"name" : "from_1_to_1",
		"ns" : "test.bar3"
	}
]
> db.bar3.find().count()
10
> db.bar3.explain("executionStats").aggregate([ { "$match" : { "from" : "0x222222", "to" : "0x333333" } }, { "$sort" : { "_id" : -1 } } ])
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.bar3",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"from" : {
						"$eq" : "0x222222"
					}
				},
				{
					"to" : {
						"$eq" : "0x333333"
					}
				}
			]
		},
		"optimizedPipeline" : true,
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"from" : {
							"$eq" : "0x222222"
						}
					},
					{
						"to" : {
							"$eq" : "0x333333"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"_id" : 1
				},
				"indexName" : "_id_",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"_id" : [ ]
				},
				"isUnique" : true,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"_id" : [
						"[MaxKey, MinKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 5,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 10,
		"totalDocsExamined" : 10,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"from" : {
							"$eq" : "0x222222"
						}
					},
					{
						"to" : {
							"$eq" : "0x333333"
						}
					}
				]
			},
			"nReturned" : 5,
			"executionTimeMillisEstimate" : 0,
			"works" : 11,
			"advanced" : 5,
			"needTime" : 5,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"docsExamined" : 10,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 10,
				"executionTimeMillisEstimate" : 0,
				"works" : 11,
				"advanced" : 10,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"keyPattern" : {
					"_id" : 1
				},
				"indexName" : "_id_",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"_id" : [ ]
				},
				"isUnique" : true,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"_id" : [
						"[MaxKey, MinKey]"
					]
				},
				"keysExamined" : 10,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "maxiaomindeMacBook-Pro.local",
		"port" : 27017,
		"version" : "4.2.3",
		"gitVersion" : "6874650b362138df74be53d366bbefc321ea32d4"
	},
	"ok" : 1
}
> 

As official document said, the execution sequence is $match followed by $sort, so, what is the sequence for index usage? How to view the process of selecting indexes by the MongoDB Optimizer in detail? Why use this index instead of other indexes? I think this will be helpful for data modeling and index creation.

My initial suggestion was to create an index on all three fields ( {from:1,to:1,_id:1} ) - then the index would be used for both $match and $sort.

2 Likes