Lab 4.2 - Aggregation not using partial index - SOLVED

Looking for some direction here.

The aggregation query includes a $sort stage, but from what I can tell that stage provides no functional purpose to the query. Do we just use the aggregation query as provided even though it includes a meaningless $sort stage? Or do you expect that we would look to modify the query? (the reason I ask is that the best index would likely be different if we didn’t need to sort on stars. The memory exception would also go away, but I expect there would still be value in building an index for other perf reasons).

As well, is the essence of “smallest index” in the question intended to include the concept of Partial Index? I’ve created an index on {stars: 1, cuisine: 1} where {stars: $gt 2}. I would expect the query to use this index, but for some reason it is deferring to a COLLSCAN.

Can you provide some guidance?

@ darrvar

Since you don’t know the context of the aggregation, you really don’t know whether it “…provides no functional purpose to the query.” Moreover, this is a test, and you can just take the question as posed if you want to get the correct answer.

The intention of the “smallest index” is the number of elements in the index, not the size of the index. Good luck.

Thanks David for the clarity. I was able to get the correct answer to the question, but I still am seeing some curious behavior by the optimizer, and would appreciate if you could help me understand.

If I create an index {stars: 1}, then the optimizer elects to use that index in the aggregation. However, I wanted to play around a bit deeper. I dropped the {stars: 1} index, and decided to build a partial index on {stars: 1} where {stars: $gt 2}, which aligns perfectly with the criteria in the $match stage. I figured this would be an even more efficient path for the optimizer to choose, due to the smaller index size. However, I was surprised to find that the partial index was completely ignored, and instead the optimizer elected to perform a COLLSCAN.

Is this the behavior that you would expect? If so I would appreciate help in understanding. Thank you.

@ darrvar

Sorry, not knowing exactly what you’ve done, I can’t really answer your question; and, since this is much deeper into the theory than we have time for, I’m not going to spend much time on it right now

As I get some time over the next week, I’ll look into this a bit and post back here if I have any suggestions. Meanwhile, notice that the documentation on partial indexes here definitely warns you of multiple issues that can derail the use of the index.

I’m surprised that your optimizer does a collscan. Mine uses the partial index on mongo 3.4.19

It might depends on the query. If the optimizer knows that only documents in the partial index matches it will use the index otherwise it won’t.

If partial index as given above and you do find( { stars : 1 } ) then COLLSCAN but for find( { stars : 3 } ) then index scan.

I’m not sure I have any real guidance. I created the partial index just like and running the aggregation

db.restaurants.aggregate([
{ $match: { stars: { $gt: 2 } } },
{ $sort: { stars: 1 } },
{ $group: { _id: “$cuisine”, count: { $sum: 1 } } }
])

it completes. When I use the explain it shows that I’m using the partial index

{
“stages” : [
{
“$cursor” : {
“query” : {
“stars” : {
“$gt” : 2
}
},
“sort” : {
“stars” : 1
},
“fields” : {
“cuisine” : 1,
“_id” : 0
},
“queryPlanner” : {
“plannerVersion” : 1,
“namespace” : “m201.restaurants”,
“indexFilterSet” : false,
“parsedQuery” : {
“stars” : {
“$gt” : 2
}
},
“winningPlan” : {
“stage” : “FETCH”,
“inputStage” : {
“stage” : “IXSCAN”,
“keyPattern” : {
“stars” : 1
},
“indexName” : “stars_1”,
“isMultiKey” : false,
“multiKeyPaths” : {
“stars” :
},
“isUnique” : false,
“isSparse” : false,
“isPartial” : true,
“indexVersion” : 2,
“direction” : “forward”,
“indexBounds” : {
“stars” : [
“(2.0, inf.0]”
]
}
}
},
“rejectedPlans” :
}
}
},
{
“$group” : {
“_id” : “$cuisine”,
“count” : {
“$sum” : {
“$const” : 1
}
}
}
}
],
“ok” : 1
}

Because you

The optimizer knows that the partial index could satisfy the query so it uses it. If you match $lte:2 it will probably be a collection scan.

Thanks guys. Happy to say I am now seeing the partial index being used in the aggregation query. It appears that I had bad syntax in the index creation. I was using:

db.restaurants.createIndex({ stars: 1 }, {partialFilterExpression: {stars: {gt: 2}}})

(Correct me if I’m wrong, but I believe this essentially builds the index on the subset of restaurants documents that have “stars” equal to the true/false result of FIELD “gt” equaling the value 2. Instead of erroring out, the flexible schema of MongoDB allows this to succeed on the chance that documents may have a “gt” field in the future. Obviously not the intention)

If I instead use:
db.restaurants.createIndex({ stars: 1 }, {partialFilterExpression: {stars: {$gt: 2}}})

….then the aggregation query does pickup the IXSCAN as expected.

Thanks all for your input.