Lets consider a document structure with arrays with sub-documents as elements. For example, a collection with these two documents:
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : [
{
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
},
{
"a" : 2,
"b" : [
{
"x" : 111,
"y" : 222
}
]
}
]
},
{
"_id" : 2,
"fld1" : "str-2",
"fld2" : [
{
"a" : 9,
"b" : [
{
"x" : 91,
"y" : 92
},
{
"x" : 911,
"y" : 922
}
]
}
]
}
The Query:
In an aggregation query, the $match+$project
stage filtering and $match+$unwind+$match
filtering work differently on a document. But, can produce similar output.
The following query is using $match+$unwind+$match
filter. Note the output:
db.nests2.aggregate( [
{ $match: { fld1: "str-1" } },
{ $unwind: "$fld2" },
{ $match: { "fld2.a": 1 } },
] ).pretty()
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : {
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
}
}
The following query is using $match
(filter on multiple levels) + $project
(using $addFields
in the example).
db.nests2.aggregate( [
{ $match: { fld1: "str-1", "fld2.a": 1 } },
{ $addFields: { fld2: {
$filter: {
input: "$fld2",
cond: {
$eq: [ "$$this.a", 1 ]
}
}
} } }
] ).pretty()
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : [
{
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
}
]
}
Indexes:
If a compound index is defined on the keys { fld1 : 1, "fld2.a" : 1 }
, this index is applied on both the queries. But, on the first $match
stage only. This can be verified by generating a query plan using explain()
on both queries.
( [EDIT ADD]: Even if an index is created on the single field { "fld2.a": 1 }
it will not be used by the query).
Same goes as we go down multiple levels down. For example in the following case with $match
and the index:
{ $match: { fld1: "str-1", "fld2.a": 1, "fld2.b.y": 22 } }
and the index, { fld1 : 1, "fld2.a" : 1, "fld2.b.y": 1 }
NOTES:
In general, indexes on the document fields are best utilized when the aggregation stages that use the index come early in the pipeline (mostly as the first stage as in this case). The stages $match
and $sort
use the indexes for performance. Also see documentation: Aggregation Pipeline Optimization.
Note that creating indexes on arrays (a.k.a. multikey indexes ) on large array fields can affect performance, as the indexes can grow very large.