With a lookup aggregation, is the order of the returned array configurable?
Ideally I would want it to be in order of _id or date of the looked up records.
Ok, so let’s solve this by example.
Insert those 3 documents separately, so it will be clear, in what order they were inserted (‘position’ property reflects the insertion order):
db.test2.insertOne({ position: 1, linkedTo: 'B' });
db.test2.insertOne({ position: 2, linkedTo: 'A' });
db.test2.insertOne({ position: 3, linkedTo: 'C' });
Now, insert linked documents:
db.test1.insertMany([
{ name: 'B' },
{ name: 'C' },
{ name: 'A' }
]);
If we use this aggregation:
db.test1.aggregate([
{
$lookup: {
from: 'test2',
localField: 'name',
foreignField: 'linkedTo',
as: 'joined',
}
},
{
// this stage used just to simplify the output
$project: {
'joined.position': 1,
}
},
]).pretty();
We will get the documents, in the order they were inserted in the ‘test1’ collection.
[
{
"_id" : ObjectId("5f11e59833d75e5a740b1790"),
"joined" : [
{ "position": 1 }
]
},
{
"_id" : ObjectId("5f11e59833d75e5a740b1791"),
"joined" : [
{ "position": 3 }
]
},
{
"_id": ObjectId("5f11e59833d75e5a740b1792"),
"joined": [
{ "position": 2 }
]
}
]
Now, if we want to sort the documents by some property from joined (looked-up) collections, we can add $sort stage:
db.test1.aggregate([
{
$lookup: {
from: 'test2',
localField: 'name',
foreignField: 'linkedTo',
as: 'joined',
}
},
{
// this stage used just to simplify the output
$project: {
'joined.position': 1,
}
},
{
$sort: {
'joined.position': 1,
}
}
]).pretty();
Here you go! All sorted by joined document’s property:
[
{
"_id" : ObjectId("5f11e59833d75e5a740b1790"),
"joined" : [
{ "position": 1 }
]
},
{
"_id" : ObjectId("5f11e59833d75e5a740b1792"),
"joined" : [
{ "position": 2 }
]
},
{
"_id" : ObjectId("5f11e59833d75e5a740b1791"),
"joined" : [
{ "position": 3 }
]
}
]
@slava is Correct.
In any query language i would say you need to use sorts to gurantee the order of the records/documents otherwise its more a matter of luck …
In my use case test 2 has multiple records with linkedTo = B. Therefore the lookup will return more than one test2 item. I want to ensure I can pick the first version, by _id, of the joined test 2 array.
In a $group, I would sort the values first to ensure their eventual order, … but in a lookup there does not seem to be a way to sort the results from the ‘from table’.
Option: 1. Lookup returns a List . This infers an order. Is the joined list from test2 ordered by _id already? If so, job done. From initial tests this seems to be the case but I would like confirmation.
Option 2 : The order is not guaranteed. I’ll need to add extra code to filter the result by the minimum _id, which will hit performance on an already slow and creaking aggregation pipeline.
Any guidance gratefully accepted.
Hello, @Neil_Albiston1, @Pavel_Duchovny!
By default, documents are returned in the order, they were written to DB. There is a chance, that the documents may be returned from the collection in the desired order. But, as @Pavel_Duchovny stated above, it is chance, not a guarantee. So, better to use $sort stage and add indexes on sorting fields to improve performance of the aggregation queries, so they do not ‘creak’
Let’s make another dataset for this case:
db.teams.insertMany([
{ _id: 't2', name: 'B', country: 'US' },
{ _id: 't1', name: 'A', country: 'Canada' },
]);
db.players.insertMany([
{ _id: 'p5', fromTeam: 'A', player: 'Bob' },
{ _id: 'p4', fromTeam: 'B', player: 'Bill' },
{ _id: 'p2', fromTeam: 'B', player: 'Luke' },
{ _id: 'p1', fromTeam: 'A', player: 'Drake' },
{ _id: 'p3', fromTeam: 'B', player: 'Oswald' },
]);
If we run this aggregation:
db.teams.aggregate([
{
$lookup: {
from: 'players',
localField: 'name',
foreignField: 'fromTeam',
as: 'players',
}
},
]).pretty();
We will get this result:
[
{
"_id" : "t2",
"name" : "B",
"country" : "US",
"players" : [
{ "_id" : "p4", "fromTeam" : "B", "player" : "Bill" },
{ "_id" : "p2", "fromTeam" : "B", "player" : "Luke" },
{ "_id" : "p3", "fromTeam" : "B", "player" : "Oswald" }
]
},
{
"_id" : "t1",
"name" : "A",
"country" : "Canada",
"players" : [
{ "_id" : "p5", "fromTeam" : "A", "player" : "Bob" },
{ "_id" : "p1", "fromTeam" : "A", "player" : "Drake" }
]
}
]
Notice, that documents are returned in order, they has been created, not by _id field.
Let’s add some sorting:
db.teams.aggregate([
{
$sort: {
// sort documents in 'teams' collection
_id: 1
}
},
{
$lookup: {
from: 'players',
let: {
// this is needed, so we can use it in
// the $match stage below
teamName: '$name',
},
pipeline: [
{
// sort documents in 'players' collection
$sort: {
_id: 1
}
},
{
$match: {
$expr: {
$eq: ['$fromTeam', '$$teamName'],
},
}
},
],
as: 'players',
}
},
]).pretty();
Sorted ouput:
[
{
"_id" : "t1",
"name" : "A",
"country" : "Canada",
"players" : [
{ "_id" : "p1", "fromTeam" : "A", "player" : "Drake" },
{ "_id" : "p5", "fromTeam" : "A", "player" : "Bob" }
]
},
{
"_id" : "t2",
"name" : "B",
"country" : "US",
"players" : [
{ "_id" : "p2", "fromTeam" : "B", "player" : "Luke" },
{ "_id" : "p3", "fromTeam" : "B", "player" : "Oswald" },
{ "_id" : "p4", "fromTeam" : "B", "player" : "Bill" }
]
}
]
If you need to get only 1 (and first, according to your ordering rules, defined by $sort stage) document from joined collection, you need to add $limit stage after your $match stage.
Fortunately my _ids are mongo ObjectIds which contain a timestamp so the _ids are in creation order.
…but I think I will be safe , as you suggest, and add a filter on maximum date, or a sort & limit. Hope it doesn’t impact performance too much.
Thank you for your help.