Order of returned array in a Lookup

With a lookup aggregation, is the order of the returned array configurable?
Ideally I would want it to be in order of _id or date of the looked up records.

Ok, so let’s solve this by example.

Insert those 3 documents separately, so it will be clear, in what order they were inserted (‘position’ property reflects the insertion order):

db.test2.insertOne({ position: 1, linkedTo: 'B' });
db.test2.insertOne({ position: 2, linkedTo: 'A' });
db.test2.insertOne({ position: 3, linkedTo: 'C' });

Now, insert linked documents:

db.test1.insertMany([ 
  { name: 'B' }, 
  { name: 'C' }, 
  { name: 'A' } 
]);

If we use this aggregation:

db.test1.aggregate([
  {
    $lookup: {
      from: 'test2',
      localField: 'name',
      foreignField: 'linkedTo',
      as: 'joined',
    }
  },
  {
   // this stage used just to simplify the output
    $project: {
      'joined.position': 1,
    }
  },
]).pretty();

We will get the documents, in the order they were inserted in the ‘test1’ collection.

[
  {
    "_id" : ObjectId("5f11e59833d75e5a740b1790"),
    "joined" : [
      { "position": 1 }
    ]
  },
  {
    "_id" : ObjectId("5f11e59833d75e5a740b1791"),
    "joined" : [
      { "position": 3 }
    ]
  },
  {
    "_id": ObjectId("5f11e59833d75e5a740b1792"),
    "joined": [
      { "position": 2 }
    ]
  }
]

Now, if we want to sort the documents by some property from joined (looked-up) collections, we can add $sort stage:

db.test1.aggregate([
  {
    $lookup: {
      from: 'test2',
      localField: 'name',
      foreignField: 'linkedTo',
      as: 'joined',
    }
  },
  {
   // this stage used just to simplify the output
    $project: {
      'joined.position': 1,
    }
  },
  {
    $sort: {
      'joined.position': 1,
    }
  }
]).pretty();

Here you go! All sorted by joined document’s property:

[
  {
    "_id" : ObjectId("5f11e59833d75e5a740b1790"),
    "joined" : [
      { "position": 1 }
    ]
  },
  {
    "_id" : ObjectId("5f11e59833d75e5a740b1792"),
    "joined" : [
      { "position": 2 }
    ]
  },
  {
    "_id" : ObjectId("5f11e59833d75e5a740b1791"),
    "joined" : [
      { "position": 3 }
    ]
  }
]
1 Like

@slava is Correct.

In any query language i would say you need to use sorts to gurantee the order of the records/documents otherwise its more a matter of luck …

In my use case test 2 has multiple records with linkedTo = B. Therefore the lookup will return more than one test2 item. I want to ensure I can pick the first version, by _id, of the joined test 2 array.

In a $group, I would sort the values first to ensure their eventual order, … but in a lookup there does not seem to be a way to sort the results from the ‘from table’.

Option: 1. Lookup returns a List . This infers an order. Is the joined list from test2 ordered by _id already? If so, job done. From initial tests this seems to be the case but I would like confirmation.

Option 2 : The order is not guaranteed. I’ll need to add extra code to filter the result by the minimum _id, which will hit performance on an already slow and creaking aggregation pipeline.

Any guidance gratefully accepted.

Hello, @Neil_Albiston1, @Pavel_Duchovny!

By default, documents are returned in the order, they were written to DB. There is a chance, that the documents may be returned from the collection in the desired order. But, as @Pavel_Duchovny stated above, it is chance, not a guarantee. So, better to use $sort stage and add indexes on sorting fields to improve performance of the aggregation queries, so they do not ‘creak’ :wink:

Let’s make another dataset for this case:

db.teams.insertMany([
  { _id: 't2', name: 'B', country: 'US' },
  { _id: 't1', name: 'A', country: 'Canada' },
]);

db.players.insertMany([
  { _id: 'p5', fromTeam: 'A', player: 'Bob' },
  { _id: 'p4', fromTeam: 'B', player: 'Bill' },
  { _id: 'p2', fromTeam: 'B', player: 'Luke' },
  { _id: 'p1', fromTeam: 'A', player: 'Drake' },
  { _id: 'p3', fromTeam: 'B', player: 'Oswald' },
]);

If we run this aggregation:

db.teams.aggregate([
  {
    $lookup: {
      from: 'players',
      localField: 'name',
      foreignField: 'fromTeam',
      as: 'players',
    }
  },
]).pretty();

We will get this result:

[
  {
    "_id" : "t2",
    "name" : "B",
    "country" : "US",
    "players" : [
      { "_id" : "p4", "fromTeam" : "B", "player" : "Bill" },
      { "_id" : "p2", "fromTeam" : "B", "player" : "Luke" },
      { "_id" : "p3", "fromTeam" : "B", "player" : "Oswald" }
    ]
  },
  {
    "_id" : "t1",
    "name" : "A",
    "country" : "Canada",
    "players" : [
      { "_id" : "p5", "fromTeam" : "A", "player" : "Bob" },
      { "_id" : "p1", "fromTeam" : "A", "player" : "Drake" }
    ]
  }
]

Notice, that documents are returned in order, they has been created, not by _id field.

Let’s add some sorting:

db.teams.aggregate([
  {
    $sort: {
      // sort documents in 'teams' collection
      _id: 1
    }
  },
  {
    $lookup: {
      from: 'players',
      let: {
        // this is needed, so we can use it in 
        // the $match stage below
        teamName: '$name',
      },
      pipeline: [
        {
          // sort documents in 'players' collection
          $sort: {
            _id: 1
          }
        },
        {
          $match: {
            $expr: {
              $eq: ['$fromTeam', '$$teamName'],
            },
          }
        },
      ],
      as: 'players',
    }
  },
]).pretty();

Sorted ouput:

[
  {
    "_id" : "t1",
    "name" : "A",
    "country" : "Canada",
    "players" : [
      { "_id" : "p1", "fromTeam" : "A", "player" : "Drake" },
      { "_id" : "p5", "fromTeam" : "A", "player" : "Bob" }
    ]
  },
  {
    "_id" : "t2",
    "name" : "B",
    "country" : "US",
    "players" : [
      { "_id" : "p2", "fromTeam" : "B", "player" : "Luke" },
      { "_id" : "p3", "fromTeam" : "B", "player" : "Oswald" },
      { "_id" : "p4", "fromTeam" : "B", "player" : "Bill" }
    ]
  }
]

If you need to get only 1 (and first, according to your ordering rules, defined by $sort stage) document from joined collection, you need to add $limit stage after your $match stage.

Fortunately my _ids are mongo ObjectIds which contain a timestamp so the _ids are in creation order.
…but I think I will be safe , as you suggest, and add a filter on maximum date, or a sort & limit. Hope it doesn’t impact performance too much.

Thank you for your help.