Getting random documents with aggregate $group

basically I’m getting random documents when I apply $group but I don’t want to get random documents for example
I have 6 documents like
1
2
3
4
5
6
if I apply groupby
1,2
3,5
4,6
let suppose these results after groupby
but when I run the query always the mongodb return random like
3,5
1,2
4,6
or
4.6
3,5
1,2
or
2,1
3,5
6,4

I’m using aggregates

this is the query
indent preformatted text by 4 spaces

                      {
      $lookup: {
        from: "products",
        let: { categoryId: "$_id", parentModel: "$parentModel" },
        pipeline: [
          {
            $match: {
              $expr: {
                $in: ["$$categoryId", "$category"],
              },
            },
            },                               
           {
            $sort: { price: 1 },
          },
          {
            $group: {
              _id: "$parentModel",
              parentModel: { $addToSet: "$parentModel" },
              category: { $first: "$category" },
              mainImage: { $first: "$mainImage" },
              slug: { $first: "$slug" },
              height: { $first: "$height" },
              width: { $first: "$width" },
              length: { $first: "$length" },
              price: { $first: "$price" },
              title: { $first: "$title" },
              groupedItem: { $first: "$groupedItem" }, 
              test: { $first: "$_id" }, 
            },
          },
          {
            $facet: {
              paginatedResults: [
                { $skip: skipPage },
                { $limit: PAGE_SIZE },
                {
                  $project: {
                    _id: "$test",
                    mainImage: 1,
                    slug: 1,
                    height: 1,
                    width: 1,
                    length: 1,
                    price: 1,
                    title: 1,
                    groupedItem: 1, 
                  }, 
                },
              ],
              totalCount: [
                {
                  $count: "count",
                },
              ],
            },
          },
        ],
        as: "products",
      },
    },

It will help us help you better if you could post real well formatted sample documents, with real well formatted sample desired output.

The main issues is that when we do not have real documents to work with it is often too time consuming to work on your problem and we simply skip over.

2 Likes
{
  $lookup: {
    from: "products",
    let: { categoryId: "$_id", parentModel: "$parentModel" },
    pipeline: [
      {
        $match: {
          $expr: {
            $in: ["$$categoryId", "$category"],
          },
        },
      },
      {
        $sort: { price: 1 },
      },
      {
        $group: {
          _id: "$parentModel",
          parentModel: { $addToSet: "$parentModel" },
          category: { $first: "$category" },
          mainImage: { $first: "$mainImage" },
          slug: { $first: "$slug" },
          height: { $first: "$height" },
          width: { $first: "$width" },
          length: { $first: "$length" },
          price: { $first: "$price" },
          title: { $first: "$title" },
          groupedItem: { $first: "$groupedItem" }, 
          test: { $first: "$_id" }, 
        },
      },
      {
        $facet: {
          paginatedResults: [
            { $skip: skipPage },
            { $limit: PAGE_SIZE },
            {
              $project: {
                _id: "$test",
                mainImage: 1,
                slug: 1,
                height: 1,
                width: 1,
                length: 1,
                price: 1,
                title: 1,
                groupedItem: 1, 
              }, 
            },
          ],
          totalCount: [
            {
              $count: "count",
            },
          ],
        },
      },
    ],
    as: "products",
  },
},

even i’m applying sorting but every time i’m getting random data

Sorry but without

and

it too much time consuming for me. Hopefully others will step in.

Hi,

We are facing the same behaviour when upgrading from 4.0.27 to 4.2.18 (Atlas). A pipeline that used to return objects with a consistent order is now returning with a random order. We figured it out because the pagination was not consistent, returning elements from the previous page. After investigation, we found out that the group is not returning always the same orders for groups.

We managed to replicate the issue with a dummy collection and a small subset of data from the doc.

On mongodb 4.2.18

db.books.insertMany([
  { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
  { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
  { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
  { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
  { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

The query in the doc is always consistent.

db.books.aggregate([
  { $group : { _id : "$author", books: { $push: "$title" } } }
])
{ _id: 'Dante',  books: [ 'The Banquet', 'Divine Comedy', 'Eclogues' ] }
{ _id: 'Homer', books: [ 'The Odyssey', 'Iliad' ] }

Anyway, when grouping by two fields the groups are listed in a different order almost every time we run the query.

db.books.aggregate([
{$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}])

For example:

{ _id: {author:"Dante", copies:1}, books: [ 'Divine Comedy'] }
{ _id: {author:"Dante", copies:2}, books: [ 'The Banquet', 'Eclogues' ] }
{ _id: {author:"Homer", copies:10} , books: [ 'The Odyssey', 'Iliad' ] }

and

{ _id: {author:"Dante", copies:1}, books: [ 'Divine Comedy'] }
{ _id: {author:"Homer", copies:10} , books: [ 'The Odyssey', 'Iliad' ] }
{ _id: {author:"Dante", copies:2}, books: [ 'The Banquet', 'Eclogues' ] }

We cannot really say what order should be applied.

The doc also reports:

When using $push in a $group stage, the order of the documents in the output array is determined by the order of the documents coming into pipeline. To guarantee a defined order, the $group pipeline stage should follow a $sort stage.

Does this mean that one should apply a sort in the collection before the group?
By doing so we are able to get always the same result:

db.books.aggregate([
{$sort: {
 author: 1,
 copies: 1
}}, {$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}])
{ _id: {author:"Dante", copies:2}, books: [ 'The Banquet', 'Eclogues' ] }
{ _id: {author:"Homer", copies:10} , books: [ 'The Odyssey', 'Iliad' ] }
{ _id: {author:"Dante", copies:1}, books: [ 'Divine Comedy'] }

Anyway, we would not expect that order as Dante:1 comes in before Dante:2.

On mongodb 4.0.27

db.books.insertMany([
  { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
  { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
  { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
  { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
  { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])
db.books.aggregate([
  { $group : { _id : "$author", books: { $push: "$title" } } }
])
{ _id: 'Dante',  books: [ 'The Banquet', 'Divine Comedy', 'Eclogues' ] }
{ _id: 'Homer', books: [ 'The Odyssey', 'Iliad' ] }

Grouping by two fields:

db.books.aggregate([
{$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}])

Always returns:

{ _id: {author:"Homer", copies:10} , books: [ 'The Odyssey', 'Iliad' ] }
{ _id: {author:"Dante", copies:1}, books: [ 'Divine Comedy'] }
{ _id: {author:"Dante", copies:2}, books: [ 'The Banquet', 'Eclogues' ] }

Concluding

Is this us misunderstanding the documentation or weird behaviour in 4.2.X?

We ended up adding a final sorting step like:

db.books.aggregate([
{$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}, {$sort: {
 '_id.author': 1,
 '_id.copies': 1
}}])

But still wondering what changed between these two versions.

You should not rely in having a particular or consistent order if you do not sort.

See MongoDB: imported documents do not keep the same order if were inserted through insertMany()

Agree. In this specific case, the point is that 4.0 was returning always in the same order and we erroneously relied on it. We will double-check queries in our code base from now on.

Regarding the doc I mentioned:

When using $push in a $group stage, the order of the documents in the output array is determined by the order of the documents coming into pipeline. To guarantee a defined order, the $group pipeline stage should follow a $sort stage.

I just realised my error, I was expecting the resultset ordered with the sort key defined before the group step. But what the doc is saying is that the accumulator array order is defined by the order of the documents coming into pipeline.

db.books.aggregate([
{$sort: {
 title: -1
}}, {$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}])
{ _id: { author: 'Dante', copies: 1 },
  books: [ 'Divine Comedy' ] }
{ _id: { author: 'Dante', copies: 2 },
  books: [ 'The Banquet', 'Eclogues' ] }
{ _id: { author: 'Homer', copies: 10 },
  books: [ 'The Odyssey', 'Iliad' ] }

and

db.books.aggregate([
{$sort: {
 title: 1
}}, {$group: {
 _id: {
  author: '$author',
  copies: '$copies'
 },
 books: {
  $push: '$title'
 }
}}])
{ _id: { author: 'Homer', copies: 10 },
  books: [ 'Iliad', 'The Odyssey' ] }
{ _id: { author: 'Dante', copies: 2 },
  books: [ 'Eclogues', 'The Banquet' ] }
{ _id: { author: 'Dante', copies: 1 },
  books: [ 'Divine Comedy' ] }

This problem cost me over an hour of my life that I don’t expect to get back… I am glad I finally wondered if this could be the problem that thread confirms. Though I agree @steevej that one should not rely on a default sort order, but it is extremely counter-intuitive that the sort order after a $group aggregation has the result of being random - even when $sort is applied before it. And I would at least expect better documentation on this. I suggest adding a note about it here: https://docs.mongodb.com/manual/reference/operator/aggregation/sort/