
Is there a benefit in adding `groupby "$$arraycolumn"` instead of unwind and groupby

Hi,

I previously encountered a scenario where we unwound the "tags" array field and then grouped on it.
The pipeline was:
{$unwind: "$tags"}, {$group: {"_id": "$tags"}}

I am wondering whether
{$group: {"_id": "$$tags"}}
could be a good alternative, and whether there would be any extra benefit in implementing a short path for it.

I did not find an open ticket.

I had some cycles to spare, so I checked the source code. It looks like unwind followed by group creates duplicate BSON objects for each document (hopefully I am correct here).
It also looks like writing an iterator that reuses the same BSON object and replaces only "$tags" with each unwound value would give better performance in this scenario.
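
To make the idea concrete, here is a minimal sketch in plain JavaScript, not MongoDB server code. The functions `unwindThenGroup` and `groupByArrayElements` are hypothetical names used only for illustration, and the sketch assumes every document has the array field. It only shows where the intermediate copies come from, not how the server actually implements these stages.

```javascript
// Hypothetical in-memory stand-ins for pipeline stages; not MongoDB internals.

// Approach 1: mimic $unwind followed by $group on the unwound field.
function unwindThenGroup(docs, field) {
  // "$unwind": emit one copy of the document per array element.
  const unwound = [];
  for (const doc of docs) {
    for (const value of doc[field]) {
      unwound.push({ ...doc, [field]: value }); // a new object per element
    }
  }
  // "$group": collect the distinct values of the unwound field.
  const keys = new Set();
  for (const doc of unwound) {
    keys.add(doc[field]);
  }
  return [...keys].map((key) => ({ _id: key }));
}

// Approach 2: the proposed short path, iterating array elements directly.
function groupByArrayElements(docs, field) {
  const keys = new Set();
  for (const doc of docs) {
    for (const value of doc[field]) {
      keys.add(value); // no intermediate document is created
    }
  }
  return [...keys].map((key) => ({ _id: key }));
}

const docs = [
  { _id: 1, tags: ['tag1', 'tag2', 'tag3'] },
  { _id: 2, tags: ['tag2', 'tag4'] },
];

// Both calls produce the same four groups: tag1, tag2, tag3, tag4.
console.log(unwindThenGroup(docs, 'tags'));
console.log(groupByArrayElements(docs, 'tags'));
```

Both functions return the same groups; the second one simply skips building one object per array element, which is the saving I have in mind.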

This might be a widely used use case.

‘$tags’ and ‘$$tags’ refer to completely different things and are not interchangeable: ‘$tags’ is a field path, while ‘$$tags’ refers to a variable named ‘tags’.

A $unwind stage before $group is used when you need to group documents based on the individual values in the array, not on the whole array.

Here are some examples to make this easier to understand.

Below is an example of using $unwind before $group:

db.sampleCollection.aggregate([
  {
    $match: {},
  },
  // the result of this stage is 1 document:
  // { _id: 1, tags: ['tag1', 'tag2', 'tag3']  }
  {
    $unwind: '$tags',
  },
  // after $unwind we will have 3 documents, because we duplicated that 
  // single document and destructured its array of 'tags':
  // { _id: 1, tags: 'tag1'  }
  // { _id: 1, tags: 'tag2'  }
  // { _id: 1, tags: 'tag3'  }
  {
    $group: {
      _id: '$tags',
    },
  },
  // if we group by 'tags' prop from the prev stage, we will get 3 groups:
  // { _id: 'tag1'  }
  // { _id: 'tag2'  }
  // { _id: 'tag3'  }
]);

Below is an example of using $group without $unwind:

db.sampleCollection.aggregate([
  {
    $match: {},
  },
  // the result of this stage is 1 document:
  // { _id: 1, tags: ['tag1', 'tag2', 'tag3']  }
  {
    $group: {
      _id: '$tags',
    },
  },
  // if we group by 'tags' prop from the prev stage, 
  // we will get only 1 document,
  // because we used the whole (not-unwound) array as a grouping key
  // { _id: ['tag1', 'tag2', 'tag3']  }
]);

And if you use ‘$$tags’ instead of ‘$tags’ in the pipelines above, you will get an error, because variable ‘$$tags’ is not defined.
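
For contrast, here is a small sketch of where the ‘$$’ prefix is valid. It uses the system variable `$$ROOT` and a user variable declared with `$let`; the names `groupPipeline`, `projectPipeline`, `docs`, `firstTag` and `tagList` are chosen only for this illustration. The `aggregate` calls are left commented out so the snippet can be read without a running server.

```javascript
// '$tags' is a field path: it reads the 'tags' field of the current document.
// '$$NAME' is a variable: either a system variable such as $$ROOT,
// or a user variable declared by an operator such as $let.

const groupPipeline = [
  { $unwind: '$tags' },
  {
    $group: {
      _id: '$tags',              // field path
      docs: { $push: '$$ROOT' }, // system variable: the whole input document
    },
  },
];

const projectPipeline = [
  {
    $project: {
      firstTag: {
        $let: {
          vars: { tagList: '$tags' },             // declares $$tagList
          in: { $arrayElemAt: ['$$tagList', 0] }, // uses the user variable
        },
      },
    },
  },
];

// In the mongo shell, against the sample collection used above:
// db.sampleCollection.aggregate(groupPipeline);
// db.sampleCollection.aggregate(projectPipeline);
```

In both cases the variable exists before it is referenced, which is exactly what is missing when ‘$$tags’ is used on its own.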

To understand it better, you need to read more about aggregation pipeline in MongoDB, specifically:

Hi Slava,

Thank you for the response.

And if you use ‘$$tags’ instead of ‘$tags’ in the pipelines above, you will get an error, because variable ‘$$tags’ is not defined.

I agree with your assessment here.
What I am proposing is that "$$tags" be supported as a group-by path. Based on the MongoDB source code, it could be made faster than unwind on $tags followed by group on $tags.

Was there any such proposal before, and if so, was it rejected after an evaluation?