Aggregation pane in Compass - beware the default mode

Hi all,
I am not sure whether this is actually a real problem. I am just flagging it because I was misled.
In the aggregation pane of Compass, the results can be inaccurate when the ‘Sample mode’ button is enabled.


When I worked on the User Report ticket, I first used a classic group-sort-limit pipeline in Compass, equivalent to:

[{
  $group: {
    _id: '$email',
    numComments: {
      '$sum': 1
    }
  }
}, {
  $sort: {
    numComments: -1
  }
}, {
  $limit: 20
}]

The first document was different from the expected result in the UserReportTest class. I got:

{ _id: "iain_glen@gameofthron.es", numComments: 616 }

instead of

{ _id: "roger_ashton-griffiths@gameofthron.es", numComments: 909 }

I thought something was wrong and used $sortByCount instead.
Having investigated in more detail, it appears that:

  1. My pipeline is correct.
  2. I have to uncheck ‘Sample mode’ to see a valid result in Compass.

This seems like a bug to me; I will let people familiar with Compass give some feedback on this.
I searched the community forums but did not find any reference to this kind of problem.

Cheers,
Loïc.

This does not seem like a bug to me. It seems pretty clear. As I understand it, when sample mode is activated, the operation you want is performed only on a sample of the data.
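
To illustrate why running on a subset can change the top result, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The data, the `groupSortLimit` helper, and the use of "the first 3 documents" as a stand-in for a sample are all assumptions made for illustration; this is not how Compass selects its sample.

```javascript
// Hypothetical in-memory stand-in for a comments collection.
const comments = [
  { email: 'a@example.com' },
  { email: 'a@example.com' },
  { email: 'b@example.com' },
  { email: 'b@example.com' },
  { email: 'b@example.com' },
  { email: 'b@example.com' },
];

// Mimics $group by email with $sum: 1, then $sort descending, then $limit.
function groupSortLimit(docs, limit) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.email, (counts.get(doc.email) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([email, numComments]) => ({ _id: email, numComments }))
    .sort((x, y) => y.numComments - x.numComments)
    .slice(0, limit);
}

// Full data: b has 4 comments, a has 2.
const full = groupSortLimit(comments, 1);

// Subset of the first 3 documents (assumed stand-in for a sample):
// a has 2 comments, b has only 1.
const sampled = groupSortLimit(comments.slice(0, 3), 1);

console.log(full[0]);
console.log(sampled[0]);
```

The same grouping logic yields a different "top commenter" depending on which documents it sees, which matches the symptom described in the first post.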

Hi,
I personally think that the $sort operator should be accurate even in this mode. However, I can accept that this is a way to improve performance by giving a taste of the final result.
Still, this should be highlighted more explicitly, as this is the default mode. I did not notice initially that this mode was activated, and I believed that the data were accurately displayed.
The purpose of my previous post was to warn anyone who could be misled the same way I was. I will leave it to the MongoDB team to draw any further conclusions.

Cheers,
Loïc.

Hi @ldecloedt,

The aggregation builder visualizer can be confusing at times, especially if you rely on a sort order.
The reason for this is that each of the stages is processed in parallel in the aggregation builder.

What do I mean by this?

Well, in the Compass aggregation builder, each stage runs in parallel on its own sample, so a stage may or may not see exactly the same sampled dataset as the previous stage.

What does that mean exactly?

When you build this pipeline:

[{
  $group: {
    _id: '$email',
    numComments: {
      '$sum': 1
    }
  }
}, {
  $sort: {
    numComments: -1
  }
}, {
  $limit: 20
}]

What is actually being executed is the following:

1st stage:

[{$sample: {size: 1000}}, {$group: {_id: '$email', numComments: {'$sum': 1}}}]

2nd stage:

[{$sample: {size: 1000}}, {$group: {_id: '$email', numComments: {'$sum': 1}}}, {$sort: {numComments: -1}}]

3rd stage:

[{$sample: {size: 1000}}, {$group: {_id: '$email', numComments: {'$sum': 1}}}, {$sort: {numComments: -1}}, {$limit: 20}]

Since the data is sampled again every time a new stage is added to the aggregation builder, you might see different documents in the preview of each stage.
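
The per-stage behaviour described above can be sketched in plain JavaScript. This is only an illustration of the explanation in this post, not Compass's actual code: the `previewPipelines` helper is hypothetical, and the sample size of 1000 is simply the figure used in the example above.

```javascript
// Hypothetical helper: for each stage i, build the pipeline that would be
// run to preview it, i.e. a $sample stage followed by stages 0..i.
// Note that $sample takes an object of the form { size: N }.
function previewPipelines(pipeline, sampleSize) {
  return pipeline.map((_, i) => [
    { $sample: { size: sampleSize } },
    ...pipeline.slice(0, i + 1),
  ]);
}

const pipeline = [
  { $group: { _id: '$email', numComments: { $sum: 1 } } },
  { $sort: { numComments: -1 } },
  { $limit: 20 },
];

const previews = previewPipelines(pipeline, 1000);

// Print one preview pipeline per stage.
previews.forEach((p, i) => console.log(`stage ${i + 1}:`, JSON.stringify(p)));
```

Each preview pipeline begins with its own $sample stage, so each one may draw a different random subset of the collection. That is why the documents shown for one stage need not agree with those shown for the next.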

It is not necessarily a bug, given that the purpose of this preview is to validate visually that the documents look the way they are supposed to. I agree that this can generate some confusion, but it is the side effect of an optimization that spares you from waiting for a long query to execute every time.

N.

Hi @Norberto,
I understand your point. In that case, I would suggest adding a hint of some sort to alert users who are new to the application.

Loïc.