Aggregation by Date

MongoDB 3.6.18

I have two queries which work…

var q1 = [
    { $unwind: '$resp.'+starCode },
    { $project: { stars: '$resp.'+starCode } },
    { $group: {
        _id: 1,
        total: { $sum: '$stars' },
        count: { $sum: { $cond: [{ $ne: ['$stars',''] },1,0] } },
        average: { $avg: '$stars' }          
        }
    }
]

which returns…

[
 {
  "_id": 1,
  "total": 51,
  "count": 14,
  "average": 3.642857142857143
 }
]

And another…

var q2 = [       
    { $group:  {
        _id: { $dateToString: { format: '%H:%M %Y-%m-%d', date: '$date' } },
        count: { $sum: 1 }
      }
    }        
]

which returns…

[
 {
  "_id": "14:58 2020-05-14",
  "count": 2
 },
 {
  "_id": "14:46 2020-05-14",
  "count": 2
 },
etc,etc,etc.

However when I try an combine these ( count of stars in the period ) using…

var q3 = [
    { $unwind: '$resp.'+starCode },
    { $project: { stars: '$resp.'+starCode } }, 
    { $group:  {
        _id: { $dateToString: { format: '%H:%M %Y-%m-%d', date: '$date' } },
        total: { $sum: '$stars' },
        count: { $sum: 1 }
      }
    }            
]

The the _id value becomes NULL and there’s no grouping - ie…

[
 {
  "_id": null,
  "total": 51,
  "count": 14
 }
]

Any Suggestions welcome.

:wave:

Hi @Peter_Alderson, your $project stage is saying only pass on the stars field to to the next stage. That means that date if not present in the $group stage. Try adding date: 1 into the $project stage to see if you get the expected results.

1 Like

As @Doug_Duncan mentioned you can change your $project stage, or instead use a $addFields stage:

{ $addFields: { stars: '$resp.'+starCode } }

Both these stages have some common functionality, and different purposes. $project is mostly used to include / exclude fields from the document to be accessed in the following stage. $addFields adds new fields to the already existing ones in a document.

In your case, the $project stage looks appropriate as you are using only two fields in the following $group stage.

1 Like

Thanks @Doug_Duncan.

Yep, that worked. I was assuming that $project was additive !

Peter

Hi @Peter_Alderson glad that worked out for you.

$project either I want this list of fields, or I don’t want this subset returned.

For adding in new fields, while keeping all the rest that are in the current pipeline, you would use $addFields as @Prasad_Saya mentioned earlier.

Which you use depends on which fields you need from that point on. It’s best to only send the fields on that you need to complete the pipeline to save on the amount of data being passed around.

The aggregation pipeline automatically determines fields that are required, so it is actually best to only add a $project stage if results need to be renamed or reshaped (typically at the end of a pipeline).

Adding an early $project stage can be less efficient because it bypasses the automatic dependency analysis.

If you don’t need to rename or reshape results, there is no need to include a $project stage.

Regards,
Stennie

1 Like

The aggregation can be improved with the pipeline

var q3 = [
    { $unwind: '$resp.'+starCode },
    { $project: { stars: '$resp.'+starCode } }, 
    { $group:  {
        _id: { $dateToString: { format: '%H:%M %Y-%m-%d', date: '$date' } },
        total: { $sum: '$stars' },
        count: { $sum: 1 }
      }
    }            
]

replaced with the following, and eliminating the $project stage:

var q3 = [
    { $unwind: '$resp.'+starCode },
    { $group:  {
        _id: { $dateToString: { format: '%H:%M %Y-%m-%d', date: '$date' } },
        total: { $sum: '$resp.'+starCode },
        count: { $sum: 1 }
      }
    }    
]

Hi @Prasad_Saya,

Can you explain to me the benefit of the improvement you identify ?

Peter

Not having the $project stage. This means your aggregation doesn’t have to scan all the documents after the initial $unwind stage. That is less processing. It matters when there are a large number of documents.

@stennie, you know more about MongoDB and it’s inner workings than I do since you’ve worked at MongoDB for a number of years now, and I am glad that you are here to share your knowledge.

The link you posted is to only the optimization where MongoDB can pull up $match() stages if they are after projections. I think what you really meant was to link the the section above that:

Projection Optimization

The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.

Now having said that, I don’t know that I necessarily agree with your statement:

Sure the optimizer can figure out the fields necessary to pass through the pipeline and I see that being a great thing badly written aggregation pipelines. I prefer, however, to be explicit about what I am passing through (it helps me see the data that I’m interested in, especially if there are a large number of fields) and until it’s proven that running a projection, early in the process, can cause performance issues I will continue to $project() only the fields that I need in my pipelines as early as I can. That doesn’t mean that my way is better, but if I’m not having any performance issues, I’m OK with doing it this way. Also isn’t the optimizer just implicitly doing what I’m explicitly doing?

In addition I will think that if you $project fields of a compound index you avoid a document fetch.

1 Like

Stennie is right, $project can make your pipeline less efficient if you happen to specify a field you don’t need or forget to exclude a field you don’t.

For instance in order for a covered index on a and b to be used in an explicit projection you would have to remember to exclude _id if it’s not in the index and not needed.

Unnecessary stages are better left out.

In fact, in this particular aggregation you don’t even need $unwind - it should just be a single $group stage.

Asya

3 Likes

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.