M121 Chapter 3 $group and accumulator setup

OK… The lab says to compute the max, min, standard deviation, and average rating for all movies that won at least one Oscar.

Here is my question… This would mean that the first thing I need to do is match all movies with at least one Oscar win.

My challenge is in the sanitation stage… The current field is “awards” : “1 won” with the number won being the first part of the string. How can I look at a string and match a subset of it (e.g. the word “won”)?

Would it work to use $split on the awards string, splitting on " ", so that the number comes back as one element and “won” as the next… and then use a $cond operator to check for “won” in the array, counting the movie if it exists and leaving it out otherwise?

Or am I making this harder than it actually is?

Update…
I think I am almost there… But I need a nudge…

I started by matching all documents that have an awards field and an imdb.rating field.

I then added a field that splits the awards field into an array containing the words… (I split on the " " (space))

I then matched the field for all documents that contain “win.”

I can project up to this point and my document count is going down…

But when I use the group stage, I cannot get the accumulator expressions to execute.

I am using
"highest_rating": {$max: "$imdb.rating"} but this is outputting an empty string ("").

I know I have documents there because I use "count": {$sum: 1} and it shows 1689.

This is my output

[
  {
    "_id": null,
    "highest_rating": "",
    "count": 1689
  }
]

Any ideas?

Do you have any $project stage prior to your $max grouping? If you do, and it does not project imdb.rating, then that field is no longer present.
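Another possible cause, beyond a missing projection, is worth ruling out. In BSON comparison order, strings sort above numbers, so if even one document stores imdb.rating as an empty string, $max would return "" instead of a number. The sketch below mimics that ordering in plain Python for numbers and strings only. The sample ratings and the helper names (bson_rank, bson_max) are made up for illustration, and whether the collection actually holds "" ratings is an assumption to check against the data.

```python
# Minimal sketch of BSON cross-type ordering as used by $max,
# limited to numbers and strings (hypothetical helpers, not a MongoDB API).

def bson_rank(value):
    """Rank a value by BSON type: numbers sort below strings."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, (int, float)):
        return 1
    if isinstance(value, str):
        return 2
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_max(values):
    """Mimic $max: compare by type rank first, then by value within a type."""
    return max(values, key=lambda v: (bson_rank(v), v))


# Assumption: one document has imdb.rating stored as "" instead of a number.
ratings = [4.5, 9.2, "", 5.8]
print(repr(bson_max(ratings)))  # the empty string wins over every number

# Keeping only numeric ratings gives the expected maximum.
numeric_only = [r for r in ratings if isinstance(r, (int, float))]
print(bson_max(numeric_only))

# A $match stage that could do the same filtering before $group.
type_filter = {"$match": {"imdb.rating": {"$type": "number"}}}
```

If that turns out to be the cause, placing a filter like type_filter ahead of the $group stage should let the accumulators see only numeric ratings.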

Matching on “win.” is not the requirement. The following are the supplied examples:

Won 13 Oscars
Won 1 Oscar

So just to be clear, it is Won rather than win.

Lastly, take a look at https://docs.mongodb.com/manual/reference/operator/query/regex/ and https://docs.mongodb.com/manual/reference/operator/aggregation/regexMatch/.
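As a rough illustration of what a regex on the awards string could look like, here is a small Python sketch. The pattern is an assumption built from the two supplied examples, not the official lab answer, and the sample strings are taken from output posted in this thread. The same pattern is also shown inside a $match stage, in the form it could be passed to an aggregate() call from PyMongo.

```python
import re

# Assumed pattern: "Won", a count, then "Oscar" or "Oscars" at the start of the string.
OSCAR_WIN = r"^Won \d+ Oscars?"

# The same pattern as a $match stage (a plain dict, no database needed to build it).
match_stage = {"$match": {"awards": {"$regex": OSCAR_WIN}}}

# Sample awards strings from the thread.
samples = [
    "Won 1 Oscar. Another 2 wins & 1 nomination.",
    "Won 1 Golden Globe. Another 3 wins & 9 nominations.",
    "Won 2 Primetime Emmys. Another 8 wins & 17 nominations.",
    "Won 1 Oscar. Another 4 nominations.",
]

# Keep only the strings that the pattern matches.
oscar_winners = [s for s in samples if re.search(OSCAR_WIN, s)]
for s in oscar_winners:
    print(s)
```

Only the two Oscar strings pass. The Golden Globe and Emmy strings also start with Won, which is why matching on Won alone is not enough.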

@steevej,
Thank you for your response… I would like to point out that, according to my output, the “awards” string where the Oscar wins are kept does not contain the word Won… The field uses the word win…

I verified this by trying to match the word won and it returns an empty set.

Also… I appreciate you pointing out the regex operator… To the course administrators, this would have been really handy to know how to do… My only problem is that the commands that are taught require the use of commands that are not taught, and there is no reference to them.

Because of this, I converted the string into an array and then matched based on the different entries… (it allows me to match based on the word “win”)… So it does start to weed down the number of selected entries, but I cannot get the $max operator to return a value. It returns an empty string.

Can someone from the Curriculum Team please contact me… I need clarification on the above sequence along with some guidance on accumulator operators. Thanks

Because it is Won rather than won; the match is case-sensitive.

OK… This is getting frustrating… Doesn’t this render the same result? The awards field is very poorly designed in the first place. So… I would like to point out that, if you look at my matching results, entering the word Won or win should produce the same results.

[
  {
    "_id": {
      "$oid": "573a1397f29313caabce69a8"
    },
    "awards": "Won 1 Oscar. Another 2 wins & 1 nomination.",
    "imdb": {
      "rating": 4.5
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "2",
      "wins",
      "&",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a1397f29313caabce7d5d"
    },
    "awards": "Won 1 Golden Globe. Another 3 wins & 9 nominations.",
    "imdb": {
      "rating": 4.5
    },
    "oscar": [
      "Won",
      "1",
      "Golden",
      "Globe.",
      "Another",
      "3",
      "wins",
      "&",
      "9",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1397f29313caabce6f14"
    },
    "awards": "Won 1 Oscar. Another 1 win.",
    "imdb": {
      "rating": 5
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "1",
      "win."
    ]
  },
  {
    "_id": {
      "$oid": "573a13b5f29313caabd4408f"
    },
    "awards": "Won 2 Primetime Emmys. Another 8 wins & 17 nominations.",
    "imdb": {
      "rating": 5.1
    },
    "oscar": [
      "Won",
      "2",
      "Primetime",
      "Emmys.",
      "Another",
      "8",
      "wins",
      "&",
      "17",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1394f29313caabce0d54"
    },
    "awards": "Won 1 Oscar. Another 1 win & 2 nominations.",
    "imdb": {
      "rating": 5.4
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "1",
      "win",
      "&",
      "2",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1398f29313caabce90bd"
    },
    "awards": "Won 1 Primetime Emmy. Another 1 nomination.",
    "imdb": {
      "rating": 5.5
    },
    "oscar": [
      "Won",
      "1",
      "Primetime",
      "Emmy.",
      "Another",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a1398f29313caabce97d4"
    },
    "awards": "Won 1 Primetime Emmy. Another 2 nominations.",
    "imdb": {
      "rating": 5.5
    },
    "oscar": [
      "Won",
      "1",
      "Primetime",
      "Emmy.",
      "Another",
      "2",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a139af29313caabcf01ff"
    },
    "awards": "Won 1 Oscar. Another 9 wins & 10 nominations.",
    "imdb": {
      "rating": 5.5
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "9",
      "wins",
      "&",
      "10",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1395f29313caabce1ea7"
    },
    "awards": "Won 1 Golden Globe. Another 1 nomination.",
    "imdb": {
      "rating": 5.6
    },
    "oscar": [
      "Won",
      "1",
      "Golden",
      "Globe.",
      "Another",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a1397f29313caabce676e"
    },
    "awards": "Won 1 Oscar. Another 1 nomination.",
    "imdb": {
      "rating": 5.6
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a1397f29313caabce6352"
    },
    "awards": "Won 1 Golden Globe. Another 1 nomination.",
    "imdb": {
      "rating": 5.7
    },
    "oscar": [
      "Won",
      "1",
      "Golden",
      "Globe.",
      "Another",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a1397f29313caabce5f7a"
    },
    "awards": "Won 1 Primetime Emmy. Another 3 nominations.",
    "imdb": {
      "rating": 5.7
    },
    "oscar": [
      "Won",
      "1",
      "Primetime",
      "Emmy.",
      "Another",
      "3",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1391f29313caabcd8f66"
    },
    "awards": "Won 1 Oscar. Another 4 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "4",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1396f29313caabce3baa"
    },
    "awards": "Won 1 Oscar. Another 3 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "3",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1396f29313caabce554c"
    },
    "awards": "Won 1 Oscar. Another 2 wins & 7 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "2",
      "wins",
      "&",
      "7",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1398f29313caabce9532"
    },
    "awards": "Won 1 Oscar. Another 1 win & 1 nomination.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "1",
      "win",
      "&",
      "1",
      "nomination."
    ]
  },
  {
    "_id": {
      "$oid": "573a13a9f29313caabd1ea5e"
    },
    "awards": "Won 1 Primetime Emmy. Another 4 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Primetime",
      "Emmy.",
      "Another",
      "4",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a13b8f29313caabd4b427"
    },
    "awards": "Won 1 Oscar. Another 2 wins & 8 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Oscar.",
      "Another",
      "2",
      "wins",
      "&",
      "8",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a13e7f29313caabdc7f1c"
    },
    "awards": "Won 1 Primetime Emmy. Another 3 nominations.",
    "imdb": {
      "rating": 5.8
    },
    "oscar": [
      "Won",
      "1",
      "Primetime",
      "Emmy.",
      "Another",
      "3",
      "nominations."
    ]
  },
  {
    "_id": {
      "$oid": "573a1394f29313caabce05e4"
    },
    "awards": "Won 1 Golden Globe. Another 1 win.",
    "imdb": {
      "rating": 5.9
    },
    "oscar": [
      "Won",
      "1",
      "Golden",
      "Globe.",
      "Another",
      "1",
      "win."
    ]
  }
]

It seems to me that for every Won entry, there is another win… But Ok… Now can we talk about the accumulator statements not returning a result?

Yes, I projected them after the first $match… Do I also need to project them after the second $match?

My order of operations is
$match
$addFields
$match
$project.

I project the imdb.rating, awards, and my added field.

Agree! The https://developer.mongodb.com/how-to/attribute-pattern/ would be a good fit for a redesign.

I guess that for educational purpose, sometimes you need to do that in order to introduce some concepts.

Some documents from your output have Won and Oscar without win:

"_id": {
      "$oid": "573a1397f29313caabce676e"
    },
    "awards": "Won 1 Oscar. Another 1 nomination.",

"_id": {
      "$oid": "573a1391f29313caabcd8f66"
    },
    "awards": "Won 1 Oscar. Another 4 nominations.",

"_id": {
      "$oid": "573a1396f29313caabce3baa"
    },
    "awards": "Won 1 Oscar. Another 3 nominations.",

Also, some of the others with Won and Oscar have the word wins, or win. (with a dot), both of which are different from win (without a dot).
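To make the difference concrete, here is a small Python sketch of exact element matching on the split array. Python's str.split(" ") stands in for $split, the helper name has_token is hypothetical, and the sample strings are taken from the output posted above.

```python
# Sample awards strings from the posted output.
awards = [
    "Won 1 Oscar. Another 1 nomination.",
    "Won 1 Oscar. Another 2 wins & 1 nomination.",
    "Won 1 Oscar. Another 1 win.",
    "Won 1 Golden Globe. Another 1 win.",
]


def has_token(text, token):
    """Exact element match, like testing membership in the $split array."""
    return token in text.split(" ")


# Matching the token "win." keeps a Golden Globe winner
# and drops two of the three Oscar winners.
by_win_dot = [a for a in awards if has_token(a, "win.")]

# Matching the token "win" (no dot) keeps nothing from this sample.
by_win = [a for a in awards if has_token(a, "win")]

# Matching the token "Oscar." keeps the three Oscar winners in this sample,
# but would miss a plural such as "Oscars.".
by_oscar = [a for a in awards if has_token(a, "Oscar.")]

print(len(by_win_dot), len(by_win), len(by_oscar))
```

Because array membership compares whole elements, punctuation and plurals each count as a different token. A regex tests for a substring pattern instead, so it does not depend on where the spaces fall.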

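I do not think that you need to project it again after the second $match.

For the accumulator, you need a $group stage, which is not present in the order of operations you listed, although I assume you still have the one from your first post.

I think you need to share your $group stage so that we can find out why $max is not returning a value.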

Make sure you enclose the stage within 2 lines of triple back ticks ``` to make sure we see all the quotes, spaces and dollar signs.

Hi @David_Thompson,

I clearly understand your frustration. Please know that we are already revamping the course to improve the learning experience for our users. We will soon re-launch the course.

I completely agree with you. Let me point you to some useful resources:

To simplify, you can use your regex in the $match stage as below:

$match: {
      awards: <regex pattern>
    } 

The $match stage should return 914 documents. Then you can proceed to use the $group stage to find highest_rating, lowest_rating, average_rating and deviation.
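A $group stage with those four output fields could be sketched as follows. It is written as a Python dict in the form PyMongo accepts. Using $stdDevSamp rather than $stdDevPop is an assumption about which standard deviation the lab expects. The summarize helper and the sample ratings are hypothetical and only reproduce the arithmetic locally so the result can be checked without a database.

```python
import statistics

# Group every incoming document into one bucket (_id: None) and accumulate.
group_stage = {
    "$group": {
        "_id": None,
        "highest_rating": {"$max": "$imdb.rating"},
        "lowest_rating": {"$min": "$imdb.rating"},
        "average_rating": {"$avg": "$imdb.rating"},
        # Assumption: sample standard deviation; swap in $stdDevPop if needed.
        "deviation": {"$stdDevSamp": "$imdb.rating"},
    }
}


def summarize(values):
    """Local stand-in for the accumulators above, for numeric input only."""
    return {
        "highest_rating": max(values),
        "lowest_rating": min(values),
        "average_rating": statistics.mean(values),
        "deviation": statistics.stdev(values),  # sample standard deviation
    }


# Made-up sample of numeric ratings.
ratings = [4.5, 5.0, 5.4, 5.5, 5.6]
summary = summarize(ratings)
print(summary)
```

The accumulators only behave like this when every imdb.rating they see is numeric, so the stages before $group need to pass that field through unchanged.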

Please feel free to reach out if you have any additional questions.

Kind Regards,
Sonali Mamgain

Thank you @Sonali_Mamgain and @steevej for your patience with me…

I hold back on a lot of my comments because I really enjoy the classes and I like the way this class is set up. It forces you to get into the documentation which is what any good developer will be doing. I don’t want to air things in public forums and run the risk of turning others off on MongoDB… (I think I have truly found my calling, LOL)

I want to ask about the awards field. As a developer, I believe it highly likely that you will encounter a field or two formatted similarly to the way that the awards field is formatted. Too many times we encounter businesses that don’t know what they want to collect or how to do it… so they tend to just mash multiple items into one text field.

What I wanted your opinion on is this: the aggregation pipeline seems to be a perfect solution for selecting on these fields. Where I am not as confident is whether it is good practice to use the aggregation pipeline to effect any schema changes on the field. For example, the awards field should be an embedded document, with each award as a separate field and the won or nominated counts as integer fields.

Example

awards.oscar.won or awards.oscar.nominated

So… where I am not confident is how would you take a string field and break it down into the respective schema update.

Example

award: “Won 1 oscar. Nominated for 1 oscar. Another 3 wins.”

Take that string and transform the data to
awards.oscar.won: 4
awards.oscar.nominated: 1

Storing the data this way makes the query much simpler.
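One way to approach the string-to-schema step, sketched in Python outside the pipeline: pull each count out with a regex and build the nested document. The function name parse_awards and the field names are hypothetical, and the phrasings it recognizes are assumed from the strings seen in this thread. Unlike the example above, the sketch keeps the unnamed "Another N wins" separate rather than folding them into the Oscar count, since the string does not say which award they were for.

```python
import re


def parse_awards(text):
    """Pull counts out of an awards string into a nested dict."""

    def count(pattern):
        # Return the first captured number for the pattern, or 0 if absent.
        m = re.search(pattern, text, flags=re.IGNORECASE)
        return int(m.group(1)) if m else 0

    return {
        "oscar": {
            "won": count(r"Won (\d+) Oscars?"),
            "nominated": count(r"Nominated for (\d+) Oscars?"),
        },
        # Wins and nominations that the string does not tie to a named award.
        "other": {
            "wins": count(r"(\d+) wins?\b"),
            "nominations": count(r"(\d+) nominations?\b"),
        },
    }


parsed = parse_awards("Won 1 oscar. Nominated for 1 oscar. Another 3 wins.")
print(parsed)
```

Only Oscars are handled by name here. Other named awards such as Golden Globes or Emmys would need their own patterns, and the resulting document could then be written back to each movie with an ordinary update.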

I approached it by breaking the string into an array (splitting it on the " ") and then using the $in operator to look into the array and return the ones with “win” in them. Is this basically the same as using the $regex operator?

Thoughts?