Lab - $group and Accumulators (averages off, and directions lacking)

These aren’t the answer to the lab. The lab doesn’t give specific directions. What could I be missing?

MongoDB Enterprise Cluster0-shard-0:PRIMARY> pipeline = [{$match: {
... "imdb.votes" : { $gte : 0 },
... "imdb.rating" : { $gte : 0 },
... "awards" : { "$regex" : /Oscar/i}
... }}, {$addFields: {
... everything: 1,
... }}, {$group: {
... _id: "$everything",
... "avg_rating": {
... "$avg": "$imdb.rating",
... },
... "max_rating": {
... "$max": "$imdb.rating",
... },
... "min_rating": {
... "$min": "$imdb.rating",
... },
... "std_rating": {
... "$stdDevSamp": "$imdb.rating",
... },
... "avg_votes": {
... "$avg": "$imdb.votes",
... },
... "max_votes": {
... "$max": "$imdb.votes",
... },
... "min_votes": {
... "$min": "$imdb.votes",
... }
... }}]
[
    {
        "$match" : {
            "imdb.votes" : {
                "$gte" : 0
            },
            "imdb.rating" : {
                "$gte" : 0
            },
            "awards" : {
                "$regex" : /Oscar/i
            }
        }
    },
    {
        "$addFields" : {
            "everything" : 1
        }
    },
    {
        "$group" : {
            "_id" : "$everything",
            "avg_rating" : {
                "$avg" : "$imdb.rating"
            },
            "max_rating" : {
                "$max" : "$imdb.rating"
            },
            "min_rating" : {
                "$min" : "$imdb.rating"
            },
            "std_rating" : {
                "$stdDevSamp" : "$imdb.rating"
            },
            "avg_votes" : {
                "$avg" : "$imdb.votes"
            },
            "max_votes" : {
                "$max" : "$imdb.votes"
            },
            "min_votes" : {
                "$min" : "$imdb.votes"
            }
        }
    }
]
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.aggregate(pipeline).pretty()
{
    "_id" : 1,
    "avg_rating" : 7.327883538633819,
    "max_rating" : 9.3,
    "min_rating" : 3.9,
    "std_rating" : 0.6523259640926787,
    "avg_votes" : 69129.29301978351,
    "max_votes" : 1521105,
    "min_votes" : 64
}
MongoDB Enterprise Cluster0-shard-0:PRIMARY>

Your regex does not match only the documents specified in the lab.

For example, it will match

{awards: 'Nominated for 1 Oscar. Another 1 win.'}

And the lab gave 2 specific examples:

Won 1 Oscar
Won 13 Oscars

You also filter on imdb.votes and imdb.rating with $gte: 0. This is not one of the lab requirements.

That lab states NOTHING about restricting the votes or scoring. Should it?

Looking for greater than or equal to zero just ensures a numeric value. You can’t really average with a non-numeric, or can you?

In the last lab, we calculated a normalized rating that required us to know what the minimum and maximum values for imdb.votes were. These values were found using the $group stage!

For all films that won at least 1 Oscar, calculate the standard deviation, highest, lowest, and average imdb.rating. Use the sample standard deviation expression.

HINT - All movies in the collection that won an Oscar begin with a string resembling one of the following in their awards field

Won 13 Oscars
Won 1 Oscar

Select the correct answer from the choices below. Numbers are truncated to 4 decimal places.

Okay, so kudos. “Nominations for Oscars” were also present. That doesn’t make any sense to me, but it’s not my dataset.

It does not. So it is not part of the requirements. So you should not do it.

Zero is a numerical value.
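On the "can you average a non-numeric" question: as far as the documentation goes, $avg ignores non-numeric values on its own and returns null when nothing numeric is left, so the $gte: 0 guard is not needed just to make the average computable. The plain JavaScript below only imitates that behavior outside the database. The function name avgLikeMongo and the sample ratings array are made up for illustration; they are not part of the lab or the dataset.

```javascript
// Plain JavaScript imitation of how $avg treats its inputs (not MongoDB code).
// avgLikeMongo is a hypothetical helper written only for this illustration.
function avgLikeMongo(values) {
  // Keep numeric values only; strings, null and undefined are skipped.
  const nums = values.filter((v) => typeof v === "number");
  // With no numeric input at all, the result is null.
  if (nums.length === 0) return null;
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

// Made-up sample: two non-numeric entries and one zero.
const ratings = [9, "", 6, null, 0];
console.log(avgLikeMongo(ratings)); // 5, because (9 + 6 + 0) / 3
```

Note that the zero is counted like any other number, while the empty string and the null are simply left out.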

About the lab’s specific directions: just before the 2 examples, there was a hint that reads All movies in the collection that won an Oscar begin with a string resembling one of the following in their awards field. This was followed by the 2 examples I gave. (I edited my post because I wrote 10 but it was 13; sorry if I did not remember the examples correctly.)

You might not be proficient enough with regex (I am choosing my words carefully so I do not have to edit them afterwards), so I suggest you look at https://docs.mongodb.com/manual/reference/operator/query/regex/. There you will see that a caret at the beginning of a regex anchors it to the beginning of the string, which is the only really specific direction for this lab.

You might also be interested in \d, the equivalent of [0-9], followed by the + sign for matching the digit parts of the examples.
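To make the caret and \d+ hints concrete, here is a rough sketch in plain JavaScript. The exact pattern is my assumption, built only from the two hint examples, and the variable names are invented for illustration. The pipeline at the end shows one way the pattern could be used; it also groups on _id: null, which puts every matched document into a single group without the extra $addFields stage. Treat it as a starting point, not as the lab's official answer.

```javascript
// Assumed pattern: anchored at the start, "Won", one or more digits, "Oscar" or "Oscars".
const wonOscar = /^Won \d+ Oscars?/;

// The first two strings come from the lab hint, the third from the counterexample above.
const samples = [
  "Won 1 Oscar",
  "Won 13 Oscars",
  "Nominated for 1 Oscar. Another 1 win.",
];

for (const awards of samples) {
  // Prints true for the two "Won ..." strings and false for the nomination.
  console.log(wonOscar.test(awards), awards);
}

// Sketch of a pipeline using the anchored pattern, with no votes/rating filter.
const pipeline = [
  { $match: { awards: wonOscar } },
  {
    $group: {
      _id: null, // a single group for all matched documents
      avg_rating: { $avg: "$imdb.rating" },
      max_rating: { $max: "$imdb.rating" },
      min_rating: { $min: "$imdb.rating" },
      std_rating: { $stdDevSamp: "$imdb.rating" },
    },
  },
];

// In the mongo shell this would be run as: db.movies.aggregate(pipeline)
```

The unanchored /Oscar/i from the original post matches all three sample strings, which is why the averages came out off.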

Here’s what I dislike about that exercise - it mentions the previous exercise and then gives no indication of whether it is meant to be built on. You can’t tell if it’s a veiled hint or simply a recap of what was done before. Also, previous lectures mention screening out items that were null or bad values, and steps like these are often simply givens in a standard ETL process.

The clarity is often just not there, and I don’t know the people creating the labs enough to recognize what their level of subtlety should be.

I’m the goofball that thought nominations should not be part of the awards field of the data set, sure. I’ll own that mistake. And maybe I didn’t see it in the vanilla sample set, so I imagined the class set would be equal.
