Hint on chapter 4

may i have some hint on
Chapter 4: Core Aggregation - Multidimensional Grouping

Lab - $facets

???

Although we learned about sortByCount and bucket, what you asked is more like ordering and limiting…

so i did too, i made a facet and filter top 10 using order and limit strateg, and non of them return same results…

if you mean bucketing the rating and ten last bucket, not top ten item (for example, )? then how much? x in 8, + y in 8.1 + z in 8.2 … ?
then how to know the last existing backet? filtering etc…


also for my own knowledge… i wanted to filter the mixing item using $map adding new field to it say they are over lapping, or $filter filtering them out, i couldn’t come up with a working idea… any thing i should know? this lesson is harder than i think:

BTW… MY NOT WORKING CODE, WHICH ALSO DO NOT PRODUCE DESIRED RESULTS IN IT’S HALF WAY… (i don’t think is it matter if i share it as long as it is wrong):

db.movies.aggregate([{
			$facet: {
				imdbTop: [{
						$sort: {
							'imdb.rating': -1
						}
					}, {
						$limit: 10
					}, {
						$project: {
							rating: '$imdb.rating',
							title: '$title'
						}
					}
				],
				metacriticTop: [{
						$sort: {
							'metacritic': -1
						}
					}, {
						$limit: 10
					}, {
						$project: {
							rating: '$imdb.rating',
							title: '$title'
						}
					}
				]
			}
		}, {
			$project: {
				"topBoth": {
					$filter: {
						input: '$imdbTop',
						as: 'imdb',
						cond: [{
								'$$imdb.title': {
									$in: '$metacriticTop.title'
								}
							}, true, false]
					}
				}
			}
		}
	])

BTW facet and lot of other things are buggy on mongoDB Compass… the working scenario in shell, would generate ‘[’ ‘{’ error in in compass… also it eat too much memory after several command, and it wont clear out. and plus to that many of instances, you have to force end task it from task manager, other wise it won’t run.

1 Like

Yes. But there is not one title in common on top 10 of imdb.rating and metacritic rating sorted lists.

1 Like

I also can’t find any common set using the facet pipeline

1 Like

Try to filter first your data with $match before using $facet.

2 Likes

$match pipeline is already included in the $facet

1 Like

There is a difference! You first $match documents which contains valid data for imd.rating and metacritic before you start with $facet.

1 Like

and what is desired data? it said within 10 top rate from each, find shared between both, so i just have to take 10 from each and compare them, nothing else???

Hi,

What we ask in this lab is the number of movies that are in both top-10 by imdb.rating and top-10 by metacritic, i.e., the intersection of two arrays of 10 movies, so the result must be something between 0 and 10.

As @Johan_15013 suggested, the first step should be to filter documents which do not have a valid imdb.rating or metacritic, e.g. selecting documents where both fields are greater or equal than 0.

As a second step (for better performance) I would include a projection to exclude all the unneeded fields, and then a facet stage which should sort by metacritic and imdb.rating, with $limit: 10.

The final step should be the aforementioned intersection.

Hope this helps.

José Carlos

3 Likes

This is the one lab that really really threw me off!

I wasted more time trying to incorporate the facet-related stages introduced into the pipeline than actually trying to solve the problem

The joys of learning :smile:

Worth a proper discussion once the lab closes to get a feel for the most optimal solution. I’m sure most of us would have approached this in radically different ways

Thanks Jose

Hi,

You are welcome! If you follow the steps given in my previous post you’ll find the query is not that difficult… once you know how facets work. Four stages would be a good enough solution.

José Carlos

1 Like

In case anyone is in the same boat I was, know that you can use all other operators within the $facet sub-pipelines. I thought one was limited to only the ones taught in Ch4 (i.e. $sortByCount, $bucket, and $bucketAuto), but this is not the case. You can use most (if not all) other operators within $facet.

2 Likes

I have the same issue as most people in this post. I think, it is wording of the problem that threw us off. It states, “top 10 highest rated movies as per imdb ratings and Metacritic”. This is commonly understood as “top 10 rating in the whole collection”. But it seems that, this question can be answered only if we interpret the meaning to be, “top 10 rating among those documents that do contain a non-zero/non-null rating in both the ratings”.

2 Likes

first i though if i sort by greater data, it would still deduct empty data in limit, but know i know that using $match it will also reduce the result that may appear in top ten of the other field, for example if it have high rating it may still do not own metacritic, and they never overlap, so by reducing them we increase the chance for them to overlap…

Yet unfortunately :unamused: still doesn’t find any shared top ten i filter data using match both rating and metacritic for all conditions including: 1.being exist, and not be null, and not be empty string in a single match statement…
then i done as i previously mention using facet, and in the last stage, i intersect… nothing came out, i also output both array and manually check for overlapping, nothing were there…

------EDIT------

Zero $gte:0 seem to be working, wonder why with that much of filtering i still had issue

Initial $match stage is critical. $gte or $gt . $setIntersection

May be it’s a bit more difficult as there are 11 documents with a maximum metacritic value of 100. From a command line (locally, Windows7Pro and mongoDB 3.4.17 , mongoDB 3.6.1 at the server), the result has always been an empty array. So, I have checked it with Compass and separate queries for “Imdb.rating” and “metacritic”. Here, the sort sequence for metacritic docs seems to be slightly different, and the results may be different. But I don’t know how to use $facet with Compass. May be it is essential to code metacritic: {$gte: 0} first followed by “imdb.rating”: {$gte: 0} at the $match stage. I had always done it vice versa and, as said, the intersection array was empty: []. Best regards! Wilfried

The wording of the question is terrible and open to interpretation.

For anyone stuck here, the hints -

  1. The question says “the common movies in both top-10 imdb rating and top-10 metacritic rating”, it means the movie must have both ratings; rather than ranking the ratings separately.

  2. Quite a few movies have same top metacritic ratings. You have to sort on the title to get the answer wanted by the question, although this sort is not explicitly required by the question!

2 Likes

I wasted so much time on this just because I didn’t sort the metacritic titles

This was indeed pretty lame! Spent more than an hour looking at my (ridiculously simple) aggregation code, and nothing was coming out of the intersection.

I manually checked both top 10 registers and they indeed had no matches between each other. So I went further and manually looked for the top 20, and found the common movie title. I did figured out my pipeline wouldn’t produce the expected result without a second sorting field, so I used the _id itself (doesn’t matter asc or desc), to sort both metacritic and imdb and, voilá, there is the correct answer.

Then, I just stopped by here and found out I wasn’t the only one thinking the labs’ description is lacking a major crucial information for us to solve the problem (a simple “top 10 alphabetical ordered movies” would suffice). I even followed Jose Carlos’ answer from September 2nd (which was pretty much what I’d done myself), and still no lucky.

If there is another way to achieve the result without using a second sort field, I believe most of the people here would be glad to know (a 4 stage pipeline sample, like Jose Carlos pointed out).

If there isn’t, my suggestion to MongoDB Team is to redact the labs’ description and include some extra info, so future students won’t waste too much time sweating over it. Another option is to change the labs’ collection to include at least one match considering only the metacritic/imdb sorting. Or even another option, put a 0 (zero) or none option in the answer’s list.

1 Like

Hi,

You are right, a second sort field is needed. I’ve flagged this to pass your suggestion to the curriculum developers.

José Carlos

Hello,

This may be something too simple, but just to check the data, I ran

db.movies.aggregate([
{
    $match: {
        metacritic: {$ne: null},
        "imdb.rating": {$ne: ""}
    },
},
{
    $match: {
        metacritic: {
            $gte: 91,
        },
        "imdb.rating": {
            $gte: 8.7,
        }
    },
},
{
    $project: {
        metacritic: 1,
        "imdb.rating": 1,
        title: 1,
        year: 1
    }
}
]).pretty()

The applied limits should be the the floor value for being in the top 10 by imdb rating/metacritic scores.
I get 7 hits here.

It isn’t included among the choices.

Is this as expected?