$lookup vs expressive $lookup

This post is about the lecture in chapter 2 entitled “Basic joins”.

When we add a new $lookup stage to a pipeline in Compass, it gives us the following parameter hints

/**
 * from - The target collection.
 * localField - The local join field.
 * foreignField - The target join field.
 * as - The name for the results.
 */

This is the form we learned in M121, the course on the aggregation pipeline.

But the parameters used in this lecture for the $lookup stage use the expressive form:

{
  from: 'comments',
  let: { 'id': '$_id' },
  pipeline: [
    { '$match':
      { '$expr': 
        { '$eq': [ '$movie_id', '$$id' ] }
      }
    }
  ],
  as: 'movie_comments'
}

Once I’d fixed all my typos, getting the square brackets and curly brackets and quotes in the right places, it was fairly clear from the results WHAT the intent of this pipeline is, in SQL Server it’d be something like

-- our pipeline is working on the movies collection
select * from movies m
-- $lookup stage to bring in the comments collection
-- Left join because we want all movies 
-- regardless of whether they have comments
left join comments c on c.movie_id = m._id
-- $match stage
where m.year >= 1980 and m.year < 1990

But it wasn’t at all clear WHY we were doing it this way, rather than using the (in my humble opinion) more intuitive from: , localField: , foreignField: , as: form. I had to replay the explanation of the let and pipeline parameters 4 or 5 times before I understood why they resulted in the results I was seeing, and even then it seemed like an overly complex way of achiving those results.

It was only once I’d watched the remainder of the video, from about 3 minutes onwards, that the penny dropped. We expanded this $lookup stage, adding a $count stage to the sub-pipeline on the comments collection, like this:

{
  from: 'comments',
  let: { 'id': '$_id' },
  pipeline: [
    { '$match':
      { '$expr': 
        { '$eq': [ '$movie_id', '$$id' ] }
      }
    },
    { $count: 'count' }
  ],
  as: 'movie_comments'
}

Ah, right, we’re not interested in the comments themselves, only in how many comments there are. And we could have done this using the from: , localField: , foreignField: , as: form of the $lookup stage, but then we’d have needed another stage in the main pipeline to count the number of comments for each movie, and between those two stages the pipeline would have become unnecessarily fat due to all the content of the documents in the comments collection (which we’re not interested in) being added to the documents in the main pipeline, only to be thrown away when we work out the number of comments for each movie (which is the thing we are interested in). And this is one of the things called out in chapter 6 of M121, which discusses performance and optimisation of pipelines. We could also use this sub-pipeline to do things like filter the comments to return only those posted within a given date range, again to ensure that the data in the main pipeline is no more than what we need to get the intended final result.

If only I’d known that at the start of the lecture, I think I’d have found it a lot easier to digest. But before I “report an issue” on that lecture requesting a lecture note or something to explain this up-front, can anyone confirm that my understanding is actually correct please?

Sorry if this sounds like a moan, this is the third MongoDB University course I’ve done, and on the whole I’ve found the quality of the training to be excellent, and far superior to some other online training which my employer recently paid a rather eye-watering amount of money for, so I hope the MongoDB University folks will take this feedback in the constructive manner in which it’s intended :+1:

EDIT: related post: Feedback on Chapter 2: User-Facing Backend Basic Joins video

Hey Simon_39939,

Thanks for the note. I can confirm that yes, your understanding is correct! :smile: The expressive $lookup feature gives more depth to joins in MongoDB, in a way that the old $lookup could not. In this example, the expressive lookup resulted in less data being pulled into memory, by simply returning a count. And as you suggested, this “sub-pipeline” allows us to do much more than that.

I agree that we do not contrast this expressive $lookup with the old form, but I think it’s out of scope for the course. For students coming from M001, we want them to start performing joins with expressive $lookup.

There is an argument to be made that students coming from M121 will find this new $lookup more confusing than the old form. However, using the expressive $lookup to solve these simple queries will make it easier to tackle larger and more complicated queries, especially where the old $lookup may not be powerful enough.

Thanks again for writing to us - these different perspectives give us a better sense of how our content is received by students with varying experience levels.

Good luck in the rest of the course!

Matt

Thanks Matt, it’s nice to have confirmation that I have actually understood correctly. Once I’ve got the idea of the sub-pipeline firmly embedded in my brain, I can see how it would make some of the more complex queries I write in SQL Server more efficient when trying to do the equivalent using Mongo.