Aggregate() arguments - breaking the command down into variables

Hi, I’ve arrived here at M121 from a background in relational databases (SQL Server), via the M001 course, so writing queries in JSON rather than T-SQL is still a very new way of thinking for me (a bit like when I first learned object-oriented programming after years of procedural programming in COBOL).

When we call the aggregate() method, we need to give it an array of pipeline stages and an optional options object. The detailed answers for the labs show the call to aggregate() as a single command line:

db.myCollection.aggregate(<biiiiiiig chunk of JSON here>)

I was having real difficulty tracking down the errors in my code when writing it as a single command like this, so, bearing in mind that the Mongo shell is effectively an ECMAScript interpreter, I started pulling bits of that biiiiig chunk of JSON out into variables to make everything more readable. For example, when querying a books collection I might do this:

// I'm only interested in sci-fi and fantasy books
var matchStage = { 
  $match: {
    genre: { $in: [
      "Science Fiction", 
      "Fantasy"] 
    }
  }
};

// Each document has a chapters field, which is an array,
// and I want the number of chapters so that I can use it
// in the sort stage.
// The $cond defends against the chapters field being missing
// or something other than an array.
var addFieldsStage = {
  $addFields: {
    numberOfChapters: {
      $cond: {
        if: { $isArray: "$chapters" },
        then: { $size: "$chapters" },
        else: 0
      }
    }
  }
};

// Books with the most chapters should be returned first
var sortStage = { 
  $sort: { numberOfChapters: -1 } // -1 = descending, so the highest chapter count comes first
};

// No options needed yet, but it's here in case I need it in the future
var options = {};

// Now I've set up all the arguments as variables, I can call aggregate()
db.books.aggregate(
  [matchStage, addFieldsStage, sortStage], 
  options
).pretty();

There’s an obvious trade-off here. I’m doing more typing, but in exchange I’m writing code which is (for me anyway) more readable, and therefore easier to maintain and / or refer to in the future.

This approach has served me well so far whilst working on the labs, but are there any potential pitfalls that I ought to be aware of?

That’s what I do too. It’s a good principle, usually called divide and conquer.

@Simon_39939

Exactly the correct approach IMHO. Notice that’s how Compass builds its aggregation queries as well – you’re in good company here!! :grin:

I like that a LOT! I didn’t realise it, but I was doing sort of the same thing - testing each piece of my $match stage with an individual find() test, before assembling into the $match stage. I like the idea of building separate discrete variables and using those - MUCH easier to follow.
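The find()-first habit described above combines neatly with the variables approach, because the body of a $match stage uses the same query syntax as find() (with a few exceptions, such as $where). A minimal sketch, assuming the books collection from the original post; the name genreFilter is made up for illustration:

```javascript
// The filter lives in its own variable so it can be tested on its own.
// (genreFilter is a hypothetical name, not something from the course.)
var genreFilter = {
  genre: { $in: ["Science Fiction", "Fantasy"] }
};

// The $match stage simply wraps the same object, so whatever
// was tested with find() is exactly what the pipeline will run.
var matchStage = { $match: genreFilter };

// In the shell (these lines need a live connection, so they are commented out):
//   db.books.find(genreFilter).count();   // sanity-check the filter first
//   db.books.aggregate([matchStage]);     // then reuse it in the pipeline
```

Because matchStage holds a reference to genreFilter rather than a copy, any later tweak to the filter is picked up by the stage automatically.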

Thanks guys, sounds like I can safely stick with this approach.

I hadn’t actually used the Compass aggregations tab before reading this, but I’ll definitely be using it for next week’s labs - the preview of the output from each stage and the ability to switch stages on and off looks like it’ll save me a lot of time, especially as I’m finding the labs for M121 a lot more challenging than those in M001.
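Something similar to the per-stage preview can be approximated in the shell by running the pipeline one stage longer each time. This is only a rough sketch of the idea, not how Compass itself works; pipelinePrefixes is a hypothetical helper name, and the stages are cut-down versions of the ones in the first post:

```javascript
// Hypothetical helper: for stages [a, b, c] it returns
// [[a], [a, b], [a, b, c]], i.e. the pipeline as it stands after each stage.
function pipelinePrefixes(stages) {
  return stages.map(function (_, i) {
    return stages.slice(0, i + 1);
  });
}

var matchStage = { $match: { genre: { $in: ["Science Fiction", "Fantasy"] } } };
var addFieldsStage = {
  $addFields: {
    numberOfChapters: {
      $cond: {
        if: { $isArray: "$chapters" },
        then: { $size: "$chapters" },
        else: 0
      }
    }
  }
};
var sortStage = { $sort: { numberOfChapters: -1 } };

var prefixes = pipelinePrefixes([matchStage, addFieldsStage, sortStage]);

// In the shell (needs a live connection, so commented out here):
//   prefixes.forEach(function (p) {
//     printjson(db.books.aggregate(p.concat([{ $limit: 3 }])).toArray());
//   });
```

Appending a small $limit to each run keeps the output readable; leaving a stage out of the array passed to pipelinePrefixes has much the same effect as switching it off.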

I don’t think this feature of Compass was mentioned anywhere in the videos, so would it be possible to add a lecture note somewhere near the beginning of the course suggesting that students have a play with it please?

Thanks again :smiley:

@Simon_39939

I think that’s a good idea, and would suggest that you post it directly to the curriculum team using the “Report an issue” tab at the bottom of your screen. :wink: