Pipeline concepts

Hi everyone:

I appreciate your help validating my understanding of the Pipeline concepts. I understand that:

A pipeline is a set of stages such as $match, $project, etc., and the stages process the documents based on aggregation operators (also called expressions) like $gte, $lte, and others that we studied in M001-Basics. But these expressions or aggregation operators can take an argument or a set of arguments like ["$numberOfMoons", 0]. I do not fully understand how the arguments work in this pipeline context. I would appreciate it if you could explain it to me with some examples of each case.

Thanks for all.

Hey @William_49170

Yep this is correct.

Somewhat correct. There are aggregation operators that look like query operators and work similarly, but they differ in how many arguments they take and where they can be used.

$match only takes query operators, the same way find() does:

$match takes a document that specifies the query conditions. The query syntax is identical to the read operation query syntax; i.e. $match does not accept raw aggregation expressions. Instead, use a $expr query expression to include aggregation expressions in $match.

Consider the following difference

// query operator
db.inventory.find( { qty: { $gt: 20 } } )
// aggregation expression
{ $gt: [ "$field", <expression that evaluates to a number> ] }

To clarify, "$numberOfMoons" is a reference to the numberOfMoons field in the documents being aggregated. Used with an operator such as $gt, the arguments ["$numberOfMoons", 0] check whether numberOfMoons is greater than 0.
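As a rough illustration of how the arguments get resolved, here is a small plain-JavaScript sketch. It is only a toy model and not how MongoDB is implemented: the helpers evalArg and evalGt and the sample planets array are made up for this example. The last two lines show the same condition written for a real $match stage, once in query style and once with $expr.

```javascript
// Toy model only: mimics how an aggregation expression resolves its arguments.
// This is NOT MongoDB's implementation; evalArg and evalGt are hypothetical helpers.

// A string starting with "$" is treated as a field path; anything else is a literal.
function evalArg(arg, doc) {
  if (typeof arg === "string" && arg.startsWith("$")) {
    return doc[arg.slice(1)];
  }
  return arg;
}

// Evaluates { $gt: [ <arg1>, <arg2> ] } against one document.
function evalGt(expr, doc) {
  const [left, right] = expr.$gt.map((arg) => evalArg(arg, doc));
  return left > right;
}

// Hypothetical sample documents (not the course dataset).
const planets = [
  { name: "Earth", numberOfMoons: 1 },
  { name: "Venus", numberOfMoons: 0 },
];

const hasMoons = { $gt: ["$numberOfMoons", 0] };
const matched = planets.filter((doc) => evalGt(hasMoons, doc));
console.log(matched.map((doc) => doc.name)); // [ 'Earth' ]

// The same condition as it would be written for a real $match stage:
const queryStyle = { $match: { numberOfMoons: { $gt: 0 } } };
const exprStyle = { $match: { $expr: { $gt: ["$numberOfMoons", 0] } } };
```

Note the difference in shape: the query operator sits under the field name and takes a single value, while the aggregation expression takes an array of arguments in which the field is named explicitly with a "$" prefix.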

Hope this helps.


Hi @William_49170,

Great explanation by @natac13 !!

I would also like to add that each stage will have operator expressions similar to functions that take arguments with the following format:

{ <operator> : [ <argument1>, <argument2> ... ] }

For example, if we have a dataset as below:

{ "_id" : 1, "item" : "abc1", description: "product 1", qty: 300 }
{ "_id" : 2, "item" : "abc2", description: "product 2", qty: 200 }
{ "_id" : 3, "item" : "xyz1", description: "product 3", qty: 250 }
{ "_id" : 4, "item" : "VWZ1", description: "product 4", qty: 300 }
{ "_id" : 5, "item" : "VWZ2", description: "product 5", qty: 180 }

Now if we create an aggregation pipeline as below:
db.inventory.aggregate( [ { $project: { item: 1, qty: 1, cmpTo250: { $cmp: [ "$qty", 250 ] }, _id: 0 } } ] )
Here in the expression { $cmp: [ "$qty", 250 ] } we have specified arguments to compare the qty value with 250.

I hope it helps!!

Please let me know, if you have any questions.

Thanks,
Sonali


Thank you so much!

I understood, in summary, that the $match stage filters the documents we want to obtain based on find-style queries and some expressions. Every document in the collection is checked against these queries or filters, and if it matches, it passes to the next stage, $project, which transforms the document into another one, also based on expressions.

And finally, I understood that we can use $expr to define the aggregation expressions to be part of the $match queries or filters.
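As a rough mental model of that flow, the two stages can be sketched in plain JavaScript, with arrays standing in for the collection. This is only an analogy under my own assumptions, not how MongoDB runs a pipeline; the names sampleDocs, matchStage, and projectStage are made up, and the $gte condition is just an example filter.

```javascript
// Rough mental model: each stage receives the documents emitted by the previous one.
// Plain arrays stand in for the collection; this is not MongoDB's implementation.
const sampleDocs = [
  { _id: 1, item: "abc1", qty: 300 },
  { _id: 2, item: "abc2", qty: 200 },
];

// Stage 1, like { $match: { qty: { $gte: 250 } } }: keep only matching documents.
const matchStage = (input) => input.filter((doc) => doc.qty >= 250);

// Stage 2, like { $project: { item: 1, qty: 1, _id: 0 } }: reshape each document.
const projectStage = (input) => input.map(({ item, qty }) => ({ item, qty }));

// Run the stages in order, feeding each stage's output into the next one.
const pipelineResult = [matchStage, projectStage].reduce(
  (docs, stage) => stage(docs),
  sampleDocs
);

console.log(pipelineResult); // [ { item: 'abc1', qty: 300 } ]
```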

Thanks for all again.

Thank you for your explanations, Sonali.

I understood the following:
The $project stage is transforming the documents into others with the following fields: item, qty, cmpTo250; without the _id, because of the projection defined. But the cmpTo250 is 1 iff the qty field in the original document has 250 as a value, so the result must be a document like this:

{
"item" : "xyz1",
"qty": 250,
"cmpTo250": 1
}

So, in summary: 1) the $match stage uses find-style filters, defined with query operators, to check every document against conditions. And 2) the $project stage uses find-style projections to define new custom fields or to choose which fields of the original document are included in the result; a field may be included directly (item: 1) or computed from an expression using operators like $cmp: ["$fieldName", value].

I have to do more exercises to understand this better.

Thank you so much!

Hi @William_49170,

You are right!!

I would like to clarify my previous example, using the same inventory documents as before.

The $project query from that example returns documents with the item and qty fields and skips _id, as specified in the $project stage. The output documents also have a cmpTo250 field whose value is evaluated from the { $cmp: [ "$qty", 250 ] } expression: $cmp returns 1 if qty is greater than 250, 0 if the two values are equal, and -1 if qty is less than 250.

The output of the above query will be:

{ "item" : "abc1", "qty" : 300, "cmpTo250" : 1 }
{ "item" : "abc2", "qty" : 200, "cmpTo250" : -1 }
{ "item" : "xyz1", "qty" : 250, "cmpTo250" : 0 }
{ "item" : "VWZ1", "qty" : 300, "cmpTo250" : 1 }
{ "item" : "VWZ2", "qty" : 180, "cmpTo250" : -1 }
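If it helps to see the rule behind those values, the comparison can be imitated in plain JavaScript. This is only a sketch and not MongoDB's implementation: cmp is a made-up helper that handles numbers only, and the map call simply mirrors what the $project stage with { $cmp: [ "$qty", 250 ] } does to each document.

```javascript
// Toy model of { $cmp: [ <arg1>, <arg2> ] }; not MongoDB's implementation.
// cmp is a hypothetical helper and only handles numbers in this sketch.
const inventory = [
  { _id: 1, item: "abc1", description: "product 1", qty: 300 },
  { _id: 2, item: "abc2", description: "product 2", qty: 200 },
  { _id: 3, item: "xyz1", description: "product 3", qty: 250 },
  { _id: 4, item: "VWZ1", description: "product 4", qty: 300 },
  { _id: 5, item: "VWZ2", description: "product 5", qty: 180 },
];

// Returns 1 if a is greater than b, -1 if a is less than b, 0 if they are equal.
function cmp(a, b) {
  if (a > b) return 1;
  if (a < b) return -1;
  return 0;
}

// Mirrors { $project: { item: 1, qty: 1, cmpTo250: { $cmp: ["$qty", 250] }, _id: 0 } }.
const projected = inventory.map((doc) => ({
  item: doc.item,
  qty: doc.qty,
  cmpTo250: cmp(doc.qty, 250),
}));

console.log(projected.map((doc) => doc.cmpTo250)); // [ 1, -1, 0, 1, -1 ]
```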

Please also refer to the $cmp documentation.

You can also refer to the documentation on pipeline stages and operators and practice more with the examples given there.

Please let me know, if you have any questions.

Thanks,
Sonali


So, I understand.
Thank you so much again.
I really appreciate your help.