Chapter 3 $lookup Returns empty set

Ok… I’m having a real hard time with this lab…

I can get output… But my lookup is returning all empty arrays. This leads me to believe I’m all messed up LOL.

I am running my query from the air_alliances database.

My from is using the air_routes
local is airlines (that it the local field that contains the airline names)
foreign is the name field
I’m using an as of “Matched”

However, everything is coming back as empty… therefore I cannot match the plane type to use the count and get an answer… Recommendations?

The correct way to set this up is that I am using the FOREIGN field FROM the database I want to compare FROM (I.E the database that contains the foreignField operator) to the local field I want to compare to and output AS

This is right ???

Sometimes a simple typo messes up everything.

I think you could publish your pipeline. Make sure you blur it with the spoiler tool from the gear menu. It will do something like:

This is my spoiler.

Ok… I got the $lookup to work, but now I am stuck on how to count the air alliance names… This is my query so far:


use("aggregations");



var pipeline =[

{
    $match:
    {
       $or:
       
           [{"airplane": "747"}, {"airplane": "380"}]   
    }   
},

{
    $project:
    {
        "_id": 0,
        "airline.name": 1,
        "airplane": 1
    }
},

{
    $lookup:
    {
        from: "air_alliances",
        localField: "airline.name",
        foreignField: "airlines",
        as: "Matched"
    }
},


{
    $project:
    {
        "_id":0,
        "airline.name": 1,
        "Matched.name": 1,
        "airplane": 1
    }
}

];

db.air_routes.aggregate(pipeline);

Where I am having trouble is now I have an array called Matched and I need to filter the Matched.name field for the Air Alliances listed. I don’t know if I should use add fields and pull them out into an individual field or use the $map to map the Matched.name field to a separate field.

I have thought of $group, $project with an if statement but I question the if statement cause I need to access the same field three times. Basically I need to access the Matched.name field and pull out matches to the alliance into a separate field.

Thoughts?

First, I have two recommendations.

  1. Do not bother with $project while developing or debugging your aggregations. In particular, do not get rid of _id, unless it is a nuisance because it is always unique and helps find specific document when you are debugging. They can be added later. You might project out data that you need later. I do not think it is the case right now.

  2. Analyze your data. Compass is great for that. I won’t say more about than except that you should take a look at _id:56e9b39b732b6122f87810a9. Alternatively, with the shell, a $group stage on the field of interest to output all unique values is sometimes useful.

Finally, to count you need a group stage, like you were planning, with something a field like:

count : { "$sum" : 1 }

The hard part is the _id. In principle, your Matched array has only one element, so “Matched.0.name” (the field name of the first element, index 0, of the array Matched should provide you with the alliance’s name.

OK… I got it to give me an output, but the count I got was wrong… Don’t know how to troubleshoot this. What direction should I go?

use("aggregations");



var pipeline =[

{
    $match:
    {
       $or:
       
           [{"airplane": "747"}, {"airplane": "380"}]   
    }   
},


{
    $lookup:
    {
        from: "air_alliances",
        localField: "airline.name",
        foreignField: "airlines",
        as: "Matched"
    }
},


{
    $group:
    {
        "_id":"$Matched.name",
        "count": {$sum: 1}
        
        
    }
}

];

db.air_routes.aggregate(pipeline);

This is my pipeline and it gives me this output:

[
  {
    "_id": [
      "OneWorld"
    ],
    "count": 10
  },
  {
    "_id": [
      "Star Alliance"
    ],
    "count": 5
  },
  {
    "_id": [
      "SkyTeam"
    ],
    "count": 8
  },
  {
    "_id": [],
    "count": 21
  }
]

This would appear to be the right answer since not all airlines will be flying within an alliance. So… I picked the highest number based on my query but it says it is wrong.

( I could just pick them one at a time… 3 chances for three choices) but I want to figure this out. Any pointers? How would you go about troubleshooting this query?

The recommendation that says to look at

was so that you see that airplane has the value 380 744. Your first match will ignore that document. You need a regex to match.

You _id being $Matched.name rather than $Matched.0.name produce an array rather than a string but that should not stop you from getting the correct answer.

The result _id:[ ] is troublesome because its count is greater than all the others. This means that some airlines are not written the same in air_routes as in air_alliances so they are not counted in the appropriate group. I know KLM is one of them. It is KLM Royal Dutch Airlines in air_routes and simply KLM in air_alliances (SkyTeam).

While I really appreciate this… (it forces me to get into the documentation) I see two glairing problems with this course for beginners.

First, the mongo documentation is very fragmented and hard to follow for a beginner. The write ups on each stage while good for experienced people, is cryptic at best for beginners. I would wish that there were more examples but I understand also that proficiency comes with exposure… So I am reading documentation every day…

Second, I have had problems with other labs and the curriculum team has pointed out that the labs are intended to support the topic.

I am approaching the $lookup stage from the routes collection and importing the alliances document into the route. Is this the most efficient way to look at it? I ask this because I am now faced with a difference in airline name. Going on the response prior that the lab is only intended to enforce the $lookup process, making sure the fields to compare are the same is a daunting task for a beginner.
(the air_alliance document is matched by inserting it as an array…) I am also being advised to look at my collection. I would like to ask if I am expected to look at ALL entries in EACH collection that I am looking to work with? This seems to be outside the scope of both the course and the lab. (You could make an entire course out of data sanitization)

So… My current problem is this. I have matched routes that use “747” or “380” but I forgot to think about the ones that use both…

Also… the name in the routes collection for the airline is different than the name used for all entries in the air_alliance “airline” field. Therefore, I need to somehow identify the airlines that are different, and “sanitize” the entries for the affected airlines. This will entail me somehow modifying the array and only saving the airlines used.

So… It would be nice to at least be hinted in how to accomplish this lab. By the way… 3 chances with 3 answers makes it possible to pass without doing the coding… I have used this option because I have not seen or practiced any good data sanitization methods… It has just been said in lectures that one may have to clean up the data a bit… So… on to the next lab…

1 Like

I feel your pain.

So true. But “Its the not the Destination, It’s the journey.” Many chose the easy route, from air_routes, but you do the right thing. You go deep, you want to understand, you ask all the right questions, you’re in for the journey. You ask all the right question but I do not have all the answers.

1 Like

I looked at the lab’s solution and I now understand where I went wrong.

I do have a couple suggestions though… If you are going to use the $regex expression, you really should work it into your aggregation pipeline instruction. It is an art form to learn how to use, but once you understand it, it simplifies it a lot.

Also… Why unwind the $lookuop array so long as I know I will have empty sets? It seems to be a waste of resources.

Also… Group and Sort are blocker stages in the aggregation pipeline. Why what is the advantage of using both when I have already limited my output to four possible answers? Therefore the Sort is redundant in my opinion.

Group yes… Sort no… in my opinion

1 Like

In mine opinion too.

Lol Thanks for the encouragement. In my programming experience… You are always on the right track if you want to throw your computer out the window at times. LOL Thank you for your patience with me.

It is a pleasure. Especially since you seem to understand that even when I have the answer I do not give it. I try to stimulate the thinking process. Others did that to me and that’s how I learn so much.