$graphLookup: General considerations

I didn’t understand the part where he says “Also, unrelated match stages do not get pushed before graph lookup in the pipeline. Therefore, they will not be optimized if they are not related with the dollar graph lookup operator.”
Can someone explain it, please?

@jcotero

There are six $graphLookup lectures. Which one is it, and where in the lecture does the confusing part occur? If you can give us more information, we’ll try to help. Good luck.

@jcotero @DHz I believe the video in question is https://youtu.be/522TDFDfKKU?t=88 . I think this part of the docs helps: it explains that the query optimizer can rearrange aggregation stages to make a pipeline more efficient.

And when it comes to $graphLookup the optimizer will only do that for related $match stages.


Yes, that’s the video.
From the doc link you shared, I understood that some $match stages are moved ahead of other stages to optimize the pipeline.

But I’m still struggling to understand what a “$match unrelated to a $graphLookup” would be, and why the observation in the video is important (i.e., what optimization might someone expect that would not happen in that case?). I think an example would help.


Well, after some testing I was not able to figure out what is meant by an unrelated $match stage. I tried the following commands:

db.air_airlines.explain('executionStats').aggregate([
  {
    $match: {
      _id: ObjectId('56e9b497732b6122f87902ab'),
    },
  },
  {
    $graphLookup: {
      from: 'air_routes',
      startWith: '$base',
      connectFromField: 'dst_airport',
      connectToField: 'src_airport',
      as: 'chain',
      maxDepth: 1,
    },
  },
]);

db.air_airlines.explain('executionStats').aggregate([
  {
    $graphLookup: {
      from: 'air_routes',
      startWith: '$base',
      connectFromField: 'dst_airport',
      connectToField: 'src_airport',
      as: 'chain',
      maxDepth: 1,
    },
  },
  {
    $match: {
      iata: 'ABK',
    },
  },
]);

db.air_airlines.explain('executionStats').aggregate([
  {
    $graphLookup: {
      from: 'air_routes',
      startWith: '$base',
      connectFromField: 'dst_airport',
      connectToField: 'src_airport',
      as: 'chain',
      maxDepth: 1,
    },
  },
  {
    $addFields: {
      'test': 1
    }
  },
  {
    $match: {
      name: 'Alberta Citylink',
    },
  },
]);

I also tried some others that just moved the $match stages around. In every case, the explain output shows that the query optimizer moved or grouped the $match stages so that the documents are filtered first.
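For anyone repeating these tests, a small helper can make the stage order in the explain output easier to read. This is only a sketch: `stageOrder` and `sampleExplain` are hypothetical names, and it assumes the explain result has a top-level `stages` array, which may not hold on sharded clusters or on every server version.

```javascript
// Hypothetical helper: list the stage names in the order reported by explain.
// Assumption: the explain result exposes a top-level `stages` array.
function stageOrder(explainResult) {
  const stages = explainResult.stages || [];
  // A stage entry may carry extra keys (e.g. timing stats), so pick the
  // key that starts with "$" as the stage name.
  return stages.map((stage) =>
    Object.keys(stage).find((key) => key.startsWith('$'))
  );
}

// Hand-written stand-in for an explain result, for illustration only.
const sampleExplain = {
  stages: [
    { $cursor: { queryPlanner: { parsedQuery: { iata: { $eq: 'ABK' } } } } },
    { $graphLookup: { from: 'air_routes', as: 'chain' } },
  ],
};

console.log(stageOrder(sampleExplain)); // [ '$cursor', '$graphLookup' ]
```

In the shell, the result of `db.air_airlines.explain('executionStats').aggregate([...])` would take the place of `sampleExplain`. If a filter shows up inside `$cursor`, ahead of `$graphLookup`, the optimizer has moved that $match to the front.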

I have also looked in the docs here and here, and found nothing that explains the statement made in the video any further. My apologies that I was not able to answer your question about what unrelated $match stages are. :disappointed: @jcotero

I am also now interested to know further details of what an unrelated $match stage is and how to avoid them when constructing an aggregation pipeline. :smile:
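One case the pipelines above do not cover is a $match that filters on the `as` field (`chain`) itself. Such a stage reads the output of $graphLookup, so it presumably cannot run before it. Whether this is the distinction the video has in mind is an assumption; neither the video nor the docs confirm it. The sketch below uses a hypothetical helper, `matchReferencesField`, to tell the two kinds of filter apart, and the value `'JFK'` is only illustrative.

```javascript
// Hypothetical helper: true if a $match filter references `field` or one of
// its subpaths. Only plain field paths and $and/$or/$nor are handled;
// $expr and other operators are not.
function matchReferencesField(filter, field) {
  return Object.entries(filter).some(([key, value]) => {
    if (key === '$and' || key === '$or' || key === '$nor') {
      return value.some((sub) => matchReferencesField(sub, field));
    }
    return key === field || key.startsWith(field + '.');
  });
}

const graphLookupStage = {
  $graphLookup: {
    from: 'air_routes',
    startWith: '$base',
    connectFromField: 'dst_airport',
    connectToField: 'src_airport',
    as: 'chain',
    maxDepth: 1,
  },
};

// Filters on a field of the input document only.
const independentMatch = { $match: { iata: 'ABK' } };

// Filters on the array that $graphLookup produces.
const dependentMatch = { $match: { 'chain.dst_airport': 'JFK' } };

console.log(matchReferencesField(independentMatch.$match, 'chain')); // false
console.log(matchReferencesField(dependentMatch.$match, 'chain')); // true

// To compare the plans in the shell (not run here):
// db.air_airlines.explain('executionStats').aggregate([graphLookupStage, independentMatch]);
// db.air_airlines.explain('executionStats').aggregate([graphLookupStage, dependentMatch]);
```

Comparing the explain output of the two commented-out pipelines should show whether the second $match stays after $graphLookup.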
