Conditional $lookup

Is there a way to do a conditional $lookup?

I have records in a collection that have an optional (null) field. I’d rather not waste processing on the records for which the lookup field is null. Not to mention that, depending on indexing, doing a $lookup on a null field could still be expensive.

1 Like

With this $lookup syntax,

{
   $lookup:
     {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the collection to join> ],
       as: <output array field>
     }
}

you can specify condition(s) on the join in the pipeline. In your case, that would be a null check condition (in addition to the join conditions).

Yes, the problem with the solution as you posted is this bug:
https://jira.mongodb.org/browse/SERVER-40362

So when you get to nulls on a lookup join, the query runs SLOWWWWW. Now, one thing I could do is an ifNull, and then change the null to be a value that I am certain won’t be in in the lookup collection… but this seems… jenky. So, was hoping for alternative solution.

You can try using a $match before the $lookup, to exclude the documents with the non-existing field:

{ $match: { someField: { $exists: true } } }

The alternative workaround to use the $ifNull to substitute a null for non-existing field is fine too. Substitute let: { someField : '$someField' } with let: { someField: { $ifNull: ["$someField", null ] } }

In applications it is not unusual to have unusual cases - data or logic. It just happens this case is a program shortcoming (and temporary only). It is not that bad to have a case by covering with an additional condition, if it helps running the program efficiently. Some documentation around this workaround helps for reference and apply the issue fix later on when available.

2 Likes

Hello. This conditional lookup does not work for me
MongoDB version: 5
GUI: robo3t-snap


updatedFieldsByContractor.updatedValues.cities there is not in the original document. it just appears here because of that lookup. I mean when i run this command: db.getCollection('contractors').find({ _id: ObjectId('htat document') }) to get the same document it had not updatedFieldsByContractor.updatedValues.cities field. what should i do to lookup conventionally?

Hello @Kasir_Barati, you cannot refer the contractors collection document fields directly in the $lookup's pipeline $match stage. Please refer the above linked MongoDB documentation for the $lookup and find correct syntax and usage.

1 Like

@Kasir_Barati, was @Prasad_Saya’s answer help you solve your issue? If it has, please mark the post as the solution.

This is not my question. Therefore I cannot select it as the answer

1 Like

Here I am, a few years later. The original answer is not what I’m looking for. I don’t want to filter out results. I want the entire result set, but only do lookups on certain records.

For example, collection fruit and child_fruit. If I do db.child_fruit.aggregate(pipeline), I only want child_fruit records to perform $lookup against fruit collection if child_fruit HAS a value for a field “fruit_name”. For example:

##fruit collection
db.fruit.insert({name: ‘Orange’});

##child_fruit collection
db.child_fruit.insert({name: ‘Something’, fruit_name: ‘Orange’});
db.child_fruit.insert({name: ‘Another’});

db.child_fruit.aggregate([
{$lookup: {
from: ‘fruit’,

}}
]);

The problem is, given that I still want the “Another” record back in my final result set, I don’t believe there is a way to “ignore” the $lookup for child_fruit records without “fruit_name” field…

In this small example it’s irrelevant, but when I’m querying millions of records, many of which do not have a fruit_name field, it makes a huge difference. I want to just “bypass” the lookup operation in this case for these records. Maybe @Asya_Kamsky has an idea, she is a genius.

One idea that comes to mind is that you could use $facet.

One will start with

$match : { "fruit_name" : { "$exists" : 1 } }

and do the $lookup while the other will start with

$match : { "fruit_name" : { "$exists" : 0 } }

The issue I see is that each facet is limited to 16Gb which may or may not be an issue.

The other thing you could try is do 2 aggregations in a transaction. But I am not sure it is possible to do aggregations inside transactions.

Why 2 normal aggregations do not work?

Have you looked at the new $lookup variants with which you can have conditions and sub-pipeline?

1 Like

Because the example is part of a much larger aggregation doing other pipeline operations. So running it as two separate aggregations is much less optimal. And yes, $facet hitting that limit would be an issue (it’s 16mb, not Gb… Gb would be great lol).

But I appreciate the thought!

It would be awesome if there was a way to only conditionally $lookup. Oh well!

1 Like

$facet is definitely not the way to go. However, if the field you are doing $lookup on is an empty array then it won’t match anything in the foreign collection (because array is used sort of as {$in:[]} which matches nothing.

So the solution to your problem may just be to fill in something like: {$set:{fruit_name:{$ifNull:["$fruit_name", [] ]}}} before the $lookup and the empty [] as localField won’t match anything, unlike null or missing field.

Asya

3 Likes

And since $unionWith was added in 4.4 you can also do two aggregations within the same aggregation pipeline, btw.

Asya

3 Likes

Awesome, thank you, I’ll give that a go. Isn’t this a more common occurrence? I would think that doing a $lookup against a related collection defined by an optional field happens quite a bit, so it seems to be there should be a more straightforward way to do this? Or am I just bonkers?

Thanks again.

2 Likes

i am stuck with same case where i am trying to do a lookup with same collection as ben but i have the fruit_name fields already as an empty array [].
but it doesnt returns the another document

my aggregate query
db.child_fruits.aggregate([
{ $match: {
_id: ObjectId(‘624c577bc98e24f398b012ad’)
}
},
{ $lookup: {
from: ‘fruit’,
localField: ‘fruit_name’,
foreignField: ‘name’,
as: ‘fruitDetails’
}
},
{ $unwind: { path: ‘$fruitDetails’ } }
])

Thank you. This helped

Won’t this still run a lookup on the record with empty field even though it won’t match anything?

For anyone coming here, $unionWith is the way to go.
To use the original question’s context, in your aggregation, initially match the records where fruit_name is null, then union that with the records where fruit_name is defined, and perform the lookup operation within the unionWith pipeline. Something like this:

db.child_fruit.aggregate([
  { $match: { fruit_name: null }},
  { $unionWith: {
    coll: "child_fruit",
    pipeline: [
      { $match: { fruit_name: { $ne: null }}},
      { $lookup: {
        from: "fruit",
        ...
      }}
    ]
  },
  ...
]);

It is definitely an ‘alternative’, but has many drawbacks.

For example, { $match: { fruit_name: { $ne: null }}}, is going to be nonperformant even with an index, especially when “fruit_name” has many different values.

I would say, GENERALLY speaking, doing

db.child_fruit.aggregate([
  {$lookup: {
    from: ‘fruit’,
    …
  }}
]);

is going to be more performant than your unionWith example. Obviously depends on a bunch of factors (collection sizes, index values, etc.). The originally proposed solution by Asya is still the best (using some alternative operator to “fake” a lookup that will cancel out the possibility of a null lookup).

It would be awesome if the $lookup operator just supported some sort of skip field based on a conditional

$lookup: {
  ...,
  skip: <condition>
}
2 Likes