Query performance with null vs exists

which of the following operation is more efficient

{field:null}
{field:{$exists:false}}

these 2 may give different set of results if my document has explicit null values
but I know that my document will not have field with null value

If you

then

is completely useless.

1 Like

thanks Steeve, I agree on that
if there are documents that have null values , is {field:null} slower than {field:{$exists:false}} ?

Slower or not, their respective performance is irrelevant since they do not produce the same result sets. If you have an index on the field, then it is a different story. Finding documents with field:null result in an index lookup and only relevant documents are fetch. But field:{$exists:false}, most likely results in a (slower) collection scan.

However, I am usually in favour of null fields rather than non-existing fields. Updates might be more efficient when a field is null and it becomes a field with a value as it might be possible to write over the same record on disk as the resulting block might be the same size. It is more likely that a missing field needs more room when you assign a value to it. I underlined some of the words as I have to hard numbers to back my claim.

Hi @Ravi_Tatikonda1

Firstly, it is really much more important to focus on how an index is being used rather than on which index is being used when assessing any index suitability.

Secondly, there are some limitations in the index format that are relevant to this question. Specifically, it is typically not possible to get queries covered when they are searching for missing or null values and these will then require a FETCH stage.

That said, using a null value and a comparison (say {$eq: null}) allows the index entry to be checked whereas using {$exists: false} does not all for this. This means that null entries in the index can be found, whereas the document itself must be queried to determine if the field exists with explicit null or whether it even exists at all in the document.

So as @steevej pointed it and which my points essentially re-iterate, you should in the vast majority of cases tend to use the null field rather than non-existent fields.

I’d also say these kinds of deeper query performance questions are probably better moved to the “Working with Data” category as you otherwise could miss the input from other members in the community who can further share their experiences and suggestions.

Kindest regards and hope this helps,
Eoin