Performance benefit of $and

I have recently modified a complex multi-field query to use $and instead of simply assigning the fields as direct properties of the query. The idea is to sequence the conditions so the query fails fast, rather than evaluating every remaining condition even when an earlier one has already failed.

For example, if we have a collection of people with a schema looking as follows:

  • Name
  • Address
    • Street
    • City
    • CountryCode
  • Favourite Colour

Then a set of conditions such as the following could be written either as comma-separated fields or with an explicit $and (see the sketch after this list):

  • people.address.countryCode: ‘AU’
  • people.address.city: ‘Melbourne’
  • people.name: ‘john’
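
As a rough sketch (assuming a people collection and the field names from the schema above), the two forms I am comparing look like this in the shell:

// Implicit AND: conditions given as comma-separated properties.
db.people.find({
  'address.countryCode': 'AU',
  'address.city': 'Melbourne',
  name: 'john'
})

// Explicit $and: the same conditions listed in a fixed order.
db.people.find({
  $and: [
    { 'address.countryCode': 'AU' },
    { 'address.city': 'Melbourne' },
    { name: 'john' }
  ]
})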

My understanding, based on documentation, is that without the $and all the conditions will be evaluated, even if the previous conditions have already evaluated to false.

I am now being challenged to show that there is a performance improvement in doing this, so I have started writing a test case. Can anyone confirm whether my understanding is correct, and whether there is any existing documentation or test case showing the performance gain?

Hi @Andre-John_Mas,

I think the difference between $and and just comma-separated values is purely syntactic. I don’t believe there is any difference in execution planning.

When a document is inspected for filtering, the entire document is read, so the benefit of failing fast is really negligible…

Anyway, to improve the described query you should build a compound index over all the equality fields; that is what will definitely improve performance.

In general, you should index fields based on the following rule: Equality, Sort, Range (ESR).
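
As a rough sketch, assuming the people collection and field names from your example, a compound index covering the equality fields could look like this:

// Compound index over the equality fields of the example query,
// following the Equality, Sort, Range (ESR) guideline.
db.people.createIndex({
  name: 1,
  'address.city': 1,
  'address.countryCode': 1
})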

Best
Pavel

I ended up writing the test scenario and did find that indexed fields coupled with an $and as part of the query sped things up. Sometimes we were talking about a 20 ms improvement, going from 22 ms to 2 ms. For a site with a lot of data and traffic this can add up. Also, since it is such a simple optimisation I’ll take it.

The test scenario I wrote:

Feel free to comment on it. I have made it public so it can be improved on.

Hi @Andre-John_Mas,

I am not familiar with how Mongoose creates the indexes. Is it a compound index?

Can you provide the output of db.User.getIndexes()? I would also like to see the explain plan from both attempts…

I suspect they are not using the same indexes, and that is why you see a performance difference.

Best regards,
Pavel

For Mongoose index docs:

I added the following line to my code:

console.log(await connection.models.User.collection.getIndexes());

and got the following output

{
  _id_: [ [ '_id', 1 ] ],
  'address.countryCode_1': [ [ 'address.countryCode', 1 ] ],
  'address.town_1': [ [ 'address.town', 1 ] ]
}

The test is straightforward (a simplified sketch follows the list below):

  • populate the database with a large amount of data
  • run both queries multiple times and compare the results
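
A simplified sketch of that comparison loop, assuming a Mongoose connection with the User model shown above (this is illustrative, not the actual posted test; values are made up):

// Illustrative timing loop, not the actual posted test.
async function timeQuery(label, filter) {
  const start = process.hrtime.bigint();
  await connection.models.User.find(filter).exec();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
}

async function runComparison() {
  const implicit = {
    'address.countryCode': 'AU',
    'address.town': 'Melbourne',
    name: 'john'
  };
  const explicit = {
    $and: [
      { 'address.countryCode': 'AU' },
      { 'address.town': 'Melbourne' },
      { name: 'john' }
    ]
  };

  // Run each form several times so one-off effects average out.
  for (let i = 0; i < 5; i++) {
    await timeQuery('implicit AND', implicit);
    await timeQuery('explicit $and', explicit);
  }
}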

The official documentation states:

$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.

You indicate that:

I suspect that they are not using the same indexes therefore you see performance difference.

Can you provide any official documentation, beyond the one I shared, that suggests there should be no performance difference as you stated?

Hi @Andre-John_Mas,

I didn’t mean that the short circuit does not exist, but when it occurs within the same object its effect is negligible, because evaluating two additional comparisons in a document we already have to read is very fast. To speed this up we need a compound index on all the fields, so the document is not read at all unless every condition is met.

You have separate indexes on countryCode and on town, instead of creating:

db.User.createIndex({'name': 1,
        'address.town': 1,
        'address.countryCode': 1})

This is what should really tune that query, without relying on the order of the $and expression.
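
One way to confirm the compound index is actually being used (a sketch against the User collection from your getIndexes() output, with illustrative values) is to look at the winning plan in explain:

// With the compound index in place, the winning plan should show an
// IXSCAN stage on the new index rather than a COLLSCAN.
db.User.find({
  name: 'john',
  'address.town': 'Melbourne',
  'address.countryCode': 'AU'
}).explain('executionStats')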

Read more here

Best
Pavel

Hi @Andre-John_Mas,

I’d like to answer this question directly:

It’s mentioned in the $and page:

MongoDB provides an implicit AND operation when specifying a comma separated list of expressions.

To see that this is the case:

> db.test.explain().find({a:1, b:1})
....
		"parsedQuery" : {
			"$and" : [
				{
					"a" : {
						"$eq" : 1
					}
				},
				{
					"b" : {
						"$eq" : 1
					}
				}
			]
		},
....

and using $and:

> db.test.explain().find({$and: [ {a:1}, {b:1} ]})
...
		"parsedQuery" : {
			"$and" : [
				{
					"a" : {
						"$eq" : 1
					}
				},
				{
					"b" : {
						"$eq" : 1
					}
				}
			]
		},
....

Both queries show identical parsedQuery output in explain, so they are the same query; in fact, the comma-separated query is internally translated to use $and. I’m not sure why your test shows a difference between the two cases, but they should not differ.

One thing to check: in the test you posted, you currently run the comma-separated case first and the $and case second. Try reversing their order to rule out cache warm-up interfering with the timing.
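
As a hypothetical tweak (not the posted test itself), the two cases could be alternated and the first pass discarded when reporting, so warm-up cannot systematically favour whichever form runs second:

// Hypothetical: alternate which form runs first on each pass and skip
// the first pass, so cache warm-up does not bias the comparison.
const filters = {
  'implicit AND': { 'address.countryCode': 'AU', 'address.town': 'Melbourne', name: 'john' },
  'explicit $and': { $and: [{ 'address.countryCode': 'AU' }, { 'address.town': 'Melbourne' }, { name: 'john' }] }
};

async function comparePasses(passes = 6) {
  for (let pass = 0; pass < passes; pass++) {
    const order = pass % 2 === 0
      ? ['implicit AND', 'explicit $and']
      : ['explicit $and', 'implicit AND'];
    for (const label of order) {
      const start = process.hrtime.bigint();
      await connection.models.User.find(filters[label]).exec();
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      if (pass > 0) console.log(`pass ${pass} ${label}: ${ms.toFixed(1)} ms`);
    }
  }
}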

Best regards,
Kevin


One reason that might explain this is cache behaviour. Let’s assume you start with a cold server, with the working set out of RAM. If you run the implicit AND first, you spend extra time bringing everything into RAM. Running the explicit $and afterwards, on the now warm server with the working set in RAM, will definitely give better numbers.

Not enough information was supplied to know if this was the case, but I think it is a possibility.
