Collation index best practice

We are using Mongo 3.6.12 (wired tiger) in a production (and dev) instance on mLab. The open source server that we use, Parse Server, recently updated to 4.x and the breaking change that was made was the addition of a new collation index value on a collection key value that already had an index. Is this good practice?

As an example, the _User table has an email key and the current index was:

    {
      "v": 2,
      "unique": true,
      "key": {
        "email": 1
      },
      "name": "email_1",
      "ns": "production._User",
      "background": true,
      "sparse": true
    }

When the new version of the server started up, it added the following index:

    {
      "v": 2,
      "key": {
        "email": 1
      },
      "name": "case_insensitive_email",
      "ns": "production._User",
      "background": true,
      "sparse": true
      "collation": { 
        "locale": "en_US",
        "strength": 2
      }
    }

We found that this resulted in extremely poor performance when $regex queries were made on email as the collation index was often hit (and per docs, that is highly inefficient).

We also found that it was extremely difficult to actually create a new index for a key that already had an index… but by making it a collation index we could do it.

It seems like this is NOT recommended. Could anyone shed further light on this situation?

Thanks

Hi @Rob,

Looks like this was a decision made by Parse Server (pull #5634) to add the case insensitive index to allow faster case insensitive signup check. i.e. username Rob vs username rob.

You could probably remove the previous index email_1 instead of having two indexes. If you have more questions about the reasoning behind the new case_insensitive_email index itself, I would suggest to reach to the Parse community. See parse-community: SUPPORT

You didn’t provide much about the $regex query, but please note that case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilise case-insensitive indexes. If you were using $regex query for case-insensitive search, it’s likely more performant to use normal query search with the case-insensitive index. See $regex query index use for more information.

This is actually documented on db.collection.createIndex(): Collation Option:

Unlike other index options, you can create multiple indexes on the same key(s) with different collations. To create indexes with the same key pattern but different collations, you must supply unique index names.

Regards,
Wan.

1 Like

Evening Wan,

Yep, we tracked that down and it definitely makes sense. Thanks for taking the time to look into it yourself, we didn’t expect that. :+1:

We hoped that could be a work around, but when Parse starts up it actually recreates that index.

Because the above idea of deleting the index didn’t pan out and $regex doesn’t work well with collation indexes (an understatement), we wanted to check with the Mongo community before going back to the Parse community and suggesting alternatives.

This brought us to the use of multiple indexes on the same key and wondering if that was good practice or not? The team at mLab support indicated that was not a good idea and they were curious how we ended up with 2 indexes on the same key.

Thanks for that call out, we missed it in our review. That is good to know and I will pass that on to the team at mLab as well.

So… that does create a follow on question. How does mongo choose the index it uses if it isn’t specified in the query?

If we had both indexes and used a $regex query via Compass (avoiding Parse altogether), it seemed like it would always hit the collation index. Is there a priority or order to the indexes?

Thanks again for the insight.

Hi @Rob,

The query optimiser processes queries and chooses the most efficient query plan for a query, given available indexes. The query system then uses this query plan each time the query runs. See Query Plans for more information.

So, it depends on the query itself. It’s worth to checkout the Explain Results for the query to learn more.

Regards,
Wan.

1 Like

Hi @wan,

If the query optimizer is supposed to be choosing the most efficient query plan for our $regex queries, it seems to be broken for the case where both a case-sensitive index and a case-insensitive index exists. As an example, when we run a particular $regex query and only the case-sensitive index exists, the query takes about 2 seconds. After we add a case-insensitive index, the $regex query tries to use that index and takes 8 seconds.

How can we ensure that the $regex query will use the case-sensitive index, and NOT attempt to use the case-insensitive index? Using neither index would be more efficient than attempting to use the case-insensitive index.

Thanks for your help

Hi @Robby_Helms,

Could you provide the output of the cursor.explain(“allPlansExecution”) when both indexes exist ? This hopefully should give more insights to the issue.

You can try to use cursor.hint() to override MongoDB’s default index selection.

Regards,
Wan.

1 Like

The data I’m testing with is a _User collection with about 4.4 million users. One of the users has an email of robby@pocketprep.com. I have two indexes on email: a case-insensitive one called case_insensitive_email and a case-sensitive one called email_1.

Explain Output Without Hint

  • indexName: case_insensitive_email
  • executionTimeMillis: 7718
db.getCollection('_User').find({ email: { $regex: /^Robby@pocketprep\.com$/i } }).explain("allPlansExecution")
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "pocketprep._User",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"email" : {
				"$regex" : "^Robby@pocketprep\\.com$",
				"$options" : "i"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"email" : {
					"$regex" : "^Robby@pocketprep\\.com$",
					"$options" : "i"
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"email" : 1
				},
				"indexName" : "case_insensitive_email",
				"collation" : {
					"locale" : "en_US",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 2,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"email" : [ ]
				},
				"isUnique" : false,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"email" : [
						"[\"\", {})",
						"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"inputStage" : {
					"stage" : "IXSCAN",
					"filter" : {
						"email" : {
							"$regex" : "^Robby@pocketprep\\.com$",
							"$options" : "i"
						}
					},
					"keyPattern" : {
						"email" : 1
					},
					"indexName" : "email_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"email" : [ ]
					},
					"isUnique" : true,
					"isSparse" : true,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"email" : [
							"[\"\", {})",
							"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
						]
					}
				}
			}
		]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 7718,
		"totalKeysExamined" : 1902678,
		"totalDocsExamined" : 1902678,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"email" : {
					"$regex" : "^Robby@pocketprep\\.com$",
					"$options" : "i"
				}
			},
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 6030,
			"works" : 1902679,
			"advanced" : 1,
			"needTime" : 1902677,
			"needYield" : 0,
			"saveState" : 25181,
			"restoreState" : 25181,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1902678,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1902678,
				"executionTimeMillisEstimate" : 1170,
				"works" : 1902679,
				"advanced" : 1902678,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 25181,
				"restoreState" : 25181,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"email" : 1
				},
				"indexName" : "case_insensitive_email",
				"collation" : {
					"locale" : "en_US",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 2,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"email" : [ ]
				},
				"isUnique" : false,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"email" : [
						"[\"\", {})",
						"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
					]
				},
				"keysExamined" : 1902678,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		},
		"allPlansExecution" : [
			{
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 1340,
				"totalKeysExamined" : 1320493,
				"totalDocsExamined" : 0,
				"executionStages" : {
					"stage" : "FETCH",
					"nReturned" : 0,
					"executionTimeMillisEstimate" : 1340,
					"works" : 1320493,
					"advanced" : 0,
					"needTime" : 1320493,
					"needYield" : 0,
					"saveState" : 20632,
					"restoreState" : 20632,
					"isEOF" : 0,
					"invalidates" : 0,
					"docsExamined" : 0,
					"alreadyHasObj" : 0,
					"inputStage" : {
						"stage" : "IXSCAN",
						"filter" : {
							"email" : {
								"$regex" : "^Robby@pocketprep\\.com$",
								"$options" : "i"
							}
						},
						"nReturned" : 0,
						"executionTimeMillisEstimate" : 1290,
						"works" : 1320493,
						"advanced" : 0,
						"needTime" : 1320493,
						"needYield" : 0,
						"saveState" : 20632,
						"restoreState" : 20632,
						"isEOF" : 0,
						"invalidates" : 0,
						"keyPattern" : {
							"email" : 1
						},
						"indexName" : "email_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"email" : [ ]
						},
						"isUnique" : true,
						"isSparse" : true,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"email" : [
								"[\"\", {})",
								"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
							]
						},
						"keysExamined" : 1320493,
						"seeks" : 1,
						"dupsTested" : 0,
						"dupsDropped" : 0,
						"seenInvalidated" : 0
					}
				}
			},
			{
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 4130,
				"totalKeysExamined" : 1320493,
				"totalDocsExamined" : 1320493,
				"executionStages" : {
					"stage" : "FETCH",
					"filter" : {
						"email" : {
							"$regex" : "^Robby@pocketprep\\.com$",
							"$options" : "i"
						}
					},
					"nReturned" : 0,
					"executionTimeMillisEstimate" : 4130,
					"works" : 1320493,
					"advanced" : 0,
					"needTime" : 1320493,
					"needYield" : 0,
					"saveState" : 20632,
					"restoreState" : 20632,
					"isEOF" : 0,
					"invalidates" : 0,
					"docsExamined" : 1320493,
					"alreadyHasObj" : 0,
					"inputStage" : {
						"stage" : "IXSCAN",
						"nReturned" : 1320493,
						"executionTimeMillisEstimate" : 790,
						"works" : 1320493,
						"advanced" : 1320493,
						"needTime" : 0,
						"needYield" : 0,
						"saveState" : 20632,
						"restoreState" : 20632,
						"isEOF" : 0,
						"invalidates" : 0,
						"keyPattern" : {
							"email" : 1
						},
						"indexName" : "case_insensitive_email",
						"collation" : {
							"locale" : "en_US",
							"caseLevel" : false,
							"caseFirst" : "off",
							"strength" : 2,
							"numericOrdering" : false,
							"alternate" : "non-ignorable",
							"maxVariable" : "punct",
							"normalization" : false,
							"backwards" : false,
							"version" : "57.1"
						},
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"email" : [ ]
						},
						"isUnique" : false,
						"isSparse" : true,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"email" : [
								"[\"\", {})",
								"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
							]
						},
						"keysExamined" : 1320493,
						"seeks" : 1,
						"dupsTested" : 0,
						"dupsDropped" : 0,
						"seenInvalidated" : 0
					}
				}
			}
		]
	},
	"serverInfo" : {
		"host" : "h083332.mongolab.com",
		"port" : 31949,
		"version" : "3.6.12",
		"gitVersion" : "c2b9acad0248ca06b14ef1640734b5d0595b55f1"
	},
	"ok" : 1,
	"operationTime" : Timestamp(1585918338, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1585918338, 1),
		"signature" : {
			"hash" : BinData(0,"Wrc5pfaJ6T92PVv8AR+NwoZQTWo="),
			"keyId" : NumberLong("6768738634519543813")
		}
	}
}

Explain Output With Hint

  • indexName: email_1
  • executionTimeMillis: 1405
db.getCollection('_User').find({ email: { $regex: /^Robby@pocketprep\.com$/i } }).hint("email_1").explain("allPlansExecution")
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "pocketprep._User",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"email" : {
				"$regex" : "^Robby@pocketprep\\.com$",
				"$options" : "i"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"filter" : {
					"email" : {
						"$regex" : "^Robby@pocketprep\\.com$",
						"$options" : "i"
					}
				},
				"keyPattern" : {
					"email" : 1
				},
				"indexName" : "email_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"email" : [ ]
				},
				"isUnique" : true,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"email" : [
						"[\"\", {})",
						"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1,
		"executionTimeMillis" : 1405,
		"totalKeysExamined" : 1902678,
		"totalDocsExamined" : 1,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1,
			"executionTimeMillisEstimate" : 1293,
			"works" : 1902679,
			"advanced" : 1,
			"needTime" : 1902677,
			"needYield" : 0,
			"saveState" : 14864,
			"restoreState" : 14864,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"filter" : {
					"email" : {
						"$regex" : "^Robby@pocketprep\\.com$",
						"$options" : "i"
					}
				},
				"nReturned" : 1,
				"executionTimeMillisEstimate" : 1233,
				"works" : 1902679,
				"advanced" : 1,
				"needTime" : 1902677,
				"needYield" : 0,
				"saveState" : 14864,
				"restoreState" : 14864,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"email" : 1
				},
				"indexName" : "email_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"email" : [ ]
				},
				"isUnique" : true,
				"isSparse" : true,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"email" : [
						"[\"\", {})",
						"[/^Robby@pocketprep\\.com$/i, /^Robby@pocketprep\\.com$/i]"
					]
				},
				"keysExamined" : 1902678,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		},
		"allPlansExecution" : [ ]
	},
	"serverInfo" : {
		"host" : "h083332.mongolab.com",
		"port" : 31949,
		"version" : "3.6.12",
		"gitVersion" : "c2b9acad0248ca06b14ef1640734b5d0595b55f1"
	},
	"ok" : 1,
	"operationTime" : Timestamp(1585918753, 305),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1585918753, 305),
		"signature" : {
			"hash" : BinData(0,"y1dKXHvMsn/J6ApjvkRg8UHXe1E="),
			"keyId" : NumberLong("6768738634519543813")
		}
	}
}

So, yes, we can improve the request time by providing the hint, but I’m very confused as to why the query optimizer would be selecting the index that results in a request that is 5.5x slower in this case.

Appreciate your help and input! Thanks.

Hi @Robby_Helms, Thanks for providing the explain output.

Based on the allPlansExecution, both execution plans have really high number of works (1.3M+). This usually implies that there is no index that can efficiently support the query shape using the given predicate values. This is inline with the documentation $regex: Index Use.

You could try to flush the plan cache to see whether it’s picking up a different plan i.e. db._User.getPlanCache().clear(). It is possible for a query plan to be chosen that is faster for initial results (or the same query shape with different values), but suboptimal for later results. There are few improvements added in the current version 4.2 to debug this further. (i.e. $planCacheStats operator).

Having said the above, the regex pattern is anchored on both sides with insensitive option set to true. In this case it is better to remove the first index (email_1) and do a query with a collation to utilise the case insensitive index instead of $regex i.e.

db._User.find({"email":"Robby@pocketprep.com"}).collation({"locale":"en_US", "strength":2})

Regards,
Wan.

1 Like

Hi @wan,

Thanks for the response.

Based on the allPlansExecution , both execution plans have really high number of works (1.3M+). This usually implies that there is no index that can efficiently support the query shape using the given predicate values. This is inline with the documentation $regex: Index Use.

While it may be true none of the indexes are “efficient” we have certainly observed that one of them is consistently much more efficient than the other. And it seems like the information shown under the allPlansExecution property demonstrates that the execution plan should have known which plan to choose. The plan that used the email_1 index had an executionTimeMillisEstimate of 1340 while the plan that used the case_insensitive_email index had an executionTimeMillisEstimate of 4130. Those estimates line up pretty closely with what we have observed, and so it’s confusing that the query seemingly is saying “this plan looks like it will take longer…and that’s the one we’ll use”.

We do have workarounds in place by either refactoring our queries or by utilizing the hint() method, but it still feels like this is a bug in the way that query plans are resolved for regex queries.

Thanks again!