Autocomplete search - Match multiple words in a search term (as AND)

Laurens · February 23, 2021, 11:25am

Hi all,

I’m trying to build a search that lets the user type multiple words to refine the search results. What I notice is that, by default, Atlas Search searches for each word in a search term separately.
I find this behaviour strange, because it means the more words you type, the more results you get. Almost every search I know works the other way: typing more words gives fewer, but more specific, matches.

Here are some examples to explain what I want and what I’ve tried:

Create a collection with the following documents:

db.getCollection('test-search').insertMany([
	{ "name": "Chris Evans" },
	{ "name": "Chris Hemsworth" },
	{ "name": "Chris Pine" },
	{ "name": "Robert Pine" },
	{ "name": "Chris Pratt" }
])

Create a search index ‘default’ on this collection:

{
	"mappings": {
		"dynamic": false,
		"fields": {
			"name": {
				"type": "autocomplete"
			}
		}
	}
}

Now, let’s run some queries with different search terms.
This is the base query I’ll use each time:

const searchTerm = ''; // Variable
db.getCollection('test-search').aggregate([
	{
		$search: {
			autocomplete: {
				query: searchTerm,
				path: 'name'
			}
		}
	}
]).toArray();

These are the results with different search-terms:

const searchTerm = 'Robert'; // => "Robert Pine" | As expected
const searchTerm = 'Pine'; // => "Chris Pine", "Robert Pine" | As expected
const searchTerm = 'Pine Chris'; // => "Chris Pine", "Robert Pine", "Chris Evans", "Chris Hemsworth", "Chris Pratt" | NOT what I want, I would expect only "Chris Pine"

From the Atlas Search documentation:

“If there are multiple terms in a string, Atlas Search also looks for a match for each term in the string separately.”

So I guess this means the words in the string are matched as an $or-condition.
So let’s modify the query to match each word in the search string as an $and-condition:

const searchTerm = ''; // Variable
db.getCollection('test-search').aggregate([
	{
		$search: {
			compound: {
				must: searchTerm.split(' ').map((word) => ({
					autocomplete: {
						query: word,
						path: 'name'
					}
				}))
			}
		}
	}
]).toArray();

These are the results with different search-terms:

const searchTerm = 'Robert'; // => "Robert Pine" | As expected
const searchTerm = 'Pine'; // => "Chris Pine", "Robert Pine" | As expected
const searchTerm = 'Pine Chris'; // => "Chris Pine" | As expected - GREAT!

However, this does not seem to work if a word in the search term is only one letter:
const searchTerm = 'Pine C'; // => No results

Why is this? I cannot find anything in the documentation saying that a $search does not work with a single-character query.

Another approach can be to put one-letter words in a should-clause:

db.getCollection('test-search').aggregate([
	{
		$search: {
			compound: {
				must: [{
					autocomplete: {
						query: 'Pine',
						path: 'name'
					}
				}],
				should: [{
					autocomplete: {
						query: 'R',
						path: 'name'
					}
				}]
			}
		}
	}
]).toArray();

=> "Chris Pine", "Robert Pine"
It also returns Chris, which is fine, but it should at least put “Robert” first, so this is not very useful.

I also tried a solution that puts the one-letter search terms into a separate regex query in the must-clause, but I am worried about performance, and the ordering of the results is sometimes very strange.

So to summarize:

What’s the best approach to build a search that matches on multiple words together? (“Chris”, “Chris Pine”, “Pine Chris”, “Chris P”, “Chris Pi”, etc.)
Is it expected that a $search with a one-letter query never returns any results?

Pavel_Duchovny · February 25, 2021, 6:26am

Hi @Laurens,

I think Atlas Search treats the terms as an OR expression by default, so to ensure that the terms are included you should combine them in a compound query with must (AND) or should clauses.

On the other hand, I don’t think single letters are tokenized with the default analyzer. I believe you should build a custom analyzer for that.

Specifically, I think you should look at the n-gram settings, which define the sizes of the indexed chunks.

Please note that the more granular your tokenization is, the bigger the index may grow, which will affect the performance and resource usage of your cluster.
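
To make the chunk sizes concrete, here is a minimal sketch of how edge n-grams of a single word could be produced. It is a simplified model, not Atlas Search's actual implementation: the function name edgeGrams is hypothetical, the lowercasing is an assumption about how the field is analyzed, and the defaults of 2 and 15 are what I believe the autocomplete type uses for minGrams and maxGrams.

```javascript
// Simplified model of edgeGram tokenization for one word.
// Hypothetical helper; NOT the real Atlas Search tokenizer.
// minGrams / maxGrams mirror the index options of the same names
// (defaults of 2 and 15 are assumed here).
function edgeGrams(word, minGrams = 2, maxGrams = 15) {
	const lower = word.toLowerCase(); // Assumption: matching is case-insensitive.
	const longest = Math.min(maxGrams, lower.length);
	const grams = [];
	// Collect every prefix whose length lies between minGrams and maxGrams.
	for (let length = minGrams; length <= longest; length++) {
		grams.push(lower.slice(0, length));
	}
	return grams;
}

console.log(edgeGrams('Chris', 2)); // [ 'ch', 'chr', 'chri', 'chris' ]
console.log(edgeGrams('Chris', 1)); // [ 'c', 'ch', 'chr', 'chri', 'chris' ]
```

In this model, a one-letter query such as "C" has no chunk to match when minGrams is 2, and lowering minGrams to 1 adds one extra chunk per word, which is where the index growth comes from.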

Thanks
Pavel

Laurens · February 25, 2021, 10:07am

Hi @Pavel_Duchovny, thank you for your quick reply.
I managed to get it to work by setting the tokenization to edgeGram with minGrams set to 1. Thank you very much for this solution! It also helped me to understand the logic behind the search better.

For reference, this is the final solution that works:

The index:

{
	"mappings": {
		"dynamic": false,
		"fields": {
			"name": {
				"type": "autocomplete",
				"tokenization": "edgeGram",
				"minGrams": 1
			}
		}
	}
}

The query:

const searchTerm = ''; // Variable
db.getCollection('test-search').aggregate([
	{
		$search: {
			compound: {
				must: searchTerm.split(' ').map((word) => ({
					autocomplete: {
						query: word,
						path: 'name'
					}
				}))
			}
		}
	}
]).toArray();

Because minGrams is now 1, this also works with one-letter words. I’ll keep an eye out for the impact on performance.
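
One thing to watch with split(' ') is that leading, trailing, or double spaces produce empty words, which would end up as autocomplete clauses with an empty query. A possible way to guard against that is sketched below. The helper name buildNameSearch and its null return for blank input are assumptions made for illustration, not part of the original solution.

```javascript
// Hypothetical helper: builds the $search stage with one autocomplete
// clause per word, all of which must match (AND semantics).
function buildNameSearch(searchTerm, path = 'name') {
	// Split on any run of whitespace and drop empty words.
	const words = searchTerm.trim().split(/\s+/).filter((word) => word.length > 0);
	if (words.length === 0) {
		return null; // Nothing to search for; the caller decides what to do.
	}
	return {
		$search: {
			compound: {
				must: words.map((word) => ({
					autocomplete: { query: word, path }
				}))
			}
		}
	};
}

// Usage in mongosh (assumes the collection and index from this thread):
// const stage = buildNameSearch('Pine  C');
// if (stage) db.getCollection('test-search').aggregate([stage]).toArray();
```

Returning null for blank input keeps the decision (show nothing, show everything) with the calling code instead of sending an empty query to the server.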
