Full Text Search returning more results than expected

I have a full-text search index on this collection:

db.cve_2014.createIndex( { product: "text" } )

When I run the search below, I get more results than I should.

db.cve_2014.find( { $text: { $search: "nx-os" } } )

I have no idea what I am doing wrong.

So I changed the query to the one below and it works perfectly, but I have no idea why it works:

db.cve_2014.find( { $text: { $search: "/nx-os0/i" } } )

This is weird too. I used a search like this:

db.cve_2014.find( { $text: { $search: "junos" } } )

And this returned “juno” items. I can’t figure out why it is returning partial matches.

Welcome to the community @Arthur_Gressick!

To help understand your results, can you please provide some sample documents and your results that are expected (or not expected)?

The text search feature is designed to match strings using language-based heuristics. Based on your examples so far, I suspect you may be interested in matching string patterns using regular expressions rather than language-based text search.

You can get more insight into how a query is processed using explain(). For your questions about text search, I would start by looking at the parsedTextQuery section which shows how your original query was converted into search terms.

Text search creates a list of search terms from your original $search string and stems each term to an expected root form using language heuristics (in particular, the Snowball stemming library).

Non-word characters like whitespace and hyphens are tokenised as word delimiters, so nx-os will be an OR query on the stemmed terms nx OR os:

db.cve_2014.find( { $text: { $search: "nx-os" } } ).explain().queryPlanner.winningPlan.parsedTextQuery
{
	"terms" : [
		"nx",
		"os"
	],
	"negatedTerms" : [ ],
	"phrases" : [ ],
	"negatedPhrases" : [ ]
}
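
To make the delimiter behaviour concrete, here is a rough sketch in plain JavaScript. It is not MongoDB's tokenizer: the function name roughTokenize and its splitting rule are assumptions made for illustration, and they only approximate the idea that non-word characters separate terms.

```javascript
// Illustration only: NOT MongoDB's actual tokenizer.
// Splits a $search string on anything that is not a letter or digit,
// which is enough to show why "nx-os" turns into two separate terms.
function roughTokenize(search) {
  return search
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

const terms = roughTokenize("nx-os");
console.log(terms); // terms is ["nx", "os"]
```

Because the two terms are ORed, any document containing either nx or os on its own can match. If the goal is to match the two words together, $text also supports phrase searches with escaped double quotes, for example { $search: "\"nx-os\"" }. It is worth confirming with explain() that the string then appears under phrases.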

The stemming library has a general set of language rules rather than a complete language grammar or dictionary. Words ending with a single s in English (the default text search language) are typically plural, so the stemmed form of junos will be juno:

db.cve_2014.find( { $text: { $search: "junos" } } ).explain().queryPlanner.winningPlan.parsedTextQuery
{
	"terms" : [
		"juno"
	],
	"negatedTerms" : [ ],
	"phrases" : [ ],
	"negatedPhrases" : [ ]
}
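
The effect of plural stripping can be sketched in plain JavaScript as well. This is a deliberately simplified rule, not the Snowball algorithm itself: roughStem is a hypothetical helper that only drops a trailing s when an earlier part of the word contains a vowel.

```javascript
// Illustration only: a simplified plural-stripping rule,
// NOT the Snowball stemmer that MongoDB uses.
function roughStem(term) {
  // Leave words alone unless they end in a single "s" (and not "us").
  if (!term.endsWith("s") || term.endsWith("ss") || term.endsWith("us")) {
    return term;
  }
  // Only strip the "s" if there is a vowel somewhere before the
  // letter that immediately precedes it.
  const earlierPart = term.slice(0, -2);
  return /[aeiouy]/.test(earlierPart) ? term.slice(0, -1) : term;
}

console.log(roughStem("junos")); // juno
console.log(roughStem("juno"));  // juno
console.log(roughStem("os"));    // os
```

Since junos and juno reduce to the same stem under a rule like this, a text index cannot tell them apart, which is why the search returns both.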

There’s an online Snowball demo if you’d like a quick way to see the outcomes of stemming text using different language algorithms.

Regards,
Stennie

I think I am going to switch to regex until I can dig deeper.

db.cve_2014.count( { product: { $regex: ':junos:' } } )
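
For anyone following the same route, here is a minimal sketch of building such a pattern from a literal product name. The helper names escapeRegex and productPattern are hypothetical, and the sample strings are invented for illustration; they are not taken from the cve_2014 collection.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so a product name is always matched literally.
function escapeRegex(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive pattern that matches the name only when it
// sits between two colons, as in the ':junos:' query above.
function productPattern(name) {
  return new RegExp(":" + escapeRegex(name) + ":", "i");
}

// Made-up product strings, assumed to be colon-delimited.
const sampleProducts = [
  "cpe:/o:juniper:junos:12.1",
  "cpe:/a:example:juno:1.0",
  "cpe:/o:cisco:nx-os:6.1",
  "cpe:/o:example:os:2.0",
];

const junosOnly = sampleProducts.filter((p) => productPattern("junos").test(p));
const nxosOnly = sampleProducts.filter((p) => productPattern("nx-os").test(p));

console.log(junosOnly); // only the juniper:junos entry
console.log(nxosOnly);  // only the cisco:nx-os entry
```

In the shell, the same pattern object could presumably be passed directly as the value for product in a find() or count() filter. Unlike the text index, it does no stemming or tokenising, so junos does not match juno and nx-os does not match os.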