Mongodb wildcard index not being used for $ne query

I just found out that wildcard index on mongodb 4.2, doing a COLLSCAN for $ne query. So I was wondering, whether I did something wrong, or it was currently not being supported. And here I was looking a solution to use indexing for my ever growing (unstructured) extra field while using $ne operation. Because my extra field will store many kind of key-value string data.

This is my query,

db.coll.explain(true).find({"extra.tag": {$ne: "dummy-tag"}})

And here’s is the explain() result,

"executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : 10133998.0, 
        "executionTimeMillis" : 5018.0, 
        "totalKeysExamined" : 0.0, 
        "totalDocsExamined" : 10184077.0, 
        "executionStages" : {
            "stage" : "COLLSCAN", 
            "filter" : {
                "extra.tag" : {
                    "$not" : {
                        "$eq" : "dummy-tag"
                    }
                }
            }, 
            "nReturned" : 10133998.0, 
            "executionTimeMillisEstimate" : 394.0, 
            "works" : 10184079.0, 
            "advanced" : 10133998.0, 
            "needTime" : 50080.0, 
            "needYield" : 0.0, 
            "saveState" : 10184.0, 
            "restoreState" : 10184.0, 
            "isEOF" : 1.0, 
            "direction" : "forward", 
            "docsExamined" : 10184077.0
        }, 
        "allPlansExecution" : [

        ]
}, 

This is how I create my wildcard index

db.coll.createIndex({"extra.$**": 1})

Indexes on my collection

"indexSizes" : {
        "_id_" : 102354944.0, 
        ...
        ...
        ...
        "extra.$**_1" : 110243840.0
 }, 

Sample of document, due to the nature of our data, I omit some of the values

{ 
    "_id" : ObjectId("5c582f5577612608f3e6a333"), 
    "email" : "", 
    "createdAt" : ISODate(), 
    "name" : "" , 
    "firstname" : "", 
    "lastname" : "", 
    "birthDate" : ISODate(),
    "gender" : "", 
    "phone" : "", 
    "city" : "", 
    "country" : "",
    "company" : "", 
    "labels" : [
        "dummy-label"
    ], 
    "index" : 0.0,  
    "state" : "ACTIVE", 
    "extra" : {
        "tag" : "dummy-tag", 
        "note" : "dummy note"
    }
}

Please let me know If I’m not clear enough with my question. Thank you

Hi @nanangarsyad,

The $ne is not considered a selective operator and generally cannot utilize an index properly regardless of wildcard indexing capabilities.

What is the reason you use an exclusive search for tags? What are you trying to achieve?

It looks like the amount of returned data is a very big portion of total documents in this collection so COLLSCAN makes more sense anyhow.

totalDocsExamined" : 10184077.0
nReturned" : 10133998.0

Best
Pavel

Hi @Pavel_Duchovny,

Thanks for replaying.
Well, as for my reason using exclusive search for extra.tag. I was trying to find all extra.tag that isn’t dummy-tag, and then count the total them.

In the example above with the result,

totalDocsExamined" : 10184077.0
nReturned" : 10133998.0

I was using data with high variety of value for field extra.tag. To simplify my problem, I also tried to decrease the variety of my data into a data with this kind of distribution.

{ 
    "_id" : "not-dummy-tag", 
    "count" : 1.0
}
{ 
    "_id" : "dummy-tag", 
    "count" : 10184076.0
}

I don’t know why, but it also gave me the same COLLSCAN query plan like this.

"executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : 1.0, 
        "executionTimeMillis" : 4964.0, 
        "totalKeysExamined" : 0.0, 
        "totalDocsExamined" : 10184077.0, 
        "executionStages" : {
            "stage" : "COLLSCAN", 
            "filter" : {
                "extra.tag" : {
                    "$not" : {
                        "$eq" : "dummy-tag"
                    }
                }
            }, 
            "nReturned" : 1.0, 
            "executionTimeMillisEstimate" : 460.0, 
            "works" : 10184079.0, 
            "advanced" : 1.0, 
            "needTime" : 10184077.0, 
            "needYield" : 0.0, 
            "saveState" : 10184.0, 
            "restoreState" : 10184.0, 
            "isEOF" : 1.0, 
            "direction" : "forward", 
            "docsExamined" : 10184077.0
        }, 
        "allPlansExecution" : [

        ]
},

Is it the expected behavior of $ne operation for wildcard index?,
if it so, is there any other way to solve the $ne query for my ever growing key-value field (my extra field).

As a side note. I also tried experimenting normal index (not wildcard-index) for a different field named extra2 but structured exactly like field extra, and the result was like what I expected, that is, the query planner using index to find the result.
Here’e the detail,

db.coll.createIndex({"extra2.field1": 1})
db.coll.explain(true).find({"extra2.field1": {$ne: "dummy-field1"}})

the explain() result,

"executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : 1.0, 
        "executionTimeMillis" : 0.0, 
        "totalKeysExamined" : 2.0, 
        "totalDocsExamined" : 1.0, 
        "executionStages" : {
            "stage" : "FETCH", 
            "nReturned" : 1.0, 
            "executionTimeMillisEstimate" : 0.0, 
            "works" : 3.0, 
            "advanced" : 1.0, 
            "needTime" : 1.0, 
            "needYield" : 0.0, 
            "saveState" : 0.0, 
            "restoreState" : 0.0, 
            "isEOF" : 1.0, 
            "docsExamined" : 1.0, 
            "alreadyHasObj" : 0.0, 
            "inputStage" : {
                "stage" : "IXSCAN", 
                "nReturned" : 1.0, 
                "executionTimeMillisEstimate" : 0.0, 
                "works" : 3.0, 
                "advanced" : 1.0, 
                "needTime" : 1.0, 
                "needYield" : 0.0, 
                "saveState" : 0.0, 
                "restoreState" : 0.0, 
                "isEOF" : 1.0, 
                "keyPattern" : {
                    "extra2.field1" : 1.0
                }, 
                "indexName" : "extra2.field1_1", 
                "isMultiKey" : false, 
                "multiKeyPaths" : {
                    "extra2.field1" : [

                    ]
                }, 
                "isUnique" : false, 
                "isSparse" : false, 
                "isPartial" : false, 
                "indexVersion" : 2.0, 
                "direction" : "forward", 
                "indexBounds" : {
                    "extra2.field1" : [
                        "[MinKey, \"dummy-field1\")", 
                        "(\"dummy-field1\", MaxKey]"
                    ]
                }, 
                "keysExamined" : 2.0, 
                "seeks" : 2.0, 
                "dupsTested" : 0.0, 
                "dupsDropped" : 0.0
            }
        }, 
        "allPlansExecution" : [

        ]
}, 

Best regards.

Hi @nanangarsyad,

Well the wild card index does not support document or array inequality and not direct values:

Is tag field is an array?

Best
Pavel

Hi, @Pavel_Duchovny

Unfortunately, extra.tag just field with single string value.
eg.

extra: {
    tag: "dummy-tag",
    ...
    ...
}

Thank you,
Nanang