Compound index with null and non-null values

I understand that an index consisting of one field will not include documents whose field contains a null value. For example, if I was to create an index on FieldB, the index will not include the 3rd document.
FieldA FieldB
A 1
B 1
C
D 1

However, if then create another index with FieldB, FieldA. Will it exclude the 3rd document? Or because there is now a value for FieldA for the 3rd document?

I basically do not want to include the 3rd document in the index because FieldB has a null value. How can I achieve that?

I was thinking about creating a compound sparse index. However, how can I specify sparse index to ignore the 3rd document because FieldB does not exist? Note: I will then prevent inserting a null value to FieldB, so instead of null I will not create a value.

I guess my question is very simple.

Will a compound sparse index ignore documents if one of the index fields is empty or will it only ignore it if all fields are empty?

Hello @Bluetoba
a document will be ignored when one (or more) of the sparse index fields is (are) null / not exist.
Please keep in mind that you can not use sparse indexes for sorting. Since fields are potentially omitted the sort will end in a collection scan.

Here you can find further details:
Sparse Indexes MDB training
Sparse Indexes doc

Cheers Michael

1 Like

Thanks Michael,

What if I use partial index, instead of sparse index to include a field that will be used for sorting? The sorting field is positioned last.

Is it the same issue?

Hello @Bluetoba

you can build up the same behavior of a sparse index with an partial index:

db.test.createIndex(
   { field: 1 },
   { partialFilterExpression: { field: { $exists: true } } }
) 

Reading your first post I struggle to get your use case. I like to suggest you to read the attached links

General Index Documentation
Partial Index Documentation

and ideally follow the free MongoDB Training M201: MongoDB Performance which explains in detail the different indexes.

When you have further question after you went through the documentation and the class (you may only pick what you need, in the corona time this is self paced) please add you use case and question to this post, I keep it on my watch list. I am happy to help.

Cheers,
Michael

Okay the use case is for chat collection. The first chat document has a flag, Top_Chain_FLG = 1. All documents belonging to the same conversation, share the same foreign key, Top_Chain_ID, that point to the ID of first chat document.

The conversation is between merchants and buyers, as identified by Merchant_ID and Buyer_ID. So these IDs are part of the document.

The first chat document, also known as the Top_Chain documents are augmented with the attributes of the last chat message, in order for us to save many queries to extract information about the first and last chat message. It also includes information such as Last_Message_Date.

So, it essential fields look like the following:
ID, Top_Chain_ID, Top_Chain_FLG, Merchant_ID, Buyer_ID, Last_Message_Date, etc, etc (e.g. message itself, createdAt, createdDate)

What we need is to display a list of chat messages for a merchant or a buyer in the order of last messages received (known as Last_Message_Date descendingly). So, instead of doing a separate order by, I wanted to rely on the index to sort it.

So, what I initially had in mine is to create three partial indexes.

  1. Merchant_ID (partialFilterExpression where id exists)
  2. Buyer_ID (partialFilterExpression where id exists)
  3. Top_Chain_ID, Last_Message_Date descending (partialFilterExpression where Top_Chain_FLG=1). Note: the reason I included Last_Message_Date descending is because I don’t want to be sorting manually, and I want to use what’s already in the index table.

I also thought that Index Intersection will take place on both index#1 and #3 and when say I want to query on the following, but the explain plan didn’t indicate that.
Merchant_ID = xyx
Top_Chain_FLG=1

Question 1: Why doesn’t index intersection take place?

So, I created one index that combines content of index #1 and #3 together.
Merchant_ID, Top_Chain_ID, Last_Message_Date descending (partialFilterExpression where Merchant_ID exists, Top_Chain_FLG=1)

Question 2: The explain plain works as intended, but I still don’t know if the sorting really use the Last_Message_Date from the index? Does it?

Question 3: Should I use a hint? how can I do a hint based an index name?