MongoDB.live, free & fully virtual. Register Now MongoDB.live, free & fully virtual. Register Now

Index on the Time Series collection structure

Hi

we are in process to store historical time series data (millions of rows) for Reuters ric. we are following " Scenario 3: Size-based bucketing" as per https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices. We are storing 1000 quotes per document .

The structure of upsert query will be as following.

bulk.find({ric:"LCOJ1-V1",utc_date:ISODate("2020-08-13T00:00:00.000000000Z"),nquotes:{$lt:1000}}).upsert().updateOne(
{
	$push:{quotes:{utc_date_time:NumberInt("1597337417"),nano_seconds:NumberInt(784325554),ask_price:-0.98,ask_size:NumberInt(4),bid_price:-1.0,bid_size:NumberInt(4),qualifiers:["[BID_TONE]","[ASK_TONE]"]}},
	$setOnInsert:{exch_utc_offset:NumberInt(1)},
	$min:{quote_first:NumberLong("1597337417784")},
	$max:{quote_last:NumberLong("1597337417784")},
	$inc :{nquotes:NumberInt(1)}
});
bulk.find({ric:"LCOJ1-V1",utc_date:ISODate("2020-08-13T00:00:00.000000000Z"),nquotes:{$lt:1000}}).upsert().updateOne(
{
	$push:{quotes:{utc_date_time:NumberInt("1597337606"),nano_seconds:NumberInt(436207836),ask_price:-0.97,ask_size:NumberInt(4),bid_price:-1.0,bid_size:NumberInt(4)}},
	$setOnInsert:{exch_utc_offset:NumberInt(1)},
	$min:{quote_first:NumberLong("1597337606436")},
	$max:{quote_last:NumberLong("1597337606436")},
	$inc :{nquotes:NumberInt(1)}
});
bulk.find({ric:"LCOJ1-V1",utc_date:ISODate("2020-08-13T00:00:00.000000000Z"),nquotes:{$lt:1000}}).upsert().updateOne(
{
	$push:{quotes:{utc_date_time:NumberInt("1597337635"),nano_seconds:NumberInt(967713742),ask_price:-0.98,ask_size:NumberInt(4),bid_price:-1.0,bid_size:NumberInt(4)}},
	$setOnInsert:{exch_utc_offset:NumberInt(1)},
	$min:{quote_first:NumberLong("1597337635967")},
	$max:{quote_last:NumberLong("1597337635967")},
	$inc :{nquotes:NumberInt(1)}
});

Based on recommendation of article we have created composite index on ({ric:1,utc_date:1,nquotes:1}) which is perfectly fine for “upsert” operation.

But during the query time we are expected to query data using also “ric”,“utc_date”,“first_quote” and “last_quote”. So far we have created composite index on “ric,utc_date,nquotes”.

What would be the suggestion to create index if we are querying data using “ric”,“utc_date”,“first_quote” and “last_quote” ? Since we are processing millions of rows , we need to consider performance and required space for any additional index.

here is example read query

db.getCollection("time_series").find(
    { 
        "$and" : [
            { 
                "ric" : "CLV0-X0"
            }, 
            { 
                "utc_date" : ISODate("2020-08-03T00:00:00.000+0000")
            }, 
            { 
                "quote_first" : { 
                    "$gte" : NumberLong(1596493635301)
                }
            }, 
            { 
                "quote_first" : { 
                    "$lte" : NumberLong(1596499142016)
                }
            }, 
            { 
                "quote_last" : { 
                    "$lte" : NumberLong(1596499197995)
                }
            }
        ]
    }, 
    { 
        "ric" : 1.0, 
        "utc_date" : 1.0, 
        "quote_first" : 1.0, 
        "quote_last" : 1.0,
         "quotes" : 1.0
    }
).sort(
    { 
        "utc_date" : 1.0,"quote_first" : 1.0
    }
);

Thanks for looking into this.

Hi @Dhruvesh_Patel,

Index field order can be initially determine using Equility Sort Range order. Recommend reading more here:

The equility fields in your query is ric and utc_date while range is qoute_first and qoute_last.

So optimal index could be:

{ ric: 1, utc_date : 1, qoute_first : 1, qoute_last : 1 }

Thanks
Pavel