I have an interesting problem for the community today. Here is the gist of my query:
We have a collection of 100 million records. Each document has 20 fields and is about 4.5 KB. Of those 20 fields, 10 are individually indexed, and there is also one compound index covering all 10 of them.
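One thing worth knowing about that 10-field compound index: MongoDB can only use a compound index efficiently when the query constrains a leading prefix of its fields. A minimal sketch of that prefix rule, with hypothetical field names:

```python
# Sketch of MongoDB's compound-index prefix rule: a query can use a
# compound index only up to the first indexed field it does NOT
# constrain. Field names here are hypothetical placeholders.

def usable_prefix(index_fields, query_fields):
    """Return the leading index fields that a query actually constrains."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix

index = ["field1", "field2", "field3", "field4"]
# A query filtering only field1 and field3 can use just the first field:
print(usable_prefix(index, {"field1", "field3"}))  # ['field1']
```

So if most queries filter on only a few of the 10 fields, the big compound index is mostly dead weight that still has to be maintained on every write.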
Now, every time I fetch records, say with a limit of 2k or 3k to show on my frontend or to download reports, MongoDB's CPU utilisation spikes.
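At 4.5 KB per document, 2k-3k documents is roughly 10 MB or more per request, so one thing I am looking at is projecting only the fields each report needs. A minimal sketch, with hypothetical field names, of the filter/projection/limit shape that pymongo's `find()` accepts:

```python
# Sketch: fetch only the fields a report needs, with a hard page limit.
# Field names ("created_at", "field1", "field2") are hypothetical.

def report_query(start, end, page_size=2000):
    filt = {"created_at": {"$gte": start, "$lt": end}}
    # 1 = include this field; _id is excluded to trim a few more bytes
    proj = {"_id": 0, "field1": 1, "field2": 1, "created_at": 1}
    return filt, proj, page_size

filt, proj, limit = report_query("2024-01-01", "2024-04-01")
# with pymongo this would run as:
#   coll.find(filt, proj).limit(limit)
```

Returning three small fields instead of twenty cuts both network transfer and the per-document work the server does.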
Currently, the IT team has informed me that the replicas are off due to the high utilisation. We don't have sharding implemented yet, and the Mongo instance is running on a single server.
I have tried a couple of things but haven't arrived at a permanent solution. This is what I have tried:
For some fetch queries, say a user selects data for 3 months, I put a dump of those 3 months into a separate collection for read/view purposes. I did the same for 6 months of data and for yearly data.
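The routing between those pre-built collections can be sketched as a simple lookup; the collection names here are hypothetical placeholders for the ones I created:

```python
# Sketch: route a requested date range to the smallest pre-built read
# collection that covers it. Collection names are hypothetical.

def pick_collection(days_requested):
    if days_requested <= 90:
        return "reports_3m"
    if days_requested <= 180:
        return "reports_6m"
    if days_requested <= 365:
        return "reports_1y"
    return "records"  # fall back to the main 100M-record collection

print(pick_collection(120))  # reports_6m
```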
Checked the explain() execution stats on all the possible queries. A few queries take 60 to 90 seconds even with indexes in place.
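One number I pull from those explain documents is the ratio of `totalDocsExamined` to `nReturned` in the `executionStats` section: a ratio near 1 means the index is selective, while a large ratio means the server is scanning (and burning CPU on) far more documents than it returns. A minimal sketch over a trimmed-down stats document:

```python
# Sketch: compute the docs-examined-to-returned ratio from an
# explain("executionStats") document. High values indicate the index
# is not selective for that query.

def scan_ratio(explain_doc):
    stats = explain_doc["executionStats"]
    returned = stats["nReturned"]
    examined = stats["totalDocsExamined"]
    return examined / max(returned, 1)

# Trimmed-down example of an executionStats section:
sample = {"executionStats": {"nReturned": 2000, "totalDocsExamined": 950000}}
print(scan_ratio(sample))  # 475.0 — 475 docs scanned per doc returned
```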
Removed all regex expressions on a date field (these were case-insensitive matches), changed the field to a proper date type, and updated all 100 million records accordingly.
Removed the big compound index and tried several smaller compound indexes matching the filters the queries actually depend on. For example, field 1 is mandatory but fields 2 and 3 aren't, so I created compound indexes on fields 1 & 2 and on fields 1 & 3. Sadly, that didn't work for me!
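For reference, those two indexes in pymongo's `create_index` format look like this (field names are the same placeholders as above; the common guideline is to put equality-matched fields first, then sort fields, then range fields):

```python
# Sketch of the two compound indexes described above, in pymongo's
# create_index key format. Field names are hypothetical placeholders.

ASCENDING = 1  # same constant as pymongo.ASCENDING

idx_1_2 = [("field1", ASCENDING), ("field2", ASCENDING)]
idx_1_3 = [("field1", ASCENDING), ("field3", ASCENDING)]

# with pymongo:
#   coll.create_index(idx_1_2)
#   coll.create_index(idx_1_3)
```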
However, the separate-collection approach won't scale: if a user customises the date range, the query fetches more data than the pre-built collections cover.
I have optimised the queries as much as I could, but with so many possible filters on the collection, creating an index for every combination won't work for my use case and would consume more RAM as well. That said, I'm not worried about that for now, since I have enough memory available.
This also kills my Docker node container. Here is what the CPU utilization looks like:
Currently, the AWS instance is quite large and has headroom for more data, and I'm sure we can increase the SSD capacity whenever required. This is a dedicated server for the MongoDB instance.
Please advise what changes can be made to the MongoDB server configuration to help lower the CPU utilisation.