What happens when indexes don't fit in RAM

Hi. From the current documentation around Ensure Indexes Fit in RAM, and from another thread, Working set MUST fit in memory?, I could gather that

for best performance you will want your commonly used indexes and working set to fit in memory

My further question to the database developers is: what happens when the indexes don’t fit in memory? I want to be able to weigh the trade-off of adding resources (and the migration process that entails) against dealing with slight (or major) performance impacts on my data.

For each query that targets key values in the part of the index that is not currently in RAM, that part of the index will have to be read from disk.

This means the system will slow down because disk I/O is much slower than RAM access. You can experience the effect of too little RAM by adjusting the WiredTiger cache size. See https://docs.mongodb.com/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB
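As a minimal mongosh sketch of that experiment (run it only against a test deployment; wiredTigerEngineRuntimeConfig is a documented parameter, but the 1GB value below is purely illustrative):

```js
// Shrink the WiredTiger cache at runtime to observe the effect of "too little RAM".
db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "cache_size=1G"   // illustrative value
});

// Confirm the configured maximum afterwards (value is in bytes).
print(db.serverStatus().wiredTiger.cache["maximum bytes configured"]);
```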

That’s the beauty of running on Atlas: migrating up and down is easy. You do not have to get it right the first time.


Hey @steevej, thank you for responding.

I am aware of the cacheSizeGB option, and also that disk ops slow down queries in practice.
What I am further curious to understand is the underlying implementation of random key reads (some with the index in RAM, some not) and the mechanism/role played by the on-disk format, the filesystem cache, collection data in the internal cache, and indexes in the internal cache. More specifically, how their sizes vary during such reads, and whether there is deterministic behavior for analyzing the impact when any of these factors changes.

If a read takes 5 minutes from RAM, the same read will take about 7 months from a spinning disk. So saying “disks are slower” sometimes doesn’t communicate how much slower they actually are.

If your index doesn’t fit in memory, the slowdown you will experience when querying on that index will be dramatic, as pages are read from disk, evicted, and then reread from disk.

The goal for every performant database is to have all of its regularly queried indexes fit completely in RAM. To further increase performance, the most regularly queried records should fit in RAM as well.

For a gaming application this means data associated with logged-in users; for a banking application, accounts that have had transactions in the last thirty days; and so on.
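As a rough first check on the “fit in RAM” goal above, here is a mongosh sketch (the database name mydb is hypothetical; indexSize and the cache counters are reported in bytes):

```js
// Compare total index size for one database against the configured WiredTiger cache.
const stats = db.getSiblingDB("mydb").stats();
const cache = db.serverStatus().wiredTiger.cache;

print(`total index size : ${(stats.indexSize / 1024 / 1024).toFixed(1)} MB`);
print(`WT cache maximum : ${(cache["maximum bytes configured"] / 1024 / 1024).toFixed(1)} MB`);
```

If the regularly queried indexes (plus the hot documents) sit comfortably below the cache maximum, most reads can be served from RAM.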


@Joe_Drumgoole thank you for pitching in, that gives me a good sense of the consequences. With this question, though, I wanted to focus more on the mechanism and understand it in detail. Something like the “two page faults” mentioned in the comment by Alexey here.

Sorry I don’t have that level of knowledge. Let me see if I can get someone else to respond.

In MongoDB, collections and indexes both use the same underlying data structure: WiredTiger tables. These are B-trees (B+-ish trees), essentially key-value stores arranged into pages (or blocks, depending who you ask). Each page is 32KB in size, or one entry, whichever is larger (up to 16MB for a single 16MB BSON document), and it is compressed when flushed to disk. In memory, a collection page contains both the on-disk version of the page and a list of changes to any documents not yet reconciled to disk. An index page contains the index keys and, for each key (which may be the full key or prefix compressed), a pointer to the identity of the document it points to.
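If you want to see those page-size settings for yourself, here is a small mongosh sketch (the collection name orders is hypothetical; the field names come from the wiredTiger.btree section of collStats and may vary slightly between versions):

```js
// Peek at the WiredTiger B-tree page-size configuration for one collection.
const bt = db.orders.stats().wiredTiger.btree;

print(`maximum leaf page size     : ${bt["maximum leaf page size"]}`);
print(`maximum internal page size : ${bt["maximum internal page size"]}`);
```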

When you access an index, MongoDB checks to see if the page is in the cache; if not, it reads it from disk (going via the OS page cache, so it may be cached there). Index pages are not compressed on disk as they already have prefix compression.

The upshot is that each index lookup may hit the cache (nice and fast) or not, in which case it will require a disk read of at least 32KB plus a seek (this has nothing to do with page faults; it is all explicitly tracked). If you have readahead set appropriately the read may be more than 32KB. If the page happens to be in your OS page cache it will be quicker, but it still needs some processing to put it into the WiredTiger cache. The seek takes at least one I/O operation, so a 1000 IOPS disk doing random seeks in an index which is much larger than RAM will be very much throttled by I/O.
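To put a rough number on that, a back-of-the-envelope sketch (all figures illustrative, assuming one 32KB read per uncached lookup):

```js
// Back-of-the-envelope throughput for a disk rated at 1000 random IOPS.
const iops = 1000;                  // illustrative disk rating
const pageBytes = 32 * 1024;        // one index page per cache miss
const bytesPerSecond = iops * pageBytes;

print(`uncached lookups per second : ~${iops}`);
print(`random-read bandwidth       : ~${(bytesPerSecond / 1024 / 1024).toFixed(0)} MB/s`);
```

In other words, a workload that misses the cache on every page is capped at roughly a thousand index lookups per second on such a disk, no matter how fast the CPU and RAM are.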

You can look at the read-into-cache metric using any of the MongoDB tooling (or look in db.serverStatus() and diff the entries). You can also observe the cache for collections and indexes using mongocacheview.js (GitHub - johnlpage/MongoCacheView: See what is in your MongoDB WT Cache and how it's being used).
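A sketch of the “diff the entries” approach in mongosh (field names are from db.serverStatus().wiredTiger.cache and from collStats with indexDetails; the collection name orders and index name status_1 are hypothetical):

```js
// Sample the global "pages read into cache" counter twice and diff it.
function pagesReadIntoCache() {
  return db.serverStatus().wiredTiger.cache["pages read into cache"];
}

const before = pagesReadIntoCache();
sleep(10 * 1000);                                   // wait 10 seconds
const after = pagesReadIntoCache();
print(`pages read into cache in the last 10s: ${after - before}`);

// Per-index view: how much of one index is currently cached.
const idx = db.orders.stats({ indexDetails: true }).indexDetails["status_1"].cache;
print(`bytes currently in the cache : ${idx["bytes currently in the cache"]}`);
print(`pages read into cache        : ${idx["pages read into cache"]}`);
```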

In general, if your working set does not fit in RAM, expect operations that hit those keys (queries, inserts, deletes, updates of indexed values) to be one to two orders of magnitude slower. Consider (a) adding RAM, (b) making indexes smaller, (c) adding many more IOPS, in that order.
