Hi. I had a few questions about assessing the performance impact of introducing a new index on one of my collections. My current cluster runs on machines with "memSizeMB": 80561, and each machine hosts 3 mongod processes for the corresponding shards. The current value of storage.wiredTiger.engineConfig.cacheSizeGB is 20 GB, which leaves some buffer for the other mongod processes co-hosted on these machines. We have a single database with three collections; let's call them user_data, product_data and static_data.
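For context, the rough memory budget on each host works out as follows. This is only a back-of-the-envelope sketch: it mixes GB/GiB loosely and ignores per-mongod process overhead outside the WiredTiger cache, which is a simplifying assumption on my part:

```python
# Back-of-the-envelope memory budget per host.
mem_size_mb = 80561        # "memSizeMB" reported for the machine (~78.7 GiB)
mongods_per_host = 3       # one mongod per co-hosted shard
cache_size_gb = 20         # storage.wiredTiger.engineConfig.cacheSizeGB

total_ram_gib = mem_size_mb / 1024
wt_caches_gib = mongods_per_host * cache_size_gb

# RAM not claimed by the WiredTiger caches is what remains for the mongod
# processes themselves, the OS, and the filesystem cache.
leftover_gib = total_ram_gib - wt_caches_gib
print(f"RAM: {total_ram_gib:.1f} GiB, WT caches: {wt_caches_gib} GiB, "
      f"left for OS + filesystem cache: {leftover_gib:.1f} GiB")
```

So roughly 18–19 GiB per machine is left over for everything outside the three WiredTiger caches.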
What I am looking to do is add a new index based on a feature requirement for my application. Based on this SO thread, I have estimated it to be ~2.8 GB in size. To decide whether adding this index would degrade the performance of my existing cluster, I further inspected the index sizes reported by querying mongos using stats. These already account for a total of ~24.6 GB, detailed per collection below:
// user_data
{
    "uIdHash_hashed" : 2663313408, // 2.7 GB
    "bId" : 1703297024,            // 1.7 GB
    "_id_" : 9491111936            // 9.5 GB
}
// product_data
{
    "_id_" : 865091584,            // 0.9 GB
    "pIdHash_hashed" : 256389120,  // 0.3 GB
    "pIdHash_name" : 877314048,    // 0.9 GB
    "pid_products" : 8371068928,   // 8.4 GB
    "bId" : 96018432               // 96 MB
    // planning to add the new index here, ~2.8 GB
}
// static_data
{
    "_id_" : 36864,                // 37 KB
    "expiry" : 36864               // 37 KB
}
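Summing the raw indexSizes values (stats reports them in bytes) and adding the planned ~2.8 GB index gives the total that would compete for cache. A quick sketch using the byte values pasted above; the dict keys are suffixed with the collection name only to disambiguate the repeated `_id_`/`bId` entries, and the exact sum comes out slightly below the rounded ~24.6 GB total I quoted:

```python
# indexSizes values are reported in bytes by collection stats.
index_sizes_bytes = {
    # user_data
    "uIdHash_hashed":   2663313408,
    "bId@user":         1703297024,
    "_id_@user":        9491111936,
    # product_data
    "_id_@product":      865091584,
    "pIdHash_hashed":    256389120,
    "pIdHash_name":      877314048,
    "pid_products":     8371068928,
    "bId@product":        96018432,
    # static_data
    "_id_@static":           36864,
    "expiry":                36864,
}

GB = 1000 ** 3
existing_gb = sum(index_sizes_bytes.values()) / GB
planned_gb = 2.8  # estimated size of the new index

print(f"existing indexes: {existing_gb:.1f} GB")            # ~24.3 GB
print(f"with new index:   {existing_gb + planned_gb:.1f} GB")  # ~27.1 GB
```

Even with the new index, ~27 GB of indexes spread across three shards still fits comfortably under the 3 × 20 GB of configured WiredTiger cache, although the caches also have to hold hot documents, not just indexes.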
So my questions, diving slightly deeper, are:
- How would this addition impact the existing reads and other operations on my Mongo cluster?
- I have read that the storage engine falls back on the OS page (filesystem) cache; how could I tune its size to optimize further?
- Further, what really surprises me is the size of the indexes produced from the documents I have. I wanted to confirm whether these reported sizes can be broken down further, and whether they include overheads such as cached documents or cursor references as well?