Hi. I had a few questions about assessing the performance impact of introducing a new index on one of my collections. My current cluster runs on machines with "memSizeMB": 80561, and each machine hosts 3 mongod processes for the corresponding shards. The current value of storage.wiredTiger.engineConfig.cacheSizeGB is 20 GB, which leaves some buffer for the other mongod processes co-hosted on these machines. We have a single database with three collections; let's call them user_data, product_data and static_data.
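For context, the rough memory budget on each host works out as follows. This is only a back-of-the-envelope sketch: it mixes GB/GiB loosely and ignores per-mongod process overhead outside the WiredTiger cache, which is a simplifying assumption on my part:

```python
# Back-of-the-envelope memory budget per host.
mem_size_mb = 80561        # "memSizeMB" reported for the machine (~78.7 GiB)
mongods_per_host = 3       # one mongod per co-hosted shard
cache_size_gb = 20         # storage.wiredTiger.engineConfig.cacheSizeGB

total_ram_gib = mem_size_mb / 1024
wt_caches_gib = mongods_per_host * cache_size_gb

# RAM not claimed by the WiredTiger caches is what remains for the mongod
# processes themselves, the OS, and the filesystem cache.
leftover_gib = total_ram_gib - wt_caches_gib
print(f"RAM: {total_ram_gib:.1f} GiB, WT caches: {wt_caches_gib} GiB, "
      f"left for OS + filesystem cache: {leftover_gib:.1f} GiB")
```

So roughly 18–19 GiB per machine is left over for everything outside the three WiredTiger caches.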
What I am looking to do is add a new index based on a feature requirement for my application. Based on this SO thread, I have estimated it to be ~2.8 GB in size. To decide whether adding this index would degrade the performance of my existing cluster, I further inspected the index sizes reported by querying mongos using stats. These already account for a total of ~24.6 GB, detailed per collection below:
// user_data
{
    "uIdHash_hashed" : 2663313408, // 2.7 GB
    "bId" : 1703297024,            // 1.7 GB
    "_id_" : 9491111936            // 9.5 GB
}
// product_data
{
    "_id_" : 865091584,            // 0.9 GB
    "pIdHash_hashed" : 256389120,  // 0.3 GB
    "pIdHash_name" : 877314048,    // 0.9 GB
    "pid_products" : 8371068928,   // 8.4 GB
    "bId" : 96018432               // 96 MB
    // planning to add the new index here, ~2.8 GB
}
// static_data
{
    "_id_" : 36864,                // 37 KB
    "expiry" : 36864               // 37 KB
}
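Summing the raw indexSizes values (stats reports them in bytes) and adding the planned ~2.8 GB index gives the total that would compete for cache. A quick sketch using the byte values pasted above; the dict keys are suffixed with the collection name only to disambiguate the repeated `_id_`/`bId` entries, and the exact sum comes out slightly below the rounded ~24.6 GB total I quoted:

```python
# indexSizes values are reported in bytes by collection stats.
index_sizes_bytes = {
    # user_data
    "uIdHash_hashed":   2663313408,
    "bId@user":         1703297024,
    "_id_@user":        9491111936,
    # product_data
    "_id_@product":      865091584,
    "pIdHash_hashed":    256389120,
    "pIdHash_name":      877314048,
    "pid_products":     8371068928,
    "bId@product":        96018432,
    # static_data
    "_id_@static":           36864,
    "expiry":                36864,
}

GB = 1000 ** 3
existing_gb = sum(index_sizes_bytes.values()) / GB
planned_gb = 2.8  # estimated size of the new index

print(f"existing indexes: {existing_gb:.1f} GB")            # ~24.3 GB
print(f"with new index:   {existing_gb + planned_gb:.1f} GB")  # ~27.1 GB
```

Even with the new index, ~27 GB of indexes spread across three shards still fits comfortably under the 3 × 20 GB of configured WiredTiger cache, although the caches also have to hold hot documents, not just indexes.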
So my questions, diving slightly deeper, are:
- How would this addition impact the existing reads and other operations on my Mongo cluster?
- I have read that the storage engine falls back on the OS page (filesystem) cache; how could I tune its size to optimize further?
- Further, what really surprises me is the size of the indexes produced from the documents I have. I wanted to confirm whether these reported sizes can be broken down further, and whether they include overheads such as cached documents or cursor references as well?