Hi @Dheeraj_G, welcome to the community!
Although in theory the data size (and, to some extent, the storage size) should be evenly distributed across the shards, in practice this is difficult to guarantee. Every deployment is different, and the actual distribution depends on (off the top of my head):
- Whether every document is roughly the same size
- Whether the shard key has enough cardinality to allow this balance
- Whether each chunk in the collection is operated on evenly (i.e. whether there are “hot chunks” that receive more reads/inserts/updates than others)
The data should come out approximately balanced if the collection is ideally partitioned and the workload is spread evenly as well; in practice, however, this is not always the case.
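To see how the data is actually laid out, you can run getShardDistribution() against the collection from mongosh. A quick sketch (the `test.orders` namespace and the output values are placeholders, not from your deployment):

```js
// Connect to the cluster via mongos, then inspect the per-shard
// breakdown of a sharded collection:
use test
db.orders.getShardDistribution()

// Illustrative output only. Note that equal chunk counts do not
// imply equal data size when documents vary in size or some chunks
// are hotter than others:
//   Shard shard01: { data: '120MiB', docs: 250000, chunks: 4, ... }
//   Shard shard02: { data: '310MiB', docs: 610000, chunks: 4, ... }
//   Totals:        { data: '430MiB', docs: 860000, chunks: 8, ... }
```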
It also gets more complicated due to how WiredTiger physically allocates the data files within each shard (which could be very different on each shard). Deleting documents and compacting the database may return space to the OS, but this is not guaranteed. WiredTiger’s compression features should also help conserve disk space to some extent, if disk space conservation is important to you.
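As a rough way to gauge this, the collection stats expose both the on-disk footprint and how many bytes WiredTiger is keeping around for reuse rather than returning to the OS. A sketch, assuming the same placeholder collection as above:

```js
// Run against each shard's mongod directly, since these numbers are
// local to that shard's own data files:
const stats = db.orders.stats();

// Logical (uncompressed) data size vs. compressed on-disk size:
print("dataSize:    " + stats.size);
print("storageSize: " + stats.storageSize);

// Space inside the data files that WiredTiger can reuse for new
// writes but has not yet released back to the OS:
print("reusable:    " +
      stats.wiredTiger["block-manager"]["file bytes available for reuse"]);
```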
However, if you are expecting your data to grow in size, I don’t think you need to run compact at all. The reasoning: any space compact returns to the OS will just have to be allocated again by WiredTiger as the data grows back, resulting in no net useful work.
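For completeness, if you do decide you want the space back right away (say, after a large one-off deletion), the compact command is run per collection against each shard’s mongod directly, not through mongos:

```js
// Attempt to release reusable space in one collection back to the OS.
// As noted above, this is best-effort and not guaranteed to reclaim
// any space:
db.runCommand({ compact: "orders" })
```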
Best regards,
Kevin