JSON format too bloated

We store JSON documents that are not exactly small in MongoDB, and we will be adding tens of millions of documents every month.

We found that storing 30 million documents eats up 2 TB of storage.

Since the format is JSON, there is a lot of repetition (every key repeats in every document), so we started shortening the keys and were able to reduce the size to 300 GB.

That’s good, but I feel it’s still way too much. Without the JSON boilerplate we could easily reduce it by another factor of 10.

On the other hand, I feel that messing around with JSON keys and transforming something human-readable like ‘baseCurrencyAmount’:‘EUR’ into something barely readable like ‘baCA’:‘EUR’ is generally the wrong direction. But you can also see that the actual value is still smaller than the key.
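
To make it concrete, the kind of mapping we apply looks roughly like this (Python here just for illustration; the field names are placeholders, not our real schema):

```python
# Illustration only: a hypothetical key map applied before inserts and
# reversed after reads; the field names are placeholders, not our schema.
KEY_MAP = {
    "baseCurrencyAmount": "bca",
    "quoteCurrencyAmount": "qca",
    "transactionTimestamp": "ts",
}
REVERSE_MAP = {short: long for long, short in KEY_MAP.items()}

def shorten_keys(doc: dict) -> dict:
    """Replace long field names with their short aliases before inserting."""
    return {KEY_MAP.get(key, key): value for key, value in doc.items()}

def restore_keys(doc: dict) -> dict:
    """Map short aliases back to readable field names after reading."""
    return {REVERSE_MAP.get(key, key): value for key, value in doc.items()}
```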

Is there any hint you can give? Is this a case where a document database is not the right choice and we need to go back to good old SQL?

And a side question: it seems that MongoDB's compression compresses each document separately. If it compressed documents jointly somehow, my guess is that the key lengths would not play a role any more.

Hi @karto_sack

Welcome to the MongoDB community!

Shortening document keys is a known way to reduce document size.

However, there are several additional options:

  1. If you have small documents that are accessed together and share similar values, you may consider using the bucket pattern to store the changing values in an array. This works as long as each array keeps a reasonable number of elements (see the sketch after this list).
    https://www.mongodb.com/blog/post/building-with-patterns-the-bucket-pattern
  2. You can change the default compression of the storage layer to get better compression at the cost of some additional CPU overhead (see the example after this list):
    https://docs.mongodb.com/manual/core/wiredtiger/#compression
  3. You can always consider sharding, spreading the data across different replica sets, which allows you to scale the collection (an example follows below).
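
If it helps, here is a minimal sketch of the bucket pattern with pymongo. The database, collection, and field names (`mydb`, `rates`, `currency`, `day`, `quotes`, `n`) are assumptions for illustration, and the 1000-element cap is arbitrary:

```python
from pymongo import MongoClient

client = MongoClient()          # assumes a locally running mongod
rates = client.mydb.rates       # hypothetical database and collection names

# Bucket pattern: instead of one tiny document per quote, push quotes into a
# bucket document keyed by (currency, day). Once the current bucket holds
# 1000 elements the filter no longer matches and the upsert opens a new one.
rates.update_one(
    {"currency": "EUR", "day": "2021-03-01", "n": {"$lt": 1000}},
    {
        "$push": {"quotes": {"ts": "2021-03-01T12:00:00Z", "amount": 42.5}},
        "$inc": {"n": 1},
    },
    upsert=True,
)
```

For the storage-layer compression, the block compressor can be set per collection (or globally in the mongod configuration). `zstd` is available from MongoDB 4.2 onward; `zlib` also compresses tighter than the default `snappy` at higher CPU cost. A sketch, again with assumed names:

```python
# Create a collection that uses zstd block compression instead of the
# default snappy (zstd requires MongoDB 4.2+; zlib is another option).
client.mydb.create_collection(
    "rates_zstd",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)
```

And for sharding, assuming a sharded cluster is already set up, enabling it for the collection would look roughly like this (the shard key below is only an example and must fit your query pattern):

```python
# Requires a sharded cluster; pick a shard key that matches your queries.
client.admin.command("enableSharding", "mydb")
client.admin.command(
    "shardCollection", "mydb.rates", key={"currency": 1, "day": 1}
)
```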

Best
Pavel
