Oplog compression ratio

We’re running MongoDB 4.2 with zstd collection block compression. The oplog is set to oplogSizeMB: 3000000 MB (before compression), but the actual collection size is ~350 GB (after compression). That’s roughly a 10x compression ratio. According to https://source.wiredtiger.com/3.1.0/compression.html, the default compression ratio for zstd is 3. How can we understand the size difference between the data size and the total size of the oplog collection?

PRIMARY> db.getReplicationInfo()
{
"logSizeMB" : 3000000,
"usedMB" : 2996131.66,
"timeDiff" : 123687,
"timeDiffHours" : 34.36,
"tFirst" : "Sun Feb 21 2021 01:55:32 GMT-0700 (MST)",
"tLast" : "Mon Feb 22 2021 12:16:59 GMT-0700 (MST)",
"now" : "Mon Feb 22 2021 12:16:59 GMT-0700 (MST)"
}

PRIMARY> db.oplog.rs.totalSize()
379049607168

PRIMARY> db.oplog.rs.dataSize()
NumberLong("3118230005927")

Welcome to the MongoDB community @Bowen_Liu!

The Zstandard value you are referencing is a “compression level”, not a target compression ratio. The compression level determines the amount of effort that goes into the compression algorithm’s analysis: a lower level will produce results faster but may not result in as much compression as a higher level. Higher compression levels will have slower compression speed and use more resources (memory & CPU) in exchange for potentially better compression outcomes. There are diminishing returns in higher compression levels, especially if you want to minimise the latency for writing data to disk.

Quoting from the Zstandard manual:

The library supports regular compression levels from 1 up to ZSTD_maxCLevel(),
which is currently 22. Levels >= 20, labeled --ultra, should be used with
caution, as they require more memory. The library also offers negative
compression levels, which extend the range of speed vs. ratio preferences.
The lower the level, the faster the speed (at the cost of compression).
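The speed-vs-ratio trade-off described above is easy to observe directly. Python's standard library has no Zstandard bindings, so this sketch uses zlib's DEFLATE levels (1–9) as an analogue of zstd's levels (1–22); the payload is a made-up, oplog-like repetitive document, not real MongoDB data:

```python
import zlib

# Illustrative only: zlib's DEFLATE levels show the same principle as
# Zstandard's levels -- higher level = more effort, potentially smaller output.
# The payload below is a hypothetical, highly repetitive oplog-style document.
payload = b'{"op":"i","ns":"app.events","o":{"k":1}}' * 10000

for level in (1, 6, 9):
    compressed = zlib.compress(payload, level)
    print(f"level {level}: {len(payload)} -> {len(compressed)} bytes")
```

On repetitive data like this, higher levels compress at least as well as lower ones, but the gains shrink as the level rises, which is the diminishing-returns effect mentioned above.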

Note: compression level is currently not adjustable for MongoDB collections (although you can choose the algorithm to use like Zstandard vs Snappy). There’s a feature request you can upvote & watch for updates: SERVER-45690: Ability to customize collection compression level.
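For reference, choosing the algorithm (as opposed to the level) is done via the storage configuration. A minimal mongod.conf sketch, matching the documented `storage.wiredTiger.collectionConfig.blockCompressor` option:

```yaml
# mongod.conf (sketch): the algorithm is configurable per deployment,
# but the compression level is not (see SERVER-45690).
storage:
  wiredTiger:
    collectionConfig:
      blockCompressor: zstd   # alternatives: snappy (default), zlib, none
```

Note this setting applies to collections created after the change; existing collections keep the compressor they were created with.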

The compression ratio will vary based on the source data, which in this case will be block compression of oplog documents. The best estimate of expected compression ratio will be derived from observation of your deployment metrics over time.

Your current oplog workload is achieving about a 10:1 compression ratio. If the nature of your workload changes significantly in future (for example, if an application started storing binary data which is less compressible) the ratio may change.
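You can estimate the effective ratio yourself from the two numbers in your post: the uncompressed data size divided by the on-disk size. A quick back-of-the-envelope check (using your pasted values; `db.oplog.rs.totalSize()` also includes storage overhead, so this is an approximation):

```python
# Numbers copied from the shell output in the question, in bytes.
data_size = 3_118_230_005_927    # db.oplog.rs.dataSize()
total_size = 379_049_607_168     # db.oplog.rs.totalSize()

ratio = data_size / total_size
print(f"effective compression ratio ~= {ratio:.1f}:1")
```

This comes out a little above 8:1, i.e. in the same ballpark as the roughly 10:1 figure above, and well beyond what a naive "level 3" reading of the WiredTiger page would suggest.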

In MongoDB 4.2 and earlier server versions, the maximum oplog size is enforced by comparing the configured oplogSizeMB against the storage (compressed) size of the oplog. MongoDB 4.4+ adds the option to set a minimum time-based oplog retention period for admins who want to ensure the oplog covers an expected duration (in hours).
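If you later upgrade, the time-based retention mentioned above maps to the documented `storage.oplogMinRetentionHours` option. A hypothetical mongod.conf sketch (the 48-hour figure is an example, not a recommendation):

```yaml
# mongod.conf (sketch, MongoDB 4.4+): keep at least 48 hours of oplog
# history, even if that exceeds the configured size.
storage:
  oplogMinRetentionHours: 48
replication:
  oplogSizeMB: 3000000
```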

Regards,
Stennie
