Compaction requirements

Hi team,

First post here, so be kind :)

I have two queries.

  1. I have a set of collections in a DB which haven’t been compacted in a while. I wanted to understand the memory and disk requirements of the compact command.

To clarify, I want to understand how much memory (and disk, if any) the mongod process consumes when I launch the compact command. This is so I can plan in advance whether I need to stop other applications and ensure there’s enough disk space available.

  2. I know that compact is a blocking operation. If I start a compact command on a secondary node while clients are connected to it, will the compact command wait for the in-progress reads to complete and then block all subsequent operations? Or will all client connections be terminated and their requests abruptly dropped?

Thanks for your inputs,
Murali

Hi Murali,

Unfortunately, we can’t really tell beforehand how much time and space the compact command will consume, since this is highly dependent on the state of the data files. It is, however, an expensive operation that is preferably not done while the server is live in production as either a primary or a secondary.

If you want to do this on a secondary, it’s recommended that you do rolling maintenance instead: take one secondary offline, run compaction on it, and rejoin it to the replica set. However, the time the compact command takes cannot be longer than the oplog window, or the secondary will fall off the oplog and not be able to rejoin the set later. To determine the oplog window, you can run rs.printReplicationInfo() in the mongo shell.
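As a rough sketch of the oplog window check described above, something like the following could be pasted into the mongo shell. The helper name fitsInOplogWindow, the estimatedOfflineHours argument, and the safetyFactor margin are all hypothetical and only for illustration. It assumes the object returned by db.getReplicationInfo(), whose timeDiffHours field is the span between the first and last oplog entries. The estimate of how long the maintenance takes is something you have to supply yourself, since it can't be predicted reliably.

```javascript
// Hypothetical helper: checks whether a planned maintenance window fits
// inside the oplog window, with a safety margin.
// replInfo is assumed to have the shape returned by db.getReplicationInfo(),
// where timeDiffHours is the time span covered by the oplog, in hours.
function fitsInOplogWindow(replInfo, estimatedOfflineHours, safetyFactor = 2) {
  if (!(estimatedOfflineHours >= 0)) {
    throw new Error("estimatedOfflineHours must be a non-negative number");
  }
  const windowHours = Number(replInfo.timeDiffHours);
  // Require the padded estimate to fit within the oplog window.
  return estimatedOfflineHours * safetyFactor <= windowHours;
}

// Example use in the mongo shell, on a replica set member:
//   fitsInOplogWindow(db.getReplicationInfo(), 3)
// The compaction itself would then be run per collection on the
// offline secondary, for example:
//   db.runCommand({ compact: "myCollection" })   // collection name is a placeholder
```

The safety factor of 2 is an arbitrary choice. The oplog window also shrinks when the write load goes up, so a value measured during a quiet period may be optimistic.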

Having said all that, most of the time it’s not necessary to run the compact command, unless you have deleted a large part of your database and are not planning to insert that much data again in the future (i.e. you’re downsizing your data). WiredTiger will eventually reuse that empty space. In other words, returning space to the OS that WiredTiger will reallocate again in the near future results in zero net gain for you.
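To get a feel for whether a collection actually has a lot of reusable space, here is a minimal sketch that reads the collection stats. The helper name reusableFraction is hypothetical. It assumes WiredTiger, and that stats() was called without a scale argument so that storageSize is in bytes. The "file bytes available for reuse" value comes from the block-manager section of the WiredTiger stats.

```javascript
// Hypothetical helper: returns the fraction of a collection's data file
// that WiredTiger reports as available for reuse.
// collStats is assumed to be the output of db.<collection>.stats()
// called without a scale argument (sizes in bytes).
function reusableFraction(collStats) {
  const blockManager = (collStats.wiredTiger || {})["block-manager"] || {};
  const reusableBytes = Number(blockManager["file bytes available for reuse"] || 0);
  const storageSize = Number(collStats.storageSize || 0);
  // Avoid dividing by zero for empty or non-WiredTiger stats.
  return storageSize > 0 ? reusableBytes / storageSize : 0;
}

// Example use in the mongo shell (collection name is a placeholder):
//   reusableFraction(db.myCollection.stats())
```

A high fraction only suggests that compact could return space to the OS. If you expect the collection to grow back, WiredTiger will reuse that space anyway, as explained above.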

Best regards,
Kevin

Hi Kevin,

Thanks for the response. I think the point you make right at the end is of utmost importance and needs to go into the public documentation. As you rightly mention, compaction is a maintenance procedure, and as such requires scheduled downtime in production environments.

If customers are running out of disk space, what they need to do is just remove unused or large documents and collections, and not necessarily run a compaction (unless they’re sure they are in the scenario you mention).

Once again, thanks for some valuable inputs!

Cheers,
Murali