How to remove all Collections older than 2 days?

I have a very big MongoDB 140 GB and I need to reduce it drastically before doing a resync. For that I would like to remove all collections older than 2 days ? Could you please help me to find the right command to do that ?

MAny thanks in advance
REgards
Richard

Add a document with a single datetimestamp field called birthOfCollection or whatever and search each collection for it once a day and drop the collection if the age is over the limit.

1 Like

Hi Jack
Thanks! I’m kind of beginner in Mongo… Could you please tell me what would be the Mongo query to do that ?

What language are you using?

NOt sure I got it…What do you mean which language ? I have been using Mongo 3.4.6 on Red Hat 6.1…Or maybe just tell me how to find out the language I’m using on my Mongo VM ?

Thanks Jack

@MARCELIN_Richard, I mean, you’re writing programs using MongoDB, right? What language are you writing them in? Python? Node.js? PHP? etc.

Or are these commands you are executing entirely in the mongo or mongosh shell?

Hi Jack

Thanks for your feedback
I would like to execute this command directly in Mongo or using a Mongo shell script

Again, I don’t know if there’s a magic command that deletes collections by date. Doubt it.

  1. Insert a document with a creationDate field into each collection on each new collection’s first day.
  2. Write a script to search each collection and compare today’s date to that unique doc in each collection.
  3. If the collection is 3 days old, drop the collection

Pretty easy to do in Python, or you could write a mongo script and load it into a mongo shell.

Welcome to the community forum @MARCELIN_Richard!

The MongoDB server does not track collection creation dates.

As @Jack_Woehr suggested, one approach would be to add a document explicitly tracking the creation date. I think a more straightforward alternative would be to create your collections using a date-based naming convention, eg: tempdata_20201105 for a tempdata collection created on the 5th of Nov.

Another option for automatically removing old data would be to create a Time-To-Live (TTL) Index on a date field. Depending on your index definition, a TTL index will automatically remove documents from a collection after a certain amount of time or at a specific clock time. This would be less efficient than using a naming convention to identify collections to drop, as the TTL index still needs to remove individual documents.

However, it sounds like you already have collection data and would like to try to determine when each was created. If you are using default ObjectIDs for _id (primary key), you can infer the collection creation data based on the leading timestamp in the ObjectID.

For example, in the mongo shell:

db.getCollectionNames().forEach(function(coll) {
    // Find document with smallest key
    let doc = db[coll].find({}).sort({_id:1}).limit(1).next();
    if (doc) {
        if (typeof doc._id.getTimestamp === 'function') {
            // Print collection name, ObjectID, timestamp 
            print(
                coll,
                doc._id,
                doc._id.getTimestamp()
            )
        } else {
            print(
                coll,
                doc._id,
                '(not an ObjectID)'
            )
        }
    }
})

Instead of printing timestamps, you could compare with the current time and drop the collection if it is older than your desired expiry. I would recommend writing this sort of script using a full MongoDB driver (eg Python or Node) rather than the mongo shell. Language drivers have more robust error handling and debugging features, which will be especially useful for destructive actions like dropping collections.

Definitely take a backup before running any custom scripts that might delete or modify data, and test in a representative staging or development environment before running in production. There’s no proper “undo” for deleting data aside from restoring from a recent backup.

Regards,
Stennie

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.