Recovering partial data folder (and backup policies)

Hello,
is it possible to start a MongoDB instance when some collection files are missing from the WiredTiger data folder? Only some collection files are missing; ALL other files from the data path are present.
I tried this on a 3.4 instance, but mongod would not start because of the missing files. I have since solved my issue in a different way, but I’m curious to know anyway, in case I need this again in the future.

On a more general topic, I have to manage this 3.4 instance; we currently cannot upgrade to a more recent version. The instance has some huge collections, so one idea for backups was to take frequent backups via LVM snapshots + data file copies that skip the largest files, and to copy the large files less frequently, e.g. on weekends.
But is such a backup usable for disaster recovery? As reported above, I was not able to start the db without some collection files (the largest ones). I also tried replacing the missing files with copies from a different backup, but that did not work either: MongoDB complained about wrong checksums and so on.
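For reference, the procedure I had in mind looked roughly like this (just a sketch: the volume group, snapshot size, paths and the collection file name are all invented examples):

    # Snapshot the volume holding the dbPath (assumes VG "vg0", LV "mongodata")
    lvcreate --size 10G --snapshot --name mongosnap /dev/vg0/mongodata
    mount -o ro /dev/vg0/mongosnap /mnt/mongosnap

    # Frequent backup: copy everything except the huge collection's files
    rsync -a --exclude 'collection-7-*.wt' /mnt/mongosnap/ /backup/frequent/

    # Weekend backup: copy the huge files as well
    rsync -a /mnt/mongosnap/ /backup/full/

    umount /mnt/mongosnap
    lvremove -f /dev/vg0/mongosnap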

I’m now going back to backups using mongodump, which I mostly used before. I’m just not sure how fast and reliable a mongodump of a 1TB collection can be…
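Something like this, using compression to keep the dump size manageable (host, database and collection names are just examples):

    # Compressed dump of the whole instance (mongodump supports --gzip since 3.2)
    mongodump --host localhost --port 27017 --gzip --out /backup/dump-$(date +%F)

    # Or target just the huge collection, so it can be scheduled separately
    mongodump --db mydb --collection bigcoll --gzip --out /backup/bigcoll-$(date +%F)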

Thanks for any info.

Hi @Marco_De_Vitis

The short answer is no.

No. Until you’ve restored from a backup you don’t have a backup, just some files you think are a backup. Regular restoration tests are required as part of any backup strategy.
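A minimal restoration test can be as simple as pointing a throwaway mongod at the restored files and validating a collection or two (paths, port and names below are arbitrary):

    # Start a temporary mongod on the restored data directory
    mongod --dbpath /restore/test --port 27018 --fork --logpath /restore/test.log

    # Confirm the data is actually readable
    mongo --port 27018 --eval 'db.getSiblingDB("mydb").runCommand({validate: "mycoll"})'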

MongoDB Backup Methods has this to say about mongodump. It has also been discouraged as a production backup method in a few other threads.

When connected to a MongoDB instance, mongodump can adversely affect mongod performance. If your data is larger than system memory, the queries will push the working set out of memory, causing page faults.

MongoDB Backup Methods has options with some of their products; you would have to pay for these or have MongoDB Enterprise.

Not to mention MongoDB Atlas where you just click and configure your backups.

Percona offers a backup tool too. I don’t have any experience with it yet.

I have had success building backup nodes on a ZFS filesystem, using zfs-auto-snapshot (GitHub - zfsonlinux/zfs-auto-snapshot: ZFS Automatic Snapshot Service for Linux).
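The basic idea, roughly (pool/dataset names are invented; zfs-auto-snapshot just automates the schedule and retention):

    # ZFS snapshots are atomic, so data files plus journal are
    # point-in-time consistent on the backup node
    zfs snapshot tank/mongodata@backup-$(date +%Y%m%d-%H%M)

    # Snapshots stay browsable read-only, which is handy for restore tests
    ls /tank/mongodata/.zfs/snapshot/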

Thank you @chris.
My initial plan was indeed to use filesystem snapshots (with LVM) and then copy data from the snapshot elsewhere, but the size of the >1TB collection made it impossible to keep a full db history. That’s why I tried copying everything but the huge collections; there is no warning about this in MongoDB Backup Methods.
But then, in what can be considered a first restoration test, I discovered that recovering such a partial backup is not easy. For the record, I succeeded anyway in recovering the data by copying the files into a 4.4 installation and running --repair.
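In case it helps someone else, the recovery went roughly like this (paths are examples):

    # Copy the surviving WiredTiger files into a fresh 4.4 dbPath
    cp -a /backup/partial/. /var/lib/mongo-recover/

    # Let mongod salvage whatever it can from the available files
    mongod --dbpath /var/lib/mongo-recover --repair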

The performance impact of mongodump is not so important in this case, because I plan to run the big dump during non-business hours.
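E.g. something like this in cron (the schedule and names are made up):

    # Nightly dump of everything except the big collection, at 01:00
    0 1 * * * mongodump --db mydb --excludeCollection bigcoll --gzip --out /backup/nightly

    # Full dump of the big collection on Saturday nights only
    0 2 * * 6 mongodump --db mydb --collection bigcoll --gzip --out /backup/bigcoll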

Paid services quickly become expensive at such sizes. Percona seems interesting, thanks, but it will take time to evaluate and, most of all, it does not work with MongoDB 3.4, which I’m currently forced to use :roll_eyes:.
