I have a sharded MongoDB cluster with hybrid storage, i.e. some fast SSDs and some slower, cheaper spinning rust.
For archiving, I would like to move some data to the slower disks. For legal reasons we have to keep this data, but it is queried only occasionally.
In principle I would do it like this:
mongo --eval "sh.stopBalancer()" mongos-host:27017
# Repeat below on each shard host:
mongo --eval "db.fsyncLock()" localhost:27018
cp /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection/3109--6926861682361166404.wt
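# ln takes TARGET first, then LINK_NAME: the link replaces the original file and points to the copy on the slow disc
ln --force --symbolic /slow-disc/mongodb/collection/3109--6926861682361166404.wt /mongodb/data/collection/3109--6926861682361166404.wt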
mongo --eval "db.fsyncUnlock()" localhost:27018
# After all shards are done:
mongo --eval "sh.startBalancer()" mongos-host:27017
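For illustration, the copy-and-link step could be wrapped in a small helper that verifies the copy before it replaces the original. This is only a sketch: `relocate_file` is a hypothetical name, it assumes GNU coreutils, and it does not answer whether mongod copes with a symlinked .wt file, which is the open question here.

```shell
#!/usr/bin/env bash
# Sketch: copy one file to another disk, verify it, then leave a symlink behind.
# relocate_file is a hypothetical helper; call it only while writes are locked
# (db.fsyncLock()) or while mongod is stopped.
relocate_file() {
    local src=$1 dest_dir=$2
    local dest
    dest="$dest_dir/$(basename "$src")"

    mkdir -p "$dest_dir" || return 1
    # Copy first, keeping mode and timestamps.
    cp --preserve=mode,timestamps "$src" "$dest" || return 1
    # Compare byte for byte before touching the original.
    cmp --silent "$src" "$dest" || { echo "copy mismatch: $src" >&2; return 1; }
    # TARGET first, LINK_NAME second: the link sits at the original path.
    ln --force --symbolic "$dest" "$src"
}
```

With the paths from above it would be called as `relocate_file /mongodb/data/collection/3109--6926861682361166404.wt /slow-disc/mongodb/collection`.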
The indexes should remain on the fast disk.
Would this be a reliable way to archive my data? What happens if the collection is read while it is being moved?
Another approach would be a file system layout like this:
/mongodb/data/collection
/mongodb/data/index
/mongodb/archive/collection -> /slow-disc/mongodb/collection
/mongodb/archive/index
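A layout like this could be prepared as sketched below. The sketch assumes that mongod runs with `storage.directoryPerDB` and `storage.wiredTiger.engineConfig.directoryForIndexes` enabled and a dbPath of /mongodb, which is what the directory names suggest. `make_archive_layout` is a hypothetical helper, and whether mongod accepts a pre-created `archive` directory should be checked on a test node first.

```shell
#!/usr/bin/env bash
# Sketch: create the per-database directories and point archive/collection
# at the slow disk. Both arguments are parameters so it can be tried in a
# scratch directory before touching a real dbPath.
make_archive_layout() {
    local root=$1 slow=$2
    mkdir -p "$root/data/collection" "$root/data/index" \
             "$root/archive/index" "$slow/mongodb/collection" || return 1
    # Only the collection files go to the slow disk; indexes stay local.
    # --no-target-directory makes ln fail instead of nesting the link
    # inside an already existing directory.
    ln --symbolic --no-target-directory \
        "$slow/mongodb/collection" "$root/archive/collection"
}
```

For the paths above the call would be `make_archive_layout /mongodb /slow-disc`, run as the user that owns the mongod data files.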
And then move the collection like this:
mongo --eval 'sh.shardCollection("archive.coll", shardKey)' mongos-host:27017
mongodump --uri "mongodb://mongos-host:27017" --db=data --collection=coll --archive=- | mongorestore --uri "mongodb://mongos-host:27017" --nsFrom="data.coll" --nsTo="archive.coll" --archive=-
mongo --eval 'db.getSiblingDB("data").getCollection("coll").drop()' mongos-host:27017
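A more cautious variant of the last step would compare document counts and drop the source only if they match. This is a sketch with hypothetical helper names (`count_docs`, `drop_if_copied`). It assumes a shell version that has `countDocuments` (used instead of `count`, which can be thrown off by orphaned documents on a sharded cluster), and equal counts are of course only a weak check.

```shell
#!/usr/bin/env bash
# Sketch: drop data.coll only when archive.coll holds the same number of documents.

# Print the document count of DB.COLL as seen through mongos.
count_docs() {
    local host=$1 db=$2 coll=$3
    mongo --quiet --eval \
        "db.getSiblingDB('$db').getCollection('$coll').countDocuments({})" "$host"
}

drop_if_copied() {
    local host=$1 src dst
    src=$(count_docs "$host" data coll) || return 1
    dst=$(count_docs "$host" archive coll) || return 1
    if [ "$src" = "$dst" ]; then
        mongo --eval 'db.getSiblingDB("data").getCollection("coll").drop()' "$host"
    else
        echo "count mismatch: data.coll=$src archive.coll=$dst, not dropping" >&2
        return 1
    fi
}
```

It would be called as `drop_if_copied mongos-host:27017` once the mongorestore has finished.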
Main disadvantage: the balancer has to distribute all of the data across the shards again, which creates additional load on my sharded cluster.
Which approach would you recommend?