How can I back up some documents when the number of documents reaches the number I set?

DongHyun_Lee · June 17, 2020, 12:44pm

Hello!
This is a good night to have a good dream.

I wonder how to back up some of the documents in a collection when the number of documents reaches a number I set.

I searched, but I only found “How to create a capped collection”.
That just deletes old documents when the collection reaches its maximum number of documents.

All I need is this. When the number of documents in a collection reaches the number I set:

1. Back up some of the documents in the collection.
2. Delete the backed-up documents from the collection.

Help me

kevinadi · June 18, 2020, 11:56pm

Hi,

“Backup part of documents in a collection.”

Do you mean backup part of the collection?

As I understand it, you wanted to do something like a capped collection. But instead of deleting the old documents, you want to move them somewhere else. Is this correct?

If this is not correct, could you provide some examples of what you have in mind?

Best regards,
Kevin

DongHyun_Lee · June 19, 2020, 4:44am

No, not the back up part.

I was just saying that I need to know how to back up some of my documents when the number of documents in my collection reaches a certain limit (the number of documents, not the size).

So, basically, I am trying to back up some of the documents when I have more documents than I need, so that the file size does not become huge (and, of course, delete the backed-up documents from the original collection).

Thanks in advance.

Prasad_Saya · June 19, 2020, 4:59am

You can use Change Streams.

This will let your application watch the collection (that is, the number of documents in it), and when the number exceeds a previously set limit, a process is started to back up (or write to another collection) a selected number of documents (based on some criteria you have).
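A minimal sketch of this idea in Python, assuming PyMongo collection objects are passed in. The function names, the "oldest by `_id`" selection rule, and the `limit` and `batch_size` parameters are illustrative assumptions, not something from this thread. Note that change streams require a replica set or sharded cluster.

```python
def move_oldest(source, archive, batch_size):
    """Copy the oldest `batch_size` documents to `archive`, then delete them.

    Assumption: sorting by _id approximates insertion order, which holds
    for default ObjectId values but not for arbitrary custom _id values.
    """
    docs = list(source.find().sort("_id", 1).limit(batch_size))
    if not docs:
        return 0
    # Copy first, delete second, so a failure never loses documents.
    # (A rerun after a failure between the two steps may hit duplicate keys.)
    archive.insert_many(docs)
    source.delete_many({"_id": {"$in": [d["_id"] for d in docs]}})
    return len(docs)


def watch_and_archive(source, archive, limit, batch_size):
    """Block on the change stream; archive whenever the count exceeds `limit`."""
    # Only react to inserts, since only inserts increase the document count.
    pipeline = [{"$match": {"operationType": "insert"}}]
    with source.watch(pipeline) as stream:
        for _change in stream:
            # estimated_document_count() reads collection metadata and is
            # cheaper than count_documents({}), at the cost of exactness.
            if source.estimated_document_count() > limit:
                move_oldest(source, archive, batch_size)


# Hypothetical usage (connection string and names are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")["mydb"]
#   watch_and_archive(db["events"], db["events_archive"], limit=100000, batch_size=1000)
```

The check still runs once per insert, so this trades simplicity for extra load on a busy collection.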

Stennie_X · June 19, 2020, 9:25am

Hi @DongHyun_Lee,

For self-managed deployments, using change streams (as suggested by @Prasad_Saya) is certainly one approach. However, do consider the potential impact of triggering a count every time a document is inserted or updated.

A more efficient approach would be to write your own scheduled task that runs periodically and exports documents according to your expiry rules before removing them. You can schedule the task (using O/S scheduling tools like cron) to run during off-peak hours on a suitable frequency (twice daily, daily, every 3 days, weekly, …) to minimise impact on a production deployment.
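As a rough illustration of such a task, here is a Python sketch that assumes a PyMongo collection object is passed in. The function name, the JSON Lines output format, and the "oldest by `_id`" expiry rule are assumptions made for this example; substitute your own expiry rules.

```python
import json


def export_and_trim(collection, limit, out_path):
    """Export the oldest documents beyond `limit` to a file, then delete them.

    Returns the number of documents exported and removed.
    """
    excess = collection.count_documents({}) - limit
    if excess <= 0:
        return 0
    # Assumption: the lowest _id values are the oldest documents
    # (true for default ObjectId values).
    docs = list(collection.find().sort("_id", 1).limit(excess))
    with open(out_path, "a", encoding="utf-8") as out:
        for doc in docs:
            # default=str is a lossy shortcut for ObjectId and datetime
            # values; it is good enough for a sketch, not for a restore.
            out.write(json.dumps(doc, default=str) + "\n")
    # Delete only after the export file has been written and closed.
    collection.delete_many({"_id": {"$in": [d["_id"] for d in docs]}})
    return len(docs)


# Hypothetical crontab entry running a script that calls this daily at 03:00:
#   0 3 * * * /usr/bin/python3 /path/to/archive_job.py
```

Because the count runs once per scheduled run rather than once per write, its cost is paid only during the off-peak window.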

If you happen to be using MongoDB Atlas (or might consider doing so), we recently added a new Atlas Online Archive beta feature which archives data older than an expiry date (based on rules you configure) into more cost-effective S3 storage. With Online Archive and Atlas Data Lake you can continue to query both live and archived data.

Regards,
Stennie

Prasad_Saya · June 19, 2020, 10:25am

Yes, the countDocuments query can take time if it runs for each insert.

The document count can be tracked within the application instead. For example, a variable can be used (and its value can be persisted once every n documents). Also, application servers have mechanisms to persist state (the variable's value) in the event of application failures.
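One possible shape for such a counter, sketched in Python. The class name, its parameters, and the `persist` callback are hypothetical; how the value is actually stored (a file, a small metadata collection, application server state) is left to the caller.

```python
class InsertCounter:
    """Tracks the document count in memory and persists it every n inserts."""

    def __init__(self, limit, persist_every, persist, start=0):
        self.limit = limit                  # archive once count exceeds this
        self.persist_every = persist_every  # persist once every n inserts
        self.persist = persist              # callable that stores the count
        self.count = start                  # e.g. the last persisted value

    def record_insert(self):
        """Call after each successful insert; True means the limit is exceeded."""
        self.count += 1
        if self.count % self.persist_every == 0:
            self.persist(self.count)
        return self.count > self.limit

    def record_archived(self, removed):
        """Call after `removed` documents were backed up and deleted."""
        self.count -= removed
        self.persist(self.count)


# Usage with an in-memory list standing in for real persistence:
saved = []
counter = InsertCounter(limit=3, persist_every=2, persist=saved.append)
needs_archive = [counter.record_insert() for _ in range(5)]
```

The in-memory value can drift if other processes write to the collection or the application stops between persists, so reseeding it from an actual count at startup is a reasonable safeguard.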
