My use case is that I want to upsert many documents (>1000, large enough that bulk operation seems the only suitable way). As “upsert” suggests, it is either an insert or an update.
So I use BulkWrite, with ReplaceOne model, and with ReplaceOptions.upsert(true).
So far, the bulk operation only returns a BulkWriteResult with overall inserted, modified, matched counts.
My question is: how do I know whether update or insert actually carried out for each operation in the bulk ? (To know which document got updated, and which one got inserted)
OR is there another alternative to achieve this ? I reckon one can perform update(upsert=true) for individual document but it’s too costly in my use case.
Wellcome all comments, feedback and suggestions
Tuan
As you can see from the output I get from the bulk operations, I know that {_id: 1} was upserted and the 2 others found a matching document so they performed a replace operation.
Thanks @MaBeuLux88 for a very detailed explanation (Must say my experience in this community forum is pretty positive so far)
My domain language is Java, but I realised the similar thing, there is a BulkWriteUpsert list returned in the BulkWriteResult with only the document that are INSERTED and their indexes on the original list. From there, I was able to figure out which ones have been updated.
Another question though: Is there a good way to obtain document “before” it is replaced ?
So far, I have gone the long way, before running the upsert, I take the snapshot of the DB, then I run the upsert operation, from the result I work out the documents have been updated (there’s an extra field acting like unique ref). After that, I restore the DB using the previous snapshot and query for the “old” documents using those ref. I can tell it a long way, but can’t think of a faster way.
I was about to suggest to use the Change Streams with a filter on the updates only + updateLookup but you will only get the new version of the documents, not the old one.
Also, as you already know which filters you will use to replaceOne your documents in your bulk operations, maybe you could also run some find operations before running your bulk. Document that don’t exist yet won’t be matched and you could retrieve the old versions of the one that do exist.
I bet these 2 solutions are faster than the restore snapshot one .