Handling "Duplicated key error" in bulk insert retry scenarios

Hi All,

I’m using Java driver 3.12 and facing the following issue:

I’m trying to bulk insert a large number of documents, with a retry mechanism in case of a connection disruption. The problem is that once the retry kicks in, it starts the bulk insert all over again. Since some documents were already inserted in the previous attempt, this results in a “Duplicate Key error”.

My question: is there a way to ignore the “duplicate key error” on a subsequent retry?

Update: I have a follow-up question. The above problem can be solved by using an “Upsert” operation instead of “Insert” (i.e. using a Replace model instead of an Insert model in the bulk write). But performance is an important factor here (I’m dealing with ~100k records), and my assumption is that “Upsert” is significantly more expensive than “Insert”. Is that assumption valid?
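For reference, this is roughly what I mean by switching to the Replace model. This is only a minimal sketch (the collection handle and the documents list are assumed to already exist, and each document is assumed to carry its own _id):

import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

List<WriteModel<Document>> bulkWrites = new ArrayList<>();
for (Document doc : documents) {
    // Replace the document that matches this _id, or insert it if it does not exist yet (upsert).
    bulkWrites.add(new ReplaceOneModel<>(
            Filters.eq("_id", doc.get("_id")),
            doc,
            new ReplaceOptions().upsert(true)));
}
collection.bulkWrite(bulkWrites, new BulkWriteOptions().ordered(false));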

Thanks,
T

You can ignore the error by catching the exception. For example:

import com.mongodb.MongoBulkWriteException;
import com.mongodb.bulk.BulkWriteResult;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.InsertOneModel;
import com.mongodb.client.model.WriteModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// doc1 and doc2 deliberately share the same _id to trigger a duplicate key error.
List<WriteModel<Document>> bulkWrites = new ArrayList<>();
Document doc1 = new Document("_id", 13).append("fld", "a");
Document doc2 = new Document("_id", 13).append("fld", "a");
Document doc3 = new Document("_id", 14).append("fld", "c");
bulkWrites.add(new InsertOneModel<>(doc1));
bulkWrites.add(new InsertOneModel<>(doc2));
bulkWrites.add(new InsertOneModel<>(doc3));

BulkWriteOptions bulkWriteOptions = new BulkWriteOptions().ordered(false);
BulkWriteResult bulkResult = null;

try {
    bulkResult = collection.bulkWrite(bulkWrites, bulkWriteOptions);
}
catch (MongoBulkWriteException e) {
    // Print a short error message and the partial result (number of documents inserted).
    System.out.println(e.toString());
    System.out.println(e.getWriteResult().getInsertedCount());
}
finally {
    // Print the inserted count when there were no errors.
    if (bulkResult != null) {
        System.out.println(bulkResult.getInsertedCount());
    }
}

Thanks @Prasad_Saya for the quick reply.

What I meant by “ignoring the duplicate key error” is for the database to ignore the document with the duplicated key and keep inserting the rest of the bulk write. (The context here is the retry: the first bulk write attempt already inserted some documents, and then there is a connectivity issue. Next, the retry kicks in and does the bulk write all over again. Is there a way for MongoDB to keep inserting the valid documents even though some documents in the bulk have already been inserted?)

By the way, in your example, how many entries would have been inserted into the db? And what is the value of:

e.getWriteResult().getInsertedCount()

What I meant by “ignoring the duplicate key error” is for the database to ignore the document with the duplicated key and keep inserting the rest of the bulk write.

The bulk operation tries to insert every document you supplied. If a document already exists in the collection, it is not inserted (because of the duplicate key error), and the next document’s insert is attempted. Note that the insert is attempted for all the documents supplied.

By the way, in your example, how many entries would have been inserted into the db? And what is the value of: e.getWriteResult().getInsertedCount()

The first time the code is run, it inserts two documents and e.getWriteResult().getInsertedCount() returns 2. If you run the same code again, no documents are inserted and the output is 0.

Thanks @Prasad_Saya, you have pointed out my fundamental misunderstanding of the bulk write operation. My assumption was that it would halt the process as soon as the “duplicate key error” occurs, but in fact it keeps attempting all documents.

Appreciate it!

You are welcome :slightly_smiling_face:

Hi Tuan,

Your assumption is actually correct. There are two modes for bulk write operations: Ordered (the default) or Unordered.

Borrowing descriptions from the Bulk Write Operations documentation:

With an ordered list of operations, MongoDB executes the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list. See Ordered Bulk Write.

With an unordered list of operations, MongoDB can execute the operations in parallel, but this behavior is not guaranteed. If an error occurs during the processing of one of the write operations, MongoDB will continue to process remaining write operations in the list. See Unordered Bulk Write.

@Prasad_Saya’s example above sets ordered(false) in the options, so it will continue on duplicate key errors.
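If you also want the retry itself to succeed when the only failures are duplicates, one option is to catch the exception and re-throw only when something other than a duplicate key error (server error code 11000) is present. A minimal sketch, assuming bulkWrites and collection are the same as in the example above:

import com.mongodb.MongoBulkWriteException;
import com.mongodb.client.model.BulkWriteOptions;

try {
    collection.bulkWrite(bulkWrites, new BulkWriteOptions().ordered(false));
}
catch (MongoBulkWriteException e) {
    // With ordered(false) the server has already attempted every document,
    // so a duplicate key error (code 11000) just means "already inserted".
    boolean onlyDuplicates = e.getWriteErrors().stream()
            .allMatch(err -> err.getCode() == 11000);
    if (!onlyDuplicates) {
        throw e; // something other than a duplicate key failed; let the retry logic handle it
    }
}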

Regards,
Stennie


Thanks @Stennie_X for further clarifying the issue.