How to avoid duplicate upserts without a unique index

Angelsfly_Because · July 15, 2020, 12:50pm

Hi, I am new to MongoDB. Have a basic question of avoiding duplicate inserts when using update() with upsert=true. Based on the concurrency control document (https://docs.mongodb.com/manual/core/write-operations-atomicity/index.html#concurrency-control), it requires a unique index to prevent insertions or updates from creating duplicate data. But due to the nature of our data, it is not possible define a unique index on documents in a collection. Is there any other way to avoid the duplicate insert problem? Thanks.

Stennie_X · July 21, 2020, 11:10pm

Welcome to the community @Angelsfly_Because!

Can you provide more detail on your use case and why a unique index would not be an option? An example document or two would be helpful.

Regards,
Stennie

Angelsfly_Because · July 22, 2020, 11:45am

Hi Stennie,

Thanks for asking. Here is my explanation of our unique situation that really makes a unique index or compound unique index really difficult.

The data model is a bit complicated. My explanation could be long. Please bear with me.
Initially, we have 3 types of incoming documents and they are placed into the same collection.

type 1 could have 3 attributes: X, Y, Z
type 2 could have 4 attributes: X, Y, M, B
type 3 could have 2 attributes: L, M, B

In each type, each attribute value is uniquely defined. For instance, 2 type(3) documents have the
same L value will be considered as equivalent and they will be merged into a new document, and old
documents will be removed. It is also true for M and B for type(3). Each attribute is optional but
as least on of them will be present. So it is possible, one type(3) has L, B and the other type(3)
only has a M. So if a new type(3) comes in with L, M and their values are the same. This new document
will trigger a merge of all three.

It is possible after merge, some attribute could have multiple values, for instance, two type(3)
documents with the same L but different M, then the new document will have both values for M
attribute. The merge also applies to the different types. For instance, one type(1) with X, Y
and Z, and one type(2) with X, M, and B. They both have the same X value, then they will be merged
as well. After the merge, the new document will have X, Y, Z, M, and B attributes. Say a new
type(3) document comes in with B and it has the same value as B, then it will be merged into as well.
In other words, a type(1) and a type(3) could be merged because of a type(2).

Now the problem is we have a type(1) document with X, Y and a type(2) document with Y, B. During the
database query stage, both find there is no conflict. So each one tries to write into the db. But
that is incorrect because they have the same Y value. Therefore, they should be written into db as
one document.

We cannot define a unique index. For instance, if we set the X as the unique index, then it is
possible type(2) and type(3) conflict because of B. If we define a compound index X and B, then it
is possible we have two type(3) documents with the same L value being written into db and that is
a conflict.