Unique key on array fields in a single document

Hi All,

We have a document as below. We need the combination of productId and serviceId in the "arrayfileds" array to be unique.

We have added a unique index as below, but duplicates of the serviceId/productId combination are still being allowed.

Could you help in restricting the duplicates in the array?

Index:

db.collection.createIndex( { _id: 1, "subscriber.msisdn": 1, "arrayfileds.serviceId": 1, "arrayfileds.productId": 1 }, { unique: true } )

db.collection.find({"_id" : "669486112345"}).pretty()
{
        "_id" : "669486112345",
        "subscriber" : {
                "msisdn" : "669486112345",
                "state" : 1,
                "createdOn" : "2020-04-23 22:13:35.228",
                "updatedOn" : "2020-04-23 22:13:35.228",
                "lbChargeCode" : "",
                "userClass" : "1",
                "imei" : "",
                "tg" : {
        },
        "arrayfileds" : [
                {
                        "subsId" : "5217331305876549695359482",
                        "state" : 1,
                        "prvsState" : 0,
                        "createdOn" : "2020-04-23 22:16:09.584",
                        "updatedOn" : "2020-04-23 22:16:09.584",
                        "productId" : 4000,
                        "opid" : 0,
                        "reqChrgCode" : "49880127010",
                        "cCode" : "49880127010",
                        "lbPrice" : 0,
                        "serviceId" : 3000,
                },
                {
                        "subsId" : "5217331305876549695359482",
                        "state" : 1,
                        "prvsState" : 0,
                        "createdOn" : "2020-04-23 22:16:09.584",
                        "updatedOn" : "2020-04-23 22:16:09.583",
                        "productId" : 4000,
                        "opid" : 0,
                        "reqChrgCode" : "49880127010",
                        "cCode" : "49880127010",
                        "lbPrice" : 0,
                        "serviceId" : 3000,
                 },
                {
                        "subsId" : "5217331305876549695359482",
                        "state" : 1,
                        "prvsState" : 0,
                        "createdOn" : "2020-04-23 22:16:09.584",
                        "updatedOn" : "2020-04-23 22:16:09.583",
                        "productId" : 4000,
                        "lbPrice" : 0,
                        "serviceId" : 3000,
                }
        ]
}

Welcome to the community @sai_krishna :grinning: !

It's not currently possible, with an index, to ensure uniqueness of the elements within a single document's array, as you can see here.
For "simple" (scalar) elements we could be smart and use the $addToSet operator, but for object elements $addToSet compares the entire element, so two objects that differ in any field are both added.
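For example (a quick illustration, using a hypothetical test collection):

// Scalars: $addToSet only adds the value if it is not already present.
db.test.updateOne({ _id: 1 }, { $addToSet: { tags: "a" } }, { upsert: true })
db.test.updateOne({ _id: 1 }, { $addToSet: { tags: "a" } })   // no-op, "a" is already there

// Objects: $addToSet compares the whole element, so two objects that
// differ in any field (updatedOn here) are both kept.
db.test.updateOne({ _id: 2 }, { $addToSet: { arr: { productId: 4000, serviceId: 3000, updatedOn: "t1" } } }, { upsert: true })
db.test.updateOne({ _id: 2 }, { $addToSet: { arr: { productId: 4000, serviceId: 3000, updatedOn: "t2" } } })   // added anyway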

This is a known limitation, tracked as SERVER-1068.

Two solutions for you:

  1. Implement it in your application.
  2. I don't know your details, but you could review your data schema, split the data into two collections, and enforce uniqueness with a unique index on the new collection (see the sketch below). If you are comfortable with schema design, it can be done quickly.
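For option 2, a minimal sketch (assuming a hypothetical subscriptions collection that stores one document per array element):

// One document per subscription; the compound unique index enforces
// the msisdn + serviceId + productId combination across documents.
db.subscriptions.createIndex(
    { "msisdn": 1, "serviceId": 1, "productId": 1 },
    { unique: true }
)

db.subscriptions.insertOne({ msisdn: "669486112345", serviceId: 3000, productId: 4000, state: 1 })
// Inserting the same combination again now fails with an E11000 duplicate key error.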

Hi Gaetan_MORLET,

Thanks for the response,

– Implement it in your application.

Is there a specific operator that can be used to overcome this? What would the approach be if we have to implement it in the application?

Regards,
Sai

My method is simple.

  1. Make a query on your document to find out if your array already contains an element (in your case an object) with serviceId: yourValue and productId: yourValue.
    Something like this:

db.collection.find({ _id: "669486112345", arrayfileds: { $elemMatch: { productId: 4000, serviceId: 3000 } } }).count()

  2. If the number of documents returned is 1, the combination already exists.
    Otherwise (0 documents returned), the combination does not exist and you can add your element.
    Here is the query to insert your new object inside your array:

db.collection.updateOne({ _id: "669486112345" }, { $addToSet: { arrayfileds: { productId: 4000, serviceId: 3000, … } } })

To make the find() more efficient, create an index like this:

db.collection.createIndex( { "_id": 1, "arrayfileds.serviceId": 1, "arrayfileds.productId": 1 } )
or
db.collection.createIndex( { "subscriber.msisdn": 1, "arrayfileds.serviceId": 1, "arrayfileds.productId": 1 } )

because in your case "_id" and "subscriber.msisdn" hold the same value.

This is how I would do it :grinning:

Any better idea @Prasad_Saya ?


@Gaetan_MORLET Hello :slightly_smiling_face:

To make sure that duplicate values of the productId and serviceId (of the arrayfileds array) are not introduced into the array during update operations, you can try this:

NEW_DOC = {    // sample element to be added
            "productId": 22,
            "serviceId": "service-22",
            "othersFields": "xyz"
}

db.collection.updateOne(
  { _id: ObjectId("5eaab2df1347cc3a123a2878"),
    arrayfileds: { $not: { $elemMatch: { productId: NEW_DOC.productId, serviceId: NEW_DOC.serviceId } } }
  },
  { $push: { arrayfileds: NEW_DOC } }
)

This will make sure that the new element is added to the array only if the productId and serviceId combination does not already exist.
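In the application you can then check the update result to tell the two cases apart (a small illustration reusing NEW_DOC from above; note modifiedCount is also 0 when no document matches the _id):

var result = db.collection.updateOne(
  { _id: ObjectId("5eaab2df1347cc3a123a2878"),
    arrayfileds: { $not: { $elemMatch: { productId: NEW_DOC.productId, serviceId: NEW_DOC.serviceId } } }
  },
  { $push: { arrayfileds: NEW_DOC } }
)
if (result.modifiedCount === 0) {
    // Nothing was pushed - the combination is already in the array
    // (or there is no document with that _id).
    print("duplicate productId/serviceId combination")
}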


Hi @Prasad_Saya, @Gaetan_MORLET

Thank you very much for the inputs :blush: This worked for me.

We are also testing the application for race conditions. I will post the status once tested.
Could you help with a script to remove the existing duplicates with the same condition?

remove the existing duplicates

This can be done in two steps:

  1. Write an aggregation query to identify the documents with duplicate productId and serviceId combinations, and get a list of those documents.
  2. Using the list from step 1, update the documents - remove the duplicate data.

Here are a couple of approaches; you can use either of them.

The aggregation framework's $group stage lets you group documents by productId and serviceId and get a count for each combination. When a count is greater than 1, there are duplicates in the array; filter by the count and get the unique identifier of the embedded document within the arrayfileds array. Use this information to remove the duplicate array elements with an update operation.

Another approach is to group by productId and serviceId and collect all the array elements into an array. Next, keep the first element and discard the remaining ones - this removes the duplicates from the array. Use this aggregation result to update your collection through an update operation.

Please note, I will not be writing scripts for this :slightly_smiling_face:

Hi,

I am able to fetch all the "_id"s that have duplicates in their arrays with the aggregation below. However, the for loop is failing to remove the duplicates. Am I missing something here?

Query to find duplicates:

db.getCollection("collection").aggregate([
    { $unwind: { path: "$arrayfileds" } },
    { $group: {
        _id: { _id: "$_id", ProductId: "$arrayfileds.productId", serviceId: "$arrayfileds.serviceId" },
        dups: { $addToSet: "$_id" },
        count: { $sum: 1 }
    } },
    { $match: { count: { $gt: 1 } } }
])


Loop to remove duplicates:

var duplicates = [];
db.getCollection("collection").aggregate([
    { $unwind: { path: "$arrayfileds" } },
    { $group: {
        _id: { _id: "$_id", ProductId: "$arrayfileds.productId", serviceId: "$arrayfileds.serviceId" },
        dups: { $addToSet: "$_id" },
        count: { $sum: 1 }
    } },
    { $match: { count: { $gt: 1 } } }
]).forEach(function(doc) {
    doc.dups.shift();
    doc.dups.forEach(function(dupId) {
        duplicates.push(dupId);
    });
});
printjson(duplicates);
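As written, the loop never deletes anything - it only collects ids. And because the group key already contains the document _id, each dups set holds exactly one id, which dups.shift() removes, so duplicates ends up empty. One way to do the actual removal, as a sketch (it keeps the first element per combination, as described above, and is not atomic against concurrent writers):

// Collect the _ids of documents whose array contains duplicate
// productId + serviceId combinations.
var dupIds = db.getCollection("collection").aggregate([
    { $unwind: "$arrayfileds" },
    { $group: {
        _id: { _id: "$_id", productId: "$arrayfileds.productId", serviceId: "$arrayfileds.serviceId" },
        count: { $sum: 1 }
    } },
    { $match: { count: { $gt: 1 } } },
    { $group: { _id: "$_id._id" } }                // one entry per document
]).map(function(d) { return d._id; });

// Rewrite each affected document, keeping only the first element
// per productId + serviceId combination.
db.getCollection("collection").find({ _id: { $in: dupIds } }).forEach(function(doc) {
    var seen = {};
    var deduped = doc.arrayfileds.filter(function(el) {
        var key = el.productId + "|" + el.serviceId;
        if (seen[key]) return false;               // drop later occurrences
        seen[key] = true;
        return true;
    });
    db.getCollection("collection").updateOne({ _id: doc._id }, { $set: { arrayfileds: deduped } });
});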

This worked for me… We are able to overcome duplicates even with the race condition.

Thanks