MongoDB Atlas insertOne query data loss

Scenario: I need to insert a record in collection A, and then, along with the _ids of that inserted record from collection A, I need to insert another record in collection B.
For example, after insertion:
Collection A record

{ _id: ObjectId("abcdef"), name: "L", records: [ { id: "def", name: "none" } ] }

Collection B record

{ _id: ObjectId("b_collection_id"), a_col_id: ObjectId("abcdef"),
  b_records: [ { id: "b_id", a_records_id: "def" } ] }

The same process needs to be done for millions of records. For a few records, insertOne in collection A returns a successful response along with all the ids, and the corresponding record then gets inserted into collection B; but when we try to find the record in collection A, it isn't there, even though the collection B record has all the values it needed from the collection A record.

Now the question is: are there known scenarios of data loss in MongoDB Atlas? This has happened to me while loading 500K records and 800K records, and both times the load failed because 1-3 records didn't appear in collection A, while their corresponding records in collection B were created with the correct data format, including the collection A ids (which are somehow not present in collection A).

Is there any solution to it ?
Will applying the writeConcern { w: "majority" } option make sure that data loss never happens?


@Prateek_Gupta1, you can use a bulkWrite insert and catch the failed records using try/catch.

The example I provided is a C# sample.

try
{
    // Bulk write logic goes here
}
catch (MongoBulkWriteException ex)
{
    erroredList = new List<string>(); // declare this error list globally
    foreach (var item in ex.WriteErrors)
    {
        erroredList.Add(CollectionAListData.ElementAt(item.Index).propertyName);
    }
}

Using the error list, you can remove or re-insert the data in another collection.

I'm confused; could you please elaborate on how On-Demand Materialized Views can help in this case?
I forgot to mention that I'm not using any pipeline. I make two insertOne calls from the server: one to insert the record in collection A, after which I grab the info of that record and make another insertOne call to insert into collection B.

Which driver are you using? It looks like the Node driver, but I'm not sure.

Your scenario is very simple. I suspect the issue is with your code.

It looks like an object reference or something that is not updated correctly in your code.

If you do not handle errors and exceptions, then yes, you might lose some writes.

No, it does not. It is safer, but if the majority cannot acknowledge the write you will get an exception/error that you must handle in your code.
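To illustrate, here is a minimal Node.js sketch (assuming the 3.x driver's per-operation `w`/`j` options; the function name is mine) that treats anything other than a clean acknowledgement as an error the caller must handle:

```javascript
// Hedged sketch: wrap insertOne so an unacknowledged or failed write
// always surfaces as an error instead of being silently ignored.
async function safeInsert(collection, doc) {
  // w: 'majority' + j: true asks the server to acknowledge the write
  // only after a majority of members have journaled it.
  const result = await collection.insertOne(doc, { w: 'majority', j: true });
  if (result.insertedCount !== 1) {
    throw new Error('insert was not acknowledged');
  }
  return result.insertedId;
}
```

The caller still has to wrap this in try/catch and decide whether to retry or record the failure, as steevej says.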


Hi @Prateek_Gupta, and welcome to the forums!

I need to insert a record in collection A, and then, along with the _ids of that inserted record from collection A, I need to insert another record in collection B

If the use case requires that every document in collection A have a corresponding document in collection B, please consider embedding document A in document B. See also Embedded one-to-many relationships for more information.

For different strategies on data modelling, please see Building With Patterns: A Summary.

For use cases that require atomicity of reads and writes to multiple documents in multiple collections (A and B), MongoDB supports multi-document transactions.
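As a rough sketch of what a transaction could look like in Node.js (database and collection names here are hypothetical; `withTransaction` is available in driver 3.3+ and transactions require a replica set, which Atlas clusters are):

```javascript
// Hedged sketch: insert into collections A and B atomically, so either
// both documents are committed or neither is.
async function insertLinked(client, docA, docB) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db('mydb');
      const resA = await db.collection('A').insertOne(docA, { session });
      // Link B to A inside the same transaction.
      docB.a_col_id = resA.insertedId;
      await db.collection('B').insertOne(docB, { session });
    });
  } finally {
    session.endSession();
  }
}
```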

To add to what @steevej has mentioned above, if you are still facing this issue please provide:

  • MongoDB driver that you’re using and version
  • A minimal code example that is able to reproduce the issue (please remove any MongoDB Atlas URI credentials before posting)

Regards,
Wan.


Hey @steevej, thanks for your response. I wish the problem were with the code, but that's not the case here. It's definitely something wrong with the Mongo driver/DB. I finally managed to replicate the scenario again: I get a successful response from the DB that the document was inserted in collection A, its corresponding document then gets inserted in collection B in the following call, but the document in collection A is nowhere to be found.

@wan
I'm using Node.js to interact with MongoDB.
Versions:
Node: 14.16.0
Driver: mongodb 3.6.5
DB: MongoDB Atlas

Payload :

doc = { name: "name", active: true, createdOn: "date_time",
  accounts: [ { number: "1234", active: true, id: "60a38ac80317f5efd0027d69" } ]
}

Operation - collection.insertOne(doc, {});

DB Response :

{
  result: { n: 1, ok: 1, operationTime: "6963562057622880257", "$clusterTime": { ... } },
  connection: { ... },
  ops: [ {
    _id: "60a38ac80317027d6a7f53e0",
    name: "name",
    active: true,
    createdOn: "date_time",
    accounts: [ { number: "1234", active: true, id: "60a38ac80317f5efd0027d69" } ]
  } ],
  insertedCount: 1,
  insertedId: "60a38ac80317027d6a7f53e0",
  n: 1,
  ok: 1,
  operationTime: "6963562057622880257",
  "$clusterTime": { ... }
}

The response here reports insertedCount: 1, and that is what the validation in the code checks to verify whether the document was inserted. Also, this case only happens when I'm trying to load more than 100K records using the steps above (all the steps are performed for each record that needs to be inserted into the DB), and even then only very rarely.

I'm now confused as to where to go from here. One thing I'm sure of is that I won't be able to alter the schema at all.

First, if you already have your 100K documents at the beginning of the process, I would recommend that you use a bulk operation.

I still think there is an issue with your code, especially since you seem to process the result correctly. Since you confirmed that you received insertedCount: 1, the document was inserted. Maybe it was inserted in the wrong collection, as you mentioned:

Maybe the variable collection starts to point to collection B under some circumstances. Wherever you print the insert result, I would also print the namespace of the collection to make sure you are still inserting in the right place.
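A minimal sketch of that logging idea, assuming the Node.js driver's `Collection.namespace` property (the wrapper function name is mine):

```javascript
// Hedged sketch: log the namespace the insert actually went to, so a
// `collection` variable that silently starts pointing at collection B
// would show up immediately in the logs.
async function insertWithTrace(collection, doc) {
  const result = await collection.insertOne(doc);
  console.log(`insertedId=${result.insertedId} namespace=${collection.namespace}`);
  return result;
}
```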

Since you let the system generate _id, it would be nice to see the _id of the document in collection B that has all the fields you wanted in collection A.

Current ops being performed: for each record, insert the record in collection A and, if successful, insert the corresponding record in collection B.

First, if you already have your 100K documents at the beginning of the process, I would recommend that you use a bulk op

I cannot use this op since it won't work for the overall use case and the data format of the records I'm receiving, and I can't really change that format because it is passed to me from other sources (out of my scope). So the only option available for now is to insert the records one by one, matching the data format, how it is processed in our system, and the complicated data structure we have to maintain to store the data inside MongoDB.

Maybe the variable collection starts to point to collection B under some circumstances. Wherever you print the insert result, I would also print the namespace of the collection to make sure you are still inserting in the right place.

Actually, I'm calling two different microservices for insertion from a controller: one microservice endpoint inserts into collection A, and then from its response (the doc returned by the collection A microservice after a successful insert; otherwise an error is returned, which is handled in the controller itself) I create a new document and call another microservice to insert into collection B. One microservice can only interact with one collection.

Since you let the system generate _id, it would be nice to see the _id of the document in collection B that has all the fields you wanted in collection A.

Collection B record :

{
  "_id": "60a38ad3165a989c5e2e177f",
  "active": true,
  "A_Col_id": "60a38ac80317027d6a7f53e0",
  "email": "testabc@ups.com",
  "accounts": [ { "id": "60a38ad3f4405a3ca6cbb18e", "number": "1234", "active": true,
    "A_Col_Account_id": "60a38ac80317f5efd0027d69" } ]
}

My concern here is whether it is possible that MongoDB didn't store the data but returned a successful response. Based on my observation, yes: I have shared the payload and the success response in which MongoDB indicated that it had inserted the document, but the document was not actually inserted.

This is puzzling, I admit.

Which Atlas tier are you using?

If I understand correctly your code looks like:

document_a = { ... }
result_a = service_a.insert( document_a )
if result_a is valid 
then
   document_b = { ... , A_Col_id : result_a._id , ... }
   result_b = service_b.insert( document_b )
   if result_b is valid
   then
      found_a = collection_a.find( result_a._id )
      if found_a is null
      then
         // this should never happen but it does
      endif
   endif
endif

Could it be that service_b deletes from collection_a?

Is there any delete anywhere?

What is the status of the cluster when the issue happens? The only thing I can see is a rollback situation. Maybe you could try the following:

For clusters where members have journaling enabled, combining "majority" write concern with j : true can prevent rollback of write concern acknowledged data.


You should absolutely be using writeConcern "majority", which you can set in the connection string (as shown in the Atlas examples) or in the driver.
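For example, a connection string with majority write concern might look like this (host, user, and database names are placeholders, not your actual cluster):

```
mongodb+srv://<user>:<password>@cluster0.example.mongodb.net/mydb?retryWrites=true&w=majority
```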

Can you also confirm that you are not using secondary reads?

By the way, if the two inserts must both happen and if it’s not appropriate to embed the data into a single record, have you considered using transactions to make sure either both documents are inserted or neither one is?

In any case, if you are using majority writeConcern then successful writes will be there even if there’s a failure of the primary (and a failover to another node). It would be great to get to the bottom of what’s going on but we definitely need more details about exact versions of server, driver, which Atlas tier, and more details about how you are connecting to the cluster.

Asya
