Manual vs. Programmatic Data Entry into Atlas

Hi Everyone,

I am running into an interesting issue. I created a few objects in a collection programmatically via pymongo, and I have an aggregation that runs against that collection via an API call. When I call the API endpoint with the _id of one of those objects, I get a 500 error because the $match stage of the pipeline fails, but when I call the endpoint with an _id that was created manually in Atlas, the pipeline runs correctly and returns the data formatted properly.

Has anyone else experienced this before? To me, an _id is an _id, and the $match stage should be able to locate it in the collection.

-M

Can you code up a repeatable example?

This is symptomatic of a type mismatch on the _id.

Most likely, the _ids created via pymongo are ObjectIds.

The one you created manually is either a string or a number.

For the $match to work, both the value and the type must match. Most of the time with the Data API you are working with strings and numbers, so you must convert the string _id to an ObjectId.

Try changing your query from

"_id" : the_id

to

"_id" : { "$oid" : the_id }

Within the script I have pymongo inserting an object like so:

{
    "name": {
        "first": "",
        "last": ""
    },
    "email": "",
    # etc.
}

I am using the insert_one helper and the data is inserting into the DB just fine. The ObjectIds are in fact ObjectIds (I double-checked the data type), but the agg pipeline still does not return a doc for those objects; it only does so for objects I create manually in the dashboard.
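
For reference, one way to double-check the stored _id type (connection string and names are placeholders):

from bson import ObjectId
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["test"]["people"]
doc = coll.find_one()  # grab one of the programmatically inserted docs

print(type(doc["_id"]))                  # <class 'bson.objectid.ObjectId'>
print(isinstance(doc["_id"], ObjectId))  # True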

It is almost impossible to tell you what is wrong with your aggregation or find query if you do not share the aggregation or query.

Please share the code you are using, then we might help.

Please share sample un-redacted documents, one that works and one that does not, then we might help.

Team,

I apologize for the confusion. There was a variable in my script that was not being populated for the programmatic upload, and it was corrupting the docs; that's why they didn't show up in the agg pipeline. Anyway, that is fixed now! Just needed to step away from my laptop.

Best,
M


No problem. The important thing is that it is working now.


I found another little oddity. For all Python users: to create ObjectIds, it's best to use the bson library and import ObjectId directly.

The following worked in the aggregation pipeline:

from bson import ObjectId

transformed_object = {
    "name": {"first": "John", "middle": "", "last": "Doe"},
    "email": "john.doe@gmail.com",
    "org": ObjectId("65d8b391afa9babbc50d461c"),  # a real ObjectId instance
}

What did not work:

transformed_object = {
    "name": {"first": "John", "middle": "", "last": "Doe"},
    "email": "john.doe@gmail.com",
    "org": {"$oid": "65d8b391afa9babbc50d461c"},  # stored as a literal subdocument, not an ObjectId
}

Best,
M

This issue is due to a difference in the _id format between the documents created programmatically and manually. Make sure both types of _id values match in format and data type.

Hi @Michael_Murray4,

I’ve done exactly what you’re talking about before, and the reason, as far as I understand it, is that the data structures you’ve created there are not JSON; they’re Python dicts. PyMongo doesn’t support Extended JSON inside Python dicts.

The whole purpose of Extended JSON is to serialize the BSON types that plain JSON doesn’t support, and as you’ve mentioned, Python dicts can natively contain ObjectId instances, so there is no need for it there. It can definitely be a bit confusing.
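
A quick way to see this is to insert both shapes and read them back; PyMongo stores the {"$oid": ...} dict as a literal nested document (connection string and names below are placeholders):

from bson import ObjectId
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["test"]["people"]

coll.insert_one({"org": ObjectId("65d8b391afa9babbc50d461c")})
coll.insert_one({"org": {"$oid": "65d8b391afa9babbc50d461c"}})

for doc in coll.find({}, {"_id": 0}):
    print(doc)
# {'org': ObjectId('65d8b391afa9babbc50d461c')}
# {'org': {'$oid': '65d8b391afa9babbc50d461c'}}  <- a subdocument, so matches on ObjectIds miss it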

Mark


One thing that’s worth noting is that if you’re starting off with actual Extended JSON data, PyMongo does support de-serialization from that into Python data structures.
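
A minimal sketch of that round trip using bson.json_util:

from bson import ObjectId, json_util

raw = '{"org": {"$oid": "65d8b391afa9babbc50d461c"}}'  # actual Extended JSON text
doc = json_util.loads(raw)

print(doc["org"])                        # ObjectId('65d8b391afa9babbc50d461c')
print(isinstance(doc["org"], ObjectId))  # True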