Any drawbacks in having _id an object

Is there any drawbacks to use an object as the _id?

For example,

{  "_id" :  {  "major" : 1 ,  "minor" : 0 } } 
{  "_id" :  {  "major" : 1 ,  "minor" : 1 } } 
{  "_id" :  {  "major" : 1 ,  "minor" : 2 } } 
{  "_id" :  {  "major" : 2 ,  "minor" : 0 } } 

The mail goal is to be able to do queries for specific major/minor combination. While at the same time easily find all the minors of a major. I know I can have 2 normal fields with a compound unique index. The problem I have with the latter is that in a delete change stream I have the _id only. So I cannot know the specific major/minor being deleted. Being the _id, I have it.

I found https://stackoverflow.com/questions/7764781/store-id-as-object-or-string-in-mongodb but not convincing enough for me not to do it.

All opinions are welcomed.

Hi @steevej,

You can definitely use an object with _id like you’re showing in your example! That will allow you to take advantage of the required index on _id and provide a unique constraint. However, there are a few things worth noting:

First, querying on either major or minor within _id will not use the index on _id. You’d need an additional index to query on just the major or minor version number.

Second, please consider the following:

> db.foo.insertOne({ _id: { a: 1, b: 1 } })
{ "acknowledged" : true, "insertedId" : { "a" : 1, "b" : 1 } }
> db.foo.find({ _id: { a: 1, b: 1 } })
{ "_id" : { "a" : 1, "b" : 1 } }
> db.foo.find({ _id: { b: 1, a: 1 } })
>

All queries must be an exact match for the index to be useful (or even for a result to appear at all). In the example above, swapping { a: 1, b: 1 } with { b: 1, a: 1 } is the difference between returning a document and returning none even though they are logically equivalent.

What are you querying by the most often? If it’s an exact match or the major version (to retrieve all minor versions), may I suggest using a concatenated string? Try something like this:
{ _id: "1|0" }
{ _id: "1|1" }
{ _id: "1|2" }
{ _id: "2|1" }

This approach has several benefits: (1) you can query by exact match and utilize the index on _id; (2) you can query on major version and list all minor versions and utilize the index on _id, e.g. { _id: /^2|/ } to return all documents with major version 2; and (3) I just like strings.

I hope this helps!

Thanks,

Justin

2 Likes

Very good points.

Thanks.

I will come back later with a real analysis of all the points you covered.

Your application has to make sure that when inserting a document into the collection the object needs to be unique; that is the application handles the duplicate key insertion. This will be more of an application functionality rather than a drawback.

That’s a major drawback.

My goal was to avoid creating an index. 8-(

The above was surprising but would not have been a problem except may be when doing manual query while debugging. I am planning to have all object creation and all queries to be done via an well defined API. But I play a little bit and I found that the following works and it is usually how I do manual queries.

> db.foo.find({ "_id.b" : 1 , "_id.a" : 1 } )
{ "_id" : { "a" : 1, "b" : 1 } }
> db.foo.find({ "_id.a" : 1 , "_id.b" : 1 } )
{ "_id" : { "a" : 1, "b" : 1 } }

I like strings too but wanted to avoid sub-string search or regex.

Most of my use cases are complete matches on major/minor but one is to use the change stream on major to monitor the creating and deletion of minor.

Actually, major/minor was may be over simplifying. It is more of a parent/child thing. Like may be domain names, where major would be like .com, .edu and minor would be subdomain like mongodb or google. File paths are also good example. So _id would be like

{ _id : { parent : com , name : mongodb } }
{ _id : { parent : com , name : google } }
{ _id : { parent : null , name : com } }
{ _id : { parent : mongodb.com : community } }
or
{ _id : { parent : null , name : / } }
{ _id : { parent : / , name : etc } }
{ _id : { parent : /etc , name : systemd } }

I think I could live with

{ _id : mongodb.com }
{ _id : google.com }
{ _id : com }
{ _id : developer.mongodb.com/community/forums }
or
{ _id : / }
{ _id : /etc }
{ _id : /etc/systemd }

Which use less memory anyway. There is also a minor drawback with Compass as by default object are not expanded so when I am looking for a specific document, I need to do an exact search or click on each document to expand the _id. I probably can tolerate a little bit of duplication by having a parent field for the few documents implicated in the use cases where parent is needed, because most will not.

And finally

yes just like I always do.

{
    "conclusion" : "string I will go" ,
    "thanks" : "all"
}