Convert sub-documents to new documents along with the existing main document

Hi,
I need a solution for this case, please help me out.
Each sub-document should be converted into a new document, alongside the existing main document.

Example:

{
  _id: "user_id",
  first: "first name",
  last: "last name",
  addresses: [
    {city: "New York", state: "NY", zip: "10036", address: "229 W 43rd"},
    {city: "Palo Alto", state: "CA", zip: "94301", address: "100 Forest Ave"}
  ]
}

The output should be like this:

{_id:"user_id",first: "first name", last: "last name"}
{city: "New York", state: "NY", zip: "10036", address: "229 W 43rd"}
{city: "Palo Alto", state: "CA", zip: "94301", address: "100 Forest Ave"}

Please help me out on this.

Hello, @AKASH_SKY! Welcome to the community!
Have a look at this aggregation:

db.your_collection.aggregate([
  {
    $group: {
      _id: null,
      usersDocs: {
        $push: {
          _id: '$_id',
          first: '$first',
          last: '$last',
        },
      },
      addrDocs: {
        $push: '$addresses',
      },
    },
  },
  // flatten the nested arrays in the addrDocs prop
  {
    $set: {
      addrDocs: {
        $reduce: {
          input: '$addrDocs',
          initialValue: [],
          in: {
            $concatArrays: ['$$value', '$$this'],
          },
        },
      },
    },
  },
]);

The above command will give you a result like this:

{ 
   usersDocs: [...],
   addrDocs: [...] 
} 

This is often a better structure, as you group all objects of the same type into one array:

  • one array for user objects
  • another array for address objects

Such an approach is often more practical :wink:

But, of course, you can get an array of mixed objects by adding these stages at the end of that aggregation pipeline:

{
  $project: {
    mixedDocs: {
      $concatArrays: ['$usersDocs', '$addrDocs'],
    },
  },
},
{
  $unwind: '$mixedDocs',
},
{
  $replaceWith: '$mixedDocs',
},

I am curious, what is the reason you want an array of mixed objects? :slight_smile:

Hi @slava, I got your point. But I have a parent-child relationship in the same collection. So, to fetch the child objects, I'm using $lookup, but it returns the child objects in an array.

The result looks like this:

{
    _id: 1,
    name: "Parent product",
    is_child: false,
    is_parent: true,
    children: [
        {
            _id: 1,
            name: "Child 1",
            is_child: true,
            is_parent: false,
            children : []
        },
        {
            _id: 2,
            name: "Child 2",
            is_child: true,
            is_parent: false,
            children : []
        }
    ] 
}

This is my exact structure. Parent and child records should be separate objects.

The output should look like this:

{
    _id: 1,
    name: "Parent product",
    is_child: false,
    is_parent: true,
},
 {
    _id: 1,
    name: "Child 1",
    is_child: true,
    is_parent: false,
    children : []
},
{
    _id: 2,
    name: "Child 2",
    is_child: true,
    is_parent: false,
    children : []
}

@slava, can you help me get this exact output structure?

Actually, the code I provided above will give you the exact result that you want.
You just need to put everything into one single aggregation:

db.your_collection.aggregate([
  { $group: { ... } },
  { $set: { ... } },
  { $project: { ... } },
  { $unwind: { ... } },
  { $replaceWith: ... }
]);

Also, consider restructuring your documents like this:

{
   _id,
   parentId,
   name
}

With this structure, every object, parent or child, will be a separate document in the collection. If parentId is not null, then the document is a child; otherwise it is a parent. It will also be easy to $lookup children by their parentId value :wink:
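As a minimal sketch of that self-lookup, assuming a hypothetical collection name products and that children reference their parent via parentId:

db.products.aggregate([
  // keep only the parent documents (no parentId set)
  { $match: { parentId: null } },
  {
    $lookup: {
      from: 'products',         // self-lookup in the same collection
      localField: '_id',
      foreignField: 'parentId',
      as: 'children',
    },
  },
]);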

I have followed the steps you mentioned:

db.your_collection.aggregate([
  { $group: { … } },
  { $set: { … } },
  { $project: { … } },
  { $unwind: { … } },
  { $replaceWith: … }
]);

But I'm getting an error.

Could you provide a solution?

It seems like your version of MongoDB is not the latest.
The $set and $replaceWith stages have been available in MongoDB since v4.2.

Try this aggregation:

db.your_collection.aggregate([
  {
    $group: {
      _id: null,
      usersDocs: {
        $push: {
          _id: '$_id',
          first: '$first',
          last: '$last',
        },
      },
      addrDocs: {
        $push: '$addresses',
      },
    },
  },
  // flatten the nested arrays in the addrDocs prop
  {
    $addFields: {
      addrDocs: {
        $reduce: {
          input: '$addrDocs',
          initialValue: [],
          in: {
            $concatArrays: ['$$value', '$$this'],
          },
        },
      },
    },
  },
  {
    $project: {
      mixedDocs: {
        $concatArrays: ['$usersDocs', '$addrDocs'],
      },
    },
  },
  {
    $unwind: '$mixedDocs',
  },
  {
    $replaceRoot: {
      newRoot: '$mixedDocs',
    },
  },
]);

I'm using MongoDB v4.0.6.
I've tried $addFields and $replaceRoot, and it's working; I got the result I expected.
Thank you so much @slava!
Could this query cause any performance issues if there are lakhs of records in the collection?

Well, with this aggregation you process every single document in your collection. That means each newly inserted document will increase the time needed for this aggregation to execute. You can remove these two stages:

{
  $unwind: '$mixedDocs',
},
{
  $replaceRoot: {
    newRoot: '$mixedDocs',
  },
},

The result will be a bit different, but the aggregation will be a bit faster.
The aggregation is already well optimised, but sooner or later it will become slow.

In your situation it is better to rethink the structure of your documents. It looks like you have a tree structure.
I think, in your case, you should consider using Tree Structures with Parent References.
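As a rough sketch of that pattern (hypothetical products collection and data), each document stores a reference to its parent, and children are found with a simple indexed query:

db.products.insertMany([
  { _id: 1, name: 'Parent product', parentId: null },
  { _id: 2, name: 'Child 1', parentId: 1 },
  { _id: 3, name: 'Child 2', parentId: 1 },
]);

// an index on parentId keeps child lookups fast
db.products.createIndex({ parentId: 1 });

// fetch all children of a given parent
db.products.find({ parentId: 1 });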

Thank you, @slava, for the helpful information regarding performance. Once again, thank you.


Hello @AKASH_SKY,

Concerning aggregation and performance, there are some rules of thumb; I'll try to compile a list here. It might not be complete, but hopefully it is a good starting point:

Performance and indexes

Basically, you want to ensure that your aggregation queries are able to use indexes as much as possible.
Important to know: data moves through your pipeline from the first operator to the last; once the server encounters a stage that is not able to use indexes, all of the following stages will no longer be able to use indexes either.
To determine how aggregation queries are executed and whether or not indexes are being utilized, you can pass { explain: true } as an option to the aggregation method. This will produce an explain output with a lot of query-processing details.
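For example (a minimal sketch, using a hypothetical orders collection):

db.orders.aggregate(
  [
    { $match: { status: 'shipped' } },   // can use an index on "status"
    { $group: { _id: '$customerId', total: { $sum: '$amount' } } },
  ],
  { explain: true }   // returns the query plan instead of the results
);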

The $match operator is able to utilize indexes. Operators that use indexes must be at the front of your pipeline. Similarly, you want to put $sort stages as close to the front as possible: performance degrades when sorting isn't able to use an index. For this reason, make sure your sort stages come before any kind of transformation, so that indexes can be used for sorting, as in the sketch below.
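A sketch of that ordering (again with the hypothetical orders collection, assuming an index on { status: 1, orderDate: -1 }):

db.orders.aggregate([
  { $match: { status: 'shipped' } },                    // index-friendly filter first
  { $sort: { orderDate: -1 } },                         // sort while the index can still be used
  { $project: { _id: 0, orderDate: 1, amount: 1 } },    // transformations only afterwards
]);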

If you're doing a $limit and a $sort, make sure that they're near each other and at the front of the pipeline. The server is then able to do a top-k sort: it only allocates memory for the final number of documents. This does not need indexes!
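For instance (a sketch, still with the hypothetical orders collection):

db.orders.aggregate([
  { $sort: { amount: -1 } },   // sort...
  { $limit: 10 },              // ...followed directly by a limit enables a top-k sort,
                               // so only 10 documents are kept in memory for sorting
]);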

Performance and memory consumption

Your results are all subject to the 16-megabyte document limit that exists in MongoDB. An aggregation generally outputs a single document, and that single document is subject to this limit. The limit does not apply to documents as they flow through the pipeline. The best way to mitigate this issue is to use $limit and $project to reduce the size of your result.
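A small sketch of that mitigation (hypothetical orders collection):

db.orders.aggregate([
  { $project: { _id: 0, customerId: 1, amount: 1 } },   // keep only the fields you need
  { $limit: 1000 },                                     // cap the number of documents returned
]);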

Another limitation is that each stage in your pipeline has a 100-megabyte limit on RAM usage. The best way to mitigate this is to ensure that your largest stages are able to utilize indexes. If you're still running into the 100-megabyte limit even with indexes, there's an additional way to get around it: specify { allowDiskUse: true } on your aggregation query. This allows the server to spill to disk rather than doing everything in memory. A word of warning: this is an absolute last-resort measure. Hard drives are dramatically slower than memory, so by spilling to disk you're going to see serious performance degradation.
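As a sketch (hypothetical orders collection again):

db.orders.aggregate(
  [
    // a large group stage that may exceed the 100 MB per-stage memory limit
    { $group: { _id: '$customerId', total: { $sum: '$amount' } } },
  ],
  { allowDiskUse: true }   // last resort: let the stage spill to disk
);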

Cheers,
Michael
