Only one item per group

{
    "_id" : ObjectId("5f6a81a9690a0787557f797e"),
    "attributes" : {
        "first_name" : "John",
        "last_name" : "Doe",
        "email" : "john.doe@example.net",
    },
    "events" : [ 
        {
            "id" : "unique-id-1",
            "event" : "clicked",
            "event_data" : {
                "some-data" : 1
            }
            "created_at" : ISODate("2020-01-01T10:10:10.000Z")
        }, 
        {
            "id" : "unique-id-2",
            "event" : "opened",
            "event_data" : {
                "some-data" : 1
            }
            "created_at" : ISODate("2020-01-01T10:10:10.000Z")
        }, 
        {
            "id" : "unique-id-3",
            "event" : "add_to_cart",
            "event_data" : {
                "some-data" : 1
            },
            "created_at" : ISODate("2020-01-01T10:10:10.000Z")
        }, 
        {
            "id" : "unique-id-4",
            "event" : "add_to_cart",
            "event_data" : {
                "some-data" : 1,
            },
            "created_at" : ISODate("2020-02-01T10:10:10.000Z")
        }, 
    ]
}

This is my query:

db.clients.aggregate([
    { $match: { "_id" : ObjectId("5f6a81a9690a0787557f797e") }},
    { $unwind : "$events" },
    { $project: {
        "_id": "$events.id",
        "event": "$events.event",
        "event_data": "$events.event_data",
        "created_at": "$events.created_at"
    }},
    { $sort: { "_id" : -1 }}
])

And this is where I hit a roadblock. I am able to get events sorted by the created_at. However I’d like to show only last events (somehow group it by event).

In my example I would need t return unique-id-1, unique-id-2, unique-id-4 (because it is created after unique-id-3).

What would be the best approach in terms of performance? Also, at this point I need it to be ‘grouped’ by one event, but maybe in the future I’d like to select last 5 records of each event (last 5 add_to_cart, last 5 opened, etc.)

Thank you.

Hi @jellyx ,

You should be using a $limit stage after you sort by events.created_date : -1 .

For running multiple different groups in one aggregation set I would recommend using $facet

Best
Pavel

Thank you for your answer.

Limit is definitely not an option, but facet is great. Totally forgot about it.

Thanks!

Hi @jellyx,

Not sure why limit is not the way to get last x documents. Perhaps, I didn’t understand the exact outcome you want can you post the desired output?

Best
Pavel

@Pavel_Duchovny

yes, I guess that’s the reason. Here is my explanation.

If I have 100 events. Let’s say 95 od them are opened and 5 are add_to_cart, respectively added.

Now, if I limit to show 2 events, then It means I’ll get last 2 opened events. And I want to get 1 opened, 1 add_to_cart, etc.

Hope it makes sense. But, with facet works great.

@jellyx,

Ok I get now what you have requested. I see how you can do it with a $facet, however, the aggregation framework is very powerful and you can achieve what you need with a standard grouping stages game and $sort and $slice as following (see my comments on each stage):

db.clients.aggregate( [{
 // Match the needed doc
    $match: {
        _id: ObjectId('5f6a81a9690a0787557f797e')
    }
}, 
// Unwind events to go over each event
{
    $unwind: '$events'
}, 
// Sort all events by date
{
    $sort: {
        "events.created_at": -1
    }
}, 
// Group the events by event name and push them to a new array of ordered events by type
{
    $group: {
        _id: "$events.event",
        events: {
            $push: "$$CURRENT"
        }
    }
}, 
// Construct the document back to original shape and slice the created arrays by "n" , eg. first 2 most recent elements
{
    $project: {
        _id: {
            $arrayElemAt: ['$events._id', 0]
        },
        event_type: "$_id",
        attributes: {
            $arrayElemAt: ['$events.attributes', 0]
        },
        events: {
            $slice: ['$events', 0, 2]
        }

    }
},
// reconstruct again to have the original view with events array (could use $addFields as well)
 {
    $project: {
        _id: 1,
        attributes: 1,
        event_type: 1,
        events: "$events.events"
    }
}]);

This aggregation with my test data retrieve 2 most recent events for every unique type:

[
  {
    _id: 5f6a81a9690a0787557f797e,
    event_type: 'add_to_cart',
    attributes: {
      first_name: 'John',
      last_name: 'Doe',
      email: 'john.doe@example.net'
    },
    events: [
      {
        id: 'unique-id-6',
        event: 'add_to_cart',
        event_data: { 'some-data': 1 },
        created_at: 2020-02-01T10:10:10.000Z
      },
      {
        id: 'unique-id-8',
        event: 'add_to_cart',
        event_data: { 'some-data': 1 },
        created_at: 2020-02-01T10:10:10.000Z
      }
    ]
  },
  {
    _id: 5f6a81a9690a0787557f797e,
    event_type: 'opened',
    attributes: {
      first_name: 'John',
      last_name: 'Doe',
      email: 'john.doe@example.net'
    },
    events: [
      {
        id: 'unique-id-2',
        event: 'opened',
        event_data: { 'some-data': 1 },
        created_at: 2020-01-01T10:10:10.000Z
      },
      {
        id: 'unique-id-3',
        event: 'opened',
        event_data: { 'some-data': 1 },
        created_at: 2020-01-01T10:10:10.000Z
      }
    ]
  },
  {
    _id: 5f6a81a9690a0787557f797e,
    event_type: 'clicked',
    attributes: {
      first_name: 'John',
      last_name: 'Doe',
      email: 'john.doe@example.net'
    },
    events: [
      {
        id: 'unique-id-1',
        event: 'clicked',
        event_data: { 'some-data': 1 },
        created_at: 2020-01-01T10:10:10.000Z
      }
    ]
  }
]

Hope that helps!

Best regards,
Pavel

Thanks for the answer. That works too.

Here is how I did it:

db.clients.aggregate([
    {
         $match: { 
             "_id" : ObjectId("5f6a81a9690a0787557f797e") 
        }
    },
    { 
        $unwind : "$events" 
    },
    { 
        $project: {
            "_id": "$events.id",
            "event": "$events.event",
            "event_data": "$events.event_data",
            "created_at": "$events.created_at"
        }
    },
    { 
        $sort: { 
            "created_at" : 1 
        }
    },
    { 
        $facet : {
            "add_to_cart" : [
                {  $match: { "event" : "add_to_cart"  } },
                { $limit: 2 }
            ],
            "delivered" : [
                {  $match: {  "event" : "delivered" } },
                { $limit: 3 } // e.g.
            ]
        }
    }
])

Your option is better because I don’t need to ‘manually’ through backend add events in facet. Thanks.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.