How to delete old documents only if there's a more recent document with the same value

Quite an interesting case. I have an enormous MongoDB collection with lots of documents. These are two of the fields (I changed the field names).

{
"pidNumber" : NumberLong(12103957251), 
"eventDate" : ISODate("2018-05-15T00:00:00.000+0000")
} 

I need to count all the instances where the date is older than 1 year but ONLY if there’s a more recent document with the same pidNumber.

So for example: if there’s only one document with pidNumber 1234 and it’s from three years ago, keep it (don’t count it). But if there’s also another document with pidNumber 1234 from two years ago, then count the three-year-old one.

Is it possible to do? Does anyone have an idea how to do it?

Thanks in advance!

Hello, @Jack_Smith! Welcome to the community!

If I understood you correctly, you need to count all the ‘pidNumber’ values in the collection that have two or more dates older than a specified date, right?

If so, this aggregation should give you the desired result:

db.your_collection.aggregate([
  {
    $group: {
      _id: '$pidNumber',
      dates: {
        $push: '$eventDate',
      },
    },
  },
  {
    $addFields: {
      totalOldDates: {
        $filter: {
          input: '$dates',
          cond: {
            // replace <oneYearAgoDate> with your date value;
            // $lt keeps only the dates older than that value
            $lt: ['$$this', <oneYearAgoDate>],
          },
        },
      },
    },
  },
  {
    $project: {
      hasEnoughOldDates: {
        $gt: [{ $size: '$totalOldDates' }, 1],
      },
    },
  },
  {
    $match: {
      hasEnoughOldDates: true,
    },
  },
  {
    $count: 'total',
  },
]);
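
Note that the pipeline above counts pidNumber values rather than documents. If the goal is to count the old documents themselves, as in the example with pidNumber 1234, something like the sketch below might be closer. It assumes that "more recent" means any strictly later eventDate for the same pidNumber, even one that is itself older than a year. The names buildStaleCountPipeline, countStaleInMemory, newest and stale are made up for illustration, and the plain-JavaScript function is only there to check the rule on a small sample without a server.

```javascript
// Sketch only: counts documents older than `cutoff` that have a strictly
// newer document with the same pidNumber. `cutoff` is a Date.
function buildStaleCountPipeline(cutoff) {
  return [
    {
      // One group per pidNumber: remember the newest date and all dates.
      $group: {
        _id: '$pidNumber',
        newest: { $max: '$eventDate' },
        dates: { $push: '$eventDate' },
      },
    },
    {
      // Keep dates that are both older than the cutoff and not the newest.
      $project: {
        stale: {
          $size: {
            $filter: {
              input: '$dates',
              cond: {
                $and: [
                  { $lt: ['$$this', cutoff] },
                  { $lt: ['$$this', '$newest'] },
                ],
              },
            },
          },
        },
      },
    },
    // Sum the per-pidNumber counts into a single total.
    { $group: { _id: null, total: { $sum: '$stale' } } },
  ];
}

// Plain-JavaScript version of the same rule, for checking a small sample.
// Assumes pidNumber is a plain number and eventDate is a Date.
function countStaleInMemory(docs, cutoff) {
  const newestByPid = new Map();
  for (const { pidNumber, eventDate } of docs) {
    const seen = newestByPid.get(pidNumber);
    if (seen === undefined || eventDate > seen) {
      newestByPid.set(pidNumber, eventDate);
    }
  }
  return docs.filter(
    ({ pidNumber, eventDate }) =>
      eventDate < cutoff && eventDate < newestByPid.get(pidNumber)
  ).length;
}

// Hypothetical usage in mongosh:
// const cutoff = new Date();
// cutoff.setFullYear(cutoff.getFullYear() - 1);
// db.your_collection.aggregate(buildStaleCountPipeline(cutoff), { allowDiskUse: true });
```

Because the first stage pushes every date of a pidNumber into one array, a pidNumber with a very large number of documents could run into the document size limit. In that case a different grouping strategy would be needed.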