Overlapping dateranges

I need to calculate a resource utilization ratio on a per-minute basis. The data has overlapping date ranges. I have to remove overlapping minutes to get the right ratio. Has anybody clue how to solve this on the aggregation framework? Is there any other function/technique to solve this on MongoDB?

Below is an example of how I can pull data from the original source.

{
    "resource": "room123",
    "reservations": [
        {"startTime": "2020-11-28 08:00:00.000Z", "endTime": "2020-11-28 09:15:00.000Z"},
        {"startTime": "2020-11-28 09:00:00.000Z", "endTime": "2020-11-28 12:15:00.000Z"},
        {"startTime": "2020-11-28 13:00:00.000Z", "endTime": "2020-11-28 15:00:00.000Z"},
        {"startTime": "2020-11-28 14:30:00.000Z", "endTime": "2020-11-28 15:45:00.000Z"}
    ]
}

Hi @Pasi,

Welcome to MongoDB community!

You should be able to do it with $map and on “in” expression use a $let to calculate if previous endTime is higher than current startTime and replace the startTime.

Please note that if the reservations array is not guaranteed to be sorted by startDate you will need to unwind /sort / group back.

In 4.4 you can use $function to perform this task in a more convenient way

Having said that, it feels like if the arrays are not massive doing this operation on the client side where you have more flexibility with code, just run from start of the array to end and calculate the overlap… You can use aggregation to make sure array is properly sorted for easier code.

Best
Pavel

This is an interesting problem but let me check if there are enough details here. What does it mean to remove overlapping ranges? For this room123 is it that you just need to know how many hours/minutes it was “reserved” and in this specific example even though the reservations are for 75, 195, 120 and 75 minutes because of overlaps, there’s a total 420 minutes used and not 465?

And another question - will this array span a single day or multiple days?

I’m going to make an assumption that each array represents one day (though it’s not necessary) and that you want total minutes of the day the room was reserved. I’ve got some made up sample data in addition to your example document that I used to test the code. I defined a couple of functions to make the pipeline a little more readable - these definitions are in JS because I’m using the mongo shell but of course it can be done in any language, since the functions just generate aggregation expressions.

The definition for sortArray function definition is in github

diffInMinutes = function(t1, t2) {
     return {$divide: [ {$subtract:[{$toDate:t2},{$toDate:t1}]}, 1000*60 ]}; 
}

// sample data
db.rooms.find()
{ "_id" : ObjectId("5fc7cb106214f82648ddefe9"), "resource" : "room123", "reservations" : [ { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 09:15:00.000Z" }, { "startTime" : "2020-11-28 09:00:00.000Z", "endTime" : "2020-11-28 12:15:00.000Z" }, { "startTime" : "2020-11-28 13:00:00.000Z", "endTime" : "2020-11-28 15:00:00.000Z" }, { "startTime" : "2020-11-28 14:30:00.000Z", "endTime" : "2020-11-28 15:45:00.000Z" } ] }
{ "_id" : ObjectId("5fc7dc476214f82648ddefea"), "resource" : "room555", "reservations" : [ { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 17:15:00.000Z" }, { "startTime" : "2020-11-28 09:00:00.000Z", "endTime" : "2020-11-28 12:15:00.000Z" }, { "startTime" : "2020-11-28 13:00:00.000Z", "endTime" : "2020-11-28 14:00:00.000Z" } ] }
{ "_id" : ObjectId("5fc7dd2f6214f82648ddefeb"), "resource" : "room99", "reservations" : [ { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 12:25:00.000Z" }, { "startTime" : "2020-11-28 09:00:00.000Z", "endTime" : "2020-11-28 12:15:00.000Z" }, { "startTime" : "2020-11-28 15:00:00.000Z", "endTime" : "2020-11-28 16:00:00.000Z" } ] }
{ "_id" : ObjectId("5fc7dd5a6214f82648ddefec"), "resource" : "roomA", "reservations" : [ { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 09:25:00.000Z" }, { "startTime" : "2020-11-28 11:00:00.000Z", "endTime" : "2020-11-28 12:15:00.000Z" }, { "startTime" : "2020-11-28 15:30:00.000Z", "endTime" : "2020-11-28 16:00:00.000Z" } ] }
{ "_id" : ObjectId("5fc7e58c6214f82648ddefed"), "resource" : "roomB", "reservations" : [ { "startTime" : "2020-11-28 09:00:00.000Z", "endTime" : "2020-11-28 09:25:00.000Z" }, { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 08:15:00.000Z" } ] }
{ "_id" : ObjectId("5fc7e62e6214f82648ddefee"), "resource" : "roomC", "reservations" : [ { "startTime" : "2020-11-28 08:45:00.000Z", "endTime" : "2020-11-28 09:20:00.000Z" }, { "startTime" : "2020-11-28 09:00:00.000Z", "endTime" : "2020-11-28 09:25:00.000Z" }, { "startTime" : "2020-11-28 08:00:00.000Z", "endTime" : "2020-11-28 08:15:00.000Z" } ] }

db.rooms.aggregate([
    {$set:{reservations: sortArray("$reservations", "startTime", true)}},  
    {$set:{reservations: {$reduce:{
        input:{$slice:["$reservations",1,{$size:"$reservations"}]},
        initialValue:{i:0, r:{$slice:["$reservations",0,1]}},
        in:{$cond:{
            if:{$gt:["$$this.startTime",{$arrayElemAt:["$$value.r.endTime","$$value.i"]}]}, 
            then:{i:{$add:[1,"$$value.i"]}, r:{$concatArrays:["$$value.r", ["$$this"]]}}, 
            else: {i:"$$value.i", r:{$concatArrays:[
                           {$cond:{if:{$eq:["$$value.i",0]},then:[],else:{$slice:["$$value.r",0,"$$value.i"]}}}, 
                           [{startTime:{$arrayElemAt:["$$value.r.startTime","$$value.i"]} , endTime:{$max:["$$this.endTime", {$arrayElemAt:["$$value.r.endTime","$$value.i"]}]}}]
            ]}}
        }}
    }}}}, 
    {$project:{_id:0, resource:1, diffSum:{$sum:{$map:{input:"$reservations.r", in: diffInMinutes("$$this.startTime", "$$this.endTime") }}}}})
])
{ "resource" : "room123", "diffSum" : 420 }
{ "resource" : "room555", "diffSum" : 555 }
{ "resource" : "room99", "diffSum" : 325 }
{ "resource" : "roomA", "diffSum" : 190 }
{ "resource" : "roomB", "diffSum" : 40 }
{ "resource" : "roomC", "diffSum" : 55 }

To state what this is doing in English - first it makes sure reservations array is sorted by startTime, then it iterates over reservations, comparing starting time to previous ending time and constructing an array of non-overlapping ranges. The last stage just sums up the differences between starting and ending times in minutes and formats the output fields.

I’m using 4.4 so I took advantage of a couple of new expressions ($first and $last) and rewrote the middle stage to be a little bit shorter:

db.rooms.aggregate([
    {$set:{reservations: sortArray("$reservations", "startTime", true)}}, 
    {$set:{reservations: {$reduce:{
        input:{$slice:["$reservations",1,{$size:"$reservations"}]},
        initialValue:[{$first:"$reservations"}],
        in:{$cond:{
            if:{$gt:["$$this.startTime",{$last:"$$value.endTime"}]}, 
            then:{$concatArrays:["$$value", ["$$this"]]}, 
            else: {$concatArrays:[
                           {$cond:{if:{$eq:[{$size:"$$value"},1]},then:[],else:{$slice:["$$value",0,{$add:[-1,{$size:"$$value"}]}]}}}, 
                           [{startTime:{$last:"$$value.startTime"} , endTime:{$max:["$$this.endTime", {$last:"$$value.endTime"}]}}]
            ]} 
        }}
    }}}}, 
    {$project:{_id:0, resource:1, diffSum:{$sum:{$map:{input:"$reservations", in: diffInMinutes("$$this.startTime", "$$this.endTime") }}}}}
])
3 Likes

Since people are still finding this answer I thought I’d mention that starting in version 5.2 there is a $sortArray expression in the aggregation negating the need to use a more complex function I’m using in this answer.

Asya