Add field on document, compared to the rest of documents

I’ve been thinking about this example:

{_id:"doc1", score:29, length:7, time:3},
{_id:"doc2", score:19, length:11, time:13}
{_id:"doc3", score:3, length:7, time:13}
...

Is there any way I could set up a query to retrieve, on each document, a field like this Xproperty:"max value"?

{_id:"doc1", score:29, length:7, time:3, scoreProp:"maxValue"},
{_id:"doc2", score:19, length:11, lengthProp:"maxValue", time:13}
{_id:"doc3", score:3, length:7, time:13}

It requires a scan over the array of documents, so I guess it’s only using $group we can start. Maybe push out the _id or _ids. But, what then?

I tried using $switch but it does not work as plainJS (breaks out by default after a match.)

Tried this approach too, which doesn’t look nice

1 Like

I think its better to do in your client side language, or You can use the $function operator to define custom functions to implement behavior not supported by the MongoDB Query Language. See also $accumulator Start from MongoDB v4.4.

You can change JS code inside $function as per your understanding,

Stage 1: Group all in a array
Stage 2:

{
    $project: {
        allDocs: {
            $function: {
                body: function(allDocs) {
                    var length, score, time;
                    for (var i=0; i<allDocs.length; i++) {
                        if (!length || allDocs[i].length > length.length) length = allDocs[i];
                        if (!score || allDocs[i].score > score.score) score = allDocs[i];
                        if (!time || allDocs[i].time > time.time) time = allDocs[i];
                    }
                    for (var i=0; i<allDocs.length; i++) {
                        if (allDocs[i]._id == length._id) allDocs[i].lengthProp = "maxValue";
                        if (allDocs[i]._id == score._id) allDocs[i].scoreProp = "maxValue";
                        if (allDocs[i]._id == time._id) allDocs[i].timeProp = "maxValue";
                    }
                    return allDocs;
                },
                args: ["$allDocs"],
                lang: "js"
            }
        }
    }
},

Stage 3: unwind array
Stage 4: replace object to root

You can skip 3rd and 4th stages, query will return only single document and we know the field name is allDocs, can directly access like queryResult[0]['allDocs'].

1 Like

Very interesting, never tried function yet.
Thanks!

I would like to add two comments despite the thread being solved.

First, personally, I would reverse the added field name and value. Rather than scoreProp:"maxValue" I would have created maxField:"score". My reasoning is that documents will have a consistent schema. This helps since you can create an index of this new field and simply queries. Having a polymorphic schema is not a problem (and is something I like very much about MongoDB) but when there is an alternative I prefer the consistent version.

My second comment is only relevant if the transformation is a one time thing to update a collection because a new requirement has popped up. I would do the transformation in multiple passes simply because I find it is simpler to implement.

  1. add a field maxValue : the value of score field and another field maxField : "score".
  2. for all other fields modify maxValue and maxField for documents where the field to check is higher than maxValue

So if we start with

{_id:"doc1", score:29, length:7, time:3},
{_id:"doc2", score:19, length:11, time:13}
{_id:"doc3", score:3, length:7, time:13}

In pass 1 all documents are modified to become

{_id:"doc1", score:29, length:7, time:3, max : { v:29, k:"score" } },
{_id:"doc2", score:19, length:11, time:13, max: { v:19, k:"score"} },
{_id:"doc3", score:3, length:7, time:13, max: { v:3, k:"score"} }

In pass 2 we only modify documents where length > maxValue to get

{_id:"doc1", score:29, length:7, time:3, max : { v:29, k:"score"},
{_id:"doc2", score:19, length:11, time:13, max : { v:19, k:"score"}
{_id:"doc3", score:3, length:7, time:13, max : { v:7, k:"length"}

Since we have 3 source fields we have 3 passes and the third pass will update document where time > max.v to get the final result

{_id:"doc1", score:29, length:7, time:3, max : { v:29, k:"score"}},
{_id:"doc2", score:19, length:11, time:13, max : { v:19, k:"score"}},
{_id:"doc3", score:3, length:7, time:13, max : { v:13, k:"time"}}

There are many drawbacks with this approach. One of them is that some documents are modified multiple time. Another one is that all documents have the new schema after the first pass but they contain invalid data until all passes are completed.

To the keen observer that saw the switch from maxValue:number and maxField:string to max:{v:number,k:string}, I realized while writing that if I have max as an object I could $project : { max : 1 } to get both the maximum value and the key of the field that has the maximum value.

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.