Watch keynotes and sessions from MongoDB.live, our virtual developer conference.

Merging two arrays into one and calculate the difference of each value

I am trying to manipulate a dataset to make it easy to display in mongoCharts. There are two sets of scores firstscores and last scores, Each contains a set of score names and score values. I want to be able to calculate the difference between first and last values for each of A, ,B and C.

Example input…

{
    "firstscores" : [ 
        {
            "content" : {
                "scores" : [ 
                   
                    {
                        "score_name" : "A",
                        "score_value" : 1
                    }, 
                    {
                        "score_name" : "B",
                        "score_value" : 2
                    }, 
                    {
                        "score_name" : "C",
                        "score_value" : 3
                    }
                ]
            }
        }
    ],
 "lastestscores" : [ 
        { 
            "content" : {
                "scores" : [ 
                    {
                        "score_name" : "A",
                        "score_value" : 9
                    }, 
                    {
                        "score_name" : "B",
                        "score_value" : 8
                    }, 
                    {
                        "score_name" : "C",
                        "score_value" : 7
                    }
                ]
            }
        }
    ]
}

Desired Output

"allscores" : [{
    "content" : {
        "scores" : [ 
            {
                "score_name" : "A",
                "first_score_value" : 1,
                "last_score_value" : 9
                "diff": 8
            }, 
            {
                "score_name" : "B",
                "first_score_value" : 2,
                "last_score_value" : 8
                "diff": 6
                }, 
            {
                "score_name" : "C",
                "first_score_value" : 3,
                "last_score_value" : 7
                "diff": 4
            }
        ]
    }
}]

Note

                "first_score_value" : 1,
                "last_score_value" : 9

are optional/nice to haves. Its the diff that I’m really after.

So far I've tried adding the latest score to the first score array , 
  {$addFields: {
   'firstscores.content.scores.lastscore_name.0': { $arrayElemAt: ['$lastscores.content.scores.score_name',0]}
   }}

gives all the latestscore values

0:Object
score_name:"A"
score_value:1
lastscore_name:Object
    0:"A"
    1:"B"
    2:"C"

Also tried a combination of zips

  {$addFields: {  
  both: {$zip:{inputs: ['$firstscores.content.scores','$firstscores.content.scores.score_value','$lastscores.content.scores.score_value']}}
}}
, {$addFields: {
botha:  {$arrayElemAt: ['$both',0]},
}}, 
{$addFields: {
      firstandlastscores: {$zip:{inputs: [{ $arrayElemAt: ['$botha',0]},{ $arrayElemAt: ['$botha',1]},{ $arrayElemAt: ['$botha',2]}]}}
    }}]

but lost the attribute names,

     firstandlastscores:Array
    0:Array
         0:Object
             score_name:"A"
             score_value:1
         1:1
         2:9
     1:Array
         0:Object
             score_name:"B"
             score_value_string:2
         1:2
         2:8
     2:Array
         0:Object
             score_name:"C"
             score_value_string:3
         1:3
         2:7

mergeObjects overwrote the fields of the same name .

I think map reduce may be the way to go but I have not managed to get any where near with that option.

Any guidance or pointers gratefully received.

Hello, @Neil_Albiston1 !
‘allscores’, ‘firstscores’ and ‘lastestscores’ arrays always hold one object?

Also, please:

  • refine the description of your topic, so code examples were embedded as code, not quotes.
  • make sure that you provided proper examples - there is inconsistency in the type of value for score_value prop.
  • format your code examples, so it would be easier to read them.

Yes, ‘allscores’, ‘firstscores’ and ‘lastestscores’ arrays always hold one object.

I’ve reformatted as you suggested and fixed the value inconsistency.

Thank you

This aggregation should be a perfect fit for your situation.

db.test1.aggregate([
  {
    $project: {
      merged_scores: {
        $concatArrays: [
          // order or concatenated arrays is important here
          // makes sense only if array always contain 1 single object
          {
            $arrayElemAt: ['$firstscores.content.scores', 0],
          },
          {
            $arrayElemAt: ['$lastestscores.content.scores', 0],
          },
        ],
      },
    },
  },
  {
    $unwind: '$merged_scores',
  },
  {
    $group: {
      _id: '$merged_scores.score_name',
      first_score_value: {
        $first: '$merged_scores.score_value',
      },
      last_score_value: {
        $last: '$merged_scores.score_value',
      },
    }
  },
  {
    $project: {
      _id: false,
      first_score_value: true,
      last_score_value: true,
      score_name: '$_id',
      diff: {
        // added this conversion, for the case
        // numeric values are stored as strings
        { $toDouble: '$last_score_value' },
        { $toDouble: '$first_score_value' }
      }
    }
  },
  // all the calculations are done here,
  // later stages are purposed to re-structure output document
  {
    // collect all score objects into one array
    $group: {
      _id: null,
      scores: {
        $push: '$$CURRENT',
      },
    },
  },
  {
    // remove unneeded prop
    $unset: ['_id'],
  },
  {
    // add additional props to the document structure
    $project: {
      _id: false,
      allscores: [
        {
          content: {
            scores: '$$CURRENT',
          }
        }
      ]
    }
  }
]);

If your array always contains one value, why not convert it to an object? I can reduce some overhead in your aggregations. For example, $arrayElemAt would not be needed in the current aggregation, if you had an object instead of array.

Looks promising . Should the first part of the pipeline be an addFields ?

You can use $addFields instead of $projection only if you need to output initial data as well.
But, that will also require modifications to other stages in the pipeline.

It did take me some time to knit that solution into the existing code. I had simplified the input and output,
…but …

The aggregate pipeline output is perfect. (… and you’ve saved me the task of understanding map reduce. )

Thank you

Hi @Neil_Albiston1, @slava,

Here is just another way of getting the desired output:

db.collection.aggregate([
  { 
      $addFields: { 
          firstscores: { $arrayElemAt: [ "$firstscores", 0 ] },
          lastestscores: { $arrayElemAt: [ "$lastestscores", 0 ] }
      }   
  },
  { 
      $project: { 
          content: { 
              $map: {
                  input: "$firstscores.content.scores", as: "f",
                  in: {
                      $let: {
                          vars: { varin: { 
                                    $arrayElemAt: [ { $filter: {
                                                         input: "$lastestscores.content.scores", as: "n", 
                                                         cond: { $eq: [ "$$n.score_name", "$$f.score_name" ] }
                                    } }, 0 ] 
                          } },
                          in: { 
                              score_name: "$$f.score_name",
                              first_score_value: "$$f.score_value",
                              last_score_value: "$$varin.score_value",  
                              diff: { $subtract: [ "$$varin.score_value", "$$f.score_value" ] } 
                          }
                      }
                  }
              }
          }
      }
  },
  { 
      $project: { allscores: [ "$content" ] } 
  }
]).pretty()

Nice solution. I really must get the hang of map. I could not work out how to pass two arrays into the function. … Now I know how.
Thanks.

MapReduce is not a good fit for this situation, because it should be used in situations, where you need to multiple emits per same key in order to do calculations in reduce stage and when it is hard to achieve the result with aggregation pipeline.

Moreover, mapReduce is less performant, that aggregation pipeline. To get the same result for 1 document from your example it took about 0.5 seconds. This is mainly because it ran javascript code to calculate the results and because of the output to another collection. And to get that result to your application code you will need to make additional call to fetch the result from db. So, I do not recommend you to use it.

Here, I did this mapReduce version of the solution to illustrate the differences in both approaches:

// 1. define map-function
function mapFn() {
  const first = this.firstscores[0].content.scores;
  const last = this.lastestscores[0].content.scores;

  const result = first.map(firstScore => {
    const lastScore = last.find(item => {
      return item.score_name === firstScore.score_name;
    });

    const diff =  lastScore.score_value ? 
                  lastScore.score_value - firstScore.score_value : 
                  firstScore.score_value;

    // lastScore may be undefined, that is why there are 
    // some 'N/A' placeholders used down below
    return {
      score_name: firstScore.score_name,
      first_score_value: firstScore.score_value || 'N/A',
      last_score_value: lastScore.score_value || 'N/A',
      diff,
    };
  });

  // format the new document structure:
  const allScores = [];
  allScores.push({
    content: {
      scores: result,
    },
  });

  emit(this._id, { allScores });
}

// 2. define reduce-function
// Since we do not need to group any documents,
// and to provide the result per each document,
// we do not need this function, 
// that is why we can leave it without any logic
// but we need it for the .mapReduce fuction
// as it is a required parameter
function reduceFn(key, values) {
  /* no logic */
}

// 3. Call .mapReduce method on the source collection
db.test1.mapReduce(mapFn, reduceFn, {
  out: 'test2',
});
// the result will be written to 'test2' collection

Additionally, the output will be a bit different, as you can not exclude ‘_id’ prop (because the document will be written to a collection) and the mapReduce result will be always stored in ‘value’ prop - so the output document structure will always be:

{ _id: <id>, value: <mapReduceResult> }

Output sample:

[
  {
    "_id" : ObjectId("5f021ff1c87e385fe0c9bcb0"),
    "value" : {
      "allScores" : [
        {
          "content" : {
            "scores" : [
              {
                "score_name" : "A",
                "first_score_value" : 1,
                "last_score_value" : 9,
                "diff" : 8
              },
              {
                "score_name" : "B",
                "first_score_value" : 2,
                "last_score_value" : 8,
                "diff" : 6
              },
              {
                "score_name" : "C",
                "first_score_value" : 3,
                "last_score_value" : 7,
                "diff" : 4
              }
            ]
          }
        }
      ]
    }
  }
]
1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.