Finding average while updating documents

I am wondering if I can compute the average and store it along with my data.
My code is this:

for file in sorted_files:
    df = process_file(file)

    for row,item in df.iterrows():
        data_dict = item.to_dict()
        mycol1.update_one(
            {"nsamples": {"$lt": 13}},
            {
                "$push": {"samples": data_dict},
                "$min": {"first": data_dict['timestamp1'],"minid13":data_dict['id13']},
                "$max": {"last": data_dict['timestamp1'],'maxid13':data_dict['id13']},
                "$inc": {"nsamples": 1,"totid13":data_dict['id13']}
            },
            upsert=True
        )

My data look like this:

{'_id': ObjectId('6068da8878fa2e568c42c7f1'),
 'first': datetime.datetime(2018, 1, 24, 14, 5),
 'last': datetime.datetime(2018, 1, 24, 15, 5),
 'maxid13': 12.5,
 'minid13': 7.5,
 'nsamples': 13,
 'samples': [{'c14': 'C',
              'id1': 3758.0,
              'id10': 0.0,
              'id11': 274.0,
              'id12': 0.0,
              'id13': 7.5,
              'id15': 0.0,
              'id16': 73.0,
              'id17': 0.0,
              'id18': 0.342,
              'id19': 6.3,
              'id20': 1206.0,
              'id21': 0.0,
              'id22': 0.87,
              'id23': 0.0,
              'id6': 2.0,
              'id7': -79.09,
              'id8': 35.97,
              'id9': 5.8,
              'timestamp1': datetime.datetime(2018, 1, 24, 14, 5),
              'timestamp2': datetime.datetime(2018, 1, 24, 9, 5)},
             {'c14': 'C',
              'id1': 3758.0,
              'id10': 0.0,
              'id11': 288.0,
              'id12': 0.0,
              'id13': 8.4,
              'id15': 0.0,
              'id16': 71.0,
              'id17': 0.0,
              'id18': 0.342,
              'id19': 6.3,
              'id20': 1207.0,
              'id21': 0.0,
              'id22': 0.69,
              'id23': 0.0,
              'id6': 2.0,
              'id7': -79.09,
              'id8': 35.97,
              'id9': 6.2,
              'timestamp1': datetime.datetime(2018, 1, 24, 14, 10),
              'timestamp2': datetime.datetime(2018, 1, 24, 9, 10)},
              ...

I use totid13 for that purpose, but if I need to find the average across many documents it is not very helpful.
I tried something like this:

for file in sorted_files:
    df = process_file(file)
    #df.reset_index(inplace=True)  # Reset Index
    #data_dict = df.to_dict('records')  # Convert to dictionary
    # row is the row index and item holds the contents of that row
    for row,item in df.iterrows():
        data_dict = item.to_dict()
        mycol1.update_one(
            {"nsamples": {"$lt": 13}},
            {
                "$push": {"samples": data_dict},
                "$min": {"first": data_dict['timestamp1'],"minid13":data_dict['id13']},
                "$max": {"last": data_dict['timestamp1'],'maxid13':data_dict['id13']},
                "$avg":{"avg_id13":data_dict['id13']},
                "$inc": {"nsamples": 1,"totid13":data_dict['id13']}
            },
            upsert=True
        )

But the output is:

pymongo.errors.WriteError: Unknown modifier: $avg. Expected a valid update modifier or pipeline-style update specified as an array, full error: {'index': 0, 'code': 9, 'errmsg': 'Unknown modifier: $avg. Expected a valid update modifier or pipeline-style update specified as an array'}

Thanks in advance!

Hello @harris, the statement you are using with collection.update_one is not valid because there is no $avg update operator (see the available Update Operators). The error message says as much.

You have to use a pipeline to do such an update operation - see Update with Aggregation Pipeline.
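
Here is a minimal sketch of what that could look like with pymongo, assuming MongoDB 4.2+ and the field names from your documents (avg_id13 is recomputed from the samples array in a second $set stage, so no separate accumulator field is needed):

mycol1.update_one(
    {"nsamples": {"$lt": 13}},
    [
        {"$set": {
            # append the new sample; $ifNull covers the upsert case where the field does not exist yet
            "samples": {"$concatArrays": [{"$ifNull": ["$samples", []]}, [data_dict]]},
            "nsamples": {"$add": [{"$ifNull": ["$nsamples", 0]}, 1]},
            "totid13": {"$add": [{"$ifNull": ["$totid13", 0]}, data_dict["id13"]]},
            "first": {"$min": [{"$ifNull": ["$first", data_dict["timestamp1"]]}, data_dict["timestamp1"]]},
            "last": {"$max": [{"$ifNull": ["$last", data_dict["timestamp1"]]}, data_dict["timestamp1"]]},
            "minid13": {"$min": [{"$ifNull": ["$minid13", data_dict["id13"]]}, data_dict["id13"]]},
            "maxid13": {"$max": [{"$ifNull": ["$maxid13", data_dict["id13"]]}, data_dict["id13"]]},
        }},
        # this stage runs after the one above, so it sees the updated samples array
        {"$set": {"avg_id13": {"$avg": "$samples.id13"}}},
    ],
    upsert=True,
)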

Thank you! I didn't know that about "$avg": {"avg_id13": data_dict['id13']}. I have one more thing to ask: is it fine that I use update_one instead of update_many for finding the max and min? I did it like that because I thought it finds the max and min for the documents with nsamples less than 13.

Yes, it is fine to use update_one. Please note that update_one will always update at most one document (or none, if no document matches the specified filter). Even if more than one document matches, only one of them will be updated.
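
For example (the touched field below is hypothetical, just something harmless to set; matched_count shows how many documents the filter matched):

# update_one modifies at most one matching document
res_one = mycol1.update_one({"nsamples": {"$lt": 13}}, {"$set": {"touched": True}})
print(res_one.matched_count)   # at most 1

# update_many modifies every matching document
res_many = mycol1.update_many({"nsamples": {"$lt": 13}}, {"$set": {"touched": True}})
print(res_many.matched_count)  # every document with nsamples < 13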