Time-bucket document on time-series

Hi guys.I am doing my thesis for university of ioannina with subject:Performance evaluation of time-series data management across different database systems.More specifically i am benchmarking PostgreSQL vs Mongodb. All this time i was storing each acquired data as a single document but after i read herehttps://levelup.gitconnected.com/time-series-data-in-mongodb-and-python-fccfb6c1a923that i can create a document for bucketing of multiple consecutive data reads.The code the article gives for python is this:

data = {
  "pressure":945.65,
  "humidity":42.12,
  "temperature":28.41,
}
deviceId = 1
minute = datetime.utcnow().replace(second=0, microsecond=0)
    
db.time_bucket.update_one(
   {'deviceId': deviceId, 'd': minute},
   {
    '$push': {'samples': data},
    '$inc': {'nsamples': 1}
  },
  upsert=True
)

i want to create time based buckets ,specifically for every hour or more if needed.I also read here https://docs.mongodb.com/manual/tutorial/model-time-data/#example about the bucket pattern but i dont know what code to use with python pymongo.
My time-series data contains around 1.5millions rows from 11 files from 2010 to 2020 and look like this:

  _id:ObjectId("603fb0b7142a0cbb439ae2e1")
    id1:3758
    id6:2
    id7:-79.09
    id8:35.97
    id9:5.5
    id10:0
    id11:-99999
    id12:0
    id13:-9999
    c14:"U"
    id15:0
    id16:99
    id17:0
    id18:-99
    id19:-9999
    id20:33
    id21:0
    id22:-99
    id23:0
    timestamp1:2010-01-01T00:05:00.000+00:00
    timestamp2:2009-12-31T19:05:00.000+00:00

All the attributes change every 5 minute expect the id1 which remains the same in every document.
The code i have used for importing the data after converting to df into the mongodb table is this:

files =  os.listdir('sampl/')
sorted_files =  sorted(files)

myclient = MongoClient("mongodb://localhost:27017/")
mydb1 = myclient["mongodbtime"]
mycol1 = mydb1["mongodbindextimestamp1"]

for file in sorted_files:
    df = process_file(file)
    data_dict = df.to_dict('records')  # Convert to dictionary
    mycol1.insert_many(data_dict)

Any help would be appreciated!Thanks in advance!

Here is the answer on how to insert data with bucket pattern in mongodb:

for file in sorted_files:
    df = process_file(file)
    for row,item in df.iterrows():
        data_dict = item.to_dict()
        id1=3758
        mycol1.update_many(
            {"id1":id1,"nsamples": {"$lt": 12}},
            {
                "$push": {"id24": data_dict},
                "$min": {"first": data_dict['timestamp1']},
                "$max": {"last": data_dict['timestamp1']},
                # "$min":{"minid13":data_dict['id13']},
                # "$max": {"maxid13": data_dict['id13']},
                "$inc": {"nsamples": 1}
            },
            upsert=True
        )
1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.