Hi guys. I am doing my thesis at the University of Ioannina on the subject "Performance evaluation of time-series data management across different database systems". More specifically, I am benchmarking PostgreSQL vs MongoDB. Until now I was storing each acquired reading as a single document, but then I read here https://levelup.gitconnected.com/time-series-data-in-mongodb-and-python-fccfb6c1a923 that I can instead create one document that buckets multiple consecutive readings. The code the article gives for Python is this:
from datetime import datetime

# db is an existing pymongo Database handle
data = {
    "pressure": 945.65,
    "humidity": 42.12,
    "temperature": 28.41,
}
deviceId = 1
# truncate the current time to the start of the minute
minute = datetime.utcnow().replace(second=0, microsecond=0)
# push the reading into this minute's bucket, creating the bucket
# document if it does not exist yet
db.time_bucket.update_one(
    {'deviceId': deviceId, 'd': minute},
    {
        '$push': {'samples': data},
        '$inc': {'nsamples': 1}
    },
    upsert=True
)
I want to create time-based buckets, specifically one per hour (or longer if needed). I also read about the bucket pattern here https://docs.mongodb.com/manual/tutorial/model-time-data/#example, but I don't know what code to use with Python/pymongo.
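Based on the article, I guess an hourly version would just truncate the timestamp to the hour instead of the minute. Here is a sketch of what I mean (the field names deviceId, d, samples and nsamples follow the article; the 200-sample cap per bucket is just a guess of mine to keep documents bounded):

from datetime import datetime

def insert_reading(db, device_id, data, ts=None):
    ts = ts or datetime.utcnow()
    # truncate the timestamp to the start of the hour
    hour = ts.replace(minute=0, second=0, microsecond=0)
    db.time_bucket.update_one(
        # match this hour's bucket only while it still has room; once it
        # is full the filter fails and the upsert starts a fresh document
        {'deviceId': device_id, 'd': hour, 'nsamples': {'$lt': 200}},
        {
            '$push': {'samples': data},
            '$inc': {'nsamples': 1}
        },
        upsert=True
    )

Is that the right approach, or is there a more idiomatic way to do the bucket pattern with pymongo?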
My time-series data contains around 1.5 million rows from 11 files, covering 2010 to 2020, and looks like this:
_id:ObjectId("603fb0b7142a0cbb439ae2e1")
id1:3758
id6:2
id7:-79.09
id8:35.97
id9:5.5
id10:0
id11:-99999
id12:0
id13:-9999
c14:"U"
id15:0
id16:99
id17:0
id18:-99
id19:-9999
id20:33
id21:0
id22:-99
id23:0
timestamp1:2010-01-01T00:05:00.000+00:00
timestamp2:2009-12-31T19:05:00.000+00:00
All the attributes change every 5 minutes except id1, which remains the same in every document.
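Since the readings arrive every 5 minutes, an hourly bucket would hold at most 12 samples. I imagine the bucketed documents should end up looking roughly like this (just my sketch; the hour, samples and nsamples field names are my own guesses):

{
    _id: ObjectId("..."),
    id1: 3758,                               // bucket key, constant per file
    hour: 2010-01-01T00:00:00.000+00:00,     // timestamp1 truncated to the hour
    nsamples: 12,
    samples: [
        { id6: 2, id7: -79.09, id8: 35.97, ..., timestamp1: 2010-01-01T00:05:00.000+00:00 },
        // ... one element per 5-minute reading within the hour
    ]
}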
The code I have used for importing the data (after converting each file to a DataFrame) into the MongoDB collection is this:
import os
from pymongo import MongoClient

files = os.listdir('sampl/')
sorted_files = sorted(files)
myclient = MongoClient("mongodb://localhost:27017/")
mydb1 = myclient["mongodbtime"]
mycol1 = mydb1["mongodbindextimestamp1"]
for file in sorted_files:
    df = process_file(file)            # my helper that parses one file into a DataFrame
    data_dict = df.to_dict('records')  # convert to a list of dicts
    mycol1.insert_many(data_dict)      # one document per row
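Here is my rough attempt at rewriting that import so it builds hourly buckets directly, using bulk upserts (a sketch: it assumes timestamp1 already comes out of the DataFrame as a datetime, and the hourlybuckets collection name and the hour/samples/nsamples fields are placeholders of mine):

from pymongo import UpdateOne

bucket_col = mydb1["hourlybuckets"]
for file in sorted_files:
    df = process_file(file)
    ops = []
    for record in df.to_dict('records'):
        # truncate each row's timestamp to the start of its hour
        hour = record['timestamp1'].replace(minute=0, second=0, microsecond=0)
        ops.append(UpdateOne(
            {'id1': record['id1'], 'hour': hour},
            {'$push': {'samples': record}, '$inc': {'nsamples': 1}},
            upsert=True
        ))
    if ops:
        bucket_col.bulk_write(ops)     # send each file's upserts in one batch

Would that be correct, and should I also create a compound index on id1 and hour so the upserts stay fast?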
Any help would be appreciated! Thanks in advance!