Hi,
My goal is to query data from a huge collection (> 2 TB) and export only the data I need as a CSV file for processing in Python. While querying, I need to:
- Group and sort the documents.
- Select only a certain range and export it as CSV.
Right now I can run a pipeline like this:
db.collection.aggregate(
  [
    { "$limit": 200000 },
    {
      "$group": {
        "_id": {
          "TID": "$TID",
          "Opt": "$Opt",
          "DSN1": "$DSN1",
          "DSN2": "$DSN2",
          "Column": "$Column",
          "Row": "$Row",
          "CSN": { "$substr": ["$CSN", 2, -6] }
        },
        "details": {
          "$push": {
            "A": "$A",
            "B": "$B",
            "C": "$C",
            "D": "$D",
            "timestamp": "$timestamp"
          }
        }
      }
    }
  ],
  { allowDiskUse: true }
);
Each document output by this pipeline looks like { "_id": { ... }, "details": [ the set of "details" for that "_id" ] }, as in the attached picture.
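In case the picture does not come through, here is the shape of one output document written out as a Python dict (all field values are made up for illustration):

```python
# Shape of one document produced by the pipeline above (values made up).
result_doc = {
    "_id": {
        "TID": "T1", "Opt": "O1", "DSN1": "d1", "DSN2": "d2",
        "Column": 3, "Row": 7, "CSN": "ABC",
    },
    # Every matching source document's fields, pushed into one array.
    "details": [
        {"A": 1, "B": 2, "C": 3, "D": 4, "timestamp": "2021-01-01T00:00:02"},
        {"A": 5, "B": 6, "C": 7, "D": 8, "timestamp": "2021-01-01T00:00:01"},
    ],
}
```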
From this point I'm not sure how to sort by "details.timestamp" and select only a certain range of "details". Moreover, how can I break the output up so that each "_id" is paired with a single "details" entry, instead of one "_id" with the whole set of "details" as I have now? One "_id" per "details" entry is more suitable for my processing.
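To make the target shape concrete, here is a small in-memory Python illustration of what I want to happen (made-up data; ideally the sorting, range selection, and "unwinding" would all happen server-side in the pipeline, not in Python):

```python
# One grouped document, like a result of my current pipeline (made-up data).
doc = {
    "_id": {"TID": "T1", "CSN": "ABC"},
    "details": [
        {"A": 3, "timestamp": "2021-01-01T00:00:03"},
        {"A": 1, "timestamp": "2021-01-01T00:00:01"},
        {"A": 2, "timestamp": "2021-01-01T00:00:02"},
    ],
}

# 1. Sort the details by timestamp.
ordered = sorted(doc["details"], key=lambda d: d["timestamp"])

# 2. Keep only a certain range (here: the first two entries).
selected = ordered[:2]

# 3. "Unwind": emit one document per detail, repeating the group key.
flat = [{"_id": doc["_id"], "details": d} for d in selected]
```

After this, `flat` holds one `"_id"`-to-one-`"details"` document per selected entry, which is the shape I want to export to CSV.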
Please kindly advise.
Thanks.