So I’ve a use case where I don’t know what data is coming and in which form. The user might be sending a simple JSON object or a n level deep nested JSON object.I need to store it in such a way that I can run complex aggregations on it.
To solve that I made the incoming JSON into an array of objects with each object containing
‘key’ as one key,prev as parent of that key and value of the key as ‘value’
Example -
{
"_id" : ObjectId("5e71a1f3081c4b70cdbc438f"),
"DataSetID" : ObjectId("5e71a1f3081c4b70cdbc438e"),
"row" : [
{
"key" : "Region",
"prev" : "root",
"value" : "Australia and Oceania",
"typeOfValue" : "string",
"currentDepth" : 1
},
{
"key" : "Country",
"prev" : "root",
"value" : "Tuvalu",
"typeOfValue" : "string",
"currentDepth" : 1
},
{
"key" : "Item Type",
"prev" : "root",
"value" : "Baby Food",
"typeOfValue" : "string",
"currentDepth" : 1
},
{
"key" : "Sales Channel",
"prev" : "root",
"value" : "Offline",
"typeOfValue" : "string",
"currentDepth" : 1
},
{
"key" : "Order Priority",
"prev" : "root",
"value" : "H",
"typeOfValue" : "string",
"currentDepth" : 1
},
{
"key" : "Order Date",
"prev" : "root",
"value" : ISODate("2010-05-27T18:30:00.000Z"),
"typeOfValue" : "date",
"currentDepth" : 1
},
{
"key" : "Order ID",
"prev" : "root",
"value" : 669165933,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Ship Date",
"prev" : "root",
"value" : ISODate("2010-06-26T18:30:00.000Z"),
"typeOfValue" : "date",
"currentDepth" : 1
},
{
"key" : "Units Sold",
"prev" : "root",
"value" : 9925,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Unit Price",
"prev" : "root",
"value" : 255.28,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Unit Cost",
"prev" : "root",
"value" : 159.42,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Total Revenue",
"prev" : "root",
"value" : 2533654,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Total Cost",
"prev" : "root",
"value" : 1582243.5,
"typeOfValue" : "number",
"currentDepth" : 1
},
{
"key" : "Total Profit",
"prev" : "root",
"value" : 951410.5,
"typeOfValue" : "number",
"currentDepth" : 1
}
]
}
Now the aggregations work for any type of data using $reduce but it takes up a lot of time. 40 sec to run aggregation on a million records.
So
- Ignoring the Array of JSON objects , do we have any other way to store this data and aggregate it easily ?
- If this is the only way that how to increase performance ?
I’ve added indexes on key and prev too.