MongoDB.live, free & fully virtual. June 9th - 10th. Register Now MongoDB.live, free & fully virtual. June 9th - 10th. Register Now

how to store data when you are not sure about the schema

So I’ve a use case where I don’t know what data is coming and in which form. The user might be sending a simple JSON object or a n level deep nested JSON object.I need to store it in such a way that I can run complex aggregations on it.
To solve that I made the incoming JSON into an array of objects with each object containing
‘key’ as one key,prev as parent of that key and value of the key as ‘value’
Example -

{
    "_id" : ObjectId("5e71a1f3081c4b70cdbc438f"),
    "DataSetID" : ObjectId("5e71a1f3081c4b70cdbc438e"),
    "row" : [ 
        {
            "key" : "Region",
            "prev" : "root",
            "value" : "Australia and Oceania",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Country",
            "prev" : "root",
            "value" : "Tuvalu",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Item Type",
            "prev" : "root",
            "value" : "Baby Food",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Sales Channel",
            "prev" : "root",
            "value" : "Offline",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order Priority",
            "prev" : "root",
            "value" : "H",
            "typeOfValue" : "string",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order Date",
            "prev" : "root",
            "value" : ISODate("2010-05-27T18:30:00.000Z"),
            "typeOfValue" : "date",
            "currentDepth" : 1
        }, 
        {
            "key" : "Order ID",
            "prev" : "root",
            "value" : 669165933,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Ship Date",
            "prev" : "root",
            "value" : ISODate("2010-06-26T18:30:00.000Z"),
            "typeOfValue" : "date",
            "currentDepth" : 1
        }, 
        {
            "key" : "Units Sold",
            "prev" : "root",
            "value" : 9925,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Unit Price",
            "prev" : "root",
            "value" : 255.28,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Unit Cost",
            "prev" : "root",
            "value" : 159.42,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Revenue",
            "prev" : "root",
            "value" : 2533654,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Cost",
            "prev" : "root",
            "value" : 1582243.5,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }, 
        {
            "key" : "Total Profit",
            "prev" : "root",
            "value" : 951410.5,
            "typeOfValue" : "number",
            "currentDepth" : 1
        }
    ]
}

Now the aggregations work for any type of data using $reduce but it takes up a lot of time. 40 sec to run aggregation on a million records.
So

  1. Ignoring the Array of JSON objects , do we have any other way to store this data and aggregate it easily ?
  2. If this is the only way that how to increase performance ?
    I’ve added indexes on key and prev too.