MongoDB Basics aggregation help

I have the following json context and I have requirement to read the complete document and identify the distinct keyword under “conversion_token” group and output how many times keyword is occurred, count, classify in the output results;

input:

{
“select_emp”: {
“specification”: {
“input”: [
“p_empno”
],
“declare_stmt”: {
“anchorvariable”: [
“V_ENAME”,
“V_HIREDATE”,
“V_TITLE”,
“V_REPORTSTO”,
“V_DISP_DATE”,
“V_INS_COUNT”,
“CITY_FROM”
],
“tablename_variable”: [
“EMPLOYEE.V_ENAME”,
“EMPLOYEE.V_HIREDATE”,
“EMPLOYEE.V_TITLE”,
“EMPLOYEE.V_REPORTSTO”,
“EMPLOYEE.V_DISP_DATE”,
“EMPLOYEE.V_INS_COUNT”,
“EMPLOYEE.CITY_FROM”
]
}
},
“body”: {
“select_stmt1”: {
“columns”: [
“FIRSNAME”,
“HIREDATE”,
“TITLE”,
“REPORTSTO”
],
“tablename”: [
“EMPLOYEE”
],
“conversion_token”: [
{
“keyword”: “NVL”,
“count”: 1,
“classify”: 2
}
]
},
“select_stmt2”: {
“columns”: [
“CITY”
],
“tablename”: [
“EMPLOYEE”
],
“conversion_token”: [
{
“keyword”: “DECODE”,
“count”: 1,
“classify”: 3
}
]
},
“dbms_stmt1”: {
“dbms_putline”: [
“P_EMPNO”,
“V_ENAME”,
“V_DISP_DATE”,
“V_REPORTSTO”
],
“conversion_token”: [
{
“keyword”: “DBMS”,
“count”: 1,
“classify”: 2
}
]
},
“forloop1”: {
“select_stmt”: {
“columns”: [
“EMPLOYEEID”,
“ROWID”
],
“tablename”: [
“EMPLOYEE”
],
“conversion_token”: [
{
“keyword”: “DBMS”,
“count”: 1,
“classify”: 2
}
]
}
},
“merge_stmt1”: {
“merge_into”: “EMPLOYEE”,
“merge_using”: {
“columns”: [
“EMPLOYEEID”,
“LASTNAME”,
“TITLE”,
“BIRTHDATE”,
“HIREDATE”,
“ADDRESS”,
“CITY”,
“STATE”,
“COUNTRY”,
“POSTALCODE”,
“PHONE”,
“FAX”,
“EMAIL”,
“BONUS”
],
“tablename”: [
“EMPLOYEE”
]
},
“merge_update”: {
“columns”: [
“BONUS”
],
“tablename”: [
“EMPLOYEE”
]
},
“merge_delete”: {
“columns”: [
“BONUS”
],
“tablename”: [
“EMPLOYEE”
]
},
“merge_insert”: {
“columns”: [
“EMPLOYEEID”,
“LASTNAME”,
“FIRSTNAME”,
“TITLE”,
“BIRTHDATE”,
“HIREDATE”,
“ADDRESS”,
“CITY”,
“STATE”,
“COUNTRY”,
“POSTALCODE”,
“PHONE”,
“FAX”,
“EMAIL”,
“BONUS”
],
“tablename”: [
“EMPLOYEE”
]
},
“conversion_token”: [
{
“keyword”: “Merge”,
“count”: 1,
“classify”: 4
}
]
},
“exception_handling1”: {
“dbms_putline”: [
“P_EMPNO”
],
“conversion_token”: [
{
“keyword”: “DBMS”,
“count”: 1,
“classify”: 2
}
]
}
}
}
}

output: it should aggregate the group “keyword”, count, classify and results the final array;

{
“conversion_token”: [{
“keyword”: “NVL”,
“count”:1,
“classify”:2},
{
“keyword”: “DBMS”,
“count”:6,
“classify”:2},
{
“keyword”: “DECODE”,
“count”:2,
“classify”:3
}
}

& my next step is to loop through the same functionality across all the documents inside one collection and display the above output;

Hi Nishanth,

Is the top level field always called select_emp?

Does each document only contain one top level field such as select_emp which contains the statements or could there be more than one?

Does conversion_token occur at different depths? (for example, it occurs at a different depth for the nested loops statement than the other statements in your sample document)

Ronan

Is the top level field always called select_emp ?
[A] no, it changes for every different document inside the collection…

Does each document only contain one top level field such as select_emp which contains the statements or could there be more than one?
[A] it contains only one top level field

Does conversion_token occur at different depths? (for example, it occurs at a different depth for the nested loops statement than the other statements in your sample document)
[A] yes, it can be possible. based on the construct it formed ‘conversion_token’ can found in different nested ladder.

Hi Nishanth,

Try this:

db.test.aggregate(
    [
        {$project:{_id:0, "arrayofkeyvalue":{"$objectToArray":"$$ROOT"}}},
        {
           $project: {
              item: {
                 $filter: {
                    input: "$arrayofkeyvalue",
                    as: "item",
                    cond: { $ne: [ "$$item.k", "_id" ] }
                 }
              }
           }
        },
        {$project:{ item: {$arrayElemAt: [ "$item", 0 ] }}},
        {$project: { body: "$item.v.body"}},
         {
           $project: {
            conversion_key: {
                $function: {
                    body: function conversion_tokens(object, key) {
                            var values = [];
                            search_keys(object);
                            function search_keys(object) {
                          
                              if (key in object) values.push(object[key][0]);
                          
                              for (var property in object) {
                                if (object.hasOwnProperty(property)) {
                                  if (typeof object[property] == "object") {
                                   search_keys(object[property]);
                                  }
                                }
                              }
                            }
                            return values;
                          } ,
                    args: ["$body", "conversion_token"],
                    lang: "js"
                }
            }
        }
    },
    {$unwind:"$conversion_key"},
    {
        $group: {
        _id: "$conversion_key.keyword",
        count: {$sum:1},
        classify: {$sum:"$conversion_key.classify"}
    }},
    {
        $group: {
            _id: "null",
            conversion_token: {
                $push: {
                     keyword: "$_id",
                     count: "$count",
                     classify: "$classify"
                }
            }

        }
    },
    {
        $project: {
            _id:0,
            conversion_token:1
        }
    }
])

Result:

{
        "conversion_token" : [
                {
                        "keyword" : "Merge",
                        "count" : 1,
                        "classify" : 4
                },
                {
                        "keyword" : "NVL",
                        "count" : 1,
                        "classify" : 2
                },
                {
                        "keyword" : "DECODE",
                        "count" : 1,
                        "classify" : 3
                },
                {
                        "keyword" : "DBMS",
                        "count" : 3,
                        "classify" : 6
                }
        ]
}

$function is new to 4.4.

Let me know if you’re on a different version.

**EDIT: I just wanted to add that while it is possible to query the data in this way, longer term you may want to review your schema design. Having a fixed top-level field name and conversion_token at a fixed depth would allow this query to be greatly simplified. While the above may work, it may not be performant.

Ronan

1 Like