Mongo $lookup query fails on a collection of ~12 lakh (1.2 million) records

There are 2 collections: Alerts & AlertTypes. The Alerts collection has a field called alertTypeId, which is the lookup/foreign key referencing the AlertTypes collection.

I need to optimize the following query, where I fetch data from the Alerts collection along with the alert type name by joining the AlertTypes collection.

I used the aggregate function as follows:

db.Alerts.aggregate([{
    "$match": {
        "status": {
            "$ne": -1
        },
        "type": 4
    }
}, {
    "$lookup": {
        "localField": "alertTypeId",
        "from": "AlertTypes",
        "foreignField": "_id",
        "as": "alertTypeRel"
    }
}, {
    "$project": {
        "title": 1,
        "type": 1,
        "alertTypeId": 1,
        "alertTypeRel.alertTypeName": 1,
        "priority": 1,
        "message": 1,
        "status": 1,
        "startDate": 1,
        "createdAt": 1,
        "createdBy": 1,
        "validUntil": 1,
        "errorFlag": 1,
        "extApiId": 1,
        "errorMessage": 1,
        "autoPublish": 1,
        "statusChangedBy": 1
    }
},{
    "$sort": {
        "status": 1,
        "createdAt": -1
    }
}, {
    "$group": {
        "_id": null,
        "count": {
            "$sum": 1
        },
        "results": {
            "$push": "$ROOT"
        }
    }
}, {
    "$project": {
        "total": "$count",
        "_id": 0,
        "results": {
            "$slice": ["$results", 0, 10]
        }
    }
}], {
    "collation": {
        "locale": "en",
        "strength": 2
    },
    "allowDiskUse": true,
    "cursor": {}
}).pretty();

I have indexed the fields as well, for example:

{
        "v" : 2,
        "key" : {
            "status" : 1,
            "createdAt" : -1
        },
        "name" : "status_1_createdAt_-1"
}

There are 1,250,543 & 117 records in the Alerts & AlertTypes collections respectively. I tried the facet query as well, but it took more than 5 minutes to execute. The first query throws the following error:

uncaught exception: Error: command failed: {
    "ok" : 0,
    "errmsg" : "$push used too much memory and cannot spill to disk. Memory limit: 104857600 bytes",
    "code" : 146,
    "codeName" : "ExceededMemoryLimit"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:639:17
assert.commandWorked@src/mongo/shell/assert.js:729:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1

The db.Alerts.explain("executionStats") output is included in the following link: Dropbox - Mongo-explain-stats.docx

Can anyone help me?

Thanks


Welcome to the community @Sanjay_Kumar_N_S,

Could you please provide samples of your documents?

Have you tried this $facet + $addFields combination (a fully assembled sketch follows the two snippets below)

   {
      "$facet":{
         "total":[
            {
               "$count":"count"
            }
         ],
         "result":[
            {
               "$limit":10
            }
         ]
      }
   },
   {
      "$addFields":{
         "total":{
            "$first":"$total.count"
         }
      }
   }

instead of

{
    "$group": {
        "_id": null,
        "count": {
            "$sum": 1
        },
        "results": {
            "$push": "$ROOT"
        }
    }
}, {
    "$project": {
        "total": "$count",
        "_id": 0,
        "results": {
            "$slice": ["$results", 0, 10]
        }
    }
}

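For reference, putting that suggestion together, a minimal sketch of the whole pipeline could look like this (your field-by-field $project stage is left out for brevity; also note that $first as an array operator needs MongoDB 4.4+, on older versions { "$arrayElemAt": [ "$total.count", 0 ] } does the same thing):

db.Alerts.aggregate([
    { "$match": { "status": { "$ne": -1 }, "type": 4 } },
    { "$lookup": {
        "localField": "alertTypeId",
        "from": "AlertTypes",
        "foreignField": "_id",
        "as": "alertTypeRel"
    } },
    { "$sort": { "status": 1, "createdAt": -1 } },
    { "$facet": {
        "total": [ { "$count": "count" } ],
        "result": [ { "$limit": 10 } ]
    } },
    { "$addFields": {
        "total": { "$first": "$total.count" }
    } }
], {
    "collation": { "locale": "en", "strength": 2 },
    "allowDiskUse": true,
    "cursor": {}
})
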
I tried the facet as follows:

db.Alerts.aggregate([
  {
    "$match": {
      "status": { "$ne": -1 },
      "type": 4
    }
  },    
  {
    "$sort": {
      "status": 1,
      "createdAt": -1
    }
  },
  {
    $facet: {
      result: [
        { $skip: 0 },
        { $limit: 10 },
        {
          "$lookup": {
           "localField": "alertTypeId",
           "from": "AlertTypes",
           "foreignField": "_id",
           "as": "alertTypeRel"
          }
        },
        {
          "$project": {
            "title": 1,
            "type": 1,
            "alertTypeId": 1,
            "alertTypeRel.alertTypeName": 1,
            "priority": 1,
            "message": 1,
            "status": 1,
            "startDate": 1,
            "createdAt": 1,
            "createdBy": 1,
            "validUntil": 1,
            "errorFlag": 1,
            "extApiId": 1,
            "errorMessage": 1,
            "autoPublish": 1,
            "statusChangedBy": 1
          }
        }
      ],
      count: [{ $count: "total" }]
    }
  } 
], 
{
  "collation": {
    "locale": "en",
    "strength": 2
  },
  "allowDiskUse": true,
  "cursor": {}
})
.pretty();

Sample:

Alerts:

{
	"_id" : ObjectId("598d5746d11eb54eb7da2t50"),
	"title" : "Security alert",
	"alertTypeId" : ObjectId("598a43345d5f673eb152d180"),
	"message" : "Under investigation",
	"priority" : "medium",
	"status" : -1,
	"alertLocations" : [
		{
			"data" : {
				"type" : "Point",
				"coordinates" : [
					51.519812,
					-0.093933
				]
			},
			"radius" : 7000,
			"dataPolygon" : {
				"type" : "Polygon",
				"coordinates" : [
					[
						[
							51.519812,
							-0.031050930111633495
						],
						[
							51.525448705006035,
							-0.03130407448443756
						],
						[
							51.53104002671351,
							-0.032061469436875226
						],
						[
							51.50308305277655,
							-0.03331701688130983
						],
						[
							51.50858397328649,
							-0.032061469436875226
						],
						[
							51.51417529499397,
							-0.03130407448443756
						],
						[
							51.519812,
							-0.031050930111633495
						]
					]
				]
			}
		}
	],
	"createdAt" : ISODate("2017-08-11T08:13:58.869Z"),
	"extraMessage" : "Other theft",
	"startDate" : ISODate("2017-08-11T19:08:00Z"),
	"validUntil" : null,
	"createdBy" : ObjectId("5885b186db6df92d3ada7777"),
	"statusChangedBy" : ObjectId("5885b186db6df92d3ada7777"),
	"type" : 2
}

AlertTypes:

{
	"_id" : ObjectId("598a43345d5f673eb152d180"),
	"alertTypeName" : "Amber Alert",
	"description" : "Amber Alert, keep your eyes open",
	"status" : 1,
	"createdBy" : ObjectId("5885b186db6df92d3ada7777"),
	"createdAt" : ISODate("2017-08-08T23:03:16.657Z"),
	"isSpecialType" : -1
}

Does the facet version issue "ExceededMemoryLimit" too?

No, the facet query executed successfully, but it took more than 5 minutes to get the result.

How long does it take if you forget about getting the total count and just get the first ten documents and the corresponding $lookup?

In other words, how long does this pipeline take?

db.Alerts.aggregate([
  {
    "$match": {
      "status": { "$ne": -1 },
      "type": 4
    }
  },    
  {
    "$sort": {
      "status": 1,
      "createdAt": -1
    }
  },
  { $limit: 10 },
  {
      "$lookup": {
       "localField": "alertTypeId",
       "from": "AlertTypes",
       "foreignField": "_id",
       "as": "alertTypeRel"
      }
  },
  {
      "$project": {
        "title": 1,
        "type": 1,
        "alertTypeId": 1,
        "alertTypeRel.alertTypeName": 1,
        "priority": 1,
        "message": 1,
        "status": 1,
        "startDate": 1,
        "createdAt": 1,
        "createdBy": 1,
        "validUntil": 1,
        "errorFlag": 1,
        "extApiId": 1,
        "errorMessage": 1,
        "autoPublish": 1,
        "statusChangedBy": 1
      }
  } 
], 
{
  "collation": {
    "locale": "en",
    "strength": 2
  },
"cursor": {}
})

This pipeline should be fast if it's using appropriate indexes; I suspect that maybe you don't have an appropriate index here. Note that when you specify a collation, the pipeline can only use indexes with a matching collation. If you specify the explain: true option, you can see whether an index is being used.
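
For example, something along these lines (a sketch, reusing the match/sort and collation from your query) shows the plan for the initial stages:

db.Alerts.aggregate([
    { "$match": { "status": { "$ne": -1 }, "type": 4 } },
    { "$sort": { "status": 1, "createdAt": -1 } },
    { "$limit": 10 }
], {
    "collation": { "locale": "en", "strength": 2 },
    "explain": true
})

In the output, an IXSCAN stage in the winning plan means an index is being used; COLLSCAN means a full collection scan.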

Asya


In fact, if this is the complete index definition, then you're not using it at all, as it should have the same collation as your query/aggregation; namely, it should look like this:

{
	"v" : 2,
	"key" : {
		"status" : 1,
		"createdAt" : -1
	},
	"name" : "status_1_createdAt_-1",
	"collation" : {
		"locale" : "en",
		"caseLevel" : false,
		"caseFirst" : "off",
		"strength" : 2,
		"numericOrdering" : false,
		"alternate" : "non-ignorable",
		"maxVariable" : "punct",
		"normalization" : false,
		"backwards" : false,
		"version" : "57.1"
	}
}
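
After dropping the old index, recreating it with a matching collation would look something like this (a sketch; keep whatever other index options your deployment needs):

db.Alerts.createIndex(
    { "status": 1, "createdAt": -1 },
    {
        "name": "status_1_createdAt_-1",
        // same collation as the one passed to aggregate(), so the planner can consider this index
        "collation": { "locale": "en", "strength": 2 }
    }
)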

But we can't do the lookup after limiting to 10 because, sometimes, the sort parameter will be alertTypeName. So we should get the result (with lookup) before setting the limit (10).

I tried dropping the index and re-adding it with the options you suggested. It is still taking too much time.

I have shared the explain stats of the query execution here: Dropbox - Mongo-explain-stats.docx.
There it is already using the index status_1_createdAt_ (it was already using it before reindexing).

If "alertTypeName" is unique per AlertType you can sort by "alertTypeId" and take advantage of indexing. of course you need to create the appropriate index.
I highly recommend doing what Asya asked, and share the valuable requested feedback
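
A rough sketch of that idea (the index shape below is an assumption for illustration, not something tested against your data):

// hypothetical index: equality on type, then sort order on alertTypeId
db.Alerts.createIndex(
    { "type": 1, "alertTypeId": 1 },
    { "collation": { "locale": "en", "strength": 2 } }
)

db.Alerts.aggregate([
    { "$match": { "status": { "$ne": -1 }, "type": 4 } },
    { "$sort": { "alertTypeId": 1 } },
    { "$limit": 10 },
    { "$lookup": {
        "localField": "alertTypeId",
        "from": "AlertTypes",
        "foreignField": "_id",
        "as": "alertTypeRel"
    } }
], {
    "collation": { "locale": "en", "strength": 2 },
    "cursor": {}
})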

Since Stennie took the trouble of reformatting your code, I would like to note that ObjectId("598d5746d11eb54eb7da2t50") is not a valid ObjectId. Please change the t near the end of the hex string.

Regards,

Oh…okay. Unfortunately, the hex code was changed (one letter t instead of f) while editing. Please have a look at it.

Alerts:

{"_id" : ObjectId(“598d6746d11eb54eb7da2f50”)}

But how can we sort by alertTypeId instead of alertTypeName?

I just executed the normal aggregate without the lookup, i.e. only on the Alerts collection. That is also taking too much time. Can you all check this image? This itself took a few minutes to complete.

Do you have access to the logs? Slow operations are logged there with some details, and I suspect that if the original match is slow, it's because your data doesn't fit in RAM and fetching from disk is a very expensive operation (especially when you're doing it for 1,233,806 documents).

But we can't do the lookup after limiting to 10 because, sometimes, the sort parameter will be
alertTypeName. So we should get the result (with lookup) before setting the limit (10).

In this pipeline you are sorting by something available before the $lookup, so we told you how you can make things a lot faster (split the count into a separate query, and have this pipeline do the $sort and $limit first and then the $lookup).
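
In other words, something like this sketch (field names copied from your pipeline; the count options shown are an assumption about what your shell/driver version supports):

// 1) total count as its own query
var total = db.Alerts.count(
    { "status": { "$ne": -1 }, "type": 4 },
    { "collation": { "locale": "en", "strength": 2 } }
);

// 2) one page of results: $sort and $limit first, then a $lookup on only 10 documents
var results = db.Alerts.aggregate([
    { "$match": { "status": { "$ne": -1 }, "type": 4 } },
    { "$sort": { "status": 1, "createdAt": -1 } },
    { "$limit": 10 },
    { "$lookup": {
        "localField": "alertTypeId",
        "from": "AlertTypes",
        "foreignField": "_id",
        "as": "alertTypeRel"
    } }
], {
    "collation": { "locale": "en", "strength": 2 },
    "cursor": {}
}).toArray();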

Asya


Hey Sanjay,
take a look at:
https://www.mongodb.com/how-to/subset-pattern/
https://www.mongodb.com/how-to/bucket-pattern/
The subset pattern may be your savior.
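
For example (an assumption about how a denormalization in the spirit of the patterns linked above could look, not something from your schema): since AlertTypes holds just 117 small documents, copying the frequently needed alertTypeName into each alert at write time would remove the $lookup from this listing query entirely:

// hypothetical denormalized Alerts document
{
    "_id" : ObjectId("598d6746d11eb54eb7da2f50"),
    "title" : "Security alert",
    "alertTypeId" : ObjectId("598a43345d5f673eb152d180"),
    "alertTypeName" : "Amber Alert",   // copied from AlertTypes on insert/update
    "status" : 1,
    "type" : 4,
    "createdAt" : ISODate("2017-08-11T08:13:58.869Z")
}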

Regarding the $match > $count pipeline: with proper indexing, you can get a faster result with db.collection.count(query).
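
For instance (a sketch; the compound index below is hypothetical and covers both filter fields, so the count can typically be resolved from index keys alone):

// hypothetical index covering both fields used in the count filter
db.Alerts.createIndex({ "type": 1, "status": 1 })

db.Alerts.count({ "type": 4, "status": { "$ne": -1 } })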


Yes, I can see many queries logged as slow in the mongo logs. I will recheck the configuration.

Suppose I have to sort based on the alert type name; I only get the alert type name after the lookup, right? So how can I sort/limit before the lookup stage?

Could you share an actual use case scenario where you want to sort by alert type name?
If you are interested in just the recent alerts, wouldn’t it be better to use a capped collection for querying?
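
For reference, a capped collection is created like this (the name and sizes below are made up for illustration):

// hypothetical fixed-size collection holding only the most recent alerts, kept in insertion order
db.createCollection("RecentAlerts", { "capped": true, "size": 10485760, "max": 10000 })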

Actually, this query serves the alert list along with the alert type name. The list columns are sortable, including alert type. I hope you get it.

Please fill in:

{
    "$sort": {
      "......":"..",
      "......":".."
    }
  },
  { $limit: 10 }