Getting data from query result taking a long time

Hello Mongo newbie here. I have a query which works fine but when running against a lot of data and wide date range it can take many seconds or even minutes. I narrowed down the bottle neck which is the following snippet that lies within a couple of for loops. In my current date range the whole query takes less than a second…

db.collection.find({'SecondEvent.Code': "1111"}, 
    {_id: 0, 'TrackTime': 1}).sort({'TrackTime.0': -1}).limit(1)

However what I want to do is get the actual value of the TrackTime which means doing the following…

db.collection.find({'SecondEvent.Code': "1111"}, 
    {_id: 0, 'TrackTime': 1}).sort({'TrackTime.0': -1}).limit(1)[0].TrackTime[0].valueOf()

The whole query now takes about 12 seconds! Is there any other way of getting the time value out for the result without the performance falling over like this?

Any help is much appreciated

Thanks

I’m curious what the documents in your collection look like.

Also do you have any indexes beyond the default _id?

Hi, and thanks. There are other indexes around various values I even created one for the Tracking Time but didn’t make any difference (Update: although this won’t make any difference anyway as I’m not selecting based on the TrackTime). An example document is as follows…

{
"_id" : LUUID("5e4cf57f-2b1d-2d4a-b158-b54c43b17364"),
"Event" : {
    "Code" : "22",
    "Description" : "example description"
},
"Confirmed" : [ 
    NumberLong(636469684832760000), 
    0
],
"Exported" : [ 
    NumberLong(636449680517180000), 
    0
],
"Location" : "locationA",
"SecondEvent" : {
    "_id" : LUUID("b838f2c5-033c-c64b-95b7-b636cca4d2bf"),
    "Code" : "1111",
    "Description" : "example description",
    "Default" : true,
    "Location" : {
        "Name" : "Default",
        "Code" : "AAA"
    },
    "EndOfLife" : false
},
"Ships" : {
    "Id" : "123456789-id",
    "Name" : "xyz",
    "Number" : "123456789",
    "Parts" : [ 
        {
            "Codes" : [ 
                {
                    "Number" : "987654321",
                    "Name" : "xyz",
                    "DeletedDate" : null
                }
            ],
            "TracNumbers" : [ 
                {
                    "Number" : "123456789",
                    "Name" : "abc",
                    "DeletedDate" : null
                }, 
                {
                    "Number" : "987654321",
                    "Name" : "xyz",
                    "DeletedDate" : null
                }
            ]
        }
    ],
    "Status" : 3
},
"Identifier" : {
    "_id" : null,
    "Number" : "123456789",
    "Code" : "987654321",
    "TrackNumber" : "123456789"
},
"TrackTime" : [ 
    NumberLong(636445689000000000), 
    60
],
"UserEmail" : "joe.blogs@mail.com",
"Version" : "0.0.2"

}

Thanks

So I am wondering, with this example where is the field Scan that you are querying for?
And yes you are right the TrackTime index will not help right now. Do you have one for Scan?

He is not selecting TrackTime, however he is projecting and then sorting. An index will help very much because the query will become covered if the index also include SecondEvent.Code.

1 Like

@steevej You are absolutely correct. My mistake. However the field SecondEvent.Code in the query was originally Scan; @Alex_Pickard must of changed it . Therefore I was under the impression that the stand alone TrackTime index would not help as the query was off Scan.
Would TrackTime be a multi-key index? As the field is an array?

Considering there is 2 edits on the first post that is most likely:

Multi index yes. But it looks like he is intereseted in index 0. Probably the start time or creating time. Which might be better be a different field to avoid multi-index.

Apologies, I didn’t change that when I copied the document. In the original snippet “Scan” is actually “SecondEvent.Code”, and yes there is an index for that.

Hi. I did reply earlier but it’s still under review for some reason. Basically I said “Apologies, I didn’t change that when I copied the document. In the original snippet “Scan” is actually “SecondEvent.Code”, and yes there is an index for that.”

I’ve also tried creating a multi index for both SecondEvent.Code and TrackTime.0 but no improvement. As I said before it’s fast when not doing anything with the result returned…

db.collection.find({'SecondEvent.Code': "1111"}, 
{_id: 0, 'TrackTime': 1}).sort({'TrackTime.0': -1}).limit(1)

It’s simply when adding [0].TrackTime[0].valueOf() to the end to get the time value out of the NumberLong, or even just simply adding [0] causes the massive increase in time to the query.

@Alex_Pickard The following Aggregation query prints the maximum value of the array field TrackTime, efficiently.

db.collection.aggregate( [
  { 
      $match: { "SecondEvent.Code": "1111" } 
  },
  { 
      $project: { _id: 0, maxValue: { $max: "$TrackTime" } } 
  }
] )

If you want all the other fields in the document you can substitute $project with the $addFields.

If there are a large number of documents in the collection, it helps to have an index on the field "SecondEvent.Code". If you have other compound indexes with this field, then the SecondEvent.Code must be the first key in that index.

Thanks but this is no better performance wise. It actually shows the same slow 12 seconds run time even before I get the TrackTime value. The original snippet was less than half a second before getting the value…

db.collection.find({'SecondEvent.Code': "1111"}, 
{_id: 0, 'TrackTime': 1}).sort({'TrackTime.0': -1}).limit(1)

It’s only when trying to get the TrackTime[0] value to do something with it that it’s slowing everything down for some reason.

I am very very surprise with

All the hard work is already done. Something else must be happening.

On a side note, since you are mostly interested in TrackTime[0], could you project

{ '_id' : 0 , 'TrackTime.0' : 1 }

rather than

I know it’s weird. If you want to see it in “action” run the first part which I’ve stripped lots of stuff out of, to create 4000 records in a temp collection. Then run the second part against it. Try with the [0] commented out first. For me this runs in about 1/20th of a second. When uncommenting the [0] it takes about 8-10 seconds!

var toDate = new Date()
var epochTicks = 621355968000000000
var toTicks = ((toDate.getTime() * 10000) + epochTicks);
var count = 0
var shipNum = 621355968000000000

for (i = 0; i < 4000; i++) {
  count = count + 123456789
  shipNum = shipNum + 123456789
  db.tracking_temp.insert({
      "SecondEvent" : {
          "Code" : "1111"
      },
      "Ships" : {
          "Name" : "xyz",
          "Number" : NumberLong(shipNum)
      },
      "TrackTime" : [ 
          NumberLong(toTicks + count), 
          60
      ]
  })
}

…then run this. Doesn’t print or do anything but will show the jump in run duration.

var ships = db.tracking_temp.find({}, {_id:0, 'Ships.Number': 1}).toArray()
var shipsCount = ships.length

for (i = 0; i < shipsCount; i++) {
    var shipsNumber = ships[i].Ships.Number

    var minTick = db.tracking_temp.find({$and: [{'Ships.Number': shipsNumber}, {'SecondEvent.Code': "1111"}]}, 
    {_id: 0, 'TrackTime': 1}).limit(1)//[0]
    
    print()
}

Thanks. I will certainly try this over the week end. I let you know on my findings.

@Alex_Pickard, here are my findings.

I was able to observe the difference you explained. But I was curious so rather that just
print()
I did
print( db.tracking_temp.find(query, project).limit(1) )

Not being too familiar with .js, I was surprised that documents were not printed but the following:

DBQuery: test.tracking_temp -> {
	"$and" : [
		{
			"Ships.Number" : NumberLong("621356461827072000")
		},
		{
			"SecondEvent.Code" : "1111"
		}
	]
}

Which is probably some kind of promise that is not executed when you do not do [0] but it is when you do. Said otherwise with [0], you do not execute the query and you do not transfer any document over the wire. Wonderful world of async I guess.