Mongodb Aggregation query Optimization

I have the below aggregation query which is using the index but it still take more than 23sec to execute.Can anyone please let me how what more i can tune here to reduce the executiontime?

db.getCollection('booked-slot').explain("executionStats").aggregate([{"$match":{"facility":ObjectId("63ee2e8458b6fd9eb66511ce")}},{"$addFields":{"userId":{"$cond":{"if":{"$eq":["$participant.isPrimary",true]},"then":"$participant.primaryUserId","else":"$participant.secondaryUserId"}},"dateValue":{"$dateFromString":{"dateString":"$date"}}}},{"$match":{}},{"$lookup":{"from":"users","localField":"userId","foreignField":"_id","as":"user"}},{"$unwind":{"path":"$user","preserveNullAndEmptyArrays":true}},{"$lookup":{"from":"practitioners","localField":"practitioner","foreignField":"_id","as":"practitioner"}},{"$unwind":{"path":"$practitioner","preserveNullAndEmptyArrays":true}},{"$lookup":{"from":"facility-practitioners","localField":"facilityPractitioner","foreignField":"_id","as":"facilityPractitioners"}},{"$unwind":{"path":"$facilityPractitioners","preserveNullAndEmptyArrays":true}},{"$addFields":{"userName":{"$toLower":{"$concat":["$user.name.first"," ","$user.name.last"]}},"photo":"$user.profile.picture"}},{"$match":{"$and":[{"practitioner._id":ObjectId("64f97e5d7884cee0c928a44c")},{"$or":[{"state":{"$in":["CANCELLED","FULLFILLED","UN_ATTENDED","USER_NO_SHOW","PRACTITIONER_NO_SHOW","NETWORK_ERROR","INCOMPLETE"]}}]}]}},{"$lookup":{"from":"master-data-call-status","localField":"appointmentUpdateReason","foreignField":"_id","as":"callStatus"}},{"$unwind":{"path":"$callStatus","preserveNullAndEmptyArrays":true}},{"$sort":{"date":-1,"slotStartTime":-1,"userName":1}},{"$project":{"_id":1,"prescription":1,"paymentOrder":1,"user":{"name":1,"_id":1},"photo":1,"practitioner":{"name":1,"_id":1},"consultationType":1,"date":1,"slotStartTime":1,"facilityPractitioner":1,"records":1,"createdAt":1,"updatedAt":1,"state":1,"updatedBy":1,"callStatus":{"code":1}}},{"$skip":0},{"$limit":10}])

Below is my exlain plan for the above query.

{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'test_tuning.booked-slot',
          indexFilterSet: false,
          parsedQuery: {
            '$and': [
              {
                facility: { '$eq': ObjectId("63ee2e8458b6fd9eb66511ce") }
              },
              {
                state: {
                  '$in': [
                    'CANCELLED',
                    'FULLFILLED',
                    'INCOMPLETE',
                    'NETWORK_ERROR',
                    'PRACTITIONER_NO_SHOW',
                    'UN_ATTENDED',
                    'USER_NO_SHOW'
                  ]
                }
              }
            ]
          },
          queryHash: 'F3E52FD7',
          planCacheKey: 'AEC7120F',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            stage: 'PROJECTION_DEFAULT',
            transformBy: {
              _id: 1,
              appointmentUpdateReason: 1,
              'callStatus.code': 1,
              consultationType: 1,
              createdAt: 1,
              date: 1,
              facilityPractitioner: 1,
              'participant.isPrimary': 1,
              'participant.primaryUserId': 1,
              'participant.secondaryUserId': 1,
              paymentOrder: 1,
              photo: 1,
              practitioner: 1,
              prescription: 1,
              records: 1,
              slotStartTime: 1,
              state: 1,
              updatedAt: 1,
              updatedBy: 1,
              'user._id': 1,
              'user.name': 1,
              'user.profile.picture': 1,
              userId: 1,
              userName: 1
            },
            inputStage: {
              stage: 'FETCH',
              inputStage: {
                stage: 'IXSCAN',
                keyPattern: { facility: 1, state: 1, date: -1, slotStartTime: -1 },
                indexName: 'facility_1_state_1_date_-1_slotStartTime_-1',
                isMultiKey: false,
                multiKeyPaths: {
                  facility: [],
                  state: [],
                  date: [],
                  slotStartTime: []
                },
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'forward',
                indexBounds: {
                  facility: [
                    "[ObjectId('63ee2e8458b6fd9eb66511ce'), ObjectId('63ee2e8458b6fd9eb66511ce')]"
                  ],
                  state: [
                    '["CANCELLED", "CANCELLED"]',
                    '["FULLFILLED", "FULLFILLED"]',
                    '["INCOMPLETE", "INCOMPLETE"]',
                    '["NETWORK_ERROR", "NETWORK_ERROR"]',
                    '["PRACTITIONER_NO_SHOW", "PRACTITIONER_NO_SHOW"]',
                    '["UN_ATTENDED", "UN_ATTENDED"]',
                    '["USER_NO_SHOW", "USER_NO_SHOW"]'
                  ],
                  date: [ '[MaxKey, MinKey]' ],
                  slotStartTime: [ '[MaxKey, MinKey]' ]
                }
              }
            }
          },
          rejectedPlans: []
        },
        executionStats: {
          executionSuccess: true,
          nReturned: 87251,
          executionTimeMillis: 23902,
          totalKeysExamined: 87252,
          totalDocsExamined: 87251,
          executionStages: {
            stage: 'PROJECTION_DEFAULT',
            nReturned: 87251,
            executionTimeMillisEstimate: 238,
            works: 87253,
            advanced: 87251,
            needTime: 1,
            needYield: 0,
            saveState: 102,
            restoreState: 102,
            isEOF: 1,
            transformBy: {
              _id: 1,
              appointmentUpdateReason: 1,
              'callStatus.code': 1,
              consultationType: 1,
              createdAt: 1,
              date: 1,
              facilityPractitioner: 1,
              'participant.isPrimary': 1,
              'participant.primaryUserId': 1,
              'participant.secondaryUserId': 1,
              paymentOrder: 1,
              photo: 1,
              practitioner: 1,
              prescription: 1,
              records: 1,
              slotStartTime: 1,
              state: 1,
              updatedAt: 1,
              updatedBy: 1,
              'user._id': 1,
              'user.name': 1,
              'user.profile.picture': 1,
              userId: 1,
              userName: 1
            },
            inputStage: {
              stage: 'FETCH',
              nReturned: 87251,
              executionTimeMillisEstimate: 113,
              works: 87253,
              advanced: 87251,
              needTime: 1,
              needYield: 0,
              saveState: 102,
              restoreState: 102,
              isEOF: 1,
              docsExamined: 87251,
              alreadyHasObj: 0,
              inputStage: {
                stage: 'IXSCAN',
                nReturned: 87251,
                executionTimeMillisEstimate: 62,
                works: 87253,
                advanced: 87251,
                needTime: 1,
                needYield: 0,
                saveState: 102,
                restoreState: 102,
                isEOF: 1,
                keyPattern: { facility: 1, state: 1, date: -1, slotStartTime: -1 },
                indexName: 'facility_1_state_1_date_-1_slotStartTime_-1',
                isMultiKey: false,
                multiKeyPaths: {
                  facility: [],
                  state: [],
                  date: [],
                  slotStartTime: []
                },
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'forward',
                indexBounds: {
                  facility: [
                    "[ObjectId('63ee2e8458b6fd9eb66511ce'), ObjectId('63ee2e8458b6fd9eb66511ce')]"
                  ],
                  state: [
                    '["CANCELLED", "CANCELLED"]',
                    '["FULLFILLED", "FULLFILLED"]',
                    '["INCOMPLETE", "INCOMPLETE"]',
                    '["NETWORK_ERROR", "NETWORK_ERROR"]',
                    '["PRACTITIONER_NO_SHOW", "PRACTITIONER_NO_SHOW"]',
                    '["UN_ATTENDED", "UN_ATTENDED"]',
                    '["USER_NO_SHOW", "USER_NO_SHOW"]'
                  ],
                  date: [ '[MaxKey, MinKey]' ],
                  slotStartTime: [ '[MaxKey, MinKey]' ]
                },
                keysExamined: 87252,
                seeks: 2,
                dupsTested: 0,
                dupsDropped: 0
              }
            }
          }
        }
      },
      nReturned: Long("87251"),
      executionTimeMillisEstimate: Long("473")
    },
    {
      '$addFields': {
        userId: {
          '$cond': [
            { '$eq': [ '$participant.isPrimary', { '$const': true } ] },
            '$participant.primaryUserId',
            '$participant.secondaryUserId'
          ]
        },
        dateValue: { '$dateFromString': { dateString: '$date' } }
      },
      nReturned: Long("87251"),
      executionTimeMillisEstimate: Long("493")
    },
    {
      '$lookup': {
        from: 'users',
        as: 'user',
        localField: 'userId',
        foreignField: '_id',
        unwinding: { preserveNullAndEmptyArrays: true }
      },
      totalDocsExamined: Long("87250"),
      totalKeysExamined: Long("87250"),
      collectionScans: Long("0"),
      indexesUsed: [ '_id_' ],
      nReturned: Long("87251"),
      executionTimeMillisEstimate: Long("12126")
    },
    {
      '$lookup': {
        from: 'practitioners',
        as: 'practitioner',
        localField: 'practitioner',
        foreignField: '_id',
        unwinding: { preserveNullAndEmptyArrays: true }
      },
      totalDocsExamined: Long("87250"),
      totalKeysExamined: Long("87250"),
      collectionScans: Long("0"),
      indexesUsed: [ '_id_' ],
      nReturned: Long("87251"),
      executionTimeMillisEstimate: Long("22935")
    },
    {
      '$match': {
        'practitioner._id': { '$eq': ObjectId("64f97e5d7884cee0c928a44c") }
      },
      nReturned: Long("3850"),
      executionTimeMillisEstimate: Long("22948")
    },
    {
      '$lookup': {
        from: 'facility-practitioners',
        as: 'facilityPractitioners',
        localField: 'facilityPractitioner',
        foreignField: '_id',
        unwinding: { preserveNullAndEmptyArrays: true }
      },
      totalDocsExamined: Long("3849"),
      totalKeysExamined: Long("3849"),
      collectionScans: Long("0"),
      indexesUsed: [ '_id_' ],
      nReturned: Long("3850"),
      executionTimeMillisEstimate: Long("23438")
    },
    {
      '$addFields': {
        userName: {
          '$toLower': [
            {
              '$concat': [
                '$user.name.first',
                { '$const': ' ' },
                '$user.name.last'
              ]
            }
          ]
        },
        photo: '$user.profile.picture'
      },
      nReturned: Long("3850"),
      executionTimeMillisEstimate: Long("23438")
    },
    {
      '$lookup': {
        from: 'master-data-call-status',
        as: 'callStatus',
        localField: 'appointmentUpdateReason',
        foreignField: '_id',
        unwinding: { preserveNullAndEmptyArrays: true }
      },
      totalDocsExamined: Long("3669"),
      totalKeysExamined: Long("3849"),
      collectionScans: Long("0"),
      indexesUsed: [ '_id_' ],
      nReturned: Long("3850"),
      executionTimeMillisEstimate: Long("23902")
    },
    {
      '$sort': {
        sortKey: { date: -1, slotStartTime: -1, userName: 1 },
        limit: Long("10")
      },
      totalDataSizeSortedBytesEstimate: Long("109465"),
      usedDisk: false,
      spills: Long("0"),
      nReturned: Long("10"),
      executionTimeMillisEstimate: Long("23902")
    },
    {
      '$project': {
        _id: true,
        updatedAt: true,
        state: true,
        photo: true,
        records: true,
        slotStartTime: true,
        date: true,
        facilityPractitioner: true,
        updatedBy: true,
        paymentOrder: true,
        consultationType: true,
        createdAt: true,
        prescription: true,
        user: { _id: true, name: true },
        practitioner: { _id: true, name: true },
        callStatus: { code: true }
      },
      nReturned: Long("10"),
      executionTimeMillisEstimate: Long("23902")
    }
  ],
  serverInfo: {
    host: 'ip-10-22-11-89.ap-south-1.compute.internal',
    port: 27017,
    version: '6.0.5',
    gitVersion: 'c9a99c120371d4d4c52cbb15dac34a36ce8d3b1d'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  command: {
    aggregate: 'booked-slot',
    pipeline: [
      { '$match': { facility: ObjectId("63ee2e8458b6fd9eb66511ce") } },
      {
        '$addFields': {
          userId: {
            '$cond': {
              if: { '$eq': [ '$participant.isPrimary', true ] },
              then: '$participant.primaryUserId',
              else: '$participant.secondaryUserId'
            }
          },
          dateValue: { '$dateFromString': { dateString: '$date' } }
        }
      },
      { '$match': {} },
      {
        '$lookup': {
          from: 'users',
          localField: 'userId',
          foreignField: '_id',
          as: 'user'
        }
      },
      {
        '$unwind': { path: '$user', preserveNullAndEmptyArrays: true }
      },
      {
        '$lookup': {
          from: 'practitioners',
          localField: 'practitioner',
          foreignField: '_id',
          as: 'practitioner'
        }
      },
      {
        '$unwind': { path: '$practitioner', preserveNullAndEmptyArrays: true }
      },
      {
        '$lookup': {
          from: 'facility-practitioners',
          localField: 'facilityPractitioner',
          foreignField: '_id',
          as: 'facilityPractitioners'
        }
      },
      {
        '$unwind': {
          path: '$facilityPractitioners',
          preserveNullAndEmptyArrays: true
        }
      },
      {
        '$addFields': {
          userName: {
            '$toLower': {
              '$concat': [ '$user.name.first', ' ', '$user.name.last' ]
            }
          },
          photo: '$user.profile.picture'
        }
      },
      {
        '$match': {
          '$and': [
            {
              'practitioner._id': ObjectId("64f97e5d7884cee0c928a44c")
            },
            {
              '$or': [
                {
                  state: {
                    '$in': [
                      'CANCELLED',
                      'FULLFILLED',
                      'UN_ATTENDED',
                      'USER_NO_SHOW',
                      'PRACTITIONER_NO_SHOW',
                      'NETWORK_ERROR',
                      'INCOMPLETE'
                    ]
                  }
                }
              ]
            }
          ]
        }
      },
      {
        '$lookup': {
          from: 'master-data-call-status',
          localField: 'appointmentUpdateReason',
          foreignField: '_id',
          as: 'callStatus'
        }
      },
      {
        '$unwind': { path: '$callStatus', preserveNullAndEmptyArrays: true }
      },
      { '$sort': { date: -1, slotStartTime: -1, userName: 1 } },
      {
        '$project': {
          _id: 1,
          prescription: 1,
          paymentOrder: 1,
          user: { name: 1, _id: 1 },
          photo: 1,
          practitioner: { name: 1, _id: 1 },
          consultationType: 1,
          date: 1,
          slotStartTime: 1,
          facilityPractitioner: 1,
          records: 1,
          createdAt: 1,
          updatedAt: 1,
          state: 1,
          updatedBy: 1,
          callStatus: { code: 1 }
        }
      },
      { '$skip': 0 },
      { '$limit': 10 }
    ],
    cursor: {},
    '$db': 'test_tuning'
  },
1 Like

From the looks of the query plan it’s the lookups that are taking the time you may want to re-format the query, it’s unreadable in the current format).

You say you’re using indexes, but what exact indexes do you have on each collection and how many documents in each?

Have you tried breaking down the aggregation pipeline and running individual stages to try and optimise the stages that are causing the long execution times?

What is this running on? local hosted or Atlas? If Atlas which tier are you running it on?

As john mentioned your query is unreadable in the current format but from far view i noticed you have an empty $match stage in your pipeline! also after every lookup you are performing unwind operation even though the subsequent stages are not dependent on it, this will result in unnecessary operations if the lookup array contains multiple elements. It’s recommended that you perform the unwind operations at last in case no other stages are dependant on them.
For further assistance more information will be needed as mentioned by John.

Hi John,

Please find the details below.


> db.getCollection('booked-slot').explain("executionStats").aggregate([{"$match":{"facility":ObjectId("63ee2e8458b6fd9eb66511ce")}},{"$addFields":{"userId":{"$cond":{"if":{"$eq":["$participant.isPrimary",true]},"then":"$participant.primaryUserId","else":"$participant.secondaryUserId"}},"dateValue":{"$dateFromString":{"dateString":"$date"}}}},{"$match":{}},{"$lookup":{"from":"users","localField":"userId","foreignField":"_id","as":"user"}},{"$unwind":{"path":"$user","preserveNullAndEmptyArrays":true}},{"$lookup":{"from":"practitioners","localField":"practitioner","foreignField":"_id","as":"practitioner"}},{"$unwind":{"path":"$practitioner","preserveNullAndEmptyArrays":true}},{"$lookup":{"from":"facility-practitioners","localField":"facilityPractitioner","foreignField":"_id","as":"facilityPractitioners"}},{"$unwind":{"path":"$facilityPractitioners","preserveNullAndEmptyArrays":true}},{"$addFields":{"userName":{"$toLower":{"$concat":["$user.name.first"," ","$user.name.last"]}},"photo":"$user.profile.picture"}},{"$match":{"$and":[{"practitioner._id":ObjectId("64f97e5d7884cee0c928a44c")},{"$or":[{"state":{"$in":["CANCELLED","FULLFILLED","UN_ATTENDED","USER_NO_SHOW","PRACTITIONER_NO_SHOW","NETWORK_ERROR","INCOMPLETE"]}}]}]}},{"$lookup":{"from":"master-data-call-status","localField":"appointmentUpdateReason","foreignField":"_id","as":"callStatus"}},{"$unwind":{"path":"$callStatus","preserveNullAndEmptyArrays":true}},{"$sort":{"date":-1,"slotStartTime":-1,"userName":1}},{"$project":{"_id":1,"prescription":1,"paymentOrder":1,"user":{"name":1,"_id":1},"photo":1,"practitioner":{"name":1,"_id":1},"consultationType":1,"date":1,"slotStartTime":1,"facilityPractitioner":1,"records":1,"createdAt":1,"updatedAt":1,"state":1,"updatedBy":1,"callStatus":{"code":1}}},{"$skip":0},{"$limit":10}])

I am using the below index on booked-slot collection.

{
    v: 2,
    key: { facility: 1, state: 1, date: -1, slotStartTime: -1 },
    name: 'facility_1_state_1_date_-1_slotStartTime_-1',
    background: true
  }

Pleae find the document count of each documents.
booked-slot - 95563
users - 698422
practitioners - 39
facility-practitioners -37
master-data-call-status - 52

Also I have tried breaking down the aggregation pipelines and ram it individually but still it takes time.

db.getCollection('booked-slot').explain("executionStats").aggregate([{"$match":{"facility":ObjectId("63ee2e8458b6fd9eb66511ce")}},{"$addFields":{"userId":{"$cond":{"if":{"$eq":["$participant.isPrimary",true]},"then":"$participant.primaryUserId","else":"$participant.secondaryUserId"}},"dateValue":{"$dateFromString":{"dateString":"$date"}}}},{"$match":{}},{"$lookup":{"from":"users","localField":"userId","foreignField":"_id","as":"user"}},{"$unwind":{"path":"$user","preserveNullAndEmptyArrays":true}}])
{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'test_tuning.booked-slot',
          indexFilterSet: false,
          parsedQuery: { facility: { '$eq': ObjectId("63ee2e8458b6fd9eb66511ce") } },
          queryHash: '1EA0CE2C',
          planCacheKey: 'F5E357E3',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            stage: 'FETCH',
            inputStage: {
              stage: 'IXSCAN',
              keyPattern: { facility: 1, state: 1, date: -1, slotStartTime: -1 },
              indexName: 'facility_1_state_1_date_-1_slotStartTime_-1',
              isMultiKey: false,
              multiKeyPaths: { facility: [], state: [], date: [], slotStartTime: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                facility: [
                  "[ObjectId('63ee2e8458b6fd9eb66511ce'), ObjectId('63ee2e8458b6fd9eb66511ce')]"
                ],
                state: [ '[MinKey, MaxKey]' ],
                date: [ '[MaxKey, MinKey]' ],
                slotStartTime: [ '[MaxKey, MinKey]' ]
              }
            }
          },
          rejectedPlans: []
        },
        executionStats: {
          executionSuccess: true,
          nReturned: 95563,
          executionTimeMillis: 13559,
          totalKeysExamined: 95563,
          totalDocsExamined: 95563,
          executionStages: {
            stage: 'FETCH',
            nReturned: 95563,
            executionTimeMillisEstimate: 128,
            works: 95564,
            advanced: 95563,
            needTime: 0,
            needYield: 0,
            saveState: 156,
            restoreState: 156,
            isEOF: 1,
            docsExamined: 95563,
            alreadyHasObj: 0,
            inputStage: {
              stage: 'IXSCAN',
              nReturned: 95563,
              executionTimeMillisEstimate: 79,
              works: 95564,
              advanced: 95563,
              needTime: 0,
              needYield: 0,
              saveState: 156,
              restoreState: 156,
              isEOF: 1,
              keyPattern: { facility: 1, state: 1, date: -1, slotStartTime: -1 },
              indexName: 'facility_1_state_1_date_-1_slotStartTime_-1',
              isMultiKey: false,
              multiKeyPaths: { facility: [], state: [], date: [], slotStartTime: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                facility: [
                  "[ObjectId('63ee2e8458b6fd9eb66511ce'), ObjectId('63ee2e8458b6fd9eb66511ce')]"
                ],
                state: [ '[MinKey, MaxKey]' ],
                date: [ '[MaxKey, MinKey]' ],
                slotStartTime: [ '[MaxKey, MinKey]' ]
              },
              keysExamined: 95563,
              seeks: 1,
              dupsTested: 0,
              dupsDropped: 0
            }
          }
        }
      },
      nReturned: Long("95563"),
      executionTimeMillisEstimate: Long("248")
    },
    {
      '$addFields': {
        userId: {
          '$cond': [
            { '$eq': [ '$participant.isPrimary', { '$const': true } ] },
            '$participant.primaryUserId',
            '$participant.secondaryUserId'
          ]
        },
        dateValue: { '$dateFromString': { dateString: '$date' } }
      },
      nReturned: Long("95563"),
      executionTimeMillisEstimate: Long("273")
    },
    {
      '$lookup': {
        from: 'users',
        as: 'user',
        localField: 'userId',
        foreignField: '_id',
        unwinding: { preserveNullAndEmptyArrays: true }
      },
      totalDocsExamined: Long("95562"),
      totalKeysExamined: Long("95562"),
      collectionScans: Long("0"),
      indexesUsed: [ '_id_' ],
      nReturned: Long("95563"),
      executionTimeMillisEstimate: Long("13531")
    }
  ],
  serverInfo: {
    host: 'ip-10-22-11-89.ap-south-1.compute.internal',
    port: 27017,
    version: '6.0.5',
    gitVersion: 'c9a99c120371d4d4c52cbb15dac34a36ce8d3b1d'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  command: {
    aggregate: 'booked-slot',
    pipeline: [
      { '$match': { facility: ObjectId("63ee2e8458b6fd9eb66511ce") } },
      {
        '$addFields': {
          userId: {
            '$cond': {
              if: { '$eq': [ '$participant.isPrimary', true ] },
              then: '$participant.primaryUserId',
              else: '$participant.secondaryUserId'
            }
          },
          dateValue: { '$dateFromString': { dateString: '$date' } }
        }
      },
      { '$match': {} },
      {
        '$lookup': {
          from: 'users',
          localField: 'userId',
          foreignField: '_id',
          as: 'user'
        }
      },
      {
        '$unwind': { path: '$user', preserveNullAndEmptyArrays: true }
      }
    ],
    cursor: {},
    '$db': 'test_tuning'
  },

This instance is running on localy hosted environment.

I had a bit of a play locally with two collections I setup, one with 700,000 users and a booked-slot collection with 100,000 entries. Each of the booked-slot had two properties for the userID, a primary and secondary and also a boolean that was random that determined which to pull in the query.

Running a basic query:

db.getCollection("booked-slot").aggregate([
   {
      "$addFields":{
         "userId":{
            "$cond":{
               "if":{
                  "$eq":[
                     "$participant.isPrimary",
                     true
                  ]
               },
               "then":"$participant.primaryUserId",
               "else":"$participant.secondaryUserId"
            }
         },
         "dateValue":{
            "$dateFromString":{
               "dateString":"$date"
            }
         }
      }
   },
   {
      "$lookup":{
         "from":"users",
         "localField":"userId",
         "foreignField":"_id",
         "as":"user"
      }
   },
   {
       $out:'OUTPUT'
   }
])

Took about 7 seconds, running just a lookup onto the users table without the calculation:

db.getCollection("booked-slot").explain("executionStats").aggregate([
   {
      "$lookup":{
         "from":"users",
         "localField":"participant.primaryUserId",
         "foreignField":"_id",
         "as":"user_1"
      }
   },
   {
       $out:'OUTPUT'
   }
])

Took one (1) second, even adding a second lookup to the user table only took a second longer.

I think one of the issues you have above is you’re linking onto a calculated field, so the optimisations the server can do are a touch limited.

Running this still only took a couple of seconds:

db.getCollection("booked-slot").aggregate([
   {
      "$lookup":{
         "from":"users",
         "localField":"participant.primaryUserId",
         "foreignField":"_id",
         "as":"user_1"
      }
   },
   {
      "$lookup":{
         "from":"users",
         "localField":"participant.secondaryUserId",
         "foreignField":"_id",
         "as":"user_2"
      }
   },
   {
       $addFields:{
           actualUser:{
               $cond:{
                   if:"$participant.isPrimary",
                   then:{user:'$user_1', sit:'1'},
                   else:{user:'$user_2', sit:'2'},
               }               
           }
       }
   },
   {
       $out:'OUTPUT'
   }
])

The other thought was that if this is a lot of data that’s used often together, then storing it apart and having to re-form it on often used queries is going to cause you performance issues.
It may be an idea to keep the data you query together, well together.

(This is what I used to generate dummy data in case anyone else wants to play, I know I could have used a bulk operator as opposed to insertMany with an array but it was quick…)

var userCount = 700000
db.getCollection("users").drop()

var batchSize = 1000;
var bucket = [];

for(var x=0; x< userCount; x++){
    bucket.push({
        _id:x,
        FirstName:'Joe',
        LastName:'Smith'
    })
    if(bucket.length > batchSize){
        db.getCollection("users").insertMany(bucket);
        bucket = [];
        print(`${x} / ${userCount}`)
    }
}



//////////////////
var slotCount = 100000
//db.getCollection("booked-slot").find({})
db.getCollection("booked-slot").drop()

var batchSize = 1000;
var bucket = [];

for(var x=0; x< slotCount; x++){
    bucket.push({
        _id:x,
        participant:{
            isPrimary: (Math.random() < 0.5),
            primaryUserId:Math.floor((Math.random() * userCount)),
            secondaryUserId:Math.floor((Math.random() * userCount))
        }
        
    })
    if(bucket.length > batchSize){
        db.getCollection("booked-slot").insertMany(bucket);
        bucket = [];
        print(`${x} / ${slotCount}`)
    }
}

Also I ignored the edge case of the last bucket not being written to the server if it was not on the batch size limit.

Note, this was also running locally on a Mongo instance but version 6, so not latest. This was just my desktop box so 16 core / 32 thread and 32 Gigs Ram running off NVMe SSD.

1 Like

I ran the above script to generate the data set, but in my case the query took only a few milliseconds -

{
  "explainVersion": "2",
  "stages": [
    {
      "$cursor": {
        "queryPlanner": {
          "namespace": "missionChampion.booked-slot",
          "indexFilterSet": false,
          "parsedQuery": {},
          "queryHash": "E475932B",
          "planCacheKey": "FA0A1518",
          "maxIndexedOrSolutionsReached": false,
          "maxIndexedAndSolutionsReached": false,
          "maxScansToExplodeReached": false,
          "winningPlan": {
            "queryPlan": {
              "stage": "COLLSCAN",
              "planNodeId": 1,
              "filter": {},
              "direction": "forward"
            },
            "slotBasedPlan": {
              "slots": "$$RESULT=s4 env: { s1 = TimeZoneDatabase(America/Iqaluit...Pacific/Samoa) (timeZoneDB), s2 = Nothing (SEARCH_META), s3 = 1710334983041 (NOW) }",
              "stages": "[1] scan s4 s5 none none none none lowPriority [] @\"8ccb3be2-fb21-48a4-82c3-dd959762869b\" true false "
            }
          },
          "rejectedPlans": []
        },
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 99099,
          "executionTimeMillis": 2870,
          "totalKeysExamined": 0,
          "totalDocsExamined": 99099,
          "executionStages": {
            "stage": "scan",
            "planNodeId": 1,
            "nReturned": 99099,
            "executionTimeMillisEstimate": 26,
            "opens": 1,
            "closes": 1,
            "saveState": 107,
            "restoreState": 107,
            "isEOF": 1,
            "numReads": 99099,
            "recordSlot": 4,
            "recordIdSlot": 5,
            "fields": [],
            "outputSlots": []
          }
        }
      },
      "nReturned": 99099,
      "executionTimeMillisEstimate": 26
    },
    {
      "$addFields": {
        "dateValue": {
          "$dateFromString": {
            "dateString": "$date"
          }
        },
        "userId": {
          "$cond": [
            {
              "$eq": [
                "$participant.isPrimary",
                {
                  "$const": true
                }
              ]
            },
            "$participant.primaryUserId",
            "$participant.secondaryUserId"
          ]
        }
      },
      "nReturned": 99099,
      "executionTimeMillisEstimate": 26
    },
    {
      "$lookup": {
        "from": "users",
        "as": "user",
        "localField": "userId",
        "foreignField": "_id"
      },
      "totalDocsExamined": 99063,
      "totalKeysExamined": 99063,
      "collectionScans": 0,
      "indexesUsed": [
        "_id_"
      ],
      "nReturned": 99099,
      "executionTimeMillisEstimate": 2870
    }
  ],
  "serverInfo": {
    "host": "192.168.1.7",
    "port": 27018,
    "version": "7.0.2",
    "gitVersion": "02b3c655e1302209ef046da6ba3ef6749dd0b62a"
  },
  "serverParameters": {
    "internalQueryFacetBufferSizeBytes": 104857600,
    "internalQueryFacetMaxOutputDocSizeBytes": 104857600,
    "internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
    "internalDocumentSourceGroupMaxMemoryBytes": 104857600,
    "internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
    "internalQueryProhibitBlockingMergeOnMongoS": 0,
    "internalQueryMaxAddToSetBytes": 104857600,
    "internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600,
    "internalQueryFrameworkControl": "trySbeEngine"
  },
  "command": {
    "aggregate": "booked-slot",
    "pipeline": [
      {
        "$addFields": {
          "dateValue": {
            "$dateFromString": {
              "dateString": "$date"
            }
          },
          "userId": {
            "$cond": {
              "if": {
                "$eq": [
                  "$participant.isPrimary",
                  true
                ]
              },
              "then": "$participant.primaryUserId",
              "else": "$participant.secondaryUserId"
            }
          }
        }
      },
      {
        "$lookup": {
          "from": "users",
          "localField": "userId",
          "foreignField": "_id",
          "as": "user"
        }
      }
    ],
    "cursor": {},
    "$db": "abc"
  },
  "ok": 1
}

Akshat, did you pull all data from the aggregation when running or just get an iterator back and get the first set of results?
Wondering on the difference between run time on your environment and mine.

Running to get first 50 results takes 0.018s but running to get all results took about 7s on my environment.

1 Like

John, in the explain output which I shared its mentioned returned documents as 99099, so i think its the full result, I even verified it on compass and got the result in same time. Here is the code through which i ran my query -

async function executeQuery() {
    const db = await connect();
    const Slot = db.collection('booked-slot');

    const output = await Slot.aggregate([
        {
            $addFields: {
                dateValue: {
                    $dateFromString: {
                        dateString: "$date"
                    }
                },
                userId: {
                    $cond: {
                        if: {
                            $eq: [
                                "$participant.isPrimary",
                                true
                            ]
                        },
                        then: "$participant.primaryUserId",
                        else: "$participant.secondaryUserId"
                    }
                },
            }
        },
        {
            $lookup: {
                from: "users",
                localField: "userId",
                foreignField: "_id",
                as: "user"
            }
        }
    ]).explain("executionStats");
    console.log(JSON.stringify(output));
    await disconnect();
    return output;
}

As for the environment i am running this on m1 macbook pro, locally hosted mongo instance.

Interesting, I ran your same code on my machine and it did indeed take longer than yours, your final execution is 2870ms vs mine at 7700ms.

I’ve a MBP M1 in the house somewhere I’ll have to test it on that!

So John,

Is there any ways to optimize the above mentioned query to take a less execution time with 4 lookups?

Link back to similar if not identical topic.

1 Like

Take a look at the linked post from Steevej, look at why you’re needing to join to this many collections and if the join order is valid, i.e. as opposed to lookup from A->B you could go from B->A.

Is this is common use-case for data?

If you have a collection that references data in 4 other collections and 99.9 is just joining them up to read and 0.1% is an update to one of them then you can massively boost performance but putting the data you need into the main collection and doing an update to keep it refreshed once every month or whatever when related data changes then you have a straight read which will take no time at all to run.

There are some other comments in that thread that are great points (i.e. dates are dates, not strings) and also think about how much data you’re returning to the client, can you filter it early and cut down on work.

To paraphrase, “Data that plays together, stays together”, Mongo is not a relational database (yet!) and trying to go down that road can lead to performance issues.