Watch keynotes and sessions from MongoDB.live, our virtual developer conference.

Change Stream Java driver COLLSCAN

I’m watching 3 collections in a DB using the Java driver. Each of those collections has one document only, each of which have embedded documents. My client code looks like this:

ChangeStreamIterable<Document> changes =
          client.getDatabase(<DBNAME>)
              .watch(Collections.singletonList(
                      Aggregates.match(Filters.in("ns.coll", Arrays.asList(WATCHED_COLLECTIONS)))))
          .fullDocument(FullDocument.UPDATE_LOOKUP);

where the variable “WATCHED_COLLECTIONS” is an array of the 3 collection names that I want to watch.

Since I’ve used the “match” stage, this filtering should be happening at the server side right?
Despite that, in the mongo logs, I can see that ‘docsExamined’ is very high!

Why would that be happening, since there’s only one document in each collection? Even if we count all the embedded documents it doesn’t come up to 11000 documents.

Excerpt from mongo log below:

COMMAND [conn20161] command DBNAME.$cmd command: getMore { getMore: 1760441711222280319, collection: “$cmd.aggregate”, $db: “DBNAME”, $clusterTime: { clusterTime: Timestamp(1590477125, 7396), signature: { hash: BinData(0, 17B8B1B3ADE3FEFC381F56E9201694DC9509BC38), keyId: 6829683829607759874 } }, lsid: { id: UUID(“f88e3593-bec6-47cc-a067-6042f36aa1a3”) } } originatingCommand: { aggregate: 1, pipeline: [ { $changeStream: { fullDocument: “updateLookup” } }, { $match: { ns.coll: { $in: [ “COLLECTION1”, “COLLECTION2”, “COLLECTION3” ] } } } ], cursor: {}, $db: “DBNAME”, $clusterTime: { clusterTime: Timestamp(1590160602, 2), signature: { hash: BinData(0, 39A22239ED8BA07ED1E8B710D4212AE8CDB52663), keyId: 6829683829607759874 } }, lsid: { id: UUID(“f88e3593-bec6-47cc-a067-6042f36aa1a3”) } } planSummary: COLLSCAN cursorid:1760441711222280319 keysExamined:0 docsExamined:11890 numYields:7138 nreturned:0 reslen:305 locks:{ ReplicationStateTransition: { acquireCount: { w: 7141 } }, Global: { acquireCount: { r: 7141 } }, Database: { acquireCount: { r: 7141 } }, Mutex: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 7141 } } } storage:{ data: { bytesRead: 14 } } protocol:op_msg 351ms

Trying to bounce this topic up in case anybody has any insights…