Efficiency of change streams when large oplog entries are created

Prasanth_R · June 8, 2020, 1:33pm

We are building an application that watches a collection and identifies when a specific field is written to it as part of a db commit. While this has worked well for the most part, we are struggling to keep up with the db changes associated with one particular collection in our cluster.

The collection that is slow to tail typically generates large (1 MB+) oplog entries, which result from using $addToSet against certain large array fields in it.

The watch operation we run projects only a single JSON field (~400 bytes) in an effort to remain efficient, but the getMore queries (logged by mongod during the watch operation) are only able to scan very few documents before hitting the 16 MB BSON size limit in reslen. Each query takes 150 ms or more, and the processing rate never keeps up with our load.
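For reference, here is a minimal sketch in Python (PyMongo) of the kind of watch described above. The field name `status`, the helper names `build_pipeline` and `watch_field`, and the batch size are hypothetical placeholders, not our actual code; `collection` is assumed to be a PyMongo `Collection`.

```python
from typing import Any, Callable, Dict, List


def build_pipeline(field: str) -> List[Dict[str, Any]]:
    """Build a change stream pipeline that keeps only events touching `field`."""
    updated = f"updateDescription.updatedFields.{field}"
    inserted = f"fullDocument.{field}"
    return [
        # Keep updates that set the field, and inserts/replaces that contain it.
        {"$match": {"$or": [
            {"operationType": "update", updated: {"$exists": True}},
            {"operationType": {"$in": ["insert", "replace"]},
             inserted: {"$exists": True}},
        ]}},
        # Return only the small field; _id (the resume token) is kept by default.
        {"$project": {"operationType": 1, "documentKey": 1,
                      updated: 1, inserted: 1}},
    ]


def watch_field(collection: Any, field: str,
                handle: Callable[[Dict[str, Any]], None],
                batch_size: int = 100) -> None:
    """Open a change stream on `collection` and pass each event to `handle`."""
    with collection.watch(build_pipeline(field), batch_size=batch_size) as stream:
        for change in stream:
            handle(change)
```

A call would look like `watch_field(db.my_collection, "status", print)`, where `db.my_collection` is again a placeholder.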

It would be good to understand whether the filter pipeline used with the watch operation actually helps in this scenario, or whether there is a better way to watch & process changes involving large oplog entries.
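To put rough numbers on this, the sketch below estimates an upper bound on throughput. It assumes that each getMore batch is effectively capped by the size of the raw oplog entries, as the log output suggests, and it uses the figures quoted above (16 MB limit, 1 MB entries, 150 ms per getMore). The function name is made up for illustration.

```python
MB = 1024 * 1024


def max_events_per_second(batch_limit_bytes: int, entry_bytes: int,
                          getmore_seconds: float) -> float:
    """Upper bound on events/second if every batch fills up on entry size."""
    entries_per_batch = batch_limit_bytes // entry_bytes
    return entries_per_batch / getmore_seconds


# 16 MB batch limit, 1 MB oplog entries, 150 ms per getMore.
rate = max_events_per_second(16 * MB, 1 * MB, 0.150)
print(f"at most about {rate:.0f} events per second")
```

Since the entries are 1 MB or larger and the queries take 150 ms or longer, the real rate would be lower than this figure.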

Katya · May 28, 2021, 2:28pm

Hi @Prasanth_R, do you observe this on a shared cluster? If yes, there is a ticket tracking improvements for this use case: https://jira.mongodb.org/browse/SERVER-48694
It is on our roadmap, so stay tuned for updates.
