We are building an application that watches a collection and identifies when a specific field is written to it as part of a db commit. This has worked well for the most part, but we are struggling to keep up with the db changes for one particular collection in our cluster.
The collection that is tailed slowly typically generates large (1MB+) oplog entries, which arise from the use of $addToSet against certain large array fields in it.
The watch operation we run projects only a single JSON field (~400 bytes) in an effort to remain efficient, but the getMore queries (logged by mongod during the watch operation) are only able to scan very few documents before hitting the 16MB BSON size limit reported in reslen. Each query takes 150ms or more, and the rate of processing never keeps up with our load.
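To put rough numbers on this, here is a back-of-the-envelope estimate using only the figures quoted above. It is a sketch under stated assumptions, not a measurement: it assumes getMore calls are issued one after another, that each batch fills up with ~1MB entries until the 16MB limit is reached, and that each call takes 150ms. The function name is made up for illustration.

```python
MB = 1024 * 1024


def max_entries_per_second(batch_limit_bytes, entry_bytes, getmore_seconds):
    """Upper bound on oplog entries consumed per second by sequential getMore calls.

    Assumes every batch is filled with entries of `entry_bytes` up to
    `batch_limit_bytes`, and that each getMore takes `getmore_seconds`.
    """
    entries_per_batch = batch_limit_bytes // entry_bytes
    return entries_per_batch / getmore_seconds


# Figures from the post: 16MB limit, ~1MB entries, ~150ms per getMore.
ceiling = max_entries_per_second(16 * MB, 1 * MB, 0.150)
print(f"about {ceiling:.0f} entries/second at best")
```

Since the entries are "1MB+" and the queries "150ms+", the real ceiling would be lower than this figure.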
It would be good to understand whether the filter pipeline used with the watch operation actually helps in this scenario, or whether there is a better way to watch and process changes involving large oplog entries.
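For reference, a minimal sketch of the kind of watch call described above, assuming pymongo. The field name `status` and the helper names `build_watch_pipeline` and `tail_changes` are hypothetical, the watched field is assumed to be top-level, and `collection` is assumed to be a pymongo `Collection`. It only illustrates the shape of the filter and projection; it makes no claim about how much of the work the server can skip for large oplog entries.

```python
def build_watch_pipeline(field):
    """Build a change stream pipeline that keeps only events touching `field`.

    `field` is assumed to be a top-level field name (hypothetical example: "status").
    """
    updated_path = f"updateDescription.updatedFields.{field}"
    full_doc_path = f"fullDocument.{field}"
    return [
        # Only consider events that can carry the field.
        {"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}},
        # Keep events where the field was written.
        {"$match": {"$or": [
            {updated_path: {"$exists": True}},
            {full_doc_path: {"$exists": True}},
        ]}},
        # Return only the small field; _id (the resume token) must be kept.
        {"$project": {
            "_id": 1,
            "operationType": 1,
            "documentKey": 1,
            updated_path: 1,
            full_doc_path: 1,
        }},
    ]


def tail_changes(collection, field, handle, batch_size=100):
    """Open a change stream on `collection` and pass each event to `handle`."""
    pipeline = build_watch_pipeline(field)
    with collection.watch(pipeline, batch_size=batch_size) as stream:
        for change in stream:
            handle(change)
```

Usage would be along the lines of `tail_changes(client["mydb"]["mycoll"], "status", print)`, where the database and collection names are placeholders.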