Performance Difference Between Aggregation and Find

Using db.collection.aggregate() with only a $match stage and db.collection.find() produces the same result: the documents that match the filter/query.

Do they have similar performance? What kind of timings would we see if we benchmarked the two against the same collection?

Also, is an aggregation pipeline with multiple transformation stages faster than running a find operation and then transforming the documents in code?

Hey @Rachelle_98421

I believe that a simple aggregation pipeline with only a $match stage would perform very similarly to a find() command. The difference is in the power of the aggregation pipeline to transform the data however the author sees fit!
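One way to check this on your own data is to time both calls yourself. The following is a minimal sketch of a timing helper, not a rigorous benchmark; `timeAsync`, `collection`, and the filter in the commented usage are hypothetical names, and the usage assumes the official Node.js driver.

```javascript
// Minimal timing helper (a sketch, not a rigorous benchmark).
// `fn` is any async function, e.g. one that runs a query to completion.
async function timeAsync(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    await fn();
    const end = process.hrtime.bigint();
    samples.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  const avgMs = samples.reduce((a, b) => a + b, 0) / runs;
  return { runs, avgMs, samples };
}

// Hypothetical usage (`collection` and the filter fields are assumptions):
// const filter = { status: "active" };
// const viaFind = await timeAsync(() => collection.find(filter).toArray());
// const viaAgg = await timeAsync(() =>
//   collection.aggregate([{ $match: filter }]).toArray()
// );
// console.log(viaFind.avgMs, viaAgg.avgMs);
```

Running each call several times and discarding the first sample helps reduce the effect of a cold cache on the comparison.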

I believe the answer depends on a couple of factors. One could be:

  • Power of the machine(s) running mongod vs power of the machine/mobile device running the code to treat the documents yourself.
    If you plan to treat the documents client side and that happens to be an old cell phone, then I would think the aggregation pipeline is a better choice.

I would think it depends on your situation.

Hopefully someone more experienced than I can help better answer your question. 🙂

Hi @natac13,

Thanks a lot for your answer! You bring up a great point with the second part of your answer. I should have been a bit clearer in my question.

I was assuming that we were talking about a back-end environment exclusively and one where we’re not necessarily benchmarking the arrival of data on the client-side. Basically as soon as the server would get data from MongoDB, it would do the transformation of the documents.

My real-world example is a Node server that has to run rather complex queries. In the past these have been handled with simple find/findOneAndUpdate-style queries to the database, with the transformation carried out on the results in server code. I’m interested to see if converting all of these into aggregation pipelines would make the server run better.
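To make the comparison concrete, here is a rough sketch of the two approaches side by side. The field names (`firstName`, `lastName`, `orders`, `active`) and the function `transformInCode` are made up for illustration and are not from my actual server.

```javascript
// Approach 1: run find({ active: true }) and transform the results in code.
// This function is the "transform in code" half of that approach.
function transformInCode(docs) {
  return docs.map((d) => ({
    fullName: `${d.firstName} ${d.lastName}`,
    orderCount: d.orders.length,
  }));
}

// Approach 2: the same filter and transformation expressed as a pipeline,
// so the database does the work. Would be passed to collection.aggregate().
const pipeline = [
  { $match: { active: true } },
  {
    $project: {
      _id: 0,
      fullName: { $concat: ["$firstName", " ", "$lastName"] },
      orderCount: { $size: "$orders" },
    },
  },
];

// Made-up document standing in for a find() result.
const sample = [
  { firstName: "Ada", lastName: "Lovelace", orders: [1, 2, 3], active: true },
];
console.log(transformInCode(sample));
```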

My guess would be yes, but I was hoping to hear whether anyone has had any experience with this.

Hi @Rachelle_98421,

Excellent Question!

Any pipeline stage requires the aggregation framework to fetch the BSON for each document and convert it to internal objects for processing in the pipeline. At the end of the pipeline, the results are converted back to BSON and sent to the client.
This adds significant overhead compared to a find, where the BSON is simply sent back to the client. Aggregation is therefore not a substitute for find and should not be used unless you need the additional functionality that aggregation provides.
Because of this, if you have a simple aggregation pipeline, or one that does not cut down the data volume much, it can often be quicker to use find() and perform the aggregation client side. Aggregation wins where the volume of data returned is much smaller than the original data, or where you don’t have the skill to build fast client-side aggregations.
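As a rough illustration of the data-volume point, the sketch below compares what each approach would send over the wire for a per-customer total. The field names (`customerId`, `amount`), the function `totalsByCustomer`, and the generated documents are all hypothetical.

```javascript
// Client-side equivalent of a $group that sums `amount` per `customerId`.
// With find(), every input document must first travel to the client.
function totalsByCustomer(docs) {
  const totals = new Map();
  for (const { customerId, amount } of docs) {
    totals.set(customerId, (totals.get(customerId) || 0) + amount);
  }
  return [...totals].map(([customerId, total]) => ({ _id: customerId, total }));
}

// Server-side version: only one document per customer is returned.
const groupPipeline = [
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
];

// 10,000 made-up order documents spread over 10 customers.
const orders = Array.from({ length: 10000 }, (_, i) => ({
  customerId: i % 10,
  amount: 1,
}));
const totals = totalsByCustomer(orders);
console.log(orders.length, "documents in,", totals.length, "documents out");
```

Here the pipeline would return 10 documents instead of 10,000, which is the kind of reduction where aggregation tends to pay for its overhead.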

I hope this answers your question.

Kanika
