MongoDB concurrent queries on different collections are slow

Ajinkya_Surnis · May 27, 2020, 7:37am

I have multiple collections, each with 100,000 documents, and each document has 10,000 fields (columns). A Python script executes aggregate queries in a multi-threaded fashion: each thread invokes an aggregate on a separate collection.

When the script is executed, the time it takes to complete the aggregations is proportional to the number of threads, i.e. the latency increases linearly with the number of queries executed concurrently.

If an aggregation on a single collection takes ‘x’ amount of time, then multi-threaded aggregations on ‘n’ collections take almost ‘n*x’ amount of time.
My expectation was that the multi-threaded queries would take roughly the same amount of time as the single-threaded one. Now it appears that multiple queries are not executed concurrently in MongoDB. Is this a known limitation? Is there any configuration parameter in MongoDB to control concurrency?

I’ve asked the same question on Stack Overflow as well: “MongoDB concurrent queries on different collections are slow”.

steevej · May 27, 2020, 12:16pm

In a client/server system, performance analysis is not as easy as you seem to think. It depends on a multitude of factors. The first thing to do is to isolate the bottleneck. How did you come to the conclusion that the queries are not executed concurrently?

1. It can be the server, as you seem to think. What is its configuration?
2. It can be the network. How much data is returned? You could eliminate network issues by running everything locally, but then your script and the server will compete for the same resources, which brings us back to 1.
3. It can be your Python script.
4. It can be your data schema. Do you have covered queries? Do you have big documents that need to be fetched from disk? Which brings us back to 1.
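
As a starting point for isolating the bottleneck, it can help to time the same workload once sequentially and once with one thread per collection, and to record the latency of each individual call. The following is only a minimal sketch: `compare`, `time_call` and `fake_query` are hypothetical names introduced here, and `fake_query` is a stand-in that merely sleeps, so no MongoDB server is needed to run it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_call(fn, arg):
    """Run fn(arg) and return how long it took, in seconds."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def compare(fn, args):
    """Time fn over args sequentially, then with one thread per argument.

    Returns (sequential_total, concurrent_wall_time, per_call_latencies).
    """
    start = time.perf_counter()
    for arg in args:
        fn(arg)
    sequential_total = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(args)) as pool:
        per_call = list(pool.map(lambda a: time_call(fn, a), args))
    concurrent_wall = time.perf_counter() - start

    return sequential_total, concurrent_wall, per_call


def fake_query(collection_name):
    """Hypothetical stand-in for an I/O-bound query; it only sleeps."""
    time.sleep(0.05)


if __name__ == "__main__":
    seq, conc, per_call = compare(fake_query, ["c1", "c2", "c3", "c4"])
    print(f"sequential: {seq:.3f}s  concurrent: {conc:.3f}s")
    print("per-call latencies:", [f"{t:.3f}" for t in per_call])
```

With a real workload, `fake_query` would be replaced by a function that runs the aggregate on the named collection and consumes the whole result, for example by iterating over the cursor returned by pymongo's `aggregate`. If the concurrent wall time is then close to the sequential total, something is serializing the work, but this measurement alone does not say whether it is the server, the network, or the script itself (for instance, CPU-bound decoding of large documents in Python threads).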