How can I tell if a pipeline uses more than 100 MB?

Hi, I’m in Chapter 2: Basic Aggregation - Utility Stages: Cursor-like stages: Part 2, where the instructor talks about the 100 MB limit of a pipeline and the allowDiskUse option, and I was wondering:

  1. How can we know, while developing a pipeline, how much memory it will use and whether it will exceed the 100 MB limit?
  2. How can we know whether that memory usage will grow as the database grows? We may have a problem in the future if we don’t catch this during development.
  3. I think the 100 MB limit applies to the whole pipeline. Am I right? Or does it apply to each individual $sort stage? In other words, is it A) a single 100 MB limit for the whole pipeline, or B) a fresh 100 MB for every $sort stage?

Also, he spoke about placing $sort as the first stage, or at least near the beginning and before a $project stage, to improve performance. Does this mean it will be faster and use less memory, or only use less memory? And how much less memory?

Thank you

Hi @Luis_Fernando_18793,

Great Question!!

Let me check details regarding this and get back to you.

Thanks,
Sonali

Whilst we await @Sonali_Mamgain’s insight, here are my thoughts:

Points 1 & 2:
A combination of the indexing strategy, the working set, the storage engine (WiredTiger or MMAPv1) and the way the data is queried all play an important role here. Your first port of call should be the explain plan for the query. Beyond that, you will need diagnostic and monitoring tools such as the database profiler, mongostat, dbStats, collStats, serverStatus, MongoDB free monitoring, Ops Manager, Cloud Manager and others. Future memory requirements can sometimes be estimated from historical usage patterns and regular monitoring.
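To put rough numbers on points 1 and 2, here is a minimal Python sketch. It assumes, as a simplification, that a blocking stage such as an unindexed $sort holds every document it receives at roughly its average stored size; the real in-memory footprint differs, so treat the result as an order-of-magnitude guide only. The document count and average size could be taken from the `count` and `avgObjSize` fields of collStats; the function names and the monthly growth rate are hypothetical.

```python
from typing import Optional

# Per-stage RAM limit for aggregation stages (100 MB).
STAGE_LIMIT_BYTES = 100 * 1024 * 1024


def estimate_stage_bytes(doc_count: float, avg_obj_size: float) -> float:
    """Rough bytes a blocking stage (e.g. an unindexed $sort) would buffer.

    Assumption: every input document is held at its average stored size.
    """
    return doc_count * avg_obj_size


def months_until_limit(doc_count: int, avg_obj_size: float,
                       monthly_growth: float,
                       max_months: int = 120) -> Optional[int]:
    """Months until the estimate exceeds the per-stage limit.

    Assumes the number of documents reaching the stage grows by
    `monthly_growth` (e.g. 0.05 for 5%) each month, compounded.
    Returns 0 if already over the limit, or None if the limit is not
    reached within `max_months`.
    """
    count = float(doc_count)
    for month in range(max_months + 1):
        if estimate_stage_bytes(count, avg_obj_size) > STAGE_LIMIT_BYTES:
            return month
        count *= 1 + monthly_growth
    return None


if __name__ == "__main__":
    # Hypothetical figures: 50,000 docs of ~1 KB each, growing 10% a month.
    print(months_until_limit(50_000, 1024, 0.10))
```

Only the documents that actually reach the stage count, so a selective $match ahead of it shrinks the estimate.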

Point 3:
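The 100 MB limit applies to each stage individually, not to the pipeline as a whole, so the answer is B: every stage that has to hold data in memory, such as each $sort, gets its own 100 MB. If any single stage exceeds it, the aggregation fails with an error unless you pass allowDiskUse, which lets that stage write temporary files to disk.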
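The per-stage nature of the limit can be illustrated with a small Python sketch. The pipeline, field names and helper functions below are hypothetical; the only real API used is the `allowDiskUse` option of pymongo's `Collection.aggregate`, and `collection` is assumed to be a pymongo collection (or anything with a compatible `aggregate` method).

```python
from typing import Any, Dict, List, Sequence

# Per-stage RAM limit for aggregation stages (100 MB).
STAGE_LIMIT_BYTES = 100 * 1024 * 1024

# Hypothetical pipeline with three memory-hungry stages; each one gets
# its own 100 MB budget rather than sharing one budget for the pipeline.
PIPELINE: List[Dict[str, Any]] = [
    {"$match": {"status": "A"}},
    {"$sort": {"amount": -1}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def stages_over_limit(estimated_bytes: Sequence[float]) -> List[int]:
    """Return the indexes of stages whose own estimate exceeds the limit.

    Each stage is compared with the limit separately; the pipeline
    total plays no part.
    """
    return [i for i, size in enumerate(estimated_bytes)
            if size > STAGE_LIMIT_BYTES]


def aggregate_with_spill(collection: Any, pipeline: List[Dict[str, Any]]):
    """Run the pipeline, letting any stage that would exceed the limit
    write temporary files to disk instead of failing."""
    return collection.aggregate(pipeline, allowDiskUse=True)


if __name__ == "__main__":
    mb = 1024 * 1024
    # Three stages at 60 MB each: 180 MB in total, but no single stage
    # exceeds 100 MB, so none of them trips the limit. Prints [].
    print(stages_over_limit([0, 60 * mb, 60 * mb, 60 * mb]))
```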

With regard to $sort: it’s a very resource-intensive operation, hence the advice to sort early so that it can use an index. A $sort placed after stages such as $project or $group works on reshaped documents and cannot use an index, so it runs in memory, which is slower and also counts against the 100 MB limit. The explain plan is a good indicator of which case you are in. That said, $match should be the first stage in your pipeline, ahead of $sort.
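The ordering advice can be expressed as a simplified Python heuristic. It is an assumption-laden sketch, not how MongoDB decides: it only checks that nothing but $match stages come before the first $sort, whereas the real query planner also needs a suitable index on the sort fields and may reorder some stages itself. The explain plan remains the authoritative check. The function name and example pipelines are hypothetical.

```python
from typing import Any, Dict, List


def sort_can_use_index(pipeline: List[Dict[str, Any]]) -> bool:
    """Simplified heuristic: could the first $sort read from an index?

    True only if every stage before the first $sort is a $match, i.e. the
    sort still sees documents as stored in the collection. Stages such as
    $project, $unwind or $group reshape documents, after which a sort
    must happen in memory. Returns False if there is no $sort stage.
    """
    for stage in pipeline:
        name = next(iter(stage))  # each stage document has one operator key
        if name == "$sort":
            return True
        if name != "$match":
            return False
    return False


if __name__ == "__main__":
    good = [{"$match": {"status": "A"}},
            {"$sort": {"orderDate": -1}},
            {"$project": {"orderDate": 1, "amount": 1}}]
    bad = [{"$project": {"orderDate": 1, "amount": 1}},
           {"$sort": {"orderDate": -1}}]
    print(sort_can_use_index(good), sort_can_use_index(bad))
```

Putting $match first and $sort right after it keeps the sort index-eligible while also reducing the number of documents it has to handle.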

Some of your questions are more DBA/architect-related. Since you’re on the Developer path (i.e. the MongoDB University Developer Learning path), I would suggest registering for the DBA Learning path, where some of these strategies are discussed: M201 MongoDB Performance and M312 Diagnostics and Debugging.
