Thank you for your response, Kevin.
We are working on a dynamic market scanner. It lets users define and run custom dynamic scans across our data set. Scanners are filters built on a set of primitives (e.g. the last price or the previous day's closing price), but also on computed values such as the total volume, i.e. the sum of all the trade sizes for the day up to the current timestamp.
MongoDB’s aggregation pipeline seems to be the perfect match for this new feature, because it can express many of the primitives we need without precomputing the values, which is an essential requirement for a dynamic scanner.
So far we have found that simple primitives like closing prices are pretty fast, as they essentially just need a lookup across the symbols at a given timestamp. Unfortunately this is not the case for aggregated primitives like the total volume, which has to scan thousands of documents for the selected symbols on that day.
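To make the difference concrete, here is roughly what the two kinds of primitives look like for us (collection and field names are illustrative, not our exact schema):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient()
    db = client["market"]  # illustrative database name

    # Simple primitive: the previous day's closing price is a single
    # indexed lookup per symbol.
    close = db.daily_bars.find_one(
        {"symbol": "AAPL", "date": "2023-03-01"},
        {"close": 1},
    )

    # Aggregated primitive: the total volume up to "now" has to scan every
    # trade of the day for the symbol and sum the sizes on the fly.
    day_start = datetime(2023, 3, 2, tzinfo=timezone.utc)
    now = datetime(2023, 3, 2, 15, 30, tzinfo=timezone.utc)
    pipeline = [
        {"$match": {"symbol": "AAPL", "ts": {"$gte": day_start, "$lt": now}}},
        {"$group": {"_id": "$symbol", "total_volume": {"$sum": "$size"}}},
    ]
    total_volume = list(db.trades.aggregate(pipeline))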
We tried different setups and found that nested documents are faster than flat ones, because they require fewer disk accesses. Disk is obviously playing a big role here, which is why we use fast (and expensive) NVMe disks. We ran a set of benchmarks to test disk performance and found that our NVMe disks come close to memory bandwidth when reading 512KB blocks: 5.5 GB/s vs 8.5 GB/s.
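Just to clarify what we mean by flat vs nested, the two layouts are roughly these (field names are illustrative, not our exact schema):

    # Flat layout: one document per trade, scanned individually.
    flat_trade = {
        "symbol": "AAPL",
        "ts": "2023-03-02T15:30:00Z",
        "price": 150.2,
        "size": 100,
    }

    # Nested layout: one document per symbol per day, trades embedded in an
    # array, so a day of trades is read with fewer, larger disk accesses.
    nested_day = {
        "symbol": "AAPL",
        "date": "2023-03-02",
        "trades": [
            {"ts": "2023-03-02T15:30:00Z", "price": 150.2, "size": 100},
            # ... thousands more trades for the day
        ],
    }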
That benchmark result should mean that sequential reads from disk can approach memory speed, which in our scenario means we should be able to read 3 GB of uncompressed data in roughly half a second (3 GB / 5.5 GB/s ≈ 0.55 s). It turns out that MongoDB is far slower than that.
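For reference, the kind of sequential-read measurement we are describing is along these lines (a simplified sketch; a real run would use direct I/O or a dedicated tool so the page cache does not skew the result, and the file path is just illustrative):

    import time

    def read_throughput(path: str, block_size: int) -> float:
        """Sequentially read the whole file in block_size chunks, return GB/s."""
        total = 0
        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                total += len(chunk)
        return total / (time.perf_counter() - start) / 1e9

    # A multi-GB test file sitting on the NVMe volume.
    print(read_throughput("/data/testfile", 512 * 1024))  # 512KB blocks: ~5.5 GB/s for us
    print(read_throughput("/data/testfile", 4 * 1024))    # 4KB blocks: ~850 MB/s for us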
While exploring the issue, we found that MongoDB actually allocates blocks of 4KB (the wiredTiger.block-manager "file allocation unit size" statistic).
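This is where we read that value, in case it matters (assuming a collection named trades; the path is the same for any collection):

    from pymongo import MongoClient

    db = MongoClient()["market"]  # illustrative database name
    stats = db.command("collStats", "trades")
    print(stats["wiredTiger"]["block-manager"]["file allocation unit size"])  # -> 4096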
So we re-ran the same disk benchmarks with a 4KB block size, assuming this is the block size MongoDB uses when reading from disk. The benchmarks show a maximum bandwidth of about 850 MB/s, roughly 7 times less than the optimum. This matches what we are seeing in our MongoDB benchmarks: the aggregation pipeline is about 6 times faster on nested documents than on unwound flat ones.
So we are wondering whether we can improve overall MongoDB performance by increasing the WiredTiger file allocation unit size to 512KB, matching the optimal block size from our disk benchmarks. Is that possible? Are there any other tricks to reach the NVMe disks' maximum read speed from MongoDB?
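To give you an idea of what we had in mind, we imagined something along these lines, if the allocation size can actually be tuned through the WiredTiger configString (we have not verified that this is supported or safe, hence the question):

    from pymongo import MongoClient

    db = MongoClient()["market"]  # illustrative database name

    # Hypothetical: ask WiredTiger for a larger allocation size at collection
    # creation time via the configString passthrough. We have not verified
    # that allocation_size is honored this way or that 512KB is a valid value.
    db.create_collection(
        "trades_512k",
        storageEngine={"wiredTiger": {"configString": "allocation_size=512KB"}},
    )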
Let me know what you think, and feel free to ask for more details.