How can I improve index build performance?

I ran some tests creating an index on MongoDB. My goal is to understand how to shorten the index build time. I was thinking of changing the “maxIndexBuildMemoryUsageMegabytes” value to allow mongod to use more RAM when building an index.
https://docs.mongodb.com/manual/reference/parameters/#param.maxIndexBuildMemoryUsageMegabytes
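
For reference, a minimal sketch of how the limit could be changed between test runs from Python. It assumes the pymongo driver is installed, that mongod is reachable at localhost:27017, that the user may run setParameter, and that the server version allows changing this parameter at runtime. The helper names are made up for illustration.

```python
def build_set_parameter_command(limit_mb):
    """Build the setParameter command document for the index build memory limit."""
    if not isinstance(limit_mb, int) or limit_mb <= 0:
        raise ValueError("limit_mb must be a positive integer (megabytes)")
    # "setParameter" has to be the first key; dicts keep insertion order.
    return {"setParameter": 1, "maxIndexBuildMemoryUsageMegabytes": limit_mb}


def apply_index_build_limit(limit_mb, uri="mongodb://localhost:27017"):
    """Send the command to mongod (assumed URI; adjust for your deployment)."""
    # Imported lazily so the helper above can be used without the driver.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return client.admin.command(build_set_parameter_command(limit_mb))
    finally:
        client.close()
```

Calling `apply_index_build_limit(800)` would correspond to the 800 MB setting used in the tests below.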

I did the following tests and got some interesting results:
Test #1
VM memory: 2GB
Document count: 100 million docs (7.7 GB data size)
maxIndexBuildMemoryUsageMegabytes: 500 MB
Took 2269 sec to build index.

Test #2
Changed maxIndexBuildMemoryUsageMegabytes to 800 MB
Took 1865 sec to build index.
This is what I expected.

Test #3
Increased the document count to 150 million (11.642 GB db size)
maxIndexBuildMemoryUsageMegabytes: 500 MB
Took 6085 sec (1.69 hrs) to build index.

Test #4
Same as test #3 but changed maxIndexBuildMemoryUsageMegabytes to 800 MB
Took 26315 sec (7.3 hrs) to build index.
This is NOT what I expected. After the index was built, used swap was 477 MB.

Then I tried another VM with more memory and a larger document count.

Test #5
VM memory: 4GB
Document count: 200 million docs (15 GB data size)
maxIndexBuildMemoryUsageMegabytes: 500 MB
Took 15032 sec to build index.

Test #6
Same as test #5 but changed maxIndexBuildMemoryUsageMegabytes to 1 GB.
Took 15053 sec to build index.
It didn’t shorten the build index time.

Then I increased the document count to 400 million docs (37 GB data size).
I tried both 500 MB and 1 GB for the maxIndexBuildMemoryUsageMegabytes value.
The index build times were exactly the same.
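
To put the numbers side by side, here is a small sketch that computes the relative change in build time for each pair of tests. The timings are copied from tests #1 to #6 above; the function and variable names are just for illustration.

```python
# Build times in seconds, copied from tests #1-#6 above.
# Each entry: (label, seconds at 500 MB, seconds at the larger limit)
RESULTS = [
    ("2 GB VM, 100M docs, 500 MB -> 800 MB", 2269, 1865),
    ("2 GB VM, 150M docs, 500 MB -> 800 MB", 6085, 26315),
    ("4 GB VM, 200M docs, 500 MB -> 1 GB", 15032, 15053),
]


def percent_change(before, after):
    """Relative change in build time; negative means the build got faster."""
    return (after - before) / before * 100.0


for label, before, after in RESULTS:
    print(f"{label}: {percent_change(before, after):+.1f}%")
```

So the larger limit was about 18% faster in the first pair, more than 4x slower in the second, and made essentially no difference in the third.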

My questions are:

  1. What should I set maxIndexBuildMemoryUsageMegabytes to? I know it depends on memory size vs. data size and other factors.
    What is a good ratio that maximizes memory usage for the index build without falling into swap? (swappiness is already set to 1.)

  2. Any other ideas on how to improve index build performance?

Thanks!

With so little RAM compared to the data size I suspect disk I/O is the bottleneck.

Test #4 seems to corroborate that. Giving more RAM to the index build means less RAM for the working set, which means more disk I/O.

The above is my untested opinion.
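
A rough back-of-the-envelope sketch of that trade-off follows. It rests on assumptions not stated in the question: the WiredTiger storage engine with its default cache size (the larger of 50% of (RAM - 1 GB) and 256 MB), VM memory of exactly 2048 MB and 4096 MB, and the index build actually using its full limit. Real usage will differ, so treat the output as an illustration only.

```python
def default_wiredtiger_cache_mb(ram_mb):
    """Assumed default WiredTiger cache: the larger of 50% of (RAM - 1 GB) and 256 MB."""
    return max(0.5 * (ram_mb - 1024), 256)


def leftover_mb(ram_mb, index_build_limit_mb):
    """RAM left for the OS, filesystem cache and everything else once the
    assumed cache and the index build limit are both fully used."""
    return ram_mb - default_wiredtiger_cache_mb(ram_mb) - index_build_limit_mb


# (VM memory in MB, maxIndexBuildMemoryUsageMegabytes) pairs from the tests.
for ram_mb, limit_mb in [(2048, 500), (2048, 800), (4096, 500), (4096, 1024)]:
    left = leftover_mb(ram_mb, limit_mb)
    print(f"RAM {ram_mb} MB, limit {limit_mb} MB -> {left:.0f} MB left")
```

Under these assumptions the 2 GB VM goes from roughly 1036 MB to 736 MB of headroom when the limit is raised to 800 MB, which would fit with the swap usage seen in test #4.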
