Hi, we are creating an index using the rolling index build procedure (https://docs.mongodb.com/manual/tutorial/build-indexes-on-replica-sets/) on a very large collection (approx. 450M documents).
The index is a partial index, and none of the documents at the beginning of the collection have the contentId field:
```javascript
db.getCollection('activities').createIndex(
  {
    _userId: 1,
    _userActivityTypeId: 1,
    contentId: 1,
    _entityId: 1,
    _createdAt: 1
  },
  {
    partialFilterExpression: {
      contentId: { $exists: true }
    },
    name: 'activities__userId__userActivityTypeId__contentId__entityId__createdAt'
  }
);
```
The build is very slow: the estimated time to completion is around 100 hours, while our oplog window is currently only 23 hours.
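With the rolling procedure, the member being indexed is offline for the whole build and then has to catch up from the oplog, so the build has to fit inside the oplog window. Here is a minimal arithmetic sketch of that check. It assumes a steady write rate, so the window scales linearly with oplog size. It also adds a hypothetical 20% safety margin (`safetyFactor`) for restarts and catch-up:

```javascript
// Sketch: does an offline index build fit inside the oplog window?
// Assumption: steady write rate, so the oplog window scales linearly with oplog size.
// safetyFactor is a hypothetical margin for restarts and catch-up time.
function oplogCheck(buildHours, oplogWindowHours, safetyFactor = 1.2) {
  const neededHours = buildHours * safetyFactor;
  const fits = neededHours <= oplogWindowHours;
  // Factor by which the oplog would have to grow to cover the build (with margin).
  const growthFactor = neededHours / oplogWindowHours;
  return { fits, growthFactor };
}

const r = oplogCheck(100, 23);
console.log(r.fits, r.growthFactor.toFixed(2)); // false 5.22
```

Under those assumptions, the oplog would need to be roughly five times its current size to cover a 100-hour build. The real factor depends on the actual write rate during the build.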
To speed up the process, we tried the following tuning, without success:
- Increased maxIndexBuildMemoryUsageMegabytes from 200 to 20000. However, we did not see any increase in memory usage.
- Turned off the TTL Monitor.
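One possible reading of the unchanged memory usage is offered here as an assumption, not a confirmed explanation. maxIndexBuildMemoryUsageMegabytes caps the memory used to buffer generated index keys before they are sorted and spilled to disk, so it acts as a ceiling rather than an up-front allocation. While the partial filter excludes every document, no keys are generated and nothing is buffered. The toy model below illustrates this. It uses a hypothetical function `simulateKeyBuffer` and is not MongoDB's implementation:

```javascript
// Toy model (NOT MongoDB's implementation): keys are buffered in memory up to a
// limit, then a sorted run is spilled to disk. Only generated keys use memory.
function simulateKeyBuffer(docs, keyFor, limitBytes, bytesPerKey) {
  let buffered = 0; // bytes currently held in memory
  let peak = 0;     // highest buffered value observed
  let spills = 0;   // number of sorted runs written to disk
  for (const doc of docs) {
    const key = keyFor(doc);
    if (key === undefined) continue; // excluded by the filter: no key, no memory
    buffered += bytesPerKey;
    peak = Math.max(peak, buffered);
    if (buffered >= limitBytes) {     // buffer full: sort and spill a run
      spills += 1;
      buffered = 0;
    }
  }
  return { peak, spills };
}

// Documents without contentId, as at the start of the collection.
const docs = Array.from({ length: 1000 }, (_, i) => ({ _userId: i }));
const keyFor = (d) => ("contentId" in d ? [d._userId, d.contentId] : undefined);

const small = simulateKeyBuffer(docs, keyFor, 200, 1);
const large = simulateKeyBuffer(docs, keyFor, 20000, 1);
console.log(small.peak, large.peak); // 0 0
```

In this model, raising the limit from 200 to 20000 changes nothing until documents that match the filter start producing keys.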
Looking at the I/O metrics, we found that the read speed is almost constant at 1.5 MB/s.
CPU is not saturated, memory is not under pressure, and IOPS are low (well below the limits of the VM/disk).
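Here is a quick consistency check of these figures. It uses only the numbers in this post and assumes the 1.5 MB/s rate holds for the whole estimated 100-hour build (1 MB = 10^6 bytes). The result counts bytes read from disk, and WiredTiger data files are normally compressed, so the per-document figure is an on-disk average, not the BSON document size:

```javascript
// Back-of-the-envelope check using only the figures from this post.
// Assumption: the 1.5 MB/s read rate holds for the entire 100-hour estimate.
const readRateMBps = 1.5;
const buildHours = 100;
const docCount = 450e6;

const seconds = buildHours * 3600;              // 360,000 s
const totalMB = readRateMBps * seconds;         // MB read over the whole build
const bytesPerDoc = (totalMB * 1e6) / docCount; // average on-disk bytes per document
const docsPerSecond = docCount / seconds;       // scan rate in documents

console.log(totalMB / 1000, bytesPerDoc, docsPerSecond); // 540 1200 1250
```

So the estimate corresponds to reading roughly 540 GB at about 1,250 documents per second.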
During normal operation as a secondary, we see read speeds well above 1.5 MB/s, and reading an index file manually gives us 47.6 MB/s:
```
$ dd if=index-87--9072640376711127209.wt of=/dev/null
7854200+0 records in
7854200+0 records out
4021350400 bytes (4.0 GB, 3.7 GiB) copied, 84.4896 s, 47.6 MB/s
```
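The dd figures can be cross-checked and compared with the observed scan rate using a little arithmetic. dd uses a 512-byte block size by default, so records × 512 should equal the byte count. The what-if volume (1.5 MB/s × 100 h ≈ 540 GB) is derived from the figures in this post. The comparison is only indicative: dd reads a single file sequentially, while a collection scan may not read the data file in on-disk order.

```javascript
// dd's default block size is 512 bytes, so records × 512 should equal the byte count.
const records = 7854200;
const bytes = 4021350400;
const elapsedS = 84.4896;

const ddRateMBps = bytes / elapsedS / 1e6; // dd reports MB as 10^6 bytes
console.log(records * 512 === bytes, ddRateMBps.toFixed(1)); // true 47.6

// What-if: hours needed to read the volume implied by 1.5 MB/s over 100 h.
const volumeMB = 1.5 * 100 * 3600; // 540,000 MB
const hoursAt = (rateMBps) => volumeMB / rateMBps / 3600;
console.log(hoursAt(1.5).toFixed(1), hoursAt(ddRateMBps).toFixed(1)); // 100.0 3.2
```

At the dd rate, the same volume would take about 3 hours, well inside the 23-hour oplog window.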
As mentioned above, the index build is only scanning the collection and not writing anything, because the documents at the beginning of the collection do not have the contentId field. The index file size always stays at 4096 bytes.
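The constant 4096-byte index file is consistent with how a partial index build works: every document still has to be read so the filter can be evaluated, but only matching documents produce index keys. Below is a simplified model in plain JavaScript of a pass over such documents. The helpers `matchesPartialFilter` and `scanForPartialIndex` are hypothetical, and this is not MongoDB's code:

```javascript
// Simplified model of a partial-index build pass (NOT MongoDB's code).
// Mirrors partialFilterExpression { contentId: { $exists: true } }:
// $exists: true matches any document that has the field, even if its value is null.
const KEY_FIELDS = ["_userId", "_userActivityTypeId", "contentId", "_entityId", "_createdAt"];
const has = (doc, field) => Object.prototype.hasOwnProperty.call(doc, field);

function matchesPartialFilter(doc) {
  return has(doc, "contentId");
}

function scanForPartialIndex(docs) {
  let scanned = 0;
  const keys = [];
  for (const doc of docs) {
    scanned += 1; // the document is read whether or not it matches
    if (!matchesPartialFilter(doc)) continue;
    // Compound key in index order; missing fields are indexed as null.
    keys.push(KEY_FIELDS.map((f) => (has(doc, f) ? doc[f] : null)));
  }
  return { scanned, keys };
}

// Hypothetical data: older documents lack contentId, newer ones have it.
const docs = [
  { _userId: 1, _userActivityTypeId: 2, _entityId: 3, _createdAt: 10 },
  { _userId: 1, _userActivityTypeId: 2, _entityId: 4, _createdAt: 11 },
  { _userId: 5, _userActivityTypeId: 2, contentId: 7, _entityId: 8, _createdAt: 12 },
];
const { scanned, keys } = scanForPartialIndex(docs.slice(0, 2));
console.log(scanned, keys.length); // 2 0
```

In this model, the cost of the early part of the build is entirely read work: the scan count grows while the key list (and hence the index) stays empty.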
We have searched the documentation for a parameter that could improve the read speed of the collection scan, but without success.
Does anybody know a way to speed up the collection scan, or whether this limitation is documented somewhere?
Further Info:
- MongoDB 4.2.12
- Topology: replica set with 3 data-bearing nodes
- VM: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1046-azure x86_64)
- CPU: 4 cores
- RAM: 32 GiB
Thank you so much,
Francesco