listDatabases slow with nameOnly=false when many collections exist

Hi.

On my 4.4.2 deployment, I’ve created one database with 20000 collections. When I issue show dbs or db.adminCommand( { listDatabases: 1}) in mongo shell for several minutes, mongod will always log the slowQuery message:

{"t":{"$date":"2021-05-18T16:51:38.567+08:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn839","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","comma
nd":{"listDatabases":1.0,"nameOnly":false,"lsid":{"id":{"$uuid":"eed01c17-7afa-4207-9144-b7aec97721fa"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1621326575,"i":1}},"signature":{"hash":{"$b
inary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"reslen":405,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":5}},"ReplicationStateTr
ansition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":5}},"Database":{"acquireCount":{"r":4}},"Collection":{"acquireCount":{"r":20014}},"Mutex":{"acquireCount":{"r":4}},"oplog":{"acquire
Count":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":715}}

As the number of collections get bigger, the durationMillis gets larger.

Another thing I’ve noticed that if nameOnly: true is given, it’ll be fast and no slow query.

The following metrics are those that are noticeable. We can see the latency of commands goes up.

  1. Is the case expected ? Can we make show dbs fast even if there’re many collections ?
  2. I’ve noticed db.stats() also has slow query log, but show tables hasn’t. Any possible requests include any CRUD will have slow query log under this situation ?

Hi @Lewis_Chan,

20k collections is a lot to handle for a single replica set (RS). Usually MongoDB doesn’t recommend more than 10k per RS. I’m not surprised to see the performance go down.

One collection corresponds to at least 2 files: 1 for the data, 1 for the _id index, and probably more index files if you are doing this correctly and probably more data files if you have a lot of data in this collection (as the files are split at some point).
The OS has to keep all these files open and 20k times 3/4/5 per collection is just too much.

So the real question here is: why do you have so many collections? Can you find a strategy to reduce that number?

2 Likes

Actually some customers/applications just use mongo like that. 10000 is really small number, comparing to mysql, although it has shared tablespace to alleviate overheads.

I’m curious whether mongo team can guarantee all your customers don’t create collections as they like ? When customers do that no matter what kind of business logic is, is there always a way to reconstruct the original schema ? If there’s no way, maybe mongodb is not suitable for that application ?

In the screenshot you shared, you could replace MySQL by MongoDB and it would still work. Both systems rely on the same OS constraints and obey the same rules.

Sharding could be a solution here though if this large number of collections is the only way. 10k collections maximum per replica set is a generally accepted good practice, but you could totally use a 10 shards cluster and scale easily to 10*10k collections.

If you are building a multi-tenant application, I would recommend checking out this presentation. Multi-tenant application are usually where this problem of large number of connections occurs.

Also, remember that the document design is a crucial step to be successful with MongoDB.

Data that is accessed together should be stored together

2 Likes