I would like to know if you think that having a mongodb instance with 1000 databases named Chicago, Detroit, Washington and so on, and then having 3 (same name and data) collections per database with about 1000 documents each is a good idea, in ANY situation.
3000 collections isn’t a large number of collections. 1 million is though.
Using only 3 collections and in each you have a field city_id with an index is also a good solution. I’d prefer this one but 3000 collections is still manageable I think.
Thank you for the feedback.
But remember, it is not just a matter of 3000 collections… they are 3 collections (always the same) , across 1000 databases for the same application, each collection with about 1000 documents.
The number of databases seems normal/reasonable to you?
The number of databases is not really important. It is the number of collections. Database is just a grouping of collections under a name space. Number of collections is important because a collection is at least 2 files. The number of open files is the limiting factor.
The more databases and collection, the more complexity you will have to deal with in your code, that’s for sure. But maybe it makes more sense that way.
But our answers were more around the technical feasibility.
Here is a /data/db from a brand new MongoDB node:
-rw------- 1 999 docker 20480 May 5 15:48 collection-0-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-10-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-11-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-13-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-15-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 collection-17-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 collection-19-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 collection-21-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-2-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-4-7189677319263731.wt
-rw------- 1 999 docker 36864 May 5 15:48 collection-6-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 collection-8-7189677319263731.wt
drwx------ 2 999 docker 4096 May 5 15:49 diagnostic.data
-rw------- 1 999 docker 20480 May 5 15:48 index-12-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-14-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-16-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-1-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 index-18-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 index-20-7189677319263731.wt
-rw------- 1 999 docker 4096 May 5 15:48 index-22-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-3-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-5-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-7-7189677319263731.wt
-rw------- 1 999 docker 20480 May 5 15:48 index-9-7189677319263731.wt
drwx------ 2 999 docker 4096 May 5 15:48 journal
-rw------- 1 999 docker 36864 May 5 15:48 _mdb_catalog.wt
-rw------- 1 999 docker 2 May 5 15:48 mongod.lock
-rw------- 1 999 docker 20480 May 5 15:48 sizeStorer.wt
-rw------- 1 999 docker 114 May 5 15:48 storage.bson
-rw------- 1 999 docker 47 May 5 15:48 WiredTiger
-rw------- 1 999 docker 4096 May 5 15:48 WiredTigerHS.wt
-rw------- 1 999 docker 21 May 5 15:48 WiredTiger.lock
-rw------- 1 999 docker 1338 May 5 15:48 WiredTiger.turtle
-rw------- 1 999 docker 102400 May 5 15:48 WiredTiger.wt
One collection is at least one file for the data and one file for the _id index. But you will likely have 5 indexes per collections. So 3000 collections * (1+5 files) = 18000 open files, and probably a lot more if your collections and indexes grow large enough to generate multiple files for each.
From an operating system perspective, this could start to be an issue at some point.
I would not do it in most case for the reasons given by @MaBeuLux88. The exemption might be a multi-tenant application. Using your city example, each tenant is a city and the data of each city is isolated and not accessible from another city. The data being isolated one city to the other the code can still be simple as your queries for a given city is limited to the collections of this city. In that case, having each city a database, helps security wise because you can give roles to the users of a given city that enforce the data isolation of each city.