What's the rough breakeven on amounts of embedded documents?

Brian_McQueen · March 30, 2020, 6:11pm

I’m working with objects of varying depth. A document may contain recursively embedded documents 6 levels deep, 3 levels deep, or none at all.

These embedded documents contain values that dictate whether or not the overall document needs to be updated. To make finding the documents that use these values more performant, each top-level document has a lookup containing the values that the determination is made on.
This means that if A embeds B and C, B embeds B' and B'', and C embeds C' and C'', then A has a lookup of [B, C, B', B'', C', C''].
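As a minimal sketch of how such a lookup could be derived, the snippet below walks a nested document level by level and collects the value of every embedded document. The field names `value` and `embedded` are assumptions made for illustration, not my real schema, and the nested documents are written as B', B'', C', C''.

```python
from collections import deque
from typing import Any


def collect_lookup(doc: dict[str, Any]) -> list[str]:
    """Collect the value of every document embedded under `doc`, level by level.

    `value` and `embedded` are hypothetical field names used for illustration.
    """
    lookup: list[str] = []
    queue = deque(doc.get("embedded", []))
    while queue:
        child = queue.popleft()
        lookup.append(child["value"])
        # Children of this child are visited after the current level.
        queue.extend(child.get("embedded", []))
    return lookup


a = {
    "value": "A",
    "embedded": [
        {"value": "B", "embedded": [{"value": "B'"}, {"value": "B''"}]},
        {"value": "C", "embedded": [{"value": "C'"}, {"value": "C''"}]},
    ],
}
lookup_a = collect_lookup(a)
```

For the example document, `lookup_a` holds B, C, B', B'', C', C'' in that order, and a document with no embedded documents yields an empty lookup.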

These documents are partitioned by owner at the collection level (CollectionA belongs to StorefrontA, and so on). For each collection I also keep a kind of Directory/Metadata object whose lookup is essentially a collation of the lookups of every document in that collection.
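The collation can be sketched roughly as follows, assuming each document's lookup is available as a plain list of values. The function name `build_directory_lookup` is hypothetical; this only shows the shape of the aggregation, not how it is kept in sync with the database.

```python
from collections import Counter


def build_directory_lookup(document_lookups: list[list[str]]) -> dict[str, int]:
    """Merge per-document lookups into one map of value -> times in use."""
    counts: Counter[str] = Counter()
    for lookup in document_lookups:
        counts.update(lookup)  # each occurrence adds 1 to that value's count
    return dict(counts)


# Two hypothetical document lookups from the same collection.
directory = build_directory_lookup([["B", "C", "B'"], ["B", "D"]])
```

Here `directory` maps B to 2 and each of C, B', and D to 1.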

It’s at this level that my question comes:
There may be upwards of 6000 values in the Directory/Metadata object’s lookup field, which is a simple map of
{"value1": "number of times the value is currently in use",
"value2": "number of times the value is currently in use",
"valueN": "number of times the value is currently in use"}.

At what point should I break this lookup out into a separate collection? At ~6000 entries, I’m concerned I’ve already passed that point, but MongoDB is the first DB I’ve been learning from the ground up, so I’m not sure what scale of performance I should be considering.
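To put ~6000 entries in context, here is a rough size estimate for a map of that shape, compared against MongoDB's 16 MB maximum BSON document size. It uses the JSON encoding as a stand-in for BSON, so the number is only approximate, and the 24-character key length is an assumption rather than my real data. It says nothing about read or update cost, only about raw size.

```python
import json

# MongoDB's documented maximum BSON document size (16 MB).
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_map_bytes(n_values: int, key_length: int = 24) -> int:
    """Approximate the encoded size of a value -> count map.

    JSON is only a rough proxy for BSON; key_length is an assumed key size.
    """
    lookup = {f"{i:0{key_length}d}": 1 for i in range(n_values)}
    return len(json.dumps(lookup).encode("utf-8"))


size = approx_map_bytes(6000)
fraction_of_limit = size / MAX_BSON_DOCUMENT_BYTES
```

Under these assumptions the map is a small fraction of the hard limit, so if there is a breakeven point it would presumably come from how often the map is rewritten and read rather than from its size alone.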

I know I can use Compass or .explain() to get a good idea of the current performance implications, but I was hoping someone could supply some heuristics or rules of thumb.