Related to querying a single document

Gabriel_Betancourt · July 13, 2020, 7:04am

Let’s say I have an entity (document) with 20 fields, one of which is an array of objects with 2 fields each. I read that when a single document is queried, mongo loads that whole document into memory (I'm not 100% sure about this, so feel free to correct me if I am wrong).
If I query that document but only want to retrieve the other fields (without the array of objects), will mongo load the whole document or just the fields I specify? I am a little worried about this because that array could likely grow to 500 or more elements.

In one of the basic queries in my workload that array is essential, but not when I query the document directly (by ID).

Any info or suggestions will be welcome.
Thanks in advance!

michael_hoeller · July 13, 2020, 8:39am

Hello @Gabriel_Betancourt welcome to the community!

That is true: MongoDB reads the data into your working set (think: RAM). Basically, you need to make sure that your working set (the frequently used data plus indexes) fits into RAM to get good performance. As a first step to get familiar with this, I'd like to point you to the free MongoDB University classes (in this sequence)

Concerning your query question: By default, queries in MongoDB return all fields in matching documents. To limit the amount of data that MongoDB sends to applications, you can include a projection document to specify or restrict fields to return. Taken from: Project Fields to Return from Query
In this document you can read how to limit the fields returned and, further down, how to return only specific fields of an array (just in case).
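As a minimal sketch of what an exclusion projection returns, the plain JavaScript below mimics `{ records: 0 }` on top-level fields. In the mongo shell the equivalent would be passing the projection as the second argument of `findOne`, e.g. `db.users.findOne({ _id: id }, { records: 0 })`; the collection name `users`, the helper `applyExclusion`, and the string ids are hypothetical. Keep in mind that a projection limits what is sent back to the application; unless the query is covered by an index, the server still fetches the full document.

```javascript
// Hypothetical helper that mimics an exclusion projection such as { records: 0 }.
// It only handles top-level fields; real MongoDB projections also accept dotted paths.
function applyExclusion(doc, projection) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    // Fields set to 0 in the projection are dropped, everything else is kept.
    if (projection[key] !== 0) {
      result[key] = value;
    }
  }
  return result;
}

// Sample user shaped like the documents in this thread (ids shortened to strings).
const user = {
  _id: "u1",
  degree: 1,
  rating: 3,
  records: [{ userId: "u2", type: "pending" }],
};

// Shell equivalent (hypothetical collection): db.users.findOne({ _id: id }, { records: 0 })
console.log(applyExclusion(user, { records: 0 }));
```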
Looking ahead, I'd like to mention that unbounded (infinitely growing) or very large arrays can lead to performance issues. How to work around that depends on your use case. One common path is to move a huge, constantly growing array into a separate collection and use MongoDB features such as indexing, covered queries, …
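One way to picture "moving the array to an extra collection" is one document per interaction. The sketch below assumes a hypothetical collection named `interactions` with hypothetical fields `ownerId`, `userId` and `type`, and only shows the reshaping in plain JavaScript; the shell commands are in comments because they need a running server.

```javascript
// Hypothetical sketch: split an embedded "records" array into one document per
// interaction, to be stored in a separate collection (called "interactions" here).
function toInteractionDocs(user) {
  return (user.records || []).map((r) => ({
    ownerId: user._id, // the user the array belonged to
    userId: r.userId,  // the other user in the interaction
    type: r.type,
  }));
}

// In the shell, a compound index such as
//   db.interactions.createIndex({ ownerId: 1, userId: 1 })
// lets a check like
//   db.interactions.find({ ownerId: a, userId: b }, { _id: 0, ownerId: 1, userId: 1 })
// be answered from the index alone (a covered query).

const user = {
  _id: "u1",
  records: [
    { userId: "u2", type: "pending" },
    { userId: "u3", type: "done" },
  ],
};
console.log(toInteractionDocs(user));
```

Because every interaction is its own small document, the user document stays the same size no matter how many interactions accumulate.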

Hope that helps
Michael

Gabriel_Betancourt · July 13, 2020, 10:00am

Hello @michael_hoeller,
Thanks for the fast and helpful reply!

I am relatively new to the NoSQL world, coming from a SQL background, so there is a lot to learn. I am indeed going through those MongoDB University lessons while building the project with Atlas.

I was aware of the projection feature, but my question was whether, even when I specify certain fields to return in the query, the full document is still loaded into memory (RAM, as you mentioned). If I understood you correctly, it is, so that array could probably become a problem over time.

My use case can be described like this:
I need to query a collection of users and filter on a few fields, but also check that the user who made the query does not appear in that embedded array. The embedded option looks useful for this particular query: since the array is embedded, I can easily add one more condition to the query. But given that mongo reads the whole document, the possibility of the array growing doesn't look good.

Your approach solves the array-growth problem, but then I'll need to join with the other documents to verify the "non-existent" state, and this is a critical and very frequent operation in the system. So in terms of performance, I'm not sure which is better for my case.

michael_hoeller · July 15, 2020, 4:25pm

Hello @Gabriel_Betancourt,

Sorry for the delay, I was abroad. Concerning your use case: I don’t think there is enough detail yet to give a recommendation. You mention that you commonly want to search fields in an embedded array (so the whole array is being used?), but you are also concerned about RAM. I’d like to point you to the following documents to support your decision:

Great blog post by @Ken_Alger and @Daniel_Coupal: Building with patterns: A summary
Data Modeling Introduction
Data Model Design
MongoDB University Class: M320 Data Modeling.

If you still feel unsure after reading these docs, feel free to share some sample data and what you want to achieve. I am pretty sure that we, as a community, will find an answer.

Regards,
Michael

Gabriel_Betancourt · July 22, 2020, 11:45am

Hello @michael_hoeller.
I have been researching and going through all the docs, so allow me to share my conclusions and a second approach.

My use case is as follows: I have a Users collection, and I need to keep a record of each user's interactions with other users in an array. Something like this:

{
    "degree" : NumberInt(1),
    "rating" : NumberInt(3),
    "records" : [
        {
            "userId" : ObjectId("5f0b29c78f491172cfe8b049"),
            "type" : "pending"
        },
        {
            "userId" : ObjectId("5f0b29c78f491172cf48b077"),
            "type" : "done"
        }
    ]
}

This is the embedded solution. One of the main queries filters the Users collection and fetches the users with whom I (the user running the query) have not interacted yet, so I need to check that my ID is not in the "records" array.
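For the embedded design, one way to express "no element of records has my ID" is `$ne` on the dotted path `records.userId`: on an array field, `$ne` matches only documents where no element equals the value, and documents without the field also match. The sketch below builds that filter and, for illustration only, checks the same condition in plain JavaScript. The helper names, the extra `_id` condition that skips the querying user, and the string ids are hypothetical.

```javascript
// Hypothetical helper: filter for users that myId has not interacted with yet.
function notInteractedFilter(myId, extra = {}) {
  return {
    ...extra,                        // other conditions, e.g. { degree: 1 }
    _id: { $ne: myId },              // skip my own user document
    "records.userId": { $ne: myId }, // no element of records has userId === myId
  };
}

// Plain-JS mirror of the "records.userId" $ne condition, for illustration only.
function matchesNotInteracted(userDoc, myId) {
  const records = userDoc.records || [];
  return records.every((r) => r.userId !== myId);
}

const myId = "me";
const users = [
  { _id: "a", records: [{ userId: "me", type: "pending" }] },
  { _id: "b", records: [{ userId: "x", type: "done" }] },
  { _id: "c" }, // no records field at all: $ne matches this one too
];

// Shell equivalent: db.users.find(notInteractedFilter(myId, { degree: 1 }))
console.log(notInteractedFilter(myId, { degree: 1 }));
console.log(users.filter((u) => matchesNotInteracted(u, myId)).map((u) => u._id));
```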

The thing is that this array can be short in some cases (40 or 50 elements at the least), but it can also grow to hundreds or thousands. The Users collection is queried very often (and the array check above is not always necessary). Taking that into consideration, I thought having that array embedded is not a good idea.
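To put rough numbers on that growth, the sketch below estimates the BSON size of the `records` array from the BSON encoding rules, assuming every element looks exactly like the sample `{ userId: ObjectId(...), type: "pending" }` and that strings are ASCII. It is an estimate of the array alone, not of the whole user document.

```javascript
// Rough BSON size estimate for the "records" array, assuming every element is
// { userId: ObjectId(...), type: "<ASCII string>" } as in the sample above.
// BSON basics: a document is an int32 length, its elements, and a 0x00 terminator;
// each element is 1 type byte, the key as a NUL-terminated string, then the value.
function elementDocSize(typeValue) {
  const userIdField = 1 + ("userId".length + 1) + 12; // an ObjectId value is 12 bytes
  const typeField = 1 + ("type".length + 1) + 4 + typeValue.length + 1; // int32 length + bytes + 0x00
  return 4 + userIdField + typeField + 1;
}

// A BSON array is encoded like a document whose keys are "0", "1", "2", ...
function recordsArraySize(n, typeValue) {
  let size = 4 + 1; // length prefix + terminator
  for (let i = 0; i < n; i++) {
    size += 1 + (String(i).length + 1) + elementDocSize(typeValue);
  }
  return size;
}

const MAX_BSON_DOCUMENT = 16 * 1024 * 1024; // 16 MB per-document limit

for (const n of [50, 500, 5000]) {
  const bytes = recordsArraySize(n, "pending");
  const pct = ((bytes / MAX_BSON_DOCUMENT) * 100).toFixed(2);
  console.log(`${n} elements: ${bytes} bytes (${pct}% of the 16 MB limit)`);
}
```

Under these assumptions 500 elements come to roughly 24 KB, so the 16 MB document limit is far away; the practical cost is that every read of the user document carries the whole array along, and that cost grows linearly with the number of interactions.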

So I think the other solution is to keep the Records in another collection, in a one-to-one relationship, and run the query via $lookup (I already tested it in Compass and it works).

The records collection will look something like:

{
    "_id" : ObjectId("5f0fb901b320f5ec21269279"),
    "userId" : ObjectId("5f0b29c78f491172cfe8b04a"),
    "record" : [
        {
            "user" : ObjectId("5f0b29c78f491172cfe8b049"),
            "name" : "Gabriel",
            "type" : "progress"
        },
        {
            "user" : ObjectId("5f0b29c78f491172cfe8b04b"),
            "name" : "Rivaldo",
            "type" : "sended"
        }
    ]
}
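A minimal sketch of the $lookup version, assuming (as the sample suggests) that `records.userId` holds the `_id` of the owning user and that `record[].user` holds the other user. The helper name, the joined field name `rec`, and the string ids are hypothetical; the function only builds the pipeline array you would pass to `db.users.aggregate(...)`.

```javascript
// Hypothetical helper: users that myId has not interacted with yet, when the
// interactions live in a separate "records" collection shaped like the sample above.
function notInteractedPipeline(myId, userFilter = {}) {
  return [
    // Apply the cheap user-level filters first so $lookup runs on fewer documents.
    { $match: { ...userFilter, _id: { $ne: myId } } },
    // One-to-one join: the records document whose userId equals the user's _id.
    {
      $lookup: {
        from: "records",
        localField: "_id",
        foreignField: "userId",
        as: "rec",
      },
    },
    // Keep users whose records contain no entry for me (also true when rec is empty).
    { $match: { "rec.record.user": { $ne: myId } } },
    // Drop the joined data from the output.
    { $project: { rec: 0 } },
  ];
}

// Shell equivalent: db.users.aggregate(notInteractedPipeline(myId, { degree: 1 }))
console.log(JSON.stringify(notInteractedPipeline("me", { degree: 1 }), null, 2));
```

An index on the foreign field, for example `db.records.createIndex({ userId: 1 })`, helps $lookup find the matching records document quickly. Note that this join still reads the full records document for each candidate user, so the separate collection mainly keeps the Users collection lean for the other queries rather than making this particular check cheaper.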

It also lets me add fields or modify the Records schema if the project's requirements change (very likely), without worrying about growth or having to modify the Users collection too often, since it is the most important collection in the project.

What are your thoughts about it? Thanks in advance!

Regards, Gabriel