Comments collection design

Hi, I am new to MongoDB.

From a database design perspective, is it better to store user_id in the comments collection instead of storing ‘name’ and ‘email’? For example, if the application allows users to update their name, then the names stored in the comments collection are no longer up to date.

With my relational database head on, I’d agree that it’s probably better to store the primary key of the user in the comments collection, rather than duplicating other fields from the users collection.

But we’re not using a relational database here, so the rules of relational database design don’t necessarily apply. I suspect this design may be for performance reasons - whenever we display a comment we probably want to display the user’s name and email address too, and if we didn’t store those in the comments collection then we’d need a $lookup every time we wanted to display a comment, meaning more work for the server to do, as well as more complex code for accessing the comments collection.
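To make that trade-off concrete, here is a rough sketch of the two document shapes and the pipeline the second one would need on every read. The field names, sample values and the `users` collection name are assumptions for illustration only, not taken from any real schema.

```javascript
// Option A: duplicate the display fields in each comment (no join on read).
const embeddedComment = {
  name: "Jane Doe",          // copied from the user when the comment is written
  email: "jane@example.com", // goes stale if the user later changes it
  text: "Great post!",
};

// Option B: store only a reference to the user and join on read.
const referencedComment = {
  user_id: 42,               // hypothetical key into a `users` collection
  text: "Great post!",
};

// Pipeline needed for option B every time comments are displayed.
const commentsWithUser = [
  // Pull the matching user document into an array field called `user`.
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  // $lookup always produces an array; flatten it to a single sub-document.
  { $unwind: "$user" },
  // Reshape so the result looks like option A.
  { $project: { text: 1, name: "$user.name", email: "$user.email" } },
];

// In the mongo shell this would be run as:
//   db.comments.aggregate(commentsWithUser)
```

Option A reads with a plain find but needs extra writes when a user changes their name; option B always shows current data but pays for the join on each read.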

Thanks for your response, Simon. Are there any guidelines or examples showing when we should use $lookup for 1-to-n or 1-to-1 relationships?

That’s a question that I don’t know the answer to, I’m afraid, and one of the things I hope to learn here.

The article below gives us some hints on designing the schema

Hi @brianhei,

There is a course coming up soon specifically on schema design and data modeling on MongoDB, so stay tuned for that.

In the meantime I recommend you take a look at this set of recent blog posts on the subject:

Hope this helps clarify some of your questions.

But in regard to your particular question: in MongoDB, you want to store data together when it is used together. That is a nice rule of thumb that you should take into account when designing your schema.

N.

So I’ve read the blog posts linked by @Norberto and @brianhei, and in the absence of the upcoming course on schema design and data modeling, I thought I’d check my understanding here. I hope it’s not straying too off-topic.

Taking a collection of recipes as an example, because it’s used as an example in one of those posts, most of the data in a recipe consists of a list of ingredients and a list of instructions. One of the instructions or one of the ingredients doesn’t make a great deal of sense on its own outside the context of the recipe, so in a document database it makes sense to embed both of them into the recipe document.

However, in my personal data model for a recipe, each ingredient consists of 3 things; what the ingredient is (e.g. flour), how much of it (e.g. 250) and the unit of measurement (e.g. grams). OK so far, my recipe document can have an array of ingredient child documents, each of which has those 3 fields.

But we humans are inconsistent in the way we measure things. Some people measure liquids in multiples or fractions of pints, some in millilitres, some in cups. Some people measure solid ingredients in grams, some in ounces, some in cups (which I’m sure is strictly speaking a liquid measure, but when I get my recipes by searching the web I keep finding myself measuring out a 1/2 pint of flour, what’s that all about? But that’s definitely out of scope for this forum).

So, say I’m collecting recipes from all over the world, written by people who use different units of measurement, and one of my requirements for the recipe application is that it’ll display the ingredients using my preferred units of measurement rather than the units of measurement in the original recipe.

Forget for now how we convert from grams to ounces etc, I’m sure I can work that bit out on my own. But how should we store that relationship between the ingredient quantity in (for example) grams and the ingredient quantity in (for example) ounces? I’m reasonably sure we don’t want to embed the quantity in both grams and ounces in every ingredient in every recipe, because the relationship between a gram and an ounce isn’t something which is specific to that recipe, it’s common to all recipes.

So I suspect that the unit of measurement, along with its relationship with other units of measurement, actually belongs in a separate collection, with all the extra $lookups required to support my requirement to see the ingredients in my preferred units of measurement?

There are other things to think about here, such as the fact that MongoDB doesn’t enforce foreign key constraints between collections (e.g. to prevent someone entering a recipe which specifies the amount of some ingredient in units which the application doesn’t recognise) and that if I go down that route then my application will need to do that sort of validation itself rather than relying on the database to do it, but at this point I’m not too worried about that.

I just want to know, in this scenario, is it a good idea to put units of measurement into their own collection and have the overhead of extra $lookups, or would it be better to embed the quantity of ingredients in all known units of measurements directly in my recipe document and have the overhead of using more disk space?

Sorry, that’s a rather long question, but I think it’s quite an important one for people like me who are on the journey from working with relational databases to working with document databases. And I hope I’ve framed it in a way which will be useful for other people on the same journey.

Hi @Simon_39939,
This is a question that we will address in a schema design course, but I do not want to make you wait for that so here it goes:

This example is about units of liquid measurement, but the same applies to date-time formats (UTC vs DST) or units of distance (km vs miles).
As a best practice, we recommend storing your data in a convertible unit that is consistent across documents.
In this particular example you have two choices.
The first is to convert all documents to a single unit of measurement in the storage layer, doing any unit conversion on write operations.
The second applies if your application / system / script needs to merge and handle lots of different legacy data, or if the write workload prevents you from doing that conversion: in that case you should at least store the unit alongside the numeric value.

E.g.:

{
  name: 'fruit cake',
  ingredients: [
    { name: 'flour', value: 30, unit: 'gr' },
    ...
  ]
},
{
  name: 'buttermilk pancakes',
  ingredients: [
    { name: 'flour', value: 300, unit: 'ounces' },
    ...
  ]
}

You can then create a view for each of the units by applying the conversion between the different units with a projection stage.

Either creating a view for each unit, or applying the conversion in an aggregation pipeline stage that reads your preferred unit from your personal record, is quite possible. However, I would prefer using views for this.
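As a minimal sketch of such a view, assuming the documents above live in a collection called `recipes` and that only the 'gr' and 'ounces' units appear (the collection name, the view name `recipes_in_grams` and the helper function are all made up for illustration):

```javascript
// 1 avoirdupois ounce expressed in grams (rounded to four decimals).
const GRAMS_PER_OUNCE = 28.3495;

// Pipeline that rewrites every ingredient measured in ounces into grams.
// Ingredients in any other unit are passed through unchanged.
const recipesInGramsPipeline = [
  {
    $project: {
      name: 1,
      ingredients: {
        $map: {
          input: "$ingredients",
          as: "ing",
          in: {
            $cond: [
              { $eq: ["$$ing.unit", "ounces"] },
              {
                name: "$$ing.name",
                value: { $multiply: ["$$ing.value", GRAMS_PER_OUNCE] },
                unit: "gr",
              },
              "$$ing",
            ],
          },
        },
      },
    },
  },
];

// `db` is expected to be a mongo shell database handle.
// db.createView(viewName, sourceCollection, pipeline) registers the view.
function createRecipesInGramsView(db) {
  return db.createView("recipes_in_grams", "recipes", recipesInGramsPipeline);
}
```

Once created, the view can be queried like a read-only collection, for example `db.recipes_in_grams.find({ name: 'buttermilk pancakes' })`. A view per target unit would follow the same shape with a different conversion factor.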

I would not create units of measurement in a separate collection. These tend to be well known, and the conversions are based on constant values (at least in this particular example), so there would be little need for an extra collection.
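To illustrate, the conversion factors can simply live as constants in application code. The sketch below is only an assumption about how that might look: the function names are hypothetical, and it handles just the two units from the example. It also covers the first choice above, converting everything to one unit before writing.

```javascript
// Conversion factors to grams, kept in code rather than in a collection.
const GRAMS_PER_UNIT = new Map([
  ["gr", 1],
  ["ounces", 28.3495], // 1 avoirdupois ounce in grams, rounded
]);

// Convert one ingredient sub-document to grams.
// Throws on units the application does not recognise, since the
// database will not enforce that for us.
function toGrams(ingredient) {
  const factor = GRAMS_PER_UNIT.get(ingredient.unit);
  if (factor === undefined) {
    throw new Error(`Unrecognised unit: ${ingredient.unit}`);
  }
  return { ...ingredient, value: ingredient.value * factor, unit: "gr" };
}

// Normalise a whole recipe before it is written to the database.
function normalizeRecipe(recipe) {
  return { ...recipe, ingredients: recipe.ingredients.map(toGrams) };
}
```

Calling `normalizeRecipe` on each document before inserting it would keep the stored unit consistent across recipes, and the thrown error doubles as the unit validation the application has to do itself.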

Nothing to be worried about. Keep them coming.

N.

Thanks @Norberto. So if I’ve understood the documentation on views correctly, a view is effectively an aggregation pipeline which becomes part of the database, and which can be used in many ways as if it were a collection? If so then it’s functionally fairly similar to views in SQL Server. I can see how a view could be used to convert the units of measurement in a recipe into my preferred units, even if I’m not sure just yet what the pipeline would look like. I may have to play with this idea… :)

I know this is quite an old thread now, but I’ve just watched a video of a presentation from MongoDB World 2017 which is relevant to the subject of schema design.

https://www.mongodb.com/presentations/advanced-schema-design-pattern

(it’s 36 minutes long)

I’m looking forward to the course on schema design that @Norberto hinted at, but in the meantime, this video talks about some of the patterns that we can use, for example…

The attribute pattern. Say you’re running a web site where people sell things, and you want users to be able to search on attributes of those things, and you don’t know at design time what those attributes are. So one user might be looking for laptops, and they’ll probably be interested in the laptop’s CPU speed, RAM size and disk size. Another user might be searching for trousers, so they’ll be searching on waist size and leg size. You probably don’t want to create a field and an index for every possible attribute that every possible thing could have, because that’d be really bad for write performance. So you don’t want this…

[
{
  product: 'jeans',
  legSize: 30,
  waistSize: 30,
},  
{
  product: 'laptop',
  cpuSpeed: '3GHz',
  RAM: '8GB',
  diskSize: '1.5TB'
}
]

But you could do this instead…

[
{
  product: 'jeans',
  attributes: 
  [ 
    { name: 'legSize', value: 30 },
    { name: 'waistSize', value: 30 }
  ]
},
{
  product: 'laptop',
  attributes:
  [
    { name: 'cpuSpeed', value: '3GHz' },
    { name: 'RAM', value: '8GB' },
    { name: 'diskSize', value: '1.5TB' }
  ]
}
]

And you can create an index on the attributes array field, and then users can search for things where an attribute whose name you don’t know at design time has a particular value, and MongoDB will use the index for that query rather than doing a (resource intensive) collection scan.
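Here is a rough sketch of what the index and a matching query might look like. I’m assuming a collection called `products` and a compound index on the two sub-fields of the array; the helper name `attributeQuery` is made up for illustration.

```javascript
// Compound multikey index over the name/value pairs inside `attributes`.
const attributeIndex = { "attributes.name": 1, "attributes.value": 1 };

// Build a filter for "has an attribute called `name` with this `value`".
// $elemMatch makes sure both conditions hold on the same array element.
function attributeQuery(name, value) {
  return { attributes: { $elemMatch: { name, value } } };
}

// In the mongo shell:
//   db.products.createIndex(attributeIndex)
//   db.products.find(attributeQuery("waistSize", 30))
```

With this shape a single index serves every attribute, instead of one index per possible field. Running the query with `.explain()` is the way to confirm the index is actually being used.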

That’s just one example, there are other patterns available, and I can’t possibly claim to be an expert.

@brianhei I recommend doing the M201 performance course, which explains indexes and how to make sure your indexes and queries are aligned with each other.

@Norberto any news on that schema design course? Thank you for all your help as always :)

Hi @Simon_39939,

The M320: Data Modeling course will soon be published, most probably around July/August 2019. Our team is working hard on this. :)

Kanika

Excellent news, I look forward to M320 :)

Hi @Simon_39939,

It is open for registration:
https://university.mongodb.com/courses/M320/about

N.
