My situation: I have a post collection where each post have many comments and comments collections.
post_id: "post_1" comments: [comment_1, comment_2, comment_3, ..., comment_10000] content: "Here is a row of post collection"
and
comment_id: "comment_1", content: "Here is a row of comment collection",
Now, I can easy retrieve a post with comments by query:
post.find()
then lookup or populate (using mongoose). I call this as SCHEMA_1, follow one-to-many pattern.
After, I think about one-to-sqiullions patterns, then build other SCHEMA_2, that using reverse reference in comment collection, like:
comment_id: "comment_1", post_id: "post_1"
Now, I aslo can easy retrieve comments of a post by query:
comment.find( {post_id} )
Just one query with post_id had index.
OVERALL query for read:
- With SCHEMA_1: post.find() then lookup(or populate with mongoose)
- With SCHEMA_2: comment.find( {post_id} ) - just one query with post_id had index
MY QUESTION IS: Which schema is better performance for read query ? I don’t actually deeply understand how lookup work behind with array of references. In SCHEMA_1, if for each reference of comment, lookup need travel comment collection one (like correlate subquerry in SQL), it’s terrible if array of objectID is large.
And if I follow one-to-sqiullition patterns, I have a thought that performance is better because it just run one query, no need to JOIN.
How can I understand this problem ?