Retrieve data from three different unrelated collections in a single query

Hi everyone,

Question:

i’m using the Node MongoDB driver. I’m trying to determine whether i should write a single query that gets data from three collections or whether the database needs to have one collection with references or embedded documents etc… that joins these three unrelated collections.

User case:

During search i get an array of objects, i take the first 10 from the array, each object is meta data about a document belonging in one of the three collections. The collections are unrelated but have some common fields and this meta data is the only way to go get information at later stages.

For example, during search i get and store this array in React state (see example object below), then when the user clicks on a search result, i have to go and loop inside this array so that i can go grab the relevant metadata to be able to retrieve more content…

Example Object inside Array of Objects (Meta data):

[{
  collection: 'pmc_test',
  id_field: 'id_int',
  id_type: 'int',
  id_value: 2657156
},
{
  collection: 'arxiv',
  id_field: 'id_int',
  id_type: 'int',
  id_value: 2651582
},
{
  collection: 'crossref',
  id_field: 'DOI',
  id_type: 'string',
  id_value: "10.1098/rsbm.1955.0005"
},
 ...] // different collections, usually passed with 10 objects

However to display the 10 search results to begin with i have to loop over each object in the array, modify and run a query which could result in 10 separate queries. So i can at least minimise this by doing 3 queries using the $in operator and provide three arrays of IDs representing each collection.

However this is still multiple queries, i have to go to the 1st collection, then 2nd collection, then 3rd collection and then combine all the results together for display search results. This is what i’m trying to avoid. This is how each of the three collections roughly look like.

Any suggestions on what querying approach i could use? Will the database benefit from having a single collection / approach that will avoid having to use the meta data to look in three different collections?

Currently this is a massive breaking change to the application resulting in at least 15 features / api calls needing updates, i’d like to maintain the ability to query one collection and suggest this as an optimal change.

Thanks in advance.

Example collections here:

Arxiv collection: https://gist.github.com/Natedeploys/6734dffccea7b293ca16b5bd7c73a6b6

Crossref collection: https://gist.github.com/Natedeploys/9b0d3b02c665d7507ed75c9d5fbff159