HomeLearnHow-to

How to Use the Union All Aggregation Pipeline Stage in MongoDB 4.4

Published: Aug 17, 2020

  • MongoDB
  • Atlas
  • MongoDB 4.4
  • ...

By Adrienne Tacke

Share

With the release of MongoDB 4.4 comes a new aggregation pipeline stage called $unionWith. This stage lets you combine multiple collections into a single result set!

Here's how you'd use it:

Simplified syntax, with no additional processing on the specified collection

1
2
3
db.collection.aggregate([ { $unionWith: "<anotherCollection>" } ])

Extended syntax, using optional pipeline field

1
2
3
db.collection.aggregate([ { $unionWith: { coll: "<anotherCollection>", pipeline: [ <stage1>, etc. ] } } ])

⚠ If you use the pipeline field to process your collection before combining, keep in mind that stages that write data, like $out and $merge, can't be used!

Your resulting documents will merge your current collection's (or pipeline's) stream of documents with the documents from the collection/pipeline you specify. Keep in mind that this can include duplicates!

#This sounds kinda familiar..

If you've used the UNION ALL operation in SQL before, the $unionWith stage's functionality may sound familiar to you, and you wouldn't be wrong! Both combine the result sets from multiple queries and return the merged rows, some of which may be duplicates. However, that's where the similarities end. Unlike MongoDB's $unionWith stage, you have to follow a few rules in order to run a valid UNION ALL operation in SQL:

  • Make sure your two queries have the same number of columns
  • Make sure the order of columns are the same
  • Make sure the matching columns are compatible data types.

It'd look something like this in SQL:

1
2
3
4
5
6
SELECT column1, expression1, column2 FROM table1 UNION ALL SELECT column1, expression1, column2 FROM table2 WHERE [conditions]

With the $unionWith stage in MongoDB, you don't have to worry about these stringent constraints.

#So how is MongoDB's $unionWith stage different?

The most convenient difference between the $unionWith stage and other UNION operations is that there's no matching schema restriction. This flexible schema support means you can combine documents that may not have the same type or number of fields. This is common in certain scenarios, where the data we need to use comes from different sources:

  • TimeSeries data that's stored by month/quarter/some other unit of time
  • IoT device data, per fleet or version
  • Archival and Recent data, stored in a Data Lake
  • Regional data

With MongoDB's $unionWith stage, combining these data sources is possible.

Ready to try the new $unionWith stage? Follow along by completing a few setup steps first. Or, you can skip to the code samples. 😉

#Prerequisites

First, a general understanding of what the aggregation framework is and how to use it will be important for the rest of this tutorial. If you are unfamiliar with the aggregation framework, check out this great Introduction to the MongoDB Aggregation Framework, written by fellow dev advocate Ken Alger!

Next, based on your situation, you may already have a few prerequisites setup or need to start from scratch. Either way, choose your scenario to configure the things you need so that you can follow the rest of this tutorial!

Choose your scenario:

I don't have an Atlas cluster set up yet:

  1. You'll need an Atlas account to play around with MongoDB Atlas! Create one if you haven't already done so. Otherwise, log into your Atlas account.
  2. Setup a free Atlas cluster (no credit card needed!). Be sure to select MongoDB 4.4 (may be Beta, which is OK) as your version in Additional Settings!

💡 If you don't see the prompt to create a cluster: You may be prompted to create a project first before you see the prompt to create your first cluster. In this case, go ahead and create a project first (leaving all the default settings). Then continue with the instructions to deploy your first free cluster!

  1. Once your cluster is set up, add your IP address to your cluster's connection settings. This tells your cluster who's allowed to connect to it.
  2. Finally, create a database user for your cluster. Atlas requires anyone or anything accessing its clusters to authenticate as MongoDB database users for security purposes! Keep these credentials handy as you'll need them later on.
  3. Continue with the steps in Connecting to your cluster.

I have an Atlas cluster set up:

Great! You can skip ahead to Connecting to your cluster.

Connecting to your cluster

To connect to your cluster, we'll use the MongoDB for Visual Studio Code extension (VS Code for short 😊). You can view your data directly, interact with your collections, and much more with this helpful extension! Using this also consolidates our workspace into a single window, removing the need for us to jump back and forth between our code and MongoDB Atlas!

💡 Though we'll be using the VS Code Extension and VS Code for the rest of this tutorial, it's not a requirement to use the $unionWith pipeline stage! You can also use the CLI, language-specific drivers, or Compass if you prefer!

  1. Install the MongoDB for VS Code extension (or install VS Code first, if you don't already have it 😉).
  2. To connect to your cluster, you'll need a connection string. You can get this connection string from your cluster connection settings. Go to your cluster and select the "Connect" option:
Connecting to your Atlas cluster
  1. Select the "Connect using MongoDB Compass" option. This will give us a connection string in the DNS Seedlist Connection format that we can use with the MongoDB extension.
Choosing the "Connect with MongoDB Compass" option

💡 The MongoDB for VS Code extension also supports the standard connection string format. Using the DNS seedlist connection format is purely preference.

  1. Skip to the second step and copy the connection string (don't worry about the other settings, you won't need them):
Copy the connection string in MongoDB Atlas connection settings
  1. Switch back to VS Code. Press Ctrl + Shift + P (on Windows) or Shift + Command + P (on Mac) to bring up the command palette. This shows a list of all VS Code commands.
Showing the command palette in VS Code
  1. Start typing "MongoDB" until you see the MongoDB extension's list of available commands. Select the "MongoDB: Connect with Connection String" option.
Searching for MongoDB extension commands
  1. Paste in your copied connection string. 💡 Don't forget! You have to replace the placeholder password with your actual password!
Pasting and modifying our MongoDB Atlas cluster connection string
  1. Press enter to connect! You'll know the connection was successful if you see a confirmation message on the bottom right. You'll also see your cluster listed when you expand the MongoDB extension pane.

With the MongoDB extension installed and your cluster connected, you can now use MongoDB Playgrounds to test out the $unionWith examples! MongoDB Playgrounds give us a nice sandbox to easily write and test Mongo queries. I love using it when prototying or trying something new because it has query auto-completion and syntax highlighting, something that you don't get in most terminals.

Let's finally dive into some examples!

#Examples

To follow along, you can use these MongoDB Playground files I have created to accompany this blog post or create your own!

💡 If you create your own playground, remember to change the database name and delete the default template's code first!

#$unionWith using a pipeline

📃 Use this playground if you'd like follow along with pre-written code for this example.

Right at the top, specify the database you'll be using. In this example, I'm using a database also called union-walkthrough:

1
use('union-walkthrough');

💡 I haven't actually created a database called union-walkthrough in Atlas yet, but that's no problem! When the playground runs, it will see that it does not yet exist and create a database of the specified name!

Next, we need data! Particularly about some planets. And particularly about planets in a certain movie series. 😉

Using the awesome SWAPI API, I've collected such information on a few planets. Let's add them into two collections, separated by popularity.

Any planets that appear in at least 2 or more films are considered popular. Otherwise, we'll add them into the lonely_planets collection:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
// Insert a few documents into the lonely_planets collection. db.lonely_planets.insertMany([ { "name": "Endor", "rotation_period": "18", "orbital_period": "402", "diameter": "4900", "climate": "temperate", "gravity": "0.85 standard", "terrain": "forests, mountains, lakes", "surface_water": "8", "population": "30000000", "residents": [ "http://swapi.dev/api/people/30/" ], "films": [ "http://swapi.dev/api/films/3/" ], "created": "2014-12-10T11:50:29.349000Z", "edited": "2014-12-20T20:58:18.429000Z", "url": "http://swapi.dev/api/planets/7/" }, { "name": "Kamino", "rotation_period": "27", "orbital_period": "463", "diameter": "19720", "climate": "temperate", "gravity": "1 standard", "terrain": "ocean", "surface_water": "100", "population": "1000000000", "residents": [ "http://swapi.dev/api/people/22/", "http://swapi.dev/api/people/72/", "http://swapi.dev/api/people/73/" ], "films": [ "http://swapi.dev/api/films/5/" ], "created": "2014-12-10T12:45:06.577000Z", "edited": "2014-12-20T20:58:18.434000Z", "url": "http://swapi.dev/api/planets/10/" }, { "name": "Yavin IV", "rotation_period": "24", "orbital_period": "4818", "diameter": "10200", "climate": "temperate, tropical", "gravity": "1 standard", "terrain": "jungle, rainforests", "surface_water": "8", "population": "1000", "residents": [], "films": [ "http://swapi.dev/api/films/1/" ], "created": "2014-12-10T11:37:19.144000Z", "edited": "2014-12-20T20:58:18.421000Z", "url": "http://swapi.dev/api/planets/3/" }, { "name": "Hoth", "rotation_period": "23", "orbital_period": "549", "diameter": "7200", "climate": "frozen", "gravity": "1.1 standard", "terrain": "tundra, ice caves, mountain ranges", "surface_water": "100", "population": "unknown", "residents": [], "films": [ "http://swapi.dev/api/films/2/" ], "created": "2014-12-10T11:39:13.934000Z", "edited": "2014-12-20T20:58:18.423000Z", "url": "http://swapi.dev/api/planets/4/" }, { "name": "Bespin", "rotation_period": "12", "orbital_period": "5110", "diameter": "118000", "climate": "temperate", "gravity": "1.5 (surface), 1 standard (Cloud City)", "terrain": "gas giant", "surface_water": "0", "population": "6000000", "residents": [ "http://swapi.dev/api/people/26/" ], "films": [ "http://swapi.dev/api/films/2/" ], "created": "2014-12-10T11:43:55.240000Z", "edited": "2014-12-20T20:58:18.427000Z", "url": "http://swapi.dev/api/planets/6/" } ]); // Insert a few documents into the popular_planets collection. db.popular_planets.insertMany([ { "name": "Tatooine", "rotation_period": "23", "orbital_period": "304", "diameter": "10465", "climate": "arid", "gravity": "1 standard", "terrain": "desert", "surface_water": "1", "population": "200000", "residents": [ "http://swapi.dev/api/people/1/", "http://swapi.dev/api/people/2/", "http://swapi.dev/api/people/4/", "http://swapi.dev/api/people/6/", "http://swapi.dev/api/people/7/", "http://swapi.dev/api/people/8/", "http://swapi.dev/api/people/9/", "http://swapi.dev/api/people/11/", "http://swapi.dev/api/people/43/", "http://swapi.dev/api/people/62/" ], "films": [ "http://swapi.dev/api/films/1/", "http://swapi.dev/api/films/3/", "http://swapi.dev/api/films/4/", "http://swapi.dev/api/films/5/", "http://swapi.dev/api/films/6/" ], "created": "2014-12-09T13:50:49.641000Z", "edited": "2014-12-20T20:58:18.411000Z", "url": "http://swapi.dev/api/planets/1/" }, { "name": "Alderaan", "rotation_period": "24", "orbital_period": "364", "diameter": "12500", "climate": "temperate", "gravity": "1 standard", "terrain": "grasslands, mountains", "surface_water": "40", "population": "2000000000", "residents": [ "http://swapi.dev/api/people/5/", "http://swapi.dev/api/people/68/", "http://swapi.dev/api/people/81/" ], "films": [ "http://swapi.dev/api/films/1/", "http://swapi.dev/api/films/6/" ], "created": "2014-12-10T11:35:48.479000Z", "edited": "2014-12-20T20:58:18.420000Z", "url": "http://swapi.dev/api/planets/2/" }, { "name": "Naboo", "rotation_period": "26", "orbital_period": "312", "diameter": "12120", "climate": "temperate", "gravity": "1 standard", "terrain": "grassy hills, swamps, forests, mountains", "surface_water": "12", "population": "4500000000", "residents": [ "http://swapi.dev/api/people/3/", "http://swapi.dev/api/people/21/", "http://swapi.dev/api/people/35/", "http://swapi.dev/api/people/36/", "http://swapi.dev/api/people/37/", "http://swapi.dev/api/people/38/", "http://swapi.dev/api/people/39/", "http://swapi.dev/api/people/42/", "http://swapi.dev/api/people/60/", "http://swapi.dev/api/people/61/", "http://swapi.dev/api/people/66/" ], "films": [ "http://swapi.dev/api/films/3/", "http://swapi.dev/api/films/4/", "http://swapi.dev/api/films/5/", "http://swapi.dev/api/films/6/" ], "created": "2014-12-10T11:52:31.066000Z", "edited": "2014-12-20T20:58:18.430000Z", "url": "http://swapi.dev/api/planets/8/" }, { "name": "Coruscant", "rotation_period": "24", "orbital_period": "368", "diameter": "12240", "climate": "temperate", "gravity": "1 standard", "terrain": "cityscape, mountains", "surface_water": "unknown", "population": "1000000000000", "residents": [ "http://swapi.dev/api/people/34/", "http://swapi.dev/api/people/55/", "http://swapi.dev/api/people/74/" ], "films": [ "http://swapi.dev/api/films/3/", "http://swapi.dev/api/films/4/", "http://swapi.dev/api/films/5/", "http://swapi.dev/api/films/6/" ], "created": "2014-12-10T11:54:13.921000Z", "edited": "2014-12-20T20:58:18.432000Z", "url": "http://swapi.dev/api/planets/9/" }, { "name": "Dagobah", "rotation_period": "23", "orbital_period": "341", "diameter": "8900", "climate": "murky", "gravity": "N/A", "terrain": "swamp, jungles", "surface_water": "8", "population": "unknown", "residents": [], "films": [ "http://swapi.dev/api/films/2/", "http://swapi.dev/api/films/3/", "http://swapi.dev/api/films/6/" ], "created": "2014-12-10T11:42:22.590000Z", "edited": "2014-12-20T20:58:18.425000Z", "url": "http://swapi.dev/api/planets/5/" } ]);

This separation is indicative of how our data may be grouped. Despite the separation, we can use the $unionWith stage to combine these two collections if we ever needed to analyze them as a single result set!

Let's say that we needed to find out the total population of planets, grouped by climate. Additionally, we'd like to leave out any planets that don't have population data from our calculation. We can do this using an aggregation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Run an aggregation to view total planet populations, grouped by climate type. use('union-walkthrough'); db.lonely_planets.aggregate([ { $match: { population: { $ne: 'unknown' } } }, { $unionWith: { coll: 'popular_planets', pipeline: [{ $match: { population: { $ne: 'unknown' } } }] } }, { $group: { _id: '$climate', totalPopulation: { $sum: { $toLong: '$population' } } } } ]);

If you've followed along in your own MongoDB playground and have copied the code so far, try running the aggregation!

And if you're using the provided MongoDB playground I created, highlight lines 264 - 290 and then run the selected code.

💡 You'll notice in the code snippet above that I've added another use('union-walkthrough'); method right above the aggregation code. I do this to make the selection of relevant code within the playground easier. It's also required so that the aggregation code can run against the correct database. However, the same thing can be achieved by selecting multiple lines, namely the original use('union-walkthrough') line at the top and whatever additional example you'd like to run!

You should see the results like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[ { _id: 'arid', totalPopulation: 200000 }, { _id: 'temperate', totalPopulation: 1007536000000 }, { _id: 'temperate, tropical', totalPopulation: 1000 } ]

Unsurprisingly, planets with "temperate" climates seem to have more inhabitants. Something about that cool 75 F / 23.8 C, I guess 🌞

Let's break down this aggregation:

The first object we pass into our aggregation is also our first stage, used here as our filter criteria. Specifically, we use the $match pipeline stage:

1
2
3
4
5
{ $match: { population: { $ne: 'unknown' } } },

In this example, we filter out any documents that have unknown as their population value using the $ne (not equal) operator.

The next object (and next stage) in our aggregation is our $unionWith stage. Here, we specifiy what collection we'd like to perform a union with (including any duplicates). We also make use of the pipeline field to similarly filter out any documents in our popular_planets collection that have an unknown population:

1
2
3
4
5
6
7
8
9
10
11
12
{ $unionWith: { coll: 'popular_planets', pipeline: [ { $match: { population: { $ne: 'unknown' } } } ] } },

Finally, we have our last stage in our aggregation. After combining our lonely_planets and popular_planets collections (both filtering out documents with no population data), we group the resulting documents using a $group stage:

1
2
3
4
5
6
{ $group: { _id: '$climate', totalPopulation: { $sum: { $toLong: '$population' } } } }

Since we want to know the total population per climate type, we first specify _id to be the $climate field from our combined result set. Then, we calculate a new field called totalPopulation by using a $sum operator to add each matching document's population values together. You'll also notice that based on the data we have, we needed to use a $toLong operator to first convert our $population field into a calculable value!

#$unionWith without a pipeline

📃 Use this playground if you'd like follow along with pre-written code for this example.

Now, if you don't need to run some additional processing on the collection you're combining with, you don't have to! The pipeline field is optional and is only there if you need it.

So, if you just need to work with the planet data as a unified set, you can do that too:

1
2
3
4
5
6
// Run an aggregation with no pipeline use('union-walkthrough'); db.lonely_planets.aggregate([ { $unionWith: 'popular_planets' } ]);

Copy this aggregation into your own playground and run it! Alternatively, select and run lines 293 - 297 if using the provided MongoDB playground!

Tada! Now you can use this unified dataset for analysis or further processing.

#Different Schemas

Combining the same schemas is great, but we can do that in regular SQL too! The real convenience of the $unionWith pipeline stage is that it can also combine collections with different schemas. Let's take a look!

#$unionWith using collections with different schemas

📃 Use this playground if you'd like follow along with pre-written code for this example.

As before, we'll specifiy the database we want to use:

1
use('union-walkthrough');

This time, we'll use some acquired information about certain starships and vehicles that are used in this same movie series. Let's add them to their respective collections:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
// Insert a few documents into the starships collection db.starships.insertMany([ { "name": "Death Star", "model": "DS-1 Orbital Battle Station", "manufacturer": "Imperial Department of Military Research, Sienar Fleet Systems", "cost_in_credits": "1000000000000", "length": "120000", "max_atmosphering_speed": "n/a", "crew": 342953, "passengers": 843342, "cargo_capacity": "1000000000000", "consumables": "3 years", "hyperdrive_rating": 4.0, "MGLT": 10, "starship_class": "Deep Space Mobile Battlestation", "pilots": [] }, { "name": "Millennium Falcon", "model": "YT-1300 light freighter", "manufacturer": "Corellian Engineering Corporation", "cost_in_credits": "100000", "length": "34.37", "max_atmosphering_speed": "1050", "crew": 4, "passengers": 6, "cargo_capacity": 100000, "consumables": "2 months", "hyperdrive_rating": 0.5, "MGLT": 75, "starship_class": "Light freighter", "pilots": [ "http://swapi.dev/api/people/13/", "http://swapi.dev/api/people/14/", "http://swapi.dev/api/people/25/", "http://swapi.dev/api/people/31/" ] }, { "name": "Y-wing", "model": "BTL Y-wing", "manufacturer": "Koensayr Manufacturing", "cost_in_credits": "134999", "length": "14", "max_atmosphering_speed": "1000km", "crew": 2, "passengers": 0, "cargo_capacity": 110, "consumables": "1 week", "hyperdrive_rating": 1.0, "MGLT": 80, "starship_class": "assault starfighter", "pilots": [] }, { "name": "X-wing", "model": "T-65 X-wing", "manufacturer": "Incom Corporation", "cost_in_credits": "149999", "length": "12.5", "max_atmosphering_speed": "1050", "crew": 1, "passengers": 0, "cargo_capacity": 110, "consumables": "1 week", "hyperdrive_rating": 1.0, "MGLT": 100, "starship_class": "Starfighter", "pilots": [ "http://swapi.dev/api/people/1/", "http://swapi.dev/api/people/9/", "http://swapi.dev/api/people/18/", "http://swapi.dev/api/people/19/" ] }, ]); // Insert a few documents into the vehicles collection db.vehicles.insertMany([ { "name": "Sand Crawler", "model": "Digger Crawler", "manufacturer": "Corellia Mining Corporation", "cost_in_credits": "150000", "length": "36.8 ", "max_atmosphering_speed": 30, "crew": 46, "passengers": 30, "cargo_capacity": 50000, "consumables": "2 months", "vehicle_class": "wheeled", "pilots": [] }, { "name": "X-34 landspeeder", "model": "X-34 landspeeder", "manufacturer": "SoroSuub Corporation", "cost_in_credits": "10550", "length": "3.4 ", "max_atmosphering_speed": 250, "crew": 1, "passengers": 1, "cargo_capacity": 5, "consumables": "unknown", "vehicle_class": "repulsorcraft", "pilots": [], }, { "name": "AT-AT", "model": "All Terrain Armored Transport", "manufacturer": "Kuat Drive Yards, Imperial Department of Military Research", "cost_in_credits": "unknown", "length": "20", "max_atmosphering_speed": 60, "crew": 5, "passengers": 40, "cargo_capacity": 1000, "consumables": "unknown", "vehicle_class": "assault walker", "pilots": [], "films": [ "http://swapi.dev/api/films/2/", "http://swapi.dev/api/films/3/" ], "created": "2014-12-15T12:38:25.937000Z", "edited": "2014-12-20T21:30:21.677000Z", "url": "http://swapi.dev/api/vehicles/18/" }, { "name": "AT-ST", "model": "All Terrain Scout Transport", "manufacturer": "Kuat Drive Yards, Imperial Department of Military Research", "cost_in_credits": "unknown", "length": "2", "max_atmosphering_speed": 90, "crew": 2, "passengers": 0, "cargo_capacity": 200, "consumables": "none", "vehicle_class": "walker", "pilots": [ "http://swapi.dev/api/people/13/" ] }, { "name": "Storm IV Twin-Pod cloud car", "model": "Storm IV Twin-Pod", "manufacturer": "Bespin Motors", "cost_in_credits": "75000", "length": "7", "max_atmosphering_speed": 1500, "crew": 2, "passengers": 0, "cargo_capacity": 10, "consumables": "1 day", "vehicle_class": "repulsorcraft", "pilots": [], } ]);

You may be thinking (as I first did), what's the difference between starships and vehicles? You'll be pleased to know that starships are defined as any "single transport craft that has hyperdrive capability". Any other single transport craft that does not have hyperdrive capability is considered a vehicle. The more you know! 😮

If you look at the two collections, you'll see that they have two key differences:

  • The max_atmosphering_speed field is present in both collections, but is a string in the starships collection and an int in the vehicles collection.
  • The starships collection has two fields (hyperdrive_rating, MGLT) that are not present in the vehicles collection, as it only relates to starships.

But you know what? That's not a problem for the $unionWith stage! You can combine them just as before:

1
2
3
4
5
6
// Run an aggregation with no pipeline and differing schemas use('union-walkthrough'); db.starships.aggregate([ { $unionWith: 'vehicles' } ]);

Try running the aggregation in your playground! Or if you're following along in the MongoDB playground I've provided, select and run lines 185 - 189! You should get the following combined result set as your output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
[ { _id: 5f306ddca3ee8339643f137e, name: 'Death Star', model: 'DS-1 Orbital Battle Station', manufacturer: 'Imperial Department of Military Research, Sienar Fleet Systems', cost_in_credits: '1000000000000', length: '120000', max_atmosphering_speed: 'n/a', crew: 342953, passengers: 843342, cargo_capacity: '1000000000000', consumables: '3 years', hyperdrive_rating: 4, MGLT: 10, starship_class: 'Deep Space Mobile Battlestation', pilots: [] }, { _id: 5f306ddca3ee8339643f137f, name: 'Millennium Falcon', model: 'YT-1300 light freighter', manufacturer: 'Corellian Engineering Corporation', cost_in_credits: '100000', length: '34.37', max_atmosphering_speed: '1050', crew: 4, passengers: 6, cargo_capacity: 100000, consumables: '2 months', hyperdrive_rating: 0.5, MGLT: 75, starship_class: 'Light freighter', pilots: [ 'http://swapi.dev/api/people/13/', 'http://swapi.dev/api/people/14/', 'http://swapi.dev/api/people/25/', 'http://swapi.dev/api/people/31/' ] }, // + 7 other results, omitted for brevity ]

Can you imagine doing that in SQL? Hint: You can't! That kind of schema restriction is something you don't need to worry about with MongoDB, though!

#$unionWith using collections with different schemas and a pipeline

📃 Use this playground if you'd like follow along with pre-written code for this example.

So we can combine different schemas no problem. What if we need to do a little extra work on our collection before combining it? That's where the pipeline field comes in!

Let's say that there's some classified information in our data about the vehicles. Namely, any vehicles manufactured by Kuat Drive Yards (AKA a division of the Imperial Department of Military Research).

By direct orders, you are instructed not to give out this information under any circumstances. In fact, you need to intercept any requests for vehicle information and remove these classified vehicles from the list!

We can do that like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
use('union-walkthrough'); db.starships.aggregate([ { $unionWith: { coll: 'vehicles', pipeline: [ { $redact: { $cond: { if: { $eq: [ "$manufacturer", "Kuat Drive Yards, Imperial Department of Military Research"] }, then: "$$PRUNE", else: "$$DESCEND" } } } ] } } ]);

In this example, we're combining the starships and vehicles collections as before, using the $unionWith pipeline stage. We also process the vehicle data a bit more, using the $unionWith's optional pipeline field:

1
2
3
4
5
6
7
8
9
10
// Pipeline used with the vehicle collection { $redact: { $cond: { if: { $eq: [ "$manufacturer", "Kuat Drive Yards, Imperial Department of Military Research"] }, then: "$$PRUNE", else: "$$DESCEND" } } }

Inside the $unionWith's pipeline, we use a $redact stage to restrict the contents of our documents based on a condition. The condition is specified using the $cond operator, which acts like an if/else statement.

In our case, we are evaluating whether or not the manufacturer field holds a value of "Kuat Drive Yards, Imperial Department of Military Research". If it does (uh oh, that's classified!), we use a system variable called $$PRUNE, which lets us exclude all fields at the current document/embedded document level. If it doesn't, we use another system variable called $$DESCEND, which will return all fields at the current document level, except for any embedded documents.

This works perfectly for our use case. Try running the aggregation (lines 192 - 211, if using the provided MongoDB Playground). You should see a combined result set, minus any Imperial manufactured vehicles:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
[ { _id: 5f306ddca3ee8339643f137e, name: 'Death Star', model: 'DS-1 Orbital Battle Station', manufacturer: 'Imperial Department of Military Research, Sienar Fleet Systems', cost_in_credits: '1000000000000', length: '120000', max_atmosphering_speed: 'n/a', crew: 342953, passengers: 843342, cargo_capacity: '1000000000000', consumables: '3 years', hyperdrive_rating: 4, MGLT: 10, starship_class: 'Deep Space Mobile Battlestation', pilots: [] }, { _id: 5f306ddda3ee8339643f1383, name: 'X-34 landspeeder', model: 'X-34 landspeeder', manufacturer: 'SoroSuub Corporation', cost_in_credits: '10550', length: '3.4 ', max_atmosphering_speed: 250, crew: 1, passengers: 1, cargo_capacity: 5, consumables: 'unknown', vehicle_class: 'repulsorcraft', pilots: [] }, // + 5 more non-Imperial manufactured results, omitted for brevity ]

We did our part to restrict classified information! 🎶 Hums Imperial March 🎶

#Restrictions for UNION ALL

Now that we know how the $unionWith stage works, it's important to discuss its limits and restrictions.

#Duplicates

We've mentioned it already, but it's important to reiterate: using the $unionWith stage will give you a combined result set which may include duplicates! This is equivalent to how the UNION ALL operator works in SQL as well. As a workaround, using a $group stage at the end of your pipeline to remove duplicates is advised, but only when possible and if the resulting data does not get inaccurately skewed.

There are plans to add similar fuctionality to UNION (which combines result sets but removes duplicates), but that may be in a future release.

#Sharded Collections

If you use a $unionWith stage as part of a $lookup pipeline, the collection you specify for the $unionWith cannot be sharded. As an example, take a look at this aggregation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Invalid aggregation (tried to use sharded collection with $unionWith) db.lonely_planets.aggregate([ { $lookup: { from: "extinct_planets", let: { last_known_population: "$population", years_extinct: "$time_extinct" }, pipeline: [ // Filter criteria { $unionWith: { coll: "questionable_planets", pipeline: [ { pipeline } ] } }, // Other pipeline stages ], as: "planetdata" } } ])

The coll questionable_planets (located within the $unionWith stage) cannot be sharded. This is enforced to prevent a significant decrease in performance due to the shuffling of data around the cluster as it determines the best execution plan.

#Transactions

Aggregation pipelines can't use the $unionWith stage inside transactions because a rare, but possible 3-thread deadlock can occur in very niche scenarios. Additionally, in MongoDB 4.4, there is a first-time definition of a view that would restrict its reading from within a transaction.

#$out and $merge

The $out and $merge stages cannot be used in a $unionWith pipeline. Since both $out and $merge are stages that write data to a collection, they need to be the last stage in a pipeline. This conflicts with the usage of the $unionWith stage as it outputs its combined result set onto the next stage, which can be used at any point in an aggregation pipeline.

#Collations

If your aggregation includes a collation, that collation is used for the operation, ignoring any other collations.

However, if your aggregation doesn't include a collation, it will use the collation for the top-level collection/view on which the aggregation is run:

  • If the $unionWith coll is a collection, its collation is ignored.
  • If the $unionWith coll is a view, then its collation must match that of the top-level collection/view. Otherwise, the operation errors.

#You've made it to the end!

We've discussed what the $unionWith pipeline stage is and how you can use it in your aggregations to combine data from multiple collections. Though similar to SQL's UNION ALL operation, MongoDB's $unionWith stage distinguishes itself through some convenient and much-needed characteristics. Most notable is the ability to combine collections with different schemas! And as a much needed improvement, using a $unionWith stage eliminates the need to write additional code, code that was required because we had no other way to combine our data!

If you have any questions about the $unionWith pipeline stage or this blog post, head over to the MongoDB Community forums or Tweet me!

MongoDB Icon
  • Developer Hub
  • Documentation
  • University
  • Community Forums

© MongoDB, Inc.