Unique Constraints on Arbitrary Fields & Migrations Collection?

Kate_Pond · April 13, 2020, 5:27pm

I’m looking to figure out whether or not I might need to create a “migrations” collection.

My task is to implement a unique constraint on an arbitrary field of a collection. I understand that doing so requires a proxy collection. This documentation seems to explain that well: https://docs.mongodb.com/manual/tutorial/unique-constraints-on-arbitrary-fields/.

What I don’t understand is if I need a “migrations” collection in order to keep track of migrations that have occurred for that proxy collection or if I even need migrations. Somehow I’m under the impression that schemas update based on the object that is inserted into the database.

I look forward to your response! Cheers!

Stennie_X · April 14, 2020, 4:26am

Hi Kate,

Welcome to the MongoDB community!

The documentation you referenced is specific to creating additional unique indexes for a sharded collection, although this may not be clear from a direct link to the page. MongoDB does not support unique indexes across shards, except when the unique index contains the full shard key as a prefix of the index.
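
To make the prefix rule concrete, here is a minimal sketch in plain JavaScript. The helper name hasShardKeyPrefix and the region/email fields are hypothetical, and the check only compares field names in order; it ignores hashed shard keys and other details the server takes into account.

```javascript
// Returns true when the shard key fields are the leading fields of the
// index key pattern, in the same order. Only field names are compared.
function hasShardKeyPrefix(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every((field, i) => indexFields[i] === field);
}

// Hypothetical collection sharded on { region: 1 }
const shardKey = { region: 1 };
console.log(hasShardKeyPrefix(shardKey, { region: 1, email: 1 })); // true
console.log(hasShardKeyPrefix(shardKey, { email: 1 }));            // false
console.log(hasShardKeyPrefix(shardKey, { email: 1, region: 1 })); // false
```

In this example, only the first index could be declared unique on the sharded collection, and it would enforce uniqueness of the region/email pair rather than of email alone.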

If you are working with an unsharded collection, you can create unique indexes without the need for a proxy collection.
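
For an unsharded collection, a sketch of this could look like the following. The users collection, the email field, and the ensureUniqueIndex wrapper are assumptions for illustration; createIndex with the unique option is the standard call.

```javascript
// Works with any object exposing createIndex(keys, options), such as a
// mongo shell collection (for example db.users).
function ensureUniqueIndex(collection, field) {
  // Ascending single-field index with the unique constraint enabled.
  return collection.createIndex({ [field]: 1 }, { unique: true });
}

// In the mongo shell:
//   ensureUniqueIndex(db.users, "email");
// which is equivalent to:
//   db.users.createIndex({ email: 1 }, { unique: true });
```

Once the index exists, inserting a second document with the same email value fails with a duplicate key error (E11000).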

MongoDB does not have a fixed schema catalog, so there isn’t a strict requirement for all documents in a collection to have the same structure (or to keep track of migrations). You can impose schema validation requirements for insert/update operations using JSON Schema, but changing schema validation rules does not perform migrations of existing documents.
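
As a rough sketch of what a validation rule might look like (the users collection and the email field are hypothetical):

```javascript
// Hypothetical rule: every inserted or updated document must be an object
// with a string "email" field.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["email"],
    properties: {
      email: { bsonType: "string" },
    },
  },
};

// In the mongo shell, attach it to an existing collection:
//   db.runCommand({ collMod: "users", validator: userValidator });
// or supply it when creating the collection:
//   db.createCollection("users", { validator: userValidator });
```

Existing documents that do not match the rule are left as they are; the rule is only checked on subsequent inserts and updates.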

Depending on your use case, it may make sense to implement a migration strategy if your code expects an identical schema in all documents. However, with a flexible schema you have more control over the impact of migrations (instead of being limited to the “all-or-none” approach of a fixed schema).

For example, you could add a schema version to documents and migrate them incrementally when they are next read by your application (or as a background task). There’s also a $jsonSchema query operator if you want to find documents matching a specific schema pattern.
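
A minimal sketch of the schema version idea, in plain JavaScript. The schemaVersion field name, the name/firstName/lastName fields, and the single migration step are all assumptions made for illustration.

```javascript
const CURRENT_SCHEMA_VERSION = 2;

// Hypothetical migration steps, keyed by the version they upgrade FROM.
// Version 1 stored a single "name"; version 2 splits it in two.
const migrations = {
  1: (doc) => {
    const [firstName, ...rest] = (doc.name || "").split(" ");
    const { name, ...others } = doc; // drop the old field
    return { ...others, firstName, lastName: rest.join(" "), schemaVersion: 2 };
  },
};

// Upgrade one document step by step until it reaches the current version.
// Documents without a schemaVersion are treated as version 1.
function upgradeDocument(doc) {
  let current = { ...doc, schemaVersion: doc.schemaVersion || 1 };
  while (current.schemaVersion < CURRENT_SCHEMA_VERSION) {
    const step = migrations[current.schemaVersion];
    if (!step) {
      throw new Error(`No migration from version ${current.schemaVersion}`);
    }
    current = step(current);
  }
  return current;
}

const upgraded = upgradeDocument({ _id: 1, name: "Ada Lovelace" });
console.log(upgraded);

// After reading and upgrading a document, the application could write it
// back, for example in the mongo shell:
//   db.users.replaceOne({ _id: upgraded._id }, upgraded);
```

Documents that are already at the current version pass through unchanged, so the same function can be called on every read or from a background task.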

Regards,
Stennie

Kate_Pond · April 14, 2020, 8:34pm

This is very helpful. I’m not completely sure what a sharded collection is, but I’m fairly sure that I don’t have one. Thank you for the welcome and the information @Stennie_X! Cheers!

Stennie_X · April 15, 2020, 3:11am

Hi Kate,

If you are connected to a sharded cluster using the mongo shell, the prompt should change to mongos>. You can also check if a collection is sharded by calling db.collectionname.getShardDistribution(), which will report something like “Collection test.collectionname is not sharded.” for an unsharded collection.

Sharding is an approach for distributing data across multiple servers (or “shards”). Sharding is typically used to scale deployments with very large data sets and high-throughput operations, but can also be useful for workload isolation (for example, Segmenting Data by Location or Tiering Hardware for Varying SLA or SLO). Collections in a sharded cluster can be unsharded (the default) or sharded.

A sharded collection is partitioned according to a shard key index that you define on one or more fields. Sharding enables horizontal scaling since each shard only has to manage a subset of the data for a sharded collection.

From an application point of view, a sharded collection is a single logical collection: you can query a sharded collection without being aware of which shard has the relevant results. However, since each shard only has a subset of the data, enforcing uniqueness for values other than the shard key requires some extra consideration (per the link you originally referenced).
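
A rough sketch of the proxy collection idea, written in mongo shell style where calls are synchronous and insertOne throws on a duplicate _id. The function name and arguments are hypothetical, and the proxy collection is assumed to be unsharded or sharded on _id so that _id values stay unique.

```javascript
// Insert `doc` into `main` only if doc[field] has not been reserved yet.
// `proxy` and `main` are assumed to be mongo shell collection objects.
function insertWithUniqueField(proxy, main, doc, field) {
  // Reserve the value first. This throws a duplicate key error if another
  // document already claimed the same value.
  proxy.insertOne({ _id: doc[field] });
  try {
    return main.insertOne(doc);
  } catch (err) {
    // Release the reservation if the main insert failed.
    proxy.deleteOne({ _id: doc[field] });
    throw err;
  }
}

// In the mongo shell (collection names are assumptions):
//   insertWithUniqueField(db.users_email_proxy, db.users,
//                         { email: "a@example.com" }, "email");
```

The two inserts are not atomic, so an application using this pattern would still need to decide how to handle a failure between the two steps.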

If you are interested in learning more about MongoDB, there are free online courses available at MongoDB University and a few learning paths (Developer or DBA/Operations) with recommended courses to take.

Regards,
Stennie