Advanced Schema Design Patterns

I watched the following presentation and had a few questions from it:

  1. In the MongoDB insurance portal, they talk about syncing data from a legacy system to MongoDB. How do we typically achieve this?
  2. How do we move data from old legacy systems into the different collections in MongoDB, as in the insurance portal example?
  3. How do we typically achieve a legacy DB to MongoDB transformation? We have lots of tables in Oracle; how do we move those into MongoDB collections?

Hi @Anoop_Sidhu,

Welcome to the MongoDB community.

There are many ways to achieve data streaming from various sources to MongoDB, whether it's Oracle or another data source.

The method depends on the purpose and length of the migration.

  1. You can dump the data into CSV or JSON files and use mongoimport or other scripts to transform and load the data.
    https://docs.mongodb.com/database-tools/mongoimport/
  2. One of the popular ways to stream data is by using our connectors, like the Spark connector or the Kafka connector, where MongoDB acts as a sink.
    https://docs.mongodb.com/kafka-connector/master/
  3. Writing an application that queries data on one database and bulk writes to MongoDB (see the sketch after this list).
    https://docs.mongodb.com/manual/core/bulk-write-operations/
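To make option 3 concrete, here is a minimal Python sketch of that read-transform-bulk-write loop. The connection strings, the `users` table, and the column list are hypothetical placeholders, not something from the webinar; substitute your own schema.

```python
# Minimal sketch: read rows from Oracle, reshape them into documents,
# and bulk-write them into MongoDB. All names here are hypothetical.
import cx_Oracle
from pymongo import MongoClient, InsertOne

oracle = cx_Oracle.connect("scott", "tiger", "localhost/ORCLPDB1")
users = MongoClient("mongodb://localhost:27017")["insurance"]["users"]

cursor = oracle.cursor()
cursor.execute("SELECT user_id, first_name, last_name, email FROM users")
columns = [col[0].lower() for col in cursor.description]

batch = []
for row in cursor:
    doc = dict(zip(columns, row))    # flat row -> document
    doc["_id"] = doc.pop("user_id")  # reuse the legacy PK as _id
    batch.append(InsertOne(doc))
    if len(batch) == 1000:           # write in bounded batches
        users.bulk_write(batch, ordered=False)
        batch = []
if batch:
    users.bulk_write(batch, ordered=False)
```

`ordered=False` lets MongoDB attempt every operation in a batch even if some fail (for example on duplicate keys), which helps when a load has to be re-run.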

However, moving to MongoDB from a legacy database has more to it than extracting and loading data. I recommend reading Relational Database To MongoDB | MongoDB.

Best regards
Pavel

Thanks for replying. We are strangling the monolith into microservices, so the idea is that while we selectively move functionality out of the legacy monolith, we can use CDC (change data capture) from Oracle into MongoDB. Eventually we could then sunset the Oracle DB and the legacy system. I have follow-up questions:

  1. The Advanced Design Patterns webinar just talked about a user collection and related policies, claims, and other things like messages and documents. As we stream data from Oracle, we would have to transform the data from its relational shape into the user collections. Is that the right assumption?

  2. When we use an extended reference from one collection to another, how do we typically do that when a legacy system is involved? Do we use a batch job to first populate the collections, and then go to the other side and populate the extended-reference collections? That looks like a lot of work for data migrations.

Hi @Anoop_Sidhu,

(1) You are correct: moving the data 1-to-1 from Oracle to MongoDB will not let you benefit from the MongoDB model, where data that is accessed together can be flexibly stored together. Transform the data as part of your CDC processing.
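For example, here is a minimal sketch of transforming in flight, assuming a Debezium-style change event for an Oracle USERS row. The event shape, field names, and target document layout are all assumptions, not a fixed contract:

```python
# Sketch: reshape a flat CDC row image into the target user document
# and apply it idempotently. Event and field names are illustrative.
from pymongo import MongoClient, ReplaceOne

def to_user_doc(event: dict) -> dict:
    """Fold a relational row image into the document shape we want."""
    row = event["after"]              # post-change row image
    return {
        "_id": row["USER_ID"],        # legacy PK becomes _id
        "name": {                     # fold columns into a subdocument
            "first": row["FIRST_NAME"],
            "last": row["LAST_NAME"],
        },
        "email": row["EMAIL"],
    }

users = MongoClient("mongodb://localhost:27017")["insurance"]["users"]

def apply_events(events):
    """Upserts make replayed or duplicated change events harmless."""
    ops = [ReplaceOne({"_id": d["_id"]}, d, upsert=True)
           for d in map(to_user_doc, events)]
    if ops:
        users.bulk_write(ops, ordered=False)
```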

(2) You can potentially store the data in staging collections and transform it post-migration into the target collections. But I would recommend transforming the data as it is being migrated and prepared for MongoDB. Potentially join all the data of a target MongoDB document on the Oracle side, so it is easier to store it directly in MongoDB format (extended reference).
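As a sketch of that idea, assuming hypothetical `policies` and `users` tables: join the few user fields each policy document needs on the Oracle side, so every document arrives with its extended reference already in place.

```python
# Sketch: build the extended reference during extraction by joining
# on the Oracle side. Table and column names are hypothetical.
import cx_Oracle
from pymongo import MongoClient, InsertOne

oracle = cx_Oracle.connect("scott", "tiger", "localhost/ORCLPDB1")
policies = MongoClient("mongodb://localhost:27017")["insurance"]["policies"]

cursor = oracle.cursor()
cursor.execute("""
    SELECT p.policy_id, p.policy_type, p.premium,
           u.user_id, u.first_name, u.last_name
      FROM policies p
      JOIN users u ON u.user_id = p.user_id
""")

ops = []
for policy_id, policy_type, premium, user_id, first, last in cursor:
    ops.append(InsertOne({
        "_id": policy_id,
        "type": policy_type,
        "premium": float(premium),
        # extended reference: just enough of the user to avoid a lookup
        "user": {"userId": user_id, "name": f"{first} {last}"},
    }))
    if len(ops) == 1000:
        policies.bulk_write(ops, ordered=False)
        ops = []
if ops:
    policies.bulk_write(ops, ordered=False)
```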

Thanks,
Pavel

Thanks again for the wonderful pointers. So if we have collections with extended references, the parent collections being referenced would be populated from Oracle as the data migration goes on. As you mentioned, we join the data and then move it together with the master tables into MongoDB. Do you recommend any ETL tools that are good at moving data from an RDBMS (Oracle) into MongoDB?

I am guessing there will be a one-time data load, and then scenarios where CDC only moves changes across. These are two separate scenarios.
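For instance, I imagine both scenarios could share one idempotent write path, something like this sketch (collection and key names hypothetical):

```python
# Sketch: one upsert path serves both the one-time snapshot load and
# the ongoing CDC deltas, keyed by the legacy primary key.
from pymongo import MongoClient, ReplaceOne

users = MongoClient("mongodb://localhost:27017")["insurance"]["users"]

def upsert_docs(docs):
    """Safe for the bulk snapshot and for CDC deltas alike."""
    users.bulk_write(
        [ReplaceOne({"_id": d["_id"]}, d, upsert=True) for d in docs],
        ordered=False,
    )

# Phase 1: feed it the full snapshot from Oracle.
# Phase 2: feed it transformed CDC events, starting from an offset/SCN
# captured just before the snapshot so no change falls in the gap.
```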
