Data Model Pattern for Analytics/Reporting (Star Schema or Materialized View)?

ManiK · September 18, 2020, 6:06pm

Just wanted to understand if we have any reference data model patterns for Reporting & Analytics kind of applications something like Star schema having daily, weekly, monthly, qtrly, yearly fact table(s) & multiple dimension tables to help create business KPI dashboards/reports ?

Thanks!

Pavel_Duchovny · September 19, 2020, 4:39pm

Hi @ManiK,

I think the following post will answer your questions:
Can I use Mongo DB for Star Schema type of Data Model? - #2 by slava

Also have a look at the bucket pattern over our patterns suggestion for time based analytics:

Building with Patterns: A Summary | MongoDB Blog

Best
Pavel

ManiK · September 21, 2020, 8:20am

Thanks @Pavel_Duchovny ! I’ve already gone through the links you’ve provided above. And, I agree that the pattern provided in the first link is very simple and makes much sense.

So let me be more specific here, the goal is not to copy or migrate an existing star schema from a relational DB to NoSQL - MongoDB, instead to design a data model pattern in MongoDB that is as efficient as a star schema and which also enables to do “self-service” analytics kind of reporting.

Let’s assume, if we want to process and store the historical data for reporting like a periodic snapshot fact tables like weekly, monthly, quarterly and yearly with some “conformed” dimension tables around like - Date, Geography, Customer, Products etc. There could be several dimensional attributes related to each dimensions and their associated measures, which could make a document very long and bulky (w.r.t document size) and at the same time we also need to ensure the hierarchies are maintained within each dimension like for example:- Date (i.e. Day, Week, Month, Qtr, Year) and Geography (i.e. Region, Country, State, City, County).

Also, the user can try to analyze just the sales by customer and/or just the sales by product and time. But a single document would still fetch all the dimensional attributes of other dimensions everytime.

Considering all these, would you still recommend to have all the dimensions, associated hierarchies etc embedded - like in the first link you provided ? Or, would it make sense to have separate collections created at different granularity level like - one for weekly, one for monthly and one for quarterly etc. However, in this approach then how do we handle the reporting context - meaning - if a user wants to see or analyze only the “monthly” sales, how do we change the context during runtime that instead of weekly collection, select/use the “monthly” collection for reporting. This might require it to be handled programmatically though, it seems.

I guess for such situations instead of just a bucket pattern, what we may need could be a combination of computed and bucket (in separate collections for each period - week, month, qtr, year)?

ManiK · October 8, 2020, 7:04am

Just following up again on above topic - mainly, want to understand if I have a need for doing daily, weekly, monthly, quarterly “sales” kind of reporting/analysis. Should we create separate collections for each aggregation level (daily, weekly…) or have all the calculations at different aggregation levels in one single collections ? But if we have separate collections, how to handle the Date “hierarchy” to ensure it properly drill’s up/down/across ?

Pavel_Duchovny · October 8, 2020, 10:06am

Hi @ManiK,

As you mentioned the main consideration should be how data is best accessed and seperate your data respectfully.

The meanings of “star schema” belongs to relational databases more than semistructured.

Please note that we recommend having as less collections as possible since having many collections means many files on disk and presents OS limitations as well as memory and disk overhead.

On the other hand, timesieries data on partitioned collections based on day/week/month is a known pattern for MongoDB so it might make sense if it respect the data access requirements as well as your system available resources. Be ready that having many collections may introduce challenges when you decide to shard the environment or might require much more expensive HW to scale.

I recommend reading all of our design patterns and schema antipattern for avoiding or following whats best for you.

https://www.mongodb.com/article/schema-design-anti-pattern-summary

Hope this helps

Best
Pavel