Recently, Power BI has enabled self-service data preparation for business analysts through Power BI dataflows. Dataflows can ingest data from a large number of transactional and observational data sources, then cleanse, transform, enrich, schematize, and persist the result. Dataflows can be refreshed automatically and chained together to create powerful data preparation pipelines. Furthermore, Power BI supports storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and the dataflow definition. By storing dataflows in ADLS Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.
You probably already know that Power BI dataflows store their data in Common Data Model (CDM) folders. But what does this actually mean?
CDM is a metadata system
The Common Data Model is a metadata system that simplifies data management and application development by unifying data into a known form and applying structural and semantic consistency across multiple apps and deployments.
CDM standard entity schemas
Microsoft has joined with SAP and Adobe to form the Open Data Initiative, which encourages the definition and adoption of standard entities across a range of domains so that applications and tools can more easily share data through an enterprise data lake.
To that end, Microsoft and its partners have published the Common Data Model with standardized, extensible data schemas. The collection of built-in schemas includes entities, attributes, semantic metadata, and relationships. These schemas represent commonly used concepts and activities, such as Account and Campaign, to simplify the creation, aggregation, and analysis of data.
CDM folders are data storage that uses CDM metadata
A CDM folder is a folder in Azure Data Lake Storage Gen2 that conforms to a standardized metadata format and contains self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers.
CDM folders contain the metadata in a model.json file. This metadata conforms to the CDM metadata format and can be read by any client application or code that knows how to work with CDM.
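To make the format concrete, here is a minimal Python sketch that reads a CDM folder's model.json and lists its entities, attributes, and data partitions. The folder path is a placeholder, and the keys used (name, entities, attributes, dataType, partitions, location) follow the published model.json metadata format:

```python
import json

# Path to a model.json inside a CDM folder -- here a locally mounted copy of
# the ADLS Gen2 filesystem; the folder name "sales-dataflow" is hypothetical.
MODEL_PATH = "/mnt/datalake/powerbi/sales-dataflow/model.json"

with open(MODEL_PATH, encoding="utf-8") as f:
    model = json.load(f)

print(f"Model: {model['name']} (model.json version {model.get('version')})")

# Each entity carries its schema (attributes) and points at the partitions
# (the CSV files that hold the actual data).
for entity in model["entities"]:
    print(f"\nEntity: {entity['name']}")
    for attr in entity.get("attributes", []):
        print(f"  {attr['name']}: {attr.get('dataType', 'unknown')}")
    for partition in entity.get("partitions", []):
        print(f"  data file: {partition['location']}")
```

Any tool that can parse this file can discover the schema and locate the data without any out-of-band knowledge about the producer.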
You don’t need to use any standard entities
You won't always be storing standard data or schemas. The data in a CDM entity may map to a standard entity schema, but for most entities you will create a custom schema. Nothing in CDM or CDM folders requires you to use a standard schema.
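For illustration, here is what a fully custom entity definition could look like inside model.json, sketched as a Python dict. The entity and attribute names are invented and match no standard CDM schema, which is perfectly valid:

```python
# A minimal custom entity definition for model.json. Nothing here references
# a standard CDM entity schema -- the names are entirely our own.
machine_telemetry_entity = {
    "$type": "LocalEntity",
    "name": "MachineTelemetry",  # custom entity, not a standard one
    "attributes": [
        {"name": "machineId",   "dataType": "string"},
        {"name": "recordedAt",  "dataType": "dateTime"},
        {"name": "temperature", "dataType": "double"},
    ],
    "partitions": [
        # Each partition points at a CSV file holding the entity's data.
        {"name": "part-0", "location": "MachineTelemetry/part-0.csv"}
    ],
}
```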
Control access to every entity and dataflow
Because these dataflows/CDM folders are stored in Azure Data Lake Storage (ADLS) Gen2, ADLS Gen2's role-based access control (RBAC) and POSIX-style ACLs can be used to implement the access management layer.
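As a sketch, the azure-storage-file-datalake SDK can set a POSIX ACL on a single dataflow's CDM folder; the account, filesystem, and folder names and the Azure AD object ID below are all placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account/filesystem/folder names -- substitute your own.
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("powerbi")

# Grant a data engineer's AAD identity read+execute on one dataflow's
# CDM folder, without touching the rest of the filesystem.
folder = filesystem.get_directory_client("sales-dataflow")
folder.set_access_control(
    acl="user::rwx,group::r-x,mask::r-x,other::---,"
        "user:11111111-2222-3333-4444-555555555555:r-x"
)
```

Because a CDM folder holds one dataflow and each entity's data sits in its own subfolder and files, the same mechanism scales down to per-entity grants.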
With seamless and easy integration, CDM helps decouple applications and data sources from each other. A report or dashboard built on CDM for one concrete purpose can easily be incorporated into similar scenarios where the underlying data source is different.
Enable CDM for Azure Data Services
Data engineers can use Azure Data Factory, Azure Databricks, or Azure HDInsight to combine data from CDM folders with data from across the enterprise to create a historically accurate, curated, enterprise-wide view of the data in Azure SQL Data Warehouse. At any point, data processed by any Azure Data Service can be written back to new CDM folders, making the insights created in Azure accessible to Power BI and other CDM-enabled apps or tools. The same CDM folders can also feed advanced analytics such as machine learning or artificial intelligence.
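As a minimal sketch of that write-back step, the snippet below persists a curated result as a new CDM folder: CSV partitions plus a model.json describing them. Paths, names, and data are illustrative; a real pipeline would write to the ADLS Gen2 location and typically record absolute URLs as partition locations:

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical output location -- in practice a path in the ADLS Gen2
# filesystem (e.g. via a Databricks mount); all names are placeholders.
out = Path("/mnt/datalake/curated/sales-summary")
(out / "SalesSummary").mkdir(parents=True, exist_ok=True)

# Curated result produced earlier in the pipeline (illustrative data).
df = pd.DataFrame({"region": ["East", "West"], "revenue": [125000.0, 98000.0]})

# CDM folder data is stored as headerless CSV partitions;
# the schema lives in model.json instead of in the files.
csv_path = out / "SalesSummary" / "part-0.csv"
df.to_csv(csv_path, index=False, header=False)

# A minimal model.json describing the entity, per the CDM metadata format.
model = {
    "name": "sales-summary",
    "version": "1.0",
    "entities": [
        {
            "$type": "LocalEntity",
            "name": "SalesSummary",
            "attributes": [
                {"name": "region", "dataType": "string"},
                {"name": "revenue", "dataType": "double"},
            ],
            # Real CDM folders usually record the file's full ADLS URL here;
            # a local path is used to keep the sketch self-contained.
            "partitions": [{"name": "part-0", "location": str(csv_path)}],
        }
    ],
}
(out / "model.json").write_text(json.dumps(model, indent=2))
```

Once the folder exists in the lake, Power BI can attach it as an external dataflow and any other CDM-aware tool can discover it through model.json.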
When CDM folders are created and managed by Azure Data Services (surfaced in Power BI as external dataflows), Power BI's role is reduced to being a consumer of the data. Although Power BI doesn't take responsibility for refreshing an external dataflow, the dataflow can be consumed by PBIX files like any other dataflow. This is great because the user experience of consuming a dataflow's output doesn't change.
In such scenarios, where the data lake is an essential part of your data platform architecture, external dataflows help balance corporate BI and managed self-service BI requirements.