Medallion Architecture
Published date: 12/10/2025
After a week off, I'm back! And today, we're tackling a slightly more theoretical subject (but I promise, not too boring): the medallion architecture.
For those who aren't familiar with it, the medallion architecture is based on the idea that your data is organized into three layers: a bronze layer, a silver layer, and a... dirty... no, gold layer, obviously! (Sorry, my sense of humor didn't improve over the holidays...). So: bronze, silver, gold.
To break it down:
The bronze layer is where the raw data lands, without any transformation. It's an exact, historical replica, useful for auditing or reprocessing.
Next comes the silver layer, whose goal is to improve data quality: cleansing, standardization, minimal enrichment... In short, it's about preparing a clean foundation for future analysis.
And finally, the gold layer: the cream of the crop. This layer is designed for business users (reporting, data science, etc.). Here you'll find refined, aggregated data, ready to shine on your dashboards like Olympic medals!
The Bronze Layer
Data arriving in this layer is raw and unmodified. The golden rule? We don't touch anything!
This is the principle of immutability: once data, even incorrect data, has entered, too bad, it's too late. You can't go in and edit it (it's like a tattoo you got at 4 AM; you have to live with it, at least at this stage). At this point, the data is not validated, and the "business" is not supposed to have access to it.
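To make immutability concrete, here's a minimal sketch of what an append-only bronze write could look like in a Fabric notebook with PySpark. The table and column names are made up for the example, and spark is the session the notebook provides.

```python
from pyspark.sql import Row

# Hypothetical raw batch (in practice this comes straight from the source system).
raw_df = spark.createDataFrame([
    Row(sale_id=1, amount=42.0, raw_payload='{"sale_id": 1, "amount": 42.0}'),
])

# Immutability in practice: bronze is append-only.
# Even a bad record stays put; corrections happen downstream, in silver.
raw_df.write.format("delta").mode("append").saveAsTable("bronze_lakehouse.sales_raw")
```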
Supported Formats:
Any file format can be supported, but we generally use CSV, JSON, Parquet, or Delta. (I promise, an article will be written one day to explain what's hiding behind these barbaric names). Since the goal of this blog is to talk about Fabric, let's make the connection.
Fabric Components for Ingestion and Storage:
Lakehouse (via OneLake): This is the primary storage for raw files.
Data Factory Pipeline: To orchestrate batch ingestion via the "Copy Data" activity (over 200 connectors available).
Shortcuts: To directly reference external data stored in Amazon S3, Dataverse, etc., without physically copying it.
Notebooks: For ingesting complex data sources (APIs, JSON, etc.).
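To illustrate the notebook route, here's a minimal sketch of API ingestion into the bronze layer. The endpoint URL and the landing path are assumptions for the example; in a Fabric notebook, the attached Lakehouse is mounted under /lakehouse/default/Files.

```python
import json
import os
from datetime import datetime, timezone

import requests

# Hypothetical API endpoint: replace with your real source.
API_URL = "https://api.example.com/sales"

# Fetch the raw payload without transforming it.
response = requests.get(API_URL, timeout=30)
response.raise_for_status()

# Bronze principle: store the payload exactly as received,
# stamped with its ingestion time for auditing and reprocessing.
ingested_at = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
landing_dir = "/lakehouse/default/Files/bronze/sales"
os.makedirs(landing_dir, exist_ok=True)

with open(f"{landing_dir}/{ingested_at}.json", "w", encoding="utf-8") as f:
    json.dump(response.json(), f)
```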
The Silver Layer
Before landing in this layer, data from the bronze layer undergoes transformations to become usable. Generally, the transformations applied to move from bronze to silver are:
Cleansing: We remove or handle null or invalid values, normalize formats (dates, text), and filter out duplicates.
Merging: We consolidate similar data from multiple sources (example: combining several tables, managing column name changes, etc.).
Enrichment: Adding supplementary information (e.g., assigning keys, geocoding, etc.).
Quality Control: We apply business rules to validate consistency (e.g., testing that every sale has an existing customer, verifying that amounts are positive, etc.).
After these transformations, the data is structured into tables, ready to be analyzed, and provides a first shared source of truth.
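To give you an idea, here's a sketch of a bronze-to-silver notebook covering these steps in PySpark. All table and column names (bronze_lakehouse.sales, customer_id, etc.) are assumptions for the example.

```python
from pyspark.sql import functions as F

# Read the raw bronze table (spark is the session provided by the notebook).
bronze = spark.read.table("bronze_lakehouse.sales")

silver = (
    bronze
    # Cleansing: drop rows missing a key and normalize formats.
    .dropna(subset=["sale_id", "customer_id"])
    .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
    .withColumn("customer_name", F.trim(F.upper(F.col("customer_name"))))
    # Cleansing: filter out duplicates.
    .dropDuplicates(["sale_id"])
    # Quality control: a business rule, e.g. amounts must be positive.
    .filter(F.col("amount") > 0)
)

# Enrichment: attach customer attributes from a reference table.
customers = spark.read.table("bronze_lakehouse.customers")
silver = silver.join(
    customers.select("customer_id", "segment"), on="customer_id", how="left"
)

# Write the cleaned result as a Delta table in the silver Lakehouse.
silver.write.format("delta").mode("overwrite").saveAsTable("silver_lakehouse.sales")
```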
Data Format
The Delta Lake format is generally preferred for its ACID properties, time travel capabilities, and optimizations (V-Order).
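Time travel deserves a quick illustration. Here's a minimal sketch (the table name and version number are assumed):

```python
# Current state of the table.
current = spark.read.table("silver_lakehouse.sales")

# Time travel: query the table as it was at an earlier Delta version...
previous = spark.sql("SELECT * FROM silver_lakehouse.sales VERSION AS OF 5")

# ...or as of a point in time.
as_of_date = spark.sql(
    "SELECT * FROM silver_lakehouse.sales TIMESTAMP AS OF '2025-12-09'"
)

# Handy for audits: compare row counts between versions.
print(current.count(), previous.count())
```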
Fabric Components Used for Transformation and Storage:
"Silver" Lakehouse: This is the primary storage component for the cleaned Delta tables.
Notebook: Ideal for large or complex processing (more technical, but much more efficient).
Dataflow Gen2: Suitable for simpler transformations via a graphical interface (for Power BI users, you'll recognize the Power Query interface with some minimal differences).
Data Factory Pipeline: Here, the pipeline will act as an orchestrator to schedule the transformation flows from bronze to silver.
The Gold Layer
Unlike the previous, more general-purpose layers, the gold layer is use-case oriented. For example: a sales dashboard, a predictive model for customer retention, financial reports, etc.
The data here is modeled to facilitate queries: desired aggregation level (daily, monthly...), star or snowflake schema (fact table + dimensions), optimized materialized views...
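As an illustration, here's a sketch of building a daily sales fact table with PySpark. Names are assumed; I'm writing to a gold Lakehouse here, whereas loading a Warehouse would typically go through T-SQL or a pipeline.

```python
from pyspark.sql import functions as F

sales = spark.read.table("silver_lakehouse.sales")

# Gold: aggregate at the grain the use case needs (here, daily per customer).
fact_daily_sales = (
    sales.groupBy("sale_date", "customer_id")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("sale_id").alias("nb_sales"),
    )
)

# Persist as a Delta table ready for the semantic model / Power BI.
fact_daily_sales.write.format("delta").mode("overwrite").saveAsTable(
    "gold_lakehouse.fact_daily_sales"
)
```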
Example Tables/Views:
Fact tables (consolidated sales), dimension tables (enriched customer profiles). These tables are generally stored in a Warehouse.
Fabric Components Used for Transformation and Storage:
Data Warehouse: Often preferred for optimized and complex T-SQL queries.
Power BI (Direct Lake): Allows Power BI to connect directly to the tables for high performance without copying the data (Direct Lake, yet another topic to come...).
SQL Analytics Endpoint: The Warehouse exposes a SQL interface that allows the use of third-party client tools (objectively speaking, you already have the best tool available, aka Power BI; why would you want a more expensive, worse one?).
Semantic Models: The final semantic layer to define relationships between tables, hierarchies, and DAX measures.
We're reaching the end of the article, so let's do a brief recap of what we've just seen before moving on to best practices.
Summary of the Data Journey
The Medallion Architecture structures data processing into a progressive and controlled flow, from the raw source to final consumption:
Bronze: Ingestion of data in its original state.
Silver: Application of cleansing, standardization, and enrichment processes to make the data reliable.
Gold: Modeling and aggregation of data to optimize it for specific analytical use cases (BI, Data Science).
To ensure a robust, scalable, and secure architecture, it is recommended to follow several key principles:
Logical and Physical Layer Isolation: Physically separate each layer into distinct data containers: one Lakehouse for the Bronze layer, a second for the Silver layer, and a Data Warehouse for the Gold layer. For large-scale projects, also consider dedicating a Fabric workspace per layer to partition access management and resources.
Automation of Ingestion: Use Data Factory pipelines to automate feeding the Bronze layer. Set up scheduled (e.g., nightly execution) or event-based triggers (e.g., upon the arrival of a new file) for continuous and autonomous updates.
Modular Orchestration of Transformations: Structure the transformation flows (ETL/ELT) into distinct and independent pipelines:
Bronze to Silver Pipeline: Triggers the cleansing Dataflows or Notebooks after each successful ingestion.
Silver to Gold Pipeline: Orchestrates the execution of modeling and aggregation logic to build the final data products.
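To give a sense of what this modularity can look like in code, here's a hypothetical master notebook chaining the two steps with mssparkutils (available as a built-in in Fabric notebooks). The notebook names and parameters are made up; in practice, a Data Factory pipeline usually plays this orchestrator role.

```python
# Bronze -> Silver: run the cleansing notebook (second argument = timeout in seconds).
mssparkutils.notebook.run("clean_bronze_to_silver", 600, {"source": "sales"})

# Silver -> Gold: runs only if the previous step succeeded,
# so each layer can be re-run independently of the others.
mssparkutils.notebook.run("build_gold_models", 600, {"grain": "daily"})
```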
In this approach, I haven't mentioned Shortcuts, which provide hyper-connectivity between layers (still another article to come :-) ).
Adopting this modular architecture is a guarantee of flexibility and resilience, as it allows for isolating processes and re-running a specific step without impacting the other layers. This structured approach is the key to sustainably transforming raw data into strategic assets for the company.
There you have it! I hope the Medallion Architecture seems clearer to you now!
If you have any questions or remarks, don't hesitate to share them in the comments—that's what they're for 🙂