Mirroring vs Shortcut

Shortcut vs. Mirroring: Point to It or Copy It?

After a short (long) break, I'm back with a brief article before tackling a series of four consecutive articles (yes, just like Hollywood shows, I'm teasing you).

Today, we're going to make a little comparison between shortcuts and mirroring.

Shortcuts: The path to your data

Who hasn't placed a shortcut on their desktop to access a folder hidden in a forest of subfolders? Well, shortcuts in Fabric are pretty much the same thing.

Shortcuts in OneLake are pointers that redirect to data stored elsewhere, either inside or outside Fabric. They function like symbolic links: the data is neither copied nor duplicated but remains in its original location.
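To make the symbolic-link comparison concrete, here is a small analogy on a local file system. It is only an analogy: nothing here touches Fabric or OneLake, and the folder and file names (source_lake, sales.csv, shortcut_to_sales) are made up for the demo. It assumes an operating system where creating symbolic links is allowed (on Windows this may need extra privileges).

```python
import os
import tempfile


def demo_symlink_shortcut():
    """Show that a symbolic link exposes data without copying it."""
    with tempfile.TemporaryDirectory() as root:
        # The "source" plays the role of the lakehouse or bucket that owns the data.
        source_dir = os.path.join(root, "source_lake")
        os.makedirs(source_dir)
        source_file = os.path.join(source_dir, "sales.csv")
        with open(source_file, "w") as f:
            f.write("id,amount\n1,100\n")

        # The "shortcut" is only a pointer: no bytes are duplicated.
        shortcut = os.path.join(root, "shortcut_to_sales")
        os.symlink(source_dir, shortcut, target_is_directory=True)

        with open(os.path.join(shortcut, "sales.csv")) as f:
            before = f.read()

        # A change made at the source is visible through the pointer right away.
        with open(source_file, "a") as f:
            f.write("2,250\n")
        with open(os.path.join(shortcut, "sales.csv")) as f:
            after = f.read()

        return os.path.islink(shortcut), before, after


is_link, before, after = demo_symlink_shortcut()
print(is_link)
print(after)
```

The point to keep in mind: there is one copy of the data, and whoever reads through the pointer always sees what the source contains at that moment.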

We can distinguish between two types of shortcuts:

Internal shortcuts: These point to other Fabric items such as lakehouses, data warehouses, or KQL databases (I promise, one day I'll explain what that is), whether within the same workspace or across multiple workspaces.
External shortcuts: These connect Fabric to external storage systems like Azure Data Lake Storage (ADLS) Gen2, Amazon S3, Google Cloud Storage (GCS), and Dataverse.
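For the curious, here is a sketch of what the two types could look like as request bodies for the Fabric REST API. The route and the field names (oneLake, adlsGen2, location, subpath, connectionId) reflect my understanding of the "Create Shortcut" operation and should be checked against the current documentation. Every identifier in angle brackets is a placeholder, and the helper function names are my own. The code only builds the payloads and sends nothing.

```python
import json

# Base URL of the Fabric REST API (assumption: verify against the current docs).
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def shortcut_endpoint(workspace_id, item_id):
    """URL that a POST request would target to create a shortcut in an item."""
    return f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"


def internal_shortcut(name, path, source_workspace_id, source_item_id, source_path):
    """Request body for a shortcut that points to another Fabric item."""
    return {
        "name": name,
        "path": path,
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,
                "path": source_path,
            }
        },
    }


def adls_shortcut(name, path, location, subpath, connection_id):
    """Request body for a shortcut that points to ADLS Gen2 storage."""
    return {
        "name": name,
        "path": path,
        "target": {
            "adlsGen2": {
                "location": location,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }


# Hypothetical identifiers, for illustration only.
internal = internal_shortcut(
    "sales", "Tables", "<source-workspace-id>", "<source-lakehouse-id>", "Tables/sales"
)
external = adls_shortcut(
    "raw_events", "Files", "https://<account>.dfs.core.windows.net",
    "/<container>/events", "<connection-id>",
)

print(shortcut_endpoint("<workspace-id>", "<lakehouse-id>"))
print(json.dumps(internal, indent=2))
print(json.dumps(external, indent=2))
```

The only structural difference between the two is the target: an internal shortcut names a workspace and an item, while an external one names a storage location and the connection used to reach it.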

The key feature of shortcuts is their transparency to Fabric services. They simply appear as regular folders in OneLake. This allows any service that can access data in OneLake (like Apache Spark, SQL, Real-Time Intelligence, or Power BI) to use them directly without any additional configuration.

Mirroring in Fabric: Near real-time replication

Mirroring in Fabric is a continuous, low-cost, and low-latency data replication solution designed to transfer data from various database systems directly into OneLake.

The process creates a physical replica of the source data in Delta Parquet format and generates a SQL analytics endpoint for query access. The core mechanism of mirroring is Change Data Capture (CDC): changes made at the source are detected and replicated in near real time.
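If CDC is new to you, the idea fits in a few lines. The sketch below is a toy model, not Fabric's mirroring engine: the change record layout (op, key, row) is invented for the example, and the replica is a plain dictionary keyed by primary key.

```python
def apply_changes(replica, changes):
    """Apply an ordered list of change records to a replica keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Inserts and updates both end with the latest row stored under its key.
            replica[key] = change["row"]
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


# State of the replica after the initial snapshot.
replica = {1: {"name": "Ada"}, 2: {"name": "Bob"}}

# Changes captured at the source since the snapshot, in commit order.
changes = [
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"name": "Cy"}},
]

print(apply_changes(replica, changes))
```

Only the changes travel, not the whole table, which is what keeps replication cheap and close to real time. Notice also that every record is addressed by a key: without one, the replica could not tell which row an update or delete refers to.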

The data capture mechanism varies depending on the source.

For SQL Server versions 2016 to 2022, mirroring relies on the built-in CDC technology in SQL Server. A capture agent reads the transaction log to detect changes and replicates them via a data gateway to OneLake.
For SQL Server 2025 and Azure SQL, replication uses a "change feed" that allows the source database to write changes directly to a OneLake destination area, after which Fabric's mirroring engine processes them.
For Azure Cosmos DB, the continuous backup feature is a prerequisite, enabling replication without affecting the performance of transactional workloads.

Mirroring is not a monolithic feature; there are three distinct approaches:

Database mirroring, the "classic" approach, continuously and physically replicates entire databases or tables.
Metadata mirroring, which only synchronizes metadata (table names, schemas) from the source, relying on underlying shortcuts. This approach makes data accessible without moving it, as is the case with Azure Databricks Unity Catalog.
Open mirroring, which allows application developers to write change files directly into a OneLake "landing zone," where they are then converted into an analytical format.

Mirroring is specifically designed for transactional data sources and data warehouses. Supported sources include Azure SQL Database, Azure SQL Managed Instance, SQL Server, Azure Cosmos DB, Snowflake, and Azure Databricks catalogs.

However, each source has its own prerequisites and limitations. For example, for a table to be "mirrorable," it must have a primary key. Replication is limited to 500 tables per mirrored database. Additionally, certain data types (like json or xml) or database features (like Temporal History Tables, Always Encrypted) are not supported.
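These constraints are easy to turn into a pre-flight check. The sketch below only encodes the examples given above (primary key, 500 tables, json and xml columns). The table description format and the function names are hypothetical, and the real limits depend on the source and evolve over time, so treat the constants as assumptions to verify.

```python
MAX_TABLES_PER_MIRRORED_DB = 500      # limit quoted in the article
UNSUPPORTED_TYPES = {"json", "xml"}   # examples quoted in the article, not a full list


def mirroring_blockers(table):
    """Return the reasons why a table description fails the checks (empty if none)."""
    reasons = []
    if not table.get("primary_key"):
        reasons.append("no primary key")
    bad_columns = sorted(
        name for name, sql_type in table.get("columns", {}).items()
        if sql_type.lower() in UNSUPPORTED_TYPES
    )
    if bad_columns:
        reasons.append("unsupported column types: " + ", ".join(bad_columns))
    return reasons


def fits_table_limit(tables):
    """True if the number of candidate tables stays within the per-database limit."""
    return len(tables) <= MAX_TABLES_PER_MIRRORED_DB


# Hypothetical table descriptions, for illustration only.
orders = {"name": "orders", "primary_key": ["order_id"],
          "columns": {"order_id": "int", "amount": "decimal"}}
audit_log = {"name": "audit_log", "primary_key": [],
             "columns": {"payload": "xml", "created_at": "datetime2"}}

print(mirroring_blockers(orders))
print(mirroring_blockers(audit_log))
print(fits_table_limit([orders, audit_log]))
```

Running a check like this against the source catalog before enabling mirroring avoids discovering the unsupported tables one by one afterwards.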

Perfect, now that you understand (at least, I hope you do) the difference, the next question must be: which is better? It's simple, it's the shortcut!!!! Uh no, I meant, it depends…

When to favor shortcuts?

Shortcuts are the preferred solution when the main goal is virtualization and quick access to data without moving it. This approach is ideal for:

Use cases where the source is an existing data lake (ADLS, S3) and data duplication must be avoided at all costs.
Implementing modern data architecture models like the "data mesh," where each domain is responsible for its own data and shares it via links, rather than centralized replication.
Simplifying ETL pipelines by allowing engineers to build transformations directly on the source.
Scenarios where storage costs must be minimized by avoiding data duplication, as storage remains billed at the original source.

When to choose mirroring?

Mirroring is the most appropriate solution for replicating transactional databases and data warehouses, especially when access to near real-time data is required. This approach is ideal for:

Offloading resource-intensive analytical and reporting queries from an OLTP database to OneLake, without impacting the source's performance.
Scenarios requiring high query performance on large volumes of data, as locally replicated data benefits from minimal latency.
Radically simplifying ingestion processes from supported sources by eliminating the need to develop complex ETL pipelines.
Use cases where query performance is the main concern, since the compute used for replication is free and does not add to the bill.

And what about the costs?

Once again, there's no one-size-fits-all answer.

For shortcuts, compute is billed to the capacity of the user consuming the data, in a "consumer-pays" model. Storage is billed at the original source, with no additional storage cost in OneLake. However, it's important to note the risk of egress costs for external sources like S3.

For mirroring, the compute used to replicate data into OneLake is free and does not consume Fabric capacity. Storage for the replicas is also free up to a limit based on the number of capacity units (CUs) in the Fabric SKU (for example, 64 terabytes are free for an F64 capacity). OneLake storage is billed at the standard rate only once that limit is exceeded. On the other hand, any compute for queries run on the mirrored data is billed normally according to the Fabric capacity pricing model, and mirroring between regions can lead to unexpected egress costs.
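A quick back-of-the-envelope calculation for the storage part. It assumes the free allowance is 1 terabyte per capacity unit, which is what the F64 example implies (an F64 has 64 CUs), so check the current pricing page before budgeting with it. The function name is my own, and no price per terabyte is included because the article does not give one.

```python
FREE_TB_PER_CU = 1.0  # assumption inferred from "64 TB free for an F64 capacity"


def billable_mirror_storage_tb(capacity_units, replica_tb):
    """Terabytes of mirrored storage that exceed the free allowance."""
    free_tb = capacity_units * FREE_TB_PER_CU
    return max(0.0, replica_tb - free_tb)


# An F64 capacity with 50 TB of replicas stays within the free allowance.
print(billable_mirror_storage_tb(64, 50.0))

# The same capacity with 70 TB of replicas pays for the 6 TB above the limit.
print(billable_mirror_storage_tb(64, 70.0))
```

Only the overflow is billed, so for most mirrored databases the storage line stays at zero and the real cost sits in the queries run on top of the replicas.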

In conclusion

As your brilliant minds have understood (I hope, otherwise the blogger is less brilliant than expected), shortcuts and mirroring are not competing solutions, but complementary tools in the data architect's toolbox. Shortcuts excel at virtualizing data lakes by avoiding data movement, minimizing storage costs, and promoting decentralized architectures. Mirroring, on the other hand, is a powerful and simple solution for the continuous replication of transactional databases, offloading operational systems and optimizing the performance of analytical queries on localized data.

The existence of an approach like "metadata mirroring," which relies on underlying shortcuts to expose data from Azure Databricks without moving it, perfectly illustrates the synergy between the two concepts. Mirroring handles the complexity of database replication, while shortcuts provide the flexibility and transparency of data access.

The choice of the appropriate tool depends on the data source and business objectives. However, a realistic assessment must also take into account the maturity level of the features. Community feedback highlights that mirroring, despite its promise, still suffers from rigidity and management issues that could make shortcuts more reliable for critical use cases until these problems are resolved.

Ultimately, the modern enterprise data architecture in Microsoft Fabric will not rely on just one of these tools, but on a combination of both, to build a data ecosystem that is at once flexible, high-performing, and governed.

And to finish, a small comparison table, just for those who couldn't be bothered to read everything (yes, yes, I have the stats and you accepted the cookies, so I see YOU who just scrolled all the way down here).
