Data Architecture - Data Mesh Foundation



The concept of data mesh was introduced by Zhamak Dehghani, CEO of Nextdata. It is not a specific technology but a way of organizing how data is owned and managed. You can build a data mesh on different technologies, such as data warehouses or data lakes. This chapter explains what a data mesh is, how it works, and when to use it.



What is Data Mesh?

A data mesh is a way of organizing data in a company by making each team responsible for its own data. Instead of having one central team control all the data, each team (or domain) manages its own data as a product that others can easily use.

It has four main ideas.

  • Domain Ownership: Each team owns and manages its own data.
  • Data as a Product: Data is treated like a product, made easy to find and use by others.
  • Self-Serve Infrastructure: A shared platform provides ready-made tools, so teams can manage their data without building the infrastructure themselves.
  • Federated Governance: Shared rules make sure data is safe, secure, and follows company standards.

When to Use Data Mesh?

Data Mesh is helpful when:

  • Data processes are slow or delayed
  • Data quality is inconsistent across the organization
  • The organization struggles to scale its data capabilities
  • The business misses opportunities because data is hard to access

Decentralized Data Architecture

Traditional data systems like data warehouses and data lakes are centralized, meaning a central team controls all the data.

In data mesh, data is decentralized. Each team manages their own data, decides how to use it, and keeps it in their own domain. You can access data directly where it is, without moving it to a central system. This makes data more manageable and scales better as the company grows.

In centralized systems, the central team handles everything, including storing data, ensuring quality and security, managing data pipelines, and backups. These systems grow by adding more power to a single central system, while data mesh grows by giving each team control over their own data.
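The idea of reaching data where it lives, rather than copying it into a central store, can be sketched as a small catalog that holds only pointers to domain-owned data. This is a minimal illustration, not a reference design: the `DataProductRef` and `MeshCatalog` names and the `s3://` locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductRef:
    """Pointer to a data product that stays inside its owning domain."""
    domain: str    # owning team, e.g. "sales"
    name: str      # product name within that domain
    location: str  # hypothetical address where the domain serves the data


class MeshCatalog:
    """Index of data products. It stores pointers, never copies of the data."""

    def __init__(self):
        self._refs = {}

    def register(self, ref):
        # Each domain registers its own products under "<domain>.<name>".
        self._refs[f"{ref.domain}.{ref.name}"] = ref

    def resolve(self, qualified_name):
        # Consumers look up where the data lives and read it in place.
        try:
            return self._refs[qualified_name]
        except KeyError:
            raise LookupError(f"unknown data product: {qualified_name}") from None


catalog = MeshCatalog()
catalog.register(DataProductRef("sales", "orders", "s3://sales-domain/orders/"))
catalog.register(DataProductRef("manufacturing", "costs", "s3://mfg-domain/costs/"))

ref = catalog.resolve("sales.orders")
print(ref.location)  # s3://sales-domain/orders/
```

Adding a new domain here means registering more pointers, which mirrors how a mesh grows by adding teams rather than by enlarging one central system.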

Data Mesh Hype

Data mesh has gained attention since 2019 but is still in its early stages, with limited adoption (an estimated 5%-20%). Gartner predicts it will be replaced by data fabric as businesses shift from passive to active metadata.

While some argue that data mesh solves scaling problems in data warehouses, the real cause of failure is often issues with people or processes, not the technology itself.

Despite the hype, centralized large-scale data solutions have been working well for years. Very few companies actually run a data mesh, and most that claim to are in fact using other architectures such as data fabric or lakehouses.

Dehghani's Four Principles of Data Mesh

The four key principles, which aim to improve data management, scalability, and collaboration within organizations, are as follows.

Domain Ownership

In a Data Mesh, each business area (such as sales, manufacturing, or marketing) is responsible for its own data. The people who understand the data best are in charge of managing it, rather than a central team. By decentralizing data ownership, the process of managing and scaling data becomes more efficient and adaptable.

Data as a Product

Data should be treated as a product that is developed, maintained, and improved continuously. Just like any product, it should be of high quality, easy to find, and user-friendly. Teams are responsible for ensuring their data is reliable, well-documented, secure, and accessible to others.
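One way to make "data as a product" concrete is a small descriptor that records who owns a data set and whether it is ready for others to use. The sketch below is only an assumption about what such a descriptor might hold: the `DataProduct` class, its fields, and the example endpoint are hypothetical, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner_team: str
    description: str = ""                        # documentation for consumers
    schema: dict = field(default_factory=dict)   # column name -> type name
    access_endpoint: str = ""                    # hypothetical URL or table path

    def publication_gaps(self):
        """Return the product qualities that are still missing."""
        gaps = []
        if not self.description:
            gaps.append("documentation")
        if not self.schema:
            gaps.append("schema")
        if not self.access_endpoint:
            gaps.append("access endpoint")
        return gaps


draft = DataProduct(name="orders", owner_team="sales")
print(draft.publication_gaps())  # ['documentation', 'schema', 'access endpoint']

ready = DataProduct(
    name="orders",
    owner_team="sales",
    description="One row per confirmed customer order.",
    schema={"order_id": "string", "amount": "decimal"},
    access_endpoint="warehouse.sales.orders",
)
print(ready.publication_gaps())  # []
```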

Self-Serve Data Infrastructure

Domain teams need tools that make it easy to create and manage data products. Instead of building complex systems from scratch, a central platform should provide ready-made solutions for storing, processing, and sharing data. This approach allows domain teams to focus on their data and not worry about the technical infrastructure.
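A self-serve platform can be pictured as a function that domain teams call to get a ready-made setup with sensible defaults. The following is a rough sketch under that assumption: `provision_data_product`, `PLATFORM_DEFAULTS`, and the `/mesh/...` path layout are invented for illustration and do not correspond to a real platform API.

```python
# Defaults chosen once by the central platform team (hypothetical values).
PLATFORM_DEFAULTS = {
    "storage_format": "parquet",
    "encryption": True,
    "retention_days": 365,
}


def provision_data_product(domain, product, **overrides):
    """Return a storage spec for a new data product.

    Domain teams may override only the settings the platform exposes.
    """
    unknown = set(overrides) - set(PLATFORM_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    spec = {"domain": domain, "product": product}
    spec.update(PLATFORM_DEFAULTS)
    spec.update(overrides)
    spec["storage_path"] = f"/mesh/{domain}/{product}"
    return spec


spec = provision_data_product("marketing", "campaigns", retention_days=90)
print(spec["storage_path"])    # /mesh/marketing/campaigns
print(spec["retention_days"])  # 90
print(spec["encryption"])      # True
```

The domain team supplies only its name, the product name, and any allowed overrides. Everything else comes from the platform.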

Federated Computational Governance

Data governance should be a shared responsibility between the central team and each business area. The central team sets the main rules for security, data quality, and legal requirements, while each business area makes sure those rules are followed for their own data. This way, the organization stays consistent, but each area can still meet its own specific needs.
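The "computational" part of this principle suggests that rules are expressed as code and checked automatically. A minimal sketch, assuming rules are simple functions over a product description: the policy names, the dictionary fields, and the `violations` helper are all hypothetical.

```python
# Rules set by the central team; they apply to every domain.
GLOBAL_POLICIES = {
    "has_owner": lambda p: bool(p.get("owner")),
    "pii_is_encrypted": lambda p: not p.get("contains_pii") or p.get("encrypted", False),
}


def violations(product, local_policies=None):
    """Return the names of all policies the product fails.

    Local policies are added to the global ones and cannot replace them.
    """
    checks = list(GLOBAL_POLICIES.items()) + list((local_policies or {}).items())
    return sorted(name for name, check in checks if not check(product))


patients = {
    "name": "patients",
    "owner": "healthcare",
    "contains_pii": True,
    "encrypted": False,
}

# The domain adds its own stricter rule on top of the global ones.
healthcare_rules = {"has_description": lambda p: bool(p.get("description"))}

print(violations(patients))                    # ['pii_is_encrypted']
print(violations(patients, healthcare_rules))  # ['has_description', 'pii_is_encrypted']
```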

Data Domains in Data Mesh

In a Data Mesh, each business area that creates or uses data is responsible for it. The teams who know the data best are in charge of managing it. There are three main types of data domains.

  • Source-aligned data: This is data directly from original systems, transformed for analysis. It's not customized for any specific group but is used across multiple business areas.
  • Aggregated data: Data combined from different domains to simplify reports or analysis, like merging sales and manufacturing data for profit reports.
  • Consumer-aligned data: Data that is modified to meet the needs of specific departments or use cases, like making it easier for non-technical teams or machine learning models to use.
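The aggregated case above, merging sales and manufacturing data for a profit report, can be illustrated with a few lines of plain Python. The field names and figures are made up for the example.

```python
# Source-aligned data published by two separate domains (made-up figures).
sales = [
    {"product_id": "A", "revenue": 1200.0},
    {"product_id": "B", "revenue": 800.0},
]
manufacturing = [
    {"product_id": "A", "cost": 700.0},
    {"product_id": "B", "cost": 950.0},
]


def profit_report(sales_rows, cost_rows):
    """Join the two domains on product_id and compute profit per product."""
    cost_by_product = {row["product_id"]: row["cost"] for row in cost_rows}
    report = []
    for row in sales_rows:
        pid = row["product_id"]
        if pid in cost_by_product:  # skip products with no known cost
            report.append({"product_id": pid, "profit": row["revenue"] - cost_by_product[pid]})
    return report


for line in profit_report(sales, manufacturing):
    print(line)
# {'product_id': 'A', 'profit': 500.0}
# {'product_id': 'B', 'profit': -150.0}
```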

Data Mesh Logical Architecture

In a Data Mesh, data is distributed among different business areas or domains, with each domain owning its own data products. Here's how it works.

  • Source-aligned Domains: These domains handle data directly from their operations. For example, the sales team stores customer data in a data lake and combines it with other data for analysis.
  • Consumer-aligned Domains: These domains simplify complex data to make it easier for non-technical teams, like suppliers, to understand and use the information.
  • Aggregated Domains: These domains combine data from different sources, such as sales and manufacturing, to create reports or perform analysis. This improves the speed and efficiency of querying the data.
  • Customer 360 Domain: This domain combines customer data from different sources (e.g., demographics, transactions, feedback) into a single, complete view that is shared across all relevant teams.

Data Mesh Topologies

A Data Mesh can be organized in three ways, depending on the level of centralization or decentralization. Each has its own pros and cons.

  • Mesh Type 1: All domains use the same technology and a single, shared data lake. This makes it easier to manage security and data, and avoids performance issues from using separate lakes.
  • Mesh Type 2: Domains use the same technology but have their own separate data lakes. This gives more freedom but can make it harder to combine data from different lakes.
  • Mesh Type 3: Domains can use different technologies and cloud services (like AWS, Azure, or GCP). This offers more flexibility, but it also brings challenges with security, managing data, and integrating data across different platforms.
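The three topologies differ only in whether domains share a technology and whether they share a data lake, so the distinction can be written as a short decision rule. This sketch assumes each domain is described by a hypothetical `technology` and `lake` field.

```python
def mesh_type(domains):
    """Classify a mesh as Type 1, 2, or 3 from its domains' setups."""
    technologies = {d["technology"] for d in domains}
    lakes = {d["lake"] for d in domains}
    if len(technologies) > 1:
        return 3  # different technologies or clouds
    if len(lakes) > 1:
        return 2  # same technology, separate lakes
    return 1      # same technology, one shared lake


shared = [
    {"name": "sales", "technology": "azure", "lake": "corp-lake"},
    {"name": "marketing", "technology": "azure", "lake": "corp-lake"},
]
separate = [
    {"name": "sales", "technology": "azure", "lake": "sales-lake"},
    {"name": "marketing", "technology": "azure", "lake": "marketing-lake"},
]
mixed = [
    {"name": "sales", "technology": "azure", "lake": "sales-lake"},
    {"name": "marketing", "technology": "aws", "lake": "marketing-lake"},
]

print(mesh_type(shared), mesh_type(separate), mesh_type(mixed))  # 1 2 3
```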

Data Mesh vs. Data Fabric

Data Mesh and Data Fabric are both important concepts, but they serve different purposes, as shown in the table below.

| Aspect | Data Mesh | Data Fabric |
| --- | --- | --- |
| What it is | A way to manage data by dividing it across different parts of a business. | A system that connects and manages data in one place. |
| Data Ownership | Different teams or departments own and manage their own data. | One central team (like IT) manages all the data. |
| How Data is Organized | Data is divided by business areas (like sales, marketing, etc.). | All data is kept in one place and organized together. |
| Flexibility | Each team can use the tools and tech they prefer. | Everyone uses the same tools and tech across the system. |
| Best For | Companies with many departments that need control over their own data. | Companies that want all their data in one central system. |
| Scalability | Easy to grow as more departments join. | Can be harder to scale when there is lots of data. |
| Data Sharing | Data is shared between teams through APIs and other methods. | Data is stored in one place, so teams can easily access it. |
| Main Focus | Giving different teams control over their own data. | Making it easier to connect and manage all data in one place. |

How Do Data Mesh and Data Fabric Work Together?

Data Mesh and Data Fabric work together to help manage and connect data across an organization. Here's how they each play a role:

  • Data Mesh: Breaks down data by different business areas, so each team is in charge of its own data.
  • Data Fabric: Gives the tools and system to link all the data together, making it easy for everyone to access.

When to Use Data Mesh vs. Data Fabric

Data Mesh and Data Fabric are both useful for managing data, but each is suited for different needs. Here's when to use each.

| Use Case | Data Mesh | Data Fabric |
| --- | --- | --- |
| Best For | Decentralized teams managing their own data | Centralized control of all data sources |
| Ideal For | Complex organizations with multiple domains | Simplifying data from different systems |
| Scale | Scaling data across teams without central control | Managing large data from multiple sources |

Use Cases of Data Mesh

Data Mesh is helpful in situations like:

  • Financial Services: Manages data for customer accounts, trading, and risk.
  • Healthcare: Organizes patient records, claims, and research data.
  • Retail: Connects customer, inventory, and sales data.
  • Slow Data Processes: Speeds things up by giving teams control over their own data.
  • Poor Data Quality: Helps improve data quality in each department.