Read more
From our Data, Analytics and AI experts
Data mesh is the solution to the long-standing problem of data platform scalability
Data mesh is hot in the world of data platforms and Azure
Data mesh is hot. That is understandable, as data mesh offers the solution to a long-standing problem: the scalability of data platforms. Perhaps data mesh can provide a breakthrough within your organization as well. In this blog, I'll explain the main benefits and challenges of a data mesh architecture.
The term data mesh was introduced by ThoughtWorks consultant Zhamak Dehghani. It describes an architecture in which distributed data products are developed and managed by data engineers and data product owners in domain teams, while a shared infrastructure is used to host, prepare and serve the data. Now that central data teams are acknowledging en masse that they are running into limits, data mesh has become a major trend in the world of data platforms. How did this situation arise?
In the 1980s, the data warehouse emerged: a central data environment to report from. It answered the question of how to get an overall picture of the state of an organization. This centralization, in turn, brought new challenges in terms of technology, knowledge and staffing. They were the logical consequence of an ever-growing data platform: an ever-growing server and an ever-growing team.
A solution to the technical challenge arrived in the 2000s: parallel processing, for example in the Hadoop ecosystem. Before parallel processing became common, the usual answer to performance problems was a larger server: scale up.
With the advent of parallel processing, multiple servers could be deployed to host the data platform: scale out.
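To make the difference between scaling up and scaling out concrete, here is a minimal Python sketch of the scale-out pattern: the data is split into partitions, each partition is aggregated by a separate worker, and the partial results are combined. The function names are hypothetical, and local worker processes stand in for the separate servers that a platform such as Hadoop would use.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Sequence


def split_into_partitions(values: Sequence[int], n_partitions: int) -> List[List[int]]:
    """Split the data round-robin into roughly equal partitions."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    partitions: List[List[int]] = [[] for _ in range(n_partitions)]
    for i, value in enumerate(values):
        partitions[i % n_partitions].append(value)
    return partitions


def partial_sum(partition: List[int]) -> int:
    """Work done independently by one worker on its own partition."""
    return sum(partition)


def scale_out_sum(values: Sequence[int], n_partitions: int, executor: Executor) -> int:
    """Map: aggregate each partition in parallel. Reduce: combine the partial results."""
    partitions = split_into_partitions(values, n_partitions)
    partial_results = executor.map(partial_sum, partitions)
    return sum(partial_results)


if __name__ == "__main__":
    data = list(range(1_000_000))
    # Scale out: add workers instead of buying a bigger machine.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(scale_out_sum(data, n_partitions=4, executor=pool))  # prints 499999500000
```

Because the executor is passed in, the same aggregation logic runs unchanged whether the work is spread over threads, processes or (in a real platform) machines; only the amount of hardware grows.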
But a data warehouse and parallel processing do not solve the scalability problem for knowledge and staffing. A larger data platform still requires a larger central team in which all data engineering knowledge is concentrated: scale up. This is why IT environments are often split vertically, with data engineers and data analysts working in different teams. The disadvantage of this split is that every data product requires several teams to deliver it.
The big advantage of data mesh is that it provides a full-fledged scale-out solution:
The central data team, and the knowledge concentrated in it, is split into domain teams, each with its own expertise. This enables domain teams to deliver optimal business value within their own areas of expertise. With the right standards, tools and knowledge, domain teams can deliver data products themselves and offer them centrally. Domain ownership pays off in several ways:
• The domain team manages data quality and is best placed to monitor and improve it;
• The domain team knows the right definitions and can apply and share them consistently;
• The domain team knows its data users, can serve them well and give them peace of mind.
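As an illustration of this kind of domain ownership, here is a minimal sketch in which a domain team attaches its own quality rules to a data product, and a shared catalog only offers the product centrally if those rules pass. All names (`DataProduct`, `Catalog`, `positive_amounts`, the sales example) are hypothetical; a real platform would use a data catalog service and a data quality framework instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
QualityRule = Callable[[List[Record]], bool]


@dataclass
class DataProduct:
    name: str
    domain: str  # the domain team that owns and maintains the product
    records: List[Record]
    # Quality rules written by the domain team, who know what "correct" means here.
    quality_rules: Dict[str, QualityRule] = field(default_factory=dict)

    def failed_rules(self) -> List[str]:
        """Names of all quality rules that the current records violate."""
        return [name for name, rule in self.quality_rules.items() if not rule(self.records)]


class Catalog:
    """Hypothetical shared catalog where domain teams offer their products centrally."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Only products that pass their own domain's quality rules are offered.
        failures = product.failed_rules()
        if failures:
            raise ValueError(f"{product.name} failed quality rules: {failures}")
        self._products[product.name] = product

    def products_of(self, domain: str) -> List[str]:
        """Let data users discover everything a given domain offers."""
        return sorted(name for name, p in self._products.items() if p.domain == domain)


def positive_amounts(rows: List[Record]) -> bool:
    return all(row["amount"] > 0 for row in rows)


# Example: a hypothetical sales domain guarantees positive order amounts.
orders = DataProduct(
    name="sales_orders",
    domain="sales",
    records=[{"order_id": 1, "amount": 25.0}, {"order_id": 2, "amount": 40.0}],
    quality_rules={"positive_amount": positive_amounts},
)
catalog = Catalog()
catalog.publish(orders)
print(catalog.products_of("sales"))
```

The design choice here is that the rules live with the product and are defined by the domain, while the catalog (the shared infrastructure) only enforces that they pass.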
Every organization must answer some essential questions before implementing a data mesh.
Implementing a data mesh makes sense only if the benefits of decentralization outweigh the investment in setting up the platform and standards. Data mesh is therefore particularly suitable for organizations with multiple divisions and/or an international character.
Data mesh also requires a new role for IT teams, in both support and control. The IT teams must support the domain teams with the platform and appropriate tools. In addition, they must control the domain teams by overseeing the application of uniform standards.
With multiple domain teams each delivering their own data products, good support is needed in three areas: standards for describing data products accessibly, support for modern tooling, and understandable data transformation standards.
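One way to make a description standard concrete is a shared descriptor that every domain team fills in, so data users always find the same information in the same place. The sketch below assumes a hypothetical set of fields (owner, description, columns, tags) and serializes the descriptor to JSON for publication in a catalog; it illustrates the idea rather than prescribing a format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str
    description: str  # plain-language meaning of the column


@dataclass
class DataProductDescriptor:
    name: str
    domain: str
    owner: str        # contact person or team within the domain
    description: str  # plain-language summary for data users
    columns: List[Column] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON so every product is described in the same structure."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example filled in by a sales domain team.
descriptor = DataProductDescriptor(
    name="sales_orders",
    domain="sales",
    owner="sales-data-team",
    description="All confirmed customer orders.",
    columns=[
        Column("order_id", "int", "Unique order number"),
        Column("amount", "decimal", "Order amount in euros"),
    ],
    tags=["sales", "orders"],
)
print(descriptor.to_json())
```

Because every team uses the same dataclass, a missing owner or description is a construction error rather than something discovered later by a confused data user.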
You have probably already asked yourself: how can I maintain control in an environment with multiple independent teams? The answer: standardization and policy. Establishing standards prevents a proliferation of inconsistent code and descriptions. When managing domain teams, firmly delineated policies are needed: it should not be possible to release code or documentation that does not conform to the standards for naming, structure and tagging.
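Such a policy can be enforced automatically, for example as a check in the release pipeline that blocks non-conforming data products. The sketch below assumes a hypothetical naming convention (lowercase snake_case, prefixed with the owning domain), hypothetical mandatory fields and hypothetical mandatory tags; each organization would define its own rules.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical organization-wide standards.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
REQUIRED_FIELDS = ("name", "domain", "description")
REQUIRED_TAGS = {"owner", "classification"}


def policy_violations(product: Dict[str, object]) -> List[str]:
    """Return all standard violations; an empty list means the release may proceed."""
    violations: List[str] = []

    # Structure: mandatory descriptive fields must be present and non-empty.
    for field_name in REQUIRED_FIELDS:
        if not product.get(field_name):
            violations.append(f"missing field: {field_name}")

    # Naming: lowercase snake_case, prefixed with the owning domain.
    name = str(product.get("name", ""))
    domain = str(product.get("domain", ""))
    if not NAME_PATTERN.match(name):
        violations.append(f"name '{name}' is not lowercase snake_case")
    if domain and not name.startswith(f"{domain}_"):
        violations.append(f"name '{name}' must start with '{domain}_'")

    # Tagging: every mandatory tag must be set.
    tags = product.get("tags")
    present = set(tags) if isinstance(tags, dict) else set()
    for tag in sorted(REQUIRED_TAGS - present):
        violations.append(f"missing tag: {tag}")

    return violations


def release_gate(products: Sequence[Dict[str, object]]) -> bool:
    """Report every violation and block the release if there is at least one."""
    ok = True
    for product in products:
        for violation in policy_violations(product):
            print(f"{product.get('name', '<unnamed>')}: {violation}")
            ok = False
    return ok


good = {
    "name": "sales_orders",
    "domain": "sales",
    "description": "Confirmed customer orders",
    "tags": {"owner": "sales-data-team", "classification": "internal"},
}
bad = {"name": "SalesOrders", "domain": "sales", "tags": {"owner": "sales-data-team"}}
print(release_gate([good]))
print(release_gate([good, bad]))
```

Returning a list of violations rather than stopping at the first one gives domain teams a complete picture of what to fix in a single pipeline run.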
Data mesh in the Azure cloud
Data mesh is not a cloud service that you simply switch on or off. It is a combination of the right approach and the right tools. When applying a data mesh architecture optimally in the Azure cloud, the following services deliver maximum value:
Want to learn more about the key benefits and challenges of data mesh within your organization, or get started on it immediately? Connect with our Data & Analytics experts!