The Rise of Metadata
My Impressions and Perspectives on the Gartner® Data & Analytics Summit 2022
Contents of the blog series:
- Two Years from Now (Part 1)
- The Rise of Metadata
- Governance is Not a Defensive Play
- Rolling the Sleeves on Metadata/Governance initiatives
In Part 1 of the blog, I discussed the Data Fabric – what it is, who it is for, and the benefits and timelines for its adoption.
In Part 2 of the blog, I will cover Active Metadata and its crucial position at the intersection of business and data.
The Rise of Metadata
Based on my impressions, Metadata was the most frequently used word throughout the conference. During Gareth Herschel and Debra Logan’s keynote [1] “Unleash Innovation, Transform Uncertainty” I understood Metadata to be defined as a communication mechanism: not only does it tell us which data we have, but it also describes meaning.
I further realized that Active Metadata enables implementation of the Data Fabric framework (among other benefits), thus allowing it to identify data drifts, to find new categories of data and to discover interrelated decisions.
Mark Beyer developed this topic further in his session “Metadata is the Key to Self-Learning, Augmented Data Governance” [2].
He defined four sources of metadata: technical, operational, business and social. Looking at these different metadata sources, I realized that the first two types are “passive” – byproducts of data practice – while the last two types are “active” – purposeful creation of metadata and learnings about it.
For me, a way to understand it is to think of passive metadata as a static runtime report: it doesn’t change the way anything behaves. Despite this static nature, most customers focus on the passive type. As proof, illumex has received many requests from prospective customers about data lineage. I think the problem is that lineage doesn’t show whether the data actually works.
For me, a nice metaphor illustrates this: when we think about formulas, we think about math and physics. In data, metadata plays the role of formulas – it is not just data about data, but also the logic of how the data is applied.
My thoughts after Mr. Beyer’s session were:
- All data exists in the world; it is just not captured correctly.
- Automation of functions starts with metadata discovery.
- Metadata is the best method to determine if governance principles are being adhered to.
- And “Data Fabric can resolve existing Metadata into Automated Governance”.
I further gathered that if the same data is used in 20 different places throughout the organization, it forms a community around a specific topic. Leveraging those communities to communicate between business and data teams improves data literacy.
Graphs take on a bigger role as an underlying technology for Active Metadata and other use cases, and they have a decisive role in converting traditional catalogs into augmented catalogs.
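To make the "community" idea more concrete for myself, here is a minimal Python sketch of a usage graph. It is only an illustration under my own assumptions: the consumer and dataset names, the `USAGE` records and the `min_size` threshold are all hypothetical, and a real active metadata platform would harvest such links from query logs or BI tools rather than from a hand-written list.

```python
from collections import defaultdict

# Hypothetical usage records: (consumer, dataset) pairs.
# In practice these would be harvested from query logs or BI tools.
USAGE = [
    ("finance_dashboard", "orders"),
    ("churn_model", "orders"),
    ("marketing_report", "orders"),
    ("churn_model", "support_tickets"),
    ("hr_report", "employees"),
]


def build_usage_graph(usage):
    """Build a simple bipartite graph: dataset -> set of consumers."""
    graph = defaultdict(set)
    for consumer, dataset in usage:
        graph[dataset].add(consumer)
    return dict(graph)


def communities(graph, min_size=2):
    """Treat every dataset with at least `min_size` consumers as a community."""
    return {
        dataset: sorted(consumers)
        for dataset, consumers in graph.items()
        if len(consumers) >= min_size
    }


graph = build_usage_graph(USAGE)
shared = communities(graph)
print(shared)
```

In this toy example only `orders` is shared by several consumers, so those three consumers form the community that business and data teams could be brought together around.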
The session inspired a vision: machines can only improvise, while humans innovate. When metadata is available, we can automate; otherwise, machines need humans.
Later on, I attended Melody Chien’s session, “Maximize Business Outcome By Adopting Modern Data Catalog with AI-enabled Metadata Capabilities” [3].
Augmented metadata discovery, as I understood it, is an automated process of finding the right dataset for your business question, and then connecting Metadata and Data Catalogs. Data Catalogs address the “three Cs” – curate, collaborate, communicate.
Data Catalogs, I learned from this session, are not only a collection of metadata, but they also provide critical metadata about your data:
- Where the data is and where it came from
- What the data means and how it should be interpreted
- What the importance of the data is and its quantifiable value
- Who uses the data and for which business processes
- Which policies and workflows are defined on top of the data
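The list above can be read as the fields of a single catalog record. The following Python sketch is my own illustration, not a schema shown in the session: the `CatalogEntry` class, its field names and the `missing_fields` helper are hypothetical, and only mirror the five questions in the list.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record mirroring the five questions above."""
    name: str
    location: str                            # where the data is
    source: str                              # where it came from
    meaning: str                             # what it means, how to interpret it
    importance: str                          # how important the data is
    estimated_value: Optional[float] = None  # quantifiable value, if known
    consumers: Dict[str, str] = field(default_factory=dict)  # user -> business process
    policies: List[str] = field(default_factory=list)        # policies and workflows

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        empty_values = (None, "", {}, [])
        return [f.name for f in fields(self) if getattr(self, f.name) in empty_values]


entry = CatalogEntry(
    name="orders",
    location="warehouse.sales.orders",
    source="checkout service",
    meaning="One row per confirmed customer order",
    importance="high",
)
gaps = entry.missing_fields()
print(gaps)
```

A helper like `missing_fields` hints at where a static catalog falls short: someone still has to fill in the value, the consumers and the policies by hand.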
I also learned the shortcomings of traditional catalogs: they are static, siloed and too technical.
For me, the most surprising part was the prediction of the evolution from passive traditional data catalogs to Active Metadata Management driven by AI/ML. This, as I understand it, means that we have a feedback loop: from collecting metadata, to leveraging it, to learning from its consumption.
Machine Learning and AI are key to modern Data Catalogs – via insights, engagement and collaboration. In my opinion, this will drive user adoption.
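As a way of picturing that feedback loop, here is a small Python sketch. It is an assumption-laden toy: the `ActiveCatalog` class and its methods are hypothetical, and a plain usage counter stands in for the machine learning that a real product would apply.

```python
from collections import Counter


class ActiveCatalog:
    """Toy feedback loop: collect metadata, leverage it, learn from its use."""

    def __init__(self):
        self.descriptions = {}   # collected metadata: dataset -> description
        self.usage = Counter()   # what was learned from consumption

    def collect(self, dataset, description):
        """Step 1: collect metadata about a dataset."""
        self.descriptions[dataset] = description

    def search(self, term):
        """Step 2: leverage metadata; matching datasets, most used first."""
        hits = [
            dataset
            for dataset, text in self.descriptions.items()
            if term.lower() in text.lower()
        ]
        # Ties on usage are broken alphabetically to keep results stable.
        return sorted(hits, key=lambda d: (-self.usage[d], d))

    def record_use(self, dataset):
        """Step 3: learn from consumption of a dataset."""
        self.usage[dataset] += 1


catalog = ActiveCatalog()
catalog.collect("orders_raw", "Raw customer orders from checkout")
catalog.collect("orders_clean", "Deduplicated customer orders")

before = catalog.search("orders")
catalog.record_use("orders_raw")
catalog.record_use("orders_raw")
after = catalog.search("orders")
print(before, after)
```

The ranking changes once usage is recorded, which is the point of the loop: the catalog's answers depend on how people actually consume the data, not only on what was documented up front.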
Those of us who are used to the lack of user adoption of current data catalogs must be curious how modern data catalogs are going to address that challenge.
The common use cases of catalogs are still around analytics, GRC and data valuation, but those three avenues, in my opinion, can be used in any industry and for any use case – from security to customer success.
For me, the most important takeaway from Ms. Chien’s session was the dramatic acceleration of the speed of delivery of new data assets by metadata analysis – as much as 70%!
Ms. Chien sees how standalone catalogs could live side by side with use-case-specific ones.
For me, Ms. Chien opened the session from a broader Metadata perspective and then dove into the Catalogs context. She further elaborated that catalogs are just one part of the metadata management capabilities, and that active metadata initiatives should not be limited to catalogs only.
To me, Ms. Chien’s session was very constructive and practical, which also showed in the session recommendations.
[1] Gartner, “Gartner Opening Keynote: Unleash Innovation, Transform Uncertainty”, Gartner Data & Analytics Summit, Orlando, Florida, 22-24 August, 2022.
[2] Gartner, “Metadata is the Key to Self-Learning, Augmented Data Governance”, Gartner Data & Analytics Summit, Orlando, Florida, 22-24 August, 2022.
[3] Gartner, “Maximize Business Outcome By Adopting Modern Data Catalog with AI-enabled Metadata Capabilities”, Gartner Data & Analytics Summit, Orlando, Florida, 22-24 August, 2022.
* GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.