Are Data Catalogs Actually Catalogs?

We might need to change our way of thinking and stop using the term “catalog” in the data context. Here’s the long explanation, with a surprise at the end:

Not many organizations can say they are happy with the ROI of their investments in the Data & Analytics stack. Why is that?

They still have siloed data across the organization
They have inconsistent analytics
They struggle to locate comprehensive, relevant data

The “debt” these situations create makes every data initiative labor-intensive.

Put that in the context of the enormous growth in global digital data, 90% of which was generated in the past two years alone (!!!). Soon enough, the growth rate will be 200% year over year. Even for the most seasoned data practitioner, this is hard to comprehend.

We will not be able to keep pace with this data flood.

Even today, businesses report:

Incongruent data and metrics: business decision makers cannot see different versions of ACV in their BI dashboards and still trust the insights.
High labor costs: today’s benchmark is that the software license for an analytics tool accounts for less than 10% of the total implementation cost; despite layoffs, job postings for analysts are still spiking, which sends their salaries soaring.
Very slow analytics implementation cycles: imagine asking a question and getting an answer two months later – would you still aspire to make decisions based on data?
Tribal knowledge only increases: when employees leave, their knowledge leaves with them, and the team members who stay on have even less time to document any processes and logic.
Duplicates of data and analytics: how many “temp” and “final” tables and pipelines exist in organizations? How many times do you pay to calculate the answer to the same question?

What is your D&A annual budget? Whatever the number is, according to the problems above, half of it is wasted. At least.

We were taught that to manage assets effectively, we must first map and measure them. The tools intuitively associated with mapping D&A assets are catalogs.

Except this is the very last thing catalogs actually do.

DataOps catalogs help data teams to collaborate.

Governance catalogs are repositories of manual documentation and access policies.

Neither can promise full disclosure of the inventory of D&A assets and their (mis)use from an efficiency standpoint.

So let’s stop calling them “catalogs”. (Manual) Business Glossary, Data Dictionary, Lineage, Access Rights Management: these are better descriptions of what those tools can do for you.
