Data mapping is such an obvious concept. It’s one of those things that every organization says it wants or needs. Yet so few actually have a proper map of their data. Even with data governance and cataloging at the top of every analyst’s strategy list, most companies have not embarked on this process.
“It’s a ‘next quarter’ thing”
Mapping data is the very first step in almost any metadata management project, whether that is simply bringing on a centralized mapping solution or building a data catalog, a data dictionary, or a business glossary. Having the entire organization’s data mapped and accessible is a massive enabler for everyone from data engineering teams to Business Intelligence & Analytics teams, all the way to business function leaders at the very edge.
If it is such an obvious task, one that enables critical projects and teams across the company to work better, why is significant effort on it still limited to an exclusive club of conglomerates, governments, and big tech?
Data maps come in different formats
Here are the top reasons I see and hear most during my day-to-day.
There are many data mapping solutions out there. To some degree, even business intelligence solutions and query execution interfaces do some level of data mapping, but they are not designed for sustainable schema mapping, nor do they offer the search and navigation capabilities that proper mapping tools do. Data dictionaries, data catalogs, and some data governance solutions use mapping techniques to serve the need they were designed for, and a few of them actually offer some sort of data map.
It is important to note that I view data mapping as a comprehensive and active solution, similar to online maps (like Google Maps and Bing Maps). These maps should zoom out to show where data is located across the organization and in every database (not just a single database). Zooming in should show relationships and clusters of data (not just the most-used database tables), and the full zoom should include usage statistics, documentation, and certification of data elements. The entire map should be easily searchable and designed with business enablement in mind.
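To make the zoom analogy concrete, here is a minimal sketch of how such a map could be modeled. Every name in it (`DataMap`, `zoom_out`, `zoom_in`, `search`, and the sample databases) is hypothetical and chosen only for illustration; it does not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str = ""      # documentation, visible at full zoom
    certified: bool = False    # certification flag, visible at full zoom
    query_count: int = 0       # usage statistic, visible at full zoom


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)
    related_tables: list[str] = field(default_factory=list)  # "db.table" names


@dataclass
class Database:
    name: str
    tables: list[Table] = field(default_factory=list)


@dataclass
class DataMap:
    databases: list[Database] = field(default_factory=list)

    def zoom_out(self) -> dict[str, list[str]]:
        """Org-wide view: which tables live in which database."""
        return {db.name: [t.name for t in db.tables] for db in self.databases}

    def zoom_in(self, db_name: str) -> dict[str, list[str]]:
        """Relationship view for one database: table -> related tables."""
        for db in self.databases:
            if db.name == db_name:
                return {t.name: t.related_tables for t in db.tables}
        return {}

    def search(self, term: str) -> list[str]:
        """Case-insensitive search over column names and descriptions."""
        term = term.lower()
        hits = []
        for db in self.databases:
            for t in db.tables:
                for c in t.columns:
                    if term in c.name.lower() or term in c.description.lower():
                        hits.append(f"{db.name}.{t.name}.{c.name}")
        return hits


# Made-up sample data spanning two databases.
data_map = DataMap([
    Database("sales", [
        Table(
            "orders",
            [Column("customer_id", "Buyer reference", certified=True, query_count=120),
             Column("total", "Order value in USD")],
            related_tables=["crm.customers"],
        ),
    ]),
    Database("crm", [
        Table("customers", [Column("id", "Unique customer key", certified=True)]),
    ]),
])

print(data_map.zoom_out())
print(data_map.zoom_in("sales"))
print(data_map.search("customer"))
```

The point of the sketch is only that the three zoom levels and the search are views over one shared model, which is what separates a map from a flat list of table names.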
As I said, after speaking with dozens of companies every month, I have found they all give the same familiar reasons for why they are only now starting to look at mapping data, or why they are earmarking this as a 2023/4 project:
- “Bandaids hide small wounds” – Younger companies simply don’t think they need this right now, or don’t see enough benefit in mapped data to make it a priority. These companies are typically 5-10 years old, are still on a single, fairly clean database, and have small, agile analytics teams. Because of their smaller size, lack of complexity, and agility, the pains of not having their data mapped are very limited. Those pains can easily be covered with quick and dirty “bandaids”, such as Google Docs and wikis used to map data manually. While this does not solve the long-term challenge, it is “good enough for now” and means these companies can point their limited resources to more critical areas.
- “Surgery is long, expensive and results are questionable” – Slightly more mature companies feel the pain much more. Some of them have the “quick and dirty” solutions in place but have outgrown them, and now need something that can scale, adapt, and self-maintain. Mapping templates based on schema maps with 5-word definitions just don’t cut it. These companies need something to go deep and they need something that is specifically designed for complexity and scale.
The real problem here is cost, not just in terms of budget but in terms of effort and human resources. Most existing solutions are semi-automated data mapping solutions at best. They can extract metadata names and put them in a list for someone to go over and document. These solutions are pricey and require that special “someone” who is supposed to document the data, usually external consultants or solution implementers who lack the specific business context and information to perform the task properly and quickly.
Paying for an army of people to perform this task is what many large companies have done, many with pretty disappointing results. The lesson these companies are learning is that having your data mapped is not the same as having a data mapping solution. If it’s not active and self-maintaining, its value is questionable.
- “Fix everything or nothing at all” – Several companies sound very eager at first to bring on a proper data mapping solution, but as the conversation progresses, so does the long list of expectations they have of it. These companies expect a data mapping solution to solve every data governance need in an automated way and at an affordable cost. They want a solution that solves everything, from mapping to data quality, observability, and downtime prevention, or they want nothing at all.
The interesting thing I found here is that these projects started with a very specific, and usually basic, need: searchable, documented data. But the requirements evolved and evolved until even the basic ones didn’t get answered.
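The semi-automated approach mentioned earlier, where a tool extracts metadata names into a list for someone to document, can be illustrated with a short sketch. It uses Python’s built-in `sqlite3` module against an in-memory database. The table names and the `extract_schema` helper are made up for the example, and real tools would work against many different systems.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> list[dict]:
    """List every column of every user table, with an empty description slot."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, col, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
            rows.append(
                {"table": table, "column": col, "type": col_type, "description": ""}
            )
    return rows


# Hypothetical two-table schema, created only for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

inventory = extract_schema(conn)
for row in inventory:
    print(row)
```

Every `description` comes back empty. The extraction is the cheap, automatable part; filling in the business context is the manual work that drives the cost described above.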
All of the reasons above are valid. You might even be able to point clearly to where you are on this spectrum and explain why this is not something your company has started yet. After all, the need for mapping data is fairly new. It is only with the growing complexity of data landscapes, and the desire to enable the business to use more data, that having a sophisticated data map became so critical. There are lots of solutions today that can handle this, and plenty of startups in the space. My prediction is that data mapping will become table stakes for any modern data architecture over the next 12-24 months. The top solutions will focus on adaptability, automation, scale, insight, and affordability.