2023 Predictions in Data and Analytics

My Predictions on: 

  1. IT and Data to Enable Sustainability
  2. Observability to Power Up Resilience
  3. Transparency to Bring Back Trust

 

My take on 2023 is concise – three significant themes: Sustainability, Resilience, and Trust. 2023 will be marked as the year of “back to basics”: doing more with less and improving existing practices with transparency and collaborative goals. Having said that, an “annual house cleaning” always helps to purge the waste and unleash creativity about what can now be done with more focus.

In 2022 the leading theme for me was Quality: the fast implementation of new processes and use cases after COVID-19 hit put a lot of pressure on the infrastructure, and the “pipes” didn’t deliver well. The result was lots of patch solutions, duplicate and conflicting pipelines, and tons of manual labor and reverse engineering. Discussions on Governance and Data Literacy exploded, all in order to make our infrastructure’s quality trustworthy enough to allow credible decision-making.

2023 is the third year of the pandemic-driven Digital Transformation acceleration. After the initial outpouring of resources into everything Data and Analytics to enable growth, we are now much more mature and able to evaluate its efficiency: the Total Cost of Ownership, the degree of automation, the adaptivity of the solution to the changing reality, and the consolidation of various functionality into fewer tools.

These are my own takes on the focus for Data and Analytics leaders and the respective tech solutions trends in those areas.


IT and Data to Enable Sustainability

It is widely known that the lifespan of S&P companies dropped from 60 to 18 years within a few decades. And that was well before the COVID-19 pandemic and the current economic downturn we face.

It is also proven that data-savvy organizations outpace their less tech-adaptive competitors by far. This ambition to leverage tech to win the market might explain why global spending on digital transformations of business practices, products, and organizations is forecasted to reach $2.8 trillion in 2025.

The bad news is that over half of digital initiatives take too long to complete, and the value is yet to be realized. And most of the data is stored but unused.

Due to such poor business performance, organizations are trading their lavish spending on tools for initiatives in which tools enable Data Literacy, transparency, and observability of the assets and their usage.

We also see a consolidation of the tech stack for interoperability and cost reduction, often resulting in a shift to a single vendor infrastructure. It also becomes more and more evident that organizations aspire to an infrastructure that allows them to organize and access their data and analytics assets, so they can understand and use them with trust.

Moreover, by 2025, 50% of CIOs will have performance metrics tied to the sustainability of the IT organization. 

So, now we finally get the opportunity to push the Sustainability initiative as a part of the whole Business Resilience effort, fueled by observable, adaptive, and cost-efficient tech.

Furthermore, we can share this load with our peers: by 2027, more than 50% of enterprises will use industry cloud platforms to accelerate their business initiatives.

Industry cloud platforms might take some time to ramp up, but the major cloud platforms introduced data marketplaces a while back. The idea is to allow organizations to trade and exchange data, objects, and products. Integration and management become easier if you already use the same cloud platform for your internal data. Industry platforms and marketplaces are poised to enrich this experience with tailored composable solutions. They open up data monetization opportunities and synergies in both generic and industry marketplaces. And, while sharing core organizational data is a hard legal pill to swallow, collaboration around sustainability and climate initiatives could be a great start for data synergies.

Another thing to consider when you shift into a Sustainable, cost-effective future is the TCO (Total Cost of Ownership). The new solution may require a different headcount to maintain it. It might solve only part of the problem, which will require the purchase of further solutions, and the costs may be hard to predict.

To conclude this part, we aspire to make our organizational data stack more sustainable as a part of the overall organizational effort. Ideally, this also improves the overall sustainability of the organizational stakeholders.

Tech Trends to Know:

Let’s be frank: organizations struggle to create, monitor, and govern data pipelines. Data Platforms have spotted that, and they slowly but surely aim to incorporate possible solutions into their offerings. It might be a good idea. But it also creates data silos, since your whole organization does not run on a CDW (cloud data warehouse).

To that point, most organizations still divide data per store based on the analytic use case: ad-hoc questions, reports, and machine-learning models. In the last five years, CDWs grew to process 50% of all analytics data load and will reach 75% in two years. However, it is too early to give up on data lakes and in-memory databases.

We still have to address the “feedback loop,” or Reverse-ETL: insights from data pipelines or analysis are written to an application, to a file, or back to the data source. Since the CDW is still not mature enough to replace operational data sources or data lakes, I recommend against relying on it for this loop. Instead, I advocate a simpler architecture that uses APIs to communicate directly between the analytics and operational apps.
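To make the API-based feedback loop concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `Insight`, `OperationalApp`, `InMemoryCrm`, and `publish` are hypothetical names, the 0.7 threshold is arbitrary, and the in-memory class stands in for whatever real API client your operational app exposes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Insight:
    """An analytics result that an operational app should act on (hypothetical shape)."""
    customer_id: str
    churn_risk: float  # score between 0.0 and 1.0 produced by an analytics job


class OperationalApp(Protocol):
    """The contract the analytics side depends on, instead of a shared warehouse table."""
    def apply_insight(self, insight: Insight) -> None: ...


class InMemoryCrm:
    """Stand-in for an operational app's API client; a real one would make a network call."""

    def __init__(self) -> None:
        self.flags: dict[str, str] = {}

    def apply_insight(self, insight: Insight) -> None:
        # Assumed business rule: flag customers at or above a 0.7 churn risk.
        self.flags[insight.customer_id] = "at_risk" if insight.churn_risk >= 0.7 else "ok"


def publish(insights: list[Insight], app: OperationalApp) -> int:
    """Push each insight straight to the operational app and return how many were sent."""
    for insight in insights:
        app.apply_insight(insight)
    return len(insights)


crm = InMemoryCrm()
sent = publish([Insight("c-1", 0.91), Insight("c-2", 0.12)], crm)
```

The analytics code only knows the `OperationalApp` contract, so the operational side can change its storage without a new pipeline being written back through the warehouse.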

I will elaborate on the Mapping and Governance of the data infrastructure in the next section.

Over 20% of the venture capital raised in 2021 by startup companies, over $60B, went to startups with data-related products. With regard to Sustainability and cost-efficiency, you can find better TCO, automation, and ROI among the smaller startups and companies. The data solutions market is booming with a multitude of products in various price ranges from many different vendors.

 

Observability to Power Up Resilience

What would you give to decrease your system downtime by 80%?

If your organization invests in building a digital immune system, it might gain just that in a mere couple of years.

So, what does it take to become digitally immune? Just mix observability with augmented testing, monitoring, and automated remediation, add security and risk management, and here you are – digitally resilient and adaptive!

I want to specifically focus on Observability.

By 2026, 70% of organizations that successfully applied observability will achieve shorter latency for decision-making.

This was exactly what I covered in the previous section – the implementation of highly automated tools to enable fast and confident decision-making.

Observability in the data and analytics context is mapping data and analytics assets, business processes, and applications – it is called “metadata.” It is about understanding various assets’ meanings (semantics) and context (relations). Monitoring patterns of the context and usage can be the basis for activating metadata into automated decision loops.

Both passive and active metadata give you transparency and an understanding of which data and analytics you have and how they are used. This is the basis for eliminating waste: unused and stale data, broken pipelines, and conflicting metrics (more in the next theme).
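As a rough illustration of how usage metadata can drive waste elimination, the Python sketch below flags unused and stale assets. The `AssetMetadata` fields, the `find_waste` function, and the 90-day window are all assumptions made for the example, not a description of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class AssetMetadata:
    """Passive metadata about one data asset (hypothetical fields)."""
    name: str
    last_updated: date
    last_queried: Optional[date]  # None means the asset was never queried


def find_waste(assets: list[AssetMetadata], today: date, max_age_days: int = 90) -> dict[str, list[str]]:
    """Split assets into 'unused' (nobody reads them) and 'stale' (read, but not refreshed)."""
    cutoff = today - timedelta(days=max_age_days)
    report: dict[str, list[str]] = {"unused": [], "stale": []}
    for asset in assets:
        if asset.last_queried is None or asset.last_queried < cutoff:
            report["unused"].append(asset.name)
        elif asset.last_updated < cutoff:
            # Still queried, but the data behind it has not been refreshed recently.
            report["stale"].append(asset.name)
    return report


catalog = [
    AssetMetadata("orders", date(2022, 12, 20), date(2022, 12, 30)),
    AssetMetadata("legacy_export", date(2022, 12, 1), None),
    AssetMetadata("regional_targets", date(2022, 1, 1), date(2022, 12, 15)),
]
report = find_waste(catalog, today=date(2023, 1, 1))
```

Feeding a report like this into an automated archive or alert step is one way passive metadata could become "active."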

Until now, the most progressive organizations have created data contracts for specific assets to understand their lineage, usage, applicability, and impact. With metadata activation, we can aspire to map all our assets automatically. In that reality, we come closer to monetizing data assets (as discussed in the previous theme).

 

Tech Trends to Know:

The tools to address these challenges and provide the solutions to Observability are currently spread out between different categories, such as Data Lineage, Data Observability, Catalog, Active Metadata Management Platforms and Metric Stores.

From a wider perspective, these tools are the convergence of Data Mesh and Data Fabric. The Data Mesh’s objective is to create composable data products at scale. The Data Fabric’s objective is to map all data and analytics assets wherever they are and govern them in order to ensure uniqueness, a lack of conflicts, and a single source of truth. Their combination gives us the launchpad for data products with built-in Discovery, Observability, and Governance, and with built-in collaboration vehicles between data producers and consumers.

I see FinOps as another characteristic of Observability: being able to attribute the data asset usage to a business process or application and their respective infrastructure costs. FinOps tools are introduced as a part of data management platforms, Application Monitoring systems, and lately, they have gained popularity in the Active Metadata Management platforms roadmap.
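The attribution idea can be sketched in a few lines of Python. This is a simplified illustration, assuming each asset has a known monthly infrastructure cost and that query counts are a fair proxy for usage; the function name, the event shape, and the numbers are hypothetical.

```python
from collections import defaultdict


def attribute_costs(
    usage_events: list[tuple[str, str, int]],
    asset_monthly_cost: dict[str, float],
) -> dict[str, float]:
    """Split each asset's cost across business processes in proportion to their query counts.

    usage_events holds (asset, business_process, query_count) tuples.
    """
    # First pass: total queries per asset, used as the denominator.
    queries_per_asset: dict[str, int] = defaultdict(int)
    for asset, _process, count in usage_events:
        queries_per_asset[asset] += count

    # Second pass: give each process its proportional share of the asset's cost.
    cost_per_process: dict[str, float] = defaultdict(float)
    for asset, process, count in usage_events:
        total = queries_per_asset[asset]
        if total:
            cost_per_process[process] += asset_monthly_cost[asset] * count / total
    return dict(cost_per_process)


events = [
    ("orders", "billing", 30),
    ("orders", "marketing", 10),
    ("clickstream", "marketing", 5),
]
costs = attribute_costs(events, {"orders": 400.0, "clickstream": 100.0})
```

Here billing carries three quarters of the `orders` cost, while marketing carries the remaining quarter plus all of `clickstream`.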

 

Transparency to Bring Back Trust

AI and Machine Learning struggle to get traction within organizations, partly because consumers do not trust what they do not understand.

In recent years we talked about Explainable AI, and now we call it Governed AI/ML.

This new name also reflects the shift of definition of what it takes to trust AI/ML: risk, privacy, security management, data reliability, infrastructure, and explainability.

When you read the list above, you see its similarity to the overall Data & Analytics practice. The failure to productize and then to realize the business value jeopardizes trust among business stakeholders and makes it harder to attract further organizational budgets.

Still, no one argues that automated AI-powered data-driven decision-making is the future and has a huge competitive advantage.

The possible solution lies in people-oriented Governance and Data Literacy initiatives.

Understanding and interpreting data and analytics requires Data Literacy. 80% of our current business workforce does not feel confident using data.

If Observability allows us to see and understand the context, Data Literacy will translate for us what we see and how we can act upon it.

Now to Governance: historically, we treated Governance as a defensive mechanism to grant users access to data and analytics on a “need-to-know basis.” Times have changed, and now most employees in business roles aspire to make data-driven decisions but are unable to. Governance’s new role is to enable facilitated self-service for Data-literate personnel.

For this new “offensive” play, we need Governance applied for all data products and services, including ML models.

 

Tech Trends to Know:

I’m biased, but my answer to this plethora of questions is to Govern the organizational logic and data assets usage in a “stand-alone Semantic Layer”.

Here are my reasons:

  • If you document your pipelines, transformations, metrics, and questions in a BI tool, an ETL tool, or a data store, you are creating yet another siloed, fragmented picture of your organization, where different roles only have one perspective of reality;
  • If you want to have end-to-end observability as to where a metric originates, by which calculation, on which data, and which version of the metric is used and by whom – you must have it on top of or outside the CDW/ETL/BI pipeline;
  • If you want to leverage data assets Observability and Governance for Risk and Security, they should be available for those workflows or via interfaces in a way that wouldn’t require another “Ops” to run it;
  • Metric stores/Feature stores/Business Glossaries/Views duplicate the effort of Governance and Observability over the same type of assets – why not have one platform to serve all those scenarios on top of unique semantic structure natively connected to your data?
  • Last but not least: do you really want your critical business metrics to be documented on the same platform as the one your data resides in?
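To show what governing metrics outside the pipeline might look like, here is a small Python sketch of a stand-alone metric registry. It is an assumption-laden illustration rather than any product's data model: `Metric`, `MetricVersion`, and their fields are hypothetical, and a real semantic layer would persist this and connect it to the actual data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricVersion:
    """One immutable definition of a metric: its calculation and the data it reads."""
    version: int
    expression: str
    source_tables: tuple[str, ...]


@dataclass
class Metric:
    """A business metric tracked independently of any single CDW, ETL, or BI tool."""
    name: str
    owner: str
    versions: list[MetricVersion] = field(default_factory=list)
    consumers: dict[str, int] = field(default_factory=dict)  # consumer name -> version in use

    def publish(self, expression: str, source_tables: list[str]) -> MetricVersion:
        """Record a new version instead of overwriting the previous definition."""
        new_version = MetricVersion(len(self.versions) + 1, expression, tuple(source_tables))
        self.versions.append(new_version)
        return new_version

    def register_consumer(self, consumer: str, version: int) -> None:
        """Note which report, model, or team relies on which version."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version} for metric {self.name}")
        self.consumers[consumer] = version

    def outdated_consumers(self) -> list[str]:
        """List consumers still using an older definition than the latest one."""
        latest = len(self.versions)
        return sorted(name for name, used in self.consumers.items() if used < latest)


net_revenue = Metric(name="net_revenue", owner="finance")
net_revenue.publish("SUM(amount)", ["orders"])
net_revenue.register_consumer("sales_dashboard", 1)
net_revenue.publish("SUM(amount) - SUM(refund_amount)", ["orders", "refunds"])
net_revenue.register_consumer("board_report", 2)
```

With the definitions, sources, and consumers in one place, a question like "who is still reporting on the old calculation?" becomes a single call to `outdated_consumers()`.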

 

Conclusion:

We discussed costs, efficiency, transparency, and better business alignment. But what it really boils down to is Sustainability, Resilience, and Trust: being able to thrive through the tough periods with digital transformation; being able to recover quickly after inevitable failures, learn from them, and become smarter; and building a collaborative support system around data for people.

 

Have a Happy and Healthy 2023!
