The Role of Semantic Layer in Generative AI Productization

The Promise of Generative AI in Enterprise

ChatGPT is on everyone’s lips today: almost everyone has used it for personal or business needs. The rapid adoption of this new technology has left many data practitioners feeling overwhelmed, as if nursing a hangover. This is especially true when they consider the far-reaching consequences of rolling out generative AI to a broad range of business users within enterprises.

Generative AI’s business potential is hard to overstate: McKinsey estimates its potential value at over four trillion USD annually. Generative AI is also estimated to be 40% more effective than traditional AI. This is why we’re seeing inflated valuations for generative AI startups and growing investment, as organizations plan to turn this potential into a revenue driver.

The Risks of Generative AI in Enterprise

In the enterprise context, generative AI finds its most common applications in “go-to-market” functions such as sales, marketing, and customer success. It is also making inroads into traditional sectors like supply chains and pharmaceuticals. However, this is not without risks and challenges: about 35% of attendees at the San Diego Data Quality and Governance conference said their companies have banned the use of third-party generative AI services entirely. An additional 25% are in the process of crafting strategies or policies concerning the use of generative AI.

Typically, enterprises face challenges in these categories:

  • Security Risks: Malicious software could mimic users and exploit systems. This risk isn’t confined to third-party services; even internally implemented generative AI could present vulnerabilities if not properly secured.
  • Data Access and Manipulation: It’s not just about direct access to data. Intruders can also pose derivative questions to gain unauthorized insights, bypassing conventional access controls.
  • Inherent Bias: General-purpose language models are trained on publicly available data like Wikipedia and Twitter. Therefore, even if your internal data is unbiased, the model itself can introduce biases into your operations.
  • Data Leakage: There’s also the risk of exposing proprietary enterprise data during the training process with third-party services.

These challenges are often the main barriers that keep organizations from adopting generative AI tools in their day-to-day work and from giving those tools access to organizational data.

What is Semantic AI

This is where Semantic AI comes into play. The idea is simple: Semantic AI aims to bridge the gap between business and data language. It is a kind of “Babel Fish” that sits in your ear and translates business logic into data elements and vice versa. This enables more effective decision-making by providing data insights that are meaningful to business users. Semantic AI, according to Gartner, is a blend of graph networks and large language models that creates an effective semantic layer over data assets.

Two key components form the basis of a semantic layer, derived largely from Metadata:

  • Language models: Table and column names, metrics, and dashboards contain lots of semantic meaning that can be harnessed.
  • Graphs: These help to understand the context in which data is used, allowing for more targeted and effective queries.

Semantic AI is also referred to as a Semantic Layer.

The power of a Semantic Layer lies in its ability to unify various siloed components within an organization under standardized business terms and metrics. This helps in clearly defining concepts like “lost revenue,” “VIP customers,” or “Net Revenue Retention (NRR).”

The Functionalities of Semantic Layer

The growing diversity of companies’ data stacks makes it crucial to implement a Semantic Layer before any new investment or addition of a tool in this category. Organizations now handle multiple data sources, both internal and external, and these are often processed by various applications for different analytical needs. This growing diversity also spans from on-premise to cloud-based to multi-cloud, industry data stores, and public and private sources.

In such a landscape it is essential that the Semantic Layer is stand-alone. It should be separate from your data sources, ETL tools, and applications. This enables it to act as a unified interpretive layer that can handle and make sense of the data from every corner of your stack, breaking down silos and facilitating better analytics.

A stand-alone Semantic Layer provides several functionalities that help scale analytics and, in particular, prepare organizations for the implementation of generative AI tools:

  • Inventory Mapping: Semantic Layer maps all your existing data assets in one place, much like having an inventory of ingredients in your kitchen. This offers a roadmap for your analyses and queries, saving you the time you’d spend hunting down this information across divisions and systems.
  • Self-Service Discovery: Semantic layers allow for easy, self-service discovery of assets. You can understand what’s available and how to use it, right from your usual interface, whether that’s Slack, a BI tool, or something else.
  • Governance and Observability: This is where semantic layers really shine. They can detect conflicts in definitions or duplicate meanings across different systems. For example, if ‘Lost Customer’ is defined differently in your finance and sales systems, a semantic layer will alert you to the inconsistency and could even suggest fixes.
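
The conflict detection described in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DEFINITIONS` list, the system names, and the definition texts are all hypothetical, and a real semantic layer would compare meanings rather than normalized strings.

```python
from collections import defaultdict

# Hypothetical metadata: (system, business term, definition) triples.
DEFINITIONS = [
    ("finance", "Lost Customer", "no paid invoice in the last 90 days"),
    ("sales", "Lost Customer", "opportunity marked closed-lost"),
    ("finance", "VIP Customer", "annual revenue above 100k USD"),
    ("sales", "VIP Customer", "Annual revenue above 100k USD "),
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(text.lower().split())


def find_conflicts(definitions):
    """Return {term: {system: definition}} for terms whose definitions disagree."""
    by_term = defaultdict(dict)
    for system, term, definition in definitions:
        by_term[normalize(term)][system] = definition

    conflicts = {}
    for term, per_system in by_term.items():
        distinct = {normalize(d) for d in per_system.values()}
        if len(distinct) > 1:  # more than one meaning for the same term
            conflicts[term] = per_system
    return conflicts


conflicts = find_conflicts(DEFINITIONS)
for term, per_system in conflicts.items():
    print(f"Conflicting definitions for '{term}':")
    for system, definition in sorted(per_system.items()):
        print(f"  {system}: {definition}")
```

In this toy data only “Lost Customer” is flagged, because the two “VIP Customer” definitions differ only in capitalization and whitespace.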

All three of these functionalities still pose challenges in most organizations, where data quality, governance, and discovery remain a work in progress. Decision-making based on data analytics is still not as pervasive as one would expect, despite advancements in data dashboards and reporting tools. This is partly due to infrastructure limitations and the absence of comprehensive governance mechanisms. And yet, Chief Data Officers are under pressure to produce as many data products as possible. Now, with the hype around ChatGPT, there are new expectations (especially from executives) to create an experience similar to conversational AI models but tailored for enterprise use.

The Role of Semantic Layer for Organizational Quality and Governance

The real challenge lies in data governance. For example, when someone asks, “How are my sales in Germany doing?” a generative AI tool could potentially pull data from multiple sources like Salesforce or finance databases. Without proper governance mechanisms, there’s no way to ensure that the data is consistent, accurate, or even relevant. So, while generative AI has potential, its practical, large-scale adoption is currently hindered by these infrastructural and governance challenges. But how do you apply governance at scale, for every question asked via chat?

Here the role of a Semantic Layer becomes crucial. It serves as a “person in the middle” that understands both the query and the data. First, it interprets all existing assets. Second, when a query contains an ambiguity, such as which “country of sales” is meant, the Semantic Layer can prompt the user for clarification. Third, it supports intelligent questioning: if you’re new to BI tools, for example, it can suggest questions commonly asked by people in your position or industry. Governance and interpretation also depend on clear, consistent naming conventions across the different divisions or “data zones” in the organization, and a well-defined Semantic Layer helps in this regard. Because this must happen for every question, automating the process is the only way to save valuable time.
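
To make the clarification step concrete, here is a minimal Python sketch. It assumes a hypothetical `GLOSSARY` that maps business terms to candidate columns; the table and column names are invented for illustration and do not come from any real system. A term with exactly one candidate resolves silently, while a term with several candidates produces a question for the user.

```python
# Hypothetical glossary: business term -> candidate physical columns.
GLOSSARY = {
    "sales": ["finance.invoices.amount"],
    "country": [
        "crm.accounts.billing_country",
        "crm.accounts.shipping_country",
    ],
}


def resolve(terms, glossary, choices=None):
    """Map terms to columns; return (resolved, questions for the user)."""
    choices = choices or {}
    resolved, questions = {}, []
    for term in terms:
        candidates = glossary.get(term, [])
        if choices.get(term) in candidates:
            # The user already answered a clarification for this term.
            resolved[term] = choices[term]
        elif len(candidates) == 1:
            resolved[term] = candidates[0]
        elif not candidates:
            questions.append(f"I have no data mapped to '{term}'. Can you rephrase?")
        else:
            options = ", ".join(candidates)
            questions.append(f"Which '{term}' do you mean: {options}?")
    return resolved, questions


# First pass: "country" is ambiguous, so the layer asks instead of guessing.
resolved, questions = resolve(["sales", "country"], GLOSSARY)
print(questions)

# Second pass: the user's answer is fed back in and the query is fully resolved.
resolved, questions = resolve(
    ["sales", "country"],
    GLOSSARY,
    choices={"country": "crm.accounts.billing_country"},
)
print(resolved)
```

The design choice worth noting is that the function never picks among several candidates on its own: guessing is exactly what leads to inconsistent answers.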

But can you start right away by integrating a Semantic Layer into your chat experience? There is a prerequisite: the initial step in getting the organization ready for generative AI tools is ensuring that the data is well-prepared and meaningful. LLMs can query deprecated, old, and test data and generate misleading answers from it. Semantic AI, with its metadata activation component, is well suited to alerting on such quality failures. Manual mapping approaches cannot prepare your data for LLMs because they do not scale to cover all of it.
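
A simple version of this preparation step can be sketched as a metadata filter that decides which assets an LLM is allowed to see. Everything here is an assumption made for illustration: the `Asset` fields, the `test_`/`tmp_` naming prefixes, and the 180-day idle threshold are hypothetical, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str            # e.g. "schema.table"
    deprecated: bool     # flagged as deprecated in the catalog
    last_queried: date   # last time anyone used the asset


def eligible_assets(assets, today, max_idle_days=180,
                    test_prefixes=("test_", "tmp_")):
    """Split assets into those safe to expose and those rejected, with reasons."""
    cutoff = today - timedelta(days=max_idle_days)
    kept, rejected = [], []
    for asset in assets:
        table = asset.name.split(".")[-1]
        if asset.deprecated:
            reason = "deprecated"
        elif table.startswith(test_prefixes):
            reason = "test data"
        elif asset.last_queried < cutoff:
            reason = "stale"
        else:
            kept.append(asset)
            continue
        rejected.append((asset.name, reason))
    return kept, rejected


assets = [
    Asset("sales.orders", False, date(2023, 12, 20)),
    Asset("sales.orders_v1", True, date(2023, 12, 1)),
    Asset("sandbox.test_orders", False, date(2023, 12, 30)),
    Asset("finance.old_ledger", False, date(2022, 1, 1)),
]
kept, rejected = eligible_assets(assets, today=date(2024, 1, 1))
print([a.name for a in kept])
for name, reason in rejected:
    print(f"excluded {name}: {reason}")
```

The rejected list doubles as the alert: each excluded asset comes with the reason it failed the quality check.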

Can Your Semantic Layer Speak Your Organization’s Language?

Context is essential to the querying process because it lets LLMs reach more accurate results. Supplying that context at query time is called RAG (Retrieval Augmented Generation). While you can embed some data directly into the query, keep in mind that the context LLMs can currently handle is limited to roughly the size of a 50-page book. For smaller, more specific queries or data sets, embedding data directly into the query can indeed give better results. However, for larger, more complex data sets, especially enterprise systems, the best approach is a Custom LM trained on a domain-specific corpus; just training a foundational LLM on your own data is not enough. Such Custom LMs could be the vehicle for automated governance applications in the chat experience.
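
The retrieval half of RAG, and the effect of the context limit, can be illustrated with a toy Python sketch. It is deliberately simplified: word overlap stands in for the embedding-based similarity search a real system would use, the budget is counted in words rather than tokens, and the metadata snippets in `DOCUMENTS` are hypothetical.

```python
import re

# Hypothetical metadata snippets a semantic layer might expose.
DOCUMENTS = [
    "Table finance.invoices holds invoiced sales amounts per country and month.",
    "Metric net_sales is the sum of invoice amount minus refunds.",
    "Table hr.employees lists staff and their office locations.",
]


def tokenize(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, budget_words):
    """Pick the most relevant snippets that fit within the context budget."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    selected, used = [], 0
    for doc in ranked:
        if not q & tokenize(doc):
            continue  # no shared words: treat as irrelevant
        size = len(doc.split())
        if used + size > budget_words:
            continue  # would overflow the context budget
        selected.append(doc)
        used += size
    return selected


def build_prompt(question, context):
    """Assemble the text that would be sent to the LLM."""
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"- {doc}" for doc in context]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


question = "How are my sales in Germany doing?"
context = retrieve(question, DOCUMENTS, budget_words=25)
print(build_prompt(question, context))
```

Shrinking `budget_words` shows the trade-off described above: with a smaller budget fewer snippets fit, so the quality of ranking matters more as the data estate grows.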

While organizations want to take advantage of LLMs for their own business needs, most are still wary of the risks and challenges. These can be mitigated by a well-constructed workflow built around a Semantic Layer. The Semantic Layer supports governance and observability needs, translates between data and business language, and acts as a go-between for the query and the data, clarifying the ambiguities that can cause LLM hallucinations.
