Semantic Schema
The real challenge of adding agentic capabilities to a product like Kubling is that metadata, and its semantics, change constantly.
This makes it difficult to create a general-purpose agent (or LLM) that can understand the meaning of your evolving schemas without fine-tuning or enhancing existing models.
While such approaches are possible, not all organizations have the budget or expertise to maintain that kind of infrastructure.
Our approach is to simplify the process by generating minimal, targeted contexts containing just enough information for an LLM to reason effectively, whether it’s answering questions, performing complex investigations, or mutating data.
To keep things simple, objects in your schema (such as tables and fields) can be annotated directly in the DDL. These annotations are used to build a simplified semantic structure that captures the information most relevant to an LLM.
Model Metadata
The engine transforms the semantic schema into model metadata and sends it to the specified implementation during instance registration.
This metadata is used, at least in our implementation, to help extract meaning from questions and prompts, enabling the agent to reason effectively based on your domain-specific context.
Model Metadata with Semantic Schemas (JSON/YAML Schema)
type: "object"
id: "schema:kubling:agent:model:metadata:VDBMetadata"
properties:
  name:
    type: "string"
  dataSources:
    type: "array"
    items:
      type: "object"
      id: "schema:kubling:agent:model:metadata:DataSourceMetadata"
      properties:
        name:
          type: "string"
        description:
          type: "string"
        tables:
          type: "array"
          items:
            type: "object"
            id: "schema:kubling:agent:model:metadata:TableMetadata"
            properties:
              name:
                type: "string"
              description:
                type: "string"
              fields:
                type: "array"
                items:
                  type: "object"
                  id: "schema:kubling:agent:model:metadata:FieldMetadata"
                  properties:
                    name:
                      type: "string"
                    description:
                      type: "string"
                    type:
                      type: "string"
                    pk:
                      type: "boolean"
                    properties:
                      type: "object"
                      additionalProperties:
                        type: "object"
                        id: "schema:kubling:InnerObject"
                        properties:
                          type: "object"
                          additionalProperties:
                            type: "object"
                            $ref: "schema:kubling:InnerObject"
              uniqueKeys:
                type: "array"
                items:
                  type: "object"
                  id: "schema:kubling:agent:model:metadata:KeyRecord"
                  properties:
                    name:
                      type: "string"
                    columns:
                      type: "array"
                      items:
                        type: "string"
              indexes:
                type: "array"
                items:
                  type: "object"
                  $ref: "schema:kubling:agent:model:metadata:KeyRecord"
              foreignKeys:
                type: "array"
                items:
                  type: "object"
                  $ref: "schema:kubling:agent:model:metadata:KeyRecord"
              relationships:
                type: "object"
                additionalProperties:
                  type: "array"
                  items:
                    type: "string"
              properties:
                type: "object"
                additionalProperties:
                  type: "object"
                  $ref: "schema:kubling:InnerObject"
        properties:
          type: "object"
          additionalProperties:
            type: "object"
            $ref: "schema:kubling:InnerObject"
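To make the shape of this metadata more concrete, here is a minimal Python sketch of a document that follows the schema above, together with a small helper that walks it. The VDB name, the `k8s` data source, the field list, and the `describe_tables` helper are all hypothetical and chosen for illustration; this is not output captured from the engine.

```python
import json

# Hypothetical sample following the VDBMetadata shape.
# All names and descriptions below are illustrative assumptions.
vdb_metadata = {
    "name": "ExampleVDB",
    "dataSources": [
        {
            "name": "k8s",
            "description": "Kubernetes cluster resources.",
            "tables": [
                {
                    "name": "POD_CONTAINER",
                    "description": "Represents a container running inside a Kubernetes Pod.",
                    "fields": [
                        {"name": "name", "description": "Container name.",
                         "type": "string", "pk": True},
                        {"name": "image", "description": "Container image reference.",
                         "type": "string", "pk": False},
                    ],
                    "uniqueKeys": [],
                    "indexes": [],
                    "foreignKeys": [],
                    "relationships": {},
                }
            ],
        }
    ],
}


def describe_tables(metadata):
    """Return (data source, table, description) triples found in the metadata."""
    return [
        (ds["name"], table["name"], table["description"])
        for ds in metadata["dataSources"]
        for table in ds["tables"]
    ]


if __name__ == "__main__":
    print(json.dumps(describe_tables(vdb_metadata), indent=2))
```

An agent implementation receiving such a document could use the table and field descriptions as the raw material for its prompt context.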
General Annotations
General annotations are the foundational layer of a Semantic Schema. They provide meaning to:
- Tables
- Fields
- Functions
These annotations help LLMs understand the structure and purpose of your data, enabling more accurate and relevant responses.
Annotating Tables
When annotating tables, the goal is to describe their general meaning—what the table represents within your schema.
Keep in mind that many LLMs already have a baseline understanding of common entities. For example, if you have a table named USER, it's usually unnecessary to over-explain it, as concepts like "user," "email," or "password" are universally known.
Example:
CREATE FOREIGN TABLE POD_CONTAINER
(
...
)
OPTIONS(...,
ANNOTATION 'Represents a container running inside a Kubernetes Pod.');
Best Practices for Table Annotations
- Ensure relevance for vector search
  During inference, only the most relevant metadata is retrieved and included in the prompt context. If a table isn't clearly described, it may be ignored or incorrectly prioritized.
  → Make sure annotations are concise, meaningful, and representative of real-world use cases.
- Balance context with assumed knowledge
  Don't over-explain concepts that are likely already understood by the model.
  → For example, instead of writing:
  "This table contains the user account data such as email, password hash, and user role..."
  prefer:
  "Represents an application user."
- Use domain-specific hints
  LLMs perform better when annotations help tie data to your specific domain.
  → For example:
  CREATE FOREIGN TABLE INCIDENT_LOG
  (
  ...
  )
  OPTIONS(...,
  ANNOTATION 'Captures reported incidents within the incident management system, including timestamps and severity levels.');
- Avoid noise
  Avoid adding technical or structural information (e.g., field types, indexes) in annotations unless it directly affects semantics. This type of metadata is already generated automatically by the engine, so there's no need to repeat it.
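To illustrate why annotation wording matters for retrieval, the following Python sketch ranks table annotations against a question. It is a toy stand-in: it uses bag-of-words cosine similarity instead of real embeddings, and it does not reflect how the Kubling agent actually performs vector search. The `rank_tables` helper and the sample question are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_tables(question, annotations, top_k=1):
    """Return the names of the top_k tables whose annotation best matches the question."""
    q = tokens(question)
    scored = sorted(
        annotations.items(),
        key=lambda item: cosine(q, tokens(item[1])),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]


# Annotations taken from the examples in this section.
annotations = {
    "POD_CONTAINER": "Represents a container running inside a Kubernetes Pod.",
    "INCIDENT_LOG": "Captures reported incidents within the incident management "
                    "system, including timestamps and severity levels.",
}

if __name__ == "__main__":
    print(rank_tables("Which incidents were reported with high severity?", annotations))
```

Even in this simplified form, a table whose annotation shares no vocabulary with the question scores zero and would not be selected for the prompt context, which is the failure mode the first best practice warns about.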
Relationship Annotations
Kubling supports a set of relationship annotations that allow you to define semantic links between objects in your schema. These annotations help the LLM understand how different components relate to each other, enabling it to reason more effectively when performing investigations, answering questions, or establishing cause-effect chains.
By annotating these relationships, you're essentially providing a simplified, declarative graph of interactions within your domain, something that would otherwise require extensive training.
Supported Relationship Annotations
Kubling currently supports the following annotations:
- relationships: General associations between this object and others. Use when the direction or type of relationship is not essential.
- relationship_affects: Indicates which objects this one can influence or change.
- relationship_affected_by: Indicates which objects can influence or change this one.
- relationship_generates: Denotes outputs or artifacts generated by this object.
- relationship_caused_by: Used to express root-cause-like relationships, especially in diagnostic or event-driven systems.
Example
CREATE FOREIGN TABLE HORIZONTAL_POD_AUTOSCALER
(
...
)
OPTIONS(
...
ANNOTATION 'Represents a Kubernetes HorizontalPodAutoscaler (HPA), which automatically scales the number of pod replicas based on resource metrics.',
relationship_affects 'DEPLOYMENT, REPLICASET, STATEFULSET',
relationship_affected_by 'METRICS_SERVER'
);
In this example:
- The HORIZONTAL_POD_AUTOSCALER affects deployments, replicasets, and statefulsets, since it scales them.
- It is affected by the METRICS_SERVER, which provides the metrics it uses to make scaling decisions.
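As a rough illustration of how these option values could map onto the `relationships` entry of the model metadata (an object whose values are arrays of strings), here is a Python sketch that splits the comma-separated lists. The `parse_relationships` helper is hypothetical, and using the option name itself as the map key is an assumption; the engine may use different keys.

```python
# Relationship option names listed in this section.
RELATIONSHIP_OPTIONS = (
    "relationships",
    "relationship_affects",
    "relationship_affected_by",
    "relationship_generates",
    "relationship_caused_by",
)


def parse_relationships(options):
    """Turn comma-separated relationship options into a map of name lists.

    Assumption: the option name is reused as the key of the resulting map.
    """
    result = {}
    for key in RELATIONSHIP_OPTIONS:
        raw = options.get(key)
        if raw:
            # Split on commas and drop surrounding whitespace and empty entries.
            names = [name.strip() for name in raw.split(",") if name.strip()]
            if names:
                result[key] = names
    return result


# Option values copied from the HORIZONTAL_POD_AUTOSCALER example.
hpa_options = {
    "relationship_affects": "DEPLOYMENT, REPLICASET, STATEFULSET",
    "relationship_affected_by": "METRICS_SERVER",
}

if __name__ == "__main__":
    print(parse_relationships(hpa_options))
```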
Best Practices
- Keep it simple
  Use concise names for related entities, matching your schema object names, so the relationships are easy to resolve.
- Be directional when needed
  Use relationship_affects and relationship_affected_by when causality or flow matters. Use relationships when direction is irrelevant or bi-directional.
- Duplicate intentionally
  Even if two objects are connected, each should declare the relationship on its own side.
  → For example, if A affects B, then B should also declare it is affected_by A.
  This ensures both objects are properly represented in the vector space and improves retrieval accuracy during reasoning.
- Align with your schema naming
  Make sure the referenced entities match actual table or object names used in your schema for consistent mapping.

Relationship annotations play a crucial role in enabling powerful multi-hop reasoning and investigation capabilities within the Kubling Agent Platform.
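The "duplicate intentionally" and "align with your schema naming" practices lend themselves to a mechanical check. The Python sketch below is one possible way to do that outside the engine; the `check_relationships` helper and the in-memory representation (table name mapped to its relationship options) are assumptions for illustration, not a Kubling feature. Only the affects/affected_by pair is treated as reciprocal, since that is the only pairing this page describes.

```python
# Only this pair is documented as reciprocal on this page.
INVERSE = {
    "relationship_affects": "relationship_affected_by",
    "relationship_affected_by": "relationship_affects",
}


def check_relationships(schema):
    """Report unknown targets and missing reciprocal declarations.

    schema maps a table name to {relationship option: [target table names]}.
    Returns a list of (table, option, target, problem) tuples.
    """
    problems = []
    for table, rels in schema.items():
        for option, targets in rels.items():
            for target in targets:
                if target not in schema:
                    problems.append((table, option, target, "unknown table"))
                    continue
                inverse = INVERSE.get(option)
                if inverse and table not in schema[target].get(inverse, []):
                    # The target should declare the inverse relationship.
                    problems.append((target, inverse, table, "missing reciprocal"))
    return problems


# Hypothetical, reduced version of the HPA example.
schema = {
    "HORIZONTAL_POD_AUTOSCALER": {
        "relationship_affects": ["DEPLOYMENT"],
        "relationship_affected_by": ["METRICS_SERVER"],
    },
    "DEPLOYMENT": {"relationship_affected_by": ["HORIZONTAL_POD_AUTOSCALER"]},
    "METRICS_SERVER": {},
}

if __name__ == "__main__":
    for problem in check_relationships(schema):
        print(problem)
```

In this sample, METRICS_SERVER does not declare that it affects the autoscaler, so the check reports one missing reciprocal declaration.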