DDL
Kubling uses SQL-like DDL files to define TABLEs
, which are organized within SCHEMAs
.
A SCHEMA
serves as a container for metadata describing various objects, such as TABLEs
, VIEWs
, or PROCEDUREs
.
Each SCHEMA
is part of a Virtual Database (VDB), defined within the primary Descriptor bundle.
The real advantage of Kubling’s database virtualization engine lies in its flexibility: you can redefine the structure of TABLEs
regardless of the underlying system’s actual model.
This allows you to tailor data representations to your needs while still reflecting the source system's objects.
This flexibility is crucial for creating viewpoints tailored to specific data interactions, enabling you to adapt the data model to your needs rather than conforming to the system’s structure.
Additionally, Kubling allows the definition of TABLEs
based on APIs, even those returning complex JSON
documents.
This capability simplifies working with deeply nested data structures, making it easier to transform them into a tabular format.
DDL files are parsed as templates, meaning that you can use context properties.
See more about template parser.
Define a TABLE
CREATE TABLE {table-name} (
<table-element> (,<table-element>)*
(,<constraint>)*
) [OPTIONS (<options-clause>)]
<table-element> ::=
{column-name} <data-type> <element-attr> <options-clause>
<data-type> ::=
varchar | boolean | integer | double | date | timestamp .. (see Data Types)
<element-attr> ::=
[AUTO_INCREMENT] [NOT NULL] [PRIMARY KEY] [UNIQUE] [INDEX] [DEFAULT {expr}]
<constraint> ::=
CONSTRAINT {constraint-name} (
PRIMARY KEY <columns> |
FOREIGN KEY (<columns>) REFERENCES tbl (<columns>)
UNIQUE <columns> |
INDEX <columns>
<columns> ::=
( {column-name} [,{column-name}]* )
<options-clause> ::=
<key> <value>[,<key>, <value>]*
Sample
CREATE FOREIGN TABLE NAMESPACE
(clusterName string OPTIONS(val_variable 'cluster_name'),
clusterUrl string OPTIONS(val_constant '{{ schema.properties.kubernetes_api_url }}'),
schema string OPTIONS(val_constant '{{ schema.name }}'),
metadata__name string,
metadata__labels json OPTIONS(parser_format 'asJsonPretty'),
status__phase string,
PRIMARY KEY(metadata__name),
UNIQUE(clusterUrl, metadata__name))
OPTIONS(updatable true,
supports_idempotency false,
tags 'kubernetes;{{ schema.properties.cluster_name }};namespace');
Double underscore in field name has a special meaning in the context of a Script Document Data Source. See here for more information.
Options
Following options work ony in document data sources. However, we are implementing them also in non-document Adapters, like POSTGRESQL
.
You can follow the progress in roadmap section.
Options is the mechanism used by the Engine to pass meta-information from the TABLE
definition to the query processor (QP).
Internally, each listed OPTION
but tags
is referred as DIRECTIVE
since, as mentioned before, it can change the behavior of the QP
Common TABLE
Directives
Directive | Type | Options | Description |
---|---|---|---|
updatable | Boolean | True/False | Indicates whether the QP allows INSERT , UPDATE and DELETE operations on this TABLE |
tags | String | Any | Defines tags for the TABLE , which can be used to get TABLE s based on tags criteria. |
cache | String | cache 'option' Options: enabled , disabled , schemaDefault | Defines the cache behavior for this TABLE when caching is enabled at the data source level. |
Common Field Directives
Directive | Type | Options | Description |
---|---|---|---|
val_constant | String | Any | Defines a constant field value. |
val_variable | String | Any | The value of the field is get from the properties. More information. |
val_pk | String | [field+field+...] | Creates a calculated PK based on a field set.Since the value is not stored anywhere, when filtering using this field, the QP replaces the WHERE section by the original list of field's values. |
call_func | String | call_func 'fn_name<field, field,...>' | Calls a function (built-in or user-defined) to get the field value. Function parameters can be other's fields values. |
Specific directives are listed in each data source documentation.
Schema post-precessing in Document data sources
Suppose our data source interacts with the GitHub API, and we are retrieving a specific repository. GitHub’s response would be something like the following
{
"id": 1296269,
"node_id": "MDEwOlJlcG9zaXRvcnkxMjk2MjY5",
"name": "app-backend",
"full_name": "demo/app-backend",
"owner": {
"login": "demo",
"id": 1,
"node_id": "MDQ6VXNlcjE=",
"avatar_url": "https://github.com/images/error/demo_happy.gif",
"gravatar_id": "",
"url": "https://api.github.com/users/demo",
"html_url": "https://github.com/demo",
...
},
"private": false,
"html_url": "https://github.com/demo/app-backend",
"description": "This your first repo!",
"fork": false,
"url": "https://api.github.com/repos/demo/app-backend",
"archive_url": "https://api.github.com/repos/demo/app-backend/{archive_format}{/ref}",
"assignees_url": "https://api.github.com/repos/demo/app-backend/assignees{/user}",
"blobs_url": "https://api.github.com/repos/demo/app-backend/git/blobs{/sha}",
"branches_url": "https://api.github.com/repos/demo/app-backend/branches{/branch}",
...
"topics": [
"demo",
"atom",
"electron",
"api"
],
"has_issues": true,
"has_projects": true,
"has_wiki": true,
"has_pages": false,
"has_downloads": true,
"has_discussions": false,
"archived": false,
"disabled": false,
"visibility": "public",
"pushed_at": "2011-01-26T19:06:43Z",
"created_at": "2011-01-26T19:01:12Z",
"updated_at": "2011-01-26T19:14:43Z",
"permissions": {
"admin": false,
"push": false,
"pull": true
},
"security_and_analysis": {
"advanced_security": {
"status": "enabled"
},
"secret_scanning": {
"status": "enabled"
},
"secret_scanning_push_protection": {
"status": "disabled"
}
}
}
The challenge we face when translating a document schema into Kubling's Schema is handling nested fields.
Before diving into how to manage nested fields, it's important to note that while you can create a Kubling Schema that mirrors the entire document schema, it's generally not recommended. Most applications use only a subset of the data provided by upstream systems via their APIs. Therefore, the design phase remains critical when implementing Kubling.
When dealing with nested fields, there are three options:
- Store the entire nested document in a
json
field. - Use a path-like access naming convention.
- Ungroup fields into
synthetic
tables.
json
field
A json
field is treated by the engine as a nested document but behaves like any other regular field.
Path-like access
Suppose that you only need a couple of fields of owner
rather than the entire nested document.
When defining the schema
, you can simply do the following:
CREATE FOREIGN TABLE CODE_REPO
(
id long,
name string,
...
owner__login string,
owner__html_url string,
...
) OPTIONS(...);
When a document-based adapter retrieves the information, the special double underscore character (__
) is interpreted as a JSON path, equivalent to .[field]
.
Essentially, it replaces the dot (.
) typically used in path notation.
synthetic
tables
synthetic
are one of the most powerful features in Kubling because they allow the creation of tables that do not have a direct physical entity in the remote data source.
Continuing with this example, we can define a synthetic table called CODE_REPO_OWNER
to represent the nested document, as shown below:
CREATE FOREIGN TABLE CODE_REPO_OWNER
(
repoId long NOT NULL OPTIONS(synthetic_type 'parent', synthetic_parent_field 'id'),
repoName long NOT NULL OPTIONS(synthetic_type 'parent', synthetic_parent_field 'name'),
id long,
node_id string,
"login" string,
html_url string,
...
identifier string OPTIONS(val_pk 'repoId+id' ),
PRIMARY KEY(identifier),
UNIQUE(repoId, id)
) OPTIONS(
synthetic_parent 'CODE_REPO',
synthetic_path 'owner');
CODE_REPO_OWNER
is a table derived from the synthetic_path
field of synthetic_parent
.
It includes two fields inherited from its parent, marked as synthetic_type 'parent'
, and a generated identifier that also serves as the primary key.
Additionally, we mark repoId
and id
as UNIQUE
, ensuring that no other tuple will have the exact same combination of values.
The UNIQUE
constraint plays an important role: when a Create, Update, or Delete (CUD) operation is performed on a synthetic
table,
the engine needs to trace the exact document lineage. It does this by relying on the combination of fields in the UNIQUE
constraint.
Document arrays
When a field is an array of documents, the main difference is that the synthetic
table will retrieve multiple rows for each parent document.
However, you must ensure that the path built using the UNIQUE
constraint leads to the exact element,
meaning both the parent document(s) and the precise index or indices of the array.
INSERT
on a synthetic
table
Performing an INSERT
on a synthetic table means that a new element must be added to the parent document's synthetic_path
, whether it's a field or an array.
The engine will create this new element based on the field values that are not associated with any synthetic_type
.
These values will then be used to trace the document lineage and identify the correct node where the new element should be inserted.
UPDATE
on a synthetic
table
Similar to INSERT
, but instead of adding a new element (node), the values specified in the SET
clause will replace an existing node.
DELETE
on a synthetic
table
Similar to INSERT
, but after identifying the node, the engine removes it from the document.