Queries

SQL compatibility

Kubling provides most of the functionality of SQL-92 DML. The idea of this document is not to cover SQL exhaustively, but rather highlights how SQL is used in Kubling.

Identifiers

SQL commands contain references to tables and columns. These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command. All queries are processed in the context of a virtual database, or VDB. Because information can be federated across multiple sources, tables and columns must be scoped in some manner to avoid conflicts. This scoping is provided by schemas, which contain the information for each data source or set of views.

Fully-qualified table and column names are of the following form, where the separate `parts' of the identifier are delimited by periods.

TABLE: <schema_name>.<table_spec>
COLUMN: <schema_name>.<table_spec>.<column_name>

Syntax

Identifiers can consist of alphanumeric characters, or the underscore (_) character, and must begin with an alphabetic character. Any Unicode character may be used in an identifier.
Identifiers in double quotes can have any contents. The double quote character can be used if is escaped with an additional double quote; for example, "some "" id".
Because different data sources organize tables in different ways, with some prepending catalog, schema, or user information, Kubling allows table specification to be a dot-delimited construct.
Columns, column aliases, and schemas cannot contain a dot (.) character.
Identifiers, even when quoted, are not case-sensitive in Kubling.

Examples of valid, fully-qualified `TABLE` identifiers are:

Kubernetes.DEPLOYMENTS
"Kubernetes.Deployments"

Examples of valid, fully-qualified `COLUMN` identifiers are:

Kubernetes.DEPLOYMENTS.metadata__name
"Kubernetes.Deployments"."metadata__name"

Operator precedence

Operator	Description
`[]`	Array element reference
`+`, `-`	positive/negative value expression
`*`, `/`	multiplication/division
`+`, `-`	addition/subtraction
`\|\|`	concat
`criteria`

Expressions

Identifiers, literals, and functions can be combined into expressions. Expressions can be used in a query with nearly any keyword, including SELECT, FROM (if specifying join criteria), WHERE, GROUP BY, HAVING, or ORDER BY.

Column Identifiers

Column identifiers are used to specify the output columns in SELECT statements, the columns and their values for INSERT and UPDATE statements, and criteria used in WHERE and FROM clauses. They are also used in GROUP BY, HAVING, and ORDER BY clauses. The syntax for column identifiers was defined in the Identifiers section above.

Literals

Literal values represent fixed values. These can be any of the 'standard' data types. For information about data types, see Data Types..

Syntax Rules

Integer values will be assigned an integral data type big enough to hold the value (integer, long, or biginteger).
Floating point values will always be parsed as a double.
The keyword null is used to represent an absent or unknown value and is inherently untyped. In many cases, a null literal value will be assigned an implied type based on context. For example, in the function 5 + null, the nullvalue will be assigned the type `integer` to match the type of the value `5`. Anull`` literal used in the SELECT clause of a query with no implied context will be assigned to type string.

Examples

'abc'

'isn"t true'

-37.75e01

true

false

Unicode character:

'\u0027'

Binary:

X'0F0A'

Aggregate functions

Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group.

COUNT(*) Count the number of values (including nulls and duplicates) in a group. Returns an integer - an exception will be thrown if a larger count is computed.

COUNT(x) Count the number of values (excluding nulls) in a group. Returns an integer - an exception will be thrown if a larger count is computed.

COUNT_BIG(*) Count the number of values (including nulls and duplicates) in a group. Returns a long - an exception will be thrown if a larger count is computed.

COUNT_BIG(x) Count the number of values (excluding nulls) in a group. Returns a long - an exception will be thrown if a larger count is computed.

SUM(x) Sum of the values (excluding nulls) in a group.

AVG(x) Average of the values (excluding nulls) in a group.

MIN(x) Minimum value in a group (excluding null).

MAX(x) Maximum value in a group (excluding null).

ANY(x)/SOME(x) Returns TRUE if any value in the group is TRUE (excluding null).

EVERY(x) Returns TRUE if every value in the group is TRUE (excluding null).

VAR_POP(x) Biased variance (excluding null) logically equals(sum(x^2) - sum(x)^2/count(x))/count(x); returns a double; null if count = 0.

VAR_SAMP(x) Sample variance (excluding null) logically equals(sum(x^2) - sum(x)^2/count(x))/(count(x) - 1); returns a double; null if count < 2.

STDDEV_POP(x) Standard deviation (excluding null) logically equals SQRT(VAR_POP(x)).

STDDEV_SAMP(x) Sample standard deviation (excluding null) logically equals SQRT(VAR_SAMP(x)).

Case and searched case

To include conditional logic in a scalar expression, you can use the following two forms of the CASE expression:

CASE <expr> ( WHEN <expr> THEN <expr>)+ [ELSE expr] END
CASE ( WHEN <criteria> THEN <expr>)+ [ELSE expr] END

Each form allows for an output based on conditional logic. The first form starts with an initial expression and evaluates WHEN expressions until the values match, and outputs the THEN expression. If no WHEN is matched, the ELSE expression is output. If no WHEN is matched and no ELSE is specified, a null literal value is output.

The second form (the searched case expression) searches the WHEN clauses, which specify an arbitrary criteria to evaluate. If any criteria evaluates to true, the THEN expression is evaluated and output. If no WHEN is true, the ELSE is evaluated or NULL is output if none exists.

Examples

SELECT CASE columnA WHEN '10' THEN 'ten' WHEN '20' THEN 'twenty' END AS myExample
 
SELECT CASE WHEN columnA = '10' THEN 'ten' WHEN columnA = '20' THEN 'twenty' END AS myExample

Arrays

Empty arrays

()
(,)
ARRAY[]

Single element array

(expr,)
ARRAY[expr]

General array syntax

(expr, expr ... [,])
ARRAY[expr, ...]

If all of the elements in the array are of the same type, the array will be of a matching base type. If the elements types differ the array base type, it will be of object.

An array element reference takes the form of:

array_expr[index_expr]

index_expr must resolve to an integer value. This syntax is effectively the same as the array_get system function and expects 1-based indexing.

Criteria

Criteria can be any of the following items:

Predicates that evaluate to true or false.
Logical criteria that combine criteria (AND, OR, NOT).
A value expression of type boolean.

General usage:

criteria AND|OR criteria

NOT criteria

(criteria)

expression (=|<>|!=|<|>|<=|>=) (expression|((ANY|ALL|SOME) subquery|(array_expression)))

expression IS [NOT] DISTINCT FROM expression

expression [NOT] IS NULL

expression [NOT] IN (expression [,expression]*)|subquery

expression [NOT] LIKE pattern [ESCAPE char]

LIKE matches the string expression against the given string pattern. The pattern may contain % to match any number of characters, and _ to match any single character. The escape character can be used to escape the match characters % and _.

expression [NOT] SIMILAR TO pattern [ESCAPE char]

SIMILAR TO is a cross between LIKE and standard regular expression syntax. % and _ are still used, rather than .* and ., respectively.

💡

The Engine does not exhaustively validate SIMILAR TO pattern values. Instead, the pattern is converted to an equivalent regular expression. Do not rely on general regular expression features when using SIMILAR TO. If additional features are needed, use LIKE_REGEX. Avoid the use of non-literal patterns, because the Engine has a limited ability to process SQL pushdown predicates.

expression [NOT] LIKE_REGEX pattern

EXISTS (subquery)

expression [NOT] BETWEEN minExpression AND maxExpression

The Engine converts BETWEEN into the equivalent form expression >= minExpression AND expression ⇐ maxExpression.

expression

Where expression has type boolean.

Syntax rules The precedence ordering from lowest to highest is comparison, NOT, AND and OR.

Criteria nested by parenthesis will be logically evaluated prior to evaluating the parent criteria.

Some examples of valid criteria are:

(requests > 2500.0)
100*(50 - x)/(25 - y) > z
concat(areaCode,concat('-',phone)) LIKE '314%1'

💡

null values represent an unknown value. Comparison with a null value will evaluate to unknown, which can never be true even if not is used.

Criteria precedence The Engine parses and evaluates conditions with higher precedence before those with lower precedence. Conditions with equal precedence are left-associative. The following table lists condition precedence from high to low:

Condition	Description
SQL Operators	See here
`EXISTS`, `LIKE`, `SIMILAR TO`, `LIKE_REGEX`, `BETWEEN`, `IN`, `IS NULL`, `IS DISTINCT`, `<,` `⇐`, `>`, `>=`, `=`, `<>`	Comparison
`NOT`	Negation
`AND`	Conjunction
`OR`	Disjunction

💡

To prevent lookaheads, the parser does not accept all possible criteria sequences. For example, a = b is null is not accepted, because by the left-associative parsing we first recognize a =, then look for a common value expression. b is null is not a valid common value expression. Thus, nesting must be used, for example, (a = b) is null. For more information about parsing rules, see

Scalar Functions

Numeric Functions

Function	Description	Datatype constraint
`+` `-` `*` `/`	Standard numeric operators	x of [`integer`, `long`, `float`, `double`, `biginteger`, `bigdecimal`], returns same as `x [a]`
`ABS(x)`	Absolute value	See standard numeric operators above
`ACOS(x)`	Arc cosine	x of [`double`, `bigdecimal`], returns `double`
`ASIN(x)`	Arc sine	x of [`double`, `bigdecimal`], returns `double`
`ATAN(x)`	Arc tangent	x in [`double`, `bigdecimal`], returns `double`
`ATAN2(x, y)`	Arc tangent	x in [`double`, `bigdecimal`], returns `double`
`CEILING(x)`	Ceiling	x in [`double`, `float`], returns `double`
`COS(x)`	Cosine	x in [`double`, `bigdecimal`], returns `double`
`COT(x)`	Cotangent	x in [`double`, `bigdecimal`], returns `double`
`DEGREES(x)`	Convert degrees to radians	x in [`double`, `bigdecimal`], returns `double`
`EXP(x)`	e^x	x in [`double`, `float`], returns `double`
`FLOOR(x)`	Floor	x in [`double`, `float`], returns `double`
`FORMATBIGDECIMAL(x, y)`	Formats x using format y	x is `bigdecimal`, y is `string`, returns `string`
`FORMATBIGINTEGER(x, y)`	Formats x using format y	x is `biginteger`, y is `string`, returns `string`
`FORMATDOUBLE(x, y)`	Formats x using format y	x is `double`, y is `string`, returns `string`
`FORMATFLOAT(x, y)`	Formats x using format y	x is `float`, y is `string`, returns `string`
`FORMATINTEGER(x, y)`	Formats x using format y	x is `integer`, y is `string`, returns `string`
`FORMATLONG(x, y)`	Formats x using format y	x is `long`, y is `string`, returns `string`
`LOG(x)`	Natural log (base e)	x in [`double`, `float`], returns `double`
`LOG10(x)`	Log (base 10)	x in [`double`, `float`], returns `double`
`MOD(x, y)`	Modulus (remainder of x / y)	x in [`integer`, `long`, `float`, `double`, `biginteger`, `bigdecimal`], returns same as x
`PARSEBIGDECIMAL(x, y)`	Parses x using format y	x, y of `string`s, returns `bigdecimal`
`PARSEBIGINTEGER(x, y)`	Parses x using format y	x, y of `string`s, returns `biginteger`
`PARSEDOUBLE(x, y)`	Parses x using format y	x, y of `string`s, returns `double`
`PARSEFLOAT(x, y)`	Parses x using format y	x, y of `string`s, returns `float`
`PARSEINTEGER(x, y)`	Parses x using format y	x, y of `string`s, returns `integer`
`PARSELONG(x, y)`	Parses x using format y	x, y of `string`s, returns `long`
`POWER(x, y)`	x to the y power	x in [`double`, `bigdecimal`, `biginteger`], return is the same type as x
`RADIANS(x)`	Convert radians to degrees	x in [`double`, `bigdecimal`], returns `double`
`RAND()`	Returns a random number, using generator established so far in the query or initializing with system clock if necessary.	Returns `double`.
`RAND(x)`	Returns a random number, using new generator seeded with x. This should typically be called in an initialization query. It will only effect the random values returned by the `RAND` function and not the values from `RAND` functions evaluated by sources.	x is `integer`, returns `double`.
`ROUND(x, y)`	Round x to y places; negative values of y indicate places to the left of the decimal point	x in [`integer`, `float`, `double`, `bigdecimal`] y is `integer`, return is same type as x.
`SIGN(x)`	`1` if `x > 0`, `0` if `x = 0`, `-1` if `x < 0`	x in [`integer`, `long`, `float`, `double`, `biginteger`, `bigdecimal`], returns `integer`
`SIN(x)`	Sine	x in [`double`, `bigdecimal`], returns `double`
`SQRT(x)`	Square root	x in [`long`, `double`, `bigdecimal`], returns `double`
`TAN(x)`	Tangent	x in [`double`, `bigdecimal`], returns `double`
`BITAND(x, y)`	Bitwise AND	x, y in [`integer`], returns `integer`
`BITOR(x, y)`	Bitwise OR	x, y in [`integer`], returns `integer`
`BITXOR(x, y)`	Bitwise XOR	x, y in [`integer`], returns `integer`
`BITNOT(x)`	Bitwise NOT	x in [`integer`], returns `integer`

String Functions

Unless specified, all of the arguments and return types in the following table are strings and all indexes are 1-based. The 0 index is considered to be before the start of the string.

Function	Description	Datatype constraint
`x \|\| y`	Concatenation operator	x,y in [`string`, `clob`], return type is `string` or character large object (`CLOB`).
`ASCII(x)`	Provide ASCII value of the left most character[1] in x. The empty `string` will as input will return `null`.	returns `integer`.
`CHR(x)` `CHAR(x)`	Provide the character[1] for ASCII value x [a].	x of `integer` [1] For the implementations of the `ASCII` and `CHR` functions, characters are limited to UCS2 values only. For pushdown there is little consistency among sources for character values beyond character code `255`.
`CONCAT(x, y)`	Concatenates x and y with ANSI semantics. If x and/or y is null, returns null.	x, y `string`.
`CONCAT2(x, y)`	Concatenates x and y with non-ANSI null semantics. If x and y is null, returns null. If only x or y is null, returns the other value.	x, y `string`.
`ENDSWITH(x, y)`	Checks if y ends with x. If x or y are null, returns null.	x, y `string` returns `boolean`.
`INITCAP(x)`	Capitalizes first letter of each word in `string` x while the remaining in lowercase.	x `string`.
`INSERT(str1, start, length, str2)`	Insert str2 into str1	str1 `string`, start `integer`, length `integer`, str2 `string`
`LCASE(x)`	Lowercase	x `string`
`LEFT(x, y)`	Get left y characters of x	x `string`, y `integer`, returns `string`
`LENGTH(x)`	Length of a `string`	x `string`, returns `integer`
`LOCATE(x, y)`	Find position of x in y starting at beginning of y.	x `string`, y `integer`, returns `integer`
`LOCATE(x, y, z)`	Find position of x in y starting at z.	x `string`, y `string`, z `integer`, returns `integer`
`LPAD(x, y)`	Pad input `string` x with spaces on the left to the length of y.	x `string`, y `integer`, returns `string`
`LPAD(x, y, z)`	Pad input `string` x on the left to the length of y using character z.	x `string`, y `string`, z character, returns `string`
`LEFT(x, y)`	Get left y characters of x	x `string`, y `integer`, returns `string`
`LTRIM(x)`	Left trim x of blank chars.	x `string`, returns `string`
`QUERYSTRING(path [, expr [AS name] …])`	Returns a properly encoded query `string` appended to the given path. Null valued expressions are omitted. Names are optional for column reference expressions. For example, `QUERYSTRING('path', 'value' as "&x", ' & ' as y, null as z)` returns `'path?%26x=value&y=%20%26%20'`.	path, expr `string`, name is an identifier.
`REPEAT(str1, instances)`	Repeats str1 a specified number of times.	str1 `string`, instances `integer`, returns `string`.
`RIGHT(x, y)`	Gets right y characters of x.	x `string`, y `integer`, returns `string`
`RPAD(input string x, pad length y)`	Pad input `string` x with spaces on the right to the length of y.	x `string`, y `integer`, returns `string`
`RPAD(x, y, z)`	Pad input `string` x on the right to the length of y using character z.	x `string`, y `string`, z character, returns `string`
`RTRIM(x)`	Right trim x of blank chars.	x `string`, returns `string`
`TRIM([[LEADING\|TRAILING\|BOTH] [x] FROM] y)`	Trim the leading, trailing, or both ends of a `string` y of character x. If `LEADING`/`TRAILING`/`BOTH` is not specified, `BOTH` is used. If no trim character x is specified, then a blank space is used.	x `character`, y `string`
`SPACE(x)`	Repeats the space character x number of times	x `integer`, returns `string`
`SUBSTRING(x, y)`	Gets substring of x, from position y to the end of x	x `string`, returns `integer`
`SUBSTRING(x, y, z)`	Gets substring of x from position y with length z	x `string`, y `integer`, z `integer`, returns `integer`
`UCASE(x)`	Uppercase	x `string`, returns `integer`
`UNESCAPE(x)`	Unescaped of x. Possible escape sequences are `\b` - backspace, `\t` - tab, `\n` - line feed, `\f` - form feed, `\r` - carriage return. `\uXXXX`, where `X` is a hex value, can be used to specify any unicode character. `\XXX`, where `X` is an octal digit, can be used to specify an octal `byte` value. If any other character appears after an escape character, that character will appear in the output and the escape character will be ignored.	x `string`, returns `string`
`REPLACE(x, y, z)`	Replace all occurrences of a given `string` with another.	`x`, `y`, `z` `string`, returns `string`
`REGEXP_REPLACE(str, pattern, sub [, flags])`	Replace one or more occurrences of `pattern` with `sub` in `str`. Flags: `g`: Replaces all occurrences, not just the first. `m`: Match over multiple lines. `i`: Match without case sensitivity.	`str`, `pattern`, `sub` `string`, returns `string`

Type Conversion Functions

Function	Description
`CONVERT(x, type)`	Converts x to type, where type is a standard data type.
`CAST(x AS type)`	Identical to `CONVERT`

💡

Options that are specified on the type, such as length, precision, scale, etc., are effectively ignored - the Engine is simply converting from one object type to another.

Data and Time Functions

Function	Description	Datatype Constraint
`CURDATE()` / `CURRENT_DATE()`	Return current date - same value for all invocations in the user command.	Returns `date`
`CURTIME()`	Return current time - same value for all invocations in the user command. See also `CURRENT_TIME`.	Returns `time`
`NOW()`	Return current timestamp (date and time with millisecond precision) - same value for all invocations in the user command.	Returns `timestamp`
`CURRENT_TIME[(precision)]`	Return current time. The precision argument is ignored, behaving the same as `CURTIME()` without a precision argument.	Returns `time`
`CURRENT_TIMESTAMP[(precision)]`	Return current timestamp with millisecond precision. Behaves like `NOW()` without a precision argument.	Returns `timestamp`
`DAYNAME(x)`	Return name of day in the default locale.	`x` in timestamp, returns `string`
`DAYOFMONTH(x)`	Return day of the month.	`x` in timestamp, returns `integer`
`DAYOFWEEK(x)`	Return day of the week (Sunday=1, Saturday=7).	`x` in timestamp, returns `integer`
`DAYOFYEAR(x)`	Return day number in the year.	`x` in timestamp, returns `integer`
`EPOCH(x)`	Return seconds since the Unix epoch with microsecond precision.	`x` in timestamp, returns `double`
`EXTRACT(FIELD FROM x)`	Return the specified `FIELD` from the date value `x`. Fields include `YEAR`, `MONTH`, `DAYOFMONTH`, `HOUR`, `MINUTE`, `SECOND`, `EPOCH`, etc.	`x` in timestamp, returns `integer` or `double`
`FORMATDATE(x, y)`	Format the date `x` using the format `y`.	`x` is `date`, `y` is `string`, returns `string`
`FORMATTIME(x, y)`	Format the time `x` using the format `y`.	`x` is `time`, `y` is `string`, returns `string`
`FORMATTIMESTAMP(x, y)`	Format the timestamp `x` using the format `y`.	`x` is `timestamp`, `y` is `string`, returns `string`
`FROM_MILLIS(millis)`	Return the `timestamp` value for the given milliseconds.	Long UTC timestamp in milliseconds
`FROM_UNIXTIME(unix_timestamp)`	Return the Unix timestamp as a `string` in the default format `yyyy/mm/dd hh:mm:ss`.	Long Unix timestamp (in seconds)
`HOUR(x)`	Return hour (in 24-hour format).	`x` in timestamp, returns `integer`
`MINUTE(x)`	Return minute.	`x` in timestamp, returns `integer`
`MODIFYTIMEZONE(timestamp, startTimeZone, endTimeZone)`	Return a timestamp adjusted for the difference between `startTimeZone` and `endTimeZone`.	Strings for time zones, returns `timestamp`
`MONTH(x)`	Return month.	`x` in timestamp, returns `integer`
`MONTHNAME(x)`	Return name of the month in the default locale.	`x` in timestamp, returns `string`
`PARSEDATE(x, y)`	Parse date from `x` using format `y`.	`x`, `y` are `string`, returns `date`
`PARSETIME(x, y)`	Parse time from `x` using format `y`.	`x`, `y` are `string`, returns `time`
`PARSETIMESTAMP(x, y)`	Parse timestamp from `x` using format `y`.	`x`, `y` are `string`, returns `timestamp`
`QUARTER(x)`	Return the quarter.	`x` in timestamp, returns `integer`
`SECOND(x)`	Return seconds.	`x` in timestamp, returns `integer`
`TIMESTAMPCREATE(date, time)`	Create a timestamp from a `date` and a `time`.	`date` is `date`, `time` is `time`, returns `timestamp`
`TO_MILLIS(timestamp)`	Return the UTC timestamp in milliseconds.	Timestamp value
`UNIX_TIMESTAMP(unix_timestamp)`	Return the Unix timestamp (in seconds).	`unix_timestamp` as `string`, returns `long`
`WEEK(x)`	Return the week of the year (1-53).	`x` in timestamp, returns `integer`
`YEAR(x)`	Return the four-digit year.	`x` in timestamp, returns `integer`

`TIMESTAMPADD`

Adds a specified interval amount to a timestamp.

TIMESTAMPADD(interval, count, timestamp)

Arguments:

Name	Description
`interval`	A datetime interval unit, can be one of the following keywords:
	- `SQL_TSI_FRAC_SECOND` - fractional seconds (billionths of a second)
	- `SQL_TSI_SECOND` - seconds
	- `SQL_TSI_MINUTE` - minutes
	- `SQL_TSI_HOUR` - hours
	- `SQL_TSI_DAY` - days
	- `SQL_TSI_WEEK` - weeks using Sunday as the first day
	- `SQL_TSI_MONTH` - months
	- `SQL_TSI_QUARTER` - quarters (3 months) where the first quarter is months 1-3, etc.
	- `SQL_TSI_YEAR` - years
`count`	A long or integer count of units to add to the timestamp. Negative values subtract that number of units. Long values are allowed for symmetry with `TIMESTAMPDIFF`, but the effective range is still limited to integer values.
`timestamp`	A datetime expression.

Example:

SELECT TIMESTAMPADD(SQL_TSI_MONTH, 12,'2025-10-10');
SELECT TIMESTAMPADD(SQL_TSI_SECOND, 12,'2025-10-10 23:59:59');

`TIMESTAMPDIFF`

Calculates the number of date part intervals crossed between two timestamps and returns a long value.

TIMESTAMPDIFF(interval, startTime, endTime)

Arguments:

Name	Description
`interval`	A datetime interval unit, the same as keywords used by `TIMESTAMPADD`.
`startTime`	A datetime expression.
`endTime`	A datetime expression.

Example:

SELECT TIMESTAMPDIFF(SQL_TSI_MONTH,'2000-01-02','2025-10-10');
SELECT TIMESTAMPDIFF(SQL_TSI_SECOND,'2000-01-02 00:00:00','2025-10-10 23:59:59');
SELECT TIMESTAMPDIFF(SQL_TSI_FRAC_SECOND,'2000-01-02 00:00:00.0','2025-10-10 23:59:59.999999');

Please note that:

If endTime > startTime, a non-negative number will be returned.
If endTime < startTime, a non-positive number will be returned.
The date part difference is counted regardless of how close the timestamps are.
For example, '2025-01-02 00:00:00.0' is still considered 1 hour ahead of '2025-01-01 23:59:59.999999'.

Decode Functions

Function	Description	Datatype Constraint
`DECODESTRING(x, y [, z])`	Decode column `x` using a string of value pairs `y` with an optional delimiter `z` and return the decoded column as a string. If a delimiter is not specified, a comma (`,`) is used. `y` has the format `SearchDelimResultDelimSearchDelimResult[DelimDefault]`. Returns `Default` if specified, or `x` if there are no matches.	All parameters and return type are `string`.
`DECODEINTEGER(x, y [, z])`	Decode column `x` using a string of value pairs `y` with an optional delimiter `z` and return the decoded column as an integer. If a delimiter is not specified, a comma (`,`) is used. `y` has the format `SearchDelimResultDelimSearchDelimResult[DelimDefault]`. Returns `Default` if specified, or `x` if there are no matches.	All parameters are `string`; returns `integer`.

Document Functions v24.5.3+

Function	Description
`jsonParse(value, skipValidation)`	Parses a `value` (`CLOB` or `BLOB`) and returns a `json` object. `skipValidation` indicates whether pre-parsing validation of `value` should be skipped.
`jsonObject(c1, c2, ..., cn)`	Returns a `json` object based on a list of columns. The resulting object contains fields in the form of `column_name: column_value`.
`yamlAsJSON(value)`	Converts `value` (`CLOB`), which is expected to be a valid YAML document, into a `json` object.
`jsonPath(value, path)`	Evaluates a JsonPath (opens in a new tab) expression on `value`, which is expected to be a valid `json` document (`string` or `CLOB`). Always returns a `json` document, even if it’s a single value. In such cases, transform it to other types as needed.
`jsonJq(value, query)`	Applies a jq (opens in a new tab) query to `value`, which is expected to be a valid `json` document (`string` or `CLOB`). Always returns a `json` document, even if it’s a single value. In such cases, transform it to other types as needed.
`jsonPathAsString(value, path)`	Same as `jsonPath` but returns a `string` instead of a `json` object.
`jsonJqAsString(value, query)`	Same as `jsonJq` but returns a `string` instead of a `json` object.

Query Planner Limitations on `WHERE` Clauses with `json` Objects

As of version v24.5.3, the query planner is not yet optimized to perform efficient comparisons on json objects in the WHERE clause, even when the value expression is within a type conversion function. Consider the following example:

SELECT * 
FROM 
	kube.EVENT 
WHERE 
	convert(jsonPath(regarding, '$.uid'), string) = '7e613d43'

Assuming that such a uid exists, we expect the query planner to drop the entire query due to its current limitations in creating an optimized node based on json objects.

For these cases, you should use jsonPathAsString or jsonJqAsString instead, as shown below:

SELECT * 
FROM 
	kube.EVENT 
WHERE 
	jsonPathAsString(regarding, '$.uid') = '7e613d43'

Choice functions

Function	Description	Datatype constraint
`COALESCE(x, y1, y2,...,yn)`	Returns the first non-null parameter.	x and y can be any compatible types.
`IFNULL(x,y)`	If x is `null`, return y; else return x.	x, y of any type. However, that type must be the same for both.
`NVL(x,y)`	If x is `null`, return y; else return x.	x, y of any type. However, that type must be the same for both.
`NULLIF(param1, param2)`	Equivalent to `case when (param1 = param2) then null else param1`.	param1 and param2 must be comparable types.

Vector functions v25.1+

Function	Description	Datatype constraint
`vectorStringToArray(x)`	Transforms a string that represents a vector, in the form `[0.1, 0.2, 0.3,...]` to an array of `double`.	x must be string.
`euclideanDistance(x,y)`	Computes the euclidean distance of x and y	x, y must be `double[]` with the same size.
`innerProduct(x,y)`	Computes the inner product (or dot product) of x and y	x, y must be `double[]` with the same size.
`cosineDistance(x,y)`	Computes the cosine distance of x and y	x, y must be `double[]` with the same size.

`SELECT` command

The SELECT command is used to retrieve records for any number of relations.

A SELECT command can contain the following clauses:

Except for the OPTION clause, all of the preceding clauses are defined by the SQL specification. The specification also tells the order in which these clauses are logically processed. Processing occurs in stages, with each stage passing a set of rows to the following stage. The processing model is logical, and does not represent the way that a Engine performs the actual processing, but it is a useful model for understanding how SQL works.
The SELECT command processes clauses in the following stages:

1. WITH clause

Gathers all rows from all with items in the order listed. Subsequent WITH items and the main query can reference a WITH item as if it were a table.

2. FROM clause

Gathers all rows from all tables involved in the query and logically joins them with a Cartesian product to produce a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.

3. WHERE clause

Applies a criteria to every output row from the FROM stage, further reducing the number of rows.

4. GROUP BY clause

Groups sets of rows with matching values in the GROUP BY columns.

5. HAVING clause

Applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).

6. SELECT clause

Specifies the column expressions that should be returned from the query. Expressions are evaluated, including aggregate functions that are based on the groups of rows, which will no longer exist after this point. The output columns are named using either column aliases or an implicit name determined by the engine. If SELECT DISTINCT is specified, duplicate removal is performed on the rows being returned from the SELECT stage.

7. ORDER BY clause

Sorts the rows returned from the SELECT stage as desired. Supports sorting on multiple columns in specified order, ascending or descending. The output columns will be identical to those columns returned from the SELECT stage and will have the same name.

8. LIMIT clause

Returns only the specified rows (with skip and limit values).

`WITH` clause

The Engine provides access to common table expressions via the WITH clause. You can reference WITH clause items as tables in subsequent WITH clause items, and in the main query. You can think of the WITH clause as providing query-scoped temporary tables.

Usage:

WITH name [(column, ...)] AS [/*+ no_inline|materialize */] (query expression) ...

Syntax rules:

All of the projected column names must be unique. If they are not unique, then the column name list must be provided.
If the columns of the WITH clause item are declared, then they must match the number of columns projected by the query expression.
Each WITH clause item must have a unique name.
The optional no_inline hint indicates to the QP optimizer that the query expression should not be substituted as an inline view where referenced. It is possible with no_inline for multiple evaluations of the common table as needed by source queries.
The optional materialize hint requires that the common table be created as a temporary table. This forces a single evaluation of the common table.

💡

The WITH clause is also subject to optimization and its entries might not be processed if they are not needed in the subsequent query.

💡

Common tables are aggressively inlined to enhance the possibility of pushdown. If a common table is only referenced a single time in the main query, it is likely to be inlined. In some situations, such as when you use a common table to prevent n-many-processing of a non-pushdown, correlated subquery, you might need to include the no_inline or materialize hint.

Examples:

WITH n (x) AS (select col from tbl) select x from n, n as n1

WITH n (x) AS /*+ no_inline */ (select col from tbl) select x from n, n as n1

`FROM` clause

The FROM clause specifies the target tables for SELECT, UPDATE, and DELETE statements.

Example Syntax:

FROM table [[AS] alias]

FROM table1 [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table2 ON join-criteria

FROM table1 CROSS JOIN table2

FROM (subquery) [AS] alias

FROM TABLE(subquery) [AS] alias

`FROM` clause hints

FROM clause hints are typically specified in a comment block preceding the affected clause. MAKEDEP and MAKENOTDEP may also appear after in non-comment form after the affected clause. If multiple hints apply to that clause, the hints should be placed in the same comment block.

Example:

FROM /*+ MAKEDEP PRESERVE */ (tbl1 inner join tbl2 inner join tbl3 on tbl2.col1 = tbl3.col1 on tbl1.col1 = tbl2.col1), tbl3 WHERE tbl1.col1 = tbl2.col1

Dependent `JOIN` hints

MAKEIND, MAKEDEP, and MAKENOTDEP are hints that you can use to control dependent join behavior. Use them only in situations where the optimizer does not choose the most optimal plan based upon query structure, metadata, and costing information. The hints can appear in a comment that follows the FROM keyword. The hints can be specified against any FROM clause, not just a named table.

`MAKEIND`

Indicates that the clause should be the independent (feeder) side of a dependent join.

`MAKEDEP`

Indicates that the clause should be the dependent (filtered) side of a join.

`MAKENOTDEP`

Prevents the clause from being the dependent (filtered) side of a join.

`MAKEDEP(JOIN)`

Indicates that the entire JOIN should be pushed.

`MAKEDEP(NO JOIN)`

Indicates that the entire JOIN should not be pushed.

`MAKEDEP(MAX:val)`

Indicates that the dependent JOIN should only be performed if there are less than the maximum number of values from the independent side.

💡

The above-mentioned hints become a crucial part of the Kubling's implementation success. Continue reading to discover why!

When working with a poor or absent cardinality, which is very frequent in the case of Script Modules since the systems they interact with exchanges JSON documents, it is not possible for the query planner to determine what to get first, sometimes making queries to not work properly or even failing, due to the fact that the Engine will try to perform a TABLE full scan.

Since the concept is a bit complex, let's use the following example, which is actually a part of the samples repo:

  SELECT
    a.NAME as app_name,
    a.ID as app_id,
    c.ID as component_id, 
    c.NAME as component_name, 
    r.html_url as repo_location,
    cd.DEPLOYMENT_IDENTIFIER,
    cd.environment,
    deploy.clusterName as cluster,
    deploy.status__replicas  as desired_replicas,
    deploy.status__availableReplicas as available_replicas,
    deploy.status__readyReplicas as ready_replicas
  FROM
    app_db.APPLICATION a
    LEFT JOIN app_db.COMPONENT c ON c.APPLICATION_ID = a.ID
    LEFT JOIN COMPONENT_CODE_REPO_JOIN cc ON cc.COMPONENT_ID = c.ID
    LEFT JOIN app_db.COMPONENT_DEPLOYMENT_JOIN cd ON cd.COMPONENT_ID = c.ID
    LEFT JOIN /*+ makedep */ github.CODE_REPO r ON (r.org = cc.CODE_REPO_ORG AND r.name = cc.CODE_REPO_NAME)
    LEFT JOIN /*+ makedep */ k8s.DEPLOYMENT deploy  ON (deploy.identifier = cd.DEPLOYMENT_IDENTIFIER)
  WHERE a.NAME='{{ app_name }}'

This query runs on a VDB that has 5 Data Sources:

1 POSTGRESQL
2 KUBERNETES
1 SCRIPT_DOCUMENT_JS
1 aggregator

The query above gets all applications registered in the POSTGRESQL, with some extra information, like their GitHub repositories (SCRIPT_DOCUMENT_JS) and their deployment clusters (KUBERNETES).

In our context we support multiple organizations but, by design, github.CODE_REPO forces to filter, at least, by one organization since the API works that way.
Due to the fact that the QP does not see any relationship between github.CODE_REPO and cc.CODE_REPO_ORG, it may decide to run a full scan over github.CODE_REPO and then apply the criteria. However, the full scan will never complete since github.CODE_REPO will throw an error due to the lack of the organization in the WHERE clause.

By using the /*+ makedep */ hint, we ensure that clause depends on the filter side of the join, in other words, the QP will first get the list of cc.CODE_REPO_ORG and cc.CODE_REPO_NAME before generating a query to route to github.CODE_REPO.
Therefore, your script will receive a standard SELECT with a WHERE clause in the form of IN or OR list.

ℹ️

This is a well-known problem in federated database systems, which solve the problem using decomposition or pushdown.

`WHERE` clause

The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE statements.
See its general form.

`GROUP BY` clause

The GROUP BY clause denotes that rows should be grouped according to the specified expression values. One row is returned for each group, after optionally filtering those aggregate rows based on a HAVING clause.

The general form of the GROUP BY:

GROUP BY expression [,expression]*

GROUP BY ROLLUP(expression [,expression]*)

Syntax Rules:

Column references in the group by cannot be made to alias names in the SELECT clause.
Expressions used in the group by must appear in the select clause.
Column references and expressions in the SELECT/HAVING/ORDER BY clauses that are not used in the group by clause must appear in aggregate functions.
If an aggregate function is used in the SELECT clause and no GROUP BY is specified, an implicit GROUP BY will be performed with the entire result set as a single group. In this case, every column in the SELECT must be an aggregate function as no other column value will be fixed across the entire group.
The GROUP BY columns must be of a comparable type.

`HAVING` clause

The HAVING clause operates exactly as the WHERE clause, although it operates on the output of a GROUP BY. You can use the same syntax with the HAVING clause as with the WHERE clause.

Syntax Rule:

Expressions used in the GROUP BY clause must contain either an aggregate function (COUNT, AVG, SUM, MIN, MAX), or be one of the grouping expressions.

`SELECT` clause

Queries that start with the SELECT keyword and are often referred to as SELECT statements. Usage:

SELECT [DISTINCT|ALL] ((expression [[AS] name])|(group identifier.STAR))*|STAR ...

Syntax Rules:

Aliased expressions are only used as the output column names and in the ORDER BY clause. They cannot be used in other clauses of the query.
DISTINCT may only be specified if the SELECT symbols are comparable.

`ORDER BY` clause

The ORDER BY clause specifies how records are sorted. The options are ASC (ascending) or DESC (descending).

Usage:

ORDER BY expression [ASC|DESC] [NULLS (FIRST|LAST)], ...

Syntax rules:

Sort columns can be specified by SELECT clause alias name, by SELECT clause expression, or by an unrelated expression.
Column references can appear in the SELECT clause as the expression for an aliased column, or can reference columns from tables in the FROM clause. If the column reference is not in the SELECT clause, the query cannot be a set operation, specify SELECT DISTINCT, or contain a GROUP BY clause.
Unrelated expressions, expressions not appearing as an aliased expression in the select clause, are allowed in the ORDER BY clause of a non-set QUERY. The columns referenced in the expression must come from the from clause table references. The column references cannot be to alias names.
The ORDER BY columns must be of a comparable type.
If an ORDER BY is used in an inline view or view definition without a LIMIT clause, it is removed by the query optimizer.
If NULLS FIRST/LAST is specified, then nulls are guaranteed to be sorted either first or last. If the null ordering is not specified, then results will typically be sorted with nulls as low values, which is the default internal sorting behavior. However, not all data Sources return results with nulls sorted as low values by default, and the Engine might return results with different null orderings.

`LIMIT` clause

The LIMIT clause specifies a limit on the number of records returned by the SELECT command. An optional offset can be specified (the number of rows to skip). The LIMIT clause can also be specified using the SQL 2008 OFFSET/FETCH FIRST clauses. If an ORDER BY is also specified, it will be applied before the OFFSET/LIMIT are applied. If an ORDER BY is not specified there is generally no guarantee what subset of rows will be returned.

Usage:

LIMIT [offset,] limit

LIMIT limit OFFSET offset

[OFFSET offset ROW|ROWS] [FETCH FIRST|NEXT [limit] ROW|ROWS ONLY]

Syntax rules:

The LIMIT/OFFSET expressions must be a non-negative integer or a parameter reference (?). An offset of 0 is ignored. A limit of 0 returns no rows.
The terms FIRST/NEXT are interchangeable as well as ROW/ROWS.
The LIMIT clause can take an optional preceding NON_STRICT hint to indicate that pushdown operations should not be inhibited, even if the results are not consistent with the logical application of the limit.
The hint is only needed on unordered limits, for example, SELECT * FROM VW /*+ NON_STRICT */ LIMIT 2.

`INTO` clause

When the INTO clause is specified with a SELECT, the results of the query are inserted into the specified table.
This is often used to copy records from one Data Source to another, using an as-is approach or applying transforming functions to the original value. In some specific circumstances, it is also useful to insert records into a temporary table, although in Kubling we discourage the use of local storage.
The INTO clause immediately precedes the FROM clause.

Usage:

INTO [TABLE] FROM...

Syntax rules:

The INTO clause is logically applied last in processing, after the ORDER BY and LIMIT clauses.
In the SELECT INTO the target of the INTO clause is a table where the result of the SELECT command will be inserted.
SELECT INTO can't be combined with a UNION query.

Example:

SELECT col1, col2 INTO [TABLE_TABLE] FROM [SOURCE_TABLE]

DDL Script Context Members

Queries

SQL compatibility

Identifiers

Syntax

Examples of valid, fully-qualified TABLE identifiers are:

Examples of valid, fully-qualified COLUMN identifiers are:

Operator precedence

Expressions

Column Identifiers

Literals

Syntax Rules

Examples

Aggregate functions

Case and searched case

Examples

Arrays

Criteria

Scalar Functions

Numeric Functions

String Functions

Type Conversion Functions

Data and Time Functions

TIMESTAMPADD

TIMESTAMPDIFF

Decode Functions

Document Functions v24.5.3+

Query Planner Limitations on WHERE Clauses with json Objects

Choice functions

Vector functions v25.1+

SELECT command

1. WITH clause

2. FROM clause

3. WHERE clause

4. GROUP BY clause

5. HAVING clause

6. SELECT clause

7. ORDER BY clause

8. LIMIT clause

WITH clause

FROM clause

FROM clause hints

Dependent JOIN hints

MAKEIND

MAKEDEP

MAKENOTDEP

MAKEDEP(JOIN)

MAKEDEP(NO JOIN)

MAKEDEP(MAX:val)

WHERE clause

GROUP BY clause

HAVING clause

SELECT clause

ORDER BY clause

LIMIT clause

INTO clause

Examples of valid, fully-qualified `TABLE` identifiers are:

Examples of valid, fully-qualified `COLUMN` identifiers are:

`TIMESTAMPADD`

`TIMESTAMPDIFF`

Query Planner Limitations on `WHERE` Clauses with `json` Objects

`SELECT` command

`WITH` clause

`FROM` clause

`FROM` clause hints

Dependent `JOIN` hints

`MAKEIND`

`MAKEDEP`

`MAKENOTDEP`

`MAKEDEP(JOIN)`

`MAKEDEP(NO JOIN)`

`MAKEDEP(MAX:val)`

`WHERE` clause

`GROUP BY` clause

`HAVING` clause

`SELECT` clause

`ORDER BY` clause

`LIMIT` clause

`INTO` clause