Engine
Queries

Queries

SQL compatibility

Kubling provides most of the functionality of SQL-92 DML. The idea of this document is not to cover SQL exhaustively, but rather highlights how SQL is used in Kubling.

Identifiers

SQL commands contain references to tables and columns. These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command. All queries are processed in the context of a virtual database, or VDB. Because information can be federated across multiple sources, tables and columns must be scoped in some manner to avoid conflicts. This scoping is provided by schemas, which contain the information for each data source or set of views.

Fully-qualified table and column names are of the following form, where the separate `parts' of the identifier are delimited by periods.

  • TABLE: <schema_name>.<table_spec>
  • COLUMN: <schema_name>.<table_spec>.<column_name>

Syntax

  • Identifiers can consist of alphanumeric characters, or the underscore (_) character, and must begin with an alphabetic character. Any Unicode character may be used in an identifier.

  • Identifiers in double quotes can have any contents. The double quote character can be used if is escaped with an additional double quote; for example, "some "" id".

  • Because different data sources organize tables in different ways, with some prepending catalog, schema, or user information, Kubling allows table specification to be a dot-delimited construct.

  • Columns, column aliases, and schemas cannot contain a dot (.) character.

  • Identifiers, even when quoted, are not case-sensitive in Kubling.

Examples of valid, fully-qualified TABLE identifiers are:

  • Kubernetes.DEPLOYMENTS
  • "Kubernetes.Deployments"

Examples of valid, fully-qualified COLUMN identifiers are:

  • Kubernetes.DEPLOYMENTS.metadata__name
  • "Kubernetes.Deployments"."metadata__name"

Operator precedence

OperatorDescription
[]Array element reference
+, -positive/negative value expression
*, /multiplication/division
+, -addition/subtraction
||concat
criteria

Expressions

Identifiers, literals, and functions can be combined into expressions. Expressions can be used in a query with nearly any keyword, including SELECT, FROM (if specifying join criteria), WHERE, GROUP BY, HAVING, or ORDER BY.

Column Identifiers

Column identifiers are used to specify the output columns in SELECT statements, the columns and their values for INSERT and UPDATE statements, and criteria used in WHERE and FROM clauses. They are also used in GROUP BY, HAVING, and ORDER BY clauses. The syntax for column identifiers was defined in the Identifiers section above.

Literals

Literal values represent fixed values. These can be any of the 'standard' data types. For information about data types, see Data Types..

Syntax Rules

  • Integer values will be assigned an integral data type big enough to hold the value (integer, long, or biginteger).
  • Floating point values will always be parsed as a double.
  • The keyword null is used to represent an absent or unknown value and is inherently untyped. In many cases, a null literal value will be assigned an implied type based on context. For example, in the function 5 + null, the nullvalue will be assigned the type `integer` to match the type of the value `5`. Anull`` literal used in the SELECT clause of a query with no implied context will be assigned to type string.

Examples

'abc'
'isn"t true'
5
-37.75e01
true
false

Unicode character:

'\u0027'

Binary:

X'0F0A'

Aggregate functions

Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group.

COUNT(*) Count the number of values (including nulls and duplicates) in a group. Returns an integer - an exception will be thrown if a larger count is computed.

COUNT(x) Count the number of values (excluding nulls) in a group. Returns an integer - an exception will be thrown if a larger count is computed.

COUNT_BIG(*) Count the number of values (including nulls and duplicates) in a group. Returns a long - an exception will be thrown if a larger count is computed.

COUNT_BIG(x) Count the number of values (excluding nulls) in a group. Returns a long - an exception will be thrown if a larger count is computed.

SUM(x) Sum of the values (excluding nulls) in a group.

AVG(x) Average of the values (excluding nulls) in a group.

MIN(x) Minimum value in a group (excluding null).

MAX(x) Maximum value in a group (excluding null).

ANY(x)/SOME(x) Returns TRUE if any value in the group is TRUE (excluding null).

EVERY(x) Returns TRUE if every value in the group is TRUE (excluding null).

VAR_POP(x) Biased variance (excluding null) logically equals(sum(x^2) - sum(x)^2/count(x))/count(x); returns a double; null if count = 0.

VAR_SAMP(x) Sample variance (excluding null) logically equals(sum(x^2) - sum(x)^2/count(x))/(count(x) - 1); returns a double; null if count < 2.

STDDEV_POP(x) Standard deviation (excluding null) logically equals SQRT(VAR_POP(x)).

STDDEV_SAMP(x) Sample standard deviation (excluding null) logically equals SQRT(VAR_SAMP(x)).

Case and searched case

To include conditional logic in a scalar expression, you can use the following two forms of the CASE expression:

  • CASE <expr> ( WHEN <expr> THEN <expr>)+ [ELSE expr] END
  • CASE ( WHEN <criteria> THEN <expr>)+ [ELSE expr] END

Each form allows for an output based on conditional logic. The first form starts with an initial expression and evaluates WHEN expressions until the values match, and outputs the THEN expression. If no WHEN is matched, the ELSE expression is output. If no WHEN is matched and no ELSE is specified, a null literal value is output.

The second form (the searched case expression) searches the WHEN clauses, which specify an arbitrary criteria to evaluate. If any criteria evaluates to true, the THEN expression is evaluated and output. If no WHEN is true, the ELSE is evaluated or NULL is output if none exists.

Examples

SELECT CASE columnA WHEN '10' THEN 'ten' WHEN '20' THEN 'twenty' END AS myExample
 
SELECT CASE WHEN columnA = '10' THEN 'ten' WHEN columnA = '20' THEN 'twenty' END AS myExample

Arrays

Empty arrays

()
(,)
ARRAY[]

Single element array

(expr,)
ARRAY[expr]

General array syntax

(expr, expr ... [,])
ARRAY[expr, ...]

If all of the elements in the array are of the same type, the array will be of a matching base type. If the elements types differ the array base type, it will be of object.

An array element reference takes the form of:

array_expr[index_expr]

index_expr must resolve to an integer value. This syntax is effectively the same as the array_get system function and expects 1-based indexing.

Criteria

Criteria can be any of the following items:

  • Predicates that evaluate to true or false.
  • Logical criteria that combine criteria (AND, OR, NOT).
  • A value expression of type boolean.

General usage:

criteria AND|OR criteria
NOT criteria
(criteria)
expression (=|<>|!=|<|>|<=|>=) (expression|((ANY|ALL|SOME) subquery|(array_expression)))
expression IS [NOT] DISTINCT FROM expression
expression [NOT] IS NULL
expression [NOT] IN (expression [,expression]*)|subquery
expression [NOT] LIKE pattern [ESCAPE char]

LIKE matches the string expression against the given string pattern. The pattern may contain % to match any number of characters, and _ to match any single character. The escape character can be used to escape the match characters % and _.

expression [NOT] SIMILAR TO pattern [ESCAPE char]

SIMILAR TO is a cross between LIKE and standard regular expression syntax. % and _ are still used, rather than .* and ., respectively.

💡

The Engine does not exhaustively validate SIMILAR TO pattern values. Instead, the pattern is converted to an equivalent regular expression. Do not rely on general regular expression features when using SIMILAR TO. If additional features are needed, use LIKE_REGEX. Avoid the use of non-literal patterns, because the Engine has a limited ability to process SQL pushdown predicates.

expression [NOT] LIKE_REGEX pattern
EXISTS (subquery)
expression [NOT] BETWEEN minExpression AND maxExpression

The Engine converts BETWEEN into the equivalent form expression >= minExpression AND expression ⇐ maxExpression.

expression

Where expression has type boolean.

Syntax rules The precedence ordering from lowest to highest is comparison, NOT, AND and OR.

Criteria nested by parenthesis will be logically evaluated prior to evaluating the parent criteria.

Some examples of valid criteria are:

  • (requests > 2500.0)
  • 100*(50 - x)/(25 - y) > z
  • concat(areaCode,concat('-',phone)) LIKE '314%1'
💡

null values represent an unknown value. Comparison with a null value will evaluate to unknown, which can never be true even if not is used.

Criteria precedence The Engine parses and evaluates conditions with higher precedence before those with lower precedence. Conditions with equal precedence are left-associative. The following table lists condition precedence from high to low:

ConditionDescription
SQL OperatorsSee here
EXISTS, LIKE, SIMILAR TO, LIKE_REGEX, BETWEEN, IN, IS NULL, IS DISTINCT, <, , >, >=, =, <>Comparison
NOTNegation
ANDConjunction
ORDisjunction
💡

To prevent lookaheads, the parser does not accept all possible criteria sequences. For example, a = b is null is not accepted, because by the left-associative parsing we first recognize a =, then look for a common value expression. b is null is not a valid common value expression. Thus, nesting must be used, for example, (a = b) is null. For more information about parsing rules, see

Scalar Functions

Numeric Functions

FunctionDescriptionDatatype constraint
+ - * /Standard numeric operatorsx of [integer, long, float, double, biginteger, bigdecimal], returns same as x [a]
ABS(x)Absolute valueSee standard numeric operators above
ACOS(x)Arc cosinex of [double, bigdecimal], returns double
ASIN(x)Arc sinex of [double, bigdecimal], returns double
ATAN(x)Arc tangentx in [double, bigdecimal], returns double
ATAN2(x, y)Arc tangentx in [double, bigdecimal], returns double
CEILING(x)Ceilingx in [double, float], returns double
COS(x)Cosinex in [double, bigdecimal], returns double
COT(x)Cotangentx in [double, bigdecimal], returns double
DEGREES(x)Convert degrees to radiansx in [double, bigdecimal], returns double
EXP(x)e^xx in [double, float], returns double
FLOOR(x)Floorx in [double, float], returns double
FORMATBIGDECIMAL(x, y)Formats x using format yx is bigdecimal, y is string, returns string
FORMATBIGINTEGER(x, y)Formats x using format yx is biginteger, y is string, returns string
FORMATDOUBLE(x, y)Formats x using format yx is double, y is string, returns string
FORMATFLOAT(x, y)Formats x using format yx is float, y is string, returns string
FORMATINTEGER(x, y)Formats x using format yx is integer, y is string, returns string
FORMATLONG(x, y)Formats x using format yx is long, y is string, returns string
LOG(x)Natural log (base e)x in [double, float], returns double
LOG10(x)Log (base 10)x in [double, float], returns double
MOD(x, y)Modulus (remainder of x / y)x in [integer, long, float, double, biginteger, bigdecimal], returns same as x
PARSEBIGDECIMAL(x, y)Parses x using format yx, y of strings, returns bigdecimal
PARSEBIGINTEGER(x, y)Parses x using format yx, y of strings, returns biginteger
PARSEDOUBLE(x, y)Parses x using format yx, y of strings, returns double
PARSEFLOAT(x, y)Parses x using format yx, y of strings, returns float
PARSEINTEGER(x, y)Parses x using format yx, y of strings, returns integer
PARSELONG(x, y)Parses x using format yx, y of strings, returns long
POWER(x, y)x to the y powerx in [double, bigdecimal, biginteger], return is the same type as x
RADIANS(x)Convert radians to degreesx in [double, bigdecimal], returns double
RAND()Returns a random number, using generator established so far in the query or initializing with system clock if necessary.Returns double.
RAND(x)Returns a random number, using new generator seeded with x. This should typically be called in an initialization query. It will only effect the random values returned by the RAND function and not the values from RAND functions evaluated by sources.x is integer, returns double.
ROUND(x, y)Round x to y places; negative values of y indicate places to the left of the decimal pointx in [integer, float, double, bigdecimal] y is integer, return is same type as x.
SIGN(x)1 if x > 0, 0 if x = 0, -1 if x < 0x in [integer, long, float, double, biginteger, bigdecimal], returns integer
SIN(x)Sinex in [double, bigdecimal], returns double
SQRT(x)Square rootx in [long, double, bigdecimal], returns double
TAN(x)Tangentx in [double, bigdecimal], returns double
BITAND(x, y)Bitwise ANDx, y in [integer], returns integer
BITOR(x, y)Bitwise ORx, y in [integer], returns integer
BITXOR(x, y)Bitwise XORx, y in [integer], returns integer
BITNOT(x)Bitwise NOTx in [integer], returns integer

String Functions

Unless specified, all of the arguments and return types in the following table are strings and all indexes are 1-based. The 0 index is considered to be before the start of the string.

FunctionDescriptionDatatype constraint
x || yConcatenation operatorx,y in [string, clob], return type is string or character large object (CLOB).
ASCII(x)Provide ASCII value of the left most character[1] in x. The empty string will as input will return null.returns integer.
CHR(x) CHAR(x)Provide the character[1] for ASCII value x [a].x of integer [1] For the implementations of the ASCII and CHR functions, characters are limited to UCS2 values only. For pushdown there is little consistency among sources for character values beyond character code 255.
CONCAT(x, y)Concatenates x and y with ANSI semantics. If x and/or y is null, returns null.x, y string.
CONCAT2(x, y)Concatenates x and y with non-ANSI null semantics. If x and y is null, returns null. If only x or y is null, returns the other value.x, y string.
ENDSWITH(x, y)Checks if y ends with x. If x or y are null, returns null.x, y string returns boolean.
INITCAP(x)Capitalizes first letter of each word in string x while the remaining in lowercase.x string.
INSERT(str1, start, length, str2)Insert str2 into str1str1 string, start integer, length integer, str2 string
LCASE(x)Lowercasex string
LEFT(x, y)Get left y characters of xx string, y integer, returns string
LENGTH(x) Length of a stringx string, returns integer
LOCATE(x, y)Find position of x in y starting at beginning of y.x string, y integer, returns integer
LOCATE(x, y, z)Find position of x in y starting at z.x string, y string, z integer, returns integer
LPAD(x, y)Pad input string x with spaces on the left to the length of y.x string, y integer, returns string
LPAD(x, y, z)Pad input string x on the left to the length of y using character z.x string, y string, z character, returns string
LEFT(x, y)Get left y characters of xx string, y integer, returns string
LTRIM(x)Left trim x of blank chars.x string, returns string
QUERYSTRING(path [, expr [AS name] …])Returns a properly encoded query string appended to the given path. Null valued expressions are omitted. Names are optional for column reference expressions. For example, QUERYSTRING('path', 'value' as "&x", ' & ' as y, null as z) returns 'path?%26x=value&y=%20%26%20'.path, expr string, name is an identifier.
REPEAT(str1, instances)Repeats str1 a specified number of times.str1 string, instances integer, returns string.
RIGHT(x, y)Gets right y characters of x.x string, y integer, returns string
RPAD(input string x, pad length y)Pad input string x with spaces on the right to the length of y.x string, y integer, returns string
RPAD(x, y, z)Pad input string x on the right to the length of y using character z.x string, y string, z character, returns string
RTRIM(x)Right trim x of blank chars.x string, returns string
TRIM([[LEADING|TRAILING|BOTH] [x] FROM] y)Trim the leading, trailing, or both ends of a string y of character x. If LEADING/TRAILING/BOTH is not specified, BOTH is used. If no trim character x is specified, then a blank space is used.x character, y string
SPACE(x)Repeats the space character x number of timesx integer, returns string
SUBSTRING(x, y)Gets substring of x, from position y to the end of xx string, returns integer
SUBSTRING(x, y, z)Gets substring of x from position y with length zx string, y integer, z integer, returns integer
UCASE(x)Uppercasex string, returns integer
UNESCAPE(x)Unescaped of x. Possible escape sequences are \b - backspace, \t - tab, \n - line feed, \f - form feed, \r - carriage return. \uXXXX, where X is a hex value, can be used to specify any unicode character. \XXX, where X is an octal digit, can be used to specify an octal byte value. If any other character appears after an escape character, that character will appear in the output and the escape character will be ignored.x string, returns string
REPLACE(x, y, z)Replace all occurrences of a given string with another.x, y, z string, returns string
REGEXP_REPLACE(str, pattern, sub [, flags])Replace one or more occurrences of pattern with sub in str. Flags:
g: Replaces all occurrences, not just the first.
m: Match over multiple lines.
i: Match without case sensitivity.
str, pattern, sub string, returns string

Type Conversion Functions

FunctionDescription
CONVERT(x, type)Converts x to type, where type is a standard data type.
CAST(x AS type)Identical to CONVERT
💡

Options that are specified on the type, such as length, precision, scale, etc., are effectively ignored - the Engine is simply converting from one object type to another.

Data and Time Functions

FunctionDescriptionDatatype Constraint
CURDATE() / CURRENT_DATE()Return current date - same value for all invocations in the user command.Returns date
CURTIME()Return current time - same value for all invocations in the user command. See also CURRENT_TIME.Returns time
NOW()Return current timestamp (date and time with millisecond precision) - same value for all invocations in the user command.Returns timestamp
CURRENT_TIME[(precision)]Return current time. The precision argument is ignored, behaving the same as CURTIME() without a precision argument.Returns time
CURRENT_TIMESTAMP[(precision)]Return current timestamp with millisecond precision. Behaves like NOW() without a precision argument.Returns timestamp
DAYNAME(x)Return name of day in the default locale.x in timestamp, returns string
DAYOFMONTH(x)Return day of the month.x in timestamp, returns integer
DAYOFWEEK(x)Return day of the week (Sunday=1, Saturday=7).x in timestamp, returns integer
DAYOFYEAR(x)Return day number in the year.x in timestamp, returns integer
EPOCH(x)Return seconds since the Unix epoch with microsecond precision.x in timestamp, returns double
EXTRACT(FIELD FROM x)Return the specified FIELD from the date value x. Fields include YEAR, MONTH, DAYOFMONTH, HOUR, MINUTE, SECOND, EPOCH, etc.x in timestamp, returns integer or double
FORMATDATE(x, y)Format the date x using the format y.x is date, y is string, returns string
FORMATTIME(x, y)Format the time x using the format y.x is time, y is string, returns string
FORMATTIMESTAMP(x, y)Format the timestamp x using the format y.x is timestamp, y is string, returns string
FROM_MILLIS(millis)Return the timestamp value for the given milliseconds.Long UTC timestamp in milliseconds
FROM_UNIXTIME(unix_timestamp)Return the Unix timestamp as a string in the default format yyyy/mm/dd hh:mm:ss.Long Unix timestamp (in seconds)
HOUR(x)Return hour (in 24-hour format).x in timestamp, returns integer
MINUTE(x)Return minute.x in timestamp, returns integer
MODIFYTIMEZONE(timestamp, startTimeZone, endTimeZone)Return a timestamp adjusted for the difference between startTimeZone and endTimeZone.Strings for time zones, returns timestamp
MONTH(x)Return month.x in timestamp, returns integer
MONTHNAME(x)Return name of the month in the default locale.x in timestamp, returns string
PARSEDATE(x, y)Parse date from x using format y.x, y are string, returns date
PARSETIME(x, y)Parse time from x using format y.x, y are string, returns time
PARSETIMESTAMP(x, y)Parse timestamp from x using format y.x, y are string, returns timestamp
QUARTER(x)Return the quarter.x in timestamp, returns integer
SECOND(x)Return seconds.x in timestamp, returns integer
TIMESTAMPCREATE(date, time)Create a timestamp from a date and a time.date is date, time is time, returns timestamp
TO_MILLIS(timestamp)Return the UTC timestamp in milliseconds.Timestamp value
UNIX_TIMESTAMP(unix_timestamp)Return the Unix timestamp (in seconds).unix_timestamp as string, returns long
WEEK(x)Return the week of the year (1-53).x in timestamp, returns integer
YEAR(x)Return the four-digit year.x in timestamp, returns integer

Decode Functions

FunctionDescriptionDatatype Constraint
DECODESTRING(x, y [, z])Decode column x using a string of value pairs y with an optional delimiter z and return the decoded column as a string. If a delimiter is not specified, a comma (,) is used. y has the format SearchDelimResultDelimSearchDelimResult[DelimDefault]. Returns Default if specified, or x if there are no matches.All parameters and return type are string.
DECODEINTEGER(x, y [, z])Decode column x using a string of value pairs y with an optional delimiter z and return the decoded column as an integer. If a delimiter is not specified, a comma (,) is used. y has the format SearchDelimResultDelimSearchDelimResult[DelimDefault]. Returns Default if specified, or x if there are no matches.All parameters are string; returns integer.

Document Functions v24.5.3+

FunctionDescription
jsonParse(value, skipValidation)Parses a value (CLOB or BLOB) and returns a json object. skipValidation indicates whether pre-parsing validation of value should be skipped.
jsonObject(c1, c2, ..., cn)Returns a json object based on a list of columns. The resulting object contains fields in the form of column_name: column_value.
yamlAsJSON(value)Converts value (CLOB), which is expected to be a valid YAML document, into a json object.
jsonPath(value, path)Evaluates a JsonPath (opens in a new tab) expression on value, which is expected to be a valid json document (string or CLOB). Always returns a json document, even if it’s a single value. In such cases, transform it to other types as needed.
jsonJq(value, query)Applies a jq (opens in a new tab) query to value, which is expected to be a valid json document (string or CLOB). Always returns a json document, even if it’s a single value. In such cases, transform it to other types as needed.
jsonPathAsString(value, path)Same as jsonPath but returns a string instead of a json object.
jsonJqAsString(value, query)Same as jsonJq but returns a string instead of a json object.

Query Planner Limitations on WHERE Clauses with json Objects

As of version v24.5.3, the query planner is not yet optimized to perform efficient comparisons on json objects in the WHERE clause, even when the value expression is within a type conversion function. Consider the following example:

SELECT * 
FROM 
	kube.EVENT 
WHERE 
	convert(jsonPath(regarding, '$.uid'), string) = '7e613d43'

Assuming that such a uid exists, we expect the query planner to drop the entire query due to its current limitations in creating an optimized node based on json objects.

For these cases, you should use jsonPathAsString or jsonJqAsString instead, as shown below:

SELECT * 
FROM 
	kube.EVENT 
WHERE 
	jsonPathAsString(regarding, '$.uid') = '7e613d43'

Choice functions

FunctionDescriptionDatatype constraint
COALESCE(x, y1, y2,...,yn)Returns the first non-null parameter.x and y can be any compatible types.
IFNULL(x,y)If x is null, return y; else return x.x, y of any type. However, that type must be the same for both.
NVL(x,y)If x is null, return y; else return x.x, y of any type. However, that type must be the same for both.
NULLIF(param1, param2)Equivalent to case when (param1 = param2) then null else param1.param1 and param2 must be comparable types.

SELECT command

The SELECT command is used to retrieve records for any number of relations.

A SELECT command can contain the following clauses:

Except for the OPTION clause, all of the preceding clauses are defined by the SQL specification. The specification also tells the order in which these clauses are logically processed. Processing occurs in stages, with each stage passing a set of rows to the following stage. The processing model is logical, and does not represent the way that a Engine performs the actual processing, but it is a useful model for understanding how SQL works.
The SELECT command processes clauses in the following stages:

1. WITH clause

Gathers all rows from all with items in the order listed. Subsequent WITH items and the main query can reference a WITH item as if it were a table.

2. FROM clause

Gathers all rows from all tables involved in the query and logically joins them with a Cartesian product to produce a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.

3. WHERE clause

Applies a criteria to every output row from the FROM stage, further reducing the number of rows.

4. GROUP BY clause

Groups sets of rows with matching values in the GROUP BY columns.

5. HAVING clause

Applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).

6. SELECT clause

Specifies the column expressions that should be returned from the query. Expressions are evaluated, including aggregate functions that are based on the groups of rows, which will no longer exist after this point. The output columns are named using either column aliases or an implicit name determined by the engine. If SELECT DISTINCT is specified, duplicate removal is performed on the rows being returned from the SELECT stage.

7. ORDER BY clause

Sorts the rows returned from the SELECT stage as desired. Supports sorting on multiple columns in specified order, ascending or descending. The output columns will be identical to those columns returned from the SELECT stage and will have the same name.

8. LIMIT clause

Returns only the specified rows (with skip and limit values).

WITH clause

The Engine provides access to common table expressions via the WITH clause. You can reference WITH clause items as tables in subsequent WITH clause items, and in the main query. You can think of the WITH clause as providing query-scoped temporary tables.

Usage:

WITH name [(column, ...)] AS [/*+ no_inline|materialize */] (query expression) ...

Syntax rules:

  • All of the projected column names must be unique. If they are not unique, then the column name list must be provided.
  • If the columns of the WITH clause item are declared, then they must match the number of columns projected by the query expression.
  • Each WITH clause item must have a unique name.
  • The optional no_inline hint indicates to the QP optimizer that the query expression should not be substituted as an inline view where referenced. It is possible with no_inline for multiple evaluations of the common table as needed by source queries.
  • The optional materialize hint requires that the common table be created as a temporary table. This forces a single evaluation of the common table.
💡

The WITH clause is also subject to optimization and its entries might not be processed if they are not needed in the subsequent query.

💡

Common tables are aggressively inlined to enhance the possibility of pushdown. If a common table is only referenced a single time in the main query, it is likely to be inlined. In some situations, such as when you use a common table to prevent n-many-processing of a non-pushdown, correlated subquery, you might need to include the no_inline or materialize hint.

Examples:

WITH n (x) AS (select col from tbl) select x from n, n as n1
WITH n (x) AS /*+ no_inline */ (select col from tbl) select x from n, n as n1

FROM clause

The FROM clause specifies the target tables for SELECT, UPDATE, and DELETE statements.

Example Syntax:

FROM table [[AS] alias]
FROM table1 [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table2 ON join-criteria
FROM table1 CROSS JOIN table2
FROM (subquery) [AS] alias
FROM TABLE(subquery) [AS] alias

FROM clause hints

FROM clause hints are typically specified in a comment block preceding the affected clause. MAKEDEP and MAKENOTDEP may also appear after in non-comment form after the affected clause. If multiple hints apply to that clause, the hints should be placed in the same comment block.

Example:

FROM /*+ MAKEDEP PRESERVE */ (tbl1 inner join tbl2 inner join tbl3 on tbl2.col1 = tbl3.col1 on tbl1.col1 = tbl2.col1), tbl3 WHERE tbl1.col1 = tbl2.col1

Dependent JOIN hints

MAKEIND, MAKEDEP, and MAKENOTDEP are hints that you can use to control dependent join behavior. Use them only in situations where the optimizer does not choose the most optimal plan based upon query structure, metadata, and costing information. The hints can appear in a comment that follows the FROM keyword. The hints can be specified against any FROM clause, not just a named table.

MAKEIND

Indicates that the clause should be the independent (feeder) side of a dependent join.

MAKEDEP

Indicates that the clause should be the dependent (filtered) side of a join.

MAKENOTDEP

Prevents the clause from being the dependent (filtered) side of a join.

MAKEDEP(JOIN)

Indicates that the entire JOIN should be pushed.

MAKEDEP(NO JOIN)

Indicates that the entire JOIN should not be pushed.

MAKEDEP(MAX:val)

Indicates that the dependent JOIN should only be performed if there are less than the maximum number of values from the independent side.

💡

The above-mentioned hints become a crucial part of the Kubling's implementation success. Continue reading to discover why!

When working with a poor or absent cardinality, which is very frequent in the case of Script Modules since the systems they interact with exchanges JSON documents, it is not possible for the query planner to determine what to get first, sometimes making queries to not work properly or even failing, due to the fact that the Engine will try to perform a TABLE full scan.

Since the concept is a bit complex, let's use the following example, which is actually a part of the samples repo:

  SELECT
    a.NAME as app_name,
    a.ID as app_id,
    c.ID as component_id, 
    c.NAME as component_name, 
    r.html_url as repo_location,
    cd.DEPLOYMENT_IDENTIFIER,
    cd.environment,
    deploy.clusterName as cluster,
    deploy.status__replicas  as desired_replicas,
    deploy.status__availableReplicas as available_replicas,
    deploy.status__readyReplicas as ready_replicas
  FROM
    app_db.APPLICATION a
    LEFT JOIN app_db.COMPONENT c ON c.APPLICATION_ID = a.ID
    LEFT JOIN COMPONENT_CODE_REPO_JOIN cc ON cc.COMPONENT_ID = c.ID
    LEFT JOIN app_db.COMPONENT_DEPLOYMENT_JOIN cd ON cd.COMPONENT_ID = c.ID
    LEFT JOIN /*+ makedep */ github.CODE_REPO r ON (r.org = cc.CODE_REPO_ORG AND r.name = cc.CODE_REPO_NAME)
    LEFT JOIN /*+ makedep */ k8s.DEPLOYMENT deploy  ON (deploy.identifier = cd.DEPLOYMENT_IDENTIFIER)
  WHERE a.NAME='{{ app_name }}'

This query runs on a VDB that has 5 Data Sources:

  • 1 POSTGRESQL
  • 2 KUBERNETES
  • 1 SCRIPT_DOCUMENT_JS
  • 1 aggregator

The query above gets all applications registered in the POSTGRESQL, with some extra information, like their GitHub repositories (SCRIPT_DOCUMENT_JS) and their deployment clusters (KUBERNETES).

In our context we support multiple organizations but, by design, github.CODE_REPO forces to filter, at least, by one organization since the API works that way.
Due to the fact that the QP does not see any relationship between github.CODE_REPO and cc.CODE_REPO_ORG, it may decide to run a full scan over github.CODE_REPO and then apply the criteria. However, the full scan will never complete since github.CODE_REPO will throw an error due to the lack of the organization in the WHERE clause.

By using the /*+ makedep */ hint, we ensure that clause depends on the filter side of the join, in other words, the QP will first get the list of cc.CODE_REPO_ORG and cc.CODE_REPO_NAME before generating a query to route to github.CODE_REPO.
Therefore, your script will receive a standard SELECT with a WHERE clause in the form of IN or OR list.

ℹ️

This is a well-known problem in federated database systems, which solve the problem using decomposition or pushdown.

WHERE clause

The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE statements.
See its general form.

GROUP BY clause

The GROUP BY clause denotes that rows should be grouped according to the specified expression values. One row is returned for each group, after optionally filtering those aggregate rows based on a HAVING clause.

The general form of the GROUP BY:

GROUP BY expression [,expression]*
GROUP BY ROLLUP(expression [,expression]*)

Syntax Rules:

  • Column references in the group by cannot be made to alias names in the SELECT clause.
  • Expressions used in the group by must appear in the select clause.
  • Column references and expressions in the SELECT/HAVING/ORDER BY clauses that are not used in the group by clause must appear in aggregate functions.
  • If an aggregate function is used in the SELECT clause and no GROUP BY is specified, an implicit GROUP BY will be performed with the entire result set as a single group. In this case, every column in the SELECT must be an aggregate function as no other column value will be fixed across the entire group.
  • The GROUP BY columns must be of a comparable type.

HAVING clause

The HAVING clause operates exactly as the WHERE clause, although it operates on the output of a GROUP BY. You can use the same syntax with the HAVING clause as with the WHERE clause.

Syntax Rule:

  • Expressions used in the GROUP BY clause must contain either an aggregate function (COUNT, AVG, SUM, MIN, MAX), or be one of the grouping expressions.

SELECT clause

Queries that start with the SELECT keyword and are often referred to as SELECT statements. Usage:

SELECT [DISTINCT|ALL] ((expression [[AS] name])|(group identifier.STAR))*|STAR ...

Syntax Rules:

  • Aliased expressions are only used as the output column names and in the ORDER BY clause. They cannot be used in other clauses of the query.
  • DISTINCT may only be specified if the SELECT symbols are comparable.

ORDER BY clause

The ORDER BY clause specifies how records are sorted. The options are ASC (ascending) or DESC (descending).

Usage:

ORDER BY expression [ASC|DESC] [NULLS (FIRST|LAST)], ...

Syntax rules:

  • Sort columns can be specified by SELECT clause alias name, by SELECT clause expression, or by an unrelated expression.
  • Column references can appear in the SELECT clause as the expression for an aliased column, or can reference columns from tables in the FROM clause. If the column reference is not in the SELECT clause, the query cannot be a set operation, specify SELECT DISTINCT, or contain a GROUP BY clause.
  • Unrelated expressions, expressions not appearing as an aliased expression in the select clause, are allowed in the ORDER BY clause of a non-set QUERY. The columns referenced in the expression must come from the from clause table references. The column references cannot be to alias names.
  • The ORDER BY columns must be of a comparable type.
  • If an ORDER BY is used in an inline view or view definition without a LIMIT clause, it is removed by the query optimizer.
  • If NULLS FIRST/LAST is specified, then nulls are guaranteed to be sorted either first or last. If the null ordering is not specified, then results will typically be sorted with nulls as low values, which is the default internal sorting behavior. However, not all data Sources return results with nulls sorted as low values by default, and the Engine might return results with different null orderings.

LIMIT clause

The LIMIT clause specifies a limit on the number of records returned by the SELECT command. An optional offset can be specified (the number of rows to skip). The LIMIT clause can also be specified using the SQL 2008 OFFSET/FETCH FIRST clauses. If an ORDER BY is also specified, it will be applied before the OFFSET/LIMIT are applied. If an ORDER BY is not specified there is generally no guarantee what subset of rows will be returned.

Usage:

LIMIT [offset,] limit
LIMIT limit OFFSET offset
[OFFSET offset ROW|ROWS] [FETCH FIRST|NEXT [limit] ROW|ROWS ONLY]

Syntax rules:

  • The LIMIT/OFFSET expressions must be a non-negative integer or a parameter reference (?). An offset of 0 is ignored. A limit of 0 returns no rows.
  • The terms FIRST/NEXT are interchangeable as well as ROW/ROWS.
  • The LIMIT clause can take an optional preceding NON_STRICT hint to indicate that pushdown operations should not be inhibited, even if the results are not consistent with the logical application of the limit.
    The hint is only needed on unordered limits, for example, SELECT * FROM VW /*+ NON_STRICT */ LIMIT 2.

INTO clause

When the INTO clause is specified with a SELECT, the results of the query are inserted into the specified table.
This is often used to copy records from one Data Source to another, using an as-is approach or applying transforming functions to the original value. In some specific circumstances, it is also useful to insert records into a temporary table, although in Kubling we discourage the use of local storage.
The INTO clause immediately precedes the FROM clause.

Usage:

INTO [TABLE] FROM...

Syntax rules:

  • The INTO clause is logically applied last in processing, after the ORDER BY and LIMIT clauses.
  • In the SELECT INTO the target of the INTO clause is a table where the result of the SELECT command will be inserted.
  • SELECT INTO can't be combined with a UNION query.

Example:

SELECT col1, col2 INTO [TABLE_TABLE] FROM [SOURCE_TABLE]