Queries
SQL compatibility
Kubling provides most of the functionality of SQL-92 DML. The idea of this document is not to cover SQL exhaustively, but rather highlights how SQL is used in Kubling.
Identifiers
SQL commands contain references to tables and columns.
These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command.
All queries are processed in the context of a virtual database, or VDB
.
Because information can be federated across multiple sources, tables and columns must be scoped in some manner to avoid conflicts.
This scoping is provided by schemas, which contain the information for each data source or set of views.
Fully-qualified table and column names are of the following form, where the separate `parts' of the identifier are delimited by periods.
TABLE
: <schema_name>.<table_spec>COLUMN
: <schema_name>.<table_spec>.<column_name>
Syntax
-
Identifiers can consist of alphanumeric characters, or the underscore (_) character, and must begin with an alphabetic character. Any Unicode character may be used in an identifier.
-
Identifiers in double quotes can have any contents. The double quote character can be used if is escaped with an additional double quote; for example,
"some "" id"
. -
Because different data sources organize tables in different ways, with some prepending catalog, schema, or user information, Kubling allows table specification to be a dot-delimited construct.
-
Columns, column aliases, and schemas cannot contain a dot (.) character.
-
Identifiers, even when quoted, are not case-sensitive in Kubling.
Examples of valid, fully-qualified TABLE
identifiers are:
Kubernetes.DEPLOYMENTS
"Kubernetes.Deployments"
Examples of valid, fully-qualified COLUMN
identifiers are:
Kubernetes.DEPLOYMENTS.metadata__name
"Kubernetes.Deployments"."metadata__name"
Operator precedence
Operator | Description |
---|---|
[] | Array element reference |
+ , - | positive/negative value expression |
* , / | multiplication/division |
+ , - | addition/subtraction |
|| | concat |
criteria |
Expressions
Identifiers, literals, and functions can be combined into expressions.
Expressions can be used in a query with nearly any keyword, including SELECT
, FROM
(if specifying join criteria), WHERE
, GROUP BY
, HAVING
, or ORDER BY
.
Column Identifiers
Column identifiers are used to specify the output columns in SELECT
statements, the columns and their values for INSERT
and UPDATE
statements,
and criteria used in WHERE
and FROM
clauses. They are also used in GROUP BY
, HAVING
, and ORDER BY
clauses. The syntax for column identifiers
was defined in the Identifiers section above.
Literals
Literal values represent fixed values. These can be any of the 'standard' data types. For information about data types, see Data Types..
Syntax Rules
- Integer values will be assigned an integral data type big enough to hold the value (
integer
,long
, orbiginteger
). - Floating point values will always be parsed as a double.
- The keyword
null
is used to represent an absent or unknown value and is inherently untyped. In many cases, anull
literal value will be assigned an implied type based on context. For example, in the function5 +
null, the
nullvalue will be assigned the type `integer` to match the type of the value `5`. A
null`` literal used in theSELECT
clause of a query with no implied context will be assigned to typestring
.
Examples
'abc'
'isn"t true'
5
-37.75e01
true
false
Unicode character:
'\u0027'
Binary:
X'0F0A'
Aggregate functions
Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group.
COUNT(*)
Count the number of values (including null
s and duplicates) in a group. Returns an integer - an exception will be thrown if a larger count is computed.
COUNT(x)
Count the number of values (excluding null
s) in a group. Returns an integer - an exception will be thrown if a larger count is computed.
COUNT_BIG(*)
Count the number of values (including null
s and duplicates) in a group. Returns a long - an exception will be thrown if a larger count is computed.
COUNT_BIG(x)
Count the number of values (excluding null
s) in a group. Returns a long - an exception will be thrown if a larger count is computed.
SUM(x)
Sum of the values (excluding null
s) in a group.
AVG(x)
Average of the values (excluding null
s) in a group.
MIN(x)
Minimum value in a group (excluding null
).
MAX(x)
Maximum value in a group (excluding null
).
ANY(x)
/SOME(x)
Returns TRUE if any value in the group is TRUE (excluding null
).
EVERY(x)
Returns TRUE if every value in the group is TRUE (excluding null
).
VAR_POP(x)
Biased variance (excluding null
) logically equals(sum(x^2) - sum(x)^2/count(x))/count(x); returns a double; null
if count = 0.
VAR_SAMP(x)
Sample variance (excluding null
) logically equals(sum(x^2) - sum(x)^2/count(x))/(count(x) - 1); returns a double; null
if count < 2.
STDDEV_POP(x)
Standard deviation (excluding null
) logically equals SQRT(VAR_POP(x)).
STDDEV_SAMP(x)
Sample standard deviation (excluding null
) logically equals SQRT(VAR_SAMP(x)).
Case and searched case
To include conditional logic in a scalar expression, you can use the following two forms of the CASE
expression:
- CASE <expr> ( WHEN <expr> THEN <expr>)+ [ELSE expr] END
- CASE ( WHEN <criteria> THEN <expr>)+ [ELSE expr] END
Each form allows for an output based on conditional logic.
The first form starts with an initial expression and evaluates WHEN
expressions until the values match, and outputs the THEN
expression.
If no WHEN
is matched, the ELSE
expression is output. If no WHEN
is matched and no ELSE
is specified, a null
literal value is output.
The second form (the searched case expression) searches the WHEN
clauses, which specify an arbitrary criteria to evaluate.
If any criteria evaluates to true
, the THEN
expression is evaluated and output. If no WHEN
is true
, the ELSE
is evaluated or NULL
is output if none exists.
Examples
SELECT CASE columnA WHEN '10' THEN 'ten' WHEN '20' THEN 'twenty' END AS myExample
SELECT CASE WHEN columnA = '10' THEN 'ten' WHEN columnA = '20' THEN 'twenty' END AS myExample
Arrays
Empty arrays
()
(,)
ARRAY[]
Single element array
(expr,)
ARRAY[expr]
General array syntax
(expr, expr ... [,])
ARRAY[expr, ...]
If all of the elements in the array are of the same type, the array will be of a matching base type.
If the elements types differ the array
base type, it will be of object
.
An array element reference takes the form of:
array_expr[index_expr]
index_expr
must resolve to an integer value. This syntax is effectively the same as the array_get
system function and expects 1-based indexing.
Criteria
Criteria can be any of the following items:
- Predicates that evaluate to
true
orfalse
. - Logical criteria that combine criteria (
AND
,OR
,NOT
). - A value expression of type
boolean
.
General usage:
criteria AND|OR criteria
NOT criteria
(criteria)
expression (=|<>|!=|<|>|<=|>=) (expression|((ANY|ALL|SOME) subquery|(array_expression)))
expression IS [NOT] DISTINCT FROM expression
expression [NOT] IS NULL
expression [NOT] IN (expression [,expression]*)|subquery
expression [NOT] LIKE pattern [ESCAPE char]
LIKE
matches the string expression against the given string pattern.
The pattern may contain %
to match any number of characters, and _
to match any single character.
The escape character can be used to escape the match characters %
and _
.
expression [NOT] SIMILAR TO pattern [ESCAPE char]
SIMILAR TO
is a cross between LIKE
and standard regular expression syntax. %
and _
are still used, rather than .*
and .,
respectively.
The Engine does not exhaustively validate SIMILAR TO
pattern values. Instead, the pattern is converted to an equivalent regular expression.
Do not rely on general regular expression features when using SIMILAR TO
. If additional features are needed, use LIKE_REGEX
.
Avoid the use of non-literal patterns, because the Engine has a limited ability to process SQL pushdown predicates.
expression [NOT] LIKE_REGEX pattern
EXISTS (subquery)
expression [NOT] BETWEEN minExpression AND maxExpression
The Engine converts BETWEEN
into the equivalent form expression >= minExpression AND expression ⇐ maxExpression
.
expression
Where expression
has type boolean
.
Syntax rules
The precedence ordering from lowest to highest is comparison, NOT
, AND
and OR
.
Criteria nested by parenthesis will be logically evaluated prior to evaluating the parent criteria.
Some examples of valid criteria are:
(requests > 2500.0)
100*(50 - x)/(25 - y) > z
concat(areaCode,concat('-',phone)) LIKE '314%1'
null
values represent an unknown value. Comparison with a null
value will evaluate to unknown, which can never be true even if not is used.
Criteria precedence The Engine parses and evaluates conditions with higher precedence before those with lower precedence. Conditions with equal precedence are left-associative. The following table lists condition precedence from high to low:
Condition | Description |
---|---|
SQL Operators | See here |
EXISTS , LIKE , SIMILAR TO , LIKE_REGEX , BETWEEN , IN , IS NULL , IS DISTINCT , <, ⇐ , > , >= , = , <> | Comparison |
NOT | Negation |
AND | Conjunction |
OR | Disjunction |
To prevent lookaheads, the parser does not accept all possible criteria sequences.
For example, a = b
is null
is not accepted, because by the left-associative parsing we first recognize a =
, then look for a common value expression.
b
is null
is not a valid common value expression. Thus, nesting must be used, for example, (a = b)
is null
. For more information about parsing rules, see
Scalar Functions
Numeric Functions
Function | Description | Datatype constraint |
---|---|---|
+ - * / | Standard numeric operators | x of [integer , long , float , double , biginteger , bigdecimal ], returns same as x [a] |
ABS(x) | Absolute value | See standard numeric operators above |
ACOS(x) | Arc cosine | x of [double , bigdecimal ], returns double |
ASIN(x) | Arc sine | x of [double , bigdecimal ], returns double |
ATAN(x) | Arc tangent | x in [double , bigdecimal ], returns double |
ATAN2(x, y) | Arc tangent | x in [double , bigdecimal ], returns double |
CEILING(x) | Ceiling | x in [double , float ], returns double |
COS(x) | Cosine | x in [double , bigdecimal ], returns double |
COT(x) | Cotangent | x in [double , bigdecimal ], returns double |
DEGREES(x) | Convert degrees to radians | x in [double , bigdecimal ], returns double |
EXP(x) | e^x | x in [double , float ], returns double |
FLOOR(x) | Floor | x in [double , float ], returns double |
FORMATBIGDECIMAL(x, y) | Formats x using format y | x is bigdecimal , y is string , returns string |
FORMATBIGINTEGER(x, y) | Formats x using format y | x is biginteger , y is string , returns string |
FORMATDOUBLE(x, y) | Formats x using format y | x is double , y is string , returns string |
FORMATFLOAT(x, y) | Formats x using format y | x is float , y is string , returns string |
FORMATINTEGER(x, y) | Formats x using format y | x is integer , y is string , returns string |
FORMATLONG(x, y) | Formats x using format y | x is long , y is string , returns string |
LOG(x) | Natural log (base e) | x in [double , float ], returns double |
LOG10(x) | Log (base 10) | x in [double , float ], returns double |
MOD(x, y) | Modulus (remainder of x / y) | x in [integer , long , float , double , biginteger , bigdecimal ], returns same as x |
PARSEBIGDECIMAL(x, y) | Parses x using format y | x, y of string s, returns bigdecimal |
PARSEBIGINTEGER(x, y) | Parses x using format y | x, y of string s, returns biginteger |
PARSEDOUBLE(x, y) | Parses x using format y | x, y of string s, returns double |
PARSEFLOAT(x, y) | Parses x using format y | x, y of string s, returns float |
PARSEINTEGER(x, y) | Parses x using format y | x, y of string s, returns integer |
PARSELONG(x, y) | Parses x using format y | x, y of string s, returns long |
POWER(x, y) | x to the y power | x in [double , bigdecimal , biginteger ], return is the same type as x |
RADIANS(x) | Convert radians to degrees | x in [double , bigdecimal ], returns double |
RAND() | Returns a random number, using generator established so far in the query or initializing with system clock if necessary. | Returns double . |
RAND(x) | Returns a random number, using new generator seeded with x. This should typically be called in an initialization query. It will only effect the random values returned by the RAND function and not the values from RAND functions evaluated by sources. | x is integer , returns double . |
ROUND(x, y) | Round x to y places; negative values of y indicate places to the left of the decimal point | x in [integer , float , double , bigdecimal ] y is integer , return is same type as x. |
SIGN(x) | 1 if x > 0 , 0 if x = 0 , -1 if x < 0 | x in [integer , long , float , double , biginteger , bigdecimal ], returns integer |
SIN(x) | Sine | x in [double , bigdecimal ], returns double |
SQRT(x) | Square root | x in [long , double , bigdecimal ], returns double |
TAN(x) | Tangent | x in [double , bigdecimal ], returns double |
BITAND(x, y) | Bitwise AND | x, y in [integer ], returns integer |
BITOR(x, y) | Bitwise OR | x, y in [integer ], returns integer |
BITXOR(x, y) | Bitwise XOR | x, y in [integer ], returns integer |
BITNOT(x) | Bitwise NOT | x in [integer ], returns integer |
String Functions
Unless specified, all of the arguments and return types in the following table are strings and all indexes are 1-based.
The 0
index is considered to be before the start of the string.
Function | Description | Datatype constraint |
---|---|---|
x || y | Concatenation operator | x,y in [string , clob ], return type is string or character large object (CLOB ). |
ASCII(x) | Provide ASCII value of the left most character[1] in x. The empty string will as input will return null . | returns integer . |
CHR(x) CHAR(x) | Provide the character[1] for ASCII value x [a]. | x of integer [1] For the implementations of the ASCII and CHR functions, characters are limited to UCS2 values only. For pushdown there is little consistency among sources for character values beyond character code 255 . |
CONCAT(x, y) | Concatenates x and y with ANSI semantics. If x and/or y is null, returns null. | x, y string . |
CONCAT2(x, y) | Concatenates x and y with non-ANSI null semantics. If x and y is null, returns null. If only x or y is null, returns the other value. | x, y string . |
ENDSWITH(x, y) | Checks if y ends with x. If x or y are null, returns null. | x, y string returns boolean . |
INITCAP(x) | Capitalizes first letter of each word in string x while the remaining in lowercase. | x string . |
INSERT(str1, start, length, str2) | Insert str2 into str1 | str1 string , start integer , length integer , str2 string |
LCASE(x) | Lowercase | x string |
LEFT(x, y) | Get left y characters of x | x string , y integer , returns string |
LENGTH(x) | Length of a string | x string , returns integer |
LOCATE(x, y) | Find position of x in y starting at beginning of y. | x string , y integer , returns integer |
LOCATE(x, y, z) | Find position of x in y starting at z. | x string , y string , z integer , returns integer |
LPAD(x, y) | Pad input string x with spaces on the left to the length of y. | x string , y integer , returns string |
LPAD(x, y, z) | Pad input string x on the left to the length of y using character z. | x string , y string , z character, returns string |
LEFT(x, y) | Get left y characters of x | x string , y integer , returns string |
LTRIM(x) | Left trim x of blank chars. | x string , returns string |
QUERYSTRING(path [, expr [AS name] …]) | Returns a properly encoded query string appended to the given path. Null valued expressions are omitted. Names are optional for column reference expressions. For example, QUERYSTRING('path', 'value' as "&x", ' & ' as y, null as z) returns 'path?%26x=value&y=%20%26%20' . | path, expr string , name is an identifier. |
REPEAT(str1, instances) | Repeats str1 a specified number of times. | str1 string , instances integer , returns string . |
RIGHT(x, y) | Gets right y characters of x. | x string , y integer , returns string |
RPAD(input string x, pad length y) | Pad input string x with spaces on the right to the length of y. | x string , y integer , returns string |
RPAD(x, y, z) | Pad input string x on the right to the length of y using character z. | x string , y string , z character, returns string |
RTRIM(x) | Right trim x of blank chars. | x string , returns string |
TRIM([[LEADING|TRAILING|BOTH] [x] FROM] y) | Trim the leading, trailing, or both ends of a string y of character x. If LEADING /TRAILING /BOTH is not specified, BOTH is used. If no trim character x is specified, then a blank space is used. | x character , y string |
SPACE(x) | Repeats the space character x number of times | x integer , returns string |
SUBSTRING(x, y) | Gets substring of x, from position y to the end of x | x string , returns integer |
SUBSTRING(x, y, z) | Gets substring of x from position y with length z | x string , y integer , z integer , returns integer |
UCASE(x) | Uppercase | x string , returns integer |
UNESCAPE(x) | Unescaped of x. Possible escape sequences are \b - backspace, \t - tab, \n - line feed, \f - form feed, \r - carriage return. \uXXXX , where X is a hex value, can be used to specify any unicode character. \XXX , where X is an octal digit, can be used to specify an octal byte value. If any other character appears after an escape character, that character will appear in the output and the escape character will be ignored. | x string , returns string |
REPLACE(x, y, z) | Replace all occurrences of a given string with another. | x , y , z string , returns string |
REGEXP_REPLACE(str, pattern, sub [, flags]) | Replace one or more occurrences of pattern with sub in str . Flags:g : Replaces all occurrences, not just the first.m : Match over multiple lines.i : Match without case sensitivity. | str , pattern , sub string , returns string |
Type Conversion Functions
Function | Description |
---|---|
CONVERT(x, type) | Converts x to type, where type is a standard data type. |
CAST(x AS type) | Identical to CONVERT |
Options that are specified on the type, such as length, precision, scale, etc., are effectively ignored - the Engine is simply converting from one object type to another.
Data and Time Functions
Function | Description | Datatype Constraint |
---|---|---|
CURDATE() / CURRENT_DATE() | Return current date - same value for all invocations in the user command. | Returns date |
CURTIME() | Return current time - same value for all invocations in the user command. See also CURRENT_TIME . | Returns time |
NOW() | Return current timestamp (date and time with millisecond precision) - same value for all invocations in the user command. | Returns timestamp |
CURRENT_TIME[(precision)] | Return current time. The precision argument is ignored, behaving the same as CURTIME() without a precision argument. | Returns time |
CURRENT_TIMESTAMP[(precision)] | Return current timestamp with millisecond precision. Behaves like NOW() without a precision argument. | Returns timestamp |
DAYNAME(x) | Return name of day in the default locale. | x in timestamp, returns string |
DAYOFMONTH(x) | Return day of the month. | x in timestamp, returns integer |
DAYOFWEEK(x) | Return day of the week (Sunday=1, Saturday=7). | x in timestamp, returns integer |
DAYOFYEAR(x) | Return day number in the year. | x in timestamp, returns integer |
EPOCH(x) | Return seconds since the Unix epoch with microsecond precision. | x in timestamp, returns double |
EXTRACT(FIELD FROM x) | Return the specified FIELD from the date value x . Fields include YEAR , MONTH , DAYOFMONTH , HOUR , MINUTE , SECOND , EPOCH , etc. | x in timestamp, returns integer or double |
FORMATDATE(x, y) | Format the date x using the format y . | x is date , y is string , returns string |
FORMATTIME(x, y) | Format the time x using the format y . | x is time , y is string , returns string |
FORMATTIMESTAMP(x, y) | Format the timestamp x using the format y . | x is timestamp , y is string , returns string |
FROM_MILLIS(millis) | Return the timestamp value for the given milliseconds. | Long UTC timestamp in milliseconds |
FROM_UNIXTIME(unix_timestamp) | Return the Unix timestamp as a string in the default format yyyy/mm/dd hh:mm:ss . | Long Unix timestamp (in seconds) |
HOUR(x) | Return hour (in 24-hour format). | x in timestamp, returns integer |
MINUTE(x) | Return minute. | x in timestamp, returns integer |
MODIFYTIMEZONE(timestamp, startTimeZone, endTimeZone) | Return a timestamp adjusted for the difference between startTimeZone and endTimeZone . | Strings for time zones, returns timestamp |
MONTH(x) | Return month. | x in timestamp, returns integer |
MONTHNAME(x) | Return name of the month in the default locale. | x in timestamp, returns string |
PARSEDATE(x, y) | Parse date from x using format y . | x , y are string , returns date |
PARSETIME(x, y) | Parse time from x using format y . | x , y are string , returns time |
PARSETIMESTAMP(x, y) | Parse timestamp from x using format y . | x , y are string , returns timestamp |
QUARTER(x) | Return the quarter. | x in timestamp, returns integer |
SECOND(x) | Return seconds. | x in timestamp, returns integer |
TIMESTAMPCREATE(date, time) | Create a timestamp from a date and a time . | date is date , time is time , returns timestamp |
TO_MILLIS(timestamp) | Return the UTC timestamp in milliseconds. | Timestamp value |
UNIX_TIMESTAMP(unix_timestamp) | Return the Unix timestamp (in seconds). | unix_timestamp as string , returns long |
WEEK(x) | Return the week of the year (1-53). | x in timestamp, returns integer |
YEAR(x) | Return the four-digit year. | x in timestamp, returns integer |
Decode Functions
Function | Description | Datatype Constraint |
---|---|---|
DECODESTRING(x, y [, z]) | Decode column x using a string of value pairs y with an optional delimiter z and return the decoded column as a string. If a delimiter is not specified, a comma (, ) is used. y has the format SearchDelimResultDelimSearchDelimResult[DelimDefault] . Returns Default if specified, or x if there are no matches. | All parameters and return type are string . |
DECODEINTEGER(x, y [, z]) | Decode column x using a string of value pairs y with an optional delimiter z and return the decoded column as an integer. If a delimiter is not specified, a comma (, ) is used. y has the format SearchDelimResultDelimSearchDelimResult[DelimDefault] . Returns Default if specified, or x if there are no matches. | All parameters are string ; returns integer . |
Document Functions v24.5.3+
Function | Description |
---|---|
jsonParse(value, skipValidation) | Parses a value (CLOB or BLOB ) and returns a json object. skipValidation indicates whether pre-parsing validation of value should be skipped. |
jsonObject(c1, c2, ..., cn) | Returns a json object based on a list of columns. The resulting object contains fields in the form of column_name: column_value . |
yamlAsJSON(value) | Converts value (CLOB ), which is expected to be a valid YAML document, into a json object. |
jsonPath(value, path) | Evaluates a JsonPath (opens in a new tab) expression on value , which is expected to be a valid json document (string or CLOB ). Always returns a json document, even if it’s a single value. In such cases, transform it to other types as needed. |
jsonJq(value, query) | Applies a jq (opens in a new tab) query to value , which is expected to be a valid json document (string or CLOB ). Always returns a json document, even if it’s a single value. In such cases, transform it to other types as needed. |
jsonPathAsString(value, path) | Same as jsonPath but returns a string instead of a json object. |
jsonJqAsString(value, query) | Same as jsonJq but returns a string instead of a json object. |
Query Planner Limitations on WHERE
Clauses with json
Objects
As of version v24.5.3
, the query planner is not yet optimized to perform efficient comparisons on json
objects in the WHERE
clause, even when the value expression is within a type conversion function.
Consider the following example:
SELECT *
FROM
kube.EVENT
WHERE
convert(jsonPath(regarding, '$.uid'), string) = '7e613d43'
Assuming that such a uid
exists, we expect the query planner to drop the entire query due to its current limitations in creating an optimized node based on json
objects.
For these cases, you should use jsonPathAsString
or jsonJqAsString
instead, as shown below:
SELECT *
FROM
kube.EVENT
WHERE
jsonPathAsString(regarding, '$.uid') = '7e613d43'
Choice functions
Function | Description | Datatype constraint |
---|---|---|
COALESCE(x, y1, y2,...,yn) | Returns the first non-null parameter. | x and y can be any compatible types. |
IFNULL(x,y) | If x is null , return y; else return x. | x, y of any type. However, that type must be the same for both. |
NVL(x,y) | If x is null , return y; else return x. | x, y of any type. However, that type must be the same for both. |
NULLIF(param1, param2) | Equivalent to case when (param1 = param2) then null else param1 . | param1 and param2 must be comparable types. |
SELECT
command
The SELECT
command is used to retrieve records for any number of relations.
A SELECT
command can contain the following clauses:
Except for the OPTION
clause, all of the preceding clauses are defined by the SQL specification.
The specification also tells the order in which these clauses are logically processed.
Processing occurs in stages, with each stage passing a set of rows to the following stage.
The processing model is logical, and does not represent the way that a Engine performs the actual processing, but it is a useful model for understanding how SQL works.
The SELECT
command processes clauses in the following stages:
1. WITH clause
Gathers all rows from all with items in the order listed. Subsequent WITH
items and the main query can reference a WITH
item as if it were a table.
2. FROM clause
Gathers all rows from all tables involved in the query and logically joins them with a Cartesian product to produce a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.
3. WHERE clause
Applies a criteria to every output row from the FROM
stage, further reducing the number of rows.
4. GROUP BY clause
Groups sets of rows with matching values in the GROUP BY
columns.
5. HAVING clause
Applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).
6. SELECT clause
Specifies the column expressions that should be returned from the query.
Expressions are evaluated, including aggregate functions that are based on the groups of rows, which will no longer exist after this point.
The output columns are named using either column aliases or an implicit name determined by the engine.
If SELECT DISTINCT
is specified, duplicate removal is performed on the rows being returned from the SELECT
stage.
7. ORDER BY clause
Sorts the rows returned from the SELECT
stage as desired. Supports sorting on multiple columns in specified order, ascending or descending.
The output columns will be identical to those columns returned from the SELECT
stage and will have the same name.
8. LIMIT clause
Returns only the specified rows (with skip and limit values).
WITH
clause
The Engine provides access to common table expressions via the WITH
clause.
You can reference WITH
clause items as tables in subsequent WITH
clause items, and in the main query.
You can think of the WITH
clause as providing query-scoped temporary tables.
Usage:
WITH name [(column, ...)] AS [/*+ no_inline|materialize */] (query expression) ...
Syntax rules:
- All of the projected column names must be unique. If they are not unique, then the column name list must be provided.
- If the columns of the
WITH
clause item are declared, then they must match the number of columns projected by the query expression. - Each
WITH
clause item must have a unique name. - The optional
no_inline
hint indicates to the QP optimizer that the query expression should not be substituted as an inline view where referenced. It is possible withno_inline
for multiple evaluations of the common table as needed by source queries. - The optional
materialize
hint requires that the common table be created as a temporary table. This forces a single evaluation of the common table.
The WITH
clause is also subject to optimization and its entries might not be processed if they are not needed in the subsequent query.
Common tables are aggressively inlined to enhance the possibility of pushdown.
If a common table is only referenced a single time in the main query, it is likely to be inlined.
In some situations, such as when you use a common table to prevent n-many-processing of a non-pushdown, correlated subquery,
you might need to include the no_inline
or materialize hint.
Examples:
WITH n (x) AS (select col from tbl) select x from n, n as n1
WITH n (x) AS /*+ no_inline */ (select col from tbl) select x from n, n as n1
FROM
clause
The FROM
clause specifies the target tables for SELECT
, UPDATE
, and DELETE
statements.
Example Syntax:
FROM table [[AS] alias]
FROM table1 [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table2 ON join-criteria
FROM table1 CROSS JOIN table2
FROM (subquery) [AS] alias
FROM TABLE(subquery) [AS] alias
FROM
clause hints
FROM
clause hints are typically specified in a comment block preceding the affected clause.
MAKEDEP
and MAKENOTDEP
may also appear after in non-comment form after the affected clause.
If multiple hints apply to that clause, the hints should be placed in the same comment block.
Example:
FROM /*+ MAKEDEP PRESERVE */ (tbl1 inner join tbl2 inner join tbl3 on tbl2.col1 = tbl3.col1 on tbl1.col1 = tbl2.col1), tbl3 WHERE tbl1.col1 = tbl2.col1
Dependent JOIN
hints
MAKEIND
, MAKEDEP
, and MAKENOTDEP
are hints that you can use to control dependent join behavior.
Use them only in situations where the optimizer does not choose the most optimal plan based upon query structure, metadata, and costing information.
The hints can appear in a comment that follows the FROM
keyword. The hints can be specified against any FROM
clause, not just a named table.
MAKEIND
Indicates that the clause should be the independent (feeder) side of a dependent join.
MAKEDEP
Indicates that the clause should be the dependent (filtered) side of a join.
MAKENOTDEP
Prevents the clause from being the dependent (filtered) side of a join.
MAKEDEP(JOIN)
Indicates that the entire JOIN
should be pushed.
MAKEDEP(NO JOIN)
Indicates that the entire JOIN
should not be pushed.
MAKEDEP(MAX:val)
Indicates that the dependent JOIN
should only be performed if there are less than the maximum number of values from the independent side.
The above-mentioned hints become a crucial part of the Kubling's implementation success. Continue reading to discover why!
When working with a poor or absent cardinality, which is very frequent in the case of Script Modules since the systems they interact with exchanges JSON documents,
it is not possible for the query planner to determine what to get first, sometimes making queries to not work properly or even failing, due to the fact that the Engine will try to
perform a TABLE
full scan.
Since the concept is a bit complex, let's use the following example, which is actually a part of the samples repo:
SELECT
a.NAME as app_name,
a.ID as app_id,
c.ID as component_id,
c.NAME as component_name,
r.html_url as repo_location,
cd.DEPLOYMENT_IDENTIFIER,
cd.environment,
deploy.clusterName as cluster,
deploy.status__replicas as desired_replicas,
deploy.status__availableReplicas as available_replicas,
deploy.status__readyReplicas as ready_replicas
FROM
app_db.APPLICATION a
LEFT JOIN app_db.COMPONENT c ON c.APPLICATION_ID = a.ID
LEFT JOIN COMPONENT_CODE_REPO_JOIN cc ON cc.COMPONENT_ID = c.ID
LEFT JOIN app_db.COMPONENT_DEPLOYMENT_JOIN cd ON cd.COMPONENT_ID = c.ID
LEFT JOIN /*+ makedep */ github.CODE_REPO r ON (r.org = cc.CODE_REPO_ORG AND r.name = cc.CODE_REPO_NAME)
LEFT JOIN /*+ makedep */ k8s.DEPLOYMENT deploy ON (deploy.identifier = cd.DEPLOYMENT_IDENTIFIER)
WHERE a.NAME='{{ app_name }}'
This query runs on a VDB that has 5 Data Sources:
- 1
POSTGRESQL
- 2
KUBERNETES
- 1
SCRIPT_DOCUMENT_JS
- 1 aggregator
The query above gets all applications registered in the POSTGRESQL
, with some extra information, like their GitHub repositories (SCRIPT_DOCUMENT_JS
) and
their deployment clusters (KUBERNETES
).
In our context we support multiple organizations but, by design, github.CODE_REPO
forces to filter, at least, by one organization since the API works that way.
Due to the fact that the QP does not see any relationship between github.CODE_REPO
and cc.CODE_REPO_ORG
, it may decide to run a full scan over github.CODE_REPO
and
then apply the criteria.
However, the full scan will never complete since github.CODE_REPO
will throw an error due to the lack of the organization in the WHERE
clause.
By using the /*+ makedep */
hint, we ensure that clause depends on the filter side of the join, in other words, the QP will first get the list of cc.CODE_REPO_ORG
and
cc.CODE_REPO_NAME
before generating a query to route to github.CODE_REPO
.
Therefore, your script will receive a standard SELECT
with a WHERE
clause in the form of IN
or OR
list.
This is a well-known problem in federated database systems, which solve the problem using decomposition or pushdown.
WHERE
clause
The WHERE
clause defines the criteria to limit the records affected by SELECT
, UPDATE
, and DELETE
statements.
See its general form.
GROUP BY
clause
The GROUP BY
clause denotes that rows should be grouped according to the specified expression values.
One row is returned for each group, after optionally filtering those aggregate rows based on a HAVING
clause.
The general form of the GROUP BY
:
GROUP BY expression [,expression]*
GROUP BY ROLLUP(expression [,expression]*)
Syntax Rules:
- Column references in the group by cannot be made to alias names in the
SELECT
clause. - Expressions used in the group by must appear in the select clause.
- Column references and expressions in the
SELECT
/HAVING
/ORDER BY
clauses that are not used in the group by clause must appear in aggregate functions. - If an aggregate function is used in the
SELECT
clause and noGROUP BY
is specified, an implicitGROUP BY
will be performed with the entire result set as a single group. In this case, every column in theSELECT
must be an aggregate function as no other column value will be fixed across the entire group. - The
GROUP BY
columns must be of a comparable type.
HAVING
clause
The HAVING
clause operates exactly as the WHERE
clause, although it operates on the output of a GROUP BY
.
You can use the same syntax with the HAVING
clause as with the WHERE
clause.
Syntax Rule:
- Expressions used in the
GROUP BY
clause must contain either an aggregate function (COUNT
,AVG
,SUM
,MIN
,MAX
), or be one of the grouping expressions.
SELECT
clause
Queries that start with the SELECT keyword and are often referred to as SELECT
statements.
Usage:
SELECT [DISTINCT|ALL] ((expression [[AS] name])|(group identifier.STAR))*|STAR ...
Syntax Rules:
- Aliased expressions are only used as the output column names and in the
ORDER BY
clause. They cannot be used in other clauses of the query. DISTINCT
may only be specified if theSELECT
symbols are comparable.
ORDER BY
clause
The ORDER BY
clause specifies how records are sorted. The options are ASC
(ascending) or DESC
(descending).
Usage:
ORDER BY expression [ASC|DESC] [NULLS (FIRST|LAST)], ...
Syntax rules:
- Sort columns can be specified by
SELECT
clause alias name, bySELECT
clause expression, or by an unrelated expression. - Column references can appear in the
SELECT
clause as the expression for an aliased column, or can reference columns from tables in theFROM
clause. If the column reference is not in theSELECT
clause, the query cannot be a set operation, specifySELECT DISTINCT
, or contain aGROUP BY
clause. - Unrelated expressions, expressions not appearing as an aliased expression in the select clause, are allowed in the
ORDER BY
clause of a non-setQUERY
. The columns referenced in the expression must come from the from clause table references. The column references cannot be to alias names. - The
ORDER BY
columns must be of a comparable type. - If an
ORDER BY
is used in an inline view or view definition without aLIMIT
clause, it is removed by the query optimizer. - If
NULLS FIRST/LAST
is specified, thennulls
are guaranteed to be sorted either first or last. If thenull
ordering is not specified, then results will typically be sorted withnulls
as low values, which is the default internal sorting behavior. However, not all data Sources return results withnulls
sorted as low values by default, and the Engine might return results with differentnull
orderings.
LIMIT
clause
The LIMIT
clause specifies a limit on the number of records returned by the SELECT
command.
An optional offset
can be specified (the number of rows to skip).
The LIMIT
clause can also be specified using the SQL 2008 OFFSET/FETCH FIRST
clauses.
If an ORDER BY
is also specified, it will be applied before the OFFSET/LIMIT
are applied.
If an ORDER BY
is not specified there is generally no guarantee what subset of rows will be returned.
Usage:
LIMIT [offset,] limit
LIMIT limit OFFSET offset
[OFFSET offset ROW|ROWS] [FETCH FIRST|NEXT [limit] ROW|ROWS ONLY]
Syntax rules:
- The
LIMIT/OFFSET
expressions must be a non-negative integer or a parameter reference (?). An offset of0
is ignored. A limit of0
returns no rows. - The terms
FIRST/NEXT
are interchangeable as well asROW/ROWS
. - The
LIMIT
clause can take an optional precedingNON_STRICT
hint to indicate that pushdown operations should not be inhibited, even if the results are not consistent with the logical application of the limit.
The hint is only needed on unordered limits, for example,SELECT * FROM VW /*+ NON_STRICT */ LIMIT 2
.
INTO
clause
When the INTO
clause is specified with a SELECT
, the results of the query are inserted into the specified table.
This is often used to copy records from one Data Source to another, using an as-is approach or applying transforming functions to the original value.
In some specific circumstances, it is also useful to insert records into a temporary table, although in Kubling we discourage the use of local storage.
The INTO
clause immediately precedes the FROM
clause.
Usage:
INTO [TABLE] FROM...
Syntax rules:
- The
INTO
clause is logically applied last in processing, after theORDER BY
andLIMIT
clauses. - In the
SELECT INTO
the target of theINTO
clause is a table where the result of theSELECT
command will be inserted. SELECT INTO
can't be combined with aUNION
query.
Example:
SELECT col1, col2 INTO [TABLE_TABLE] FROM [SOURCE_TABLE]