SQL Functions Bundle Module

This module contains functions that can be called in DDL and Query SQL contexts.

Having all functions grouped in modules makes it easier to manage than putting them in each Script Bundle Module.

Let’s see first the recommended internal directory and file organization:

bundle-sql-function-info.yaml

DDL and SQL functions

Functions are divided into two main categories, DDL and SQL.

Kubling does not evaluate functions the same way it evaluates Delegate scripts, in the sense that in functions only the function code is loaded and evaluated, not the whole script file, and the following limitations apply:

import is not supported.
Calling other functions is not supported.

⚠️

Starting from version 25.3.2, SQL functions can also be invoked in a DDL context using the call_func option.
This change removes the need to distinguish between DDL and SQL function types.
Moving forward, we will likely simplify this classification to only distinguish between standard SQL functions and SQL aggregation functions, as the latter cannot be used within a DDL context.

DDL Functions

DDL functions are defined at the field level within the TABLE definition in the DDL, making them an integral part of the SCHEMA.

These functions are used to transform field values during query execution.

Example:

CREATE FOREIGN TABLE STORAGE_ACCOUNT
   (
     resourceGroup string OPTIONS(call_func 'az_ddl_get_rg<id>'),
     ...
   )

Internally, these functions are referred to as “transformation functions” because they take a set of field values as input and generate a new, transformed value.

Key Characteristics:

Dependency:
DDL functions are dependent on other field values that must be fetched before the function is executed. This dependency adds complexity to their resolution.
Query Resolution:
In most cases, DDL functions are resolved dynamically during each query performed against the TABLE. However, the actual resolution process depends on the DQP (Distributed Query Planner) and the specific execution plan.

DDL Function Script

Let’s break down the following example:

function sql_get_rg(args) {
    if ((args.value === null) || (args.value === undefined)) return null;
    return args.value.id.split("/")[4];
}

This simple function receives an Azure ID (column id) and returns the Resource Group, which is located at the fourth position of the Fully Qualified Name (FQN).

Arguments

DDL function scripts always receive a single args object, which is a key-value (KV) structure with the following members:

value:
An object containing the parameter column names as its members. Its general form is args.value.[col_name].
column:
The name of the column to which the function is applied. This always refers to the TABLE column name.

Note: When used with a different column name, such as sql_get_rg<SOME>, the column value will remain associated with the original field where the function is applied.
datatype:
The data type of the column to which the function is applied.

Example Explanation

args.value.id provides the id column value.
The function splits the Azure FQN (id) by / and returns the element at index 4, which corresponds to the Resource Group.
If the id is null or undefined, the function returns null to ensure robustness.

This structure ensures that DDL functions are flexible and can handle a wide range of transformations dynamically during query execution.

Naming DDL Functions and Resolution

When a function module is referenced in the descriptor bundle, it must have an associated module name, which is part of the fully qualified function name.

If the function name is unique across all modules, you can call the function without using the module name. However, if the function name is duplicated, you must use the module name. Otherwise, the function resolution process will fail due to an ambiguous name.

The fully qualified function name follows the format:
[module-name].[function-name]<params...>

Example:

... OPTIONS(call_func 'azure_mod.az_sql_get_rg<id>')

SQL Functions

SQL functions are first-class functions registered in the engine as User-Defined Functions (UDFs). They can be invoked within regular queries, just like any built-in SQL function.

Example:

SELECT sql_transform_id(id) FROM app_delegate.STORAGE_ACCOUNT;

SQL Function Script

Let’s break down the following example:

function sql_get_rg({ id }) {
    return id.split("/")[4];
}

How It Works:

The function receives a key-value (KV) object where the keys correspond to the parameters defined in the function signature (see below).
The engine injects values into the function based on their position in the query.

Example Query and Parameter Mapping:

SELECT sql_transform_id(id, primaryLocation) FROM app_delegate.STORAGE_ACCOUNT;

The id and primaryLocation parameters are injected into the function based on their positions in the function signature.

Example Signature:

- name: "sql_get_rg"
  parameters:
    id: string
    primaryLocation: string
  ...

Here, the sql_get_rg function is defined with two parameters:

id: A string representing the resource ID.
primaryLocation: A string representing the location parameter.

This allows the engine to map query arguments to the function’s parameters dynamically.

SQL Aggregation Functions

An aggregation function performs a calculation on a set of values and returns a single aggregated value, such as the well-known SUM, AVG, MAX, and MIN. These functions typically operate over grouped rows when a GROUP BY clause is used.

Aggregation functions are similar to regular SQL functions, differing primarily in the parameters they receive.

Example:

function sql_agg_just_count(values = []) {
    return values.length;
}

The function receives a single parameter, values, which is an array of all values to be aggregated.

Invocation:

The aggregation function can be invoked as follows:

SELECT sql_agg_just_count(id) FROM app_delegate.STORAGE_ACCOUNT;

The id column is passed as an array to the sql_agg_just_count function, which calculates the count of id values within the specified grouping or overall dataset.

📄 `bundle-sql-function-info.yaml`

This is the only file whose location must be the root directory of the module. See here the schema of the file.

DDL Functions example:

DDLFunctions:
  - name: "az_ddl_get_rg"
    description: "Extracts the resource group name from a given Azure resource ID."
    scriptFilePath: "fn/azure/ddl/az_ddl_get_rg.js"

Query Functions example:

queryFunctions:
  - name: "az_get_rg"
    description: "Extracts the resource group name from a given Azure resource ID."
    scriptFilePath: "fn/azure/query/sql_get_rg.js"
    parameters:
      id: string
    returnType: string
    isDeterministic: true

Query Aggregation Functions example:

queryFunctions:
  - name: "az_simple_agg"
    description: "Some simple aggregation function."
    scriptFilePath: "fn/azure/query/az_simple_agg.js"
    parameters:
      aggIt: string
    returnType: integer
    aggregationFunction: true

Ideally, a functions module should contain all the necessary functions for all the Data Sources. That is why it is highly recommended to keep them separate in directories.

Performance

Keep in mind that functions have a limited number of execution contexts, which are configurable via the partitions, minContexts, and maxContexts parameters in the bundle-sql-function-info.yaml configuration file. Each context can execute only one function at a time.

Example:

Consider the DDL function defined earlier. If the engine processes a query like:

SELECT resourceGroup FROM AZ_STORAGE_ACCOUNT;

and the query returns 100 rows, the az_ddl_get_rg function will be called 100 times—once for each row. This can potentially generate queues, especially if there are not enough contexts available.

Best Practices:

To ensure optimal performance:

Keep Functions Simple:
Functions should focus on basic transformations to minimize execution time.
Avoid Heavy Logic:
Avoid placing complex or resource-intensive logic in functions, as it can slow down query execution and increase the likelihood of queuing.
Tune Contexts:
Properly configure the partitions, minContexts, and maxContexts parameters to balance available resources and function execution demand.

By following these guidelines, you can reduce bottlenecks and ensure smoother query execution.

Script Template

SQL Functions Bundle Module

DDL and SQL functions

DDL Functions

Example:

Key Characteristics:

DDL Function Script

Arguments

Example Explanation

Naming DDL Functions and Resolution

SQL Functions

Example:

SQL Function Script

How It Works:

Example Query and Parameter Mapping:

Example Signature:

SQL Aggregation Functions

Example:

Invocation:

📄 bundle-sql-function-info.yaml

📂 Platform-related functions directories

Performance

Example:

Best Practices:

📄 `bundle-sql-function-info.yaml`