From 9be14b4cac9dcebbcb3b26011f683f0a1e383aa4 Mon Sep 17 00:00:00 2001 From: Victoria Lim Date: Fri, 24 Jan 2025 13:37:27 -0800 Subject: [PATCH 1/7] docs: Restore SQL function examples (#17293) * docs: add examples for SQL functions (#16745) * updating first batch of numeric functions * First batch of functions * addressing first few comments * alphabetize list * draft with suggestions applied * minor discrepency expr -> * changed raises to calculates * Update docs/querying/sql-functions.md * switch to underscore * changed to exp(1) to match slack message * adding html text for trademark symbol to .spelling * fixed discrepancy between description and example --------- Co-authored-by: Benedict Jin (cherry picked from commit 721a65046f643eda1fd17f0503252752d7546cc8) * [docs] batch02 of updating functions (#16761) * applying changes * ensuring batch is updated * Update docs/querying/sql-functions.md * raise -> raises * addressing review * Apply suggestions from code review Co-authored-by: Charles Smith --------- Co-authored-by: Benedict Jin Co-authored-by: Charles Smith (cherry picked from commit ca787885c966e38ebdebe7ee8c25671996918325) * [Docs] batch 03 - trig functions (#16795) * batch 03 - trig functions * Apply suggestions from code review Co-authored-by: Charles Smith * applying suggestions and corrections --------- Co-authored-by: Charles Smith (cherry picked from commit 028ee23a1e90446b45952ee78ff3fb0e7ef72123) * [Docs]Batch04 - Bitwise numeric functions (#16805) * Batch04 - Bitwise numeric functions * Batch04 - Bitwise numeric functions * minor fixes * rewording bitwise_shift functions * rewording bitwise_shift functions * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin (cherry picked from commit 85a8a1d805ecc112fa9d430762836b1218e8b4e2) * [docs] batch 5 updating functions (#16812) * batch 5 * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin (cherry picked 
from commit 3bb6d40285d41d9734cd39bce5c867ea8d978f21) * [Docs] Batch06: starting string functions (#16838) * batch06, starting string functions * addind space after Syntax * quick change * correcting spelling * Update docs/querying/sql-functions.md * Update sql-functions.md * applying suggestions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md --------- Co-authored-by: Benedict Jin Co-authored-by: Charles Smith (cherry picked from commit ebea34a814b397ffae79ab8a630e9fa2bff9cc62) * [Docs] Batch08: adding examples to string functions (#16871) * batch08 completed * reviewing batch08 * apply corrections suggestions by @FrankChen021 (cherry picked from commit 5b94839d9d9185faa3a14feb3748e62e16c77c50) * [Docs] Batch07: adding examples to string functions (#16862) * Lower,Upper,Lpad,Rpad,Parse_long * up to REGEXP_EXTRACT * batch 07 ready for review * updated definitions in scalar * Apply suggestions from code review Co-authored-by: Charles Smith * rpad and lpad * addressing comments * minor fixes * improving examples based on suggestions * matched -> matches * correcting typo * Apply suggestions from code review Co-authored-by: Charles Smith --------- Co-authored-by: Charles Smith (cherry picked from commit 725695342c977a597ba7819fe2e20f856a98f89e) * [Docs] Batch09: only `lookup` (#16878) * [Docs] Batch09: only `lookup` * slight changes * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * applying suggestiontions * Apply suggestions from code review Co-authored-by: Victoria Lim * otherwise null -> otherwise returns null * updating definition in sql-scalar.md * Apply suggestions from code review Co-authored-by: Charles Smith * hoping to re-run web checks * change replaceMissingValueWith -> defaultValue * Update docs/querying/sql-scalar.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * acronym_to_name -> airportcode_to_name * shortens 
`airportcode_to_name` to `code_to_name` --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim Co-authored-by: Charles Smith (cherry picked from commit fda2d19b8823918106f10b01ba9af54c5fe5a4e8) * [docs] Batch10 date and time functions (#16900) * just starting * TIME_PARSE and TIME_FORMAT remaining * fixing typo * adding last two functions * review sql-functions.md * Apply suggestions from code review Suggestions that were accepted as is Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-functions.md needed to confirm that it did indeed return as a number Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * reviewing remaining suggestions * addressing review for time_format * Apply suggestions from code review Accepted as is Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * addressing final suggestion * time_zone -> timezone * timezone fix --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> (cherry picked from commit c4981e34c47b54f3c57a38edbcb001a3b5e1bb6f) * [docs] batch 12: reduction functions (#16930) * [docs] batch 12: reduction functions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md * applying suggestions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Benedict Jin Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> (cherry picked from commit c49dc83b22e19b3fcdb0c7ac81896a32f0e5015e) * [docs] Batch13 IP functions (#16947) * new datasource * reviewing before pr * Update docs/querying/sql-functions.md * Apply suggestions from code review Co-authored-by: Katya Macedo 
<38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Charles Smith * Applying suggestions to IPV4_PARSE --------- Co-authored-by: Benedict Jin Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Charles Smith (cherry picked from commit ed811262e302ad9aed90988cb2f885efeda90257) * [docs] Batch11 date and time functions (#16926) * first draft of functions * minor improvments * Update docs/querying/sql-functions.md * Update docs/querying/sql-scalar.md * Apply suggestions from code review Accepted as is Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * applying next round of suggestions * fixing missing column name * addressing floor and ceil functions * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * re-wording TIMESTAMPADD --------- Co-authored-by: Benedict Jin Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> (cherry picked from commit 2d9e92ce78e767c6d1e8e4fe83e9210a199f79ba) * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md Co-authored-by: Benedict Jin * [docs] Batches 14-16, 18: HLL, Theta, Quantiles, other (#93) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: edgar2020 Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo Co-authored-by: Charles Smith * batches 20 21 24 25 * fix unnest list * Add LISTAGG to spelling * cherry pick batch 21 * cherry pick batch 21 --------- Co-authored-by: Edgar Melendrez Co-authored-by: Edgar Melendrez Co-authored-by: Benedict Jin Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> 
Co-authored-by: Katya Macedo Co-authored-by: Charles Smith --- docs/ingestion/schema-design.md | 2 +- docs/querying/sql-aggregations.md | 32 +- docs/querying/sql-array-functions.md | 28 +- docs/querying/sql-functions.md | 5178 ++++++++++++++--- .../sql-multivalue-string-functions.md | 8 +- docs/querying/sql-scalar.md | 6 +- website/.spelling | 1 + 7 files changed, 4413 insertions(+), 842 deletions(-) diff --git a/docs/ingestion/schema-design.md b/docs/ingestion/schema-design.md index 05c71151ca55..25b84a9a4d43 100644 --- a/docs/ingestion/schema-design.md +++ b/docs/ingestion/schema-design.md @@ -259,7 +259,7 @@ When performing type-aware schema discovery, Druid can discover all the columns the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`, `DOUBLE`, `ARRAY`, `ARRAY`, `ARRAY`, or `COMPLEX` for nested data. For input formats with native boolean types, Druid ingests these values as longs. Array typed columns can be queried using -the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested +the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql.md#unnest). Nested columns can be queried with the [JSON functions](../querying/sql-json-functions.md). Mixed type columns follow the same rules for schema differences between segments, and present as the _least_ restrictive diff --git a/docs/querying/sql-aggregations.md b/docs/querying/sql-aggregations.md index 4d252d821672..2af45a530e0a 100644 --- a/docs/querying/sql-aggregations.md +++ b/docs/querying/sql-aggregations.md @@ -37,6 +37,19 @@ sidebar_label: "Aggregation functions" You can use aggregation functions in the SELECT clause of any [Druid SQL](./sql.md) query. +In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and `STRING_AGG` accept the DISTINCT keyword. + +:::info + The order of aggregation operations across segments is not deterministic. 
This means that non-commutative aggregation + functions can produce inconsistent results across the same query. + + Functions that operate on an input type of "float" or "double" may also see these differences in aggregation + results across multiple query runs because of this. If precisely the same value is desired across multiple query runs, + consider using the `ROUND` function to smooth out the inconsistencies between queries. +::: + +## Filter aggregations + Filter any aggregator using the FILTER clause, for example: ``` @@ -56,16 +69,7 @@ When no rows are selected, aggregation functions return their initial value. Thi The initial value varies by aggregator. `COUNT` and the approximate count distinct sketch functions always return 0 as the initial value. -In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and `STRING_AGG` accept the DISTINCT keyword. - -:::info - The order of aggregation operations across segments is not deterministic. This means that non-commutative aggregation - functions can produce inconsistent results across the same query. - - Functions that operate on an input type of "float" or "double" may also see these differences in aggregation - results across multiple query runs because of this. If precisely the same value is desired across multiple query runs, - consider using the `ROUND` function to smooth out the inconsistencies between queries. -::: +## General aggregation functions |Function|Notes|Default| |--------|-----|-------| @@ -92,10 +96,8 @@ In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and |`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value of `expr`.
The latest value of `expr` is taken from the row with the overall latest non-null value of `timestampExpr`.
If the overall latest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.

If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to wasted memory.
If `maxBytesPerValue` is omitted; it defaults to `1024`.

Use `LATEST` instead of `LATEST_BY` on a table that has rollup enabled and was created with any variant of `EARLIEST`, `LATEST`, `EARLIEST_BY`, or `LATEST_BY`. In these cases, the intermediate type already stores the timestamp, and Druid ignores the value passed in `timestampExpr`. |`null`| |`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any value of `expr` including null. This aggregator can simplify and optimize the performance by returning the first encountered value (including `null`).

If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to wasted memory.
If `maxBytesPerValue` is omitted; it defaults to `1024`. `aggregateMultipleValues` is an optional boolean flag controls the behavior of aggregating a [multi-value dimension](./multi-value-dimensions.md). `aggregateMultipleValues` is set as true by default and returns the stringified array in case of a multi-value dimension. By setting it to false, function will return first value instead. |`null`| |`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy dimension is included in a row, when using `GROUPING SETS`. Refer to [additional documentation](aggregations.md#grouping-aggregator) on how to infer this number.|N/A| -|`ARRAY_AGG(expr, [size])`|Collects all values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes). If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results within the output array may vary depending on processing order.|`null`| -|`ARRAY_AGG(DISTINCT expr, [size])`|Collects all distinct values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results will be based on the default for the element type.|`null`| -|`ARRAY_CONCAT_AGG(expr, [size])`|Concatenates all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes). Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. 
Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not currently supported, and the ordering of results within the output array may vary depending on processing order.|`null`| -|`ARRAY_CONCAT_AGG(DISTINCT expr, [size])`|Concatenates all distinct values of all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not currently supported, and the ordering of results will be based on the default for the element type.|`null`| +|`ARRAY_AGG([DISTINCT] expr, [size])`|Collects all values of the specified expression into an array. To include only unique values, specify `DISTINCT`. `size` determines the maximum aggregation size in bytes and defaults to 1024 bytes. If the resulting array exceeds the size limit, the query fails. `ORDER BY` is not supported. The order of elements in the output array may vary depending on the processing order.|`null`| +|`ARRAY_CONCAT_AGG([DISTINCT] expr, [size])`|Concatenates array inputs into a single array. To include only unique values, specify `DISTINCT`. `expr` must be an array. `size` determines the maximum aggregation size in bytes and defaults to 1024 bytes. If the resulting array exceeds the size limit, the query fails. Druid ignores null array expressions, but null values within arrays are included in the output. `ORDER BY` is not supported. The order of elements in the output array may vary depending on the processing order.|`null`| |`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or all distinct values) of `expr` into a single STRING, ignoring null values. Each value is joined by an optional `separator`, which must be a literal STRING. 
If the `separator` is not provided, strings are concatenated without a separator.

An optional `size` in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `STRING_AGG` expression is not currently supported, and the ordering of results within the output string may vary depending on processing order.|`null`| |`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for `STRING_AGG`.|`null`| |`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null`| @@ -106,7 +108,7 @@ In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and These functions create sketch objects that you can use to perform fast, approximate analyses. For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.md#approx). -To operate on sketch objects, also see the [DataSketches post aggregator functions](sql-scalar.md#sketch-functions). +To operate on sketch objects, see the scalar [DataSketches post aggregator functions](sql-scalar.md#sketch-functions). ### HLL sketch functions diff --git a/docs/querying/sql-array-functions.md b/docs/querying/sql-array-functions.md index eaa7ebf50d78..5073fc3efc23 100644 --- a/docs/querying/sql-array-functions.md +++ b/docs/querying/sql-array-functions.md @@ -48,19 +48,19 @@ The following table describes array functions. 
To learn more about array aggrega |Function|Description| |--------|-----| -|`ARRAY[expr1, expr2, ...]`|Constructs a SQL `ARRAY` literal from the expression arguments, using the type of the first argument as the output array type.| -|`ARRAY_LENGTH(arr)`|Returns length of the array expression.| -|`ARRAY_OFFSET(arr, long)`|Returns the array element at the 0-based index supplied, or null for an out of range index.| -|`ARRAY_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.| -|`ARRAY_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns true if `arr` contains `expr`. If `expr` is an array, returns true if `arr` contains all elements of `expr`. Otherwise returns false.| -|`ARRAY_OVERLAP(arr1, arr2)`|Returns true if `arr1` and `arr2` have any elements in common, else false.| -|`SCALAR_IN_ARRAY(expr, arr)`|Returns true if the scalar `expr` is present in `arr`. Otherwise, returns false if the scalar `expr` is non-null or `UNKNOWN` if the scalar `expr` is `NULL`.| +|`ARRAY[expr1, expr2, ...]`|Constructs a SQL `ARRAY` literal from the provided expression arguments. All arguments must be of the same type.| +|`ARRAY_APPEND(arr, expr)`|Appends the expression to the array. The source array type determines the resulting array type.| +|`ARRAY_CONCAT(arr1, arr2)`|Concatenates two arrays. The type of `arr1` determines the resulting array type.| +|`ARRAY_CONTAINS(arr, expr)`|Checks if the array contains the specified expression. If the specified expression is a scalar value, returns true if the source array contains the value. If the specified expression is an array, returns true if the source array contains all elements of the expression.| +|`ARRAY_LENGTH(arr)`|Returns the length of the array.| +|`ARRAY_OFFSET(arr, long)`|Returns the array element at the specified zero-based index. 
Returns null if the index is out of bounds.| |`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.| +|`ARRAY_ORDINAL(arr, long)`|Returns the array element at the specified one-based index. Returns null if the index is out of bounds.| |`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.| -|`ARRAY_PREPEND(expr, arr)`|Adds `expr` to the beginning of `arr`, the resulting array type determined by the type of `arr`.| -|`ARRAY_APPEND(arr, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of `arr`.| -|`ARRAY_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.| -|`ARRAY_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0-based index `start` (inclusive) to `end` (exclusive). Returns `null`, if `start` is less than 0, greater than length of `arr`, or greater than `end`.| -|`ARRAY_TO_STRING(arr, str)`|Joins all elements of `arr` by the delimiter specified by `str`.| -|`STRING_TO_ARRAY(str1, str2)`|Splits `str1` into an array on the delimiter specified by `str2`, which is a regular expression.| -|`ARRAY_TO_MV(arr)`|Converts an `ARRAY` of any type into a multi-value string `VARCHAR`.| +|`ARRAY_OVERLAP(arr1, arr2)`|Returns true if two arrays have any elements in common. Treats `NULL` values as known elements.| +|`ARRAY_PREPEND(expr, arr)`|Prepends the expression to the array. The source array type determines the resulting array type.| +|`ARRAY_SLICE(arr, start, end)`|Returns a subset of the array from the zero-based index `start` (inclusive) to `end` (exclusive). 
Returns null if `start` is less than 0, greater than the length of the array, or greater than `end`.| +|`ARRAY_TO_MV(arr)`|Converts an array of any type into a [multi-value string](sql-data-types.md#multi-value-strings).| +|`ARRAY_TO_STRING(arr, delimiter)`|Joins all elements of the array into a string using the specified delimiter.| +|`SCALAR_IN_ARRAY(expr, arr)`|Checks if the scalar value is present in the array. Returns false if the value is non-null, or `UNKNOWN` if the value is `NULL`. Returns `UNKNOWN` if the array is `NULL`.| +|`STRING_TO_ARRAY(string, delimiter)`|Splits the string into an array of substrings using the specified delimiter. The delimiter must be a valid regular expression.| diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 666e06d548d0..dff531885341 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -28,1668 +28,5236 @@ sidebar_label: "All functions" This document describes the SQL language. ::: - -This page provides a reference of all Druid SQL functions in alphabetical order. -Click the linked function type for documentation on a particular function. +This page provides a reference of Apache Druid® SQL functions in alphabetical order. 
For more details on a function, refer to the following: +* [Aggregation functions](sql-aggregations.md) +* [Array functions](sql-array-functions.md) +* [JSON functions](sql-json-functions.md) +* [Multi-value string functions](sql-multivalue-string-functions.md) +* [Scalar functions](sql-scalar.md) +* [Window functions](sql-window-functions.md) + +## Example data + +The examples on this page use the following example datasources: +* `array-example` created with [SQL-based ingestion](../multi-stage-query/index.md) +* `flight-carriers` using `FlightCarrierOnTime (1 month)` included with Druid +* `kttm` using `KoalasToTheMax one day` included with Druid +* `mvd-example` using [SQL-based ingestion](multi-value-dimensions.md#sql-based-ingestion) +* `taxi-trips` using `NYC Taxi cabs (3 files)` included with Druid + +To load a datasource included with Druid, +access the [web console](../operations/web-console.md) +and go to **Load data > Batch - SQL > Example data**. +Select **Connect data**, and parse using the default settings. +On the page to configure the schema, select the datasource label +and enter the name of the datasource listed above. + +Use the following query to create the `array-example` datasource: + +
Datasource for arrays + +```sql +REPLACE INTO "array-example" OVERWRITE ALL +WITH "ext" AS ( + SELECT * + FROM TABLE( + EXTERN( + '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}', + '{"type":"json"}' + ) + ) EXTEND ( + "timestamp" VARCHAR, + "label" VARCHAR, + "arrayString" VARCHAR ARRAY, + "arrayLong" BIGINT ARRAY, + "arrayDouble" DOUBLE ARRAY + ) +) +SELECT + TIME_PARSE("timestamp") AS "__time", + "label", + "arrayString", + "arrayLong", + "arrayDouble" +FROM "ext" +PARTITIONED BY DAY +``` + +
+ +Use the following query to create the `mvd-example` datasource: + +
Datasource for multi-value string dimensions + +```sql +REPLACE INTO "mvd-example" OVERWRITE ALL +WITH "ext" AS ( + SELECT * + FROM TABLE( + EXTERN( + '{"type":"inline","data":"{\"timestamp\": \"2011-01-12T00:00:00.000Z\", \"label\": \"row1\", \"tags\": [\"t1\",\"t2\",\"t3\"]}\n{\"timestamp\": \"2011-01-13T00:00:00.000Z\", \"label\": \"row2\", \"tags\": [\"t3\",\"t4\",\"t5\"]}\n{\"timestamp\": \"2011-01-14T00:00:00.000Z\", \"label\": \"row3\", \"tags\": [\"t5\",\"t6\",\"t7\"]}\n{\"timestamp\": \"2011-01-14T00:00:00.000Z\", \"label\": \"row4\", \"tags\": []}"}', + '{"type":"json"}', + '[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"tags", "type":"ARRAY"}]' + ) + ) +) +SELECT + TIME_PARSE("timestamp") AS "__time", + "label", + ARRAY_TO_MV("tags") AS "tags" +FROM "ext" +PARTITIONED BY DAY +``` + +
## ABS -`ABS()` +Calculates the absolute value of a numeric expression. -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +* **Syntax:** `ABS()` +* **Function type:** Scalar, numeric -Calculates the absolute value of a numeric expression. +
Example -## ACOS +The following example applies the ABS function to the `ArrDelay` column from the `flight-carriers` datasource. -`ACOS()` +```sql +SELECT + "ArrDelay" AS "arrival_delay", + ABS("ArrDelay") AS "absolute_arrival_delay" +FROM "flight-carriers" +WHERE "ArrDelay" < 0 +LIMIT 1 +``` +Returns the following: -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +| `arrival_delay` | `absolute_arrival_delay` | +| -- | -- | +| `-27` | `27` | +
-Calculates the arc cosine of a numeric expression. +[Learn more](sql-scalar.md#numeric-functions) -## ANY_VALUE +## ACOS -`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])` +Calculates the arc cosine (arccosine) of a numeric expression. -**Function type:** [Aggregation](sql-aggregations.md) +* **Syntax:** `ACOS(expr)` +* **Function type:** Scalar, numeric -Returns any value of the specified expression. +
Example -## APPROX_COUNT_DISTINCT +The following example calculates the arc cosine of `0`. -`APPROX_COUNT_DISTINCT(expr)` +```sql +SELECT ACOS(0) AS "arc_cosine" +``` +Returns the following: -**Function type:** [Aggregation](sql-aggregations.md) +| `arc_cosine` | +| -- | +| `1.5707963267948966` | -Counts distinct values of a regular column or a prebuilt sketch column. +
-`APPROX_COUNT_DISTINCT_BUILTIN(expr)` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Aggregation](sql-aggregations.md) +## ANY_VALUE -Counts distinct values of a string, numeric, or `hyperUnique` column using Druid's built-in `cardinality` or `hyperUnique` aggregators. +Returns any value of the specified expression. -## APPROX_COUNT_DISTINCT_DS_HLL +* **Syntax**: `ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])` +* **Function type:** Aggregation -`APPROX_COUNT_DISTINCT_DS_HLL(expr, [, ])` +
Example -**Function type:** [Aggregation](sql-aggregations.md) +The following example returns the state abbreviation, state name, and average flight time grouped by each state in `flight-carriers`: -Counts distinct values of an HLL sketch column or a regular column. +```sql +SELECT + "OriginState", + ANY_VALUE("OriginStateName") AS "OriginStateName", + AVG("ActualElapsedTime") AS "AverageFlightTime" +FROM "flight-carriers" +GROUP BY 1 +LIMIT 3 +``` -## APPROX_COUNT_DISTINCT_DS_THETA +Returns the following: -`APPROX_COUNT_DISTINCT_DS_THETA(expr, [])` +|`OriginState`|`OriginStateName`|`AverageFlightTime`| +|-------------|-----------------|-------------------| +|`AK`|`Alaska`|`113.2777967841259`| +|`AL`|`Alabama`|`92.28766697732215`| +|`AR`|`Arkansas`|`95.0391382405745`| -**Function type:** [Aggregation](sql-aggregations.md) +
-Counts distinct values of a Theta sketch column or a regular column. +[Learn more](sql-aggregations.md) -## APPROX_QUANTILE +## APPROX_COUNT_DISTINCT -`APPROX_QUANTILE(expr, , [])` +Counts distinct values of a regular column or a prebuilt sketch column using an approximate algorithm. -**Function type:** [Aggregation](sql-aggregations.md) +* **Syntax**: `APPROX_COUNT_DISTINCT(expr)` +* **Function type:** Aggregation -Deprecated in favor of `APPROX_QUANTILE_DS`. +
Example -## APPROX_QUANTILE_DS +The following example counts the number of distinct airlines reported in `flight-carriers`: -`APPROX_QUANTILE_DS(expr, , [])` +```sql +SELECT APPROX_COUNT_DISTINCT("Reporting_Airline") AS "num_airlines" +FROM "flight-carriers" +``` -**Function type:** [Aggregation](sql-aggregations.md) +Returns the following: -Computes approximate quantiles on a Quantiles sketch column or a regular numeric column. +| `num_airlines` | +| -- | +| `20` | -## APPROX_QUANTILE_FIXED_BUCKETS +
-`APPROX_QUANTILE_FIXED_BUCKETS(expr, , , , , [])` +[Learn more](sql-aggregations.md) -**Function type:** [Aggregation](sql-aggregations.md) +## APPROX_COUNT_DISTINCT_BUILTIN -Computes approximate quantiles on fixed buckets histogram column or a regular numeric column. +Counts distinct values of a string, numeric, or `hyperUnique` column using Druid's built-in `cardinality` or `hyperUnique` aggregators. +Consider using `APPROX_COUNT_DISTINCT_DS_HLL` instead, which offers better accuracy in many cases. -## ARRAY[] +* **Syntax**: `APPROX_COUNT_DISTINCT_BUILTIN(expr)` +* **Function type:** Aggregation -`ARRAY[expr1, expr2, ...]` +
Example -**Function type:** [Array](sql-array-functions.md) +The following example counts the number of distinct airlines reported in `flight-carriers`: -Constructs a SQL ARRAY literal from the expression arguments. The arguments must be of the same type. +```sql +SELECT APPROX_COUNT_DISTINCT_BUILTIN("Reporting_Airline") AS "num_airlines" +FROM "flight-carriers" +``` -## ARRAY_AGG +Returns the following: -`ARRAY_AGG([DISTINCT] expr, [])` +| `num_airlines` | +| -- | +| `20` | -**Function type:** [Aggregation](sql-aggregations.md) +
-Returns an array of all values of the specified expression. +[Learn more](sql-aggregations.md) -## ARRAY_APPEND +## APPROX_COUNT_DISTINCT_DS_HLL -`ARRAY_APPEND(arr1, expr)` +Returns the approximate number of distinct values in an HLL sketch column or a regular column. See [DataSketches HLL Sketch module](../development/extensions-core/datasketches-hll.md) for a description of optional parameters. -**Function type:** [Array](./sql-array-functions.md) +* **Syntax:** `APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])` +* **Function type:** Aggregation -Appends `expr` to `arr`, the resulting array type determined by the type of `arr1`. +
Example -## ARRAY_CONCAT +The following example returns the approximate number of distinct tail numbers in the `flight-carriers` datasource. -`ARRAY_CONCAT(arr1, arr2)` +```sql +SELECT APPROX_COUNT_DISTINCT_DS_HLL("Tail_Number") AS "estimate" +FROM "flight-carriers" +``` -**Function type:** [Array](./sql-array-functions.md) +Returns the following: -Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.| +| `estimate` | +| -- | +| `4686` | -## ARRAY_CONCAT_AGG +
-`ARRAY_CONCAT_AGG([DISTINCT] expr, [])` +[Learn more](sql-aggregations.md) -**Function type:** [Aggregation](sql-aggregations.md) +## APPROX_COUNT_DISTINCT_DS_THETA -Concatenates array inputs into a single array. +Returns the approximate number of distinct values in a Theta sketch column or a regular column. See [DataSketches Theta Sketch module](../development/extensions-core/datasketches-theta#aggregator) for a description of optional parameters. -## ARRAY_CONTAINS +* **Syntax:** `APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])` +* **Function type:** Aggregation -`ARRAY_CONTAINS(arr, expr)` +
Example -**Function type:** [Array](./sql-array-functions.md) +The following example returns the approximate number of distinct tail numbers in the `Tail_Number` column of the `flight-carriers` datasource. -If `expr` is a scalar type, returns true if `arr` contains `expr`. If `expr` is an array, returns 1 if `arr` contains all elements of `expr`. Otherwise returns false. +```sql +SELECT APPROX_COUNT_DISTINCT_DS_THETA("Tail_Number") AS "estimate" +FROM "flight-carriers" +``` +Returns the following: -## ARRAY_LENGTH +| `estimate` | +| -- | +| `4667` | -`ARRAY_LENGTH(arr)` +
-**Function type:** [Array](./sql-array-functions.md) +[Learn more](sql-aggregations.md) -Returns length of the array expression. +## APPROX_QUANTILE -## ARRAY_OFFSET +:::info +Deprecated in favor of [`APPROX_QUANTILE_DS`](#approx_quantile_ds). +::: -`ARRAY_OFFSET(arr, long)` +* **Syntax:** `APPROX_QUANTILE(expr, probability, [k])` +* **Function type:** Aggregation -**Function type:** [Array](./sql-array-functions.md) +[Learn more](sql-aggregations.md) -Returns the array element at the 0-based index supplied, or null for an out of range index. +## APPROX_QUANTILE_DS -## ARRAY_OFFSET_OF +Computes approximate quantiles on a Quantiles sketch column or a regular numeric column. See [DataSketches Quantiles Sketch module](../development/extensions-core/datasketches-quantiles.md) for a description of parameters. -`ARRAY_OFFSET_OF(arr, expr)` +* **Syntax:** `APPROX_QUANTILE_DS(expr, probability, [k])` +* **Function type:** Aggregation -**Function type:** [Array](./sql-array-functions.md) +
Example -Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`. +The following example approximates the median of the `Distance` column from the `flight-carriers` datasource. The query may return a different approximation on each execution. -## ARRAY_ORDINAL +```sql +SELECT APPROX_QUANTILE_DS("Distance", 0.5, 128) AS "estimate_median" +FROM "flight-carriers" +``` -**Function type:** [Array](./sql-array-functions.md) +Returns a result similar to the following: -`ARRAY_ORDINAL(arr, long)` +| `estimate_median` | +| -- | +| `569` | -Returns the array element at the 1-based index supplied, or null for an out of range index. -## ARRAY_ORDINAL_OF +
-`ARRAY_ORDINAL_OF(arr, expr)` +[Learn more](sql-aggregations.md) -**Function type:** [Array](./sql-array-functions.md) +## APPROX_QUANTILE_FIXED_BUCKETS -Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`. +Computes approximate quantiles on a fixed buckets histogram column or a regular numeric column. See [Fixed buckets histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram) for a description of parameters. -## ARRAY_OVERLAP +* **Syntax:** `APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])` +* **Function type:** Aggregation -`ARRAY_OVERLAP(arr1, arr2)` +
Example -**Function type:** [Array](./sql-array-functions.md) +The following example approximates the median of a histogram on the `Distance` column from the `flight-carriers` datasource. The histogram has 10 buckets, a lower limit of zero, an upper limit of 2500, and ignores outlier values. -Returns true if `arr1` and `arr2` have any elements in common, else false. +```sql +SELECT APPROX_QUANTILE_FIXED_BUCKETS("Distance", 0.5, 10, 0, 2500, 'ignore') AS "estimate_median" +FROM "flight-carriers" +``` -## SCALAR_IN_ARRAY +Returns the following: -`SCALAR_IN_ARRAY(expr, arr)` +| `estimate_median` | +| -- | +| `571.6983032226562` | -**Function type:** [Array](./sql-array-functions.md) +
-Returns true if the scalar `expr` is present in `arr`. Otherwise, returns false if the scalar `expr` is non-null or -`UNKNOWN` if the scalar `expr` is `NULL`. +[Learn more](sql-aggregations.md) -Returns `UNKNOWN` if `arr` is `NULL`. +## ARRAY -## ARRAY_PREPEND +Constructs a SQL `ARRAY` literal from the provided expression arguments. All arguments must be of the same type. -`ARRAY_PREPEND(expr, arr)` +* **Syntax**: `ARRAY[expr1, expr2, ...]` +* **Function type:** Array -**Function type:** [Array](./sql-array-functions.md) +
Example -Prepends `expr` to `arr` at the beginning, the resulting array type determined by the type of `arr`. +The following example constructs arrays from the values of the `agent_category`, `browser`, and `browser_version` columns in the `kttm` datasource. -## ARRAY_SLICE +```sql +SELECT ARRAY["agent_category", "browser", "browser_version"] AS "user_agent_details" +FROM "kttm" +LIMIT 5 +``` -`ARRAY_SLICE(arr, start, end)` +Returns the following: -**Function type:** [Array](./sql-array-functions.md) +| `user_agent_details` | +| -- | +| `["Personal computer","Chrome","76.0.3809.100"]` | +| `["Smartphone","Chrome Mobile","50.0.2661.89"]` | +| `["Personal computer","Chrome","76.0.3809.100"]` | +| `["Personal computer","Opera","62.0.3331.116"]` | +| `["Smartphone","Mobile Safari","12.0"]` | -Returns the subarray of `arr` from the 0-based index `start` (inclusive) to `end` (exclusive). Returns `null`, if `start` is less than 0, greater than length of `arr`, or greater than `end`. +
-## ARRAY_TO_MV +[Learn more](sql-array-functions.md) -`ARRAY_TO_MV(arr)` +## ARRAY_AGG -**Function type:** [Array](./sql-array-functions.md) +Returns an array of all values of the specified expression. To include only unique values, specify `DISTINCT`. -Converts an `ARRAY` of any type into a multi-value string `VARCHAR`. +* **Syntax**: `ARRAY_AGG([DISTINCT] expr, [size])` +* **Function type:** Aggregation -## ARRAY_TO_STRING +
Example -`ARRAY_TO_STRING(arr, str)` +The following example returns arrays of unique values from the `OriginState` column in the `flight-carriers` datasource, grouped by `Reporting_Airline`. -**Function type:** [Array](./sql-array-functions.md) +```sql +SELECT "Reporting_Airline", ARRAY_AGG(DISTINCT "OriginState", 50000) AS "Origin" +FROM "flight-carriers" +GROUP BY "Reporting_Airline" +LIMIT 5 +``` -Joins all elements of `arr` by the delimiter specified by `str`. +Returns the following: -## ASIN +| `Reporting_Airline` | `Origin` | +| -- | -- | +| `AA` |`["AL","AR","AZ","CA","CO","CT","FL","GA","HI","IL","IN","KS","KY","LA","MA","MD","MI","MN","MO","NC","NE","NJ","NM","NV","NY","OH","OK","OR","PA","PR","RI","TN","TX","UT","VA","VI","WA"]`| +| `AS` |`["AK","AZ","CA","CO","FL","ID","IL","MA","NJ","NV","OR","TX","VA","WA"]`| +| `B6` |`["AZ","CA","CO","FL","LA","MA","NJ","NV","NY","OR","PR","UT","VA","VT","WA"]`| +| `CO` |`["AK","AL","AZ","CA","CO","CT","FL","GA","HI","IL","IN","LA","MA","MD","MI","MN","MO","MS","NC","NE","NH","NJ","NM","NV","NY","OH","OK","OR","PA","PR","RI","SC","TN","TX","UT","VA","VI","WA"]`| +| `DH` |`["AL","CA","CT","FL","GA","IL","MA","ME","MI","NC","NH","NJ","NV","NY","OH","PA","RI","SC","TN","VA","VT","WA","WV"]`| -`ASIN()` +
-**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +[Learn more](sql-aggregations.md) -Calculates the arc sine of a numeric expression. +## ARRAY_APPEND -## ATAN +Appends the expression to the array. The source array type determines the resulting array type. -`ATAN()` +* **Syntax**: `ARRAY_APPEND(arr, expr)` +* **Function type:** Array -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Calculates the arc tangent of a numeric expression. +The following example appends `c` to the values in the `arrayString` column from the `array-example` datasource. -## ATAN2 +```sql +SELECT ARRAY_APPEND("arrayString",'c') AS "array_appended" +FROM "array-example" +``` -`ATAN2(, )` +Returns the following: -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +| `array_appended` | +| -- | +| `[a, b, c]` | +| `[null,"b","c"]`| +| `[c]` | +| `[a, b, c]`| +| `null` | -Calculates the arc tangent of the two arguments. +
-## AVG +[Learn more](sql-array-functions.md) -`AVG()` +## ARRAY_CONCAT -**Function type:** [Aggregation](sql-aggregations.md) +Concatenates two arrays. The type of `arr1` determines the resulting array type. -Calculates the average of a set of values. +* **Syntax**: `ARRAY_CONCAT(arr1, arr2)` +* **Function type:** Array -## BIT_AND +
Example -`BIT_AND(expr)` +The following example concatenates the arrays in the `arrayLong` and `arrayDouble` columns from the `array-example` datasource. -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT ARRAY_CONCAT("arrayLong", "arrayDouble") AS "arrayConcatenated" +FROM "array-example" +``` -Performs a bitwise AND operation on all input values. +Returns the following: -## BIT_OR +| `arrayConcatenated` | +| -- | +| `[1,null,3,1.1,2.2,null]` | +| `null`| +| `[1,2,3,null,2.2,1.1]` | +| `[1,2,3]`| +| `null` | -`BIT_OR(expr)` +
-**Function type:** [Aggregation](sql-aggregations.md) +[Learn more](sql-array-functions.md) -Performs a bitwise OR operation on all input values. +## ARRAY_CONCAT_AGG -## BIT_XOR +Concatenates array inputs into a single array. To include only unique values, specify `DISTINCT`. -`BIT_XOR(expr)` +* **Syntax**: `ARRAY_CONCAT_AGG([DISTINCT] expr, [size])` +* **Function type:** Aggregation -**Function type:** [Aggregation](sql-aggregations.md) +
Example -Performs a bitwise XOR operation on all input values. +The following example concatenates the array inputs from the `arrayDouble` column of the `array-example` datasource into a single array. -## BITWISE_AND +```sql +SELECT ARRAY_CONCAT_AGG( DISTINCT "arrayDouble") AS "array_concat_agg_distinct" +FROM "array-example" +``` -`BITWISE_AND(expr1, expr2)` +Returns the following: -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +| `array_concat_agg_distinct` | +| -- | +| `[null,1.1,2.2,5.5,999]` | -Returns the bitwise AND between the two expressions, that is, `expr1 & expr2`. +
-## BITWISE_COMPLEMENT +[Learn more](sql-aggregations.md) -`BITWISE_COMPLEMENT(expr)` +## ARRAY_CONTAINS -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +Checks if the array contains the specified expression. -Returns the bitwise NOT for the expression, that is, `~expr`. +### Scalar -## BITWISE_CONVERT_DOUBLE_TO_LONG_BITS +If the specified expression is a scalar value, returns true if the source array contains the value. -`BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(expr)` +* **Syntax**: `ARRAY_CONTAINS(arr, expr)` +* **Function type:** Array -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Converts the bits of an IEEE 754 floating-point double value to a long. +The following example returns true if the `arrayLong` column from the `array-example` datasource contains `2`. -## BITWISE_CONVERT_LONG_BITS_TO_DOUBLE +```sql +SELECT "arrayLong", ARRAY_CONTAINS("arrayLong", 2) AS "arrayContains" +FROM "array-example" +``` -`BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(expr)` +Returns the following: -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +| `arrayLong` | `arrayContains` | +| -- | --| +| `[1,null,3]` | `false` | +| `null` | `null` | +| `[1,2,3]` | `true` | +| `[1,2,3]` | `true` | +| `[]` | `false` | -Converts a long to the IEEE 754 floating-point double specified by the bits stored in the long. +
-## BITWISE_OR +[Learn more](sql-array-functions.md) -`BITWISE_OR(expr1, expr2)` +### Array -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +If the specified expression is an array, returns true if the source array contains all elements of the expression. -Returns the bitwise OR between the two expressions, that is, `expr1 | expr2`. +* **Syntax**: `ARRAY_CONTAINS(arr, expr)` +* **Function type:** Array -## BITWISE_SHIFT_LEFT +
Example -`BITWISE_SHIFT_LEFT(expr1, expr2)` +The following example returns true if the `arrayLong` column from the `array-example` datasource contains all elements of the provided expression. -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +```sql +SELECT "label", "arrayLong", ARRAY_CONTAINS("arrayLong", ARRAY[1,2,3]) AS "arrayContains" +FROM "array-example" +``` -Returns a bitwise left shift of expr1, that is, `expr1 << expr2`. +Returns the following: -## BITWISE_SHIFT_RIGHT +| `label` | `arrayLong` | `arrayContains` | +| -- | -- | -- | +| `row1` | `[1,null,3]` | `false` | +| `row2`| `null` | `null` | +| `row3`| `[1,2,3]` | `true` | +| `row4`| `[1,2,3]` | `true` | +| `row5`| `[]` | `false` | -`BITWISE_SHIFT_RIGHT(expr1, expr2)` +
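The scalar and array forms of `ARRAY_CONTAINS` can also be compared side by side on literal arrays. The following is a minimal sketch, assuming array literals are accepted in a `SELECT` without a `FROM` clause; the column aliases are illustrative only and don't come from the example datasource.

```sql
-- Scalar form: is the single value 2 an element of the array?
-- Array form: are ALL elements of the second array present in the first?
SELECT
  ARRAY_CONTAINS(ARRAY[1, 2, 3], 2) AS "has_scalar",              -- true
  ARRAY_CONTAINS(ARRAY[1, 2, 3], ARRAY[1, 3]) AS "has_all",       -- true: 1 and 3 are both present
  ARRAY_CONTAINS(ARRAY[1, 2, 3], ARRAY[1, 4]) AS "has_all_miss"   -- false: 4 is missing
```

The array form requires every element to match, so a single missing element makes the result false.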
-**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +[Learn more](sql-array-functions.md) -Returns a bitwise right shift of expr1, that is, `expr1 >> expr2`. +## ARRAY_LENGTH -## BITWISE_XOR +Returns the length of the array. -`BITWISE_XOR(expr1, expr2)` +* **Syntax**: `ARRAY_LENGTH(arr)` +* **Function type:** Array -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Returns the bitwise exclusive OR between the two expressions, that is, `expr1 ^ expr2`. +The following example returns the length of array expressions in the `arrayDouble` column from the `array-example` datasource. -## BLOOM_FILTER +```sql +SELECT "arrayDouble" AS "array", ARRAY_LENGTH("arrayDouble") AS "arrayLength" +FROM "array-example" +``` -`BLOOM_FILTER(expr, )` +Returns the following: -**Function type:** [Aggregation](sql-aggregations.md) +| `array` | `arrayLength` | +| -- | -- | +| `row1` | 3 | +| `row2`| 3 | +| `row3`| 3 | +| `row4`| 0 | +| `row5`| `null` | -Computes a Bloom filter from values produced by the specified expression. +
-## BLOOM_FILTER_TEST +[Learn more](sql-array-functions.md) -`BLOOM_FILTER_TEST(expr, )` +## ARRAY_OFFSET -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +Returns the array element at the specified zero-based index. Returns null if the index is out of bounds. -Returns true if the expression is contained in a Base64-serialized Bloom filter. +* **Syntax**: `ARRAY_OFFSET(arr, long)` +* **Function type:** Array -## BTRIM +
Example -`BTRIM(, [])` +The following example returns the element at the specified zero-based index from the arrays in the `arrayLong` column of the `array-example` datasource. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +```sql +SELECT "arrayLong" as "array", ARRAY_OFFSET("arrayLong", 2) AS "elementAtIndex" +FROM "array-example" +``` -Trims characters from both the leading and trailing ends of an expression. +Returns the following: -## CASE +| `array` | `elementAtIndex` | +| -- | -- | +| `[1,null,3]` | 3 | +| `null`| `null` | +| `[1,2,3]`| 3 | +| `[1,2,3]`| 3 | +| `[]`| `null` | -`CASE expr WHEN value1 THEN result1 \[ WHEN value2 THEN result2 ... \] \[ ELSE resultN \] END` +
-**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +[Learn more](sql-array-functions.md) -Returns a result based on a given condition. +## ARRAY_OFFSET_OF -## CAST +Returns the zero-based index of the first occurrence of the expression in the array. Returns null if the value isn't present, or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode). -`CAST(value AS TYPE)` +* **Syntax**: `ARRAY_OFFSET_OF(arr, expr)` +* **Function type:** Array -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +
Example -Converts a value into the specified data type. +The following example returns the zero-based index of the first occurrence of `3` in the arrays in the `arrayLong` column of the `array-example` datasource. -## CEIL (date and time) +```sql +SELECT "arrayLong" as "array", ARRAY_OFFSET_OF("arrayLong", 3) AS "offset" +FROM "array-example" +``` -`CEIL( TO )` +Returns the following: -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +| `array` | `offset` | +| -- | -- | +| `[1,null,3]` | 2 | +| `null`| `null` | +| `[1,2,3]`| 2 | +| `[1,2,3]`| 2 | +| `[]`| `null` | -Rounds up a timestamp by a given time unit. +
-## CEIL (numeric) +[Learn more](sql-array-functions.md) -`CEIL()` +## ARRAY_ORDINAL -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +Returns the array element at the specified one-based index. Returns null if the index is out of bounds. -Calculates the smallest integer value greater than or equal to the numeric expression. +* **Syntax**: `ARRAY_ORDINAL(arr, long)` +* **Function type:** Array -## CHAR_LENGTH +
Example -`CHAR_LENGTH(expr)` +The following example returns the element at the specified one-based index from the arrays in the `arrayLong` column of the `array-example` datasource. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +```sql +SELECT "arrayLong" as "array", ARRAY_ORDINAL("arrayLong", 2) AS "elementAtIndex" +FROM "array-example" +``` -Alias for [`LENGTH`](#length). +Returns the following: -## CHARACTER_LENGTH +| `array` | `elementAtIndex` | +| -- | -- | +| `[1,null,3]` | `null` | +| `null`| `null` | +| `[1,2,3]`| 2 | +| `[1,2,3]`| 2 | +| `[]`| `null` | -`CHARACTER_LENGTH(expr)` +
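`ARRAY_OFFSET` and `ARRAY_ORDINAL` differ only in where counting starts, which is easiest to see when both receive the same index. The following is a small sketch on a literal array, assuming a `SELECT` without a `FROM` clause; the aliases are illustrative only.

```sql
-- The same index, 1, selects different elements:
-- zero-based ARRAY_OFFSET counts 10 as position 0,
-- one-based ARRAY_ORDINAL counts 10 as position 1.
SELECT
  ARRAY_OFFSET(ARRAY[10, 20, 30], 1) AS "zero_based",   -- 20
  ARRAY_ORDINAL(ARRAY[10, 20, 30], 1) AS "one_based"    -- 10
```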
-**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-array-functions.md) -Alias for [`LENGTH`](#length). +## ARRAY_ORDINAL_OF -## COALESCE +Returns the one-based index of the first occurrence of the expression in the array. Returns null if the value isn't present, or `-1` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode). -`COALESCE(expr, expr, ...)` +* **Syntax**: `ARRAY_ORDINAL_OF(arr, expr)` +* **Function type:** Array -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +
Example -Returns the first non-null value. +The following example returns the one-based index of the first occurrence of `3` in the arrays in the `arrayLong` column of the `array-example` datasource. -## CONCAT +```sql +SELECT "arrayLong" as "array", ARRAY_ORDINAL_OF("arrayLong", 3) AS "ordinal" +FROM "array-example" +``` -`CONCAT(expr, expr...)` +Returns the following: -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +| `array` | `ordinal` | +| -- | -- | +| `[1,null,3]` | 3 | +| `null`| `null` | +| `[1,2,3]`| 3 | +| `[1,2,3]`| 3 | +| `[]`| `null` | -Concatenates a list of expressions. +
-## CONTAINS_STRING +[Learn more](sql-array-functions.md) -`CONTAINS_STRING(, )` +## ARRAY_OVERLAP -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +Returns true if two arrays have any elements in common. Treats `NULL` values as known elements. -Finds whether a string is in a given expression, case-sensitive. +* **Syntax**: `ARRAY_OVERLAP(arr1, arr2)` +* **Function type:** Array -## COS +
Example -`COS()` +The following example returns true if columns `arrayString` and `arrayDouble` from the `array-example` datasource have common elements. -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +```sql +SELECT "arrayString", "arrayDouble", ARRAY_OVERLAP("arrayString", "arrayDouble") AS "overlap" +FROM "array-example" +``` -Calculates the trigonometric cosine of an angle expressed in radians. +Returns the following: -## COT +| `arrayString` | `arrayDouble` | `overlap`| +| -- | -- | -- | +| `["a","b"]` | `[1.1,2.2,null]` | false | +| `[null,"b"]`| `[999,null,5.5]` | true | +| `[]`| `[null,2.2,1.1]` | false | +| `["a","b"]`| `[]` | false | +| `null`| `null` | `null` | -`COT()` +
-**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +[Learn more](sql-array-functions.md) -Calculates the trigonometric cotangent of an angle expressed in radians. +## SCALAR_IN_ARRAY -## COUNT +Returns true if the scalar value is present in the array. Otherwise, returns false if the value is non-null, or `UNKNOWN` if the value is `NULL`. Returns `UNKNOWN` if the array is `NULL`. -`COUNT([DISTINCT] expr)` +* **Syntax**: `SCALAR_IN_ARRAY(expr, arr)` +* **Function type:** Array -`COUNT(*)` +
Example -**Function type:** [Aggregation](sql-aggregations.md) +The following example returns true if the value `36` is present in the array generated from the elements in the `DestStateFips` column from the `flight-carriers` datasource. -Counts the number of rows. +```sql +SELECT "Reporting_Airline", ARRAY_AGG(DISTINCT "DestStateFips") AS "StateFipsArray", SCALAR_IN_ARRAY(36, ARRAY_AGG(DISTINCT "DestStateFips")) AS "ValueInArray" +FROM "flight-carriers" +GROUP BY "Reporting_Airline" +LIMIT 5 +``` -## CUME_DIST +Returns the following: -`CUME_DIST()` +| `Reporting_Airline` | `StateFipsArray` | `ValueInArray`| +| -- | -- | -- | +| `AA` | `[1,4,5,6,8,9,12,13,15,17,18,20,21,22,24,25,26,27,29,31,32,34,35,36,37,39,40,41,42,44,47,48,49,51,53,72,78]` | true | +| `AS`| `[2,4,6,8,12,16,17,25,32,34,41,48,51,53]` | false | +| `B6`| `[4,6,8,12,22,25,32,34,36,41,49,50,51,53,72]` | true | +| `CO`| `[1,2,4,6,8,9,12,13,15,17,18,22,24,25,26,27,28,29,31,32,33,34,35,36,37,39,40,41,42,44,45,47,48,49,51,53,72,78]` | true | +| `DH`| `[1,6,9,12,13,17,23,25,26,32,33,34,36,37,39,42,44,45,47,50,51,53,54]` | true | -**Function type:** [Window](sql-window-functions.md#window-function-reference) +
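Note that `SCALAR_IN_ARRAY` takes the scalar first and the array second, the reverse of `ARRAY_CONTAINS`. The following is a minimal sketch on a literal array, assuming a `SELECT` without a `FROM` clause; the aliases are illustrative only.

```sql
-- Argument order is (scalar, array).
SELECT
  SCALAR_IN_ARRAY(2, ARRAY[1, 2, 3]) AS "present",   -- true
  SCALAR_IN_ARRAY(4, ARRAY[1, 2, 3]) AS "absent"     -- false: 4 is non-null and not in the array
```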
-Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`. The return value ranges between `1/number of rows` and 1. +[Learn more](sql-array-functions.md) -## CURRENT_DATE +## ARRAY_PREPEND -`CURRENT_DATE` +Prepends the expression to the array. The source array type determines the resulting array type. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +* **Syntax**: `ARRAY_PREPEND(expr, arr)` +* **Function type:** Array -Returns the current date in the connection's time zone. +
Example -## CURRENT_TIMESTAMP +The following example prepends `c` to the arrays in the `arrayString` column from the `array-example` datasource. -`CURRENT_TIMESTAMP` +```sql +SELECT ARRAY_PREPEND('c', "arrayString") AS "arrayPrepended" +FROM "array-example" +``` -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +Returns the following: -Returns the current timestamp in the connection's time zone. +| `arrayPrepended` | +| -- | +| `[c, a, b]` | +| `["c",null,"b"]`| +| `[c]`| +| `[c,a,b]`| +| `null`| -## DATE_TRUNC +
-`DATE_TRUNC(, )` +[Learn more](sql-array-functions.md) -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +## ARRAY_SLICE -Rounds down a timestamp by a given time unit. +Returns a subset of the array from the zero-based index `start` (inclusive) to `end` (exclusive). Returns null if `start` is less than 0, greater than the length of the array, or greater than `end`. -## DECODE_BASE64_COMPLEX +* **Syntax**: `ARRAY_SLICE(arr, start, end)` +* **Function type:** Array -`DECODE_BASE64_COMPLEX(dataType, expr)` +
Example -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +The following example constructs a new array from the elements of arrays in the `arrayDouble` column from the `array-example` datasource. -Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. +```sql +SELECT "arrayDouble", ARRAY_SLICE("arrayDouble", 0, 2) AS "arrayNew" +FROM "array-example" +``` -## DECODE_BASE64_UTF8 +Returns the following: -`DECODE_BASE64_UTF8(expr)` +| `arrayDouble` | `arrayNew` | +| -- | -- | +| `[1.1,2.2,null]` | `[1.1,2.2]` | +| `[999,null,5.5]`| `[999,null]` | +| `[null,2.2,1.1]`| `[null,2.2]` | +| `[]`| `[null,null]` | +| `null`| `null` | -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +
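The inclusive `start` and exclusive `end` bounds are easier to check on a literal array whose values differ from their positions. The following is a small sketch, assuming a `SELECT` without a `FROM` clause; the alias is illustrative only.

```sql
-- Zero-based positions:  0   1   2   3   4
-- Array values:         10  20  30  40  50
-- start = 1 is included, end = 3 is excluded, so positions 1 and 2 are kept.
SELECT ARRAY_SLICE(ARRAY[10, 20, 30, 40, 50], 1, 3) AS "slice"   -- [20, 30]
```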
+[Learn more](sql-array-functions.md) -Decodes a Base64-encoded string into a UTF-8 encoded string. +## ARRAY_TO_MV -## DEGREES +Converts an array of any type into a [multi-value string](sql-data-types.md#multi-value-strings). -`DEGREES()` +* **Syntax**: `ARRAY_TO_MV(arr)` +* **Function type:** Array -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Converts an angle from radians to degrees. +The following example converts the arrays in the `arrayDouble` column from the `array-example` datasource into multi-value strings. -## DENSE_RANK +```sql +SELECT ARRAY_TO_MV("arrayDouble") AS "multiValueString" +FROM "array-example" +``` -`DENSE_RANK()` +Returns the following: -**Function type:** [Window](sql-window-functions.md#window-function-reference) +| `multiValueString` | +| -- | +| `["1.1","2.2",null]` | +| `["999.0",null,"5.5"]`| +| `[null,"2.2","1.1"]`| +| `[]`| +| `null`| -Returns the rank for a row within a window without gaps. For example, if two rows tie for a rank of 1, the subsequent row is ranked 2. +
-## DIV +[Learn more](sql-array-functions.md) -`DIV(x, y)` +## ARRAY_TO_STRING -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +Joins all elements of the array into a string using the specified delimiter. -Returns the result of integer division of `x` by `y`. +* **Syntax**: `ARRAY_TO_STRING(arr, delimiter)` +* **Function type:** Array -:::info -The `DIV` function is not implemented in Druid versions 30.0.0 or earlier. Consider using [`SAFE_DIVIDE`](./sql-functions.md#safe_divide) instead. -::: +
Example -## DS_CDF +The following example converts the arrays in the `arrayDouble` column of the `array-example` datasource into concatenated strings. -`DS_CDF(expr, splitPoint0, splitPoint1, ...)` +```sql +SELECT ARRAY_TO_STRING("arrayDouble", '') AS "notSeparated" +FROM "array-example" +``` -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +Returns the following: -Returns a string representing an approximation to the Cumulative Distribution Function given the specified bin definition. +| `notSeparated` | +| -- | +| `1.12.2null` | +| `999.0null5.5` | +| `null2.21.1` | +| ` ` | +| `null`| -## DS_GET_QUANTILE +
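The example above passes an empty delimiter, so the elements run together. With a visible delimiter the joined string is easier to read. The following is a minimal sketch on a literal array, assuming a `SELECT` without a `FROM` clause; the alias is illustrative only.

```sql
-- Join the elements with a comma followed by a space.
SELECT ARRAY_TO_STRING(ARRAY['a', 'b', 'c'], ', ') AS "commaSeparated"   -- a, b, c
```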
-`DS_GET_QUANTILE(expr, fraction)` +[Learn more](sql-array-functions.md) -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +## ASIN -Returns the quantile estimate corresponding to `fraction` from a quantiles sketch. +Calculates the arc sine (arcsine) of a numeric expression. -## DS_GET_QUANTILES +* **Syntax:** `ASIN(expr)` +* **Function type:** Scalar, numeric -`DS_GET_QUANTILES(expr, fraction0, fraction1, ...)` +
Example -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +The following example calculates the arc sine of `1`. -Returns a string representing an array of quantile estimates corresponding to a list of fractions from a quantiles sketch. +```sql +SELECT ASIN(1) AS "arc_sine" +``` +Returns the following: -## DS_HISTOGRAM +| `arc_sine` | +| -- | +| `1.5707963267948966` | +
-`DS_HISTOGRAM(expr, splitPoint0, splitPoint1, ...)` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +## ATAN -Returns a string representing an approximation to the histogram given the specified bin definition. +Calculates the arc tangent (arctangent) of a numeric expression. +* **Syntax:** `ATAN(expr)` +* **Function type:** Scalar, numeric -## DS_HLL +
Example -`DS_HLL(expr, [lgK, tgtHllType])` +The following example calculates the arc tangent of `1`. -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT ATAN(1) AS "arc_tangent" +``` +Returns the following: -Creates an HLL sketch on a column containing HLL sketches or a regular column. +| `arc_tangent` | +| -- | +| `0.7853981633974483` | +
-## DS_QUANTILE_SUMMARY +[Learn more](sql-scalar.md#numeric-functions) -`DS_QUANTILE_SUMMARY(expr)` +## ATAN2 -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +Calculates the arc tangent (arctangent) of a specified x and y coordinate. -Returns a string summary of a quantiles sketch. +* **Syntax:** `ATAN2(x, y)` +* **Function type:** Scalar, numeric -## DS_QUANTILES_SKETCH +
Example -`DS_QUANTILES_SKETCH(expr, [k])` +The following example calculates the arc tangent of the coordinate `(1, -1)`. -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT ATAN2(1,-1) AS "arc_tangent_2" +``` +Returns the following: -Creates a Quantiles sketch on a column containing Quantiles sketches or a regular column. +| `arc_tangent_2` | +| -- | +| `2.356194490192345` | +
-## DS_RANK +[Learn more](sql-scalar.md#numeric-functions) -`DS_RANK(expr, value)` +## AVG -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +Calculates the average of a set of values. -Returns an approximate rank between 0 and 1 of a given value, in which the rank signifies the fraction of the distribution less than the given value. +* **Syntax**: `AVG()` +* **Function type:** Aggregation -## DS_THETA -`DS_THETA(expr, [size])` +
Example -**Function type:** [Aggregation](sql-aggregations.md) +The following example calculates the average minutes of delay for a particular airline in `flight-carriers`: -Creates a Theta sketch on a column containing Theta sketches or a regular column. +```sql +SELECT AVG("DepDelayMinutes") AS avg_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` -## DS_TUPLE_DOUBLES +Returns the following: -`DS_TUPLE_DOUBLES(expr, [nominalEntries])` +| `avg_delay` | +| -- | +| `8.936` | -`DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])` +
-**Function type:** [Aggregation](sql-aggregations.md) +[Learn more](sql-aggregations.md) -Creates a Tuple sketch which contains an array of double values as the Summary Object. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). +## BIT_AND -## DS_TUPLE_DOUBLES_INTERSECT +Performs a bitwise AND operation on all input values. -`DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])` +* **Syntax**: `BIT_AND(expr)` +* **Function type:** Aggregation -**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions) +
Example -Returns an intersection of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). +The following example returns the bitwise AND operation for all values in `passenger_count` from `taxi-trips`: -## DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE +```sql +SELECT + BIT_AND("passenger_count") AS "bit_and" +FROM "taxi-trips" +``` -`DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)` +Returns the following: -**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions) +| `bit_and` | +| -- | +| `0` | -Computes approximate sums of the values contained within a Tuple sketch which contains an array of double values as the Summary Object. +
-## DS_TUPLE_DOUBLES_NOT +[Learn more](sql-aggregations.md) -`DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])` +## BIT_OR -**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions) +Performs a bitwise OR operation on all input values. -Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). +* **Syntax**: `BIT_OR(expr)` +* **Function type:** Aggregation -## DS_TUPLE_DOUBLES_UNION +
Example -`DS_TUPLE_DOUBLES_UNION(expr, ..., [nominalEntries])` +The following example returns the bitwise OR of all values in `passenger_count` from `taxi-trips`: -**Function type:** [Scalar, sketch](sql-scalar.md#tuple-sketch-functions) +```sql +SELECT + BIT_OR("passenger_count") AS "bit_or" +FROM "taxi-trips" +``` -Returns a union of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). +Returns the following: -## EARLIEST +| `bit_or` | +| -- | +| `15` | -`EARLIEST(expr, [maxBytesPerValue])` +
-**Function type:** [Aggregation](sql-aggregations.md) +[Learn more](sql-aggregations.md) -Returns the value of a numeric or string expression corresponding to the earliest `__time` value. +## BIT_XOR -## EARLIEST_BY +Performs a bitwise XOR operation on all input values. -`EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])` +* **Syntax**: `BIT_XOR(expr)` +* **Function type:** Aggregation -**Function type:** [Aggregation](sql-aggregations.md) +
Example -Returns the value of a numeric or string expression corresponding to the earliest time value from `timestampExpr`. +The following example returns the bitwise XOR of all values in `passenger_count` from `taxi-trips`: -## EXP +```sql +SELECT + BIT_XOR("passenger_count") AS "bit_xor" +FROM "taxi-trips" +``` -`EXP()` +Returns the following: -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +| `bit_xor` | +| -- | +| `6` | -Calculates _e_ raised to the power of the numeric expression. +
-## EXTRACT +[Learn more](sql-aggregations.md) -`EXTRACT( FROM )` +## BITWISE_AND -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +Returns the bitwise AND between two expressions: `expr1 & expr2`. -Extracts the value of some unit of the timestamp, optionally from a certain time zone, and returns the number. +* **Syntax:** `BITWISE_AND(expr1, expr2)` +* **Function type:** Scalar, numeric -## FIRST_VALUE +
Example -`FIRST_VALUE(expr)` +The following example performs the bitwise AND operation `12 & 10`. -**Function type:** [Window](sql-window-functions.md#window-function-reference) +```sql +SELECT BITWISE_AND(12, 10) AS "bitwise_and" +``` +Returns the following: -Returns the value evaluated for the expression for the first row within the window. +| `bitwise_and` | +| -- | +| 8 | -## FLOOR (date and time) +
-`FLOOR( TO )` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +## BITWISE_COMPLEMENT -Rounds down a timestamp by a given time unit. +Returns the bitwise complement (bitwise not) for the expression: `~expr`. -## FLOOR (numeric) +* **Syntax:** `BITWISE_COMPLEMENT(expr)` +* **Function type:** Scalar, numeric -`FLOOR()` +
Example -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +The following example performs the bitwise complement operation `~12`. -Calculates the largest integer value less than or equal to the numeric expression. +```sql +SELECT BITWISE_COMPLEMENT(12) AS "bitwise_complement" +``` +Returns the following: -## GREATEST +| `bitwise_complement` | +| -- | +| -13 | -`GREATEST([expr1, ...])` +
-**Function type:** [Scalar, reduction](sql-scalar.md#reduction-functions) +[Learn more](sql-scalar.md#numeric-functions) -Returns the maximum value from the provided arguments. +## BITWISE_CONVERT_DOUBLE_TO_LONG_BITS -## GROUPING +Converts the bits of an IEEE 754 floating-point double value to a long. -`GROUPING(expr, expr...)` +* **Syntax:** `BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(expr)` +* **Function type:** Scalar, numeric -**Function type:** [Aggregation](sql-aggregations.md) +
Example -Returns a number for each output row of a groupBy query, indicating whether the specified dimension is included for that row. +The following example returns the IEEE 754 floating-point double representation of `255` as a long. -## HLL_SKETCH_ESTIMATE +```sql +SELECT BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(255) AS "ieee_754_double_to_long" +``` +Returns the following: -`HLL_SKETCH_ESTIMATE(expr, [round])` +| `ieee_754_double_to_long` | +| -- | +| `4643176031446892544` | -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +
-Returns the distinct count estimate from an HLL sketch. +[Learn more](sql-scalar.md#numeric-functions) -## HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS -`HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, [numStdDev])` +## BITWISE_CONVERT_LONG_BITS_TO_DOUBLE -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +Converts a long to the IEEE 754 floating-point double specified by the bits stored in the long. -Returns the distinct count estimate and error bounds from an HLL sketch. +* **Syntax:** `BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(expr)` +* **Function type:** Scalar, numeric -## HLL_SKETCH_TO_STRING +
Example -`HLL_SKETCH_TO_STRING(expr)` +The following example returns the long representation of `4643176031446892544` as an IEEE 754 floating-point double. -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +```sql +SELECT BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(4643176031446892544) AS "long_to_ieee_754_double" +``` +Returns the following: -Returns a human-readable string representation of an HLL sketch. +| `long_to_ieee_754_double` | +| -- | +| `255` | -## HLL_SKETCH_UNION +
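Because `BITWISE_CONVERT_DOUBLE_TO_LONG_BITS` and `BITWISE_CONVERT_LONG_BITS_TO_DOUBLE` are inverses of each other, chaining them should return the original value. The following is a minimal sketch, assuming the two functions compose as their descriptions suggest; the alias `round_trip` is illustrative only.

```sql
-- Convert 255 to its IEEE 754 bit pattern, then interpret those bits as a double again.
SELECT
  BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(
    BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(255)
  ) AS "round_trip"
```

The intermediate long `4643176031446892544` is `0x406FE00000000000` in hexadecimal: a sign bit of 0, a biased exponent of 1030 (that is, 2^7), and a significand of 1.9921875, which together encode 1.9921875 × 128 = 255.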
-`HLL_SKETCH_UNION([lgK, tgtHllType], expr0, expr1, ...)` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +## BITWISE_OR -Returns a union of HLL sketches. +Returns the bitwise OR between the two expressions: `expr1 | expr2`. -## HUMAN_READABLE_BINARY_BYTE_FORMAT +* **Syntax:** `BITWISE_OR(expr1, expr2)` +* **Function type:** Scalar, numeric -`HUMAN_READABLE_BINARY_BYTE_FORMAT(value[, precision])` +
Example -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +The following example performs the bitwise OR operation `12 | 10`. -Converts an integer byte size into human-readable IEC format. +```sql +SELECT BITWISE_OR(12, 10) AS "bitwise_or" +``` +Returns the following: -## HUMAN_READABLE_DECIMAL_BYTE_FORMAT +| `bitwise_or` | +| -- | +| `14` | -`HUMAN_READABLE_DECIMAL_BYTE_FORMAT(value[, precision])` +
-**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +[Learn more](sql-scalar.md#numeric-functions) -Converts a byte size into human-readable SI format. +## BITWISE_SHIFT_LEFT -## HUMAN_READABLE_DECIMAL_FORMAT +Returns the bitwise left shift by x positions of an expr: `expr << x`. -`HUMAN_READABLE_DECIMAL_FORMAT(value[, precision])` +* **Syntax:** `BITWISE_SHIFT_LEFT(expr, x)` +* **Function type:** Scalar, numeric -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Converts a byte size into human-readable SI format with single-character units. +The following example performs the bitwise SHIFT operation `2 << 3`. -## ICONTAINS_STRING +```sql +SELECT BITWISE_SHIFT_LEFT(2, 3) AS "bitwise_shift_left" +``` +Returns the following: -`ICONTAINS_STRING(, str)` +| `bitwise_shift_left` | +| -- | +| `16` | -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +
-Finds whether a string is in a given expression, case-insensitive. +[Learn more](sql-scalar.md#numeric-functions) -## IPV4_MATCH +## BITWISE_SHIFT_RIGHT -`IPV4_MATCH(address, subnet)` +Returns the bitwise right shift by x positions of an expr: `expr >> x`. -**Function type:** [Scalar, IP address](sql-scalar.md#ip-address-functions) +* **Syntax:** `BITWISE_SHIFT_RIGHT(expr, x)` +* **Function type:** Scalar, numeric -Returns true if the IPv4 `address` belongs to the `subnet` literal, else false. +
Example -## IPV4_PARSE +The following example performs the bitwise SHIFT operation `16 >> 3`. -`IPV4_PARSE(address)` +```sql +SELECT BITWISE_SHIFT_RIGHT(16, 3) AS "bitwise_shift_right" +``` +Returns the following: -**Function type:** [Scalar, IP address](sql-scalar.md#ip-address-functions) +| `bitwise_shift_right` | +| -- | +| `2` | -Parses `address` into an IPv4 address stored as an integer. +
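The two shift functions undo each other as long as no set bits are shifted out of range. The following is a minimal sketch that reuses the values from the shift examples; the aliases are illustrative only.

```sql
SELECT
  BITWISE_SHIFT_LEFT(2, 3) AS "shifted_left",                       -- 2 << 3 = 16
  BITWISE_SHIFT_RIGHT(BITWISE_SHIFT_LEFT(2, 3), 3) AS "round_trip"  -- (2 << 3) >> 3 = 2
```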
-## IPV4_STRINGIFY +[Learn more](sql-scalar.md#numeric-functions) -`IPV4_STRINGIFY(address)` +## BITWISE_XOR -**Function type:** [Scalar, IP address](sql-scalar.md#ip-address-functions) +Returns the bitwise exclusive OR between the two expressions: `expr1 ^ expr2`. -Converts `address` into an IPv4 address in dot-decimal notation. +* **Syntax:** `BITWISE_XOR(expr1, expr2)` +* **Function type:** Scalar, numeric -## IPV6_MATCH +
Example -`IPV6_MATCH(address, subnet)` +The following example performs the bitwise XOR operation `12 ^ 10`. -**Function type:** [Scalar, IP address](sql-scalar.md#ip-address-functions) +```sql +SELECT BITWISE_XOR(12, 10) AS "bitwise_xor" +``` +Returns the following: -Returns true if the IPv6 `address` belongs to the `subnet` literal, else false. +| `bitwise_xor` | +| -- | +| `6` | -## JSON_KEYS +
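To see how the three two-argument bitwise functions relate, it can help to run them side by side on the same inputs. The following is a minimal sketch that combines the `BITWISE_AND`, `BITWISE_OR`, and `BITWISE_XOR` examples into one query; the comments show the binary arithmetic.

```sql
-- 12 is 1100 in binary and 10 is 1010 in binary.
SELECT
  BITWISE_AND(12, 10) AS "bitwise_and", -- 1100 & 1010 = 1000 = 8
  BITWISE_OR(12, 10)  AS "bitwise_or",  -- 1100 | 1010 = 1110 = 14
  BITWISE_XOR(12, 10) AS "bitwise_xor"  -- 1100 ^ 1010 = 0110 = 6
```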
-**Function type:** [JSON](sql-json-functions.md) +[Learn more](sql-scalar.md#numeric-functions) -`JSON_KEYS(expr, path)` +## BLOOM_FILTER -Returns an array of field names from `expr` at the specified `path`. +Computes a Bloom filter from values produced by the specified expression. -## JSON_MERGE +* **Syntax**: `BLOOM_FILTER(expr, )` +* **Function type:** Aggregation -**Function type:** [JSON](sql-json-functions.md) +[Learn more](sql-aggregations.md) -`JSON_MERGE(expr1, expr2[, expr3 ...])` -Merges two or more JSON `STRING` or `COMPLEX` into one. Preserves the rightmost value when there are key overlaps. Returning always a `COMPLEX` type. +## BLOOM_FILTER_TEST -## JSON_OBJECT +Returns true if the expression is contained in a Base64-serialized Bloom filter. -**Function type:** [JSON](sql-json-functions.md) +* **Syntax**: `BLOOM_FILTER_TEST(expr, )` +* **Function type:** Scalar, other -`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` +[Learn more](sql-scalar.md#other-scalar-functions) -Constructs a new `COMPLEX` object. The `KEY` expressions must evaluate to string types. The `VALUE` expressions can be composed of any input type, including other `COMPLEX` values. `JSON_OBJECT` can accept colon-separated key-value pairs. The following syntax is equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`. +## BTRIM -## JSON_PATHS +Trims characters from both the leading and trailing ends of an expression. Defaults `chars` to a space if none is provided. -**Function type:** [JSON](sql-json-functions.md) +* **Syntax:** `BTRIM(expr[, chars])` +* **Function type:** Scalar, string -`JSON_PATHS(expr)` +
Example -Returns an array of all paths which refer to literal values in `expr` in JSONPath format. +The following example trims the `_` characters from both ends of the string expression. -## JSON_QUERY +```sql +SELECT + '___abc___' AS "original_string", + BTRIM('___abc___', '_') AS "trim_both_ends" +``` -**Function type:** [JSON](sql-json-functions.md) +Returns the following: -`JSON_QUERY(expr, path)` +| `original_string` | `trim_both_ends` | +| -- | -- | +| `___abc___` | `abc` | -Extracts a `COMPLEX` value from `expr`, at the specified `path`. +
-## JSON_QUERY_ARRAY +[Learn more](sql-scalar.md#string-functions) -**Function type:** [JSON](sql-json-functions.md) +## CASE -`JSON_QUERY_ARRAY(expr, path)` +Returns a result based on given conditions. -Extracts an `ARRAY>` value from `expr` at the specified `path`. If value is not an `ARRAY`, it gets translated into a single element `ARRAY` containing the value at `path`. The primary use of this function is to extract arrays of objects to use as inputs to other [array functions](./sql-array-functions.md). +### Simple CASE -## JSON_VALUE +Compares an expression to a set of values or expressions. -**Function type:** [JSON](sql-json-functions.md) +* **Syntax:** `CASE expr WHEN value1 THEN result1 \[ WHEN value2 THEN result2 ... \] \[ ELSE resultN \] END` +* **Function type:** Scalar, other -`JSON_VALUE(expr, path [RETURNING sqlType])` +
Example -Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc) the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`. +The following example returns a UI type based on the value of `agent_category` from the `kttm` datasource. -## LAG +```sql +SELECT "agent_category" AS "device_type", +CASE "agent_category" + WHEN 'Personal computer' THEN 'Large UI' + WHEN 'Smartphone' THEN 'Mobile UI' + ELSE 'other' +END AS "UI_type" +FROM "kttm" +LIMIT 2 +``` -`LAG(expr[, offset])` +Returns the following: -**Function type:** [Window](sql-window-functions.md#window-function-reference) +| `device_type` | `UI_type` | +| -- | -- | +| `Personal computer` | `Large UI` | +| `Smartphone` | `Mobile UI` | -If you do not supply an `offset`, returns the value evaluated at the row preceding the current row. Specify an offset number `n` to return the value evaluated at `n` rows preceding the current one. +
-## LAST_VALUE +[Learn more](sql-scalar.md#other-scalar-functions) -`LAST_VALUE(expr)` +### Searched CASE -**Function type:** [Window](sql-window-functions.md#window-function-reference) +Evaluates a set of Boolean expressions and returns the result for the first expression that is true. -Returns the value evaluated for the expression for the last row within the window. +* **Syntax:** `CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... \] \[ ELSE resultN \] END` +* **Function type:** Scalar, other -## LATEST +
Example -`LATEST(expr, [maxBytesPerValue])` +The following example classifies each departure location as a U.S. state or territory based on the value of the `OriginStateName` column from the `flight-carriers` datasource. -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT "OriginStateName" AS "flight_origin", +CASE + WHEN "OriginStateName" = 'Puerto Rico' THEN 'U.S. Territory' + WHEN "OriginStateName" = 'U.S. Virgin Islands' THEN 'U.S. Territory' + ELSE 'U.S. State' +END AS "state_status" +FROM "flight-carriers" +LIMIT 2 +``` -Returns the value of a numeric or string expression corresponding to the latest `__time` value. +Returns the following: -## LATEST_BY +| `flight_origin` | `state_status` | +| -- | -- | +| `Puerto Rico` | `U.S. Territory` | +| `Massachusetts` | `U.S. State` | -`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])` +
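Any simple CASE can also be written as a searched CASE by turning each `WHEN value` into an explicit equality test. The following is a minimal sketch that rewrites the simple CASE example on the `kttm` datasource in searched form; it assumes the same `agent_category` column and should produce the same two columns.

```sql
SELECT "agent_category" AS "device_type",
CASE
  -- Each branch spells out the comparison that simple CASE performs implicitly.
  WHEN "agent_category" = 'Personal computer' THEN 'Large UI'
  WHEN "agent_category" = 'Smartphone' THEN 'Mobile UI'
  ELSE 'other'
END AS "UI_type"
FROM "kttm"
LIMIT 2
```

The searched form is the better fit when branches need different columns or operators other than equality.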
-**Function type:** [Aggregation](sql-aggregations.md) +[Learn more](sql-scalar.md#other-scalar-functions) -Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`. +## CAST -## LEAD +Converts a value into the specified data type. -`LEAD(expr[, offset])` +* **Syntax:** `CAST(value AS TYPE)` +* **Function type:** Scalar, other -**Function type:** [Window](sql-window-functions.md#window-function-reference) +
Example -If you do not supply an `offset`, returns the value evaluated at the row following the current row. Specify an offset number `n` to return the value evaluated at `n` rows following the current one; if there is no such row, returns the given default value. +The following example converts the values in the `Distance` column from the `flight-carriers` datasource from `DOUBLE` to `VARCHAR`. -## LEAST +```sql +SELECT "Distance" AS "original_column", + CAST("Distance" AS VARCHAR) AS "cast_to_string" +FROM "flight-carriers" +LIMIT 1 +``` -`LEAST([expr1, ...])` +Returns the following: -**Function type:** [Scalar, reduction](sql-scalar.md#reduction-functions) +| `original_column` | `cast_to_string` | +| -- | -- | +| `1571` | `1571.0` | -Returns the minimum value from the provided arguments. +
-## LEFT +[Learn more](sql-scalar.md#other-scalar-functions) -`LEFT(expr, [length])` +## CEIL -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +### Date and time -Returns the leftmost number of characters from an expression. +Rounds up a timestamp by a given time unit. -## LENGTH +* **Syntax:** `CEIL(timestamp_expr TO unit)` +* **Function type:** Scalar, date and time -`LENGTH(expr)` +
Example -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +The following example rounds up the `__time` column from the `taxi-trips` datasource to the nearest year. -Returns the length of the expression in UTF-16 encoding. +```sql +SELECT + "__time" AS "original_time", + CEIL("__time" TO YEAR) AS "ceiling" +FROM "taxi-trips" +LIMIT 1 +``` -## LN +Returns the following: -`LN(expr)` +| `original_time` | `ceiling` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2014-01-01T00:00:00.000Z` | -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
-Calculates the natural logarithm of the numeric expression. +[Learn more](sql-scalar.md#date-and-time-functions) -## LOG10 +### Numeric -`LOG10(expr)` +Calculates the smallest integer value greater than or equal to the numeric expression. +* **Syntax:** `CEIL()` +* **Function type:** Scalar, numeric -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
Example -Calculates the base-10 of the numeric expression. +The following example applies the CEIL function to the `fare_amount` column from the `taxi-trips` datasource. -## LOOKUP +```sql +SELECT + "fare_amount" AS "fare_amount", + CEIL("fare_amount") AS "ceiling_fare_amount" +FROM "taxi-trips" +LIMIT 1 +``` +Returns the following: -`LOOKUP(, [, ])` +| `fare_amount` | `ceiling_fare_amount` | +| -- | -- | +| `21.25` | `22` | +
-**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-scalar.md#numeric-functions) -Looks up the expression in a registered query-time lookup table. +## CHAR_LENGTH -## LOWER +Alias for [`LENGTH`](#length). -`LOWER(expr)` +* **Syntax:** `CHAR_LENGTH(expr)` +* **Function type:** Scalar, string -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-scalar.md#string-functions) -Returns the expression in lowercase. +## CHARACTER_LENGTH -## LPAD +Alias for [`LENGTH`](#length). -`LPAD(, , [])` +* **Syntax:** `CHARACTER_LENGTH(expr)` +* **Function type:** Scalar, string -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-scalar.md#string-functions) -Returns the leftmost number of characters from an expression, optionally padded with the given characters. -## LTRIM +## COALESCE -`LTRIM(, [])` +Returns the first non-null value. +* **Syntax:** `COALESCE(expr, expr, ...)` +* **Function type:** Scalar, other -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +
Example -Trims characters from the leading end of an expression. +The following example returns the first non-null value from the list of parameters. -## MAX +```sql +SELECT COALESCE(null, null, 5, 'abc') AS "first_non_null" +``` -`MAX(expr)` +Returns the following: -**Function type:** [Aggregation](sql-aggregations.md) +| `first_non_null` | +| -- | +| `5` | -Returns the maximum value of a set of values. +
-## MILLIS_TO_TIMESTAMP +[Learn more](sql-scalar.md#other-scalar-functions) -`MILLIS_TO_TIMESTAMP(millis_expr)` +## CONCAT -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +Concatenates a list of expressions. -Converts a number of milliseconds since epoch into a timestamp. +* **Syntax:** `CONCAT(expr[, expr,...])` +* **Function type:** Scalar, string -## MIN +
Example -`MIN(expr)` +The following example concatenates the `OriginCityName` column from `flight-carriers`, the string ` to `, and the `DestCityName` column from `flight-carriers`. -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT + "OriginCityName" AS "origin_city", + "DestCityName" AS "destination_city", + CONCAT("OriginCityName", ' to ', "DestCityName") AS "concatenate_flight_details" +FROM "flight-carriers" +LIMIT 1 +``` -Returns the minimum value of a set of values. +Returns the following: -## MOD +| `origin_city` | `destination_city` | `concatenate_flight_details` | +| -- | -- | -- | +| `San Juan, PR` | `Washington, DC` | `San Juan, PR to Washington, DC` | -`MOD(x, y)` +
-**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +[Learn more](sql-scalar.md#string-functions) -Calculates x modulo y, or the remainder of x divided by y. +## CONTAINS_STRING -## MV_APPEND +Returns true if `str` is a substring of `expr`, case-sensitive. Otherwise, returns false. -`MV_APPEND(arr1, expr)` +* **Syntax:** `CONTAINS_STRING(expr, str)` +* **Function type:** Scalar, string -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +
Example -Adds the expression to the end of the array. +The following example returns true if the `OriginCityName` column from the `flight-carriers` datasource contains the substring `San`. -## MV_CONCAT +```sql +SELECT + "OriginCityName" AS "origin_city", + CONTAINS_STRING("OriginCityName", 'San') AS "contains_string" +FROM "flight-carriers" +LIMIT 2 +``` -`MV_CONCAT(arr1, arr2)` +Returns the following: -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +| `origin_city` | `contains_string` | +| -- | -- | +| `San Juan, PR` | `true` | +| `Boston, MA` | `false` | -Concatenates two arrays. +
-## MV_CONTAINS -`MV_CONTAINS(arr, expr)` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +## COS -Returns true if the expression is in the array, false otherwise. +Calculates the trigonometric cosine of an angle expressed in radians. -## MV_FILTER_NONE +* **Syntax:** `COS(expr)` +* **Function type:** Scalar, numeric -`MV_FILTER_NONE(expr, arr)` +
Example -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +The following example calculates the cosine of angle `PI/3` radians. -Filters a multi-value expression to include no values contained in the array. +```sql +SELECT COS(PI / 3) AS "cosine" +``` +Returns the following: + +| `cosine` | +| -- | +| `0.5000000000000001` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## COT + +Calculates the trigonometric cotangent of an angle expressed in radians. + +* **Syntax:** `COT(expr)` +* **Function type:** Scalar, numeric + +
Example + +The following example calculates the cotangent of angle `PI/3` radians. + +```sql +SELECT COT(PI / 3) AS "cotangent" +``` +Returns the following: + +| `cotangent` | +| -- | +| `0.577350269189626` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## COUNT + +Counts the number of rows. + +* **Syntax**: `COUNT([DISTINCT] expr)` `COUNT(*)` +COUNT DISTINCT is an alias for [`APPROX_COUNT_DISTINCT`](#approx_count_distinct). +* **Function type:** Aggregation + +
Example + +The following example counts the number of flights per day after `'2005-01-01 00:00:00'` in `flight-carriers`: + +```sql +SELECT + TIME_FLOOR(__time, 'P1D') AS "flight_day", + COUNT(*) AS "num_flights" +FROM "flight-carriers" +WHERE __time > '2005-01-01 00:00:00' +GROUP BY 1 +LIMIT 3 +``` + +Returns the following: + +|`flight_day`|`num_flights`| +|------------|------------| +|`2005-11-01T00:00:00.000Z`|`18961`| +|`2005-11-02T00:00:00.000Z`|`19434`| +|`2005-11-03T00:00:00.000Z`|`19745`| + +
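The `DISTINCT` form counts unique values of an expression instead of rows. The following is a minimal sketch, assuming the `flight-carriers` datasource has the `Tail_Number` column used in the `DS_HLL` example; the alias `num_aircraft` is illustrative only.

```sql
SELECT
  TIME_FLOOR(__time, 'P1D') AS "flight_day",
  -- Counts unique tail numbers per day rather than the number of rows.
  COUNT(DISTINCT "Tail_Number") AS "num_aircraft"
FROM "flight-carriers"
WHERE __time > '2005-01-01 00:00:00'
GROUP BY 1
LIMIT 3
```

Because COUNT DISTINCT is an alias for `APPROX_COUNT_DISTINCT`, treat the counts as approximations.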
+ +[Learn more](sql-aggregations.md) + +## CUME_DIST + +Returns the cumulative distribution of the current row within the window calculated as `number of window rows at the same rank or higher than current row` / `total window rows`. The return value ranges between `1/number of rows` and 1. + +* **Syntax**: `CUME_DIST()` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## CURRENT_DATE + +Returns the current date in UTC time, unless you specify a different timezone in the query context. + +* **Syntax:** `CURRENT_DATE` +* **Function type:** Scalar, date and time + +
Example + +The following example returns the current date. + +```sql +SELECT CURRENT_DATE AS "current_date" +``` + +Returns a result similar to the following, depending on when you run the query: + +| `current_date` | +| -- | +| `2024-08-14T00:00:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + +## CURRENT_TIMESTAMP + +Returns the current timestamp in UTC time, unless you specify a different timezone in the query context. + + +* **Syntax:** `CURRENT_TIMESTAMP` +* **Function type:** Scalar, date and time + +
Example + +The following example returns the current timestamp. + +```sql +SELECT CURRENT_TIMESTAMP AS "current_timestamp" +``` + +Returns the following: + +| `current_timestamp` | +| -- | +| `2024-08-14T21:30:13.793Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + +## DATE_TRUNC + +Rounds down a timestamp by a given time unit. + +* **Syntax:** `DATE_TRUNC(unit, timestamp_expr)` +* **Function type:** Scalar, date and time + +
Example + +The following example truncates a timestamp from the `__time` column of the `taxi-trips` datasource down to the start of its `decade`. + +```sql +SELECT + "__time" AS "original_timestamp", + DATE_TRUNC('decade', "__time") AS "truncate_timestamp" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `original_timestamp` | `truncate_timestamp` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2010-01-01T00:00:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + + +## DECODE_BASE64_COMPLEX + +Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. + +* **Syntax**: `DECODE_BASE64_COMPLEX(dataType, expr)` +* **Function type:** Scalar, other + +[Learn more](sql-scalar.md#other-scalar-functions) + +## DECODE_BASE64_UTF8 + +Decodes a Base64-encoded string into a UTF-8 encoded string. + +* **Syntax:** `DECODE_BASE64_UTF8(expr)` +* **Function type:** Scalar, string + +
Example + +The following example converts the Base64-encoded string `SGVsbG8gV29ybGQhCg==` into a UTF-8 encoded string. + +```sql +SELECT + 'SGVsbG8gV29ybGQhCg==' AS "base64_encoding", + DECODE_BASE64_UTF8('SGVsbG8gV29ybGQhCg==') AS "convert_to_UTF8_encoding" +``` + +Returns the following: + +| `base64_encoding` | `convert_to_UTF8_encoding` | +| -- | -- | +| `SGVsbG8gV29ybGQhCg==` | `Hello World!` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## DEGREES + +Converts an angle from radians to degrees. + +* **Syntax:** `DEGREES(expr)` +* **Function type:** Scalar, numeric + +
Example + +The following example converts an angle of `PI` radians to degrees. + +```sql +SELECT DEGREES(PI) AS "degrees" +``` +Returns the following: + +| `degrees` | +| -- | +| `180` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## DENSE_RANK + +Returns the rank for a row within a window without gaps. For example, if two rows tie for a rank of 1, the subsequent row is ranked 2. + +* **Syntax**: `DENSE_RANK()` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## DIV + +Returns the result of integer division of `x` by `y`. + +:::info +The `DIV` function is not implemented in Druid versions 30.0.0 or earlier. Consider using [`SAFE_DIVIDE`](./sql-functions.md#safe_divide) instead. +::: + +* **Syntax:** `DIV(x, y)` +* **Function type:** Scalar, numeric + + +
Example + + The following example calculates the integer division of `78` by `10`. + + ```sql + SELECT DIV(78, 10) AS "division" + ``` + + Returns the following: + + | `division` | + | -- | + | `7` | + +
+ + +[Learn more](sql-scalar.md#numeric-functions) + +## DS_CDF + +Returns a string representing an approximation to the cumulative distribution function given a list of split points that define the edges of the bins from a Quantiles sketch. + +* **Syntax:** `DS_CDF(expr, splitPoint0, splitPoint1, ...)` +* **Function type:** Scalar, sketch + +
Example + +The following example specifies three split points to return cumulative distribution function approximations on the `Distance` column from the `flight-carriers` datasource. The query may return a different approximation for each bin on each execution. + +```sql +SELECT DS_CDF( DS_QUANTILES_SKETCH("Distance"), 750, 1500, 2250) AS "estimate_cdf" +FROM "flight-carriers" +``` + +Returns a result similar to the following: + +| `estimate_cdf` | +| -- | +| `[0.6332237016416492,0.8908411023460711,0.9612303007393957,1.0]` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_GET_QUANTILE + +Returns the quantile estimate corresponding to the fraction from a Quantiles sketch. + +* **Syntax:** `DS_GET_QUANTILE(expr, fraction)` +* **Function type:** Scalar, sketch + +
Example + +The following example approximates the median of the `Distance` column from the `flight-carriers` datasource. The query may return a different approximation with each execution. + +```sql +SELECT DS_GET_QUANTILE( DS_QUANTILES_SKETCH("Distance"), 0.5) AS "estimate_median" +FROM "flight-carriers" +``` + +Returns a result similar to the following: + +| `estimate_median` | +| -- | +| `569` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_GET_QUANTILES + +Returns a string representing an array of quantile estimates corresponding to a list of fractions from a Quantiles sketch. + +* **Syntax:** `DS_GET_QUANTILES(expr, fraction0, fraction1, ...)` +* **Function type:** Scalar, sketch + +
Example + +The following example approximates the 25th, 50th, and 75th percentiles of the `Distance` column from the `flight-carriers` datasource. The query may return a different approximation for each percentile on each execution. + +```sql +SELECT DS_GET_QUANTILES( DS_QUANTILES_SKETCH("Distance"), 0.25, 0.5, 0.75) AS "estimate_fractions" +FROM "flight-carriers" +``` + +Returns a result similar to the following: + +| `estimate_fractions` | +| -- | +| `[316.0,571.0,951.0]` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_HISTOGRAM + +Returns an approximation to the histogram from a Quantiles sketch. The split points define the histogram bins. + +* **Syntax:** `DS_HISTOGRAM(expr, splitPoint0, splitPoint1, ...)` +* **Function type:** Scalar, sketch + +
Example + +The following example specifies three split points to approximate a histogram on the `Distance` column from the `flight-carriers` datasource. The query may return a different approximation for each bin on each execution. + +```sql +SELECT DS_HISTOGRAM( DS_QUANTILES_SKETCH("Distance"), 750, 1500, 2250) AS "estimate_histogram" +FROM "flight-carriers" + +``` + +Returns a result similar to the following: + +| `estimate_histogram` | +| -- | +| `[358496.0,153974.99999999997,39909.99999999999,13757.000000000005]` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_HLL + +Creates an HLL sketch on a column containing HLL sketches or a regular column. See [DataSketches HLL Sketch module](../development/extensions-core/datasketches-hll.md) for a description of optional parameters. + +* **Syntax:** `DS_HLL(expr, [lgK, tgtHllType])` +* **Function type:** Aggregation + +
Example + +The following example creates an HLL sketch on the `Tail_Number` column of the `flight-carriers` datasource, grouping by `OriginState` and `DestState`. + +```sql +SELECT + "OriginState" AS "origin_state", + "DestState" AS "destination_state", + DS_HLL("Tail_Number") AS "hll_tail_number" +FROM "flight-carriers" +GROUP BY 1,2 +LIMIT 1 +``` + +Returns the following: + +| `origin_state` | `destination_state` | `hll_tail_number` | +| -- | -- | -- | +| `AK` | `AK` | `"AwEHDAcIAAFBAAAAfY..."` | + +
+ + +[Learn more](sql-aggregations.md) + +## DS_QUANTILE_SUMMARY + +Returns a string summary of a Quantiles sketch. +* **Syntax:** `DS_QUANTILE_SUMMARY(expr)` +* **Function type:** Scalar, sketch + +
Example + +The following example returns a summary of a Quantiles sketch on the `Distance` column from the `flight-carriers` datasource. + +```sql +SELECT DS_QUANTILE_SUMMARY( DS_QUANTILES_SKETCH("Distance") ) AS "summary" +FROM "flight-carriers" +``` + +Returns the following: + + + + + + + + +
summary
+ +``` +### Quantiles DirectCompactDoublesSketch SUMMARY: + Empty : false + Memory, Capacity bytes : true, 6128 + Estimation Mode : true + K : 128 + N : 566,138 + Levels (Needed, Total, Valid): 12, 12, 5 + Level Bit Pattern : 100010100011 + BaseBufferCount : 122 + Combined Buffer Capacity : 762 + Retained Items : 762 + Compact Storage Bytes : 6,128 + Updatable Storage Bytes : 14,368 + Normalized Rank Error : 1.406% + Normalized Rank Error (PMF) : 1.711% + Min Item : 2.400000e+01 + Max Item : 4.962000e+03 +### END SKETCH SUMMARY +``` + +
+ +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_QUANTILES_SKETCH + +Creates a Quantiles sketch on a Quantiles sketch column or a regular column. See [DataSketches Quantiles Sketch module](../development/extensions-core/datasketches-quantiles.md) for a description of parameters. + +* **Syntax:** `DS_QUANTILES_SKETCH(expr, [k])` +* **Function type:** Aggregation + +
Example + +The following example creates a Quantiles sketch on the `Distance` column from the `flight-carriers` datasource. + +```sql +SELECT DS_QUANTILES_SKETCH("Distance") AS "quantile_sketch" +FROM "flight-carriers" +``` + +Returns the following: + +| `quantile_sketch` | +| -- | +| `AgMIGoAAAAB6owgAA...` | + +
+ +[Learn more](sql-aggregations.md) + +## DS_RANK + +Returns an approximate rank of a given value in a distribution. The rank represents the fraction of the distribution less than the given value. + +* **Syntax:** `DS_RANK(expr, value)` +* **Function type:** Scalar, sketch + +
Example + +The following example estimates the fraction of records in the `flight-carriers` datasource where the value in the `Distance` column is less than 500. The query may return a different approximation on each execution. + +```sql +SELECT DS_RANK( DS_QUANTILES_SKETCH("Distance"), 500) AS "estimate_rank" +FROM "flight-carriers" +``` + +Returns a result similar to the following: + +| `estimate_rank` | +| -- | +| `0.43837721544923675 ` | + + +
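To make the definition of rank concrete, the following Python sketch computes the exact fraction of values strictly below a threshold, which is the quantity that `DS_RANK` approximates from a Quantiles sketch. The function name `exact_rank` and the sample distances are hypothetical and are not part of Druid or the `flight-carriers` datasource.

```python
def exact_rank(values, threshold):
    """Return the fraction of values strictly less than threshold."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < threshold)
    return below / len(values)


# Hypothetical distances, chosen only to illustrate the calculation.
distances = [120, 340, 480, 500, 760, 1250, 2400, 4962]

# Three of the eight values are below 500, so the exact rank is 3 / 8.
print(exact_rank(distances, 500))
```

A sketch-based estimate trades this exact count for bounded memory, which is why the SQL query above may return a slightly different value on each execution.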
+ +[Learn more](sql-scalar.md#sketch-functions) + +## DS_THETA + +Creates a Theta sketch on a column containing Theta sketches or a regular column. See [DataSketches Theta Sketch module](../development/extensions-core/datasketches-theta#aggregator) for a description of optional parameters. + +* **Syntax:** `DS_THETA(expr, [size])` +* **Function type:** Aggregation + +
Example + +The following example creates a Theta sketch on the `Tail_Number` column of the `flight-carriers` datasource, grouping by `OriginState` and `DestState`. + +```sql +SELECT + "OriginState" AS "origin_state", + "DestState" AS "destination_state", + DS_THETA("Tail_Number") AS "theta_tail_number" +FROM "flight-carriers" +GROUP BY 1,2 +LIMIT 1 +``` + +Returns the following: + +| `origin_state` | `destination_state` | `theta_tail_number` | +| -- | -- | -- | +| `AK` | `AK` | `AgMDAAAazJNBAAAAA...` | + +
+ +[Learn more](sql-aggregations.md) + +## DS_TUPLE_DOUBLES + +Creates a Tuple sketch which contains an array of double values as the Summary Object. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). + +* **Syntax**: `DS_TUPLE_DOUBLES(expr, [nominalEntries])` + `DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])` +* **Function type:** Aggregation + +[Learn more](sql-aggregations.md) + +## DS_TUPLE_DOUBLES_INTERSECT + +Returns an intersection of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). + +* **Syntax**: `DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])` +* **Function type:** Scalar, sketch + +[Learn more](sql-scalar.md#tuple-sketch-functions) + +## DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE + +Computes approximate sums of the values contained within a Tuple sketch which contains an array of double values as the Summary Object. + +* **Syntax**: `DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)` +* **Function type:** Scalar, sketch + +[Learn more](sql-scalar.md#tuple-sketch-functions) + +## DS_TUPLE_DOUBLES_NOT + +Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). 
+ +* **Syntax**: `DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])` +* **Function type:** Scalar, sketch + +[Learn more](sql-scalar.md#tuple-sketch-functions) + +## DS_TUPLE_DOUBLES_UNION + +Returns a union of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). + +* **Syntax**: `DS_TUPLE_DOUBLES_UNION(expr, ..., [nominalEntries])` +* **Function type:** Scalar, sketch + +[Learn more](sql-scalar.md#tuple-sketch-functions) + +## EARLIEST + +Returns the value of a numeric or string expression corresponding to the earliest `__time` value. + +* **Syntax**: `EARLIEST(expr, [maxBytesPerValue])` +* **Function type:** Aggregation + +
Example + +The following example returns the origin airport code associated with the earliest departing flight daily after `'2005-01-01 00:00:00'` in `flight-carriers`: + +```sql +SELECT + TIME_FLOOR(__time, 'P1D') AS "departure_day", + EARLIEST("Origin") AS "origin" +FROM "flight-carriers" +WHERE __time >= TIMESTAMP '2005-01-01 00:00:00' +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`departure_day`|`origin`| +|------------|--------| +|`2005-11-01T00:00:00.000Z`|`LAS`| +|`2005-11-02T00:00:00.000Z`|`SDF`| + +
+ +[Learn more](sql-aggregations.md) + +## EARLIEST_BY + +Returns the value of a numeric or string expression corresponding to the earliest time value from `timestampExpr`. + +* **Syntax**: `EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])` +* **Function type:** Aggregation + +
Example + +The following example returns the destination airport code associated with the earliest arriving flight daily after `'2005-01-01 00:00:00'` in `flight-carriers`: + +```sql +SELECT + TIME_FLOOR(TIME_PARSE("arrivalime"), 'P1D') AS "arrival_day", + EARLIEST_BY("Dest", TIME_PARSE("arrivalime")) AS "dest" +FROM "flight-carriers" +WHERE TIME_PARSE("arrivalime") >= TIMESTAMP '2005-01-01 00:00:00' +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`arrival_day`|`dest`| +|-------------|--------| +|`2005-11-01T00:00:00.000Z`|`RSW`| +|`2005-11-02T00:00:00.000Z`|`CLE`| + +
+ +[Learn more](sql-aggregations.md) + +## EXP + +Calculates _e_ raised to the power of the numeric expression. + +* **Syntax:** `EXP(<NUMERIC>)` +* **Function type:** Scalar, numeric + +
Example + +The following example calculates _e_ to the power of 1. + +```sql +SELECT EXP(1) AS "exponential" +``` +Returns the following: + +| `exponential` | +| -- | +| `2.7182818284590455` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## EXTRACT + +Extracts the value of some unit from the timestamp. + +* **Syntax:** `EXTRACT(unit FROM timestamp_expr)` +* **Function type:** Scalar, date and time + +
Example + +The following example extracts the year from the `__time` column from the `taxi-trips` datasource. + +```sql +SELECT + "__time" AS "original_time", + EXTRACT(YEAR FROM "__time" ) AS "year" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `original_time` | `year` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + +## FIRST_VALUE + +Returns the value evaluated for the expression for the first row within the window. + +* **Syntax**: `FIRST_VALUE(expr)` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## FLOOR + +### Date and time + +Rounds down a timestamp by a given time unit. + +* **Syntax:** `FLOOR(timestamp_expr TO unit)` +* **Function type:** Scalar, date and time + +
Example + +The following example rounds down the `__time` column from the `taxi-trips` datasource to the nearest year. + +```sql +SELECT + "__time" AS "original_time", + FLOOR("__time" TO YEAR) AS "floor" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `original_time` | `floor` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013-01-01T00:00:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + +### Numeric + +Calculates the largest integer less than or equal to the numeric expression. + +* **Syntax:** `FLOOR(expr)` +* **Function type:** Scalar, numeric + +
Example + +The following example applies the FLOOR function to the `fare_amount` column from the `taxi-trips` datasource. + +```sql +SELECT + "fare_amount" AS "fare_amount", + FLOOR("fare_amount") AS "floor_fare_amount" +FROM "taxi-trips" +LIMIT 1 +``` +Returns the following: + +| `fare_amount` | `floor_fare_amount` | +| -- | -- | +| `21.25` | `21` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## GREATEST + +Returns the maximum value from the provided expressions. For information on how Druid interprets the arguments passed into the function, see [Reduction functions](sql-scalar.md#reduction-functions). + +* **Syntax:** `GREATEST([expr1, ...])` +* **Function type:** Scalar, reduction + +
Example + +The following example returns the greatest value between the numeric constant `PI`, the integer number `4`, and the double `-5.0`. Druid interprets these arguments as DOUBLE data type. + +```sql +SELECT GREATEST(PI, 4, -5.0) AS "greatest" +``` + +Returns the following: + +| `greatest` | +| -- | +| `4` | + +
+ +[Learn more](sql-scalar.md#reduction-functions) + + +## GROUPING + +Returns a number for each output row of a groupBy query, indicating whether the specified dimension is included for that row. + +* **Syntax**: `GROUPING(expr, expr...)` +* **Function type:** Aggregation + +
Example + +The following example returns the total minutes of flight delay for each day of the week in `flight-carriers`. +The GROUP BY clause creates two grouping sets, one for the day of the week and one for the grand total. + +For more information, refer to [CASE](#case) and grouping sets with [SQL GROUP BY](sql.md#group-by). + +```sql +SELECT + CASE + WHEN GROUPING("DayOfWeek") = 1 THEN 'Total' + ELSE "DayOfWeek" + END AS "DayOfWeek", + GROUPING("DayOfWeek") AS Subgroup, + SUM("DepDelayMinutes") AS "MinutesDelayed" +FROM "flight-carriers" +GROUP BY GROUPING SETS("DayOfWeek", ()) +``` + +Returns the following: + +|`DayOfWeek`|`Subgroup`|`MinutesDelayed`| +|-----------|-----------|----------------| +|`1`|`0`|`998505`| +|`2`|`0`|`1031599`| +|`3`|`0`|`884677`| +|`4`|`0`|`525351`| +|`5`|`0`|`519413`| +|`6`|`0`|`354601`| +|`7`|`0`|`848704`| +|`Total`|`1`|`5162850`| + +
+ +[Learn more](sql-aggregations.md) + +## HLL_SKETCH_ESTIMATE + +Returns the distinct count estimate from a HLL sketch. To round the distinct count estimate, set `round` to true. `round` defaults to false. + +* **Syntax:** `HLL_SKETCH_ESTIMATE(expr, [round])` +* **Function type:** Scalar, sketch + + +
Example + +The following example estimates the number of unique tail numbers in the `flight-carriers` datasource. + +```sql +SELECT + HLL_SKETCH_ESTIMATE(DS_HLL("Tail_Number")) AS "estimate" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate` | +| -- | +| `4685.8815405960595` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS + +Returns the distinct count estimate and error bounds from a HLL sketch. To specify the number of standard deviations used for the error bounds, use `numStdDev`. + +* **Syntax:** `HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, [numStdDev])` +* **Function type:** Scalar, sketch + +
Example + +The following example estimates the number of unique tail numbers in the `flight-carriers` datasource with error bounds at plus or minus one standard deviation. + +```sql +SELECT + HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(DS_HLL("Tail_Number"), 1) AS "estimate_with_errors" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate_with_errors` | +| -- | +| `[4685.8815405960595,4611.381540678335,4762.978259800803]` | + + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## HLL_SKETCH_TO_STRING + +Returns a human-readable string representation of a HLL sketch for debugging. + +* **Syntax:** `HLL_SKETCH_TO_STRING(expr)` +* **Function type:** Scalar, sketch + +
Example + +The following example returns the HLL sketch on column `Tail_Number` from the `flight-carriers` datasource as a human-readable string. + +```sql +SELECT + HLL_SKETCH_TO_STRING( DS_HLL("Tail_Number") ) AS "summary" +FROM "flight-carriers" +``` + +Returns the following: + + + + + + + + +
summary
+ +``` +### HLL SKETCH SUMMARY: + Log Config K : 12 + Hll Target : HLL_4 + Current Mode : HLL + Memory : false + LB : 4611.381540678335 + Estimate : 4685.8815405960595 + UB : 4762.978259800803 + OutOfOrder Flag: true + CurMin : 0 + NumAtCurMin : 1316 + HipAccum : 0.0 + KxQ0 : 2080.7755126953125 + KxQ1 : 0.0 + Rebuild KxQ Flg: false +``` + +
+ +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## HLL_SKETCH_UNION + +Returns a union of HLL sketches. See [DataSketches HLL Sketch module](../development/extensions-core/datasketches-hll.md) for a description of optional parameters. + +* **Syntax:** `HLL_SKETCH_UNION([lgK, tgtHllType], expr0, expr1, ...)` +* **Function type:** Scalar, sketch + + +
Example + +The following example estimates the union of the HLL sketch of tail numbers that took off from `CA` and the HLL sketch of tail numbers that took off from `TX`. The example uses the `Tail_Number` and `OriginState` columns from the `flight-carriers` datasource. + +```sql +SELECT + HLL_SKETCH_ESTIMATE( + HLL_SKETCH_UNION( + DS_HLL("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), + DS_HLL("Tail_Number") FILTER(WHERE "OriginState" = 'TX') + ) + ) AS "estimate_union" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate_union` | +| -- | +| `4204.798431046455` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) + +## HUMAN_READABLE_BINARY_BYTE_FORMAT + +Converts an integer byte size into human-readable [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. + +* **Syntax:** `HUMAN_READABLE_BINARY_BYTE_FORMAT(value[, precision])` +* **Function type:** Scalar, numeric + +
Example + + The following example converts `1000000` into IEC format. + + ```sql + SELECT HUMAN_READABLE_BINARY_BYTE_FORMAT(1000000, 2) AS "iec_format" + ``` + + Returns the following: + + | `iec_format` | + | -- | + | `976.56 KiB` | + +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## HUMAN_READABLE_DECIMAL_BYTE_FORMAT + +Converts a byte size into human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. + +* **Syntax:** `HUMAN_READABLE_DECIMAL_BYTE_FORMAT(value[, precision])` +* **Function type:** Scalar, numeric + +
Example + +The following example converts `1000000` into SI format. + +```sql +SELECT HUMAN_READABLE_DECIMAL_BYTE_FORMAT(1000000, 2) AS "si_format" +``` + +Returns the following: + +|`si_format`| +|--| +|`1.00 MB`| + +
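The two byte-format functions above differ only in the divisor and the unit names: IEC format divides by 1024 and SI format divides by 1000. The following Python sketch reproduces the arithmetic behind both example results. The helper `human_readable` and its unit tuples are hypothetical, and the sketch makes no claim about how Druid formats values smaller than one kilobyte or negative values.

```python
IEC_UNITS = ("B", "KiB", "MiB", "GiB", "TiB")  # powers of 1024
SI_UNITS = ("B", "KB", "MB", "GB", "TB")       # powers of 1000


def human_readable(value, precision, base, units):
    """Scale value down by base until it fits the largest applicable unit."""
    size = float(value)
    index = 0
    while size >= base and index < len(units) - 1:
        size /= base
        index += 1
    return f"{size:.{precision}f} {units[index]}"


# 1000000 / 1024 = 976.5625, which stays below 1024, so the unit is KiB.
print(human_readable(1000000, 2, 1024, IEC_UNITS))

# 1000000 / 1000 / 1000 = 1.0, so the unit is MB.
print(human_readable(1000000, 2, 1000, SI_UNITS))
```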
+ +[Learn more](sql-scalar.md#numeric-functions) + +## HUMAN_READABLE_DECIMAL_FORMAT + +Converts a byte size into human-readable SI format with single-character units. + +* **Syntax:** `HUMAN_READABLE_DECIMAL_FORMAT(value[, precision])` +* **Function type:** Scalar, numeric + +
Example + +The following example converts `1000000` into single-character SI format. + +```sql +SELECT HUMAN_READABLE_DECIMAL_FORMAT(1000000, 2) AS "single_character_si_format" +``` + +Returns the following: + +|`single_character_si_format`| +|--| +|`1.00 M`| +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## ICONTAINS_STRING + +Returns true if `str` is a substring of `expr`, case-insensitive. Otherwise, returns false. + +* **Syntax:** `ICONTAINS_STRING(expr, str)` +* **Function type:** Scalar, string + +
Example + +The following example returns true if the `OriginCityName` column from the `flight-carriers` datasource contains the case-insensitive substring `san`. + +```sql +SELECT + "OriginCityName" AS "origin_city", + ICONTAINS_STRING("OriginCityName", 'san') AS "contains_case_insensitive_string" +FROM "flight-carriers" +LIMIT 2 +``` + +Returns the following: + +| `origin_city` | `contains_case_insensitive_string` | +| -- | -- | +| `San Juan, PR` | `true` | +| `Boston, MA` | `false` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## IPV4_MATCH + +Returns true if the IPv4 `address` belongs to the `subnet` literal, otherwise returns false. + +* **Syntax:** `IPV4_MATCH(address, subnet)` +* **Function type:** Scalar, IP address + +
Example + +The following example returns true if the IPv4 address in the `forwarded_for` column from the `kttm` datasource belongs to the subnet `181.13.41.0/24`. + +```sql +SELECT + "forwarded_for" AS "ipv4_address", + IPV4_MATCH("forwarded_for", '181.13.41.0/24') AS "belongs_in_subnet" +FROM "kttm" +LIMIT 2 +``` + +Returns the following: + +| `ipv4_address` | `belongs_in_subnet`| +| -- | -- | +| `181.13.41.82` | `true`| +| `177.242.100.0` | `false`| + +
+ + +[Learn more](sql-scalar.md#ip-address-functions) + + +## IPV4_PARSE + +Parses an IPv4 `address` into its integer notation. + +* **Syntax:** `IPV4_PARSE(address)` +* **Function type:** Scalar, IP address + +
Example + +The following example returns an integer that represents the IPv4 address `5.5.5.5`. + +```sql +SELECT + '5.5.5.5' AS "ipv4_address", + IPV4_PARSE('5.5.5.5') AS "integer" +``` + +Returns the following: + +| `ipv4_address` | `integer` | +| -- | -- | +| `5.5.5.5` | `84215045` | + +
+ +[Learn more](sql-scalar.md#ip-address-functions) + +## IPV4_STRINGIFY + +Converts an IPv4 `address` in integer notation into dot-decimal notation. + +* **Syntax:** `IPV4_STRINGIFY(address)` +* **Function type:** Scalar, IP address + +
Example + +The following example returns the integer `84215045` in IPv4 dot-decimal notation. + +```sql +SELECT + '84215045' AS "integer", + IPV4_STRINGIFY(84215045) AS "dot_decimal_notation" +``` + +Returns the following: + +| `integer` | `dot_decimal_notation` | +| -- | -- | +| `84215045` | `5.5.5.5` | + +
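The integer notation used by `IPV4_PARSE` and `IPV4_STRINGIFY` treats the four octets of an address as the bytes of a 32-bit number. The following Python sketch shows the conversion in both directions with the standard `ipaddress` module, plus a manual version that spells out the bit shifts. The function names are hypothetical and only illustrate the arithmetic, not Druid's implementation.

```python
import ipaddress


def ipv4_parse(address):
    """Convert dot-decimal notation to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def ipv4_stringify(number):
    """Convert a 32-bit integer back to dot-decimal notation."""
    return str(ipaddress.IPv4Address(number))


def ipv4_parse_manual(address):
    """Same conversion as ipv4_parse, written out as explicit bit shifts."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


# 5 * 2**24 + 5 * 2**16 + 5 * 2**8 + 5 = 84215045
print(ipv4_parse("5.5.5.5"))
print(ipv4_stringify(84215045))
```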
+ +[Learn more](sql-scalar.md#ip-address-functions) + +## IPV6_MATCH + +Returns true if the IPv6 `address` belongs to the `subnet` literal. Otherwise, returns false. + +* **Syntax:** `IPV6_MATCH(address, subnet)` +* **Function type:** Scalar, IP address + +
Example + +The following example returns true because `75e9:efa4:29c6:85f6::232c` is in the subnet of `75e9:efa4:29c6:85f6::/64`. + +```sql +SELECT + '75e9:efa4:29c6:85f6::232c' AS "ipv6_address", + IPV6_MATCH('75e9:efa4:29c6:85f6::232c', '75e9:efa4:29c6:85f6::/64') AS "belongs_in_subnet" +``` + +Returns the following: + +| `ipv6_address` | `belongs_in_subnet` | +| -- | -- | +| `75e9:efa4:29c6:85f6::232c` | `true` | + + +
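Subnet membership means that the address shares the network prefix given in CIDR notation. As a rough illustration, the following Python sketch performs the same check with the standard `ipaddress` module, using the addresses and subnets from the `IPV4_MATCH` and `IPV6_MATCH` examples. The helper name `ip_match` is hypothetical and does not describe how Druid evaluates these functions.

```python
import ipaddress


def ip_match(address, subnet):
    """Return True if address falls inside the subnet given in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet)


# The first 64 bits of the address match the /64 prefix.
print(ip_match("75e9:efa4:29c6:85f6::232c", "75e9:efa4:29c6:85f6::/64"))

# The first 24 bits (181.13.41) match the /24 prefix.
print(ip_match("181.13.41.82", "181.13.41.0/24"))

# The first 24 bits (177.242.100) differ from the /24 prefix.
print(ip_match("177.242.100.0", "181.13.41.0/24"))
```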
+ +[Learn more](sql-scalar.md#ip-address-functions) + + +## JSON_KEYS + +Returns an array of field names from `expr` at the specified `path`. + +* **Syntax**: `JSON_KEYS(expr, path)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + + +## JSON_MERGE + +Merges two or more JSON `STRING` or `COMPLEX<json>` values into one. Preserves the rightmost value when there are key overlaps. Always returns a `COMPLEX<json>` type. + +* **Syntax:** `JSON_MERGE(expr1, expr2[, expr3 ...])` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + + +## JSON_OBJECT + +Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate to string types. The `VALUE` expressions can be composed of any input type, including other `COMPLEX<json>` values. `JSON_OBJECT` can accept colon-separated key-value pairs. The following syntax is equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`. + +* **Syntax**: `JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + + +## JSON_PATHS + +Returns an array of all paths which refer to literal values in `expr` in JSONPath format. + +* **Syntax**: `JSON_PATHS(expr)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + + +## JSON_QUERY + +Extracts a `COMPLEX<json>` value from `expr`, at the specified `path`. + +* **Syntax**: `JSON_QUERY(expr, path)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + + +## JSON_QUERY_ARRAY + +Extracts an `ARRAY<COMPLEX<json>>` value from `expr` at the specified `path`. If the value is not an `ARRAY`, it gets translated into a single-element `ARRAY` containing the value at `path`. The primary use of this function is to extract arrays of objects to use as inputs to other [array functions](./sql-array-functions.md). + +* **Syntax**: `JSON_QUERY_ARRAY(expr, path)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + +## JSON_VALUE + +Extracts a literal value from `expr` at the specified `path`. 
If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc) the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`. + +* **Syntax**: `JSON_VALUE(expr, path [RETURNING sqlType])` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + +## LAG + +If you do not supply an `offset`, returns the value evaluated at the row preceding the current row. Specify an offset number `n` to return the value evaluated at `n` rows preceding the current one. + +* **Syntax**: `LAG(expr[, offset])` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## LAST_VALUE + +Returns the value evaluated for the expression for the last row within the window. + +* **Syntax**: `LAST_VALUE(expr)` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## LATEST + +Returns the value of a numeric or string expression corresponding to the latest `__time` value. + +* **Syntax**: `LATEST(expr, [maxBytesPerValue])` +* **Function type:** Aggregation + +
Example + +The following example returns the origin airport code associated with the latest departing flight daily after `'2005-01-01 00:00:00'` in `flight-carriers`: + +```sql +SELECT + TIME_FLOOR(__time, 'P1D') AS "departure_day", + LATEST("Origin") AS "origin" +FROM "flight-carriers" +WHERE __time >= TIMESTAMP '2005-01-01 00:00:00' +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`departure_day`|`origin`| +|------------|--------| +|`2005-11-01T00:00:00.000Z`|`LAS`| +|`2005-11-02T00:00:00.000Z`|`LAX`| + +
+ +[Learn more](sql-aggregations.md) + +## LATEST_BY + +Returns the value of a numeric or string expression corresponding to the latest time value from `timestampExpr`. + +* **Syntax**: `LATEST_BY(expr, timestampExpr, [maxBytesPerValue])` +* **Function type:** Aggregation + +
Example + +The following example returns the destination airport code associated with the latest arriving flight daily after `'2005-01-01 00:00:00'` in `flight-carriers`: + +```sql +SELECT + TIME_FLOOR(TIME_PARSE("arrivalime"), 'P1D') AS "arrival_day", + LATEST_BY("Dest", TIME_PARSE("arrivalime")) AS "dest" +FROM "flight-carriers" +WHERE TIME_PARSE("arrivalime") >= TIMESTAMP '2005-01-01 00:00:00' +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`arrival_day`|`dest`| +|-------------|--------| +|`2005-11-01T00:00:00.000Z`|`MCO`| +|`2005-11-02T00:00:00.000Z`|`BUF`| + +
+ +[Learn more](sql-aggregations.md) + +## LEAD + +If you do not supply an `offset`, returns the value evaluated at the row following the current row. Specify an offset number `n` to return the value evaluated at `n` rows following the current one; if there is no such row, returns the given default value. + +* **Syntax**: `LEAD(expr[, offset])` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## LEAST + +Returns the minimum value from the provided expressions. For information on how Druid interprets the arguments passed into the function, see [Reduction functions](sql-scalar.md#reduction-functions). + +* **Syntax:** `LEAST([expr1, ...])` +* **Function type:** Scalar, reduction + +
Example + +The following example returns the minimum value between the strings `apple`, `orange`, and `pear`. Druid interprets these arguments as STRING data type. + +```sql +SELECT LEAST( 'apple', 'orange', 'pear') AS "least" +``` + +Returns the following: + +| `least` | +| -- | +| `apple` | + +
+ +[Learn more](sql-scalar.md#reduction-functions) + + +## LEFT + +Returns the `N` leftmost characters of an expression, where `N` is an integer value. + +* **Syntax:** `LEFT(expr, N)` +* **Function type:** Scalar, string + +
Example + +The following example returns the `3` leftmost characters of the expression `ABCDEFG`. + +```sql +SELECT + 'ABCDEFG' AS "expression", + LEFT('ABCDEFG', 3) AS "leftmost_characters" +``` + +Returns the following: + +| `expression` | `leftmost_characters` | +| -- | -- | +| `ABCDEFG` | `ABC` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## LENGTH + +Returns the length of the expression in UTF-16 code units. + +* **Syntax:** `LENGTH(expr)` +* **Function type:** Scalar, string + +
Example + +The following example returns the character length of the `OriginCityName` column from the `flight-carriers` datasource. + +```sql +SELECT + "OriginCityName" AS "origin_city_name", + LENGTH("OriginCityName") AS "city_name_length" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `origin_city_name` | `city_name_length` | +| -- | -- | +| `San Juan, PR` | `12` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## LISTAGG + +Alias for [`STRING_AGG`](#string_agg). + +* **Syntax:** `LISTAGG([DISTINCT] expr, [separator, [size]])` +* **Function type:** Aggregation + +[Learn more](sql-aggregations.md) + +## LN + +Calculates the natural logarithm of the numeric expression. + +* **Syntax:** `LN(<NUMERIC>)` +* **Function type:** Scalar, numeric + +
Example + +The following example applies the LN function to the `max_temperature` column from the `taxi-trips` datasource. + +```sql +SELECT + "max_temperature" AS "max_temperature", + LN("max_temperature") AS "natural_log_max_temp" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `max_temperature` | `natural_log_max_temp` | +| -- | -- | +| `76` | `4.330733340286331` | + +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## LOG10 + +Calculates the base-10 logarithm of the numeric expression. + +* **Syntax:** `LOG10(<NUMERIC>)` +* **Function type:** Scalar, numeric + +
Example + +The following example applies the LOG10 function to the `max_temperature` column from the `taxi-trips` datasource. + +```sql +SELECT + "max_temperature" AS "max_temperature", + LOG10("max_temperature") AS "log10_max_temp" +FROM "taxi-trips" +LIMIT 1 +``` +Returns the following: + +| `max_temperature` | `log10_max_temp` | +| -- | -- | +| `76` | `1.8808135922807914` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## LOOKUP + +Searches for `expr` in a registered [query-time lookup table](lookups.md) named `lookupName` and returns the mapped value. If `expr` is null or not contained in the lookup, returns `defaultValue` if supplied, otherwise returns null. + +* **Syntax:** `LOOKUP(expr, lookupName[, defaultValue])` +* **Function type:** Scalar, string + +
Example + +The following example uses a `map` type lookup table named `code_to_name`, which contains the following key-value pairs: + +```json +{ + "SJU": "Luis Munoz Marin International Airport", + "IAD": "Dulles International Airport" +} +``` + +The example uses `code_to_name` to map the `Origin` column from the `flight-carriers` datasource to the corresponding full airport name. Returns `key not found` if no matching key exists in the lookup table. + +```sql +SELECT + "Origin" AS "origin_airport", + LOOKUP("Origin", 'code_to_name','key not found') AS "full_airport_name" +FROM "flight-carriers" +LIMIT 2 +``` + +Returns the following: + +| `origin_airport` | `full_airport_name` | +| -- | -- | +| `SJU` | `Luis Munoz Marin International Airport` | +| `BOS` | `key not found` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## LOWER + +Returns the expression in lowercase. + +* **Syntax:** `LOWER(expr)` +* **Function type:** Scalar, string + +
Example + +The following example converts the `OriginCityName` column from the `flight-carriers` datasource to lowercase. + +```sql +SELECT + "OriginCityName" AS "origin_city", + LOWER("OriginCityName") AS "lowercase" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `origin_city` | `lowercase` | +| -- | -- | +| `San Juan, PR` | `san juan, pr` | + +
+ +[Learn more](sql-scalar.md#string-functions) + + +## LPAD + +Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, left pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`. + +* **Syntax:** `LPAD(expr, length[, chars])` +* **Function type:** Scalar, string + +
Example + +The following example left pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters. + +```sql +SELECT + "OriginStateName" AS "origin_state", + LPAD("OriginStateName", 11, '+') AS "add_left_padding" +FROM "flight-carriers" +LIMIT 3 +``` + +Returns the following: + +| `origin_state` | `add_left_padding` | +| -- | -- | +| `Puerto Rico` | `Puerto Rico` | +| `Massachusetts` | `Massachuset` | +| `Florida` | `++++Florida` | + +
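The three result rows above cover the three cases in the `LPAD` description: the input already has the target length, the input is longer and gets truncated, and the input is shorter and gets padded. The following Python sketch models those rules. The function `lpad` is hypothetical, and repeating a multi-character `chars` value to fill the gap is an assumption of this sketch rather than documented Druid behavior.

```python
def lpad(expr, length, chars=" "):
    """Return expr left-padded with chars, or truncated, to exactly length."""
    if length <= len(expr):
        # Same length or longer: keep only the first `length` characters.
        return expr[:length]
    missing = length - len(expr)
    # Assumption: a multi-character pad string repeats until the gap is filled.
    padding = (chars * missing)[:missing]
    return padding + expr


for state in ("Puerto Rico", "Massachusetts", "Florida"):
    print(lpad(state, 11, "+"))
```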
+ +[Learn more](sql-scalar.md#string-functions) + + +## LTRIM + +Trims characters from the leading end of an expression. Defaults `chars` to a space if none is provided. + +* **Syntax:** `LTRIM(expr[, chars])` +* **Function type:** Scalar, string + +
Example + +The following example trims the `_` characters from the leading end of the string expression. + +```sql +SELECT + '___abc___' AS "original_string", + LTRIM('___abc___', '_') AS "trim_leading_end_of_expression" +``` + +Returns the following: + +| `original_string` | `trim_leading_end_of_expression` | +| -- | -- | +| `___abc___` | `abc___` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## MAX + +Returns the maximum value of a set of values. + +* **Syntax**: `MAX(expr)` +* **Function type:** Aggregation + + +
Example + +The following example calculates the maximum delay in minutes for an airline in `flight-carriers`: + +```sql +SELECT MAX("DepDelayMinutes") AS max_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` + +Returns the following: + +| `max_delay` | +| -- | +| `1210` | + +
+ +[Learn more](sql-aggregations.md) + +## MILLIS_TO_TIMESTAMP + +Converts a number of milliseconds since epoch into a timestamp. + +* **Syntax:** `MILLIS_TO_TIMESTAMP(millis_expr)` +* **Function type:** Scalar, date and time + +
Example + +The following example converts 1375344877000 milliseconds from epoch into a timestamp. + +```sql +SELECT MILLIS_TO_TIMESTAMP(1375344877000) AS "timestamp" +``` + +Returns the following: + +| `timestamp` | +| -- | +| `2013-08-01T08:14:37.000Z` | + +
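As a cross-check of the result above, the following Python sketch adds the same number of milliseconds to the Unix epoch and formats the result as an ISO 8601 UTC string. The function name `millis_to_timestamp` is hypothetical and mirrors the SQL function only for readability.

```python
from datetime import datetime, timedelta, timezone


def millis_to_timestamp(millis):
    """Return the UTC timestamp that lies millis milliseconds after the epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    moment = epoch + timedelta(milliseconds=millis)
    # Render the UTC offset as "Z" to match the format of the result table.
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(millis_to_timestamp(1375344877000))
```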
+ +[Learn more](sql-scalar.md#date-and-time-functions) + +## MIN + +Returns the minimum value of a set of values. + +* **Syntax**: `MIN(expr)` +* **Function type:** Aggregation + +
Example + +The following example calculates the minimum delay in minutes for an airline in `flight-carriers`: + +```sql +SELECT MIN("DepDelayMinutes") AS min_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` + +Returns the following: + +| `min_delay` | +| -- | +| `0` | + +
+ +[Learn more](sql-aggregations.md) + +## MOD + +Calculates x modulo y, or the remainder of x divided by y, where x and y are numeric expressions. + +* **Syntax:** `MOD(x, y)` +* **Function type:** Scalar, numeric + +
Example + +The following calculates 78 MOD 10. + +```sql +SELECT MOD(78, 10) as "modulo" +``` +Returns the following: + +| `modulo` | +| -- | +| `8` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## MV_APPEND + +Adds the expression to the end of the array. + +* **Syntax:** `MV_APPEND(arr1, expr)` +* **Function type:** Multi-value string + +
Example + +The following example appends the string `label` to the multi-value string `tags` from `mvd-example`: + +```sql +SELECT MV_APPEND("tags", "label") AS append +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `append` | +| -- | +| `["t1","t2","t3","row1"]` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_CONCAT + +Concatenates two arrays. + +* **Syntax:** `MV_CONCAT(arr1, arr2)` +* **Function type:** Multi-value string + +
Example + +The following example concatenates `tags` from `mvd-example` to itself: + +```sql +SELECT MV_CONCAT("tags", "tags") AS cat +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `cat` | +| -- | +| `["t1","t2","t3","t1","t2","t3"]` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + + +## MV_CONTAINS + +Returns true if the expression is in the array, false otherwise. + +* **Syntax:** `MV_CONTAINS(arr, expr)` +* **Function type:** Multi-value string + +
Example + +The following example checks if the string `t3` exists within `tags` from `mvd-example`: + +```sql +SELECT "tags", MV_CONTAINS("tags", 't3') AS contained +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`contained`| +|------|-----------| +|`["t1","t2","t3"]`|`true`| +|`["t3","t4","t5"]`|`true`| +|`["t5","t6","t7"]`|`false`| +|`null`|`false`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_FILTER_NONE + +Filters a multi-value expression to exclude values from an array. + +* **Syntax:** `MV_FILTER_NONE(expr, arr)` +* **Function type:** Multi-value string + +
Example + +The following example filters `tags` from `mvd-example` to remove values `t1` or `t3`, if present: + +```sql +SELECT MV_FILTER_NONE("tags", ARRAY['t1', 't3']) AS exclude +FROM "mvd-example" +LIMIT 3 +``` + +Returns the following: + +| `exclude` | +| -- | +| `t2` | +| `["t4","t5"]` | +| `["t5","t6","t7"]` | + +
+ +[Learn more](sql-multivalue-string-functions.md) ## MV_FILTER_ONLY -`MV_FILTER_ONLY(expr, arr)` +Filters a multi-value expression to include only values contained in the array. + +* **Syntax:** `MV_FILTER_ONLY(expr, arr)` +* **Function type:** Multi-value string + +
Example + +The following example filters `tags` from `mvd-example` to contain only the values `t1` or `t3`: + +```sql +SELECT MV_FILTER_ONLY("tags", ARRAY['t1', 't3']) AS filt +FROM "mvd-example" +LIMIT 3 +``` + +Returns the following: + +| `filt` | +| -- | +| `["t1","t3"]` | +| `t3` | +| `null` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_LENGTH + +Returns the length of an array expression. + +* **Syntax:** `MV_LENGTH(arr)` +* **Function type:** Multi-value string + +
Example + +The following example returns the length of the `tags` multi-value strings from `mvd-example`: + +```sql +SELECT MV_LENGTH("tags") AS len +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `len` | +| -- | +| `3` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_OFFSET + +Returns the array element at the given zero-based index. + +* **Syntax:** `MV_OFFSET(arr, long)` +* **Function type:** Multi-value string + +
Example + +The following example returns `tags` and the element at the third position of `tags` in `mvd-example`: + +```sql +SELECT "tags", MV_OFFSET("tags", 2) AS elem +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`elem`| +|------|------| +|`["t1","t2","t3"]`|`t3`| +|`["t3","t4","t5"]`|`t5`| +|`["t5","t6","t7"]`|`t7`| +|`null`|`null`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_OFFSET_OF + +Returns the zero-based index of the first occurrence of a given expression in the array. + +* **Syntax:** `MV_OFFSET_OF(arr, expr)` +* **Function type:** Multi-value string + +
Example + +The following example returns `tags` and the zero-based index of the string `t3` from `tags` in `mvd-example`: + +```sql +SELECT "tags", MV_OFFSET_OF("tags", 't3') AS index +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`index`| +|------|-------| +|`["t1","t2","t3"]`|`2`| +|`["t3","t4","t5"]`|`0`| +|`["t5","t6","t7"]`|`null`| +|`null`|`null`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_ORDINAL + +Returns the array element at the given one-based index. + +* **Syntax:** `MV_ORDINAL(arr, long)` +* **Function type:** Multi-value string + +
Example + +The following example returns `tags` and the element at the third position of `tags` in `mvd-example`: + +```sql +SELECT "tags", MV_ORDINAL("tags", 3) AS elem +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`elem`| +|------|------| +|`["t1","t2","t3"]`|`t3`| +|`["t3","t4","t5"]`|`t5`| +|`["t5","t6","t7"]`|`t7`| +|`null`|`null`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_ORDINAL_OF + +Returns the one-based index of the first occurrence of a given expression. + +* **Syntax:** `MV_ORDINAL_OF(arr, expr)` +* **Function type:** Multi-value string + +
Example + +The following example returns `tags` and the one-based index of the string `t3` from `tags` in `mvd-example`: + +```sql +SELECT "tags", MV_ORDINAL_OF("tags", 't3') AS index +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`index`| +|------|-------| +|`["t1","t2","t3"]`|`3`| +|`["t3","t4","t5"]`|`1`| +|`["t5","t6","t7"]`|`null`| +|`null`|`null`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_OVERLAP + +Returns true if the two arrays have any elements in common, false otherwise. + +* **Syntax:** `MV_OVERLAP(arr1, arr2)` +* **Function type:** Multi-value string + +
Example + +The following example identifies rows that contain `t1` or `t3` in `tags` from `mvd-example`: + +```sql +SELECT "tags", MV_OVERLAP("tags", ARRAY['t1', 't3']) AS overlap +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`overlap`| +|------|---------| +|`["t1","t2","t3"]`|`true`| +|`["t3","t4","t5"]`|`true`| +|`["t5","t6","t7"]`|`false`| +|`null`|`false`| + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_PREPEND + +Adds the expression to the beginning of the array. + +* **Syntax:** `MV_PREPEND(expr, arr)` +* **Function type:** Multi-value string + +
Example + +The following example prepends the string dimension `label` to the multi-value string dimension `tags` from `mvd-example`: + +```sql +SELECT MV_PREPEND("label", "tags") AS prepend +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `prepend` | +| -- | +| `["row1","t1","t2","t3"]` | + + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_SLICE + +Returns a slice of the array from the zero-based start and end indexes. + +* **Syntax:** `MV_SLICE(arr, start, end)` +* **Function type:** Multi-value string + +
Example + +The following example returns `tags` and the second and third values of `tags` from `mvd-example`: + +```sql +SELECT "tags", MV_SLICE("tags", 1, 3) AS slice +FROM "mvd-example" +``` + +Returns the following: + +|`tags`|`slice`| +|------|-------| +|`["t1","t2","t3"]`|`["t2","t3"]`| +|`["t3","t4","t5"]`|`["t4","t5"]`| +|`["t5","t6","t7"]`|`["t6","t7"]`| +|`null`|`null`| + +
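As the example above suggests, the `start` index is inclusive and the `end` index is exclusive. A minimal sketch, assuming the same `mvd-example` datasource, that takes the first two values of `tags` instead:

```sql
-- Indexes are zero-based: start 0 is inclusive, end 2 is exclusive,
-- so this selects the values at positions 0 and 1.
SELECT "tags", MV_SLICE("tags", 0, 2) AS first_two
FROM "mvd-example"
LIMIT 1
```

For the row `["t1","t2","t3"]`, this should return `["t1","t2"]`.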
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_TO_ARRAY + +Converts a multi-value string from a `VARCHAR` to a `VARCHAR ARRAY`. + +* **Syntax:** `MV_TO_ARRAY(str)` +* **Function type:** Multi-value string + +
Example + +The following example transforms the `tags` column from `mvd-example` to arrays: + +```sql +SELECT MV_TO_ARRAY(tags) AS arr +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `arr` | +| -- | +| `[t1, t2, t3]` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## MV_TO_STRING + +Joins all elements of the array together by the given delimiter. + +* **Syntax:** `MV_TO_STRING(arr, str)` +* **Function type:** Multi-value string + +
Example + +The following example transforms the `tags` column from `mvd-example` to strings delimited by a space character: + +```sql +SELECT MV_TO_STRING("tags", ' ') AS str +FROM "mvd-example" +LIMIT 1 +``` + +Returns the following: + +| `str` | +| -- | +| `t1 t2 t3` | + +
+ +[Learn more](sql-multivalue-string-functions.md) + +## NTILE + +Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into. + +* **Syntax**: `NTILE(tiles)` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## NULLIF + +Returns null if two values are equal, else returns the first value. +* **Syntax:** `NULLIF(value1, value2)` +* **Function type:** Scalar, other + +
Example + +The following example returns null if the `OriginState` column from the `flight-carriers` datasource is `PR`. + +```sql +SELECT "OriginState" AS "origin_state", + NULLIF("OriginState", 'PR') AS "remove_pr" +FROM "flight-carriers" +LIMIT 2 +``` + +Returns the following: + +| `origin_state` | `remove_pr` | +| -- | -- | +| `PR` | `null` | +| `MA` | `MA` | + +
+ +[Learn more](sql-scalar.md#other-scalar-functions) + + +## NVL + +Returns `value1` if `value1` is not null, otherwise returns `value2`. + +* **Syntax:** `NVL(value1, value2)` +* **Function type:** Scalar, other + +
Example + +The following example replaces each null value in the `Tail_Number` column of the `flight-carriers` datasource with the string `No tail number`. + +```sql +SELECT "Tail_Number" AS "original_column", + NVL("Tail_Number", 'No tail number') AS "remove_null" +FROM "flight-carriers" +WHERE "OriginState" = 'CT' +LIMIT 2 +``` + +Returns the following: + +| `original_column` | `remove_null` | +| -- | -- | +| `N951DL` | `N951DL` | +| `null` | `No tail number` | + +
+ +[Learn more](sql-scalar.md#other-scalar-functions) + +## PARSE_JSON + +Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function results in an error. + +* **Syntax**: `PARSE_JSON(expr)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + +## PARSE_LONG + +Converts a string into a long (BIGINT) using the given radix, or base 10 (decimal) if a radix is not provided. + +* **Syntax:** `PARSE_LONG(string[, radix])` +* **Function type:** Scalar, string + +
Example + +The following example converts the string representation of the binary, radix 2, number `1100` into its long (BIGINT) equivalent. + +```sql +SELECT + '1100' AS "binary_as_string", + PARSE_LONG('1100', 2) AS "bigint_value" +``` + +Returns the following: + +| `binary_as_string` | `bigint_value` | +| -- | -- | +| `1100` | `12` | + +
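The radix is not limited to binary. As an illustrative sketch, the following parses a hexadecimal string with radix 16 and a decimal string with no radix. The input literals are chosen here for illustration only.

```sql
SELECT
  PARSE_LONG('ff', 16) AS "hex_value",   -- 15 * 16 + 15 = 255
  PARSE_LONG('42') AS "decimal_value"    -- radix omitted, so parsed as base 10
```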
+ +[Learn more](sql-scalar.md#string-functions) + + +## PERCENT_RANK + +Returns the relative rank of the row calculated as a percentage according to the formula: `RANK() OVER (window) / COUNT(1) OVER (window)`. + +* **Syntax**: `PERCENT_RANK()` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## POSITION + +Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. If `substring` is not found, returns 0. + +* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])` +* **Function type:** Scalar, string + +
Example + +The following example returns the one-based index of the substring `PR` in the `OriginCityName` column from the `flight-carriers` datasource starting from index 5. + +```sql +SELECT + "OriginCityName" AS "origin_city", + POSITION('PR' IN "OriginCityName" FROM 5) AS "index" +FROM "flight-carriers" +LIMIT 2 +``` + +Returns the following: + +| `origin_city` | `index` | +| -- | -- | +| `San Juan, PR` | `11` | +| `Boston, MA` | `0` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## POWER + +Calculates a numerical expression raised to the specified power. + +* **Syntax:** `POWER(base, exponent)` +* **Function type:** Scalar, numeric + +
Example + +The following example raises 5 to the power of 2. + +```sql +SELECT POWER(5, 2) AS "power" +``` +Returns the following: + +| `power` | +| -- | +| `25` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## RADIANS + +Converts an angle from degrees to radians. + +* **Syntax:** `RADIANS(expr)` +* **Function type:** Scalar, numeric + +
Example + +The following example converts an angle of `180` degrees to radians. + +```sql +SELECT RADIANS(180) AS "radians" +``` +Returns the following: + +| `radians` | +| -- | +| `3.141592653589793` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## RANK + +Returns the rank with gaps for a row within a window. For example, if two rows tie for rank 1, the next rank is 3. + +* **Syntax**: `RANK()` +* **Function type:** Window + +[Learn more](sql-window-functions.md#window-function-reference) + +## REGEXP_EXTRACT + +Apply regular expression `pattern` to `expr` and extract the Nth capture group. If `N` is unspecified or zero, returns the first substring that matches the pattern. Returns null if there is no matching pattern. + +* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])` +* **Function type:** Scalar, string + +
Example + +The following example uses regular expressions to find city names inside the `OriginCityName` column from the `flight-carriers` datasource by matching what comes before the comma. + +```sql +SELECT + "OriginCityName" AS "origin_city", + REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_match" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `origin_city` | `pattern_match` | +| -- | -- | +| `San Juan, PR` | `San Juan`| + +
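To extract a specific capture group rather than the whole match, pass a nonzero `N`. A minimal sketch using a string literal; the two-group pattern is an illustrative assumption and not part of the example above:

```sql
SELECT
  'San Juan, PR' AS "origin_city",
  -- Group 1 captures the text before the comma; group 2 captures the state code after it.
  REGEXP_EXTRACT('San Juan, PR', '([^,]+), ([A-Z]+)', 2) AS "state_code"
```

This should return `PR` for `state_code`.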
+ +[Learn more](sql-scalar.md#string-functions) + +## REGEXP_LIKE + +Returns `true` if the regular expression `pattern` finds a match in `expr`. Returns `false` otherwise. + +* **Syntax:** `REGEXP_LIKE(expr, pattern)` +* **Function type:** Scalar, string + +
Example + +The following example returns `true` when the `OriginCityName` column from `flight-carriers` has a city name containing a space. + +```sql +SELECT + "OriginCityName" AS "origin_city", + REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found" +FROM "flight-carriers" +LIMIT 2 +``` + +Returns the following: + +| `origin_city` | `pattern_found` | +| -- | -- | +| `San Juan, PR` | `true` | +| `Boston, MA` | `false` | + +
+ +[Learn more](sql-scalar.md#string-functions) + +## REGEXP_REPLACE + +Replaces all occurrences of a regular expression in a string expression with a replacement string. Refer to capture groups in the replacement string using `$group` syntax. For example: `$1` or `$2`. + +* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)` +* **Function type:** Scalar, string + +
Example + +The following example matches three consecutive words, where each word is its own capture group, and replaces the matched words with the word in the second capture group punctuated with exclamation marks. + +```sql +SELECT + 'foo bar baz' AS "original_string", + REGEXP_REPLACE('foo bar baz', '([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)' , '$2!') AS "modified_string" +``` + +Returns the following: + +| `original_string` | `modified_string` | +| -- | -- | +| `foo bar baz` | `bar!` | + +
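A replacement string can reference more than one capture group. As a sketch built on the same pattern, the following reverses the order of the three words:

```sql
SELECT
  'foo bar baz' AS "original_string",
  -- $1, $2, and $3 refer to the first, second, and third capture groups.
  REGEXP_REPLACE('foo bar baz', '([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)', '$3 $2 $1') AS "reordered_string"
```

This should return `baz bar foo` for `reordered_string`.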
+ +[Learn more](sql-scalar.md#string-functions) + + +## REPEAT + +Repeats the string expression `N` times, where `N` is an integer. + +* **Syntax:** `REPEAT(expr, N)` +* **Function type:** Scalar, string + +
Example + +The following example returns the string expression `abc` repeated `3` times. + +```sql +SELECT + 'abc' AS "original_string", + REPEAT('abc', 3) AS "with_repetition" +``` + +Returns the following: + +| `original_string` | `with_repetition` | +| -- | -- | +| `abc` | `abcabcabc` | -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +
-Filters a multi-value expression to include only values contained in the array. +[Learn more](sql-scalar.md#string-functions) -## MV_LENGTH +## REPLACE -`MV_LENGTH(arr)` +Replaces instances of a substring with a replacement string in the given expression. -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +* **Syntax:** `REPLACE(expr, substring, replacement)` +* **Function type:** Scalar, string -Returns the length of an array expression. +
Example -## MV_OFFSET +The following example replaces instances of the substring `abc` with `XYZ`. -`MV_OFFSET(arr, long)` +```sql +SELECT + 'abc 123 abc 123' AS "original_string", + REPLACE('abc 123 abc 123', 'abc', 'XYZ') AS "modified_string" +``` -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +Returns the following: -Returns the array element at the given zero-based index. +| `original_string` | `modified_string` | +| -- | -- | +| `abc 123 abc 123` | `XYZ 123 XYZ 123` | -## MV_OFFSET_OF +
-`MV_OFFSET_OF(arr, expr)` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +## REVERSE -Returns the zero-based index of the first occurrence of a given expression in the array. +Reverses the given expression. -## MV_ORDINAL +* **Syntax:** `REVERSE(expr)` +* **Function type:** Scalar, string -`MV_ORDINAL(arr, long)` +
Example -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +The following example reverses the string expression `abc`. -Returns the array element at the given one-based index. +```sql +SELECT + 'abc' AS "original_string", + REVERSE('abc') AS "reversal" +``` -## MV_ORDINAL_OF +Returns the following: -`MV_ORDINAL_OF(arr, expr)` +| `original_string` | `reversal` | +| -- | -- | +| `abc` | `cba` | -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +
-Returns the one-based index of the first occurrence of a given expression. +[Learn more](sql-scalar.md#string-functions) -## MV_OVERLAP +## RIGHT -`MV_OVERLAP(arr1, arr2)` +Returns the `N` rightmost characters of an expression, where `N` is an integer value. -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +* **Syntax:** `RIGHT(expr, N)` +* **Function type:** Scalar, string -Returns true if the two arrays have any elements in common, false otherwise. +
Example -## MV_PREPEND +The following example returns the `3` rightmost characters of the expression `ABCDEFG`. -`MV_PREPEND(expr, arr)` +```sql +SELECT + 'ABCDEFG' AS "expression", + RIGHT('ABCDEFG', 3) AS "rightmost_characters" +``` -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +Returns the following: -Adds the expression to the beginning of the array. +| `expression` | `rightmost_characters` | +| -- | -- | +| `ABCDEFG` | `EFG` | -## MV_SLICE +
-`MV_SLICE(arr, start, end)` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +## ROUND -Returns a slice of the array from the zero-based start and end indexes. +Calculates the rounded value for a numerical expression. -## MV_TO_ARRAY +* **Syntax:** `ROUND(expr[, digits])` +* **Function type:** Scalar, numeric -`MV_TO_ARRAY(str)` +
Example -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +The following example rounds the `pickup_longitude` column from the `taxi-trips` datasource to 0 decimal places. -Converts a multi-value string from a `VARCHAR` to a `VARCHAR ARRAY`. +```sql +SELECT + "pickup_longitude" AS "pickup_longitude", + ROUND("pickup_longitude", 0) AS "rounded_pickup_longitude" +FROM "taxi-trips" +WHERE "pickup_longitude" IS NOT NULL +LIMIT 1 +``` +Returns the following: -## MV_TO_STRING +| `pickup_longitude` | `rounded_pickup_longitude` | +| -- | -- | +| `-73.9377670288086` | `-74` | +
-`MV_TO_STRING(arr, str)` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +## ROW_NUMBER -Joins all elements of the array together by the given delimiter. +Returns the number of the row within the window starting from 1. -## NTILE +* **Syntax**: `ROW_NUMBER()` +* **Function type:** Window -`NTILE(tiles)` +[Learn more](sql-window-functions.md#window-function-reference) -**Function type:** [Window](sql-window-functions.md#window-function-reference) +## RPAD -Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into. +Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, right pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`. -## NULLIF +* **Syntax:** `RPAD(expr, length[, chars])` +* **Function type:** Scalar, string -`NULLIF(value1, value2)` +
Example -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +The following example right pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters. -Returns NULL if two values are equal, else returns the first value. +```sql +SELECT + "OriginStateName" AS "origin_state", + RPAD("OriginStateName", 11, '+') AS "add_right_padding" +FROM "flight-carriers" +LIMIT 3 +``` -## NVL +Returns the following: -`NVL(e1, e2)` +| `origin_state` | `add_right_padding` | +| -- | -- | +| `Puerto Rico` | `Puerto Rico` | +| `Massachusetts` | `Massachuset` | +| `Florida` | `Florida++++` | -**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) +
-Returns `e2` if `e1` is null, else returns `e1`. +[Learn more](sql-scalar.md#string-functions) -## PARSE_JSON +## RTRIM -**Function type:** [JSON](sql-json-functions.md) +Trims characters from the trailing end of an expression. Defaults `chars` to a space if none is provided. -`PARSE_JSON(expr)` +* **Syntax:** `RTRIM(expr[, chars])` +* **Function type:** Scalar, string -Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in an error. +
Example -## PARSE_LONG +The following example trims the `_` characters from the trailing end of the string expression. -`PARSE_LONG(, [])` +```sql +SELECT + '___abc___' AS "original_string", + RTRIM('___abc___', '_') AS "trim_end" +``` -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +Returns the following: -Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified. +| `original_string` | `trim_end` | +| -- | -- | +| `___abc___` | `___abc` | -## PERCENT_RANK +
-`PERCENT_RANK()` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Window](sql-window-functions.md#window-function-reference) +## SAFE_DIVIDE -Returns the relative rank of the row calculated as a percentage according to the formula: `RANK() OVER (window) / COUNT(1) OVER (window)`. +Returns `x` divided by `y`, guarded on division by 0. -## POSITION +* **Syntax:** `SAFE_DIVIDE(x, y)` +* **Function type:** Scalar, numeric -`POSITION( IN [FROM ])` +
Example -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +The following example divides the integer `78` by the integer `10`. -Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. +```sql +SELECT SAFE_DIVIDE(78, 10) AS "safe_division" +``` -## POWER +Returns the following: -`POWER(expr, power)` +|`safe_division`| +|--| +| `7` | -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
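The guard matters when the divisor is zero. A hedged sketch of that case follows. The exact result is an assumption to verify against your deployment: depending on the Druid version and null-handling mode, the query is expected to return `null` or `0` rather than fail.

```sql
-- The divisor is 0, so SAFE_DIVIDE returns a guarded value instead of raising a division error.
SELECT SAFE_DIVIDE(78, 0) AS "guarded_division"
```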
-Calculates a numerical expression raised to the specified power. +[Learn more](sql-scalar.md#numeric-functions) -## RADIANS +## SIN -`RADIANS(expr)` +Calculates the trigonometric sine of an angle expressed in radians. -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +* **Syntax:** `SIN(expr)` +* **Function type:** Scalar, numeric -Converts an angle from degrees to radians. +
Example -## RANK +The following example calculates the sine of angle `PI/3` radians. -`RANK()` +```sql +SELECT SIN(PI / 3) AS "sine" +``` +Returns the following: -**Function type:** [Window](sql-window-functions.md#window-function-reference) +| `sine` | +| -- | +| `0.8660254037844386` | +
-Returns the rank with gaps for a row within a window. For example, if two rows tie for rank 1, the next rank is 3. +[Learn more](sql-scalar.md#numeric-functions) -## REGEXP_EXTRACT +## SQRT -`REGEXP_EXTRACT(, , [])` +Calculates the square root of a numeric expression. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax:** `SQRT(expr)` +* **Function type:** Scalar, numeric -Applies a regular expression to the string expression and returns the _n_th match. +
Example -## REGEXP_LIKE +The following example calculates the square root of 25. -`REGEXP_LIKE(, )` +```sql +SELECT SQRT(25) AS "square_root" +``` +Returns the following: -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +| `square_root` | +| -- | +| `5` | +
-Returns true or false signifying whether the regular expression finds a match in the string expression. +[Learn more](sql-scalar.md#numeric-functions) -## REGEXP_REPLACE +## STDDEV -`REGEXP_REPLACE(, , )` +Alias for [`STDDEV_SAMP`](#stddev_samp). +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax**: `STDDEV(expr)` +* **Function type:** Aggregation -Replaces all occurrences of a regular expression in a string expression with a replacement string. The replacement -string may refer to capture groups using `$1`, `$2`, etc. +[Learn more](sql-aggregations.md) -## REPEAT +## STDDEV_POP -`REPEAT(, [])` +Calculates the population standard deviation of a set of values. +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax**: `STDDEV_POP(expr)` +* **Function type:** Aggregation -Repeats the string expression an integer number of times. +
Example -## REPLACE +The following example calculates the population standard deviation for minutes of delay for an airline in `flight-carriers`: -`REPLACE(expr, pattern, replacement)` +```sql +SELECT STDDEV_POP("DepDelayMinutes") AS sd_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +Returns the following: -Replaces a pattern with another string in the given expression. +| `sd_delay` | +| -- | +| `27.083557` | -## REVERSE +
-`REVERSE(expr)` +[Learn more](sql-aggregations.md) -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +## STDDEV_SAMP -Reverses the given expression. +Calculates the sample standard deviation of a set of values. +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). -## RIGHT +* **Syntax**: `STDDEV_SAMP(expr)` +* **Function type:** Aggregation -`RIGHT(expr, [length])` +
Example -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +The following example calculates the sample standard deviation for minutes of delay for an airline in `flight-carriers`: -Returns the rightmost number of characters from an expression. +```sql +SELECT STDDEV_SAMP("DepDelayMinutes") AS sd_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` -## ROUND +Returns the following: -`ROUND(expr[, digits])` +| `sd_delay` | +| -- | +| `27.083811` | -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
-Calculates the rounded value for a numerical expression. +[Learn more](sql-aggregations.md) -## ROW_NUMBER +## STRING_AGG -`ROW_NUMBER()` +Collects all values of an expression into a single string. -**Function type:** [Window](sql-window-functions.md#window-function-reference) +* **Syntax**: `STRING_AGG(expr, separator, [size])` +* **Function type:** Aggregation -Returns the number of the row within the window starting from 1. +
Example -## RPAD +The following example returns all the distinct airlines from `flight-carriers` as a single space-delimited string: -`RPAD(, , [])` +```sql +SELECT + STRING_AGG(DISTINCT "Reporting_Airline", ' ') AS "AllCarriers" +FROM "flight-carriers" +``` -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +Returns the following: -Returns the rightmost number of characters from an expression, optionally padded with the given characters. +|`AllCarriers`| +|-------------| +|`AA AS B6 CO DH DL EV F9 FL HA HP MQ NW OH OO TZ UA US WN XE`| -## RTRIM +
-`RTRIM(, [])` +[Learn more](sql-aggregations.md) -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +## STRING_FORMAT -Trims characters from the trailing end of an expression. +Returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-). -## SAFE_DIVIDE +* **Syntax:** `STRING_FORMAT(pattern[, args...])` +* **Function type:** Scalar, string -`SAFE_DIVIDE(x, y)` +
Example -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +The following example uses Java String format to pass in `Flight_Number_Reporting_Airline` and `origin_airport` columns, from the `flight-carriers` datasource, as arguments into the string. -Returns `x` divided by `y`, guarded on division by 0. +```sql +SELECT + "Flight_Number_Reporting_Airline" AS "flight_number", + "Origin" AS "origin_airport", + STRING_FORMAT('Flight No.%d departing from %s', "Flight_Number_Reporting_Airline", "Origin") AS "departure_announcement" +FROM "flight-carriers" +LIMIT 1 +``` -## SIN +Returns the following: -`SIN(expr)` +| `flight_number` | `origin_airport` | `departure_announcement` | +| -- | -- | -- | +| `314` | `SJU` | `Flight No.314 departing from SJU` | -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +
-Calculates the trigonometric sine of an angle expressed in radians. +[Learn more](sql-scalar.md#string-functions) -## SQRT +## STRING_TO_ARRAY -`SQRT(expr)` +Splits the string into an array of substrings using the specified delimiter. The delimiter must be a valid regular expression. -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +* **Syntax**: `STRING_TO_ARRAY(string, delimiter)` +* **Function type:** Array -Calculates the square root of a numeric expression. +[Learn more](sql-array-functions.md) -## STDDEV +## STRING_TO_MV -`STDDEV(expr)` +Splits `str1` into an multi-value string on the delimiter specified by `str2`, which is a regular expression. -**Function type:** [Aggregation](sql-aggregations.md) +* **Syntax:** `STRING_TO_MV(str1, str2)` +* **Function type:** Multi-value string -Alias for [`STDDEV_SAMP`](#stddev_samp). +
Example -## STDDEV_POP +The following example splits a street address by whitespace characters: -`STDDEV_POP(expr)` +```sql +SELECT STRING_TO_MV('123 Rose Lane', '\s+') AS mv +``` -**Function type:** [Aggregation](sql-aggregations.md) +Returns the following: -Calculates the population standard deviation of a set of values. +| `mv` | +| -- | +| `["123","Rose","Lane"]` | -## STDDEV_SAMP +
-`STDDEV_SAMP(expr)` +[Learn more](sql-multivalue-string-functions.md) -**Function type:** [Aggregation](sql-aggregations.md) +## STRLEN -Calculates the sample standard deviation of a set of values. +Alias for [`LENGTH`](#length). -## STRING_AGG +* **Syntax:** `STRLEN(expr)` +* **Function type:** Scalar, string -`STRING_AGG(expr, separator, [size])` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Aggregation](sql-aggregations.md) +## STRPOS -Collects all values of an expression into a single string. +Returns the one-based index position of a substring within an expression. If `substring` is not found, returns 0. -## STRING_TO_ARRAY +* **Syntax:** `STRPOS(expr, substring)` +* **Function type:** Scalar, string -`STRING_TO_ARRAY(str1, str2)` +
Example -**Function type:** [Array](sql-array-functions.md) +The following example returns the one-based index position of `World`. -Splits `str1` into an array on the delimiter specified by `str2`, which is a regular expression. +```sql +SELECT + 'Hello World!' AS "original_string", + STRPOS('Hello World!', 'World') AS "index" +``` +Returns the following: -## STRING_FORMAT +| `original_string` | `index` | +| -- | -- | +| `Hello World!` | `7` | -`STRING_FORMAT(pattern[, args...])` +
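As a further sketch of the not-found case mentioned in the description, the following query searches for a substring that does not occur in the string. The literal values are illustrative and are not taken from the original examples.

```sql
-- Illustrative sketch: 'Druid' does not occur in 'Hello World!',
-- so, per the function description, STRPOS should return 0.
SELECT
  'Hello World!' AS "original_string",
  STRPOS('Hello World!', 'Druid') AS "index"
```

Based on the description, `index` should be `0` for this query.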
-**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-scalar.md#string-functions) -Returns a string formatted in accordance to Java's String.format method. +## SUBSTR -## STRING_TO_MV +Alias for [`SUBSTRING`](#substring). -`STRING_TO_MV(str1, str2)` +* **Syntax:** `SUBSTR(expr, index[, length])` +* **Function type:** Scalar, string -**Function type:** [Multi-value string](sql-multivalue-string-functions.md) +[Learn more](sql-scalar.md#string-functions) -Splits `str1` into an multi-value string on the delimiter specified by `str2`, which is a regular expression. -## STRLEN +## SUBSTRING -`STRLEN(expr)` +Returns a substring of the expression starting at a given one-based index. If `length` is omitted, extracts characters to the end of the string, otherwise returns a substring of `length` characters. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax:** `SUBSTRING(expr, index[, length])` +* **Function type:** Scalar, string -Alias for [`LENGTH`](#length). +
Example -## STRPOS +The following example extracts a substring from the string expression `abcdefghi` of length `3` starting at index `4` -`STRPOS(, )` +```sql +SELECT + 'abcdefghi' AS "original_string", + SUBSTRING('abcdefghi', 4, 3) AS "substring" +``` -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +Returns the following: -Returns the one-based index position of a substring within an expression. +| `original_string` | `substring` | +| -- | -- | +| `abcdefghi` | `def` | -## SUBSTR +
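As a sketch of the behavior when `length` is omitted, the following query reuses the same string and start index without the third argument. The query is illustrative and is not part of the original examples.

```sql
-- Illustrative sketch: with no length argument, SUBSTRING should
-- extract from the one-based index 4 ('d') to the end of the string.
SELECT
  'abcdefghi' AS "original_string",
  SUBSTRING('abcdefghi', 4) AS "substring_to_end"
```

Based on the description, `substring_to_end` should be `defghi`.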
-`SUBSTR(, , [])` -**Function type:** [Scalar, string](sql-scalar.md#string-functions) -Alias for [`SUBSTRING`](#substring). +[Learn more](sql-scalar.md#string-functions) -## SUBSTRING +## SUM -`SUBSTRING(, , [])` +Calculates the sum of a set of values. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax**: `SUM(expr)` +* **Function type:** Aggregation -Returns a substring of the expression starting at a given one-based index. +
Example -## SUM +The following example calculates the total minutes of delay for an airline in `flight-carriers`: -`SUM(expr)` +```sql +SELECT SUM("DepDelayMinutes") AS tot_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` -**Function type:** [Aggregation](sql-aggregations.md) +Returns the following: -Calculates the sum of a set of values. +| `tot_delay` | +| -- | +| `475735` | -## TAN +
-`TAN(expr)` +[Learn more](sql-aggregations.md) -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +## TAN Calculates the trigonometric tangent of an angle expressed in radians. -## TDIGEST_GENERATE_SKETCH +* **Syntax:** `TAN(expr)` +* **Function type:** Scalar, numeric + +
Example + +The following example calculates the tangent of angle `PI/3` radians. -`TDIGEST_GENERATE_SKETCH(expr, [compression])` +```sql +SELECT TAN(PI / 3) AS "tangent" +``` +Returns the following: -**Function type:** [Aggregation](sql-aggregations.md) +| `tangent` | +| -- | +| `1.7320508075688767` | +
+ +[Learn more](sql-scalar.md#numeric-functions) + +## TDIGEST_GENERATE_SKETCH Generates a T-digest sketch from values of the specified expression. -## TDIGEST_QUANTILE +* **Syntax**: `TDIGEST_GENERATE_SKETCH(expr, [compression])` +* **Function type:** Aggregation -`TDIGEST_QUANTILE(expr, quantileFraction, [compression])` +[Learn more](sql-aggregations.md) -**Function type:** [Aggregation](sql-aggregations.md) +## TDIGEST_QUANTILE Returns the quantile for the specified fraction from a T-Digest sketch constructed from values of the expression. -## TEXTCAT +* **Syntax**: `TDIGEST_QUANTILE(expr, quantileFraction, [compression])` +* **Function type:** Aggregation -`TEXTCAT(, )` +[Learn more](sql-aggregations.md) -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +## TEXTCAT Concatenates two string expressions. +* **Syntax:** `TEXTCAT(expr, expr)` +* **Function type:** Scalar, string + +
Example + +The following example concatenates the `OriginState` column from the `flight-carriers` datasource to `, USA`. + +```sql +SELECT + "OriginState" AS "origin_state", + TEXTCAT("OriginState", ', USA') AS "concatenate_state_with_USA" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `origin_state` | `concatenate_state_with_USA` | +| -- | -- | +| `PR` | `PR, USA` | + +
+ +[Learn more](sql-scalar.md#string-functions) + ## THETA_SKETCH_ESTIMATE -`THETA_SKETCH_ESTIMATE(expr)` +Returns the distinct count estimate from a Theta sketch. The `expr` argument must return a Theta sketch. + +* **Syntax:** `THETA_SKETCH_ESTIMATE(expr)` +* **Function type:** Scalar, sketch + +
Example -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +The following example estimates the distinct number of tail numbers in the `Tail_Number` column of the `flight-carriers` datasource. -Returns the distinct count estimate from a Theta sketch. +```sql +SELECT THETA_SKETCH_ESTIMATE( DS_THETA("Tail_Number") ) AS "estimate" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate` | +| -- | +| `4667` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) ## THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS -`THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, errorBoundsStdDev)` +Returns the distinct count estimate and error bounds from a Theta sketch. The `expr` argument must return a Theta sketch. Use `errorBoundsStdDev` to specify the number of standard deviations for the error bounds. + +* **Syntax:** `THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, errorBoundsStdDev)` +* **Function type:** Scalar, sketch + +
Details + +The following example estimates the number of distinct tail numbers in the `Tail_Number` column of the `flight-carriers` datasource with error bounds at plus or minus one standard deviation. + +```sql +SELECT THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(DS_THETA("Tail_Number", 4096), 1) AS "estimate_with_error" +FROM "flight-carriers" +``` -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +Returns the following: -Returns the distinct count estimate and error bounds from a Theta sketch. +| `estimate_with_error` | +| -- | +| `{"estimate":4691.201541339628,"highBound":4718.4577807143205,"lowBound":4664.093801991001,"numStdDev":1}` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) ## THETA_SKETCH_INTERSECT -`THETA_SKETCH_INTERSECT([size], expr0, expr1, ...)` +Returns an intersection of Theta sketches. Each input expression must return a Theta sketch. See [DataSketches Theta Sketch module](../development/extensions-core/datasketches-theta#aggregator) for a description of optional parameters. + +* **Syntax:** `THETA_SKETCH_INTERSECT([size], expr0, expr1, ...)` +* **Function type:** Scalar, sketch + +
Example + +The following example estimates the intersection of distinct tail numbers in the `flight-carriers` datasource for flights originating in CA, TX, and NY. + +```sql +SELECT + THETA_SKETCH_ESTIMATE( + THETA_SKETCH_INTERSECT( + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') + ) + ) AS "estimate_intersection" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate_intersection` | +| -- | +| `1701` | -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +
-Returns an intersection of Theta sketches. +[Learn more](sql-scalar.md#sketch-functions) ## THETA_SKETCH_NOT -`THETA_SKETCH_NOT([size], expr0, expr1, ...)` +Returns a set difference of Theta sketches. Each input expression must return a Theta sketch. See [DataSketches Theta Sketch module](../development/extensions-core/datasketches-theta#aggregator) for a description of optional parameters. -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +* **Syntax:** `THETA_SKETCH_NOT([size], expr0, expr1, ...)` +* **Function type:** Scalar, sketch -Returns a set difference of Theta sketches. +
Example + +The following example estimates the number of distinct tail numbers in the `flight-carriers` datasource for flights not originating in CA, TX, or NY. + +```sql +SELECT + THETA_SKETCH_ESTIMATE( + THETA_SKETCH_NOT( + DS_THETA("Tail_Number"), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') + ) + ) AS "estimate_not" +FROM "flight-carriers" +``` + +Returns the following: + +| `estimate_not` | +| -- | +| `145` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) ## THETA_SKETCH_UNION -`THETA_SKETCH_UNION([size], expr0, expr1, ...)` +Returns a union of Theta sketches. Each input expression must return a Theta sketch. See [DataSketches Theta Sketch module](../development/extensions-core/datasketches-theta#aggregator) for a description of optional parameters. + +* **Syntax:** `THETA_SKETCH_UNION([size], expr0, expr1, ...)` +* **Function type:** Scalar, sketch + +
Example -**Function type:** [Scalar, sketch](sql-scalar.md#sketch-functions) +The following example estimates the number of distinct tail numbers that depart from CA, TX, or NY. -Returns a union of Theta sketches. +```sql +SELECT + THETA_SKETCH_ESTIMATE( + THETA_SKETCH_UNION( + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), + DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') + ) + ) AS "estimate_union" +FROM "flight-carriers" +``` +Returns the following: + +| `estimate_union` | +| -- | +| `4522` | + +
+ +[Learn more](sql-scalar.md#sketch-functions) ## TIME_CEIL -`TIME_CEIL(, , [, []])` +Rounds up a timestamp to a given ISO 8601 time period. You can specify `origin` to provide a reference timestamp from which to start rounding. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. + +* **Syntax:** `TIME_CEIL(timestamp_expr, period[, origin[, timezone]])` +* **Function type:** Scalar, date and time + +
Example + +The following example rounds up the `__time` column from the `taxi-trips` datasource to the next 45-minute boundary, measured from the origin timestamp `2013-08-01 08:00:00`. + +```sql +SELECT + "__time" AS "original_timestamp", + TIME_CEIL("__time", 'PT45M', TIMESTAMP '2013-08-01 08:00:00') AS "time_ceiling" +FROM "taxi-trips" +LIMIT 2 +``` + +Returns the following: -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +| `original_timestamp` | `time_ceiling` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013-08-01T08:45:00.000Z` | +| `2013-08-01T09:13:00.000Z` | `2013-08-01T09:30:00.000Z` | +
-Rounds up a timestamp by a given time period, optionally from some reference time or timezone. +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_EXTRACT -`TIME_EXTRACT(, [, []])` +Extracts the value of `unit` from the timestamp and returns it as a number. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +* **Syntax:** `TIME_EXTRACT(timestamp_expr[, unit[, timezone]])` +* **Function type:** Scalar, date and time -Extracts the value of some unit of the timestamp and returns the number. +
Example + +The following example extracts the hour from the `__time` column in the `taxi-trips` datasource and offsets its timezone by `-04:00` hours. + +```sql +SELECT + "__time" AS "original_timestamp", + TIME_EXTRACT("__time", 'hour', '-04:00') AS "extract_hour" +FROM "taxi-trips" +LIMIT 2 +``` + +Returns the following: + +| `original_timestamp` | `extract_hour` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `4` | +| `2013-08-01T09:13:00.000Z` | `5` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_FLOOR -`TIME_FLOOR(, , [, []])` +Rounds down a timestamp to a given ISO 8601 time period. You can specify `origin` to provide a reference timestamp from which to start rounding. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. + +* **Syntax:** `TIME_FLOOR(timestamp_expr, period[, origin[, timezone]])` +* **Function type:** Scalar, date and time + +
Example -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +The following example rounds down the `__time` column from the `taxi-trips` datasource to the previous 45-minute boundary, measured from the origin timestamp `2013-08-01 08:00:00`. -Rounds down a timestamp by a given time period, optionally from some reference time or timezone. +```sql +SELECT + "__time" AS "original_timestamp", + TIME_FLOOR("__time", 'PT45M', TIMESTAMP '2013-08-01 08:00:00') AS "time_floor" +FROM "taxi-trips" +LIMIT 2 +``` + +Returns the following: + +| `original_timestamp` | `time_floor` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013-08-01T08:00:00.000Z` | +| `2013-08-01T09:13:00.000Z` | `2013-08-01T08:45:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_FORMAT -`TIME_FORMAT(, [, []])` +Formats a timestamp as a string in a provided [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html). If no pattern is provided, `pattern` defaults to ISO 8601. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. + +* **Syntax:** `TIME_FORMAT(timestamp_expr[, pattern[, timezone]])` +* **Function type:** Scalar, date and time + +
Example + +The following example formats the `__time` column from the `taxi-trips` datasource into a string format and offsets the result's timezone by `-05:00` hours. + +```sql +SELECT + "__time" AS "original_time", +TIME_FORMAT( "__time", 'dd-MM-YYYY hh:mm aa zzz', '-05:00') AS "string" +FROM "taxi-trips" +LIMIT 1 +``` -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +Returns the following: -Formats a timestamp as a string. +| `original_time` | `string` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `01-08-2013 03:14 AM -05:00` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_IN_INTERVAL -`TIME_IN_INTERVAL(, )` +Returns true if a timestamp is contained within a particular interval. Intervals must be formatted as a string literal containing any ISO 8601 interval. The start instant of an interval is inclusive, and the end instant is exclusive. + +* **Syntax:** `TIME_IN_INTERVAL(timestamp_expr, interval)` +* **Function type:** Scalar, date and time + +
Example + +The following example returns true when a timestamp in the `__time` column of the `taxi-trips` datasource is within a one-hour interval starting from `2013-08-01T08:00:00`. + +```sql +SELECT + "__time" AS "original_time", + TIME_IN_INTERVAL("__time", '2013-08-01T08:00:00/PT1H') AS "in_interval" +FROM "taxi-trips" +LIMIT 2 +``` + +Returns the following: + +| `original_time` | `in_interval` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `true` | +| `2013-08-01T09:13:00.000Z` | `false` | -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +
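To illustrate the inclusive start and exclusive end noted in the description, the following sketch checks two literal timestamps that sit exactly on the interval boundaries. The literals are illustrative, and the sketch assumes the default UTC time zone.

```sql
-- Illustrative sketch, assuming the default UTC time zone.
-- The interval covers 08:00:00 (inclusive) up to 09:00:00 (exclusive).
SELECT
  TIME_IN_INTERVAL(TIMESTAMP '2013-08-01 08:00:00', '2013-08-01T08:00:00/PT1H') AS "at_start",
  TIME_IN_INTERVAL(TIMESTAMP '2013-08-01 09:00:00', '2013-08-01T08:00:00/PT1H') AS "at_end"
```

Based on the description, `at_start` should be `true` and `at_end` should be `false`.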
-Returns whether a timestamp is contained within a particular interval, formatted as a string. +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_PARSE -`TIME_PARSE(, [, []])` +Parses a string into a timestamp using a given [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html). If no pattern is provided, `pattern` defaults to ISO 8601. Returns NULL if string cannot be parsed. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +* **Syntax:** `TIME_PARSE(string_expr[, pattern[, timezone]])` +* **Function type:** Scalar, date and time -Parses a string into a timestamp. +
Example + +The following example parses the `FlightDate` STRING column from the `flight-carriers` datasource into a valid timestamp with an offset of `-05:00` hours. + +```sql +SELECT + "FlightDate" AS "original_string", + TIME_PARSE("FlightDate", 'YYYY-MM-dd', '-05:00') AS "timestamp" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `original_string` | `timestamp` | +| -- | -- | +| `2005-11-01` | `2005-11-01T05:00:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) ## TIME_SHIFT -`TIME_SHIFT(, , , [])` +Shifts a timestamp by a given number of time units. The `period` parameter can be any ISO 8601 period. The `step` parameter can be negative. If provided, `timezone` should be a time zone name like `America/Los_Angeles` or an offset like `-08:00`. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +* **Syntax:** `TIME_SHIFT(timestamp_expr, period, step[, timezone])` +* **Function type:** Scalar, date and time -Shifts a timestamp forwards or backwards by a given number of time units. +
Example -## TIMESTAMP_TO_MILLIS +The following example shifts the `__time` column from the `taxi-trips` datasource back by 24 hours. + +```sql +SELECT + "__time" AS "original_timestamp", + TIME_SHIFT("__time", 'PT1H', -24) AS "shift_back" +FROM "taxi-trips" +LIMIT 1 +``` -`TIMESTAMP_TO_MILLIS()` +Returns the following: -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +| `original_timestamp` | `shift_back` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013-07-31T08:14:37.000Z` | + +
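Because `period` can be any ISO 8601 period and `step` can be negative or positive, the same kind of shift can also be expressed in whole days. The following sketch uses a literal timestamp rather than the `taxi-trips` datasource and assumes the default UTC time zone.

```sql
-- Illustrative sketch, assuming the default UTC time zone.
-- 'P1D' with step -1 shifts back one day; step 1 shifts forward one day.
SELECT
  TIMESTAMP '2013-08-01 08:14:37' AS "original_timestamp",
  TIME_SHIFT(TIMESTAMP '2013-08-01 08:14:37', 'P1D', -1) AS "shift_back",
  TIME_SHIFT(TIMESTAMP '2013-08-01 08:14:37', 'P1D', 1) AS "shift_forward"
```

Under these assumptions, `shift_back` should be `2013-07-31T08:14:37.000Z` and `shift_forward` should be `2013-08-02T08:14:37.000Z`.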
+ +[Learn more](sql-scalar.md#date-and-time-functions) + + +## TIMESTAMP_TO_MILLIS Returns the number of milliseconds since epoch for the given timestamp. +* **Syntax:** `TIMESTAMP_TO_MILLIS(timestamp_expr)` +* **Function type:** Scalar, date and time + +
Example + +The following example converts the `__time` column from the `taxi-trips` datasource into milliseconds since epoch. + +```sql +SELECT + "__time" AS "original_time", + TIMESTAMP_TO_MILLIS("__time") AS "milliseconds" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `original_time` | `milliseconds` | +| -- | -- | +| `2013-08-01T08:14:37.000Z` | `1375344877000` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) + ## TIMESTAMPADD -`TIMESTAMPADD(, , )` +Adds a `unit` of time multiplied by `count` to `timestamp`. + +* **Syntax:** `TIMESTAMPADD(unit, count, timestamp)` +* **Function type:** Scalar, date and time + +
Example + +The following example adds five months to the timestamp `2000-01-01 00:00:00`. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +```sql +SELECT + TIMESTAMP '2000-01-01 00:00:00' AS "original_time", + TIMESTAMPADD (MONTH, 5, TIMESTAMP '2000-01-01 00:00:00') AS "new_time" +``` -Adds a certain amount of time to a given timestamp. +Returns the following: + +| `original_time` | `new_time` | +| -- | -- | +| `2000-01-01T00:00:00.000Z` | `2000-06-01T00:00:00.000Z` | + +
+ +[Learn more](sql-scalar.md#date-and-time-functions) ## TIMESTAMPDIFF -`TIMESTAMPDIFF(, , )` +Returns the difference between two timestamps in a given unit. -**Function type:** [Scalar, date and time](sql-scalar.md#date-and-time-functions) +* **Syntax:** `TIMESTAMPDIFF(unit, timestamp1, timestamp2)` +* **Function type:** Scalar, date and time -Takes the difference between two timestamps, returning the results in the given units. +
Example -## TO_JSON_STRING +The following example calculates the taxi trip length in minutes by subtracting the `__time` column from the `dropoff_datetime` column in the `taxi-trips` datasource. + +```sql +SELECT + "__time" AS "pickup_time", + "dropoff_datetime" AS "dropoff_time", + TIMESTAMPDIFF (MINUTE, "__time", TIME_PARSE("dropoff_datetime")) AS "trip_length" +FROM "taxi-trips" +LIMIT 1 +``` + +Returns the following: + +| `pickup_time` | `dropoff_time` | `trip_length` | +| -- | -- | -- | +| `2013-08-01T08:14:37.000Z` | `2013-08-01 09:09:06` | `54` | + +
-**Function type:** [JSON](sql-json-functions.md) +[Learn more](sql-scalar.md#date-and-time-functions) -`TO_JSON_STRING(expr)` +## TO_JSON_STRING Serializes `expr` into a JSON string. +* **Syntax**: `TO_JSON_STRING(expr)` +* **Function type:** JSON + +[Learn more](sql-json-functions.md) + ## TRIM -`TRIM([BOTH|LEADING|TRAILING] [ FROM] expr)` +Trims the leading and/or trailing characters of an expression. Defaults `chars` to a space if none is provided. Defaults to `BOTH` if no directional argument is provided. -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +* **Syntax:** `TRIM([BOTH|LEADING|TRAILING] [chars FROM] expr)` +* **Function type:** Scalar, string -Trims the leading or trailing characters of an expression. +
Example -## TRUNC +The following example trims `_` characters from both ends of the string expression. + +```sql +SELECT + '___abc___' AS "original_string", + TRIM( BOTH '_' FROM '___abc___') AS "trim_expression" +``` + +Returns the following: + +| `original_string` | `trim_expression` | +| -- | -- | +| `___abc___` | `abc` | + +
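As a sketch of the other directional arguments, the following query trims the same string from one side only. The query is illustrative and is not part of the original examples.

```sql
-- Illustrative sketch: LEADING should remove only the underscores at the
-- start of the string; TRAILING should remove only those at the end.
SELECT
  '___abc___' AS "original_string",
  TRIM(LEADING '_' FROM '___abc___') AS "trim_leading",
  TRIM(TRAILING '_' FROM '___abc___') AS "trim_trailing"
```

Based on the description, `trim_leading` should be `abc___` and `trim_trailing` should be `___abc`.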
-`TRUNC(expr[, digits])` +[Learn more](sql-scalar.md#string-functions) -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +## TRUNC Alias for [`TRUNCATE`](#truncate). -## TRUNCATE +* **Syntax:** `TRUNC(expr[, digits])` +* **Function type:** Scalar, numeric -`TRUNCATE(expr[, digits])` +[Learn more](sql-scalar.md#numeric-functions) -**Function type:** [Scalar, numeric](sql-scalar.md#numeric-functions) +## TRUNCATE Truncates a numerical expression to a specific number of decimal digits. +* **Syntax:** `TRUNCATE(expr[, digits])` +* **Function type:** Scalar, numeric -## TRY_PARSE_JSON +
Example -**Function type:** [JSON](sql-json-functions.md) +The following example truncates the `pickup_longitude` column from the `taxi-trips` datasource to one decimal place. -`TRY_PARSE_JSON(expr)` +```sql +SELECT + "pickup_longitude" AS "pickup_longitude", + TRUNCATE("pickup_longitude", 1) AS "truncate_pickup_longitude" +FROM "taxi-trips" +WHERE "pickup_longitude" IS NOT NULL +LIMIT 1 +``` +Returns the following: -Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value. +| `pickup_longitude` | `truncate_pickup_longitude` | +| -- | -- | +| `-73.9377670288086` | `-73.9` | -## UNNEST +
-`UNNEST(source_expression) as table_alias_name(column_alias_name)` -Unnests a source expression that includes arrays into a target column with an aliased name. +[Learn more](sql-scalar.md#numeric-functions) -For more information, see [UNNEST](./sql.md#unnest). +## TRY_PARSE_JSON -## UPPER +Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value. -`UPPER(expr)` +* **Syntax**: `TRY_PARSE_JSON(expr)` +* **Function type:** JSON -**Function type:** [Scalar, string](sql-scalar.md#string-functions) +[Learn more](sql-json-functions.md) + + +## UPPER Returns the expression in uppercase. +* **Syntax:** `UPPER(expr)` +* **Function type:** Scalar, string + +
Example + +The following example converts the `OriginCityName` column from the `flight-carriers` datasource to uppercase. + +```sql +SELECT + "OriginCityName" AS "origin_city", + UPPER("OriginCityName") AS "uppercase" +FROM "flight-carriers" +LIMIT 1 +``` + +Returns the following: + +| `origin_city` | `uppercase` | +| -- | -- | +| `San Juan, PR` | `SAN JUAN, PR` | + +
+ +[Learn more](sql-scalar.md#string-functions) + ## VAR_POP -`VAR_POP(expr)` +Calculates the population variance of a set of values. +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). + +* **Syntax**: `VAR_POP(expr)` +* **Function type:** Aggregation + +
Example + +The following example calculates the population variance for minutes of delay for a particular airline in `flight-carriers`: + +```sql +SELECT VAR_POP("DepDelayMinutes") AS varpop_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` + +Returns the following: -**Function type:** [Aggregation](sql-aggregations.md) +| `varpop_delay` | +| -- | +| `733.51908` | -Calculates the population variance of a set of values. +
+ +[Learn more](sql-aggregations.md) ## VAR_SAMP -`VAR_SAMP(expr)` +Calculates the sample variance of a set of values. +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). + +* **Syntax**: `VAR_SAMP(expr)` +* **Function type:** Aggregation + +
Example + +The following example calculates the sample variance for minutes of delay for an airline in `flight-carriers`: -**Function type:** [Aggregation](sql-aggregations.md) +```sql +SELECT VAR_SAMP("DepDelayMinutes") AS varsamp_delay +FROM "flight-carriers" +WHERE "Reporting_Airline" = 'AA' +``` -Calculates the sample variance of a set of values. +Returns the following: + +| `varsamp_delay` | +| -- | +| `733.53286` | + +
+ +[Learn more](sql-aggregations.md) ## VARIANCE -`VARIANCE(expr)` +Alias for [`VAR_SAMP`](#var_samp). +Requires the [`druid-stats` extension](../development/extensions-core/stats.md). + +* **Syntax**: `VARIANCE(expr)` +* **Function type:** Aggregation -**Function type:** [Aggregation](sql-aggregations.md) +[Learn more](sql-aggregations.md) -Alias for [`VAR_SAMP`](#var_samp). diff --git a/docs/querying/sql-multivalue-string-functions.md b/docs/querying/sql-multivalue-string-functions.md index 4851a1ab3f1b..553062145425 100644 --- a/docs/querying/sql-multivalue-string-functions.md +++ b/docs/querying/sql-multivalue-string-functions.md @@ -51,16 +51,16 @@ All array references in the multi-value string function documentation can refer |`MV_FILTER_ONLY(expr, arr)`|Filters multi-value `expr` to include only values contained in array `arr`.| |`MV_FILTER_NONE(expr, arr)`|Filters multi-value `expr` to include no values contained in array `arr`.| |`MV_LENGTH(arr)`|Returns length of the array expression.| +|`MV_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns true if `arr` contains `expr`. If `expr` is an array, returns true if `arr` contains all elements of `expr`. Otherwise returns false.| +|`MV_OVERLAP(arr1, arr2)`|Returns true if `arr1` and `arr2` have any elements in common, else false.| |`MV_OFFSET(arr, long)`|Returns the array element at the 0-based index supplied, or null for an out of range index.| -|`MV_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.| -|`MV_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` contains `expr`. If `expr` is an array, returns 1 if `arr` contains all elements of `expr`. Otherwise returns 0.| -|`MV_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements in common, else 0.| |`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. 
If no matching elements exist in the array, returns `null`.| +|`MV_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.| |`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.| |`MV_PREPEND(expr, arr)`|Adds `expr` to the beginning of `arr`, the resulting array type determined by the type `arr`.| |`MV_APPEND(arr, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of `arr`.| |`MV_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.| -|`MV_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0-based index start(inclusive) to end(exclusive), or `null`, if start is less than 0, greater than length of arr or greater than end.| +|`MV_SLICE(arr, start, end)`|Returns the subarray of `arr` from the zero-based index of `start` (inclusive) to `end` (exclusive). Returns null when `start` is less than 0, greater than the array length, or greater than `end`. When `end` is greater than the array length, null values are appended to the subarray.| |`MV_TO_STRING(arr, str)`|Joins all elements of `arr` by the delimiter specified by `str`.| |`STRING_TO_MV(str1, str2)`|Splits `str1` into an array on the delimiter specified by `str2`, which is a regular expression.| |`MV_TO_ARRAY(str)`|Converts a multi-value string from a `VARCHAR` to a `VARCHAR ARRAY`.| diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md index fe91aa0055d5..05f29436da0f 100644 --- a/docs/querying/sql-scalar.md +++ b/docs/querying/sql-scalar.md @@ -223,8 +223,8 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |Function|Notes| |--------|-----| -|`HLL_SKETCH_ESTIMATE(expr[, round])`|Returns the distinct count estimate from an HLL sketch. `expr` must return an HLL sketch. 
The optional `round` boolean parameter will round the estimate if set to `true`, with a default of `false`.| -|`HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr[, numStdDev])`|Returns the distinct count estimate and error bounds from an HLL sketch. `expr` must return an HLL sketch. An optional `numStdDev` argument can be provided.| +|`HLL_SKETCH_ESTIMATE(expr[, round])`|Returns a distinct count estimate from an HLL sketch. `expr` must be an HLL sketch. To round the estimate, set `round` to true. Otherwise, `round` defaults to false.| +|`HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr[, numStdDev])`|Returns a distinct count estimate and error bounds from an HLL sketch. `expr` must be an HLL sketch. The `numStdDev` argument specifies the number of standard deviations of the bounds. `numStdDev` must be `1`, `2`, or `3`. | |`HLL_SKETCH_UNION([lgK, tgtHllType], expr0, expr1, ...)`|Returns a union of HLL sketches, where each input expression must return an HLL sketch. The `lgK` and `tgtHllType` can be optionally specified as the first parameter; if provided, both optional parameters must be specified.| |`HLL_SKETCH_TO_STRING(expr)`|Returns a human-readable string representation of an HLL sketch for debugging. `expr` must return an HLL sketch.| @@ -276,7 +276,7 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |`CASE expr WHEN value1 THEN result1 \[ WHEN value2 THEN result2 ... \] \[ ELSE resultN \] END`|Simple CASE.| |`CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... \] \[ ELSE resultN \] END`|Searched CASE.| |`CAST(value AS TYPE)`|Cast value to another type. 
See [Data types](sql-data-types.md) for details about how Druid SQL handles CAST.| -|`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.| +|`COALESCE(value1, value2, ...)`|Returns the first non-null value.| |`DECODE_BASE64_COMPLEX(dataType, expr)`| Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. The `hyperUnique` and `serializablePairLongString` data types are supported by default. You can enable support for the following complex data types by loading their extensions:
  • `druid-bloom-filter`: `bloom`
  • `druid-datasketches`: `arrayOfDoublesSketch`, `HLLSketch`, `KllDoublesSketch`, `KllFloatsSketch`, `quantilesDoublesSketch`, `thetaSketch`
  • `druid-histogram`: `approximateHistogram`, `fixedBucketsHistogram`
  • `druid-stats`: `variance`
  • `druid-compressed-bigdecimal`: `compressedBigDecimal`
  • `druid-momentsketch`: `momentSketch`
  • `druid-tdigestsketch`: `tDigestSketch`
| |`NULLIF(value1, value2)`|Returns NULL if `value1` and `value2` match, else returns `value1`.| |`NVL(value1, value2)`|Returns `value1` if `value1` is not null, otherwise `value2`.| \ No newline at end of file diff --git a/website/.spelling b/website/.spelling index da7563ffc6a3..27595a3e5079 100644 --- a/website/.spelling +++ b/website/.spelling @@ -180,6 +180,7 @@ LRU LZ4 LZO LimitSpec +LISTAGG Long.MAX_VALUE Long.MAX_VALUE. Long.MIN_VALUE From 8bd89d4e3216a2fde1fbdd299bae9ab0f9e25a05 Mon Sep 17 00:00:00 2001 From: Victoria Lim Date: Fri, 24 Jan 2025 16:32:40 -0800 Subject: [PATCH 2/7] fix typo (#17663) --- docs/querying/sql-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index dff531885341..75e9d54ed8c0 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -172,7 +172,7 @@ Returns any value of the specified expression.
Example -The following example returns the state abbrevation, state name, and average flight time grouped by each state in `flight-carriers`: +The following example returns the state abbreviation, state name, and average flight time grouped by each state in `flight-carriers`: ```sql SELECT From 6a0fc992b502633278b5f1f682518eb7804a6448 Mon Sep 17 00:00:00 2001 From: Victoria Lim Date: Mon, 27 Jan 2025 16:45:04 -0800 Subject: [PATCH 3/7] add tuple examples (#17667) --- docs/querying/sql-aggregations.md | 5 +- docs/querying/sql-functions.md | 160 +++++++++++++++++++++++++++++- 2 files changed, 159 insertions(+), 6 deletions(-) diff --git a/docs/querying/sql-aggregations.md b/docs/querying/sql-aggregations.md index 2af45a530e0a..1a90f9c72d6f 100644 --- a/docs/querying/sql-aggregations.md +++ b/docs/querying/sql-aggregations.md @@ -146,9 +146,8 @@ Load the [DataSketches extension](../development/extensions-core/datasketches-ex |Function|Notes|Default| |--------|-----|-------| -|`DS_TUPLE_DOUBLES(expr, [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on the values of `expr` which is a column containing Tuple sketches which contain an array of double values as their Summary Objects. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation. -|`DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) which contains an array of double values as its Summary Object based on the dimension value of `dimensionColumnExpr` and the numeric metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). 
- +|`DS_TUPLE_DOUBLES(expr[, nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on a precomputed sketch column `expr`, where the precomputed Tuple sketch contains an array of double values as its Summary Object. The `nominalEntries` override parameter is optional and described in the Tuple sketch documentation. +|`DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr1[, metricColumnExpr2, ...], [nominalEntries])`|Creates a [Tuple sketch](../development/extensions-core/datasketches-tuple.md) on raw data. The Tuples sketch will contain an array of double values as its Summary Object based on the dimension value of `dimensionColumnExpr` and the numeric metric values contained in one or more `metricColumnExpr` columns. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). ### T-Digest sketch functions diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 75e9d54ed8c0..519623c833cb 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -2129,12 +2129,34 @@ Returns the following: ## DS_TUPLE_DOUBLES -Creates a Tuple sketch which contains an array of double values as the Summary Object. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for [nominal entries](../development/extensions-core/datasketches-tuple.md). +Creates a Tuple sketch on raw data or a precomputed sketch column. See [DataSketches Tuple Sketch module](../development/extensions-core/datasketches-tuple.md) for a description of parameters. 
-* **Syntax**: `DS_TUPLE_DOUBLES(expr, [nominalEntries])` - `DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr, ..., [nominalEntries])` +* **Syntax**: `DS_TUPLE_DOUBLES(expr[, nominalEntries])` + `DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr1[, metricColumnExpr2, ...], [nominalEntries])` * **Function type:** Aggregation +
Example + +The following example creates a Tuple sketch column that stores the arrival and departure delay minutes for each airline in `flight-carriers`: + +```sql +SELECT + "Reporting_Airline", + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes", "DepDelayMinutes") AS tuples_delay +FROM "flight-carriers" +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`Reporting_Airline`|`tuples_delay`| +|-------------------|--------------| +|`AA`|`1.0`| +|`AS`|`1.0`| + +
+ [Learn more](sql-aggregations.md) ## DS_TUPLE_DOUBLES_INTERSECT @@ -2144,6 +2166,37 @@ Returns an intersection of Tuple sketches which each contain an array of double * **Syntax**: `DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])` * **Function type:** Scalar, sketch +
Example + +The following example calculates the total minutes of arrival delay for airlines flying out of both `SFO` and `LAX`. +An airline that doesn't fly out of both airports returns a value of 0. + +```sql +SELECT + "Reporting_Airline", + DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( + DS_TUPLE_DOUBLES_INTERSECT( + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') + ) + ) AS arrival_delay_sfo_lax +FROM "flight-carriers" +GROUP BY 1 +LIMIT 5 +``` + +Returns the following: + +|`Reporting_Airline`|`arrival_delay_sfo_lax`| +|----|---------| +|`AA`|`[33296]`| +|`AS`|`[13694]`| +|`B6`|`[0]`| +|`CO`|`[13582]`| +|`DH`|`[0]`| + +
+ [Learn more](sql-scalar.md#tuple-sketch-functions) ## DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE @@ -2153,6 +2206,47 @@ Computes approximate sums of the values contained within a Tuple sketch which co * **Syntax**: `DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)` * **Function type:** Scalar, sketch +
Example + +The following example calculates the sum of arrival and departure delay minutes for each airline in `flight-carriers`: + +```sql +SELECT + "Reporting_Airline", + DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes", "DepDelayMinutes")) AS sum_delays +FROM "flight-carriers" +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`Reporting_Airline`|`sum_delays`| +|----|-----------------| +|`AA`|`[612831,474309]`| +|`AS`|`[157340,141462]`| + +Compare this example with an analogous SQL statement that doesn't use approximations: + +```sql +SELECT + "Reporting_Airline", + SUM("ArrDelayMinutes") AS sum_arrival_delay, + SUM("DepDelayMinutes") AS sum_departure_delay +FROM "flight-carriers" +GROUP BY 1 +LIMIT 2 +``` + +Returns the following: + +|`Reporting_Airline`|`sum_arrival_delay`|`sum_departure_delay`| +|----|--------|--------| +|`AA`|`612831`|`475735`| +|`AS`|`157340`|`143620`| + +
+ [Learn more](sql-scalar.md#tuple-sketch-functions) ## DS_TUPLE_DOUBLES_NOT @@ -2162,6 +2256,36 @@ Returns a set difference of Tuple sketches which each contain an array of double * **Syntax**: `DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])` * **Function type:** Scalar, sketch +
Example + +The following example calculates the total minutes of arrival delay for airlines that fly out of `SFO` but not `LAX`. + +```sql +SELECT + "Reporting_Airline", + DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( + DS_TUPLE_DOUBLES_NOT( + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') + ) + ) AS arrival_delay_sfo_lax +FROM "flight-carriers" +GROUP BY 1 +LIMIT 5 +``` + +Returns the following: + +|`Reporting_Airline`|`arrival_delay_sfo_lax`| +|----|---------| +|`AA`|`[0]`| +|`AS`|`[0]`| +|`B6`|`[0]`| +|`CO`|`[0]`| +|`DH`|`[93]`| + +
+ [Learn more](sql-scalar.md#tuple-sketch-functions) ## DS_TUPLE_DOUBLES_UNION @@ -2171,6 +2295,36 @@ Returns a union of Tuple sketches which each contain an array of double values a * **Syntax**: `DS_TUPLE_DOUBLES_UNION(expr, ..., [nominalEntries])` * **Function type:** Scalar, sketch +
Example + +The following example calculates the total minutes of arrival delay for airlines flying out of either `SFO` or `LAX`. + +```sql +SELECT + "Reporting_Airline", + DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( + DS_TUPLE_DOUBLES_UNION( + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), + DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') + ) + ) AS arrival_delay_sfo_lax +FROM "flight-carriers" +GROUP BY 1 +LIMIT 5 +``` + +Returns the following: + +|`Reporting_Airline`|`arrival_delay_sfo_lax`| +|----|---------| +|`AA`|`[33296]`| +|`AS`|`[13694]`| +|`B6`|`[0]`| +|`CO`|`[13582]`| +|`DH`|`[93]`| + +
+ [Learn more](sql-scalar.md#tuple-sketch-functions) ## EARLIEST From 08c0a8dd9f08dccc2ad0c2ba45d24eb84eadabe5 Mon Sep 17 00:00:00 2001 From: Charles Smith Date: Tue, 28 Jan 2025 09:55:39 -0800 Subject: [PATCH 4/7] docs: fix typos in tabs and header (#17673) --- docs/release-info/migr-ansi-sql-null.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/release-info/migr-ansi-sql-null.md b/docs/release-info/migr-ansi-sql-null.md index 1655867ca156..b8548e8ce1a1 100644 --- a/docs/release-info/migr-ansi-sql-null.md +++ b/docs/release-info/migr-ansi-sql-null.md @@ -30,7 +30,7 @@ In Apache Druid 32.0.0, legacy configurations which were incompatible with the A These configurations were: * `druid.generic.useDefaultValueForNull` * `druid.expressions.useStrictBooleans` -* `druid.generic.useThreeValueLogicForNativeFilters`  +* `druid.generic.useThreeValueLogicForNativeFilters` This guide provides strategies for Druid operators who rely on legacy Druid null handling behavior in their applications to transition to Druid 32.0.0 or later. @@ -50,7 +50,7 @@ Prior to Druid 28.0.0, Druid defaulted to a legacy mode which stored default val In this mode, Druid created segments with the following characteristics at ingestion time: - String columns couldn't distinguish an empty string, `''`, from null. - Therefore, Druid treated them both as interchangeable values. + Therefore, Druid treated both values as interchangeable. - Numeric columns couldn't represent null valued rows. Therefore, Druid stored `0` instead of `null`. 
@@ -207,7 +207,7 @@ The following example shows how to coerce empty strings into null to accommodate - + ```sql REPLACE INTO "null_string" OVERWRITE ALL @@ -286,7 +286,7 @@ PARTITIONED BY MONTH Druid ingests the data with no empty strings as follows: -| `__time` | `string_examle` | +| `__time` | `string_example` | | -- | -- | -- | | `2024-01-01T00:00:00.000Z`| `my_string`| | `2024-01-02T00:00:00.000Z`| `null`| @@ -305,7 +305,7 @@ If you want to maintain null values in your data within Druid, you can use the f Consider the following Druid datasource `null_example`: -| `__time` | `string_examle` | `number_example`| +| `__time` | `string_example` | `number_example`| | -- | -- | -- | | `2024-01-01T00:00:00.000Z`| `my_string`| 99 | | `2024-01-02T00:00:00.000Z`| `empty`| 0 | From 52655546e0b06f224955ab5d23033e529ede7aed Mon Sep 17 00:00:00 2001 From: Jill Osborne Date: Tue, 28 Jan 2025 23:51:10 +0000 Subject: [PATCH 5/7] [docs] Batch26 JSON functions (#17635) * [docs] Batch26 JSON functions * Updated * Updated * Updated * Fixed typo * Small wording update * Updated after review --------- Co-authored-by: Victoria Lim --- docs/querying/sql-functions.md | 276 +++++++++++++++++++++++++++++---- 1 file changed, 250 insertions(+), 26 deletions(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 519623c833cb..426868d9c714 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -2941,70 +2941,222 @@ Returns the following: ## JSON_KEYS -Returns an array of field names from `expr` at the specified `path`. +Returns an array of field names from an expression, at a specified path. -* **Syntax**: `JSON_KEYS(expr, path)` +* **Syntax:** `JSON_KEYS(expr, path)` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example returns an array of field names from the nested column `agent`: + +```sql +SELECT + JSON_KEYS(agent, '$.') AS agent_keys +FROM "kttm_nested" +LIMIT 1 +``` + +Returns the following: +| `agent_keys` | +| -- | +| `[type, category, browser, browser_version, os, platform]` | + +
+ +[Learn more](sql-json-functions.md) ## JSON_MERGE -Merges two or more JSON `STRING` or `COMPLEX` into one. Preserves the rightmost value when there are key overlaps. Returning always a `COMPLEX` type. +Merges two or more JSON `STRING` or `COMPLEX` expressions into one, preserving the rightmost value when there are key overlaps. +The function always returns a `COMPLEX` object. * **Syntax:** `JSON_MERGE(expr1, expr2[, expr3 ...])` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example merges the `event` object with a static string `example_string`: + +```sql +SELECT + event, + JSON_MERGE(event, '{"example_string": 123}') as event_with_string +FROM "kttm_nested" +LIMIT 1 +``` + +Returns the following: +| `event` | `event_with_string` | +| -- | -- | +| `{"type":"PercentClear","percentage":55}` | `{"type":"PercentClear","percentage":55,"example_string":123}` | + +
+ +[Learn more](sql-json-functions.md) ## JSON_OBJECT -Constructs a new `COMPLEX` object. The `KEY` expressions must evaluate to string types. The `VALUE` expressions can be composed of any input type, including other `COMPLEX` values. `JSON_OBJECT` can accept colon-separated key-value pairs. The following syntax is equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`. +Constructs a new `COMPLEX` object from one or more expressions. +The `KEY` expressions must evaluate to string types. +The `VALUE` expressions can be composed of any input type, including other `COMPLEX` objects. +The function can accept colon-separated key-value pairs. -* **Syntax**: `JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` +* **Syntax:** `JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` + or + `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example creates a new object `combined_JSON` from `continent` in `geo_ip` and `type` in `event`: + +```sql +SELECT + JSON_OBJECT( + KEY 'geo_ip' VALUE JSON_QUERY(geo_ip, '$.continent'), + KEY 'event' VALUE JSON_QUERY(event, '$.type') + ) + as combined_JSON +FROM "kttm_nested" +LIMIT 1 +``` + +Returns the following: +| `combined_JSON` | +| -- | +| `{"geo_ip": {"continent": "South America"},"event": {"type": "PercentClear"}}` | + +
+ +[Learn more](sql-json-functions.md) ## JSON_PATHS -Returns an array of all paths which refer to literal values in `expr` in JSONPath format. +Returns an array of all paths which refer to literal values in an expression, in JSONPath format. -* **Syntax**: `JSON_PATHS(expr)` +* **Syntax:** `JSON_PATHS(expr)` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example returns an array of distinct paths in the `geo_ip` nested column: + +```sql +SELECT + ARRAY_CONCAT_AGG(DISTINCT JSON_PATHS(geo_ip)) AS geo_ip_paths +from "kttm_nested" +``` + +Returns the following: + +| `geo_ip_paths` | +| -- | +| `[$.city, $.continent, $.country, $.region]` | +
+ +[Learn more](sql-json-functions.md) ## JSON_QUERY -Extracts a `COMPLEX` value from `expr`, at the specified `path`. +Extracts a `COMPLEX` value from an expression at a specified path. -* **Syntax**: `JSON_QUERY(expr, path)` +* **Syntax:** `JSON_QUERY(expr, path)` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example returns the values of `percentage` in the `event` nested column: + +```sql +SELECT + "event", + JSON_QUERY("event", '$.percentage') AS percentage +FROM "kttm_nested" +LIMIT 2 +``` + +Returns the following: + +| `event` | `percentage` | +| -- | -- | +| `{"type":"PercentClear","percentage":55}` | `55` | +| `{"type":"PercentClear","percentage":80}` | `80` | +
+ +[Learn more](sql-json-functions.md) ## JSON_QUERY_ARRAY -Extracts an `ARRAY>` value from `expr` at the specified `path`. If value is not an `ARRAY`, it gets translated into a single element `ARRAY` containing the value at `path`. The primary use of this function is to extract arrays of objects to use as inputs to other [array functions](./sql-array-functions.md). +Extracts an `ARRAY>` value from an expression at a specified path. + +If the value isn't an array, the function translates it into a single element `ARRAY` containing the value at `path`. +This function is mainly used to extract arrays of objects to use as inputs to other [array functions](./sql-array-functions.md). -* **Syntax**: `JSON_QUERY_ARRAY(expr, path)` +* **Syntax:** `JSON_QUERY_ARRAY(expr, path)` * **Function type:** JSON +
Example + +The following example returns an array of `percentage` values in the `event` nested column: + +```sql +SELECT + "event", + JSON_QUERY_ARRAY("event", '$.percentage') AS percentage +FROM "kttm_nested" +LIMIT 2 +``` + +Returns the following: + +| `event` | `percentage` | +| -- | -- | +| `{"type":"PercentClear","percentage":55}` | `[55]` | +| `{"type":"PercentClear","percentage":80}` | `[80]` | + +
+ [Learn more](sql-json-functions.md) ## JSON_VALUE -Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, `DOUBLE`, etc) the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`. +Extracts a literal value from an expression at a specified path. + +If you include `RETURNING` and specify a SQL type (such as `VARCHAR`, `BIGINT`, `DOUBLE`) the function plans the query using the suggested type. +If `RETURNING` isn't included, the function attempts to infer the type based on the context. +If the function can't infer the type, it defaults to `VARCHAR`. -* **Syntax**: `JSON_VALUE(expr, path [RETURNING sqlType])` +* **Syntax:** `JSON_VALUE(expr, path [RETURNING sqlType])` * **Function type:** JSON +
Example + +The following example returns the value of `city` in the `geo_ip` nested column: + +```sql +SELECT + geo_ip, + JSON_VALUE(geo_ip, '$.city' RETURNING VARCHAR) as city +FROM "kttm_nested" +WHERE JSON_VALUE(geo_ip, '$.continent') = 'Asia' +LIMIT 2 +``` + +Returns the following: + +| `geo_ip` | `city` | +| -- | -- | +| `{"continent":"Asia","country":"Taiwan","region":"Taipei City","city":"Taipei"}` | `Taipei` | +| `{"continent":"Asia","country":"Thailand","region":"Bangkok","city":"Bangkok"}` | `Bangkok` | + +
+ [Learn more](sql-json-functions.md) ## LAG @@ -3971,13 +4123,32 @@ Returns the following: ## PARSE_JSON -Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in an error. +Parses an expression into a `COMPLEX` object. + +The function deserializes JSON values when processing them, translating stringified JSON into a nested structure. +If the input is invalid JSON or not a `VARCHAR`, it returns an error. -* **Syntax**: `PARSE_JSON(expr)` +* **Syntax:** `PARSE_JSON(expr)` * **Function type:** JSON -[Learn more](sql-json-functions.md) +
Example + +The following example creates a `COMPLEX` object `gus` from a string of fields: + +```sql +SELECT + PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}') as gus +``` + +Returns the following: + +| `gus` | +| -- | +| `{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}` | + +
+[Learn more](sql-json-functions.md) ## PARSE_LONG Converts a string into a long(BIGINT) with the given radix, or into DECIMAL(base 10) if a radix is not provided. @@ -5236,11 +5407,29 @@ Returns the following: ## TO_JSON_STRING -Serializes `expr` into a JSON string. +Serializes an expression into a JSON string. -* **Syntax**: `TO_JSON_STRING(expr)` +* **Syntax:** `TO_JSON_STRING(expr)` * **Function type:** JSON +
Example + +The following example writes the distinct field names in the `event` nested column to a JSON string: + +```sql +SELECT + TO_JSON_STRING(ARRAY_CONCAT_AGG(DISTINCT JSON_KEYS(event, '$.'))) as json_string +FROM "kttm_nested" +``` + +Returns the following: + +| `json_string` | +| -- | +| `["error","layer","percentage","saveNumber","type","url","userAgent"]` | + +
+ [Learn more](sql-json-functions.md) @@ -5312,11 +5501,47 @@ Returns the following: ## TRY_PARSE_JSON -Parses `expr` into a `COMPLEX` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value. +Parses an expression into a `COMPLEX` object. + +This function deserializes JSON values when processing them, translating stringified JSON into a nested structure. +If the input is invalid JSON or not a `VARCHAR`, it returns a `NULL` value. -* **Syntax**: `TRY_PARSE_JSON(expr)` +You can use this function instead of [PARSE_JSON](#parse_json) to insert a null value when processing invalid data, instead of producing an error. + +* **Syntax:** `TRY_PARSE_JSON(expr)` * **Function type:** JSON +
Example + +The following example creates a `COMPLEX` object `gus` from a string of fields: + +```sql +SELECT + TRY_PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}') as gus +``` + +Returns the following: + +| `gus` | +| -- | +| `{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}` | + + +The following example contains invalid data `x:x`: + +```sql +SELECT + TRY_PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet",x:x}') as gus +``` + +Returns the following: + +| `gus` | +| -- | +| `null` | + +
+ [Learn more](sql-json-functions.md) @@ -5414,4 +5639,3 @@ Requires the [`druid-stats` extension](../development/extensions-core/stats.md). * **Function type:** Aggregation [Learn more](sql-aggregations.md) - From 5f9ca828887f416615e864f6a8186b95f4a44d39 Mon Sep 17 00:00:00 2001 From: Charles Smith Date: Tue, 28 Jan 2025 15:57:58 -0800 Subject: [PATCH 6/7] [Docs] Adds examples for window functions for the SQL Reference (batch23) (#17670) Co-authored-by: Victoria Lim --- docs/querying/sql-functions.md | 304 +++++++++++++++++++++++++++++++++ 1 file changed, 304 insertions(+) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 426868d9c714..0bd4262b7e1b 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -1671,6 +1671,37 @@ Returns the cumulative distribution of the current row within the window calcula * **Syntax**: `CUME_DIST()` * **Function type:** Window +
Example + +The following example returns the cumulative distribution of number of flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + CUME_DIST() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "cume_dist" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `cume_dist` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `0.25` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `0.5` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `1` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `0.3333333333333333` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `1`| +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `1` | + +
+ + [Learn more](sql-window-functions.md#window-function-reference) ## CURRENT_DATE @@ -1820,6 +1851,36 @@ Returns the rank for a row within a window without gaps. For example, if two row * **Syntax**: `DENSE_RANK()` * **Function type:** Window +
Example + +The following example returns the dense rank by airline for flights from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + DENSE_RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "dense_rank" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `dense_rank` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `2` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `1` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `2`| +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `2` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## DIV @@ -2450,6 +2511,36 @@ Returns the value evaluated for the expression for the first row within the wind * **Syntax**: `FIRST_VALUE(expr)` * **Function type:** Window +
Example + +The following example returns the name of the first airline in the window of flights by airline for two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + FIRST_VALUE("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "first_val" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `first_val` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `HA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `HA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `HA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `HA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `HA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `HA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `HA` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## FLOOR @@ -3166,6 +3257,36 @@ If you do not supply an `offset`, returns the value evaluated at the row precedi * **Syntax**: `LAG(expr[, offset])` * **Function type:** Window +
Example + +The following example returns the preceding airline in the window for flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + LAG("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "lag" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `lag` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `null` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `HA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `UA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `AA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `null` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `HA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `AA` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## LAST_VALUE @@ -3175,6 +3296,38 @@ Returns the value evaluated for the expression for the last row within the windo * **Syntax**: `LAST_VALUE(expr)` * **Function type:** Window +
Example + +The following example returns the last airline name in the window for flights for two airports on a single day. +Note that the RANGE BETWEEN clause defines the window frame between the current row and the final row in the window instead of the default of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when using ORDER BY. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + LAST_VALUE("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC + RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS "last_value" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `last_value` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `NW` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `NW` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `NW` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `NW` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `UA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `UA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `UA` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## LATEST @@ -3248,6 +3401,36 @@ If you do not supply an `offset`, returns the value evaluated at the row followi * **Syntax**: `LEAD(expr[, offset])` * **Function type:** Window +
Example + +The following example returns the subsequent airline in the window for flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + LEAD("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "lead" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `lead` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `UA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `AA` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `NW` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `null` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `AA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `UA` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `null` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## LEAST @@ -4060,6 +4243,36 @@ Divides the rows within a window as evenly as possible into the number of tiles, * **Syntax**: `NTILE(tiles)` * **Function type:** Window +
Example + +The following example returns the results for flights by airline from two airports on a single day divided into 3 tiles. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + NTILE(3) OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "ntile" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `ntile` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `2` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `1` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `2` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `3` | + +
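
The tile numbers in the table follow from how `NTILE` handles a row count that does not divide evenly: the earliest tiles each take one extra row. The following Python sketch reproduces that arithmetic outside of Druid. It assumes the standard SQL distribution rule, and `ntile_assignments` is a hypothetical helper written for illustration, not a Druid API.

```python
def ntile_assignments(num_rows: int, tiles: int) -> list[int]:
    """Return the 1-based tile number for each row, in window order."""
    base, extra = divmod(num_rows, tiles)
    assignments = []
    for tile in range(1, tiles + 1):
        # The first `extra` tiles each take one additional row.
        size = base + (1 if tile <= extra else 0)
        assignments.extend([tile] * size)
    return assignments


print(ntile_assignments(4, 3))  # KOA partition: four airlines
print(ntile_assignments(3, 3))  # LIH partition: three airlines
```

With four rows and three tiles, the first tile holds two rows and the remaining tiles hold one each, which matches the `KOA` rows above.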
+ [Learn more](sql-window-functions.md#window-function-reference) ## NULLIF @@ -4184,6 +4397,37 @@ Returns the relative rank of the row calculated as a percentage according to the * **Syntax**: `PERCENT_RANK()` * **Function type:** Window +
Example + +The following example returns the percent rank within the window for flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + PERCENT_RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "pct_rank" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `pct_rank` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `0` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `0.3333333333333333` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `0.6666666666666666` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `0.6666666666666666` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `0` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `0.5` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `0.5` | + +
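
The values in the table are consistent with the usual definition of percent rank, `(rank - 1) / (rows in partition - 1)`, where tied rows share a rank. The following Python sketch recomputes them from the flight counts. It assumes that standard definition, and `percent_ranks` is a hypothetical helper for illustration, not part of Druid.

```python
def percent_ranks(values: list[int]) -> list[float]:
    """Percent rank for values already sorted in window order (descending)."""
    n = len(values)
    ranks = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            ranks.append(ranks[-1])  # tied rows share the earlier row's rank
        else:
            ranks.append(i + 1)  # rank with gaps: position in the ordering
    # A single-row partition is assigned a percent rank of 0.
    return [(r - 1) / (n - 1) if n > 1 else 0.0 for r in ranks]


print(percent_ranks([11, 4, 1, 1]))  # KOA flight counts
print(percent_ranks([15, 2, 2]))     # LIH flight counts
```

The two `KOA` airlines with one flight each tie at rank 3, so both get `2/3`; the two `LIH` airlines with two flights each tie at rank 2, so both get `0.5`.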
+ + [Learn more](sql-window-functions.md#window-function-reference) ## POSITION @@ -4269,6 +4513,36 @@ Returns the rank with gaps for a row within a window. For example, if two rows t * **Syntax**: `RANK()` * **Function type:** Window +
Example + +The following example returns the rank within the window for flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "rank" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `rank` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `2` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `1` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `2` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `2` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## REGEXP_EXTRACT @@ -4501,6 +4775,36 @@ Returns the number of the row within the window starting from 1. * **Syntax**: `ROW_NUMBER()` * **Function type:** Window +
Example + +The following example returns the row number within the window for flights by airline from two airports on a single day. + +```sql +SELECT FLOOR("__time" TO DAY) AS "flight_day", + "Origin" AS "airport", + "Reporting_Airline" as "airline", + COUNT("Flight_Number_Reporting_Airline") as "num_flights", + ROW_NUMBER() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "row_num" +FROM "flight-carriers" +WHERE FLOOR("__time" TO DAY) = '2005-11-01' + AND "Origin" IN ('KOA', 'LIH') +GROUP BY 1, 2, 3 +``` + +Returns the following: + +| `flight_day` | `airport` | `airline` | `num_flights` | `row_num` | +| --- | --- | --- | --- | ---| +| `2005-11-01T00:00:00.000Z` | `KOA` | `HA` | `11` | `1` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `UA` | `4` | `2` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `AA` | `1` | `3` | +| `2005-11-01T00:00:00.000Z` | `KOA` | `NW` | `1` | `4` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `HA` | `15` | `1` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `AA` | `2` | `2` | +| `2005-11-01T00:00:00.000Z` | `LIH` | `UA` | `2` | `3` | + +
+ [Learn more](sql-window-functions.md#window-function-reference) ## RPAD From 5df8469a9cb74dcc1f3f3ef01a57074d79d0df8f Mon Sep 17 00:00:00 2001 From: Jill Osborne Date: Wed, 5 Feb 2025 16:01:20 +0000 Subject: [PATCH 7/7] [docs] Batch19 SQL aggregation functions (#17658) * Batch19 SQL aggregation functions * Updated * Updated * Update docs/querying/sql-functions.md Co-authored-by: Victoria Lim * Updated after review * Removed some problematic italics * Updated after review * Update docs/querying/sql-functions.md Co-authored-by: Victoria Lim * Update docs/querying/sql-functions.md Co-authored-by: Victoria Lim * Capitalize Bloom * Apply suggestions from code review Apply suggestions * fix backticks --------- Co-authored-by: Victoria Lim Co-authored-by: Charles Smith --- docs/querying/sql-functions.md | 101 ++++++++++++++++++++++++++++----- 1 file changed, 86 insertions(+), 15 deletions(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 4f47927492cc..43abb43a7af5 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -1282,22 +1282,69 @@ Returns the following: ## BLOOM_FILTER -Computes a Bloom filter from values produced by the specified expression. +Computes a [Bloom filter](../development/extensions-core/bloom-filter.md) from values provided in an expression. -* **Syntax**: `BLOOM_FILTER(expr, )` + +* **Syntax:** `BLOOM_FILTER(expr, numEntries)` + `numEntries` specifies the maximum number of distinct values before the false positive rate increases. * **Function type:** Aggregation +
Example + +The following example returns a Base64-encoded Bloom filter representing the set of devices, `agent_category`, used in Albania: + +```sql +SELECT "country", + BLOOM_FILTER(agent_category, 10) as albanian_bloom +FROM "kttm" +WHERE "country" = 'Albania' +GROUP BY "country" +``` + +Returns the following: + +|`country`| `albanian_bloom`| +|---| --- | +|`Albania`|`BAAAAAgAAACAAEAAAAAAAAAAAEIAAAAAAAAAAAAAAAAAAAAAAAIIAAAAAAAAAAAAAAAAAAIAAAAAAQAAAAAAAAAAAAAA`| + +
+ [Learn more](sql-aggregations.md) ## BLOOM_FILTER_TEST -Returns true if the expression is contained in a Base64-serialized Bloom filter. +Returns true if an expression is contained in a Base64-encoded [Bloom filter](../development/extensions-core/bloom-filter.md) string. -* **Syntax**: `BLOOM_FILTER_TEST(expr, )` +* **Syntax:** `BLOOM_FILTER_TEST(expr, )` * **Function type:** Scalar, other +
Example + +The following example returns `true` when a device type, `agent_category`, exists in the Bloom filter representing the set of devices used in Albania: + +```sql +SELECT agent_category, +BLOOM_FILTER_TEST("agent_category", 'BAAAAAgAAACAAEAAAAAAAAAAAEIAAAAAAAAAAAAAAAAAAAAAAAIIAAAAAAAAAAAAAAAAAAIAAAAAAQAAAAAAAAAAAAAA') AS bloom_test +FROM "kttm" +GROUP BY 1 +``` + +Returns the following: + +| `agent_category` | `bloom_test` | +| --- | --- | +| `empty` | `false` | +| `Game console` | `false` | +| `Personal computer` | `true` | +| `Smart TV` | `false` | +| `Smartphone` | `true` | +| `Tablet` | `false` | + +
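
A Bloom filter test can report a value as present when it is not (a false positive), but it never reports an added value as absent. The following Python sketch illustrates that property with a toy filter. `ToyBloomFilter`, its bit size, and its hashing scheme are assumptions made for illustration; they do not reflect Druid's implementation or its Base64 serialization format.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; not Druid's implementation or serialization."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, value: str):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


albania = ToyBloomFilter()
for device in ["Personal computer", "Smartphone"]:
    albania.add(device)

print(albania.might_contain("Smartphone"))  # always True for added values
print(albania.might_contain("Tablet"))      # usually False, but may be a false positive
```

A larger bit array relative to the number of distinct values lowers the false positive rate, which is the trade-off that the `numEntries` argument of `BLOOM_FILTER` controls.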
+ [Learn more](sql-scalar.md#other-scalar-functions) + ## BTRIM Trims characters from both the leading and trailing ends of an expression. Defaults `chars` to a space if none is provided. @@ -1787,39 +1834,63 @@ Returns the following: ## DECODE_BASE64_COMPLEX -Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. +Decodes a Base64-encoded expression into a complex data type. + +You can use the function to ingest data when a column contains an encoded data sketch such as Theta or HLL. -* **Syntax**: `DECODE_BASE64_COMPLEX(dataType, expr)` +The function supports `hyperUnique` and `serializablePairLongString` data types by default. +To enable support for a complex data type, load the [corresponding extension](../configuration/extensions.md): + +- `druid-bloom-filter`: `bloom` +- `druid-datasketches`: `arrayOfDoublesSketch`, `HLLSketch`, `KllDoublesSketch`, `KllFloatsSketch`, `quantilesDoublesSketch`, `thetaSketch` +- `druid-histogram`: `approximateHistogram`, `fixedBucketsHistogram` +- `druid-stats`: `variance` +- `druid-compressed-bigdecimal`: `compressedBigDecimal` +- `druid-momentsketch`: `momentSketch` +- `druid-tdigestsketch`: `tDigestSketch` + +* **Syntax:** `DECODE_BASE64_COMPLEX(dataType, expr)` * **Function type:** Scalar, other -[Learn more](sql-scalar.md#other-scalar-functions) +
Example + +The following example returns a Theta sketch complex type from a Base64-encoded string representation of the sketch: + +```sql +SELECT DECODE_BASE64_COMPLEX('thetaSketch','AgMDAAAazJNBAAAAAACAP+k/tkWGkSoFYWMAG0y+3gVabvKcIUNrBv0jAkGsw7sK5szX1k0ScwtMfCQmFP/rDhFK6yU7PPkObZ/Ugw5fcBQZ+GaO+Nt6FP+Whz6TmxkWyRJ+gaQLFhcts1+c0Q/vF9FLFfaVlOkb3/XpXaZ3JhyZ2dG8Di2/HO10sMs9C0AdM4FdHuye6SB+GYinIhTOITOHzB5SAfIiph3de9qIGSM89V+s/TkdI/WZVzK9wF0npfi4ZrmgBSnVjphCtQA5K2fp0x59UCwvMopZarsSkzEo81OIxjznNNXLr1BbQBo1Ei3OxJOoNzVs0x9xzsm4NfgAZSvZQvI1c2TmPsZvlzpW7tmIlizOOsr6pGWoh0U99/tV8RFwhz0SJoWyU1Z2P0hZ5d7KRnZBjlWC+e/FLEKrWsu14rlFRXhsOuxRId9FboEuH9PqMUixI2lB8MhLS803hJDoZ7tMy7Egl+YNU04QM11stXX4Tu96NHHcGiZRuCyciGiTGVQflMLmNt6lW6zIwJy0baNdbwjMCTjtUF7oZOtugWLYYJE9sJU3HuVijc0J10l6SmPslbfY6Fw0Za9w/Zdhn/5nIuKc1WMrYWnAJQJKXY73bHYWq7gI6dRvYdC2fLJyv3F8qwQcOJgFc0GaGXw8KRF3w3IVCwxsMntWhdTkaJ88e++5NFyM1Hd/D79wg0b9vH8=') AS "theta_sketch" +``` + +You can perform Theta sketch operations on the resulting `COMPLEX` value which resembles the input string. + +
+ +[Learn more](./sql-scalar.md#other-scalar-functions) ## DECODE_BASE64_UTF8 -Decodes a Base64-encoded string into a UTF-8 encoded string. +Decodes a Base64-encoded expression into a UTF-8 encoded string. * **Syntax:** `DECODE_BASE64_UTF8(expr)` * **Function type:** Scalar, string
Example -The following example converts the base64 encoded string `SGVsbG8gV29ybGQhCg==` into an UTF-8 encoded string. +The following example decodes the Base64-encoded representation of "Hello, World!": ```sql SELECT - 'SGVsbG8gV29ybGQhCg==' AS "base64_encoding", - DECODE_BASE64_UTF8('SGVsbG8gV29ybGQhCg==') AS "convert_to_UTF8_encoding" + DECODE_BASE64_UTF8('SGVsbG8sIFdvcmxkIQ==') as decoded ``` Returns the following: -| `base64_encoding` | `convert_to_UTF8_encoding` | -| -- | -- | -| `SGVsbG8gV29ybGQhCg==` | `Hello World!` | +| `decoded` | +| -- | +| `Hello, World!` |
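
To check the expected result independently of Druid, you can decode the same input with any standard Base64 library. The following is a minimal Python sketch using the standard `base64` module; the variable names are illustrative.

```python
import base64

encoded = "SGVsbG8sIFdvcmxkIQ=="

# Decode the Base64 text to bytes, then interpret the bytes as UTF-8.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```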
-[Learn more](sql-scalar.md#string-functions) +[Learn more](./sql-scalar.md#string-functions) ## DEGREES