Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] Batch07: adding examples to string functions #16862

Merged
merged 13 commits into from
Aug 21, 2024
246 changes: 218 additions & 28 deletions docs/querying/sql-functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1085,19 +1085,63 @@ Looks up the expression in a registered query-time lookup table.

## LOWER

`LOWER(expr)`
Returns the expression in lowercase.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `LOWER(expr)`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example converts the `OriginCityName` column from the `flight-carriers` datasource to lowercase.

```sql
SELECT
"OriginCityName" AS "origin_city",
LOWER("OriginCityName") AS "lowercase"
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_city` | `lowercase` |
| -- | -- |
`San Juan, PR` | `san juan, pr` |

</details>

[Learn more](sql-scalar.md#string-functions)

Returns the expression in lowercase.

## LPAD

`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, left pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `LPAD(expr, length[, chars])`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example left pads the value of `OriginState` from `flight-carriers` to return a total of 5 characters.

```sql
SELECT
"OriginState" AS "origin_state",
LPAD("OriginState", 5, '+') AS "add_left_padding"
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_state` | `add_left_padding` |
| -- | -- |
| `PR` | `+++PR` |

</details>

[Learn more](sql-scalar.md#string-functions)

Returns the leftmost number of characters from an expression, optionally padded with the given characters.

## LTRIM

Expand Down Expand Up @@ -1293,11 +1337,31 @@ Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON val

## PARSE_LONG

`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
Converts a string into a long(BIGINT) with the given radix, or into DECIMAL(base 10) if a radix is not provided.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:**`PARSE_LONG(string[, radix])`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example converts the string representation of the binary, radix 2, number `1100` into its long (BIGINT) equivalent.

```sql
SELECT
'1100' AS "binary_as_string",
PARSE_LONG('1110', 2) AS "bigint_value"
```

Returns the following:

| `binary_as_string` | `bigint_value` |
| -- | -- |
| `1100` | `14` |

</details>

[Learn more](sql-scalar.md#string-functions)

Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified.

## PERCENT_RANK

Expand All @@ -1309,11 +1373,33 @@ Returns the relative rank of the row calculated as a percentage according to the

## POSITION

`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. If `substring` is not found, returns 0.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])`
* **Function type:** Scalar, string

<details><summary>Example</summary>

Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index.
The following example returns the one-based index of the substring `PR` in the `OriginCityName` column from the `flight-carriers` datasource starting from index 5.

```sql
SELECT
"OriginCityName" AS "origin_city",
POSITION('PR' IN "OriginCityName" FROM 5) AS "index"
FROM "flight-carriers"
LIMIT 2
```

Returns the following:

| `origin_city` | `index` |
| -- | -- |
| `San Juan, PR` | `11` |
| `Boston, MA` | `0` |

</details>

[Learn more](sql-scalar.md#string-functions)

## POWER

Expand Down Expand Up @@ -1341,28 +1427,90 @@ Returns the rank with gaps for a row within a window. For example, if two rows t

## REGEXP_EXTRACT

`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
Apply regular expression `pattern` to `expr` and extract the `N`-th capture group. If `N` is unspecified or zero, returns the first substring that matches the pattern. Returns `null` if there is no matching pattern.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example uses regular expressions to find city names inside the `OriginCityName` column from the `flight-carriers` datasource by matching what comes before the comma.

```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_match"
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_city` | `pattern_match` |
| -- | -- |
| `San Juan, PR` | `San Juan`|

</details>

Applies a regular expression to the string expression and returns the _n_th match.
[Learn more](sql-scalar.md#string-functions)

## REGEXP_LIKE

`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
Returns `true` if the regular expression `pattern` finds a match in `expr`. Returns `false` otherwise.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_LIKE(expr, pattern)`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example returns `true` when the `OriginCityName` column from `flight-carriers` has a city name with a space in the name.
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found"
FROM "flight-carriers"
LIMIT 2
```

Returns true or false signifying whether the regular expression finds a match in the string expression.
Returns the following:

| `origin_city` | `pattern_found` |
| -- | -- |
| `San Juan, PR` | `true` |
| `Boston, MA` | `false` |

</details>

[Learn more](sql-scalar.md#string-functions)

## REGEXP_REPLACE

`REGEXP_REPLACE(<CHARACTER>, <CHARACTER>, <CHARACTER>)`
Replaces all occurrences of a regular expression in a string expression with a replacement string. The replacement string may refer to capture groups using `$1`, `$2`, etc.
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example matches instances of the word `Fort` and replaces it with it's abbreviation `Ft.`

```sql
SELECT
'Fort Lauderdale, FL' AS "original_string",
REGEXP_REPLACE('Fort Lauderdale, FL', 'Fort', 'Ft.') AS "modified_string"
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved
```

Returns the following:

| `original_string` | `modified_string` |
| -- | -- |
| `Fort Lauderdale, FL` | `Ft. Lauderdale, FL` |

</details>

[Learn more](sql-scalar.md#string-functions)

Replaces all occurrences of a regular expression in a string expression with a replacement string. The replacement
string may refer to capture groups using `$1`, `$2`, etc.

## REPEAT

Expand Down Expand Up @@ -1414,11 +1562,32 @@ Returns the number of the row within the window starting from 1.

## RPAD

`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, right pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `RPAD(expr, length[, chars])`
* **Function type:** Scalar, string

<details><summary>Example</summary>

The following example inserts `+` characters as padding to the right of the `OriginState` column from the `flight-carriers` datasource, making the return expression a length of `5` characters.
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT
"OriginState" AS "origin_state",
RPAD("OriginState", 5, '+') AS "add_right_padding"
edgar2020 marked this conversation as resolved.
Show resolved Hide resolved
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_state` | `add_right_padding` |
| -- | -- |
| `PR` | `PR+++` |

</details>

Returns the rightmost number of characters from an expression, optionally padded with the given characters.
[Learn more](sql-scalar.md#string-functions)

## RTRIM

Expand Down Expand Up @@ -1753,11 +1922,32 @@ For more information, see [UNNEST](./sql.md#unnest).

## UPPER

`UPPER(expr)`
Returns the expression in uppercase.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `UPPER(expr)`
* **Function type:** Scalar, string

Returns the expression in uppercase.
<details><summary>Example</summary>

The following example converts the `OriginCityName` column from the `flight-carriers` datasource to uppercase.

```sql
SELECT
"OriginCityName" AS "origin_city",
UPPER("OriginCityName") AS "uppercase"
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_city` | `uppercase` |
| -- | -- |
`San Juan, PR` | `SAN JUAN, PR` |

</details>

[Learn more](sql-scalar.md#string-functions)

## VAR_POP

Expand Down
6 changes: 3 additions & 3 deletions docs/querying/sql-scalar.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,11 +105,11 @@ String functions accept strings, and return a type appropriate to the function.
|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a registered [query-time lookup table](lookups.md) named `lookupName`. The optional constant `replaceMissingValueWith`, if provided, is returned when the `expr` is null or when the lookup does not contain a value for `expr`.<br /><br />Lookups can also be queried directly using the [`lookup` schema](sql.md#from).|
|`LOWER(expr)`|Returns `expr` in all lowercase.|
|`UPPER(expr)`|Returns `expr` in all uppercase.|
|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` left-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.|
|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` right-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.|
|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, left pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, right pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`PARSE_LONG(string, [radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of `needle` within `haystack`, with indexes starting from 1. The search will begin at `fromIndex`, or 1 if `fromIndex` is not specified. If `needle` is not found, returns 0.|
|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` to `expr` and extract a capture group, or `NULL` if there is no match. If index is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).|
|`REGEXP_EXTRACT(expr, pattern, [N])`|Apply regular expression `pattern` to `expr` and extract the `N`-th capture group, or `NULL` if there is no match. If `N` is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).|
|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](sql-operators.md#logical-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.|
|`REGEXP_REPLACE(expr, pattern, replacement)`|Replaces all occurrences of regular expression `pattern` within `expr` with `replacement`. The replacement string may refer to capture groups using `$1`, `$2`, etc. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.|
|`REPLACE(expr, pattern, replacement)`|Replaces pattern with replacement in `expr`, and returns the result.|
Expand Down