feat: include precision parameter in timestamp types (#594)
This PR introduces new `PrecisionTimestamp` and `PrecisionTimestampTZ` types that accept an
optional precision parameter to specify fractional second precision.


Closes #592

---------

Co-authored-by: Weston Pace <[email protected]>
richtia and westonpace authored Feb 22, 2024
1 parent a3b1f32 commit 087f87c
Showing 7 changed files with 155 additions and 50 deletions.
49 changes: 47 additions & 2 deletions extensions/functions_datetime.yaml
@@ -9,7 +9,7 @@ scalar_functions:
* ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of
its days in January.
* US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more)
of its days in January. Last week of US epidemiological year has the year's last Wednesday in it. US
of its days in January. Last week of US epidemiological year has the year's last Wednesday in it. US
epidemiological week starts on Sunday.
* QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter,
April 1 through June 30 map to the second quarter, etc.
@@ -32,6 +32,7 @@ scalar_functions:
* SECOND Return the second (0-59).
* MILLISECOND Return number of milliseconds since the last full second.
* MICROSECOND Return number of microseconds since the last full millisecond.
* NANOSECOND Return number of nanoseconds since the last full microsecond.
* SUBSECOND Return number of microseconds since the last full second of the given timestamp.
* UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
* TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC.
@@ -57,7 +58,7 @@ scalar_functions:
* MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52
The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR,
MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The
MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The
indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET.
@@ -76,6 +77,17 @@ scalar_functions:
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, NANOSECOND, SUBSECOND, UNIX_TIME, TIMEZONE_OFFSET ]
description: The part of the value to extract.
- name: x
value: precision_timestamp_tz<P1>
- name: timezone
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
@@ -84,6 +96,14 @@ scalar_functions:
- name: x
value: timestamp
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, NANOSECOND, SUBSECOND, UNIX_TIME ]
description: The part of the value to extract.
- name: x
value: precision_timestamp<P1>
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, UNIX_TIME ]
@@ -112,6 +132,20 @@ scalar_functions:
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, US_WEEK ]
description: The part of the value to extract.
- name: indexing
options: [ ONE, ZERO ]
description: Start counting from 1 or 0.
- name: x
value: precision_timestamp_tz<P1>
- name: timezone
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
@@ -123,6 +157,17 @@ scalar_functions:
- name: x
value: timestamp
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, US_WEEK ]
description: The part of the value to extract.
- name: indexing
options: [ ONE, ZERO ]
description: Start counting from 1 or 0.
- name: x
value: precision_timestamp<P1>
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
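
The sub-second components described in this file nest inside one another: MILLISECOND and SUBSECOND count from the last full second, MICROSECOND from the last full millisecond, and the new NANOSECOND from the last full microsecond. A minimal Python sketch of that arithmetic follows, assuming the input is a count of nanoseconds since the UNIX epoch. The function name `subsecond_components` is hypothetical and is not part of Substrait.

```python
NANOS_PER_SECOND = 1_000_000_000


def subsecond_components(nanos_since_epoch: int) -> dict:
    """Split a nanosecond timestamp into the sub-second components
    described in functions_datetime.yaml (illustrative helper only)."""
    # Nanoseconds elapsed since the last full second.
    nanos_in_second = nanos_since_epoch % NANOS_PER_SECOND
    return {
        # Milliseconds since the last full second.
        "MILLISECOND": nanos_in_second // 1_000_000,
        # Microseconds since the last full millisecond.
        "MICROSECOND": (nanos_in_second // 1_000) % 1_000,
        # Nanoseconds since the last full microsecond.
        "NANOSECOND": nanos_in_second % 1_000,
        # Microseconds since the last full second.
        "SUBSECOND": nanos_in_second // 1_000,
    }


parts = subsecond_components(1_700_000_000_123_456_789)
print(parts)
```

For a fractional part of 0.123456789 seconds, the four values are 123, 456, 789 and 123456 respectively.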
10 changes: 8 additions & 2 deletions proto/substrait/algebra.proto
@@ -796,7 +796,8 @@ message Expression {
string string = 12;
bytes binary = 13;
// Timestamp in units of microseconds since the UNIX epoch.
int64 timestamp = 14;
// Deprecated in favor of `precision_timestamp`
int64 timestamp = 14 [deprecated = true];
// Date in units of days since the UNIX epoch.
int32 date = 16;
// Time in units of microseconds past midnight
@@ -807,10 +808,15 @@ message Expression {
VarChar var_char = 22;
bytes fixed_binary = 23;
Decimal decimal = 24;
// If the precision is 6 or less then this is the microseconds since the UNIX epoch
// If the precision is more than 6 then this is the nanoseconds since the UNIX epoch
uint64 precision_timestamp = 34;
uint64 precision_timestamp_tz = 35;
Struct struct = 25;
Map map = 26;
// Timestamp in units of microseconds since the UNIX epoch.
int64 timestamp_tz = 27;
// Deprecated in favor of `precision_timestamp_tz`
int64 timestamp_tz = 27 [deprecated = true];
bytes uuid = 28;
Type null = 29; // a typed null literal
List list = 30;
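
The comment on the new `precision_timestamp` literal makes the unit of the stored integer depend on the precision: microseconds for a precision of 6 or less, nanoseconds above that. A minimal Python sketch of that rule is given here as an illustration only. The helper name `encode_precision_timestamp` and its `(seconds, nanos)` input are hypothetical, and the sketch does not round the value to the declared precision, since the comment does not say whether it should.

```python
def encode_precision_timestamp(seconds: int, nanos: int, precision: int) -> int:
    """Return the integer a precision_timestamp literal would carry under
    the rule stated in algebra.proto in this commit (illustrative only).

    seconds:   whole seconds since the UNIX epoch
    nanos:     additional nanoseconds within that second (0..999_999_999)
    precision: fractional-second digits of the type
    """
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 1e9)")
    total_nanos = seconds * 1_000_000_000 + nanos
    if precision <= 6:
        # Precision 6 or less: value is microseconds since the epoch.
        return total_nanos // 1_000
    # Precision above 6: value is nanoseconds since the epoch.
    return total_nanos


print(encode_precision_timestamp(1, 500, 6))
print(encode_precision_timestamp(1, 500, 9))
```

The same instant therefore produces different integers depending on the precision of the type, so a reader of the literal needs the type's precision to interpret the value.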
20 changes: 18 additions & 2 deletions proto/substrait/parameterized_types.proto
@@ -21,18 +21,22 @@ message ParameterizedType {
Type.FP64 fp64 = 11;
Type.String string = 12;
Type.Binary binary = 13;
Type.Timestamp timestamp = 14;
// Deprecated in favor of `ParameterizedPrecisionTimestamp precision_timestamp`
Type.Timestamp timestamp = 14 [deprecated = true];
Type.Date date = 16;
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `ParameterizedPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;

ParameterizedFixedChar fixed_char = 21;
ParameterizedVarChar varchar = 22;
ParameterizedFixedBinary fixed_binary = 23;
ParameterizedDecimal decimal = 24;
ParameterizedPrecisionTimestamp precision_timestamp = 34;
ParameterizedPrecisionTimestampTZ precision_timestamp_tz = 35;

ParameterizedStruct struct = 25;
ParameterizedList list = 27;
@@ -88,6 +92,18 @@ message ParameterizedType {
Type.Nullability nullability = 4;
}

message ParameterizedPrecisionTimestamp {
IntegerOption precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ParameterizedPrecisionTimestampTZ {
IntegerOption precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ParameterizedStruct {
repeated ParameterizedType types = 1;
uint32 variation_pointer = 2;
22 changes: 20 additions & 2 deletions proto/substrait/type.proto
@@ -21,18 +21,22 @@ message Type {
FP64 fp64 = 11;
String string = 12;
Binary binary = 13;
Timestamp timestamp = 14;
// Deprecated in favor of `PrecisionTimestamp precision_timestamp`
Timestamp timestamp = 14 [deprecated = true];
Date date = 16;
Time time = 17;
IntervalYear interval_year = 19;
IntervalDay interval_day = 20;
TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `PrecisionTimestampTZ precision_timestamp_tz`
TimestampTZ timestamp_tz = 29 [deprecated = true];
UUID uuid = 32;

FixedChar fixed_char = 21;
VarChar varchar = 22;
FixedBinary fixed_binary = 23;
Decimal decimal = 24;
PrecisionTimestamp precision_timestamp = 33;
PrecisionTimestampTZ precision_timestamp_tz = 34;

Struct struct = 25;
List list = 27;
@@ -159,6 +163,20 @@ message Type {
Nullability nullability = 4;
}

message PrecisionTimestamp {
// Defaults to 6
int32 precision = 1;
uint32 type_variation_reference = 2;
Nullability nullability = 3;
}

message PrecisionTimestampTZ {
// Defaults to 6
int32 precision = 1;
uint32 type_variation_reference = 2;
Nullability nullability = 3;
}

message Struct {
repeated Type types = 1;
uint32 type_variation_reference = 2;
20 changes: 18 additions & 2 deletions proto/substrait/type_expressions.proto
@@ -21,18 +21,22 @@ message DerivationExpression {
Type.FP64 fp64 = 11;
Type.String string = 12;
Type.Binary binary = 13;
Type.Timestamp timestamp = 14;
// Deprecated in favor of `ExpressionPrecisionTimestamp precision_timestamp`
Type.Timestamp timestamp = 14 [deprecated = true];
Type.Date date = 16;
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `ExpressionPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;

ExpressionFixedChar fixed_char = 21;
ExpressionVarChar varchar = 22;
ExpressionFixedBinary fixed_binary = 23;
ExpressionDecimal decimal = 24;
ExpressionPrecisionTimestamp precision_timestamp = 40;
ExpressionPrecisionTimestampTZ precision_timestamp_tz = 41;

ExpressionStruct struct = 25;
ExpressionList list = 27;
@@ -80,6 +84,18 @@ message DerivationExpression {
Type.Nullability nullability = 4;
}

message ExpressionPrecisionTimestamp {
DerivationExpression precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ExpressionPrecisionTimestampTZ {
DerivationExpression precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ExpressionStruct {
repeated DerivationExpression types = 1;
uint32 variation_pointer = 2;
58 changes: 30 additions & 28 deletions site/docs/extensions/index.md
@@ -58,34 +58,36 @@ Rather than using a full data type representation, the input argument types (`sh

Every compound function signature must be unique. If two function implementations in a YAML file would generate the same compound function signature, then the YAML file is invalid and behavior is undefined.

| Argument Type | Signature Name |
| -------------------------- | -------------- |
| Required Enumeration | req |
| i8 | i8 |
| i16 | i16 |
| i32 | i32 |
| i64 | i64 |
| fp32 | fp32 |
| fp64 | fp64 |
| string | str |
| binary | vbin |
| boolean | bool |
| timestamp | ts |
| timestamp_tz | tstz |
| date | date |
| time | time |
| interval_year | iyear |
| interval_day | iday |
| uuid | uuid |
| fixedchar&lt;N&gt; | fchar |
| varchar&lt;N&gt; | vchar |
| fixedbinary&lt;N&gt; | fbin |
| decimal&lt;P,S&gt; | dec |
| struct&lt;T1,T2,...,TN&gt; | struct |
| list&lt;T&gt; | list |
| map&lt;K,V&gt; | map |
| any[\d]? | any |
| user defined type | u!name |
| Argument Type | Signature Name |
|---------------------------------|----------------|
| Required Enumeration | req |
| i8 | i8 |
| i16 | i16 |
| i32 | i32 |
| i64 | i64 |
| fp32 | fp32 |
| fp64 | fp64 |
| string | str |
| binary | vbin |
| boolean | bool |
| timestamp | ts |
| timestamp_tz | tstz |
| date | date |
| time | time |
| interval_year | iyear |
| interval_day | iday |
| uuid | uuid |
| fixedchar&lt;N&gt; | fchar |
| varchar&lt;N&gt; | vchar |
| fixedbinary&lt;N&gt; | fbin |
| decimal&lt;P,S&gt; | dec |
| precision_timestamp&lt;P&gt; | pts |
| precision_timestamp_tz&lt;P&gt; | ptstz |
| struct&lt;T1,T2,...,TN&gt; | struct |
| list&lt;T&gt; | list |
| map&lt;K,V&gt; | map |
| any[\d]? | any |
| user defined type | u!name |
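
The table above can be read as a lookup from argument type to short signature name, with type parameters such as `<P>` dropped. The Python sketch below applies it to the new `extract` overloads. It is a rough illustration, not Substrait's own implementation: the helper names are hypothetical, only a subset of the table is included, user defined types are not handled, and the `name:arg_arg` layout of the compound signature is an assumption that is not spelled out in the lines shown here.

```python
import re

# Subset of the table above: argument type -> signature name.
SIGNATURE_NAMES = {
    "required enumeration": "req",
    "i64": "i64",
    "string": "str",
    "timestamp": "ts",
    "timestamp_tz": "tstz",
    "decimal": "dec",
    "precision_timestamp": "pts",
    "precision_timestamp_tz": "ptstz",
}


def short_name(arg_type: str) -> str:
    """Map one argument type to its signature name, ignoring parameters."""
    # Drop a trailing parameter list such as <P1> or <P,S>.
    base = re.sub(r"<.*>$", "", arg_type.strip()).lower()
    # any, any1, any2, ... all collapse to "any".
    if re.fullmatch(r"any\d?", base):
        return "any"
    return SIGNATURE_NAMES[base]


def compound_signature(function_name: str, arg_types: list) -> str:
    """Build a compound signature, assuming the layout name:arg_arg."""
    return function_name + ":" + "_".join(short_name(t) for t in arg_types)


print(compound_signature("extract", ["required enumeration", "precision_timestamp<P1>"]))
print(compound_signature(
    "extract", ["required enumeration", "precision_timestamp_tz<P1>", "string"]
))
```

Because the precision parameter is dropped, overloads that differ only in precision share one compound signature, which is why the YAML declares a single `precision_timestamp<P1>` variant per argument shape.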

#### Examples
