add IntervalMonthToNano literal and IntervalMonth type
Blizzara committed Jul 16, 2024
1 parent a68c1ac commit 421012d
Showing 6 changed files with 38 additions and 18 deletions.
7 changes: 7 additions & 0 deletions proto/substrait/algebra.proto
@@ -810,6 +810,7 @@ message Expression {
int64 time = 17;
IntervalYearToMonth interval_year_to_month = 19;
IntervalDayToSecond interval_day_to_second = 20;
IntervalMonthToNano interval_month_to_nano = 36;
string fixed_char = 21;
VarChar var_char = 22;
bytes fixed_binary = 23;
@@ -884,6 +885,12 @@ message Expression {
int32 microseconds = 3;
}

message IntervalMonthToNano {
int32 months = 1;
int32 days = 2;
int64 nanoseconds = 3;
}

message Struct {
// A possibly heterogeneously typed list of literals
repeated Literal fields = 1;
1 change: 1 addition & 0 deletions proto/substrait/parameterized_types.proto
@@ -27,6 +27,7 @@ message ParameterizedType {
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.IntervalMonth interval_month = 36;
// Deprecated in favor of `ParameterizedPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;
9 changes: 9 additions & 0 deletions proto/substrait/type.proto
@@ -27,6 +27,7 @@ message Type {
Time time = 17;
IntervalYear interval_year = 19;
IntervalDay interval_day = 20;
IntervalMonth interval_month = 35;
// Deprecated in favor of `PrecisionTimestampTZ precision_timestamp_tz`
TimestampTZ timestamp_tz = 29 [deprecated = true];
UUID uuid = 32;
@@ -122,16 +123,24 @@ message Type {
Nullability nullability = 2;
}

// An interval consisting of years and months
message IntervalYear {
uint32 type_variation_reference = 1;
Nullability nullability = 2;
}

// An interval consisting of days, seconds, and microseconds
message IntervalDay {
uint32 type_variation_reference = 1;
Nullability nullability = 2;
}

// An interval consisting of months, days, and nanoseconds
message IntervalMonth {
uint32 type_variation_reference = 1;
Nullability nullability = 2;
}

message UUID {
uint32 type_variation_reference = 1;
Nullability nullability = 2;
1 change: 1 addition & 0 deletions proto/substrait/type_expressions.proto
@@ -27,6 +27,7 @@ message DerivationExpression {
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.IntervalMonth interval_month = 42;
// Deprecated in favor of `ExpressionPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;
1 change: 1 addition & 0 deletions site/docs/extensions/index.md
@@ -76,6 +76,7 @@ Rather than using a full data type representation, the input argument types (`sh
| time | time |
| interval_year | iyear |
| interval_day | iday |
| interval_month | imonth |
| uuid | uuid |
| fixedchar<N> | fchar |
| varchar<N> | vchar |
37 changes: 19 additions & 18 deletions site/docs/types/type_classes.md
@@ -8,24 +8,25 @@ Implementations of a Substrait type must support *at least* this set of values,

Simple type classes are those that don't support any form of configuration. For simplicity, any generic type that has only a small number of discrete implementations is declared directly, as opposed to via configuration.

| Type Name | Description | Protobuf representation for literals
| --------------- | ------------------------------------------------------------ | ------------------------------------------------
| boolean | A value that is either True or False. | `bool`
| i8 | A signed integer within [-128..127], typically represented as an 8-bit two's complement number. | `int32`
| i16 | A signed integer within [-32,768..32,767], typically represented as a 16-bit two's complement number. | `int32`
| i32 | A signed integer within [-2147483648..2,147,483,647], typically represented as a 32-bit two's complement number. | `int32`
| i64 | A signed integer within [−9,223,372,036,854,775,808..9,223,372,036,854,775,807], typically represented as a 64-bit two's complement number. | `int64`
| fp32 | A 4-byte single-precision floating point number with the same range and precision as defined for the [IEEE 754 32-bit floating-point format](https://standards.ieee.org/ieee/754/6210/). | `float`
| fp64 | An 8-byte double-precision floating point number with the same range and precision as defined for the [IEEE 754 64-bit floating-point format](https://standards.ieee.org/ieee/754/6210/). | `double`
| string | A unicode string of text, [0..2,147,483,647] UTF-8 bytes in length. | `string`
| binary | A binary value, [0..2,147,483,647] bytes in length. | `binary`
| timestamp | A naive timestamp with microsecond precision. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` microseconds since 1970-01-01 00:00:00.000000 (in an unspecified timezone)
| timestamp_tz | A timezone-aware timestamp with microsecond precision. Similar to aware datetime in Python. | `int64` microseconds since 1970-01-01 00:00:00.000000 UTC
| date | A date within [1000-01-01..9999-12-31]. | `int32` days since `1970-01-01`
| time | A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. | `int64` microseconds past midnight
| interval_year | Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. `1y 0m` is considered equal to `0y 12m` or `1001y -12000m`. | `int32` years and `int32` months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `-10000y 200000m` is **not** allowed)
| interval_day | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int32` microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed)
| uuid | A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: `c48ffa9e-64f4-44cb-ae47-152b4e60e77b`. Any 128-bit value is allowed, without specific adherence to RFC4122. | 16-byte `binary`
| Type Name | Description | Protobuf representation for literals
|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------
| boolean | A value that is either True or False. | `bool`
| i8 | A signed integer within [-128..127], typically represented as an 8-bit two's complement number. | `int32`
| i16 | A signed integer within [-32,768..32,767], typically represented as a 16-bit two's complement number. | `int32`
| i32 | A signed integer within [-2147483648..2,147,483,647], typically represented as a 32-bit two's complement number. | `int32`
| i64 | A signed integer within [−9,223,372,036,854,775,808..9,223,372,036,854,775,807], typically represented as a 64-bit two's complement number. | `int64`
| fp32 | A 4-byte single-precision floating point number with the same range and precision as defined for the [IEEE 754 32-bit floating-point format](https://standards.ieee.org/ieee/754/6210/). | `float`
| fp64 | An 8-byte double-precision floating point number with the same range and precision as defined for the [IEEE 754 64-bit floating-point format](https://standards.ieee.org/ieee/754/6210/). | `double`
| string | A unicode string of text, [0..2,147,483,647] UTF-8 bytes in length. | `string`
| binary | A binary value, [0..2,147,483,647] bytes in length. | `binary`
| timestamp | A naive timestamp with microsecond precision. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` microseconds since 1970-01-01 00:00:00.000000 (in an unspecified timezone)
| timestamp_tz | A timezone-aware timestamp with microsecond precision. Similar to aware datetime in Python. | `int64` microseconds since 1970-01-01 00:00:00.000000 UTC
| date | A date within [1000-01-01..9999-12-31]. | `int32` days since `1970-01-01`
| time | A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. | `int64` microseconds past midnight
| interval_year | Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. `1y 0m` is considered equal to `0y 12m` or `1001y -12000m`. | `int32` years and `int32` months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `-10000y 200000m` is **not** allowed)
| interval_day | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int32` microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed)
| interval_month | Interval of months, days and nanoseconds. Supports a range of [-120,000..120,000] months with nanosecond precision. | `int32` months, `int32` days, and `int64` nanoseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `120001m -40d 0ns` is **not** allowed)
| uuid | A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: `c48ffa9e-64f4-44cb-ae47-152b4e60e77b`. Any 128-bit value is allowed, without specific adherence to RFC4122. | 16-byte `binary`
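
The per-component constraint on `interval_month` can be illustrated with a minimal Python sketch. This is not part of the commit: the `IntervalMonthToNano` dataclass and `is_valid` helper below are hypothetical names that only mirror the three literal fields (`months`, `days`, `nanoseconds`). The month bound comes from the table row above; the day bound is an assumption borrowed from the `interval_day` row (10,000 years of 365 days); and because 10,000 years of nanoseconds does not fit in an `int64`, the sketch assumes the nanosecond component is limited only by the `int64` range.

```python
from dataclasses import dataclass

# Month bound from the interval_month row: [-120,000..120,000] months.
MAX_MONTHS = 120_000
# Assumption: day bound borrowed from the interval_day row (10,000 * 365 days).
MAX_DAYS = 3_650_000
# Assumption: nanoseconds are limited only by what an int64 field can hold.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class IntervalMonthToNano:
    """Hypothetical plain-Python mirror of the IntervalMonthToNano literal."""

    months: int = 0
    days: int = 0
    nanoseconds: int = 0


def is_valid(iv: IntervalMonthToNano) -> bool:
    """Check each component independently; components never offset each other."""
    return (
        -MAX_MONTHS <= iv.months <= MAX_MONTHS
        and -MAX_DAYS <= iv.days <= MAX_DAYS
        and INT64_MIN <= iv.nanoseconds <= INT64_MAX
    )
```

Under these assumptions, `is_valid(IntervalMonthToNano(120001, -40, 0))` is `False`, matching the `120001m -40d 0ns` example in the table: the month component is out of range on its own, and the negative day component does not compensate for it.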

## Compound Types
