Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add functions for arithmetic, rounding, logarithmic, and string transformations #230

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 92 additions & 0 deletions extensions/functions_arithmetic.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,36 @@ scalar_functions:
- value: fp64
- value: fp64
return: fp64
-
name: "negate"
description: "Negation of the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh... I suppose this exists because of the uneveness of twos-compliment but ugh...

required: false
- value: i8
return: i8
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i16
return: i16
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i32
return: i32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i64
return: i64
- args:
- value: fp32
return: fp32
- args:
- value: fp64
return: fp64
-
name: "modulus"
description: "Get the remainder when dividing one value by another."
Expand All @@ -173,6 +203,68 @@ scalar_functions:
- value: i64
- value: i64
return: i64
-
name: "power"
description: "Take the power with the first value as the base and second as exponent"
impls:
- args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did you come up with these signatures from? For example, why does i8^i8 return i8? It seems like we'll run out of space very quickly. I feel like many systems support something like power(i64, fp64) => fp64 and power(fp64,fp64) => fp64 and that is enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @sanjibansg was basing these on Arrow's implementation. Arrow's implementation currently does not promote:

>>> import pyarrow.compute as pc
>>> import pyarrow as pa
>>> x = pa.array([1, 2, 3])
>>> pc.power(x, 320)
<pyarrow.lib.Int64Array object at 0x7f442c7fda00>
[
  1,
  0,
  -9149805402889408255
]

Postgres and MySQL promote integers to decimal. SQL Server does not promote (and raises an overflow error)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacques-n do you have a strong preference which direction we go here? Should we have multiple power functions?

- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i8
- value: i8
return: i8
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i16
- value: i16
return: i16
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i32
- value: i32
return: i32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i64
- value: i64
return: i64
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
- value: fp64
return: fp64
-
name: "sqrt"
description: "Square root of the value"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the value is negative?

impls:
- args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason that smaller numbers use fp32 as an output? It seems like maybe they should all be fp64. Are using a particular system's definition here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Arrow also upcasts all to fp64.

- value: i8
return: fp64
- args:
- value: i16
return: fp64
- args:
- value: i32
return: fp64
- args:
- value: i64
return: fp64
- args:
- value: fp32
return: fp32
- args:
- value: fp64
return: fp64
aggregate_functions:
- name: "sum"
description: Sum a set of values.
Expand Down
10 changes: 10 additions & 0 deletions extensions/functions_comparison.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -73,4 +73,14 @@ scalar_functions:
- value: any1
return: BOOLEAN
nullability: DECLARED_OUTPUT
-
name: "is_nan"
description: Whether a value is not a number.
impls:
- args:
- value: fp32
return: BOOLEAN
- args:
- value: fp64
return: BOOLEAN

63 changes: 63 additions & 0 deletions extensions/functions_logarithmic.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
%YAML 1.2
---
scalar_functions:
-
name: "ln"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to document what these options apply to (and their meaning) somewhere. In arrow, arithmetic functions overflow by default. However, logrithmic functions go to -inf or NaN. I'm not sure if this latter behavior is "saturate" or "silent".

description: "Natural logarithm of the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
return: fp64
-
name: "log10"
description: "Logarithm to base 10 of the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
return: fp64
-
name: "log2"
description: "Logarithm to base 2 of the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
return: fp64
-
name: "logb"
description: "Logarithm of the value with the given base"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
- value: fp64
return: fp64


64 changes: 64 additions & 0 deletions extensions/functions_rounding.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
%YAML 1.2
---
scalar_functions:
-
name: "ceil"
description: "Rounding to the ceiling of the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
return: i64
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
return: i64
-
name: "floor"
description: "Rounding to the floor of the value"
impls:
- args:
required: false
- value: fp32
return: i32
- args:
required: false
- value: fp64
return: i64
-
name: "round"
description: "Rounding off the value"
impls:
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i8
return: i8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, what does this mean?

- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i16
return: i16
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i32
return: i32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: i64
return: i64
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp32
return: fp32
- args:
- options: [ SILENT, SATURATE, ERROR ]
required: false
- value: fp64
return: fp64

29 changes: 28 additions & 1 deletion extensions/functions_string.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,8 @@ scalar_functions:
- value: i32
- value: i32
return: "string"
- name: starts_with
-
name: starts_with
description: Whether this string starts with another string.
impls:
- args:
Expand Down Expand Up @@ -163,3 +164,29 @@ scalar_functions:
- value: "fixedchar<L1>"
- value: "varchar<L2>"
return: "BOOLEAN"
-
name: lower
description: Transforms the string to lower case characters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some additional definition here of how lower case is defined. I assume there is a utf definition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrow defers to utf8proc which is somewhat sparse on details. However, you are correct, there is a standard UTF-8 way of lowercasing, though it sometimes does the wrong thing semantically.

impls:
- args:
- value: "varchar<L1>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add fixedchar?

return: "varchar<L1>"
- args:
- value: "string"
return: "string"
- args:
- value: "fixedchar<L1>"
return: "fixedchar<L1>"
-
name: upper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as on lower.

description: Transforms the string to upper case characters
impls:
- args:
- value: "varchar<L1>"
return: "varchar<L1>"
- args:
- value: "string"
return: "string"
- args:
- value: "fixedchar<L1>"
return: "fixedchar<L1>"