Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] Should we allow options to be declared at the top level of a function definition #673

Open
jacques-n opened this issue Jul 28, 2024 · 6 comments

Comments

@jacques-n
Copy link
Contributor

The rounding blocks in this file:
https://github.com/substrait-io/substrait/blob/main/extensions/functions_rounding.yaml

Make my eyes hurt and destroy my DRY soul. How about we add support for options to be declared at the same level as function name so these patterns aren't this awful...

Thoughts @EpsilonPrime @vbarua @westonpace ?

@jacques-n
Copy link
Contributor Author

The argument descriptions feel similarly painful...

@westonpace
Copy link
Member

There are some arithmetic functions where the options do not apply to all implementations (fp addition does not specify overflow behavior for example).

I don't know if there are any implementations where the arguments differ based on the implementation.

What if argument / option descriptions were at the function level but each implementation still listed which options / arguments apply?

    name: "round"
    description: >
      Rounding the value `x` to `s` decimal places.
    args:
          - name: "x"
            description: >
              Numerical expression to be rounded.
          - name: "s"
            description: >
              Number of decimal places to be rounded to.

              When `s` is a positive number, nothing will happen
              since `x` is an integer value.

              When `s` is a negative number, the rounding is
              performed to the nearest multiple of `10^(-s)`.    
    options:
      rounding:
        description: >
          When a boundary is computed to lie somewhere between two values,
          and this value cannot be exactly represented, this specifies how
          to round it.

            - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
              to the even option.
            - TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
              halfway, tie away from zero.
            - TRUNCATE: always round toward zero.
            - CEILING: always round toward positive infinity.
            - FLOOR: always round toward negative infinity.
            - AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
            - TIE_DOWN: round ties with FLOOR rule
            - TIE_UP: round ties with CEILING rule
            - TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
            - TIE_TO_ODD: round to nearest value; if exactly halfway, tie
              to the odd option.
    impls:
      - args:
          - value: i8
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: i8?
      - args:
          - value: i16
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: i16?
      - args:
          - value: i32
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: i32?
      - args:
          - value: i64
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: i64?
      - args:
          - value: fp32
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: fp32?
      - args:
          - value: fp64
            name: "x"
          - value: i32
            name: "s"
        options:
          rounding:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: fp64?

We could also leave the option's "values" at the function level (so each function just has an array of option names)

The only programmatic tool I know of that would be impacted is the BFT and I think it already reports options / arguments at the function level so there would be some benefit to align the two models.

@richtia
Copy link
Member

richtia commented Jul 29, 2024

I'm a fan of argument/option descriptions at the function level. We have a fun of other functions where all this information is put into the descriptions that's already at the function level, and it seems like it could be organized better.

Some of the string functions for example: https://github.com/substrait-io/substrait/blob/main/extensions/functions_string.yaml#L61

@Blizzara
Copy link
Contributor

Blizzara commented Jul 30, 2024

For the sake of discussion, an alternative would be to allow multiple types in the args' value field, or generic types. Though that might require allowing to specify output type in terms of an input type. I.e. something like:

name: "round"
    description: >
      Rounding the value `x` to `s` decimal places.
    impls:
      - args:
          - value: integer # alternative 1: specify a "logical" type
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: >
              Number of decimal places to be rounded to.

              When `s` is a positive number, nothing will happen
              since `x` is an integer value.

              When `s` is a negative number, the rounding is
              performed to the nearest multiple of `10^(-s)`.
        options:
          rounding:
            description: >
              When a boundary is computed to lie somewhere between two values,
              and this value cannot be exactly represented, this specifies how
              to round it.

                - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
                  to the even option.
                - TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
                  halfway, tie away from zero.
                - TRUNCATE: always round toward zero.
                - CEILING: always round toward positive infinity.
                - FLOOR: always round toward negative infinity.
                - AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
                - TIE_DOWN: round ties with FLOOR rule
                - TIE_UP: round ties with CEILING rule
                - TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
                - TIE_TO_ODD: round to nearest value; if exactly halfway, tie
                  to the odd option.
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: arg_typeof("x")?
      - args:
          - value: # alternative 2: specify a list of types
             - fp32
             - fp64 
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: >
              Number of decimal places to be rounded to.

              When `s` is a positive number, the rounding
              is performed to a `s` number of decimal places.

              When `s` is a negative number, the rounding is
              performed to the left side of the decimal point
              as specified by `s`.
        options:
          rounding:
            description: >
              When a boundary is computed to lie somewhere between two values,
              and this value cannot be exactly represented, this specifies how
              to round it.

                - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
                  to the even option.
                - TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
                  halfway, tie away from zero.
                - TRUNCATE: always round toward zero.
                - CEILING: always round toward positive infinity.
                - FLOOR: always round toward negative infinity.
                - AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
                - TIE_DOWN: round ties with FLOOR rule
                - TIE_UP: round ties with CEILING rule
                - TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
                - TIE_TO_ODD: round to nearest value; if exactly halfway, tie
                  to the odd option.
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: arg_typeof("x")?

If multiple args specify lists of types, then all combinations of those would be considered valid. This could be useful especially if the function takes two or more arguments and they don't necessarily need to be of the same type (though maybe then the output type gets complicated). The "logical" types could be e.g. integers (i8 .. i64), numerics (i8 .. i64, fp32, fp64, decimal?), strings (string, varchar, fixedchar).

@jacques-n jacques-n changed the title [DISCUSS] Should be allow options to be declared at the top level of a function definition [DISCUSS] Should we allow options to be declared at the top level of a function definition Aug 22, 2024
@tokoko
Copy link
Contributor

tokoko commented Oct 5, 2024

I'd rather treat this as a yaml problem and solve with anchors/aliases. for example this works fine for me in python w/o any changes to the code. The new declarations field will probably have to be added to json schema for the files not to fail validation, though.

declarations:
  options: &rounding_options
    rounding:
      description: >
        When a boundary is computed to lie somewhere between two values,
        and this value cannot be exactly represented, this specifies how
        to round it.

          - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie
            to the even option.
          - TIE_AWAY_FROM_ZERO: round to nearest value; if exactly
            halfway, tie away from zero.
          - TRUNCATE: always round toward zero.
          - CEILING: always round toward positive infinity.
          - FLOOR: always round toward negative infinity.
          - AWAY_FROM_ZERO: round negative values with FLOOR rule, round positive values with CEILING rule
          - TIE_DOWN: round ties with FLOOR rule
          - TIE_UP: round ties with CEILING rule
          - TIE_TOWARDS_ZERO: round ties with TRUNCATE rule
          - TIE_TO_ODD: round to nearest value; if exactly halfway, tie
            to the odd option.
      values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR,
        AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
  round_s_description: &round_s_description >
    Number of decimal places to be rounded to.

    When `s` is a positive number, nothing will happen
    since `x` is an integer value.

    When `s` is a negative number, the rounding is
    performed to the nearest multiple of `10^(-s)`.

scalar_functions:
  -
    name: "round"
    description: >
      Rounding the value `x` to `s` decimal places.
    impls:
      - args:
          - value: i8
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: i8?
      - args:
          - value: i16
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: i16?
      - args:
          - value: i32
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: i32?
      - args:
          - value: i64
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: i64?
      - args:
          - value: fp32
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: fp32?
      - args:
          - value: fp64
            name: "x"
            description: >
              Numerical expression to be rounded.
          - value: i32
            name: "s"
            description: *round_s_description
        options:
          <<: *rounding_options
        nullability: DECLARED_OUTPUT
        return: fp64?

@jacques-n
Copy link
Contributor Author

I'd rather treat this as a yaml problem and solve with anchors/aliases

We evaluated this when we originally wrote the format. We found that anchors/aliases are not reliably implemented in many yaml parsers. I wonder if the status is any better today...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants