[CHORE] Tidy up typing of binary ops [1/2] #1114

xcharleslin · 2023-06-30T19:11:31Z

This PR tidies up typing of arithmetic binary ops.

It makes the following small-but-key improvements:

1. Centralize output type resolution in one place (`daft-core::datatype::binary_op`)

Before, both Expressions (daft-dsl::expr) and Series (daft-core::series) implemented their own logic to determine (a) are the input types valid and (b) what should the output type be. This is a nontrivial amount of conditional logic (over 100 lines) that was duplicated twice and had no guarantee of being in sync.

In addition, it made implementing new types for Series operations unwieldy, since one would have to interleave the computation implementation among the input type validation logic and the output type logic.

In this PR, input type validation and output type resolution is factored out to DataType. You can Add a DataType and DataType, and produce a Result which will tell you whether the operation is okay and what the output type will be.

Expression then simply calls DataType. Series calls DataType, and then does the computation logic after.

2. Change type resolution logic from implicit (subtractive) to explicit (additive).

Before, type resolution logic was "opt-out" - all operations would be resolved by the same type matrix, and there was a lot of logic excluding types from that behaviour. Overall, this made it difficult to reason about and modify what types were supported and what logic was being called. There is some concrete evidence of this in the code:

dead code in the supertype match rules (mostly in the logical types)
the existence of multiple allowed/disallowed behaviours that we were unaware of (e.g. being unable to add bools, types where cmp is allowed but arithmetic is not, etc.)
multiple return statements in the typing logic (14 in Expression binary ops, 13 in Series arithmetic alone), which is a bit of a control-flow maze

This PR changes the behaviour to be "opt-in". Binary ops explicitly specify the types they implement behaviour on. If behaviour is not specified, then it does not work - simple as that.

This change is mostly code rearrangement. The typing logic is separated between numeric types, remaining physical types, and logical types. Operations separately specify their behaviour across these different types using match rules instead of return statements, while still calling default implementations where appropriate.

Now it is really easy to see whether some behaviour is implemented or not, and where the correct place is to add new behaviour.

This PR implements the changes for arithmetic binary ops, which are the most complicated of the binary ops. A followup PR will implement the changes for logical and comparison ops, and remove the old supertype utility.

codecov · 2023-07-01T22:19:22Z

Codecov Report

Merging #1114 (459431b) into main (5eb5318) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1114   +/-   ##
=======================================
  Coverage   88.49%   88.49%           
=======================================
  Files          54       54           
  Lines        5476     5476           
=======================================
  Hits         4846     4846           
  Misses        630      630

jaychia · 2023-07-05T18:36:20Z

src/daft-dsl/src/expr.rs

-                        }
-                        let supertype = try_get_supertype(lhs, rhs)?;
-                        Ok(Field::new(left.to_field(schema)?.name.as_str(), supertype))
+                        let result_type = (&left_field.dtype + &right_field.dtype)?;


jaychia · 2023-07-05T18:38:22Z

src/daft-core/src/series/array_impl/binary_ops.rs

+fn physical_add(lhs: &Series, rhs: &Series, output_type: &DataType) -> DaftResult<Series> {
+    use DataType::*;
+    match output_type {
+        Utf8 => {


Nice! This is nice because we can match the intended behavior of the Series operation against the intended return type as intended by the type-system, instead of having Series do its own typing logic.

I do wonder about cases where the result dtype isn't fully expressive about the actual runtime implementation though. For example, if the result if a "date", it may not provide enough information about the implementation (is this a date plus a duration or a duration plus a date?)

Separately, I guess instead of our current unit tests, we can maybe even just add asserts in our code that will ensure that the series output dtype matches the type-system's intended dtype? Not quite sure how best to tie the resolve-time and runtime behavior together yet.

Yeah, it will still be up to the implementation to decide how to actually perform the operation. But the implementation can choose to do it however it wants since there will be no conflicting specification anywhere else.

Separately, I guess instead of our current unit tests, we can maybe even just add asserts in our code that will ensure that the series output dtype matches the type-system's intended dtype? Not quite sure how best to tie the resolve-time and runtime behavior together yet.

This can be accomplished by checking Schema vs Series datatypes at Table construction time, right?

Similar to #1114, for the remaining binary ops (logical and comp). Also cleans up the macros in SeriesLike binary_ops as well. --------- Co-authored-by: Xiayue Charles Lin <[email protected]>

Xiayue Charles Lin added 8 commits June 27, 2023 17:53

Separate out the numerical supertyping logic

e0a343e

Move arithmetic type inference from expr to dtype

c0104e1

Add backwards compatibility to new type match

d8d38c5

Impl add for SeriesLike

1283f9e

Hook in new Add impl and fix bugs

5ec2c64

New sub impl

14177f0

Move all arithmetic

2f7c124

clippy fix

b29cc90

Xiayue Charles Lin added 3 commits July 2, 2023 00:34

trait SeriesBinaryOps

d1e5628

now that series/ops/arithmetic.rs is cleaned up, macroize it

0754881

Rename to binary_ops

8600067

xcharleslin changed the title ~~Charles/opstyping~~ Tidy up typing of binary ops [1/2] Jul 3, 2023

xcharleslin changed the title ~~Tidy up typing of binary ops [1/2]~~ [CHORE] Tidy up typing of binary ops [1/2] Jul 3, 2023

github-actions bot added the chore label Jul 3, 2023

xcharleslin requested review from samster25, clarkzinzow and jaychia July 3, 2023 21:59

xcharleslin marked this pull request as ready for review July 3, 2023 21:59

Merge branch 'main' into charles/opstyping

459431b

xcharleslin merged commit 447cb2f into main Jul 5, 2023

xcharleslin deleted the charles/opstyping branch July 5, 2023 18:28

jaychia reviewed Jul 5, 2023

View reviewed changes

xcharleslin mentioned this pull request Jul 6, 2023

[CHORE] Tidy typing for remaining binary ops: logical, comp #1124

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CHORE] Tidy up typing of binary ops [1/2] #1114

[CHORE] Tidy up typing of binary ops [1/2] #1114

xcharleslin commented Jun 30, 2023 •

edited

Loading

codecov bot commented Jul 1, 2023 •

edited

Loading

jaychia Jul 5, 2023

jaychia Jul 5, 2023

jaychia Jul 5, 2023

xcharleslin Jul 6, 2023

xcharleslin Jul 6, 2023

[CHORE] Tidy up typing of binary ops [1/2] #1114

[CHORE] Tidy up typing of binary ops [1/2] #1114

Conversation

xcharleslin commented Jun 30, 2023 • edited Loading

1. Centralize output type resolution in one place (daft-core::datatype::binary_op)

2. Change type resolution logic from implicit (subtractive) to explicit (additive).

codecov bot commented Jul 1, 2023 • edited Loading

Codecov Report

jaychia Jul 5, 2023

Choose a reason for hiding this comment

jaychia Jul 5, 2023

Choose a reason for hiding this comment

jaychia Jul 5, 2023

Choose a reason for hiding this comment

xcharleslin Jul 6, 2023

Choose a reason for hiding this comment

xcharleslin Jul 6, 2023

Choose a reason for hiding this comment

xcharleslin commented Jun 30, 2023 •

edited

Loading

1. Centralize output type resolution in one place (`daft-core::datatype::binary_op`)

codecov bot commented Jul 1, 2023 •

edited

Loading