Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Useeq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels from arrow #1475

Merged
merged 6 commits into from
Feb 17, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 22, 2021

Which issue does this PR close?

Marking as draft because the PR fails as the Arrow kernels don't handle timestamps and dates, which DataFusion does. I will make a PR to fix arrow shortly

re #1474 -- and towards #87

Rationale for this change

@jimexist added dyn kernels in apache/arrow-rs#858 but we still have the dispatch logic in datafusion, which is redundant (and as we add more logic to arrow-rs we will need to change datafusion to take advantage of it)

@matthewmturner is working on a similar set of kernels for scalars in apache/arrow-rs#1074

In this way, we will have a single set of kernels that datafusion can call and we will extend them as needed

What changes are included in this PR?

Use different arrow kernels

Are there any user-facing changes?

No

Corresponding arrow PR: apache/arrow-rs#1086

@liukun4515
Copy link
Contributor

@alamb This pr #1483 may be ready next week.
After comparison the decimal pull request, I will file the pull request about the mathematics operation with decimal type.
I think we can finish the feature of binary operation with decimal first.

@alamb
Copy link
Contributor Author

alamb commented Dec 24, 2021

@alamb This pr #1483 may be ready next week.
After comparison the decimal pull request, I will file the pull request about the mathematics operation with decimal type.
I think we can finish the feature of binary operation with decimal first.

👍 sounds like a great plan to me.

@liukun4515
Copy link
Contributor

@alamb This pr #1483 may be ready next week.
After comparison the decimal pull request, I will file the pull request about the mathematics operation with decimal type.
I think we can finish the feature of binary operation with decimal first.

👍 sounds like a great plan to me.

thanks.

@alamb
Copy link
Contributor Author

alamb commented Dec 26, 2021

@liukun4515 I am hopefuly that eventually the dyn_eq / dyn_eq_lit type kernels that @matthewmturner is working on in apache/arrow-rs#1074 will also support Decimal values (though we will have to work out precision and scale)

@liukun4515
Copy link
Contributor

@liukun4515 I am hopefuly that eventually the dyn_eq / dyn_eq_lit type kernels that @matthewmturner is working on in apache/arrow-rs#1074 will also support Decimal values (though we will have to work out precision and scale)

The #1483 is ready for merged, now I am developing the kernel for decimal operation in the arrow-rs. I think it will ready in the next release of arrow-rs.

@alamb alamb force-pushed the alamb/use_dyn_kernels branch from 27d7888 to 55b2ebf Compare January 6, 2022 12:11
@alamb
Copy link
Contributor Author

alamb commented Jan 6, 2022

Rebased on top of #1523 (aka upgrade to arrow 7.0.0) which includes apache/arrow-rs#1095 from @viirya ❤️ and this PR is in much better shape.

@alamb alamb force-pushed the alamb/use_dyn_kernels branch from 55b2ebf to dab8b26 Compare January 13, 2022 19:12
@alamb alamb marked this pull request as ready for review January 13, 2022 19:12
@alamb alamb force-pushed the alamb/use_dyn_kernels branch from dab8b26 to dca382f Compare January 16, 2022 12:50
@alamb
Copy link
Contributor Author

alamb commented Jan 18, 2022

This PR is failing because eq_dyn kernels don't have support for Decimal. I will add that

@alamb
Copy link
Contributor Author

alamb commented Jan 18, 2022

Filed apache/arrow-rs#1200 to track adding support for decimal arrays

@alamb alamb marked this pull request as draft January 21, 2022 14:34
@alamb
Copy link
Contributor Author

alamb commented Jan 21, 2022

Switching back to draft until the underlying arrow kernel support is done

@alamb alamb marked this pull request as ready for review February 15, 2022 19:58
@alamb
Copy link
Contributor Author

alamb commented Feb 15, 2022

cc @matthewmturner @viirya and @liukun4515 I think this is now ready for review

@@ -91,8 +131,10 @@ fn is_not_distinct_from_bool(
.collect())
}

// TODO add iter for decimal array
// TODO move this to arrow-rs
// TODO move decimal kernels to to arrow-rs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can take this, and migrate decimal to arrow-rs kernel recently

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@alamb codebase is becoming more cleaner.

@alamb
Copy link
Contributor Author

alamb commented Feb 17, 2022

@alamb codebase is becoming more cleaner.

Thanks -- I am trying @liukun4515 . What I really hope to do with these kernels is get to a point where we can add more specialized implementations in arrow-rs and have them flow automatically to DataFusion.

A particular interest of ours in IOx is performance of dictionary encoded data -- so after this PR is merged, the work that @viirya is doing in PRs like apache/arrow-rs#1326 will end up automatically being available in DataFusion)

@alamb alamb merged commit 4d68b6d into apache:master Feb 17, 2022
@alamb alamb deleted the alamb/use_dyn_kernels branch February 17, 2022 13:11
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines +1239 to +1244
Operator::Lt => lt_dyn(&left, &right),
Operator::LtEq => lt_eq_dyn(&left, &right),
Operator::Gt => gt_dyn(&left, &right),
Operator::GtEq => gt_eq_dyn(&left, &right),
Operator::Eq => eq_dyn(&left, &right),
Operator::NotEq => neq_dyn(&left, &right),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants