Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add operator section to user guide, Add std::ops operations to prelude, and add not() expr_fn #7732

Merged
merged 3 commits into from
Oct 17, 2023

Conversation

ongchi
Copy link
Contributor

@ongchi ongchi commented Oct 3, 2023

Which issue does this PR close?

Closes #7681

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Oct 3, 2023
@ongchi
Copy link
Contributor Author

ongchi commented Oct 4, 2023

Several functions added in this PR:

  • Functions already listed in documentation but not implemented:
    • not
    • eq
    • not_eq
    • gt
    • gt_eq
    • lt
    • lt_eq
  • Additional arithmetic functions introduced in this PR:
    • add
    • sub
    • mul
    • div
    • mod_
    • neg

Few more questions:

  • These implementations and tests already exist somewhere. Do these still have to add test code here?
  • mod is a reserved keyword in Rust. The reminder function named here to be able consistant with SQL function naming, but does it ok to datafusion?

Edit: change r#mod to mod_ for better code completion

@ongchi
Copy link
Contributor Author

ongchi commented Oct 4, 2023

Methods added to Expr in this PR:

  • Methods already listed in documentation but not implemented:
    • bitwise_and
    • bitwise_or
    • bitwise_xor
    • bitwise_shift_left
    • bitwise_shift_right
    • not
  • Additional arithmetic methods introduced in this PR:
    • As same as arithmetic functions listed above

I found that there are multiple implementations to do the same thing, e.g., bitwise operations on https://github.com/apache/arrow-datafusion/blob/a64f36d8fbce35cb7146271fb5d9ab98a9f094e6/datafusion/expr/src/expr_fn.rs#L172-L215 and https://github.com/apache/arrow-datafusion/blob/a64f36d8fbce35cb7146271fb5d9ab98a9f094e6/datafusion/expr/src/operator.rs#L313-L355, that should/is identical to each other. Do we have to merge these implementations and place tests in the same place?

@ongchi ongchi marked this pull request as ready for review October 4, 2023 03:44
//

/// Return `self + other`
#[allow(clippy::should_implement_trait)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clippy fires warnings because these methods have same name on trait methods in std::ops. Without these methods, user may require to include specific traits from std::ops::* in their source. Or, should we re-export these traits in prelude?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user may require to include specific traits from std::ops::* in their source.

I think using the std::ops in their traits is the less surprising API and is the "idiomatic" rust way to do this. I think we should remove these methods and just use std::ops

Or, should we re-export these traits in prelude?

This would be ok with me. Alternately we could add an example to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html saying that it is possible to use the std::ops and show an example of a + b or the equivalent

@@ -118,4 +118,4 @@
myst_heading_anchors = 3

# enable nice rendering of checkboxes for the task lists
myst_enable_extensions = [ "tasklist"]
myst_enable_extensions = ["colon_fence", "deflist", "tasklist"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if its related to this PR description 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comphead
Copy link
Contributor

comphead commented Oct 4, 2023

Thanks @ongchi for the PR, I left some minor comments

@ongchi ongchi requested a review from comphead October 7, 2023 13:34
@comphead
Copy link
Contributor

comphead commented Oct 7, 2023

Doc is definitely looks more solid and consistent now, thanks @ongchi
For the new methods introduced for Expr trait we may need to cover them by unit tests.

Another thing comes to my mind if we support both syntaxes, we should prob have some notification when new operator is introduced.
It might be a common trait that will require during compilation for another syntax to be implemented

@ongchi
Copy link
Contributor Author

ongchi commented Oct 8, 2023

Another thing comes to my mind if we support both syntaxes, we should prob have some notification when new operator is introduced. It might be a common trait that will require during compilation for another syntax to be implemented

I think we should reorganize test cases in datafusion-expr and make some improvements by creating a test module. Currently, these tests do not cover all use cases. The tests may follow the order/structure from expression API documentation, which would make it easier to identify which parts needs to be improve.

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Oct 9, 2023
}

/// Return `self % other`
pub fn mod_(self, other: Expr) -> Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underscore in naming might be confusing

}

#[test]
fn test_logical_ops() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome. In addition to planner test would be great to have same tests on Dataframe level to prove operators returns correct values.

Update examples and factor out the common interface (to make sure new added operator will be covered by all kind of syntaxes) can be done is follow up PR

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @ongchi and @comphead -- this PR is looking really great. I do think we should switch back to using std::ops but otherwise I think this PR is ready to go.

A really nice addition to the docs and usability in general

//

/// Return `self + other`
#[allow(clippy::should_implement_trait)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user may require to include specific traits from std::ops::* in their source.

I think using the std::ops in their traits is the less surprising API and is the "idiomatic" rust way to do this. I think we should remove these methods and just use std::ops

Or, should we re-export these traits in prelude?

This would be ok with me. Alternately we could add an example to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html saying that it is possible to use the std::ops and show an example of a + b or the equivalent

@@ -265,7 +265,6 @@ mod test {
use arrow::datatypes::DataType;
use datafusion_common::tree_node::{RewriteRecursion, TreeNode, TreeNodeRewriter};
use datafusion_common::{DFField, DFSchema, ScalarValue};
use std::ops::Add;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing std::ops::Add means you can write something like expr1 + expr2.

@alamb alamb marked this pull request as draft October 13, 2023 17:33
@alamb
Copy link
Contributor

alamb commented Oct 13, 2023

Converting to draft as this PR is not waiting on feedback anymore. Please mark it ready to review when ready.

I do realize when writing documentation, often deficiencies in the existing APIs become clearer

One suggestion I had was to split out the changes to the documentation from the changes to how the traits work.

I think we should reorganize test cases in datafusion-expr and make some improvements by creating a test module. Currently, these tests do not cover all use cases. The tests may follow the order/structure from expression API documentation, which would make it easier to identify which parts needs to be improve.

I agree -- @ongchi could you please file a follow on ticket describing this work? Perhaps it would be a good first step in updating how the traits worked: consolidate all the existing tests we do have, so the scope of the "to do" work becomes clear.

@ongchi
Copy link
Contributor Author

ongchi commented Oct 16, 2023

Converting to draft as this PR is not waiting on feedback anymore. Please mark it ready to review when ready.

I do realize when writing documentation, often deficiencies in the existing APIs become clearer

One suggestion I had was to split out the changes to the documentation from the changes to how the traits work.

I think we should reorganize test cases in datafusion-expr and make some improvements by creating a test module. Currently, these tests do not cover all use cases. The tests may follow the order/structure from expression API documentation, which would make it easier to identify which parts needs to be improve.

I agree -- @ongchi could you please file a follow on ticket describing this work? Perhaps it would be a good first step in updating how the traits worked: consolidate all the existing tests we do have, so the scope of the "to do" work becomes clear.

Sure, I'm happy to do that.

I'll focus on correctness of the documentation in this PR with minimal code changes. Any further changes can depend on other users' input.

…wise,

comparison, and arithmetic expressions
@github-actions github-actions bot removed the optimizer Optimizer rules label Oct 16, 2023
@alamb alamb marked this pull request as ready for review October 16, 2023 15:36
@alamb alamb changed the title Add operator section to user guide Add operator section to user guide, Add std::ops operations to prelude, and add not() expr_fn Oct 16, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ongchi -- I think this one looks good to me. What do you think @comphead ?

@alamb alamb merged commit 360175a into apache:main Oct 17, 2023
24 checks passed
@ongchi ongchi deleted the doc-operators branch October 23, 2023 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Supported expression operators should be documented
3 participants