Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-null sub-field on nullable struct-field has wrong nullity. #8507

Closed
jacksonrnewhouse opened this issue Dec 12, 2023 · 6 comments · Fixed by #8623
Closed

non-null sub-field on nullable struct-field has wrong nullity. #8507

jacksonrnewhouse opened this issue Dec 12, 2023 · 6 comments · Fixed by #8623
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@jacksonrnewhouse
Copy link

Describe the bug

If you have table schema like

DFSchema {
    fields: [
        DFField {
            qualifier: Some(
                Bare {
                    table: "nexmark",
                },
            ),
            field: Field {
                name: "bid",
                data_type: Struct(
                    [
                        Field {
                            name: "auction",
                            data_type: Int64,
                            nullable: false,
                        },
                nullable: true,
          }

And run a query like SELECT bid.auction FROM nexmark you'll get an error when bid is null. Error looks like

ArrowError(InvalidArgumentError("Column 'nexmark.bid[datetime]' is declared as non-nullable but contains null values")) 

The problem is that the nullable() method only looks at the sub-fields nullability, ignoring that the parent field may be null.
https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/expr_schema.rs#L280.

To Reproduce

No response

Expected behavior

The expression should be nullable, as its parent may not be present.

Additional context

No response

@jacksonrnewhouse jacksonrnewhouse added the bug Something isn't working label Dec 12, 2023
@alamb
Copy link
Contributor

alamb commented Dec 12, 2023

Thank you for the report @jacksonrnewhouse -- this looks like a good first issue as it has a reproducer and a hit about where in the code to fix it, to me so marking it as such

@alamb alamb added the good first issue Good for newcomers label Dec 12, 2023
@ravikiran232
Copy link

@alamb I will try to solve the issue

@ravikiran232
Copy link

Hi @alamb, what I understand after reading the issue is that I should check condition whether the parent is null || sub-fields are null. if I am wrong please correct me.

@alamb
Copy link
Contributor

alamb commented Dec 19, 2023

Hi @alamb, what I understand after reading the issue is that I should check condition whether the parent is null || sub-fields are null. if I am wrong please correct me.

That sounds about right @ravikiran232 -- what I would suggest doing first is creating a reproducer / test that shows the behavior described in this ticket.

Then the fix should likely be straightforward

@marvinlanhenke
Copy link
Contributor

Hi @ravikiran232 are you still working on this?
Since I'm a first-timer (as well) I took the liberty to look into this Issue as well, in order to learn and get a better feel for the codebase. I think I already found a solution. However, I don't want to steal your opportunity to contribute.

Hi @alamb just to confirm the approach.
I wrote a test creating a nested schema and an expr = col("bid).field("auction") on which I executed nullable(). Then I included a guard close in the relevant match arm to check, if the parent col is nullable and return early if it is? Sounds about right?

If @ravikiran232 is not currently working on this anymore, I'd be more than happy to contribute.

Thank you for you feedback and your efforts.

@ravikiran232
Copy link

Hi @marvinlanhenke, continue with the issue

alamb pushed a commit that referenced this issue Dec 23, 2023
…ity (#8623)

* added test

* added guard clause

* rename schema fields

* clippy

---------

Co-authored-by: mlanhenke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
4 participants