[SPARK-22785][SQL] remove ColumnVector.anyNullsSet #19980

Closed


cloud-fan
Contributor

What changes were proposed in this pull request?

ColumnVector.anyNullsSet is not called anywhere except in tests, and it can easily be replaced with ColumnVector.numNulls > 0.

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @ueshin @kiszk @gatorsmile

@ueshin
Member

ueshin commented Dec 14, 2017

LGTM pending Jenkins.

@kiszk
Member

kiszk commented Dec 14, 2017

LGTM

@SparkQA

SparkQA commented Dec 14, 2017

Test build #84912 has finished for PR 19980 at commit c4d9a41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

thanks, merging to master!

@asfgit asfgit closed this in d095795 Dec 14, 2017
asfgit pushed a commit that referenced this pull request Jan 31, 2018
## What changes were proposed in this pull request?

In #19980, we assumed `anyNullsSet` could simply be implemented as `numNulls() > 0`. This is logically correct, but it can cause performance problems.

`OrcColumnVector` is an example. It has no `numNulls` property, only a `noNulls` property, so we lose a lot of performance if we use `numNulls() > 0` to check for nulls.

This PR simply reverts #19980 and renames the method to `hasNull`. Better name suggestions are welcome, e.g. `nullable`?
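
To illustrate the cost difference described above, here is a minimal sketch. It is not Spark's or ORC's actual code: `MaskedVector`, its fields, and the `rowsScanned` counter are hypothetical names invented for this example. It assumes a vector that stores only a per-row null mask and a `noNulls` flag supplied by its reader, so a null count has to be derived by scanning the mask.

```java
// Hypothetical stand-in for a vector that keeps a null mask and a noNulls flag
// but no precomputed null count. Not the real OrcColumnVector.
final class MaskedVector {
    private final boolean[] isNull; // per-row null mask
    private final boolean noNulls;  // flag assumed to be supplied by the reader
    int rowsScanned = 0;            // instrumentation for this example only

    MaskedVector(boolean[] isNull, boolean noNulls) {
        this.isNull = isNull.clone();
        this.noNulls = noNulls;
    }

    // Constant time: answers "is there any null?" from the flag alone.
    boolean hasNull() {
        return !noNulls;
    }

    // Linear time whenever nulls may be present: the count must be
    // derived by walking the whole mask.
    int numNulls() {
        if (noNulls) {
            return 0;
        }
        int count = 0;
        for (boolean rowIsNull : isNull) {
            rowsScanned++;
            if (rowIsNull) {
                count++;
            }
        }
        return count;
    }
}

final class HasNullDemo {
    public static void main(String[] args) {
        boolean[] mask = new boolean[1000];
        mask[999] = true; // a single null in the last row
        MaskedVector vector = new MaskedVector(mask, false);

        System.out.println("hasNull = " + vector.hasNull()
            + ", rows scanned = " + vector.rowsScanned);
        System.out.println("numNulls > 0 = " + (vector.numNulls() > 0)
            + ", rows scanned = " + vector.rowsScanned);
    }
}
```

In this sketch both checks return the same answer, but only `numNulls() > 0` walks the mask, which is why a dedicated `hasNull` method is worth keeping for vectors of this shape.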

## How was this patch tested?

existing tests

Author: Wenchen Fan <[email protected]>

Closes #20452 from cloud-fan/null.

(cherry picked from commit 48dd6a4)
Signed-off-by: Wenchen Fan <[email protected]>