[SPARK-23799] FilterEstimation.evaluateInSet produces devision by zero in a case of empty table with analyzed statistics #20913
Closed: mshtelma wants to merge 42 commits into apache:master from mshtelma:filter_estimation_devision_by_zero
5c8fe0e
During evaluation of IN conditions, if the source table is empty, div…
67597fd
Added test case for the the following situation: During evaluation of…
989d4cc
[SPARK-23572][DOCS] Bring "security.md" up to date.
d49c6dd
[SPARK-23162][PYSPARK][ML] Add r2adj into Python API in LinearRegress…
kevinyu98 1dfa74b
[SPARK-23794][SQL] Make UUID as stateful expression
viirya 10d6ec1
[SPARK-23096][SS] Migrate rate source to V2
jerryshao 440549c
[SPARK-23699][PYTHON][SQL] Raise same type of error caught with Arrow…
BryanCutler c1b407c
[SPARK-23765][SQL] Supports custom line separator for json datasource
HyukjinKwon 923ab78
Revert "[SPARK-23096][SS] Migrate rate source to V2"
gatorsmile ac4a3c3
[SPARK-23675][WEB-UI] Title add spark logo, use spark logo image
0826158
[SPARK-23806] Broadcast.unpersist can cause fatal exception when used…
96dc23b
[SPARK-23770][R] Exposes repartitionByRange in SparkR
HyukjinKwon b8882b8
[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connec…
7c5ca63
[SPARK-23639][SQL] Obtain token before init metastore client in Spark…
yaooqinn 132afa7
[SPARK-23808][SQL] Set default Spark session in test-only spark sessi…
jose-torres 542f0ba
[SPARK-23743][SQL] Changed a comparison logic from containing 'slf4j'…
jongyoul 082919f
[SPARK-23727][SQL] Support for pushing down filters for DateType in p…
yucai 6bd199b
Roll forward "[SPARK-23096][SS] Migrate rate source to V2"
jose-torres 96ba7af
[SPARK-23500][SQL][FOLLOWUP] Fix complex type simplification rules to…
gatorsmile 32382fc
[SPARK-23640][CORE] Fix hadoop config may override spark config
wangyum 14a48e3
[SPARK-23827][SS] StreamingJoinExec should ensure that input data is …
tdas 5152b3c
[SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result Iterator.
jiangxb1987 1f9284a
[SPARK-15009][PYTHON][FOLLOWUP] Add default param checks for CountVec…
BryanCutler 55f3371
[SPARK-23825][K8S] Requesting memory + memory overhead for pod memory
dvogelbacher f86b703
[SPARK-23285][K8S] Add a config property for specifying physical exec…
liyinan926 54d3dd0
[SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder classes
kiszk 782d7af
[SPARK-23834][TEST] Wait for connection before disconnect in Launcher…
2934384
[SPARK-23690][ML] Add handleinvalid to VectorAssembler
9b81269
[SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite.
36a1607
[MINOR][DOC] Fix a few markdown typos
e29fc9c
[MINOR][CORE] Show block manager id when remove RDD/Broadcast fails.
jiangxb1987 4efdf84
[SPARK-23099][SS] Migrate foreach sink to DataSourceV2
jose-torres 02bc4f6
[SPARK-23587][SQL] Add interpreted execution for MapObjects expression
viirya f8094fb
[SPARK-23809][SQL] Active SparkSession should be set by getOrCreate
ericl b55e39f
[SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unr…
699cc97
[SPARK-23826][TEST] TestHiveSparkSession should set default session
gatorsmile 7561a62
[SPARK-21351][SQL] Update nullability based on children's output
maropu a85535b
[SPARK-23583][SQL] Invoke should support interpreted execution
kiszk 295d11f
[SPARK-23668][K8S] Add config option for passing through k8s Pod.spec…
80cab07
[SPARK-23838][WEBUI] Running SQL query is displayed as "completed" in…
gengliangwang 984faf5
[SPARK-23637][YARN] Yarn might allocate more resource if a same execu…
5883c3c
Merge branch 'master' into filter_estimation_devision_by_zero
What's a concrete example query where ndv.toDouble == 0?
Also, is this the only place where we need this check?
For example, don't we need it here?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
Line 166 in 5cfd5fa
I ran into this problem with a sub-condition containing an IN clause, something like "FLD in ("value")".
As far as I can tell, this happens when the table is empty. In that case ndv will be 0.
I think it makes sense to add this check everywhere ndv is used as a divisor in this way.
Can you add a test for the empty table case?
I think we need to fix the other places if they have the same issue. cc: @wzhfy
I have developed a test that illustrates the problem. It is just a small Scala app right now.
Currently it breaks after the last select. Adding
REFRESH TABLE TBL
does not fix the problem in this particular case. Should I add it to a particular test suite?