Allow empty inserts and replaces in MSQ. #15495
Merged
- Introduce a new query context, failOnEmptyInsert, which defaults to false.
- When this context is false (the default), MSQE will now allow empty inserts and replaces.
- When this context is true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault.
- For REPLACE ALL over an ALL grain segment, the query will generate a tombstone spanning eternity, which will eventually be removed by the coordinator.
- Add unit tests in MSQInsertTest and MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false).
- Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true).
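The commit message describes how the new failOnEmptyInsert context changes the outcome of an ingest query with no output rows. The following is a minimal Java sketch of that decision under stated assumptions, not MSQ's actual code: EmptyIngestPolicy, Outcome, InsertCannotBeEmptyException, and decide(...) are hypothetical names; the "empty / non-empty / unknown" stage result is modeled as a nullable Boolean; and an unknown result is assumed to proceed like a non-empty one.

```java
import java.util.Map;

public class EmptyIngestPolicy {

    /** Hypothetical outcomes for an ingest query, following the PR description. */
    enum Outcome { PROCEED, NO_OP, DELETE_OVERWRITTEN_DATA }

    /** Hypothetical stand-in for the InsertCannotBeEmpty MSQ fault. */
    static final class InsertCannotBeEmptyException extends RuntimeException {
        InsertCannotBeEmptyException(String dataSource) {
            super("InsertCannotBeEmpty: no rows to ingest into dataSource [" + dataSource + "]");
        }
    }

    /** Reads failOnEmptyInsert from the query context; a missing key means false. */
    static boolean failOnEmptyInsert(Map<String, Object> context) {
        Object value = context.get("failOnEmptyInsert");
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        // Also accept the string form, e.g. "true".
        return value != null && Boolean.parseBoolean(value.toString());
    }

    /**
     * @param stageOutputEmpty TRUE if the final stage produced no rows, FALSE if it did,
     *                         null if unknown (assumption: treated like non-empty)
     * @param isReplace        true for REPLACE, false for INSERT
     */
    static Outcome decide(Map<String, Object> context, Boolean stageOutputEmpty,
                          boolean isReplace, String dataSource) {
        if (!Boolean.TRUE.equals(stageOutputEmpty)) {
            return Outcome.PROCEED; // non-empty or unknown output: ingest as usual
        }
        if (failOnEmptyInsert(context)) {
            // Opt-in mode: keep failing with the existing fault.
            throw new InsertCannotBeEmptyException(dataSource);
        }
        // Default mode: an empty INSERT is a no-op, while an empty REPLACE
        // still deletes the data matching its OVERWRITE clause.
        return isReplace ? Outcome.DELETE_OVERWRITTEN_DATA : Outcome.NO_OP;
    }

    public static void main(String[] args) {
        System.out.println(decide(Map.of(), true, false, "wikipedia"));
        System.out.println(decide(Map.of(), true, true, "wikipedia"));
        try {
            decide(Map.of("failOnEmptyInsert", true), true, false, "wikipedia");
        } catch (InsertCannotBeEmptyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the empty check ahead of the context lookup means the flag only matters when the query really produced nothing, which matches the description of failOnEmptyInsert as affecting empty ingest queries only.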
github-actions bot added the Area - Documentation, Area - Batch Ingestion, Area - Ingestion, and Area - MSQ labels on Dec 5, 2023.
abhishekrb19 force-pushed the allow_empty_inserts branch from 2fc24b0 to f3ba85b on December 6, 2023 17:09.
kgyrtkirk reviewed on Dec 13, 2023:
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (resolved)
gianm reviewed on Dec 13, 2023:
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (outdated, resolved)
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (outdated, resolved)
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (outdated, resolved)
- ...-stage-query/src/main/java/org/apache/druid/msq/indexing/error/InsertCannotBeEmptyFault.java (outdated, resolved)
- ...s-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java (outdated, resolved)
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (outdated, resolved)
abhishekrb19 force-pushed the allow_empty_inserts branch from 5947edb to afc7ab5 on December 13, 2023 21:52.
1. Doc suggestions.
2. Add tests for empty INSERT and REPLACE queries with ALL grain and LIMIT in the default failOnEmptyInsert mode (= false). Add similar tests to MSQFaultsTest with failOnEmptyInsert = true, so the query does fail with an InsertCannotBeEmpty fault.
3. Nullable annotation and javadocs.
abhishekrb19 force-pushed the allow_empty_inserts branch from afc7ab5 to 90d6dfd on December 13, 2023 23:00.
cryptoe approved these changes on Jan 2, 2024:
Minor comments: The release notes should mention the behavior change more explicitly. Other than that, LGTM.
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (resolved)
- extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java (resolved)
replace_limit.patch
Good callout! I've updated the description. Thanks for the reviews!
Labels:
- Area - Batch Ingestion
- Area - Documentation
- Area - Ingestion
- Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)
- Release Notes
Problem:
Currently, the MSQ engine requires ingest queries to generate output rows. A DML query that doesn't produce any output rows results in an InsertCannotBeEmpty MSQ fault, so users must validate and adjust their query until it produces at least one row before it can succeed.

Description:
- Introduce a new query context, failOnEmptyInsert, which defaults to false.
- When failOnEmptyInsert is:
  - false (the default behavior after this patch), MSQE will allow empty ingest queries. An empty INSERT is essentially a no-op, and an empty REPLACE will delete all data that matches the OVERWRITE clause.
  - true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault.

Implementation:
- Add isStageOutputEmpty(int stageId), which determines whether a stage's output is empty, non-empty, or unknown, using the cluster key statistics.
- For a REPLACE ALL query over an existing ALL grain segment, the query will generate a tombstone spanning eternity, which will eventually be removed by the coordinator duty so that subsequent data can be appended. This is effectively similar to a DELETE table command, as the eternity tombstone masks all the underlying data.
- Add unit tests in MSQInsertTest and MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false).
- Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true).

Release note:
MSQ ingest queries that produce no output rows previously failed with an InsertCannotBeEmpty MSQ fault. A new query context, failOnEmptyInsert, which defaults to false, controls this behavior. When failOnEmptyInsert is:
- false (the default behavior), MSQE will allow empty ingest queries. An empty INSERT is essentially a no-op, and an empty REPLACE will delete all data that matches the OVERWRITE clause.
- true, MSQE will throw the InsertCannotBeEmpty MSQ fault.

Key changed/added classes in this PR:
- ControllerImpl.java
- ControllerQueryKernel.java
- TombstoneHelper.java
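TombstoneHelper.java is among the key classes, and the description says an empty REPLACE deletes the data matching its OVERWRITE clause, with REPLACE ALL over an ALL grain segment producing a single eternity tombstone. The sketch below illustrates that idea under a deliberately simplified model; it is not Druid's implementation. Interval, ETERNITY, and tombstonesForEmptyReplace(...) are hypothetical names, intervals are plain epoch-millisecond longs rather than the Joda-Time intervals Druid uses, and only segments fully covered by an OVERWRITE interval receive a tombstone.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyReplaceTombstones {

    /** Hypothetical half-open interval [start, end) in epoch millis. */
    record Interval(long start, long end) {
        /** Hypothetical stand-in for an interval spanning all of time. */
        static final Interval ETERNITY = new Interval(Long.MIN_VALUE, Long.MAX_VALUE);

        boolean contains(Interval other) {
            return start <= other.start() && other.end() <= end;
        }
    }

    /**
     * For a REPLACE that produced no rows, returns one tombstone interval per existing
     * segment interval lying entirely inside some OVERWRITE interval. Each tombstone
     * masks the old data in that interval. Partially overlapping segments are ignored
     * in this sketch.
     */
    static List<Interval> tombstonesForEmptyReplace(List<Interval> overwrite,
                                                    List<Interval> existingSegments) {
        List<Interval> tombstones = new ArrayList<>();
        for (Interval segment : existingSegments) {
            for (Interval replaced : overwrite) {
                if (replaced.contains(segment)) {
                    tombstones.add(segment);
                    break; // one tombstone per segment interval is enough
                }
            }
        }
        return tombstones;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        Interval day1 = new Interval(0, day);
        Interval day2 = new Interval(day, 2 * day);

        // Empty REPLACE overwriting only day1: day2 keeps its data.
        System.out.println(tombstonesForEmptyReplace(List.of(day1), List.of(day1, day2)));

        // Empty REPLACE ALL over an existing ALL-grain segment: one eternity tombstone.
        System.out.println(tombstonesForEmptyReplace(
                List.of(Interval.ETERNITY), List.of(Interval.ETERNITY)));
    }
}
```

The second call mirrors the REPLACE ALL case: a single tombstone covering all time hides every existing row, which is why the description compares it to a DELETE of the whole table.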
This PR has: