Allow empty inserts and replaces in MSQ. #15495

Merged
merged 14 commits into from
Jan 2, 2024

@abhishekrb19 abhishekrb19 commented Dec 5, 2023

Problem:

Currently, the MSQ engine requires ingest queries to generate output rows. A DML query that doesn't produce any output rows results in an InsertCannotBeEmpty MSQ fault, forcing the user to validate and adjust the query so that it produces at least one row.

Description:

  • Introduce a new query context failOnEmptyInsert, which defaults to false.
  • When failOnEmptyInsert is:
    • false, the default behavior after this patch, MSQE will allow empty ingest queries. An empty INSERT is essentially a no-op, and an empty REPLACE will delete all data that matches the OVERWRITE clause.
    • true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault.
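
The behavior described above can be sketched as a small decision function. This is only an illustration, not the code in ControllerImpl: the class `EmptyIngestPolicy`, its enums, and its method names are hypothetical, and only the context key `failOnEmptyInsert` and the fault name come from this PR.

```java
import java.util.Map;

// Illustrative sketch only; every name except the context key is hypothetical.
final class EmptyIngestPolicy {
    enum IngestMode { INSERT, REPLACE }

    enum Outcome { NO_OP, DELETE_OVERWRITE_MATCHES, FAIL_INSERT_CANNOT_BE_EMPTY }

    static final String CTX_FAIL_ON_EMPTY_INSERT = "failOnEmptyInsert";

    // Reads the flag from a query context map; a missing key means false (the new default).
    static boolean failOnEmptyInsert(Map<String, Object> context) {
        Object value = context.get(CTX_FAIL_ON_EMPTY_INSERT);
        return value != null && Boolean.parseBoolean(String.valueOf(value));
    }

    // Decides what happens when an ingest query is known to have produced no rows.
    static Outcome outcomeForEmptyIngest(IngestMode mode, Map<String, Object> context) {
        if (failOnEmptyInsert(context)) {
            // Pre-patch behavior, now opt-in.
            return Outcome.FAIL_INSERT_CANNOT_BE_EMPTY;
        }
        // An empty INSERT is a no-op; an empty REPLACE drops data matching the OVERWRITE clause.
        return mode == IngestMode.REPLACE ? Outcome.DELETE_OVERWRITE_MATCHES : Outcome.NO_OP;
    }
}
```

With an empty context map, an empty INSERT maps to `NO_OP` and an empty REPLACE to `DELETE_OVERWRITE_MATCHES`; setting `failOnEmptyInsert` to true maps both to the fault.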

Implementation:

  • The controller query kernel has a method isStageOutputEmpty(int stageId) that determines if a stage output is empty, non-empty or unknown using the cluster key statistics.
  • For a REPLACE ALL query over an existing ALL grain segment, the query will generate a tombstone spanning eternity, which will eventually be removed by the coordinator duty so that subsequent data can be appended. This is effectively similar to a DELETE table command, as the eternity tombstone will mask all the underlying data.
  • Add unit tests in MSQInsertTest, MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false)
  • Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true)
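
A minimal sketch of the three-way emptiness check mentioned in the first bullet, under stated assumptions: it models the statistics as a nullable key count, which is a simplification and not the actual ControllerQueryKernel code. `StageOutputCheck`, `Emptiness`, and both methods are hypothetical names, and treating an unknown result as "do not fail" is an assumption of this sketch.

```java
// Illustrative sketch only; not the implementation of isStageOutputEmpty(int stageId).
final class StageOutputCheck {
    enum Emptiness { EMPTY, NON_EMPTY, UNKNOWN }

    // A null count stands for "statistics were not gathered for this stage".
    static Emptiness classify(Long collectedKeyCount) {
        if (collectedKeyCount == null) {
            return Emptiness.UNKNOWN;
        }
        return collectedKeyCount == 0L ? Emptiness.EMPTY : Emptiness.NON_EMPTY;
    }

    // Assumption: the fault is raised only when the output is known to be empty
    // and the user opted in with failOnEmptyInsert = true.
    static boolean shouldRaiseEmptyInsertFault(Emptiness emptiness, boolean failOnEmptyInsert) {
        return failOnEmptyInsert && emptiness == Emptiness.EMPTY;
    }
}
```

Keeping UNKNOWN distinct from EMPTY avoids reporting a fault, or writing tombstones, based on statistics that were never collected.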

Release note

  • MSQE defaults to allowing ingest queries that produce no data instead of failing them with the InsertCannotBeEmpty MSQ fault.
  • Introduce a new MSQ query context failOnEmptyInsert, which defaults to false.
  • When failOnEmptyInsert is:
    • false, the default behavior, MSQE will allow empty ingest queries. An empty INSERT is essentially a no-op, and an empty REPLACE will delete all data that matches the OVERWRITE clause.
    • true, MSQE will throw the InsertCannotBeEmpty MSQ fault.

Key changed/added classes in this PR
  • ControllerImpl.java
  • ControllerQueryKernel.java
  • TombstoneHelper.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

- Introduce a new query context failOnEmptyInsert which defaults to false.
- When this context is false (default), MSQE will now allow empty inserts and replaces.
- When this context is true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault.
- For REPLACE ALL over an ALL grain segment, the query will generate a tombstone spanning eternity
which will eventually be removed by the coordinator.
- Add unit tests in MSQInsertTest, MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false)
- Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true)
@github-actions github-actions bot added Area - Documentation Area - Batch Ingestion Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Dec 5, 2023
@abhishekrb19 abhishekrb19 marked this pull request as draft December 6, 2023 04:26
@abhishekrb19 abhishekrb19 marked this pull request as ready for review December 7, 2023 17:57
1. Doc suggestions
2. Add tests for empty insert and replace queries with ALL grain and limit in the
   default failOnEmptyInsert mode (=false). Add similar tests to MSQFaultsTest with
   failOnEmptyInsert = true, so the query does fail with an InsertCannotBeEmpty fault.
3. Nullable annotation and javadocs

@cryptoe cryptoe left a comment

Minor comments:
The release notes should mention the behavior change more explicitly. Other than that LGTM.

@abhishekrb19

The release notes should mention the behavior change more explicitly.

Good callout! I've updated the description. Thanks for the reviews!

@abhishekrb19 abhishekrb19 merged commit 9c7d7fc into apache:master Jan 2, 2024
83 checks passed
@abhishekrb19 abhishekrb19 deleted the allow_empty_inserts branch January 2, 2024 21:05
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Labels
Area - Batch Ingestion Area - Documentation Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Release Notes
5 participants