Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset offsets supervisor API #14772

Merged
merged 24 commits into from
Aug 17, 2023

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Aug 7, 2023

This PR adds a new API: /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets. This API accepts a reset datasource metadata in the body and only resets the specified partition offsets in the metadata store. Unlike the /reset API, this new API retains any previously checkpointed partition offsets not specified in the payload, so tasks will resume from where they left off. Further, only tasks serving the reset partitions will be restarted.

Description

A few use cases:

  • A supervisor that is reading from an offset that is invalid -- moved past the retention period, topic recreated, etc. So instead of a /reset that will clear the entire datasource metadata and start from the earliest or latest offset, users can now specify specific partition offsets.
  • Users wanting some history for a streaming datasource, but not all the way to earliest can also benefit from this API.
  • With support for multiple kafka streams by a single supervisor, a user can selectively reset a single topic's partition offsets instead of all the topics.

Why not re-purpose the existing reset API endpoint and the associated interface method?

  • To support backward compatibility with existing clients. The web-console (and possibly other clients) already send an empty body payload to the reset API, so accepting a new body in the existing API will cause some clients to break.

The API flow is as follows:

  • User submits the reset datasource metadata to the supervisor's /resetOffsets endpoint
  • Few validations like the stream and metadata type are done before a ResetOffsetsNotice is created
  • When the notice is handled asynchronously, the reset offsets is done, writing offsets to the metadata store and then restarting the appropriate tasks as necessary

Release note

Users can reset specific partition offsets for a supervisor using the resetOffsets API.


Key changed/added classes in this PR
  • SupervisorResource.java
  • Supervisor.java
  • SeekableStreamSupervisor.java
  • SeekableStreamSupervisorStateTest.java
  • TestSeekableStreamDataSourceMetadata.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch from d3500ef to 9212182 Compare August 8, 2023 06:44
@abhishekagarwal87 abhishekagarwal87 added the Needs web console change Backend API changes that would benefit from frontend support in the web console label Aug 8, 2023
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch 2 times, most recently from fe09257 to e9b4dec Compare August 12, 2023 06:03
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch from e9b4dec to 03b9324 Compare August 12, 2023 06:20
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch from 9d97584 to 8bcb1e5 Compare August 12, 2023 15:09
@abhishekrb19
Copy link
Contributor Author

The CodeQL alert about log entries depending on a user-provided value appears to be flagging existing logging code that's not part of this PR. I think it's a false positive alert.

@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch 3 times, most recently from 94f0b4f to 11671f6 Compare August 16, 2023 20:29
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch 3 times, most recently from c57d0fc to 3293590 Compare August 16, 2023 21:26
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch from 3293590 to eaa1d26 Compare August 16, 2023 21:36
@abhishekrb19 abhishekrb19 force-pushed the supervisor_reset_offsets_api branch from c6c23ed to 9e04ad5 Compare August 17, 2023 02:41
@vogievetsky
Copy link
Contributor

Ignoring the CodeQL issues per the comment above

@vogievetsky vogievetsky merged commit 37db5d9 into apache:master Aug 17, 2023
@vogievetsky vogievetsky removed the Needs web console change Backend API changes that would benefit from frontend support in the web console label Aug 17, 2023
@abhishekrb19 abhishekrb19 deleted the supervisor_reset_offsets_api branch August 21, 2023 13:01
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
pagrawal10 pushed a commit to confluentinc/druid that referenced this pull request Jun 12, 2024
* Add supervisor /resetOffsets API.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.

* Add a new interface method for backwards compatibility.

* Rename

* Adjust tests and javadocs.

* Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules

* UT fix

* Doc updates.

* remove extraneous debugging logs.

* Remove the boolean setting; only ResetHandle() and resetInternal()

* Relax constraints and add a new ResetOffsetsNotice; cleanup old logic.

* A separate ResetOffsetsNotice and some cleanup.

* Minor cleanup

* Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers

* Add unit tests for the no op implementations for test coverage

* CodeQL fix

* checkstyle from merge conflict

* Doc changes

* DOCUSAURUS code tabs fix. Thanks, Brian!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants