-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement pull query routing to standbys if active is down #4398
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @vpapavas, this is great work!
There's a lot of comments below, but many/most are just suggestions of improvements to the code.
Happy to review again - just add me back in or ping me on slack.
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-execution/src/main/java/io/confluent/ksql/services/SimpleKsqlClient.java
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/HeartbeatAgent.java
Outdated
Show resolved
Hide resolved
ksql-rest-model/src/main/java/io/confluent/ksql/rest/entity/HostStatusEntity.java
Outdated
Show resolved
Hide resolved
.../main/java/io/confluent/ksql/execution/streams/materialization/ks/ActiveAndStandByNodes.java
Outdated
Show resolved
Hide resolved
.../main/java/io/confluent/ksql/execution/streams/materialization/ks/ActiveAndStandByNodes.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
...s/src/test/java/io/confluent/ksql/execution/streams/materialization/ks/KsStateStoreTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't had a chance to do a full review yet and I've gotta go, so will complete later. I think this could benefit from being split into 2 PRs (see inline comment)
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/HeartbeatAgent.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/HeartbeatAgent.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/HeartbeatAgent.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/HeartbeatAgent.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/KsqlRestApplication.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/LivenessFilter.java
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/LivenessFilter.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few refactoring/structural comments.. Will do a final pass once the pending comments are pushed and pr rebased against your other change
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/util/KsqlConfig.java
Outdated
Show resolved
Hide resolved
ksql-execution/src/main/java/io/confluent/ksql/services/SimpleKsqlClient.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/execution/PullQueryExecutor.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/execution/PullQueryExecutor.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/LivenessFilter.java
Outdated
Show resolved
Hide resolved
progress Added functional test for routing and endpoint for active standby info undo changes to log files fixing tests fixed tests addressed vinoth's comments fixed tests fixed broken test after kafka merge address alans comments add missed file made hostStatus map atomic reference, changed copy-on-read to copy-on-write fix dead store bug? applying andy's comments successful rebase rebase and apply comments
a0de6e1
to
626b7ef
Compare
ksql-execution/src/main/java/io/confluent/ksql/services/SimpleKsqlClient.java
Outdated
Show resolved
Hide resolved
ksql-streams/src/main/java/io/confluent/ksql/execution/streams/RoutingFilter.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/ActiveHostFilter.java
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work @vpapavas ! Almost ready.. One more round and we should be home..
Could you also please add unit tests for the two RoutingFilter impls and RoutingFilters class.
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/ActiveHostFilter.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/KsqlRestApplication.java
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/KsqlRestApplication.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Outdated
Show resolved
Hide resolved
...reams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsStateStore.java
Outdated
Show resolved
Hide resolved
...reams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsStateStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor logging related comments..
LGTM! Please go ahead merge, once you are ready!
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/execution/PullQueryExecutor.java
Outdated
Show resolved
Hide resolved
ksql-rest-model/src/main/java/io/confluent/ksql/rest/entity/ActiveStandbyEntity.java
Outdated
Show resolved
Hide resolved
...-streams/src/main/java/io/confluent/ksql/execution/streams/materialization/ks/KsLocator.java
Show resolved
Hide resolved
@vpapavas seems like the CI build has an issue?
|
Description
Previously, pull queries were handled only by the active node of a partition. If the active goes down or rebalance happens, pull queries experience a period of unavailability. This PR allows standbys to handle pull queries until a new active is elected.
For this, it uses the liveness information provided by the heartbeat mechanism. If a node determines that the active is dead, it will forward the query to a standby that is alive. A query fails only if both the active and all standbys are dead.
Additionally, there is no busy loop with timeout when routing queries. Instead queries fail fast. This allows the client to implement the logic of retries.
Testing done
Functional test that ensures queries are routed to standby or active based on whether active is dead or alive.
In addition to this, manual tests will be performed both locally and on EC2 cluster as outlined in #4360
Reviewer checklist