kafka(ticdc): reset the admin client to fix broken pipe #8228

Merged: 6 commits, Feb 11, 2023

3AceShowHand
Contributor

What problem does this PR solve?

Issue Number: close #8225, close #8223

What is changed and how it works?

The Kafka admin client may fail with a "write: broken pipe" error. This is a known issue that has not been addressed yet.

  • Reset the admin client when a broken pipe error occurs.
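The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `clusterAdmin`, `fakeAdmin`, `adminWrapper`, and `resetIfBrokenPipe` are hypothetical names, and detecting the failure with `errors.Is(err, syscall.EPIPE)` is an assumption about how a broken pipe might be recognized.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// clusterAdmin is a hypothetical stand-in for the Kafka admin client.
type clusterAdmin interface {
	ListTopics() ([]string, error)
	Close() error
}

// fakeAdmin simulates a client whose connection may be broken.
type fakeAdmin struct{ broken bool }

func (f *fakeAdmin) ListTopics() ([]string, error) {
	if f.broken {
		// Wrap EPIPE the way a network write error might surface it.
		return nil, fmt.Errorf("write tcp: %w", syscall.EPIPE)
	}
	return []string{"orders"}, nil
}

func (f *fakeAdmin) Close() error { return nil }

// adminWrapper owns the client and knows how to build a fresh one.
type adminWrapper struct {
	client    clusterAdmin
	newClient func() (clusterAdmin, error) // hypothetical factory
}

// resetIfBrokenPipe swaps in a fresh client when err is a broken pipe.
// It reports whether a reset happened.
func (a *adminWrapper) resetIfBrokenPipe(err error) (bool, error) {
	if !errors.Is(err, syscall.EPIPE) {
		return false, nil
	}
	fresh, cerr := a.newClient()
	if cerr != nil {
		return false, cerr
	}
	_ = a.client.Close() // the old connection is already unusable
	a.client = fresh
	return true, nil
}

// ListTopics retries once after a successful reset.
func (a *adminWrapper) ListTopics() ([]string, error) {
	topics, err := a.client.ListTopics()
	if err == nil {
		return topics, nil
	}
	reset, rerr := a.resetIfBrokenPipe(err)
	if rerr != nil {
		return nil, rerr
	}
	if !reset {
		return nil, err
	}
	return a.client.ListTopics()
}

func main() {
	w := &adminWrapper{
		client:    &fakeAdmin{broken: true},
		newClient: func() (clusterAdmin, error) { return &fakeAdmin{}, nil },
	}
	fmt.Println(w.ListTopics())
}
```

The sketch retries only once inline. The review thread below discusses driving the retry from a shared retry helper instead.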

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Feb 10, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • CharlesCheung96
  • sdojjy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-triage-completed labels Feb 10, 2023
@3AceShowHand 3AceShowHand changed the title Fix broken pipe kafka(ticdc): reset the admin client to fix broken pipe Feb 10, 2023
@3AceShowHand
Contributor Author

/run-all-tests

pkg/sink/kafka/admin.go (resolved)
pkg/sink/kafka/admin.go (outdated, resolved)
pkg/sink/kafka/admin.go (outdated, resolved)
@3AceShowHand
Contributor Author

/run-all-tests

@3AceShowHand 3AceShowHand added the needs-cherry-pick-release-6.6 Should cherry pick this PR to release-6.6 branch. label Feb 10, 2023
@3AceShowHand
Contributor Author

/run-all-tests

_ = a.client.Close()
a.client = newClient

return errors.New("retry after reset")
Member

why do we always return an error?

Contributor Author

If reset returns nil here, the retry logic exits, so returning an error is intentional.

Contributor

Perhaps it is more appropriate to return this error outside of reset?

Member

it's very strange that a func always returns an error.

Contributor Author

If we move it outside, we still have to return an error to trigger the retry, whether or not reset itself returns an error.
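The point about the retry loop can be illustrated with a small sketch. `retryDo` here is a hypothetical, simplified retry helper (not the project's `retry.Do`) that stops as soon as the callback returns nil. The failing connection is simulated with a boolean.

```go
package main

import (
	"errors"
	"fmt"
)

// retryDo is a hypothetical, simplified retry helper: it calls fn up to
// maxTries times and stops as soon as fn returns nil.
func retryDo(maxTries int, fn func() error) error {
	var err error
	for i := 0; i < maxTries; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

var errRetryAfterReset = errors.New("retry after reset")

// runQuery simulates a connection that is broken on the first attempt
// and healthy after a reset. It reports whether the query ever ran.
func runQuery(resetReturnsNil bool) (queried bool, err error) {
	healthy := false
	err = retryDo(3, func() error {
		if healthy {
			queried = true
			return nil
		}
		// Simulated broken pipe: reset the connection.
		healthy = true
		if resetReturnsNil {
			return nil // the loop exits here, so the query never re-runs
		}
		return errRetryAfterReset // non-nil keeps the loop going
	})
	return queried, err
}

func main() {
	fmt.Println(runQuery(true))  // false <nil>
	fmt.Println(runQuery(false)) // true <nil>
}
```

With a nil return after the reset, the helper reports success although the query never executed. Whether the sentinel error is produced inside reset or by its caller is the design question raised in this thread.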

brokers []*sarama.Broker
err error
)
err = retry.Do(ctx, func() error {
Member

All the retry logic is almost the same. Can you refactor it to reuse the same code?

Contributor

+1. Could the retry logic be extracted to a common place?

Contributor Author

Done. We can use a higher-order function to solve this.

@zhaoxinyu
Contributor

I'm wondering what the root cause of this "broken pipe" error is. Is the original issue related to this sarama issue (IBM/sarama#2173)? Maybe it's a sarama bug that hasn't been fixed thoroughly.

@3AceShowHand 3AceShowHand requested a review from sdojjy February 11, 2023 04:20
@3AceShowHand
Contributor Author

> I'm wondering what the root cause of this "broken pipe" error is. Is the original issue related to this sarama issue (Shopify/sarama#2173)? Maybe it's a sarama bug that hasn't been fixed thoroughly.

Yes, it's a known bug that has been open for a few years and has not been addressed yet.

@3AceShowHand
Contributor Author

/run-all-tests

@3AceShowHand
Contributor Author

/run-all-tests

@3AceShowHand
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 11, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 11, 2023
@3AceShowHand
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 1969a7f

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 11, 2023
@3AceShowHand
Contributor Author

/run-kafka-integration-test
/run-verify-ci

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.6: #8230.

Labels
needs-cherry-pick-release-6.6 Should cherry pick this PR to release-6.6 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kafka sink not robust to admin tasks
Kakfa changefeed stucks
6 participants