Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: fix new watcher hang when registering for removed resource #7853

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Nov 18, 2024

Addresses: #7819

Right now if any new watcher try to register watcher for removed resource, it hangs as there is no update to be received. This fix adds a condition in xDS Client to trigger watch callback OnResourceDoesNotExist() to new watcher if its registering for a removed resource.

RELEASE NOTES:

  • xds: fixed an edge-case issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use.

@purnesh42H purnesh42H force-pushed the xds-new-watcher-resource-not-exist branch from df8ae14 to a8ada61 Compare November 18, 2024 08:50
@purnesh42H purnesh42H force-pushed the xds-new-watcher-resource-not-exist branch from a8ada61 to a632665 Compare November 18, 2024 08:51
@purnesh42H purnesh42H changed the title xdsclient/test/lds_watcher_test: fix new watcher if registering for removed resource xdsclient/test: fix new watcher if registering for removed resource Nov 18, 2024
@purnesh42H purnesh42H changed the title xdsclient/test: fix new watcher if registering for removed resource xds/internal/xdsclient: fix new watcher if registering for removed resource Nov 18, 2024
@purnesh42H purnesh42H changed the title xds/internal/xdsclient: fix new watcher if registering for removed resource xds/internal/xdsclient: fix new watcher hang when registering for removed resource Nov 18, 2024
@purnesh42H purnesh42H added this to the 1.69 Release milestone Nov 18, 2024
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.86%. Comparing base (66385b2) to head (b5651f0).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7853      +/-   ##
==========================================
+ Coverage   81.84%   81.86%   +0.02%     
==========================================
  Files         374      374              
  Lines       37993    37980      -13     
==========================================
- Hits        31096    31093       -3     
+ Misses       5598     5588      -10     
  Partials     1299     1299              
Files with missing lines Coverage Δ
xds/internal/xdsclient/authority.go 75.43% <100.00%> (+0.10%) ⬆️

... and 24 files with indirect coverage changes

---- 🚨 Try these New Features:

@easwars
Copy link
Contributor

easwars commented Nov 18, 2024

Same here. I think this is also worth release noting.

@easwars easwars assigned purnesh42H and unassigned easwars Nov 18, 2024
@purnesh42H purnesh42H force-pushed the xds-new-watcher-resource-not-exist branch from 908ccfe to 815f2a8 Compare November 19, 2024 13:48
@purnesh42H purnesh42H force-pushed the xds-new-watcher-resource-not-exist branch from 815f2a8 to ab561b3 Compare November 19, 2024 13:50
@purnesh42H purnesh42H requested a review from easwars November 19, 2024 13:53
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Nov 19, 2024
@purnesh42H purnesh42H changed the title xds/internal/xdsclient: fix new watcher hang when registering for removed resource xdsclient: fix new watcher hang when registering for removed resource Nov 19, 2024
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. Mostly nits this time around.

xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
xds/internal/xdsclient/tests/lds_watchers_test.go Outdated Show resolved Hide resolved
@easwars easwars assigned purnesh42H and unassigned easwars Nov 19, 2024
@purnesh42H purnesh42H requested a review from easwars November 19, 2024 18:34
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Nov 19, 2024
@easwars
Copy link
Contributor

easwars commented Nov 19, 2024

For the release note, how about something like:

  • xds: fixed an issue in the xDS client where new watchers would hang when registering a watch for a deleted resource
  • xdsclient: fixed an issue where new watchers would hang when registering a watch for a deleted resource

@easwars easwars assigned purnesh42H and unassigned easwars Nov 19, 2024
@dfawley
Copy link
Member

dfawley commented Nov 19, 2024

For the release note, how about something like:

Nobody is using our xdsclient so we don't want to document it this way. Could this affect grpc users? If so, how? Something like:

  • xds: fixed an issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use

?

@purnesh42H
Copy link
Contributor Author

purnesh42H commented Nov 20, 2024

Nobody is using our xdsclient so we don't want to document it this way. Could this affect grpc users? If so, how? Something like:

yeah initially i didn't add release notes because i thought its not being directly used by users. I guess its just used as part of xds lb policy so may be having xds as top level package is fine.

fixed an issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use

this is not clear to me. If someone is using xds lb policy, they would know about resource deletions right? So, its fine to mention that? May be something like, the client initialization can hang for xds resources if the resource that existed previously got deleted?

@purnesh42H
Copy link
Contributor Author

Nobody is using our xdsclient so we don't want to document it this way. Could this affect grpc users? If so, how? Something like:

yeah initially i didn't add release notes because i thought its not being directly used by users. I guess its just used as part of xds lb policy so may be having xds as top level package is fine.

fixed an issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use

this is not clear to me. If someone is using xds lb policy, they would know about resource deletions right? So, its fine to mention that? May be something like, the client initialization can hang for xds resources if the resource that existed previously got deleted?

@dfawley

@easwars
Copy link
Contributor

easwars commented Nov 21, 2024

this is not clear to me. If someone is using xds lb policy, they would know about resource deletions right? So, its fine to mention that? May be something like, the client initialization can hang for xds resources if the resource that existed previously got deleted?

The way users use xDS on the client is by specifying a dial target with an xds:/// or xdstp:/// scheme, and by importing the xds package for side-effects. The resources are watched by the xDS resolver and the LB policies and these are not directly exposed to the user.

And the way users use xDS on the server is by explicitly created an xDS-enabled gRPC server. This functionality is provided by the xds package. Here as well, the actual resources are watched by the server implementation and not directly exposed to the user.

So, the user will not see the effects on watch callbacks being invoked or not, but they would be able to see the knock-on effects of those.

So yes, I agree with @dfawley in making the release note more applicable from the user's pov. But I feel the proposal from Doug made it sounds more scary than it actually is. So maybe?

xds: fixed a (rare|edge-case|corner-case) issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use

@purnesh42H
Copy link
Contributor Author

this is not clear to me. If someone is using xds lb policy, they would know about resource deletions right? So, its fine to mention that? May be something like, the client initialization can hang for xds resources if the resource that existed previously got deleted?

The way users use xDS on the client is by specifying a dial target with an xds:/// or xdstp:/// scheme, and by importing the xds package for side-effects. The resources are watched by the xDS resolver and the LB policies and these are not directly exposed to the user.

And the way users use xDS on the server is by explicitly created an xDS-enabled gRPC server. This functionality is provided by the xds package. Here as well, the actual resources are watched by the server implementation and not directly exposed to the user.

So, the user will not see the effects on watch callbacks being invoked or not, but they would be able to see the knock-on effects of those.

So yes, I agree with @dfawley in making the release note more applicable from the user's pov. But I feel the proposal from Doug made it sounds more scary than it actually is. So maybe?

xds: fixed a (rare|edge-case|corner-case) issue where some clients or servers would not initialize correctly if another channel or server with the same target was already in use

Updated the release notes as edge case. We can revisit during release again. Merging now

@purnesh42H purnesh42H merged commit 87f0254 into grpc:master Nov 21, 2024
15 checks passed
@dfawley
Copy link
Member

dfawley commented Nov 22, 2024

I'm not really concerned with the release notes sounding too scary, but they should be accurate. So it should clearly indicate that it's a "sometimes" not an "always". But "an edge-case issue" sounds awkward to me.

  • xds: fixed a bug causing clients or servers to hang at startup if the same target is already in use and some of its resources are not available.

?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants