
Added tracking for deleted namespace status check in restore flow #8233

Merged — 4 commits merged into vmware-tanzu:main from fixrestore_ns on Nov 18, 2024

Conversation

@sangitaray2021 (Contributor) commented Sep 20, 2024

Thank you for contributing to Velero!

Please add a summary of your change

Does your change fix a particular issue?

Fixes #8234

Please indicate you've done the following:

@anshulahuja98 (Collaborator)

For #8234

anshulahuja98 self-requested a review September 20, 2024 09:42
codecov bot commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 89.79592% with 5 lines in your changes missing coverage. Please review.

Project coverage is 58.98%. Comparing base (3f9c2dc) to head (6a5e8c2).
Report is 102 commits behind head on main.

Files with missing lines | Patch % | Lines
pkg/util/kube/resource_deletionstatus_tracker.go | 76.19% | 5 Missing ⚠️
@@            Coverage Diff             @@
##             main    #8233      +/-   ##
==========================================
- Coverage   59.15%   58.98%   -0.18%     
==========================================
  Files         367      368       +1     
  Lines       30777    38929    +8152     
==========================================
+ Hits        18206    22962    +4756     
- Misses      11113    14505    +3392     
- Partials     1458     1462       +4     


@kaovilai (Member) left a comment

Please add changelog and run linter.

@reasonerjt (Contributor)

Given we've reached FC and this may impact the restore flow, let's hold this one until the branch is cut for v1.15

sangitaray2021 force-pushed the fixrestore_ns branch 3 times, most recently from 72d3b56 to c63dd9b on September 26, 2024 07:28
fixed unittest
Signed-off-by: sangitaray2021 <[email protected]>

refactored tracker execution and caller
Signed-off-by: sangitaray2021 <[email protected]>

added change log
Signed-off-by: sangitaray2021 <[email protected]>
@anshulahuja98 (Collaborator)

@kaovilai can you help review the PR?

@kaovilai (Member) left a comment

IIUC from reviewing, this change reduces the potential duplicate waits on whether a namespace exists (10 minutes of polling each) to just one instance per restore. Is that correct?

It's not meant to add polling until deletion is complete in order to resume some other process?

@sangitaray2021 (Contributor, Author)

IIUC from reviewing, this change reduces the potential duplicate waits on whether a namespace exists (10 minutes of polling each) to just one instance per restore. Is that correct?

It's not meant to add polling until deletion is complete in order to resume some other process?

Yes, that's correct. We are reducing the duplicate waits by caching the namespaces that are under deletion.
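
For illustration, a minimal sketch of what such a namespace deletion-status tracker could look like: a mutex-protected set keyed by namespace name. The type and method names below are hypothetical placeholders, not the actual contents of pkg/util/kube/resource_deletionstatus_tracker.go.

```go
package kube

import "sync"

// namespaceDeletionTracker is a hypothetical sketch of the idea discussed
// above: remember which namespaces have already been seen stuck in the
// terminating state so later resources in the same restore can skip the wait.
type namespaceDeletionTracker struct {
	lock        sync.RWMutex
	terminating map[string]bool // keyed by namespace name
}

func newNamespaceDeletionTracker() *namespaceDeletionTracker {
	return &namespaceDeletionTracker{terminating: map[string]bool{}}
}

// Add records that the namespace hit the deletion/terminating timeout.
func (t *namespaceDeletionTracker) Add(ns string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.terminating[ns] = true
}

// Contains reports whether the namespace was already seen as terminating.
func (t *namespaceDeletionTracker) Contains(ns string) bool {
	t.lock.RLock()
	defer t.lock.RUnlock()
	return t.terminating[ns]
}
```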

Signed-off-by: sangitaray2021 <[email protected]>
@kaovilai (Member) left a comment

lgtm

Review thread on pkg/util/kube/resource_deletionstatus_tracker.go (outdated, resolved)
Signed-off-by: sangitaray2021 <[email protected]>
@anshulahuja98 (Collaborator)

@reasonerjt can you review this PR now given 1.15 is cut?

@reasonerjt (Contributor) commented Nov 15, 2024

@anshulahuja98 I don't quite understand the background of the problem. I'm asking Xun @blackpiglet to take a look, as he has had discussions around this topic.

How often do you see a namespace being deleted while Velero is trying to restore an object into that namespace?

@blackpiglet (Contributor)

Per my understanding, this PR tracks namespaces hanging in the terminating state.
If a namespace hits the timeout during the restore, it is added to the tracker and its flag is marked true.
If the same timeout issue is hit for that namespace a second time, the restore is marked as failed.

IMO, canceling the ongoing restore is the ideal solution for this issue, but that obviously needs more effort, so I think this is also an acceptable solution for now.
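
To make that two-step behaviour concrete, here is a rough caller-side sketch that builds on the hypothetical tracker above. The function name ensureNamespaceReady, the injected wait callback, and the 10-minute timeout are illustrative assumptions, not the exact Velero implementation.

```go
package kube

import (
	"fmt"
	"time"
)

// ensureNamespaceReady illustrates the flow described above: consult the
// tracker before waiting, and record a timeout so the same namespace is not
// waited on again during the same restore.
func ensureNamespaceReady(
	ns string,
	tracker *namespaceDeletionTracker,
	waitForNamespaceReady func(ns string, timeout time.Duration) error,
) error {
	// A previous resource already saw this namespace stuck terminating:
	// fail fast instead of polling for another 10 minutes.
	if tracker.Contains(ns) {
		return fmt.Errorf("namespace %s is still terminating", ns)
	}

	// The first resource that touches this namespace pays the (up to)
	// 10-minute wait; on timeout the namespace is remembered as terminating.
	if err := waitForNamespaceReady(ns, 10*time.Minute); err != nil {
		tracker.Add(ns)
		return err
	}
	return nil
}
```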

@anshulahuja98 (Collaborator) commented Nov 15, 2024

@reasonerjt

We have seen instances of this happening for our customers.
The issue is that if, say, you have 100 resources to restore in one namespace, the current behaviour makes the code wait 100 × 10 minutes, which is not ideal in any way.
With this fix we will only wait 10 minutes per namespace.

@anshulahuja98 (Collaborator)

@blackpiglet can you sign off if the PR looks good?

anshulahuja98 merged commit 74790d9 into vmware-tanzu:main on Nov 18, 2024
44 of 45 checks passed
Development

Successfully merging this pull request may close these issues.

Terminating namespace polling for each resource during restore
5 participants