Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 migration-controller depending on cluster-manager condition. #328

Conversation

xuezhaojun
Copy link
Member

@xuezhaojun xuezhaojun commented Dec 6, 2023

Changes

  1. Extract parseStorageVersionMigrationFiles and put it at the beginning of the reconcile.
    1. motivation: the removeStorageVersionMigrations , applyStorageVersionMigrations, syncStorageVersionMigrationsCondition all have the parse files process, we should extract them into one place.
  2. Move supportStorageVersionMigration before removeStorageVersionMigrations
    1. motivation: the supported is the base of every StorageVersionMigration related process, it should be done at the beginning.
  3. Change applyStorageVersionMigrations to createStorageVersionMigrations , add CRD check and remove the update part
    1. motivation: If the current CRD storage version equals to the version base set in the storageversionmigration, then no need to create the svm instance, because if will still be restored using the same version.
    2. motivation: the storageversionmigration is a create-only job-style object, the version field update won’t trigger another round of migration.
  4. Remove testMigrationRequestFiles and any code locked with specific CRD types. Using foo and bar instead.
    1. motivation: We want to create test cases that won’t be affected by actual different migrationfiles cross releases.

Related issue(s)

Fixes #

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 114 lines in your changes are missing coverage. Please review.

Comparison is base (f89d535) 61.75% compared to head (e6bec14) 61.54%.
Report is 4 commits behind head on main.

Files Patch % Lines
...ollers/migrationcontroller/migration_controller.go 49.18% 56 Missing and 6 partials ⚠️
pkg/registration/hub/lease/clocksynccontroller.go 42.85% 47 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
- Coverage   61.75%   61.54%   -0.22%     
==========================================
  Files         132      133       +1     
  Lines       13992    14125     +133     
==========================================
+ Hits         8641     8693      +52     
- Misses       4585     4670      +85     
+ Partials      766      762       -4     
Flag Coverage Δ
unit 61.54% <46.47%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xuezhaojun
Copy link
Member Author

/assign @haoqing0110

@xuezhaojun
Copy link
Member Author

/assign @qiujian16

if storageVersion == migration.Spec.Resource.Version {
errs = append(errs, fmt.Errorf("the current storage version of %v is %v, which is the same as the version in the migration CR %v",
resourceToCRDName(migration.Spec.Resource.Resource, migration.Spec.Resource.Group),
storageVersion, migration.Name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we continue here, how does the controller get triggered after crd is updated later?

Copy link
Member Author

@xuezhaojun xuezhaojun Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point of time, the migration instance is not created yet.

The errs = append(errs, fmt.Errorf("the current storage version of %v is %v, which is the same as the version in the migration CR %v", resourceToCRDName(migration.Spec.Resource.Resource, migration.Spec.Resource.Group), storageVersion, migration.Name)) makes sure this round of reconcile will return error and fail.

And then the controller will requene the request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking you would want to requeue instead of error fallback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, sperate another checkCRDStorageVersion function and requeue if:

  1. CRD doens't exist
  2. CRD storage version is not correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I think we probably should requeue under these conditions, or we need to watch crds.

@xuezhaojun xuezhaojun force-pushed the fix-migration-controller branch from c539e80 to aa6177e Compare December 7, 2023 12:10
err = checkCRDStorageVersion(ctx, migrations, apiExtensionClient)
if err != nil {
klog.Errorf("Failed to check CRD current storage version. %v", err)
controllerContext.Queue().Add(clusterManagerName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddAfter for a certain period. You would not want to requeue immediately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, add a reSyncTime equals to 5 seconds.

@xuezhaojun xuezhaojun force-pushed the fix-migration-controller branch from aa6177e to dfa9384 Compare December 7, 2023 15:01
@xuezhaojun xuezhaojun closed this Dec 7, 2023
@xuezhaojun xuezhaojun reopened this Dec 7, 2023
@xuezhaojun xuezhaojun closed this Dec 8, 2023
@xuezhaojun xuezhaojun reopened this Dec 8, 2023
@xuezhaojun xuezhaojun closed this Dec 8, 2023
@xuezhaojun xuezhaojun reopened this Dec 8, 2023
@xuezhaojun xuezhaojun requested a review from qiujian16 December 9, 2023 05:45
@qiujian16
Copy link
Member

/approve

@elgnay would you take a look pls?

Copy link
Contributor

openshift-ci bot commented Dec 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiujian16, xuezhaojun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xuezhaojun xuezhaojun force-pushed the fix-migration-controller branch from dfa9384 to e6bec14 Compare December 13, 2023 03:11
// 1.The CRD must exists before the migration CR is created.
// 2.The CRD must have at least 2 version.
// 3.The version set in the migration CR must exist in the CRD.
// 4.The currrent storage vesion in CRD should not be the version in the migration CR, otherwise during the migration, the
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires we following this practice in the following upgrade flow: when create a StorageVersionMigration, the version must be the "migration from" version, not the "migration to" version. @qiujian16

cc @elgnay

@xuezhaojun
Copy link
Member Author

xuezhaojun commented Dec 13, 2023

@elgnay PR update:

  1. Add more check steps for CRD: a. CRD must have at lease 2 version; b. the version in migration CR must be included in the CRD
  2. Update the condition when checkCRD or createMigration fails.
  3. Change to AddRateLimit.

@qiujian16 In the last offline discussion with YangLe, he mentioned the point that StorageVersionMigration doesn't limit the version in its spec must be the "migration from" version, the version in its spec could be any versions in the CRD.

That means we have to make it a convention of the upgrade flow, more details see: https://github.com/open-cluster-management-io/ocm/pull/328/files#r1424801861

@xuezhaojun
Copy link
Member Author

/assign @elgnay
Please take another review when you have time, thanks!

@elgnay
Copy link
Contributor

elgnay commented Dec 19, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 19, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit 93a9d19 into open-cluster-management-io:main Dec 19, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants