bugfix: reconcile thinruntime failed when dataset is deleted #3300
Signed-off-by: wangshulin <[email protected]>
Signed-off-by: wangshulin <[email protected]>
Hi @wangshli. Thanks for your PR. I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Codecov Report
@@            Coverage Diff             @@
##           master    #3300      +/-   ##
==========================================
+ Coverage   65.51%   65.55%   +0.03%
==========================================
  Files         399      399
  Lines       23198    23196       -2
==========================================
+ Hits        15198    15205       +7
+ Misses       6215     6212       -3
+ Partials     1785     1779       -6
@@ -87,17 +87,17 @@ func (r *RuntimeReconciler) ReconcileInternal(ctx cruntime.ReconcileRequestConte
		return utils.RequeueIfError(errors.Wrap(err, "Failed to create"))
	}

	// 2.Get or create the engine
	engine, err := r.implement.GetOrCreateEngine(ctx)
	// 2.Get the ObjectMeta of runtime
Could you add a comment explaining why the order of steps 2 and 3 was changed?
The order was changed so that we could tell when GetOrCreateEngine failed because the runtime has a deletionTimestamp. In that case we should ignore the GetOrCreateEngine error and continue to reconcile the runtime deletion, but that would leave engine as a nil pointer. We have since resolved this inside GetOrCreateEngine, so it now returns an engine even when it cannot get the mounted dataset. The order therefore no longer needs to change, and I will fix it.
Signed-off-by: wangshulin <[email protected]>
pkg/ddc/base/dataset_test.go
Outdated
@@ -79,7 +79,7 @@ func TestGetMountedDatasetNamespacedName(t *testing.T) {
		},
	}
	for _, tt := range tests {
		if got := GetMountedDatasetNamespacedName(tt.virtualDataset); len(got) != tt.want {
		if got := GetMountedDatasetNamespacedName(tt.virtualDataset.Spec.Mounts); len(got) != tt.want {
How about renaming the function to GetPhysicalDatasetFromMounts?
ok
pkg/ddc/thin/engine.go
Outdated
@@ -117,10 +117,21 @@ func Precheck(client client.Client, key types.NamespacedName) (found bool, err e
func CheckReferenceDatasetRuntime(client client.Client, runtime *datav1alpha1.ThinRuntime) (bool, error) {
	dataset, err := utils.GetDataset(client, runtime.Name, runtime.Namespace)
	if err != nil {
		return false, err
		if utils.IgnoreNotFound(err) == nil && runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {
I think we should make it work even if the virtualDataset has already been deleted.
Normally, the virtualDataset would not be deleted, because its reference runtime has not been cleaned up yet.
But it can happen when the virtualDataset is force-deleted. How should we handle that case?
fixed
/test fluid-e2e
Signed-off-by: wangshulin <[email protected]>
Signed-off-by: wangshulin <[email protected]>
pkg/ddc/thin/engine.go
Outdated
	var mounted []types.NamespacedName
	if dataset != nil {
		// getMountedDataset from dataset first
		mounted = base.GetPhysicalDatasetFromMounts(dataset.Spec.Mounts)
I suggest adding more logging info for debugging.
Done.
	} else if runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {
		// then try to getMountedDataset from runtime
		mounted = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
	}
What will happen if the dataset is not found and the length of the runtime mounts is 0? How will the user handle this situation?
This case will be guarded by checking for existing reference datasets before any physical dataset is removed. That can be done in the next PR.
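The fallback discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the Mount, Dataset and RuntimeStatus types, the physicalDatasets and resolveMounted helpers, and the assumption that a reference mount uses a dataset://namespace/name mount point are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the real API types.
type Mount struct{ MountPoint string }
type Dataset struct{ Mounts []Mount }
type RuntimeStatus struct{ Mounts []Mount }

// datasetScheme is the assumed prefix marking a mount that references another dataset.
const datasetScheme = "dataset://"

// physicalDatasets returns the "namespace/name" part of every mount
// whose mount point starts with the dataset:// scheme.
func physicalDatasets(mounts []Mount) []string {
	var refs []string
	for _, m := range mounts {
		if strings.HasPrefix(m.MountPoint, datasetScheme) {
			refs = append(refs, strings.TrimPrefix(m.MountPoint, datasetScheme))
		}
	}
	return refs
}

// resolveMounted prefers the dataset spec and falls back to the mounts
// recorded in the runtime status when the dataset no longer exists.
func resolveMounted(dataset *Dataset, status RuntimeStatus) []string {
	if dataset != nil {
		return physicalDatasets(dataset.Mounts)
	}
	return physicalDatasets(status.Mounts)
}

func main() {
	status := RuntimeStatus{Mounts: []Mount{{MountPoint: "dataset://default/phy"}}}
	// The dataset has been deleted, so the runtime status is consulted.
	fmt.Println(resolveMounted(nil, status))
	// Neither source has any mounts: the open case raised in the review.
	fmt.Println(len(resolveMounted(nil, RuntimeStatus{})))
}
```

When both sources are empty the result is empty as well, which is the situation the reviewer asks about and the author defers to a follow-up PR.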
pkg/ddc/thin/engine.go
Outdated
		mounted = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
	}
	// not mount other datasets
	if len(mounted) == 0 {
		return false, nil
	}

	// patch runtime with reference annotation
	_, err = PatchReferenceThinRuntimeAnnotation(ctx.Client, runtime)
I suggest moving PatchReferenceThinRuntimeAnnotation to another PR.
OK
Signed-off-by: wangshulin <[email protected]>
/test fluid-e2e
pkg/ddc/thin/engine.go
Outdated
func CheckReferenceDatasetRuntime(ctx cruntime.ReconcileRequestContext, runtime *datav1alpha1.ThinRuntime) (bool, error) {
	dataset, err := utils.GetDataset(ctx.Client, runtime.Name, runtime.Namespace)
	if err != nil && utils.IgnoreNotFound(err) != nil {
		// ignore dataset not found err and try to get mounted dataset from runtime
I think this comment should be moved below. This branch is the case where the error is not ignored.
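To make the reviewer's point concrete, here is a small sketch of the two branches. It assumes an ignoreNotFound helper that returns nil for a not-found error and the original error otherwise, which is how utils.IgnoreNotFound appears to be used in the hunk above. The sentinel errors and the handleLookupError function are hypothetical names for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for real API errors.
var errNotFound = errors.New("dataset not found")
var errOther = errors.New("connection refused")

// ignoreNotFound returns nil for a not-found error and the error itself otherwise.
func ignoreNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// handleLookupError separates the two cases from the review comment.
func handleLookupError(err error) (useRuntimeStatus bool, fatal error) {
	if err != nil && ignoreNotFound(err) != nil {
		// This branch does NOT ignore the error: anything other than
		// "not found" is returned to the caller.
		return false, err
	}
	// Here the error is nil or "not found". Only in the latter case
	// do we fall back to the mounts recorded in the runtime status.
	return err != nil, nil
}

func main() {
	for _, err := range []error{nil, errNotFound, errOther} {
		fallback, fatal := handleLookupError(err)
		fmt.Printf("err=%v fallback=%v fatal=%v\n", err, fallback, fatal)
	}
}
```

The "ignore dataset not found" comment therefore fits the code after the if block, not the block body.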
	if dataset != nil {
		// get mountedRuntimeInfo from dataset first
		mountedNameSpacedNames = base.GetPhysicalDatasetFromMounts(dataset.Spec.Mounts)
	} else if runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {
No need to check runtime.Status.Mounts != nil here, because the len of a nil slice is 0. We can remove runtime.Status.Mounts != nil to avoid redundant code.
Done.
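As a quick check of the nil-slice point, the following standalone sketch shows that the length test alone covers both the nil and the empty case. Mount and hasMounts are hypothetical names, not part of the PR.

```go
package main

import "fmt"

// Mount is a hypothetical stand-in for the real mount type.
type Mount struct {
	MountPoint string
}

// hasMounts reports whether any mounts are recorded.
// The len of a nil slice is 0, so no separate nil check is needed.
func hasMounts(mounts []Mount) bool {
	return len(mounts) != 0
}

func main() {
	var nilMounts []Mount
	fmt.Println(hasMounts(nilMounts))                             // nil slice
	fmt.Println(hasMounts([]Mount{}))                             // empty, non-nil slice
	fmt.Println(hasMounts([]Mount{{MountPoint: "dataset://a/b"}})) // one mount
}
```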
Signed-off-by: wangshulin <[email protected]>
}

// getMountedRuntimeInfo get mountedRuntimeInfo from dataset.
// If could not get dataset, getMountedRuntimeInfo try to get mountedRuntimeInfo from runtime status.
func (e *ReferenceDatasetEngine) getMountedRuntimeInfo() (base.RuntimeInfoInterface, error) {
	if e.mountedRuntimeInfo != nil {
Please add a comment: // If already have mountedRuntimeInfo, return it directly
Signed-off-by: wangshulin <[email protected]>
…ncedataset Signed-off-by: wangshulin <[email protected]>
Signed-off-by: wangshulin <[email protected]>
Signed-off-by: wangshulin <[email protected]>
/test fluid-e2e
pkg/ddc/base/dataset_test.go
Outdated
@@ -260,7 +260,7 @@ func TestGetMountedDatasetSubPath(t *testing.T) {
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := GetMountedDatasetSubPath(tt.args.dataset); !reflect.DeepEqual(got, tt.want) {
			if got := GetPhysicalDatasetSubPath(tt.args.dataset); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("GetMountedDatasetSubPath() = %v, want %v", got, tt.want)
Please also fix the error message in t.Errorf and the test function's name.
t.Errorf("GetMountedDatasetSubPath() = %v, want %v", got, tt.want)
t.Errorf("GetPhysicalDatasetSubPath() = %v, want %v", got, tt.want)
physicalDataset = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
dataset, err := utils.GetDataset(ctx.Client, runtime.Name, runtime.Namespace)
if err != nil {
	return false, err
We should not return every error here. When len(runtime.Status.Mounts) == 0 and the Dataset is not found, the function returns an error and engine building keeps failing.
In this case, CheckReferenceDatasetRuntime can't judge whether this dataset is a reference dataset, so we return the error for now. This case will be fixed in the next PR.
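One reading of the behaviour described in this exchange is sketched below. It assumes the runtime status is consulted first and the dataset is only looked up when the status has no reference mounts; the hunk does not show that condition, so treat it as a guess. isReferenceRuntime, missingDataset and errDatasetNotFound are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errDatasetNotFound is a hypothetical sentinel standing in for a NotFound API error.
var errDatasetNotFound = errors.New("dataset not found")

// missingDataset simulates a lookup for a dataset that has been deleted.
func missingDataset() ([]string, error) {
	return nil, errDatasetNotFound
}

// isReferenceRuntime decides whether a runtime references physical datasets.
// statusRefs are the references recorded in the runtime status;
// lookupSpecRefs fetches the references from the dataset spec.
func isReferenceRuntime(statusRefs []string, lookupSpecRefs func() ([]string, error)) (bool, error) {
	if len(statusRefs) != 0 {
		// The status alone is enough, even if the dataset is gone.
		return true, nil
	}
	specRefs, err := lookupSpecRefs()
	if err != nil {
		// Neither source can answer the question, so the error is raised.
		return false, err
	}
	return len(specRefs) != 0, nil
}

func main() {
	isRef, err := isReferenceRuntime([]string{"default/phy"}, missingDataset)
	fmt.Println(isRef, err)
	isRef, err = isReferenceRuntime(nil, missingDataset)
	fmt.Println(isRef, err)
}
```

The second call is the corner case from the review: no mounts in the status and no dataset, so the function cannot classify the runtime and reports the lookup error.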
newDataset := mountedDataset.DeepCopy()
newDataset.Status.DatasetRef = utils.RemoveString(newDataset.Status.DatasetRef, datasetRefName)
err := e.Client.Status().Update(context.TODO(), newDataset)
if physicalRuntimeInfo != nil {
Please add a log message for the physicalRuntimeInfo == nil case so that we know when this corner case happens.
Signed-off-by: wangshulin <[email protected]>
@@ -61,7 +61,7 @@ func Build(id string, ctx cruntime.ReconcileRequestContext) (base.Engine, error)
		return nil, fmt.Errorf("engine %s is failed due to type conversion", ctx.Name)
	}

	isRef, err := CheckReferenceDatasetRuntime(ctx.Client, runtime)
	isRef, err := CheckReferenceDatasetRuntime(ctx, runtime)
Maybe in the future we can simply check len(runtime.profileName) == 0 to tell whether it is a VirtualRuntime or a ThinRuntime, instead of checking all the dataset mounts.
/test fluid-e2e
Signed-off-by: wangshulin <[email protected]>
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cheyang, TrafalgarZZZ
Ⅰ. Describe what this PR does
bugfix: reconcile thinruntime failed when dataset is deleted
Ⅱ. Does this pull request fix one issue?
fixes #3295
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews