bugfix: reconcile thinruntime failed when dataset is deleted #3300

wangshli · 2023-06-21T06:53:06Z

Ⅰ. Describe what this PR does

bugfix: reconcile thinruntime failed when dataset is deleted

Ⅱ. Does this pull request fix one issue?

fixes #3295

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Signed-off-by: wangshulin <[email protected]>

fluid-e2e-bot · 2023-06-21T06:53:23Z

Hi @wangshli. Thanks for your PR.

I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

codecov · 2023-06-21T07:11:17Z

Codecov Report

Merging #3300 (1adf649) into master (3fe30e2) will increase coverage by 0.03%.
The diff coverage is 67.22%.

❗ Current head 1adf649 differs from pull request most recent head a6a2f41. Consider uploading reports for the commit a6a2f41 to get more accurate results

@@            Coverage Diff             @@
##           master    #3300      +/-   ##
==========================================
+ Coverage   65.51%   65.55%   +0.03%     
==========================================
  Files         399      399              
  Lines       23198    23196       -2     
==========================================
+ Hits        15198    15205       +7     
+ Misses       6215     6212       -3     
+ Partials     1785     1779       -6

Impacted Files	Coverage Δ
pkg/ddc/juicefs/data_migrate.go	`67.96% <ø> (+1.43%)`	⬆️
pkg/ddc/juicefs/transform_fuse.go	`78.29% <ø> (+1.21%)`	⬆️
pkg/ddc/juicefs/utils.go	`78.71% <ø> (-0.72%)`	⬇️
pkg/ddc/thin/referencedataset/cm.go	`46.83% <45.83%> (ø)`
pkg/ddc/thin/referencedataset/engine.go	`50.00% <58.82%> (-2.95%)`	⬇️
pkg/ddc/thin/referencedataset/runtime.go	`68.18% <64.70%> (+13.07%)`	⬆️
pkg/ddc/base/dataset.go	`100.00% <100.00%> (ø)`
pkg/ddc/thin/engine.go	`86.79% <100.00%> (+1.68%)`	⬆️
pkg/ddc/thin/referencedataset/sync.go	`60.00% <100.00%> (+2.46%)`	⬆️
pkg/ddc/thin/referencedataset/volume.go	`65.09% <100.00%> (ø)`

cheyang · 2023-06-21T09:08:21Z

pkg/controllers/runtime_controller.go

@@ -87,17 +87,17 @@ func (r *RuntimeReconciler) ReconcileInternal(ctx cruntime.ReconcileRequestConte
 		return utils.RequeueIfError(errors.Wrap(err, "Failed to create"))
 	}

-	// 2.Get or create the engine
-	engine, err := r.implement.GetOrCreateEngine(ctx)
+	// 2.Get the ObjectMeta of runtime


Could you add a comment explaining why the order of steps 2 and 3 was changed?

The order was changed so that we could tell when GetOrCreateEngine failed because the runtime has a deletionTimestamp. In that case we should ignore the GetOrCreateEngine error and continue to reconcile the runtime deletion, but doing so would leave engine as a nil pointer. We have since resolved this problem inside GetOrCreateEngine, so it now returns an engine even when it cannot get the mounted dataset. The order therefore no longer needs to change, and I will revert it.

Signed-off-by: wangshulin <[email protected]>

cheyang · 2023-06-21T11:01:08Z

pkg/ddc/base/dataset_test.go

@@ -79,7 +79,7 @@ func TestGetMountedDatasetNamespacedName(t *testing.T) {
 		},
 	}
 	for _, tt := range tests {
-		if got := GetMountedDatasetNamespacedName(tt.virtualDataset); len(got) != tt.want {
+		if got := GetMountedDatasetNamespacedName(tt.virtualDataset.Spec.Mounts); len(got) != tt.want {


How about renaming the function to GetPhysicalDatasetFromMounts?

cheyang · 2023-06-21T11:07:20Z

pkg/ddc/thin/engine.go

@@ -117,10 +117,21 @@ func Precheck(client client.Client, key types.NamespacedName) (found bool, err e
 func CheckReferenceDatasetRuntime(client client.Client, runtime *datav1alpha1.ThinRuntime) (bool, error) {
 	dataset, err := utils.GetDataset(client, runtime.Name, runtime.Namespace)
 	if err != nil {
-		return false, err
+		if utils.IgnoreNotFound(err) == nil && runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {


I think we should make it work even if the virtualDataset is already deleted.

Normally, the virtualDataset would not be deleted because its reference runtime has not been cleaned up.

But it can happen when the virtualDataset is force-deleted. How should we handle that case?

cheyang · 2023-06-25T01:49:18Z

/test fluid-e2e

Signed-off-by: wangshulin <[email protected]>

cheyang · 2023-06-25T04:05:49Z

pkg/ddc/thin/engine.go

+	var mounted []types.NamespacedName
+	if dataset != nil {
+		// getMountedDataset from dataset first
+		mounted = base.GetPhysicalDatasetFromMounts(dataset.Spec.Mounts)


I suggest adding more logging info for debugging.

cheyang · 2023-06-25T04:09:45Z

pkg/ddc/thin/engine.go

+	} else if runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {
+		// then try to getMountedDataset from runtime
+		mounted = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
+	}


What will happen if the dataset is not found and the length of the runtime mounts is 0? How will the user handle this situation?

This case will be guarded by checking for the existence of reference datasets before removing any physical dataset. This can be done in the next PR.
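The fallback order discussed in this thread can be sketched as below. This is a minimal, self-contained illustration and not the code in the PR: `Mount`, `Dataset`, `NamespacedName`, `physicalDatasetsFromMounts` and `resolvePhysicalDatasets` are hypothetical stand-ins, and the `dataset://<namespace>/<name>` mount format is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount, Dataset and NamespacedName are simplified stand-ins for the Fluid
// and Kubernetes types; they are assumptions for illustration only.
type Mount struct{ MountPoint string }

type Dataset struct{ SpecMounts []Mount }

type NamespacedName struct{ Namespace, Name string }

// physicalDatasetsFromMounts is a hypothetical stand-in for
// base.GetPhysicalDatasetFromMounts. It assumes a reference mount looks like
// "dataset://<namespace>/<name>" and skips every other mount.
func physicalDatasetsFromMounts(mounts []Mount) []NamespacedName {
	const scheme = "dataset://"
	var refs []NamespacedName
	for _, m := range mounts {
		if !strings.HasPrefix(m.MountPoint, scheme) {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(m.MountPoint, scheme), "/", 2)
		if len(parts) == 2 && parts[0] != "" && parts[1] != "" {
			refs = append(refs, NamespacedName{Namespace: parts[0], Name: parts[1]})
		}
	}
	return refs
}

// resolvePhysicalDatasets reads the dataset's spec mounts when the dataset
// still exists (non-nil) and otherwise falls back to the mounts recorded in
// the runtime status.
func resolvePhysicalDatasets(dataset *Dataset, runtimeStatusMounts []Mount) []NamespacedName {
	if dataset != nil {
		return physicalDatasetsFromMounts(dataset.SpecMounts)
	}
	return physicalDatasetsFromMounts(runtimeStatusMounts)
}

func main() {
	status := []Mount{{MountPoint: "dataset://default/phy"}}

	// Dataset deleted: the runtime status still knows the physical dataset.
	fmt.Println(resolvePhysicalDatasets(nil, status))

	// Dataset deleted and no mounts in the status: nothing can be resolved.
	fmt.Println(len(resolvePhysicalDatasets(nil, nil)))
}
```

The last call is the case raised in the question above: with neither source available the result is empty, so the caller cannot tell whether the runtime referenced another dataset.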

cheyang · 2023-06-26T02:11:21Z

pkg/ddc/thin/engine.go

 		mounted = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
 	}
 	// not mount other datasets
 	if len(mounted) == 0 {
 		return false, nil
 	}

+	// patch runtime with reference annotation
+	_, err = PatchReferenceThinRuntimeAnnotation(ctx.Client, runtime)


I suggest moving PatchReferenceThinRuntimeAnnotation to another PR.

Signed-off-by: wangshulin <[email protected]>

cheyang · 2023-06-26T06:50:09Z

/test fluid-e2e

TrafalgarZZZ · 2023-06-26T07:58:43Z

pkg/ddc/thin/engine.go

+	} else if runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {
+		// then try to getMountedDataset from runtime
+		mounted = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
+	}


This case will be guarded by checking for the existence of reference datasets before removing any physical dataset. This can be done in the next PR.

TrafalgarZZZ · 2023-06-26T09:14:27Z

pkg/ddc/thin/engine.go

+func CheckReferenceDatasetRuntime(ctx cruntime.ReconcileRequestContext, runtime *datav1alpha1.ThinRuntime) (bool, error) {
+	dataset, err := utils.GetDataset(ctx.Client, runtime.Name, runtime.Namespace)
+	if err != nil && utils.IgnoreNotFound(err) != nil {
+		// ignore dataset not found err and try to get mounted dataset from runtime


I think the comment should be moved below. This branch is the case where the error is not ignored.

TrafalgarZZZ · 2023-06-26T09:16:15Z

pkg/ddc/thin/referencedataset/runtime.go

+	if dataset != nil {
+		// get mountedRuntimeInfo from dataset first
+		mountedNameSpacedNames = base.GetPhysicalDatasetFromMounts(dataset.Spec.Mounts)
+	} else if runtime.Status.Mounts != nil && len(runtime.Status.Mounts) != 0 {


No need to check runtime.Status.Mounts != nil here because the len of a nil slice is 0. We can remove runtime.Status.Mounts != nil to avoid the redundancy.
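A quick, standalone check of that point (the `Mount` type and `hasMounts` helper are placeholders, not Fluid's types):

```go
package main

import "fmt"

// Mount is a placeholder element type for the example.
type Mount struct{ MountPoint string }

// hasMounts needs no nil check: len works on nil slices and reports zero.
func hasMounts(mounts []Mount) bool {
	return len(mounts) != 0
}

func main() {
	var nilMounts []Mount   // nil slice
	emptyMounts := []Mount{} // non-nil, empty slice

	fmt.Println(nilMounts == nil, len(nilMounts), hasMounts(nilMounts))
	fmt.Println(emptyMounts == nil, len(emptyMounts), hasMounts(emptyMounts))
	fmt.Println(hasMounts([]Mount{{MountPoint: "dataset://default/phy"}}))
}
```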

Signed-off-by: wangshulin <[email protected]>

cheyang · 2023-06-27T09:33:46Z

pkg/ddc/thin/referencedataset/runtime.go

 }

+// getMountedRuntimeInfo get mountedRuntimeInfo from dataset.
+// If could not get dataset, getMountedRuntimeInfo try to get mountedRuntimeInfo from runtime status.
 func (e *ReferenceDatasetEngine) getMountedRuntimeInfo() (base.RuntimeInfoInterface, error) {
 	if e.mountedRuntimeInfo != nil {


Please add comment: // If already have mountedRuntimeInfo, return it directly

Signed-off-by: wangshulin <[email protected]>

…ncedataset Signed-off-by: wangshulin <[email protected]>

Signed-off-by: wangshulin <[email protected]>

cheyang · 2023-06-29T01:54:44Z

/test fluid-e2e

cheyang · 2023-06-29T01:56:45Z

/test fluid-e2e

TrafalgarZZZ · 2023-06-29T03:49:19Z

pkg/ddc/base/dataset_test.go

@@ -260,7 +260,7 @@ func TestGetMountedDatasetSubPath(t *testing.T) {
 	}
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
-			if got := GetMountedDatasetSubPath(tt.args.dataset); !reflect.DeepEqual(got, tt.want) {
+			if got := GetPhysicalDatasetSubPath(tt.args.dataset); !reflect.DeepEqual(got, tt.want) {
 				t.Errorf("GetMountedDatasetSubPath() = %v, want %v", got, tt.want)


Please also fix the error message in t.Errorf and the test function's name.

Suggested change

-	t.Errorf("GetMountedDatasetSubPath() = %v, want %v", got, tt.want)
+	t.Errorf("GetPhysicalDatasetSubPath() = %v, want %v", got, tt.want)

TrafalgarZZZ · 2023-06-29T04:11:30Z

pkg/ddc/thin/engine.go

-		physicalDataset = base.GetPhysicalDatasetFromMounts(runtime.Status.Mounts)
+	dataset, err := utils.GetDataset(ctx.Client, runtime.Name, runtime.Namespace)
+	if err != nil {
+		return false, err


We should not return every error here: when len(runtime.Status.Mounts) == 0 and the Dataset is not found, the function returns an error and engine building keeps failing.

In this case, CheckReferenceDatasetRuntime can't determine whether this dataset is a reference dataset, so we raise the error for now. This case will be fixed in the next PR.

TrafalgarZZZ · 2023-06-29T04:20:32Z

pkg/ddc/thin/referencedataset/engine.go

-		newDataset := mountedDataset.DeepCopy()
-		newDataset.Status.DatasetRef = utils.RemoveString(newDataset.Status.DatasetRef, datasetRefName)
-		err := e.Client.Status().Update(context.TODO(), newDataset)
+	if physicalRuntimeInfo != nil {


Please add a log message for the physicalRuntimeInfo == nil case so that we know when this corner case happens.
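A rough sketch of what such a guard with logging could look like. The `runtimeInfo` type, the `cleanupDatasetRef` function and the injected `logf` are all hypothetical; the real engine uses its own logger and status update call.

```go
package main

import (
	"fmt"
	"log"
)

// runtimeInfo is a placeholder for the physical runtime's info object.
type runtimeInfo struct{ name string }

// cleanupDatasetRef skips the cleanup and logs when no physical runtime info
// is available, so the corner case is visible in the logs. It reports
// whether the cleanup actually ran.
func cleanupDatasetRef(info *runtimeInfo, logf func(format string, args ...interface{})) bool {
	if info == nil {
		logf("physicalRuntimeInfo is nil, skipping DatasetRef cleanup")
		return false
	}
	// The real code would remove this dataset from the physical dataset's
	// DatasetRef list here.
	logf("removing DatasetRef from physical dataset of runtime %s", info.name)
	return true
}

func main() {
	fmt.Println(cleanupDatasetRef(nil, log.Printf))
	fmt.Println(cleanupDatasetRef(&runtimeInfo{name: "phy"}, log.Printf))
}
```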

Signed-off-by: wangshulin <[email protected]>

TrafalgarZZZ · 2023-06-29T06:04:01Z

pkg/ddc/thin/engine.go

@@ -61,7 +61,7 @@ func Build(id string, ctx cruntime.ReconcileRequestContext) (base.Engine, error)
 		return nil, fmt.Errorf("engine %s is failed due to type conversion", ctx.Name)
 	}

-	isRef, err := CheckReferenceDatasetRuntime(ctx.Client, runtime)
+	isRef, err := CheckReferenceDatasetRuntime(ctx, runtime)


Maybe in the future we can simply check len(runtime.profileName) == 0 to tell whether it is a VirtualRuntime or a ThinRuntime, instead of checking all the dataset mounts.

cheyang · 2023-06-29T08:03:11Z

/test fluid-e2e

Signed-off-by: wangshulin <[email protected]>

sonarqubecloud · 2023-06-29T08:31:43Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
3 Code Smells

No Coverage information
10.7% Duplication

TrafalgarZZZ

/lgtm

cheyang

/lgtm
/approve

fluid-e2e-bot · 2023-06-29T11:33:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang, TrafalgarZZZ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [TrafalgarZZZ,cheyang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wangshli added 2 commits June 21, 2023 14:48

bugfix: reconcile thinruntime failed when dataset is deleted

d8803fd

Signed-off-by: wangshulin <[email protected]>

fix checkDatasetMountSupport return statements

7c17839

Signed-off-by: wangshulin <[email protected]>

fluid-e2e-bot bot added the needs-ok-to-test label Jun 21, 2023

cheyang reviewed Jun 21, 2023

recover reconcileInternal order

b1502eb

Signed-off-by: wangshulin <[email protected]>

cheyang reviewed Jun 21, 2023

wangshli added 2 commits June 25, 2023 10:23

rename GetMountedDatasetNamespacedName to GetPhysicalDatasetFromMounts

bab8073

Signed-off-by: wangshulin <[email protected]>

fix CheckReferenceDatasetRuntime

d8b9502

Signed-off-by: wangshulin <[email protected]>

cheyang reviewed Jun 25, 2023

wangshli requested a review from cheyang June 25, 2023 09:01

cheyang reviewed Jun 26, 2023

add logs in CheckReferenceDatasetRuntime

be22993

Signed-off-by: wangshulin <[email protected]>

wangshli force-pushed the thin-reconcile branch from 194fcf0 to be22993 Compare June 26, 2023 02:16

TrafalgarZZZ requested changes Jun 26, 2023

fix getMountedRuntimeInfo return nil pointer

23cfbbb

Signed-off-by: wangshulin <[email protected]>

cheyang reviewed Jun 27, 2023

wangshli added 3 commits June 28, 2023 10:38

refactor getPhysicalRuntimeInfo

bab35e0

Signed-off-by: wangshulin <[email protected]>

rename mounted dataset/runtime to physical dataset/runtime for refere…

bfe7774

…ncedataset Signed-off-by: wangshulin <[email protected]>

add ut

0d84355

Signed-off-by: wangshulin <[email protected]>

wangshli requested a review from TrafalgarZZZ June 28, 2023 06:34

refactor CheckReferenceDatasetRuntime

21548bb

Signed-off-by: wangshulin <[email protected]>

TrafalgarZZZ requested changes Jun 29, 2023

Add logs

c3a4e09

Signed-off-by: wangshulin <[email protected]>

TrafalgarZZZ reviewed Jun 29, 2023

update log

a6a2f41

Signed-off-by: wangshulin <[email protected]>

TrafalgarZZZ reviewed Jun 29, 2023

fluid-e2e-bot bot assigned TrafalgarZZZ Jun 29, 2023

fluid-e2e-bot bot added the lgtm label Jun 29, 2023

cheyang approved these changes Jun 29, 2023

fluid-e2e-bot bot assigned cheyang Jun 29, 2023

fluid-e2e-bot bot added the approved label Jun 29, 2023

TrafalgarZZZ approved these changes Jun 29, 2023

fluid-e2e-bot bot merged commit b56249f into fluid-cloudnative:master Jun 29, 2023
