
[clone]refactor the clone action as we introduced external path #4844

Open

wants to merge 8 commits into base: master
Conversation

neuyilan
Member

@neuyilan neuyilan commented Jan 6, 2025

Purpose

https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management

Refactor the clone action now that we have introduced the external path.

I want to point out that regardless of where the source table's data is stored (warehouse path or external path), we always copy the data to the warehouse path of the target table.

If we kept using the source table's external path as the data path of the target table, the data of the source table and the target table would be mixed together.
What's your opinion?

Tests

Add CloneActionITCase.testCloneTableWithSourceTableExternalPath

API and Format

no

Documentation

@neuyilan neuyilan marked this pull request as draft January 6, 2025 13:06
@neuyilan neuyilan changed the title [clone]fix the clone when we introduced external path [clone]fix the clone action when we introduced external path Jan 6, 2025
@JingsongLi
Contributor

I feel that the current clone process needs to be refactored:

  1. Single parallelism: query all manifest files and copy the manifest list file.
  2. Read the manifests in distributed parallelism, determine whether each needs to be rewritten (with or without an external path), and complete the copy or rewrite of the manifest.
  3. Shuffle by the data file name.
  4. Copy the data files in distributed parallelism.

This hierarchical approach to copying is the correct solution.
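
To make the proposal concrete, here is a minimal sketch of that four-stage topology in Flink's DataStream API. All stage classes and the String payloads are hypothetical placeholders, not the PR's actual code:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical sketch of the four-stage clone topology described above.
final class CloneTopologySketch {

    static DataStream<String> build(DataStream<String> sourceTables) {
        return sourceTables
                // Stage 1 (single parallelism): query all manifest files and
                // copy the manifest list file.
                .process(new StageFn("query manifests, copy manifest list"))
                .setParallelism(1)
                // Stage 2 (distributed): read each manifest; rewrite it if it
                // references an external path, otherwise copy it as-is.
                .rebalance()
                .process(new StageFn("copy or rewrite manifest"))
                // Stage 3: shuffle by data file name.
                .keyBy(fileName -> fileName)
                // Stage 4 (distributed): copy the data files.
                .process(new KeyedStageFn("copy data file"));
    }

    // Placeholder stages; the real operators would perform file IO here.
    static final class StageFn extends ProcessFunction<String, String> {
        private final String stage;
        StageFn(String stage) { this.stage = stage; }
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            out.collect(value); // pass through; real work omitted
        }
    }

    static final class KeyedStageFn extends KeyedProcessFunction<String, String, String> {
        private final String stage;
        KeyedStageFn(String stage) { this.stage = stage; }
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            out.collect(value); // pass through; real work omitted
        }
    }
}
```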

@neuyilan
Member Author

neuyilan commented Jan 7, 2025

I feel that the current clone process needs to be refactored:

  1. Single parallelism: query all manifest files and copy the manifest list file.
  2. Read the manifests in distributed parallelism, determine whether each needs to be rewritten (with or without an external path), and complete the copy or rewrite of the manifest.
  3. Shuffle by the data file name.
  4. Copy the data files in distributed parallelism.

This hierarchical approach to copying is the correct solution.

Thanks for your advice, I will try my best to do this.

@neuyilan
Member Author

neuyilan commented Jan 7, 2025

[image: proposed Flink batch job topology]

Hi Jingsong, according to the original design [1] and the above discussion, I plan to refactor it into the following Flink batch job.

  1. The first stage is responsible for picking the tables that need to be cloned. If the database parameter is not passed, all tables of all databases will be cloned. If the table parameter is not passed, all tables of the database will be cloned. (Unchanged, the same as the original design; see the sketch after this comment.)
  2. The second stage picks the related files (Snapshot, Schema, ManifestList, Manifest, DataFile, ChangeLog, IndexFile) of the snapshot in the source table. (Unchanged, the same as the original design.)
  3. The third stage only copies the schema files to the target path. The schema files include Snapshot, Schema, ManifestList and IndexFile.
  4. The fourth stage mainly involves copying or rewriting the manifest files in distributed parallelism. If a manifest involves an external path, rewrite it; otherwise, copy it.
  5. Shuffle the data files by file name. (Data files include DataFile and ChangeLog.)
  6. The fifth stage copies the data files in distributed parallelism.
  7. Shuffle by the target table name to the next stage.
  8. The sixth stage recreates the snapshot hint file. (Unchanged, the same as the original design.)

Please help confirm whether this refactoring is appropriate. Thanks.

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Action+and+Procedure
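
For the stage-1 table selection above, here is a minimal sketch of the rules, assuming Paimon's Catalog API for listing databases and tables; treat the surrounding method as illustrative, not the PR's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.paimon.catalog.Catalog;

// Illustrative only: stage-1 rules for deciding which tables get cloned.
final class PickTablesSketch {
    static List<String> pickTables(Catalog catalog, String database, String table)
            throws Exception {
        List<String> result = new ArrayList<>();
        if (database == null) {
            // No database passed: clone all tables of all databases.
            for (String db : catalog.listDatabases()) {
                for (String t : catalog.listTables(db)) {
                    result.add(db + "." + t);
                }
            }
        } else if (table == null) {
            // Database passed but no table: clone all tables of that database.
            for (String t : catalog.listTables(database)) {
                result.add(database + "." + t);
            }
        } else {
            // Both passed: clone the single table.
            result.add(database + "." + table);
        }
        return result;
    }
}
```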

@JingsongLi
Contributor

Hi @neuyilan, thanks for your design!

For the second stage, I think we can just pick manifests. We don't need to pick files here.

@neuyilan
Member Author

neuyilan commented Jan 8, 2025

For the second stage, I think we can just pick manifests. We don't need to pick files here.

Hi @JingsongLi,
If we only pick the manifest files in the second stage, then for copying the Snapshot, Schema and IndexFile files, do you mean that we only pass a snapshot ID between upstream and downstream, pick the required files at each step, and then copy the corresponding files?

The original design was to pick out all the files first and then copy the corresponding files according to the file type at each step.

@JingsongLi
Contributor

For the second stage, I think we can just pick manifests. We don't need to pick files here.

Hi @JingsongLi, if we only pick the manifest files in the second stage, then for copying the Snapshot, Schema and IndexFile files, do you mean that we only pass a snapshot ID between upstream and downstream, pick the required files at each step, and then copy the corresponding files?

The original design was to pick out all the files first and then copy the corresponding files according to the file type at each step.

Yes, I think we can refactor it now.

@neuyilan
Member Author

neuyilan commented Jan 9, 2025

[image: refactored Flink batch job topology]

Hi @JingsongLi, thanks again for the advice. I have refactored it into the following Flink batch job; please review it again. Thanks.

  1. The first stage is responsible for picking the tables that need to be cloned. If the database parameter is not passed, all tables of all databases will be cloned. If the table parameter is not passed, all tables of the database will be cloned. (Unchanged, the same as the original design.)
  2. The second stage just picks the schema files and copies them to the target path. The schema files include Snapshot, Schema, ManifestList and IndexFile.
  3. The third stage just picks the manifest files in single parallelism.
  4. The fourth stage mainly involves copying or rewriting the manifest files in distributed parallelism. If a manifest involves an external path, rewrite it; otherwise, copy it. (See the sketch after this list.)
  5. The fifth stage picks all the data files in single parallelism. (Data files include DataFile and ChangeLog.)
  6. Shuffle the data files by file name.
  7. The sixth stage copies the data files in distributed parallelism.
  8. Shuffle by the target table name to the next stage.
  9. The seventh stage recreates the snapshot hint file. (Unchanged, the same as the original design.)
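
A minimal sketch of the stage-4 decision referenced in item 4 above; every type and helper here is a hypothetical stand-in for the operator's real logic:

```java
// Hypothetical sketch: rewrite a manifest whose entries use an external path,
// otherwise copy it verbatim.
final class ManifestCopySketch {

    record ManifestWork(String sourcePath, String targetPath, boolean hasExternalPath) {}

    static void copyOrRewrite(ManifestWork work) {
        if (work.hasExternalPath()) {
            // Entries point at the source table's external path: rewrite them so
            // the cloned files land under the target table's warehouse path.
            rewriteWithWarehousePath(work);
        } else {
            // No external path involved: a byte-for-byte copy is enough.
            copyVerbatim(work.sourcePath(), work.targetPath());
        }
    }

    static void rewriteWithWarehousePath(ManifestWork work) { /* placeholder */ }
    static void copyVerbatim(String from, String to) { /* placeholder */ }
}
```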

@wwj6591812
Contributor

@neuyilan
Thanks very much for preparing this PR.
I think changing the job topology like this is fine. And "picking the required files at each step, then copying the corresponding files" is not only clearer, but also improves scalability.
Only one small question: why do you emphasize that this refactor is only for the batch job? Why not modify the stream job's topology in the same way as the batch job?

@neuyilan
Member Author

Only one small question: why do you emphasize that this refactor is only for the batch job? Why not modify the stream job's topology in the same way as the batch job?

Hi @wwj6591812, thanks for the reminder; I had a misunderstanding before. After this modification, both the batch job and the stream job will be affected. Is that right?

@neuyilan neuyilan changed the title [clone]fix the clone action when we introduced external path [clone]refactor the clone action as we introduced external path Jan 10, 2025
@neuyilan neuyilan marked this pull request as ready for review January 10, 2025 03:48
@neuyilan
Member Author

@JingsongLi @wwj6591812 PTAL, Thanks.

@@ -484,6 +484,28 @@ public DataFileMeta copy(List<String> newExtraFiles) {
externalPath);
}

public DataFileMeta copy(String newExternalPath) {
Contributor

copy => newExternalPath(String newExternalPath)
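
For illustration, a simplified stand-in for DataFileMeta showing the suggested rename; the real class has many more fields:

```java
// Simplified sketch: a "wither"-style method named after the field it replaces,
// instead of an overloaded copy(...). Field set is abbreviated for illustration.
public final class DataFileMetaSketch {
    private final String fileName;
    private final String externalPath; // null when the file lives in the warehouse path

    public DataFileMetaSketch(String fileName, String externalPath) {
        this.fileName = fileName;
        this.externalPath = externalPath;
    }

    // newExternalPath(...) makes the intent explicit at the call site:
    // return a copy identical except for the external path.
    public DataFileMetaSketch newExternalPath(String newExternalPath) {
        return new DataFileMetaSketch(fileName, newExternalPath);
    }
}
```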

Member Author

done

@@ -53,10 +62,25 @@ public String getTargetIdentifier() {
return targetIdentifier;
}

@Nullable
public FileType getFileType() {
Contributor

Please remove this useless field.

Member Author

done

private final String sourceIdentifier;
private final String targetIdentifier;
private final long snapshotId;
Contributor

Where will this variable be used?

Member Author

This variable is used to pass the selected snapshot between upstream and downstream. For example, when the schema files are created, the latest snapshot ID is determined; when the data files are selected later, that same snapshot ID is used. During this process the latest snapshot may change, so if the snapshot ID were not passed along, the cloned schema files and data files might not come from the same snapshot.
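
A sketch of that flow, with simplified stand-in types rather than the PR's CloneFileInfo: the snapshot ID is resolved once upstream and reused by every later stage.

```java
// Illustrative only: pinning every stage of the clone job to one snapshot.
final class SnapshotPinningSketch {

    record CloneFileInfoLite(String sourceIdentifier, String targetIdentifier, long snapshotId) {}

    static CloneFileInfoLite pinSnapshot(String source, String target, long latestSnapshotId) {
        // Resolved exactly once, when the schema files are copied ...
        return new CloneFileInfoLite(source, target, latestSnapshotId);
    }

    static long snapshotToRead(CloneFileInfoLite info) {
        // ... and reused here when picking data files, so schema files and data
        // files always come from the same snapshot even if the source table has
        // committed new snapshots in the meantime.
        return info.snapshotId();
    }
}
```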

@neuyilan neuyilan requested a review from JingsongLi January 14, 2025 16:01
@neuyilan neuyilan closed this Jan 16, 2025
@neuyilan neuyilan reopened this Jan 16, 2025
@neuyilan neuyilan closed this Jan 17, 2025
@neuyilan neuyilan reopened this Jan 17, 2025
@@ -50,18 +50,19 @@
* Pick the files to be cloned of a table based on the input record. The record type it produce is
* CloneFileInfo that indicate the information of copy file.
*/
public class PickFilesForCloneOperator extends AbstractStreamOperator<CloneFileInfo>
public class PickSchemaFilesForCloneOperator extends AbstractStreamOperator<CloneFileInfo>
Contributor

CopyMetaFilesOperator (one parallelism)?

In this operator, just copy the meta files directly:

  1. Create the table if needed.
  2. Copy all schema files.
  3. Copy the snapshot file.
  4. Copy the manifest list files.
  5. Copy the index manifest file.
  6. Copy the statistics file.

Then, via one link, send the index files to a CopyIndexFilesOperator (multiple parallelism), and via another link, send the manifest files to a CopyManifestFilesOperator (multiple parallelism).
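
A hypothetical sketch of such an operator, with every copy step stubbed out; only the single-parallelism shape and the ordering of the steps are the point:

```java
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Hypothetical single-parallelism operator; every copy* helper is a stub.
public class CopyMetaFilesOperatorSketch extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String> {

    @Override
    public void processElement(StreamRecord<String> tableRecord) {
        String tableId = tableRecord.getValue();
        createTargetTableIfNeeded(tableId); // 1. create table if needed
        copySchemaFiles(tableId);           // 2. copy all schema files
        copySnapshotFile(tableId);          // 3. copy snapshot file
        copyManifestListFiles(tableId);     // 4. copy manifest list files
        copyIndexManifestFile(tableId);     // 5. copy index manifest file
        copyStatisticsFile(tableId);        // 6. copy statistics file
        // Downstream, one link would carry index files to a CopyIndexFilesOperator
        // (multiple parallelism) and another link would carry manifest files to a
        // CopyManifestFilesOperator (multiple parallelism).
        output.collect(new StreamRecord<>(tableId));
    }

    private void createTargetTableIfNeeded(String t) { /* stub */ }
    private void copySchemaFiles(String t) { /* stub */ }
    private void copySnapshotFile(String t) { /* stub */ }
    private void copyManifestListFiles(String t) { /* stub */ }
    private void copyIndexManifestFile(String t) { /* stub */ }
    private void copyStatisticsFile(String t) { /* stub */ }
}
```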

private final String sourceIdentifier;
private final String targetIdentifier;
private final long snapshotId;
Contributor

Please remove this field, see my comments for operators.

Member Author

I think we cannot remove this field, because if the snapshotId is not provided, then when CopyManifestFilesOperator does its work we cannot just pick the data files from a manifest file: those data files may be deleted in another manifest file.

For example: in snapshot1 we add one data file, data-file1.parquet, in manifest-file1; in snapshot2 we add one data file, data-file2.parquet, and delete data-file1.parquet in manifest-file2. If these two manifest files are processed in two separate tasks, then the task processing manifest-file1 will fail when it copies data-file1.parquet.

So we cannot just pick the data files from a manifest file alone. I think we still need the snapshot ID. What do you think?

Contributor

I think we don't need to worry about files deleted in the manifests anymore. We just need to see which data files we need. Although we may copy some extra files, it won't cause any problems.

Member Author

The problem is that when snapshot1 expires, data-file1.parquet will be deleted. Then, when we read manifest-file1 and copy data-file1.parquet, it will always fail, because the file no longer exists.
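
To illustrate the hazard with made-up Entry/Kind types: resolving the live file set at snapshot scope first, by merging ADD and DELETE entries across all manifests of the chosen snapshot, avoids attempting to copy a file that a later manifest deleted and expiration already removed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: compute the live data files of one snapshot before copying.
final class LiveFilesSketch {
    enum Kind { ADD, DELETE }

    record Entry(Kind kind, String fileName) {}

    static Set<String> liveFiles(List<Entry> allEntriesOfSnapshot) {
        Set<String> live = new HashSet<>();
        for (Entry e : allEntriesOfSnapshot) {
            if (e.kind() == Kind.ADD) {
                live.add(e.fileName());    // added, live so far
            } else {
                live.remove(e.fileName()); // deleted by a later manifest
            }
        }
        return live; // only these files are safe to copy after snapshot expiration
    }
}
```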

FileStore<?> store = sourceTable.store();
ManifestFile manifestFile = store.manifestFileFactory().create();

List<ManifestEntry> manifestEntries =
Contributor

Why not just emit data files to downstream here?
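
A hedged sketch of that suggestion, where the Consumer stands in for the operator's output collector and the ManifestEntry accessors are used as in the version under review:

```java
import java.util.List;
import java.util.function.Consumer;
import org.apache.paimon.manifest.ManifestEntry;

// Sketch: emit each data file downstream as soon as the manifest is read,
// instead of accumulating the full list first.
final class EmitDataFilesSketch {
    static void emitDataFiles(List<ManifestEntry> manifestEntries, Consumer<String> downstream) {
        for (ManifestEntry entry : manifestEntries) {
            downstream.accept(entry.file().fileName()); // one record per data file
        }
    }
}
```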
