Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5582] improvement(hadoop3-filesystem): Remove configuration fs.gvfs.filesystem.providers from GVFS client. #5634

Merged
merged 9 commits into from
Dec 16, 2024

Conversation

yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Configuration fs.gvfs.filesystem.providers is redundant, so we'd better remove this configuation.

Why are the changes needed?

This configuration is redundant.
Fix: #5582

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Existing tests.

@yuqi1129 yuqi1129 self-assigned this Nov 20, 2024
@yuqi1129 yuqi1129 requested a review from jerryshao November 29, 2024 02:02
@jerryshao
Copy link
Contributor

I think we should not simply delete this configuration, it would be better to 1) make it deprecated; 2) keep the compatible if user set this configuration; 3) warn user that this configuration is deprecated.

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 4, 2024

I think we should not simply delete this configuration, it would be better to 1) make it deprecated; 2) keep the compatible if user set this configuration; 3) warn user that this configuration is deprecated.

I'm not 100% clear on why we only need to deprecate it. The change in PR is a compatibility modification, and the previous one was not very elegant in design.

for the user which uses fs.gvfs.filesystem.providers in 0.7.0 will definitely work in 0.8.0 after this change is merged.

@jerryshao
Copy link
Contributor

I think we should not simply delete this configuration, it would be better to 1) make it deprecated; 2) keep the compatible if user set this configuration; 3) warn user that this configuration is deprecated.

I'm not 100% clear on why we only need to deprecate it. The change in PR is a compatibility modification, and the previous one was not very elegant in design.

for the user which uses fs.gvfs.filesystem.providers in 0.7.0 will definitely work in 0.8.0 after this change is merged.

Can you please explain more why it is compatible?

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 4, 2024

I think we should not simply delete this configuration, it would be better to 1) make it deprecated; 2) keep the compatible if user set this configuration; 3) warn user that this configuration is deprecated.

I'm not 100% clear on why we only need to deprecate it. The change in PR is a compatibility modification, and the previous one was not very elegant in design.
for the user which uses fs.gvfs.filesystem.providers in 0.7.0 will definitely work in 0.8.0 after this change is merged.

Can you please explain more why it is compatible?

In 0.7.0, If the user does not specify fs.gvfs.filesystem.providers , it won't work even though the corresponding bundle jar has been placed in the classpath.

After this PR is merged, if users:

  1. Assign fs.gvfs.filesystem.providers a value like s3, this configuration will not take effect, as long as the AWS-bundle are in the classpath, the GVFS client can always use it.
  2. Do not set fs.gvfs.filesystem.providers a value, the GVFS client will automatically load all file system providers in the classpath, things are also okay.

The only difference is that if users put aws-bundle and gcs-bundle in the classpath, and then they do not assign a value to fs.gvfs.filesystem.providers, Gravitino will throw exceptions in 0.7 but work well in 0.8.0. In my opinion, the behavior in 0.8.0 is more acceptable and reasonable. GVFS clients should be transparent to file system providers.

Please correct me if I'm wrong, thanks.

@jerryshao
Copy link
Contributor

I think we should not simply delete this configuration, it would be better to 1) make it deprecated; 2) keep the compatible if user set this configuration; 3) warn user that this configuration is deprecated.

I'm not 100% clear on why we only need to deprecate it. The change in PR is a compatibility modification, and the previous one was not very elegant in design.
for the user which uses fs.gvfs.filesystem.providers in 0.7.0 will definitely work in 0.8.0 after this change is merged.

Can you please explain more why it is compatible?

In 0.7.0, If the user does not specify fs.gvfs.filesystem.providers , it won't work even though the corresponding bundle jar has been placed in the classpath.

After this PR is merged, if users:

  1. Assign fs.gvfs.filesystem.providers a value like s3, this configuration will not take effect, as long as the AWS-bundle are in the classpath, the GVFS client can always use it.
  2. Do not set fs.gvfs.filesystem.providers a value, the GVFS client will automatically load all file system providers in the classpath, things are also okay.

The only difference is that if users put aws-bundle and gcs-bundle in the classpath, and then they do not assign a value to fs.gvfs.filesystem.providers, Gravitino will throw exceptions in 0.7 but work well in 0.8.0. In my opinion, the behavior in 0.8.0 is more acceptable and reasonable. GVFS clients should be transparent to file system providers.

Please correct me if I'm wrong, thanks.

I see, thanks for the explanation.

@@ -41,12 +41,27 @@ public static Map<String, FileSystemProvider> getFileSystemProviders(String file
ServiceLoader<FileSystemProvider> allFileSystemProviders =
ServiceLoader.load(FileSystemProvider.class);

if (null == fileSystemProviders) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already removed this fileSystemProviders configuration, why do we still need to pass in this argument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this method since we load all the fs providers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will be also utilized by Gravitino server, in the server side, file system providers are required if we want to access cloud storage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest you to figure out a better way, use null value to check and differentiate different code path is not a good way.

}
resultMap.put(fileSystemProvider.scheme(), fileSystemProvider);
});
return resultMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to add HDFS and local FS provider here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceLoader<FileSystemProvider> allFileSystemProviders =
        ServiceLoader.load(FileSystemProvider.class);

The code above will load HDFS and Local FS automatically.

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 4, 2024

FileSystemProvider is defined in moudle catalog-hadoop, which is why module filesystem-hadoop3 depends on catalog-hadoop, we can remove this dependency when we move FileSystemProvider to a common module like hadooop-common, is that okay for you?

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Dec 4, 2024

FileSystemProvider is defined in moudle catalog-hadoop, which is why module filesystem-hadoop3 depends on catalog-hadoop, we can remove this dependency when we move FileSystemProvider to a common module like hadooop-common, is that okay for you?

@jerryshao

@jerryshao
Copy link
Contributor

FileSystemProvider is defined in moudle catalog-hadoop, which is why module filesystem-hadoop3 depends on catalog-hadoop, we can remove this dependency when we move FileSystemProvider to a common module like hadooop-common, is that okay for you?

Yeah, this looks reasonable.

Copy link
Contributor

@tengqm tengqm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm.

@yuqi1129 yuqi1129 requested a review from jerryshao December 10, 2024 12:56
implementation(project(":common")) {
exclude(group = "*")
}

implementation(project(":catalogs:hadoop-common")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this alphabetically ordered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -26,7 +26,8 @@ plugins {
dependencies {
compileOnly(project(":clients:client-java-runtime", configuration = "shadow"))
compileOnly(libs.hadoop3.common)
implementation(project(":catalogs:catalog-hadoop")) {

implementation(project(":catalogs:hadoop-common")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jerryshao jerryshao merged commit 1d92626 into apache:main Dec 16, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Remove fs.gvfs.filesystem.providers from GVFS client.
3 participants