[SPARK-47113][CORE] Revert S3A endpoint fixup logic of SPARK-35878 #45193

Closed

dongjoon-hyun
Member

What changes were proposed in this pull request?

Revert [SPARK-35878][CORE] Add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null

Removing the region/endpoint patching code of SPARK-35878 avoids authentication problems with versions of the S3A connector built with the AWS v2 SDK, as is the case in Hadoop 3.4.0.

That is: if fs.s3a.endpoint is unset it will stay unset.
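
The difference can be sketched as a small model, written here in Python over a plain dict rather than Spark's actual Scala code. The function names are hypothetical, the default endpoint value is an assumption (this PR does not state which value the original patch wrote), and treating an empty string as "unset" is also an assumption.

```python
from typing import Dict

ENDPOINT_KEY = "fs.s3a.endpoint"
REGION_KEY = "fs.s3a.endpoint.region"

# Assumption: illustrative value only; the PR text does not say which
# endpoint the SPARK-35878 fixup filled in.
ASSUMED_DEFAULT_ENDPOINT = "s3.amazonaws.com"


def with_endpoint_fixup(conf: Dict[str, str]) -> Dict[str, str]:
    """Model of the reverted behaviour: add an endpoint when both the
    endpoint and the region are missing."""
    patched = dict(conf)
    endpoint_unset = not patched.get(ENDPOINT_KEY)  # missing or empty
    region_unset = patched.get(REGION_KEY) is None
    if endpoint_unset and region_unset:
        patched[ENDPOINT_KEY] = ASSUMED_DEFAULT_ENDPOINT
    return patched


def without_endpoint_fixup(conf: Dict[str, str]) -> Dict[str, str]:
    """Model of the behaviour after this PR: the options pass through
    untouched, so an unset endpoint stays unset."""
    return dict(conf)


before = with_endpoint_fixup({})
after = without_endpoint_fixup({})
print(ENDPOINT_KEY in before, ENDPOINT_KEY in after)
```

In this model, only an empty configuration is changed by the fixup; a configuration that already names a region or an endpoint comes out the same either way.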

The v2 SDK binds to AWS services differently, in what can be described as "region first" binding. When Spark sets the endpoint, it blocks S3 Express support and is incompatible with HADOOP-18975 (S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints).

The change is compatible with all releases of the S3A connector except Hadoop 3.3.1 binaries deployed outside EC2 without the endpoint explicitly set.

Why are the changes needed?

The AWS v2 SDK has a different, more complex binding mechanism; it doesn't need the endpoint to be set if the region (fs.s3a.endpoint.region) is set. This means the Spark code that fixes up the endpoint is not only unneeded, it also causes problems when trying to use specific storage options (S3 Express) or security options (FIPS).

Does this PR introduce any user-facing change?

Only visible with the Hadoop 3.3.1 S3A connector when deployed outside of EC2, which is the situation the original patch was added to work around. All other 3.3.x releases are unaffected.
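
For that one affected combination, a deployment can presumably keep working by setting the endpoint itself, since the incompatibility only applies when the endpoint is not explicitly set. The sketch below is a Python model, not Spark code: it imitates how Spark forwards `spark.hadoop.*` options to the Hadoop configuration. The helper name is hypothetical and the endpoint value is an illustrative assumption that would need to match the actual deployment.

```python
from typing import Dict

SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_view(spark_conf: Dict[str, str]) -> Dict[str, str]:
    """Return the options that would reach the Hadoop configuration:
    every `spark.hadoop.*` entry, with the prefix stripped."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


# Assumption: "s3.amazonaws.com" is only an example endpoint.
spark_conf = {
    "spark.app.name": "s3a-demo",
    "spark.hadoop.fs.s3a.endpoint": "s3.amazonaws.com",
}

hadoop_conf = hadoop_view(spark_conf)
print(hadoop_conf)
```

With the endpoint supplied this way, the S3A connector sees an explicit `fs.s3a.endpoint` regardless of whether Spark patches the configuration.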

How was this patch tested?

Removed some obsolete tests. Relying on GitHub and Jenkins to do the testing, so this PR is marked as WIP until they pass.

Was this patch authored or co-authored using generative AI tooling?

No

- apache/hadoop#6277

Closes apache#44834 from steveloughran/SPARK-46793-revert-region-fixup-SPARK-35878.

Authored-by: Steve Loughran <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@github-actions github-actions bot added the CORE label Feb 21, 2024
@dongjoon-hyun
Member Author

dongjoon-hyun commented Feb 21, 2024

This is a clone of the following with a correct new JIRA ID and the original authorship. #44834 was merged and reverted due to the wrong JIRA ID.

$ git log | head -n3
commit d533edccaaa3e76e5f0efecf1576af44c7026917
Author: Steve Loughran <[email protected]>
Date:   Tue Feb 20 22:28:59 2024 -0800

@dongjoon-hyun
Member Author

cc @steveloughran and @shameersss1

@dongjoon-hyun
Member Author

dongjoon-hyun commented Feb 21, 2024

Could you review this PR, @HyukjinKwon ?

Actually, I merged and reverted this code (at #44834) because of the wrong JIRA ID.

@dongjoon-hyun
Member Author

Thank you!

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-47113 branch February 21, 2024 07:12
@steveloughran
Contributor

sorry, I'd just used the original as it was a revert

@dongjoon-hyun
Member Author

No problem. It was my mistake; I merged it even though I knew about it.
