-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improve][Connector-V2][Iceberg] Add hadoop s3 catalog e2e testcase #5745
Conversation
931af66
to
b1c6e40
Compare
b1c6e40
to
7ceedd0
Compare
cc @ic4y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, cc @ic4y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add a test case for this feature?
I don't think it's going to be easy because it requires an S3 bucket. |
I think we can create a minio container in e2e test case for s3a file read. |
@EricJoy2048 |
21f2ce6
to
2dfb53a
Compare
I created minio test container and added e2e test case. I think I need to add hadoop-aws/aws-java-sdk jar file to the test container docker image and need your help.
I'm also looking to see if there are other workarounds. |
Thanks @4chicat , I will take a look in the few days. |
98537cc
to
1e195be
Compare
1e195be
to
4cdebc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connector-iceberg-s3-e2e
e2e case was written based on connector-iceberg-hadoop3-e2e
.
This file was created by copying the file below.
https://github.com/apache/seatunnel/blob/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-hadoop3-e2e/src/test/resources/log4j.properties
Should I delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connector-iceberg-s3-e2e
e2e case was written based onconnector-iceberg-hadoop3-e2e
. This file was created by copying the file below. https://github.com/apache/seatunnel/blob/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-hadoop3-e2e/src/test/resources/log4j.propertiesShould I delete it?
It is already included in the engine and can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connector-iceberg-s3-e2e
e2e case was written based onconnector-iceberg-hadoop3-e2e
. This file was created by copying the file below. https://github.com/apache/seatunnel/blob/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-iceberg-hadoop3-e2e/src/test/resources/log4j.properties
Should I delete it?It is already included in the engine and can be removed
Removed log4j.properties
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cc @Hisoka-X
Look like this PR bring new leak thread in seatunnel server. Can you try closing these leaking threads? If not, please add it to the known issues list. Line 291 in 92f847c
|
d6536b1
to
2fb59ec
Compare
public static List<TestContainer> discoverTestSpark3Containers() { | ||
return discoverTestContainers().stream() | ||
.filter(testContainer -> testContainer instanceof Spark3Container) | ||
.collect(Collectors.toList()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been removed.
|| s.contains("java-sdk-http-connection-reaper") | ||
|| s.contains("Timer for 's3a-file-system' metrics system") | ||
|| s.startsWith("MutableQuantiles-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to isIssueWeAlreadyKnow
method.
2078dff
to
cedd449
Compare
cedd449
to
97222fc
Compare
This PR is very valuable and I think we need to look at it again and review it. |
97222fc
to
861a07c
Compare
@DisabledOnContainer( | ||
value = {TestContainerId.SPARK_2_4}, | ||
type = {EngineType.FLINK}, | ||
disabledReason = "Needs hadoop-aws,aws-java-sdk jar for flink and spark2.4.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please help me annotate this section in the usage document and provide an example for AWS S3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment. Added to document.
// Iceberg S3 Hadoop catalog | ||
|| threadName.contains("java-sdk-http-connection-reaper") | ||
|| threadName.contains("Timer for 's3a-file-system' metrics system") | ||
|| threadName.startsWith("MutableQuantiles-"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hisoka-X cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
<dependency> | ||
<groupId>org.apache.hadoop</groupId> | ||
<artifactId>hadoop-aws</artifactId> | ||
<version>${hadoop3.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @EricJoy2048
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add hadoop-aws
to hadoop shade jar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to not add hadoop-aws
.
Like the driver, you can add
|
[Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # bugfix [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # add e2e test case [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # add @DisabledOnContainer [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # remove log4j.properties [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # add removeSystemThread [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # move and remove code [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # resolve conflict [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # mvn spotless:apply [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # modify IcebergSourceIT [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # modify doc iceberg source [Improve][Connector-V2][Iceberg] Support for S3 in hadoop catalog # copy jars to test container docker
d321ae0
to
1a89e8e
Compare
…isabled seatunnel engine
6624f28
to
fe14b3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Purpose of this pull request
We use S3 as iceberg warehouse storage.
If
catalog_type
ishadoop
, only hdfs is supported, so it is being customized and used.I want to contribute my customized code, so I push a pull request.
Does this PR introduce any user-facing change?
we can use the iceberg table saved in s3.
config example
How was this patch tested?
add e2e test case :
connector-iceberg-s3-e2e
Check list
New License Guide
release-note
.