Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9702] Update Java KinesisIO.Read to support AWS SDK v2 #11318

Merged
merged 2 commits into from
Jul 27, 2020

Conversation

jfarr
Copy link
Contributor

@jfarr jfarr commented Apr 6, 2020

Hi @aromanenko-dev, I thought it would be a good idea to separate #9899 into two PRs so that users can potentially benefit from the update to use the v2 AWS SDK while enhanced fanout support is still being worked on. lmkwyt

FYI, I found these commands helpful for comparing v1 vs. v2:

diff -u -r sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/ > diff-main.txt
diff -u -r sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/ sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/kinesis/ > diff-test.txt
diff -u sdks/java/io/kinesis/build.gradle sdks/java/io/amazon-web-services2/build.gradle

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@aromanenko-dev
Copy link
Contributor

Yes, I agree that it will make sense to split it into several PR for clearance. In the same time, I'm more concerned that with two packages we will have a lot of duplicated code there.

@jfarr
Copy link
Contributor Author

jfarr commented Apr 7, 2020

Hi @aromanenko-dev, I share your concern. However, that was always going to be the case, splitting it into multiple PRs doesn’t change that, and it’s already the case for other AWS IOs such as SNS and DynamoDB. What do you propose for a solution?

@iemejia iemejia requested a review from aromanenko-dev April 8, 2020 14:17
@aromanenko-dev
Copy link
Contributor

@jfarr Agree, but I don't have a perfect solution except to extract and to wrap all AWS depended code into own interfaces and then use them to work with AWS. All what I'm afraid is to lose all latest fixes and improvements that we done for old KinesisIO and probably will be done in the future since people actively use it.

@jfarr
Copy link
Contributor Author

jfarr commented Apr 11, 2020

Hi, @aromanenko-dev. That sounds like a significant rewrite of the existing KinesisIO. Should the other AWS IOs be redesigned to fit that model too, or where does that stop? I actually like the idea, just trying to clarify what you have in mind.

And don't you think that the same clearance logic applies here? A rewrite is going to take some time, probably a significant amount. Why not get this PR into the hands of the community so they can start using it now and then do the refactor in a separate PR?

@aromanenko-dev
Copy link
Contributor

@jfarr I agree with your point but I'm afraid that once we will have two versions of KinesisIO (for AWS SDK 1 & 2), that are based on the large amount of the same code but existing in two different packages , then we will need to always keep in mind to do code changes in two places. I'd definitively want to avoid it since these KinesisIO will diverge quickly and I'm not sure that we will deprecate and remove the AWS SDK V1 version soon.

Well, maybe it will be a good point to discuss this on dev@ mailing list and to hear what the community think about it.

@jfarr
Copy link
Contributor Author

jfarr commented Apr 18, 2020

@aromanenko-dev Good idea, I started a new thread.

@aromanenko-dev
Copy link
Contributor

@jfarr I think that the main conclusion of this thread (as @iemejia properly reminded there) is that we should start the deprecation of AWS SDK V1 based IOs. So I agree that it won't make sense to extract any common code and create the new abstractions to support both versions of AWS SDKs.

In the same time, to finish this PR, please update the code with the latest changes in KinesisIOV1 to keep it up-to-date (if any.)

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Jul 24, 2020

@jfarr Sorry for delay with a response, I took another look and it LGTM. Could you just rebase one more time against master and after we can merge it (if all checks pass)?

@aromanenko-dev
Copy link
Contributor

@jfarr Btw, do you think we could implement KinesisIO.Write with AWS SDK v2 in effective way (since KPL is still on SDL v1)?

@jfarr
Copy link
Contributor Author

jfarr commented Jul 26, 2020

Hi @aromanenko-dev, I ported over a few more missing commits from v1 and rebased:

9e10b54
1e182c6
ecedd3e
248b794

Please LMK when you're done reviewing the changes and I'll squash them.

@jfarr
Copy link
Contributor Author

jfarr commented Jul 26, 2020

@jfarr Btw, do you think we could implement KinesisIO.Write with AWS SDK v2 in effective way (since KPL is still on SDL v1)?

I'm sure it's possible but it seems like it would basically entail porting the KPL to AWS v2 ourselves, so I'm not sure if it's worth the effort. What would be the value gained, besides having the reader and writer together in the same class again? In my opinion I've always felt that the way Beam IOs combine readers and writers in the same class violates single-responsibility principle, with predictable results such as this.

Copy link
Contributor

@aromanenko-dev aromanenko-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. I squashed and merged

@aromanenko-dev aromanenko-dev merged commit 22822b8 into apache:master Jul 27, 2020
@aromanenko-dev
Copy link
Contributor

@jkff The reason why I'm asking about Write is because we are going to deprecate and finally remove AWS SDK v1 IOs in the future. Though, we still need to have "write" part but seems we need to wait for KPL v2 (at least for old KinesisIO).

Generally, it's not required to have Read and Write part in the same class, it's usually done just for users convenience to have them under the same name of main class and avoid confusion.

@aromanenko-dev aromanenko-dev changed the title [BEAM-9702] Update Java KinesisIO to support AWS SDK v2 [BEAM-9702] Update Java KinesisIO.Read to support AWS SDK v2 Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants