
Windowing Support for the Dask Runner #23913

Closed
wants to merge 139 commits into from


alxmrs
Contributor

@alxmrs alxmrs commented Nov 1, 2022

This is currently a work in progress. To serve near-term goals, the Dask runner needs to support side inputs. In order to do this, windowing needs to be supported. This CL adds basic windowing support to this runner, including a few tests for side inputs.
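As background for why side inputs depend on windowing, here is a minimal pure-Python sketch of the semantics involved. It is not the Beam or Dask runner API and not this PR's implementation; it assumes fixed windows, integer timestamps in seconds, and that main and side inputs share the same windowing, so each main-input window reads side-input values from the identical window. The names `fixed_window`, `windowed_group_by_key`, and `side_input_view` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# Hypothetical representation: a window is a half-open interval [start, end) in seconds.
Window = Tuple[int, int]


def fixed_window(timestamp: int, size: int) -> Window:
    """Return the fixed (tumbling) window of the given size that contains timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def windowed_group_by_key(
    elements: Iterable[Tuple[Hashable, Any, int]], size: int
) -> Dict[Tuple[Hashable, Window], List[Any]]:
    """Group (key, value, timestamp) triples by the pair (key, window)."""
    groups: Dict[Tuple[Hashable, Window], List[Any]] = defaultdict(list)
    for key, value, ts in elements:
        groups[(key, fixed_window(ts, size))].append(value)
    return dict(groups)


def side_input_view(
    side: Iterable[Tuple[Any, int]], size: int
) -> Dict[Window, List[Any]]:
    """Index (value, timestamp) side-input pairs by the window they fall into."""
    view: Dict[Window, List[Any]] = defaultdict(list)
    for value, ts in side:
        view[fixed_window(ts, size)].append(value)
    return dict(view)


if __name__ == "__main__":
    main_input = [("a", 1, 0), ("a", 2, 5), ("b", 3, 12), ("a", 4, 15)]
    side_input = [(100, 3), (200, 14)]
    grouped = windowed_group_by_key(main_input, size=10)
    view = side_input_view(side_input, size=10)
    # Each (key, window) group only sees side-input values from its own window.
    for (key, window), values in sorted(grouped.items()):
        print(key, window, values, view.get(window, []))
    # a (0, 10) [1, 2] [100]
    # a (10, 20) [4] [200]
    # b (10, 20) [3] [200]
```

Key `"a"` yields two groups, one per window, and each group sees a different side-input value. A runner that ignored windows would instead merge both `"a"` groups into one and expose both side-input values to it.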

Reviewers: @pabloem


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

alxmrs and others added 30 commits September 19, 2022 16:34
- CoGroupByKey is broken due to how tags are used with GroupByKey
- GroupByKey should output `[('0', None), ('1', 1)]`; however, it actually outputs `[(None, ('1', 1)), (None, ('0', None))]`.
- Once that is fixed, we may have test pipelines work on Dask.
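The first note says CoGroupByKey breaks because of how tags are used with GroupByKey. A common way to build CoGroupByKey (roughly the approach Beam's Python SDK takes) is to tag each value with the name of its input, flatten everything into one collection, run a single GroupByKey, and split each group back out by tag. The pure-Python sketch below assumes in-memory lists; `group_by_key` and `co_group_by_key` are hypothetical helpers, not Beam APIs. It depends on GroupByKey keeping the original key in the first position of each output tuple, so output shaped like `(None, ('1', 1))`, as reported above, would lose the key the join relies on.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def group_by_key(
    pairs: Iterable[Tuple[Hashable, Any]]
) -> List[Tuple[Hashable, List[Any]]]:
    """Group (key, value) pairs; the key stays first in each output tuple."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def co_group_by_key(
    inputs: Dict[str, Iterable[Tuple[Hashable, Any]]]
) -> Dict[Hashable, Dict[str, List[Any]]]:
    """Join several keyed collections by tagging, flattening, and grouping."""
    # 1. Tag each value with its input's name, keeping the original key
    #    as the grouping key, and flatten all inputs into one list.
    tagged = [
        (key, (tag, value))
        for tag, pairs in inputs.items()
        for key, value in pairs
    ]
    # 2. A single GroupByKey over the flattened, tagged pairs.
    result: Dict[Hashable, Dict[str, List[Any]]] = {}
    for key, tagged_values in group_by_key(tagged):
        # 3. Split each group back out into one list per input tag.
        per_tag: Dict[str, List[Any]] = {tag: [] for tag in inputs}
        for tag, value in tagged_values:
            per_tag[tag].append(value)
        result[key] = per_tag
    return result


if __name__ == "__main__":
    emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
    phones = [("amy", "111-222"), ("amy", "333-444")]
    print(co_group_by_key({"emails": emails, "phones": phones}))
```

Keys missing from one input (here `"carl"` has no phones) still appear, with an empty list for that tag.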
@github-actions github-actions bot added build and removed build labels Nov 28, 2022
@github-actions github-actions bot added build and removed build labels Nov 30, 2022
@github-actions
Contributor

github-actions bot commented Dec 7, 2022

Reminder, please take a look at this PR: @AnandInguva

@github-actions
Contributor

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @pabloem for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@github-actions
Contributor

Reminder, please take a look at this PR: @pabloem

@github-actions
Contributor

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @damccorm for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@damccorm
Contributor

stop reviewer notifications

@damccorm
Contributor

(Pablo is already reviewing; this one should probably be sticky.)

@github-actions
Contributor

Stopping reviewer notifications for this pull request: requested by reviewer

@pabloem
Member

pabloem commented Jan 24, 2023

@alxmrs how do we deal with this one? : )

@alxmrs
Contributor Author

alxmrs commented Jan 26, 2023

Hey Pablo! I'm happy to take a look at these Dask-related issues in a week or two. I imagine that these changes have diverged from the upstream main branch since they were written.

For a sooner timeline, do you have any capacity @cisaacstern or maybe @TomAugspurger? Charles, can you help me understand timelines for when Pangeo would like Dask support in Beam?

@cisaacstern
Contributor

cisaacstern commented Jan 26, 2023

Thanks for the ping, @alxmrs, and glad to see this progressing.

I do not have extra bandwidth to look at this, unfortunately.

Regarding Pangeo Forge timelines, for the next 6 months or so at least, and maybe a fair bit longer, our needs are covered by the Dataflow and Flink runners.

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the project's mailing list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 28, 2023
@github-actions
Contributor

github-actions bot commented Apr 5, 2023

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

5 participants