-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apache Beam Executor #225
Apache Beam Executor #225
Conversation
Edit: When I wrote the above, I didn't notice the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think conda forge wants a dash in the name apache-beam
: https://github.com/conda-forge/apache-beam-feedstock
I merged my suggested change to the py3.8.yml environment. This turned up an error in the beam executor: https://github.com/pangeo-forge/pangeo-forge-recipes/runs/4138427592?check_suite_focus=true#step:8:2443 |
…parently, `func` is also an internal variable name.
Co-authored-by: Ryan Abernathey <[email protected]>
Thanks, I should have caught that locally. Apparently, a variable name I chose to meet the linter created a type check conflict with Beam. |
I think this has now turned up an upstream issue with Xarray. (0.20.0 was just released) I'll make a PR for that fix. |
For some reason, the 3.9 conda environment was taking over an hour to build. I'm restarting the build. |
For some reason, the python 3.9 environment seems to be unsolvable. https://github.com/pangeo-forge/pangeo-forge-recipes/runs/4143328072?check_suite_focus=true#step:5:302 There must be a version conflict somewhere, but I don't know how to debug it. Any ideas? |
mamba tends to give better error messages when there are conflicts. That said, mamba solves fine for me locally (linux). Maybe try using mamba if you aren't already? https://github.com/conda-incubator/setup-miniconda#example-6-mamba |
A PR to switch the CI to Mamba would be most welcome! |
It just occurred to me: Apache Beam only officially supports Python 3.8 -- it doesn't yet have support for 3.9, though it's on the roadmap. |
If you merge master (now that #226 is in), the tests should pass here. Just remove beam from the 3.9 env if it is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Alex! Just one place I'd like to tighten up.
…ort problems (now, using pytest fixtures).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, great! We are almost there. Just a few tiny comments then I promise we can stop iterating and merge this.
The pangeo-forge pipeline model is adapted to run on Apache Beam. Here, we use dummy stage numbers ("steps") as the input and output PCollections needed to work with Beam's programming model. Where substantive inputs need to be mapped to stages with arguments, "mappables" are combined with these steps. The result is that each stage executes serially, and mappable stages execute in parallel.
Fixes #169.