
update pytorch, tensorflow, r, and python images #48

Merged — 15 commits, Aug 19, 2022

Conversation

@ngam (Contributor) commented Aug 10, 2022

No description provided.

@ngam mentioned this pull request Aug 10, 2022
@ngam (Contributor, Author) commented Aug 15, 2022

Would you like me to address this?

  • We can probably remove these and add them to the conda env. Rather than doing that now, I'll wait until these essential updates go through and submit a follow-up PR for the cleanup/tidying:
    odc-algo>=0.2.0a3
    odc-stac>=0.2.0a6
    azure-data-tables
    stac-geoparquet
  • In e31fc28, I aggressively bumped the base image. Hopefully this doesn't break anything. Please feel free to push to my branch if you'd like to get this to an appropriate place to merge :)
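For what that follow-up cleanup could look like, here is a rough sketch of the environment.yml change (hypothetical; it assumes conda-forge packages exist for each of these, which at this point in the thread is only confirmed for some of them):

```yaml
# environment.yml (sketch): deps moved out of requirements.txt into conda
dependencies:
  - odc-algo>=0.2.0a3
  - odc-stac>=0.2.0a6
  - azure-data-tables
  # stac-geoparquet would stay in requirements.txt until a conda-forge recipe lands
```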

@ngam ngam changed the title update pytorch to latest update pytorch, tensorflow, and python images Aug 15, 2022
@TomAugspurger

Thanks for offering to handle the requirements.txt. Agreed with leaving that to a followup (I need to submit a PR to staged-recipes for stac-geoparquet).

Happy to push the base image as far as possible. I'll be running our integration tests after this is merged.

@TomAugspurger

I suspect that the secrets we're using aren't available on PRs. I spent a bit of time a while ago trying to figure something out but didn't come up with an elegant solution that didn't involve copying a lot of GitHub Actions configuration.

@ngam (Contributor, Author) commented Aug 15, 2022

> I suspect that the secrets we're using aren't available on PRs. I spent a bit of time a while ago trying to figure something out but didn't come up with an elegant solution that didn't involve copying a lot of GitHub Actions configuration.

Let me try to quickly run these tests manually in a devcontainer, and I'll let you know whether you should run your internal tests.

@ngam (Contributor, Author) commented Aug 15, 2022

Let me also try to add a separate, PR-safe GitHub Actions workflow... it should be doable by automatically logging in to the ghcr.io registry but not saving/publishing anything.
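Something along these lines might work (an illustrative sketch, not the repo's actual workflow; the file name and build context are made up):

```yaml
# .github/workflows/pr-build.yml (hypothetical PR-safe build)
name: pr-build
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v3
        with:
          context: ./python
          file: ./python/Dockerfile
          push: false  # build only; nothing is saved or published from a PR
```

Note that on fork PRs the GITHUB_TOKEN is read-only, which is fine for a build-only job like this.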

@ngam (Contributor, Author) commented Aug 15, 2022

Tests on the python image pass, but tensorflow fails while installing the pip packages (it cannot build them from source because gcc is missing).

@ngam ➜ /workspaces/planetary-computer-containers (pt_) $ docker build -f ./python/Dockerfile ./python/
[+] Building 490.0s (14/14) FINISHED                                                                                           
 => [internal] load build definition from Dockerfile                                                                      0.2s
 => => transferring dockerfile: 74B                                                                                       0.0s
 => [internal] load .dockerignore                                                                                         0.4s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load metadata for docker.io/pangeo/base-image:2022.07.27                                                   0.2s
 => [internal] load build context                                                                                         0.2s
 => => transferring context: 78.27kB                                                                                      0.0s
 => CACHED [1/1] FROM docker.io/pangeo/base-image:2022.07.27@sha256:e897d695b7f0398067984cfd027af9b27a0e70b4fd094e580fe3  0.0s
 => [2/1] COPY --chown=jovyan:jovyan . /home/jovyan                                                                       0.6s
 => [3/1] RUN echo "Checking for 'binder' or '.binder' subfolder"         ; if [ -d binder ] ; then         echo "Using   0.8s
 => [4/1] RUN echo "Checking for 'apt.txt'..."         ; [ -d binder ] && cd binder         ; [ -d .binder ] && cd .bin  28.4s 
 => [5/1] RUN echo "Checking for 'jupyter_notebook_config.py'..."         ; [ -d binder ] && cd binder         ; [ -d .b  0.7s 
 => [6/1] RUN echo "Checking for 'conda-linux-64.lock' or 'environment.yml'..."         ; [ -d binder ] && cd binder    286.5s 
 => [7/1] RUN echo "Checking for pip 'requirements.txt'..."         ; [ -d binder ] && cd binder         ; [ -d .binder   9.3s 
 => [8/1] RUN echo "Checking for 'postBuild'..."         ; [ -d binder ] && cd binder         ; [ -d .binder ] && cd .bi  0.7s 
 => [9/1] RUN echo "Checking for 'start'..."         ; [ -d binder ] && cd binder         ; [ -d .binder ] && cd .binder  0.8s 
 => exporting to image                                                                                                  160.9s 
 => => exporting layers                                                                                                 160.8s 
 => => writing image sha256:fdd9c7d88bed5e6cf9efd8a90edc78970fba19d70cde3abd58f0550a17d3e7c0                              0.0s 
@ngam ➜ /workspaces/planetary-computer-containers (pt_ ✗) $ docker ps                                                          
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
@ngam ➜ /workspaces/planetary-computer-containers (pt_) $ docker images
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
<none>       <none>    fdd9c7d88bed   4 minutes ago   3.41GB
@ngam ➜ /workspaces/planetary-computer-containers (pt_) $ 
@ngam ➜ /workspaces/planetary-computer-containers (pt_) $ docker run --rm -v ${PWD}/.github/workflows/scripts:/scripts fdd9c7d88bed  /scripts/python
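As for the tensorflow pip failure above, one possible workaround (a sketch only — it assumes the base image is Debian-based with a jovyan user, as the build log suggests, and whether adding a compiler is the right fix here is an open question) would be to give pip a toolchain in that image's Dockerfile:

```dockerfile
# Illustrative only: let pip build sdists by providing gcc and friends.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
USER jovyan
```

That said, moving the affected packages to conda-forge avoids the source builds entirely.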

@TomAugspurger

Looks like odc-stac / odc-algo (which IIRC are pulling in the psycopg2 dependency) are available through conda-forge, so I've moved them over to the environment.yaml in d6d89f1.

@ngam (Contributor, Author) commented Aug 15, 2022

I don't like what these pip deps are doing...

#12 8.964 Installing collected packages: mpmath, sympy, humanfriendly, click, dask, coloredlogs, onnxruntime-gpu, distributed, dask-cuda, azure-data-tables, stac-geoparquet
#12 17.44   Attempting uninstall: click
#12 17.44     Found existing installation: click 8.1.3
#12 17.45     Uninstalling click-8.1.3:
#12 17.47       Successfully uninstalled click-8.1.3
#12 17.54   Attempting uninstall: dask
#12 17.54     Found existing installation: dask 2022.8.0
#12 17.63     Uninstalling dask-2022.8.0:
#12 17.69       Successfully uninstalled dask-2022.8.0
#12 21.28   Attempting uninstall: distributed
#12 21.28     Found existing installation: distributed 2022.8.0
#12 21.34     Uninstalling distributed-2022.8.0:
#12 21.39       Successfully uninstalled distributed-2022.8.0
#12 22.07 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
#12 22.07 cmip6-preprocessing 0.6.0 requires pint-xarray, which is not installed.
#12 22.07 xclim 0.37.0 requires Click>=8.1, but you have click 8.0.4 which is incompatible.
#12 22.07 intake-esm 2021.8.17 requires h5netcdf>=0.8.1, but you have h5netcdf 0.0.0 which is incompatible.
#12 22.07 dask-gateway 2022.6.1 requires click>=8.1.3, but you have click 8.0.4 which is incompatible.
#12 22.07 cmip6-preprocessing 0.6.0 requires xgcm<0.7.0, but you have xgcm 0.8.0 which is incompatible.
#12 22.07 Successfully installed azure-data-tables-12.4.0 click-8.0.4 coloredlogs-15.0.1 dask-2022.5.2 dask-cuda-22.6.0 distributed-2022.5.2 humanfriendly-10.0 mpmath-1.2.1 onnxruntime-gpu-1.12.1 stac-geoparquet-0.1.0 sympy-1.10.1
#12 DONE 24.4s

Maybe related to this: #40
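Incidentally, conflicts like the ones above can be flagged without rerunning the whole build. Here's a minimal stdlib-only sketch (the pins dict, helper names, and the crude version-parsing heuristic are all illustrative, not anything from this repo):

```python
from importlib import metadata

def version_tuple(v):
    """Crude version parser: keeps only the digits of each dot-separated part."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def satisfies_min(installed, minimum):
    """True if installed >= minimum under the crude ordering above."""
    return version_tuple(installed) >= version_tuple(minimum)

# Example pin taken from the resolver error above (dask-gateway needs click>=8.1.3).
pins = {"click": "8.1.3"}
for name, minimum in pins.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    status = "ok" if satisfies_min(installed, minimum) else "CONFLICT"
    print(f"{name} {installed} (needs >={minimum}): {status}")
```

A real environment should lean on pip check or the packaging library instead; this just shows the shape of the check.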

@ngam (Contributor, Author) commented Aug 15, 2022

Let me initiate adding these to conda-forge tonight... unless you want to do that yourself...

@ngam (Contributor, Author) commented Aug 15, 2022

I will try to have a go at enabling onnxruntime-gpu in conda-forge, but that seems like a tough cookie conda-forge/onnxruntime-feedstock#7

@ngam ngam changed the title update pytorch, tensorflow, and python images update pytorch, tensorflow, r, and python images Aug 16, 2022
@TomAugspurger

Don't worry too much about the QGIS build if that's failing.

@ngam (Contributor, Author) commented Aug 16, 2022

Well, we have this failure now:

/srv/conda/envs/notebook/lib/python3.9/site-packages/dask_cuda/cuda_worker.py:18: FutureWarning: parse_bytes is deprecated and will be removed in a future release. Please use dask.utils.parse_bytes instead.
  from distributed.utils import parse_bytes
Traceback (most recent call last):
  File "/scripts/gpu-pytorch", line 17, in <module>
    import dask_cuda
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/dask_cuda/__init__.py", line 5, in <module>
    from .cuda_worker import CUDAWorker
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/dask_cuda/cuda_worker.py", line 19, in <module>
    from distributed.worker import parse_memory_limit
ImportError: cannot import name 'parse_memory_limit' from 'distributed.worker' (/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/worker.py)
Error: Process completed with exit code 1.

No idea what's going on

@TomAugspurger

Looks like dask-cuda (perhaps) requires dask>=2022.7.1, while we've installed 2022.6.0 from conda-forge. Perhaps if you try pangeo-notebook==2022.08.08?

@ngam (Contributor, Author) commented Aug 18, 2022

@TomAugspurger could you please allow testing in PRs without having to wait for approval? It would make iterating easier. Nonetheless, this latest commit should fix things, hopefully.

@TomAugspurger

I think that's a GitHub limitation / feature and isn't something I can change.

@ngam (Contributor, Author) commented Aug 18, 2022

Maybe it's a Microsoft-wide policy... but it should be under the Actions / General settings...

@ngam (Contributor, Author) commented Aug 18, 2022

Anyway, have a look. I found that this combo was the "default" one that would work:

 - dask==2022.3.0
 - dask-cuda==22.4.0
 - dask-core==2022.3.0
 - distributed==2022.3.0

"default" simply means that running conda create -n testing dask-cuda resolves to these versions

@ngam (Contributor, Author) commented Aug 18, 2022

In the future, we can unify the two containers if you want. Our tensorflow and pytorch packages are now compatible and can easily coexist in the same env; we worked quite hard on the pinning, etc. in conda-forge. So instead of having two large containers, we could have one that has all the goodies in it.

@TomAugspurger

Thanks @ngam, I'll merge this now.

I'll need to verify that our examples run correctly before deploying these updated containers to production. I'll hopefully have time for that next week.

@ngam (Contributor, Author) commented Aug 19, 2022

Okay, great! Let me know if I can be of any help! Just tag me if you think you'd like my help or input :)

@TomAugspurger TomAugspurger merged commit 879b9c6 into microsoft:main Aug 19, 2022