From c9d1e94cfc2b7ec12e6d9840a7fdb23075cbf771 Mon Sep 17 00:00:00 2001
From: Charlie Marshak
Date: Wed, 19 Feb 2025 10:02:58 -0800
Subject: [PATCH 1/3] readme updates

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 43a56e2..4c5cf1c 100644
--- a/README.md
+++ b/README.md
@@ -94,8 +94,9 @@ machine urs.earthdata.nasa.gov
 ### GPU Installation
 
 We have tried to make the environment as open, flexible, and transparent as possible.
-In particular, we are using the `conda-forge` distribution of the libraries, including relevant python packages for CUDA compatibility.
-We have provided an `environment_gpu.yml` which adds a minimum version for the `cudatoolkit` to ensure on our GPU systems that GPU is accessible.
+However, GPU compatibility requires us to fix the CUDA version.
+We are abke to use the `conda-forge` distribution of the libraries, including relevant CUDA libraries.
+We have provided an `environment_gpu.yml` which fixes the `cudatoolkit` version to ensure on our GPU systems that GPU is accessible.
 This will *not* be installable on non-Linux systems.
 
 The library `cudatoolkit` is the `conda-forge` distribution of NVIDIA's cuda tool kit (see [here](https://anaconda.org/conda-forge/cudatoolkit)).
@@ -206,15 +207,15 @@ There are two main components to the DIST-S1 workflow:
 1. Curation and localization of the OPERA RTC-S1 products. This is captured in the `run_dist_s1_sas_prep_workflow` function within the [`workflows.py` file](src/dist_s1/workflows.py).
 2. Application of the DIST-S1 algorithm to the localized RTC-S1 products. This is captured in the `run_dist_s1_sas_workflow` function within the [`workflows.py` file](src/dist_s1/workflows.py).
 
-These two steps can be run serially as a single workflow via `run_dist_s1_sas_workflow` in the [`workflows.py` file](src/dist_s1/workflows.py). There are associated CLI entrypoints to the functions via the `dist-s1` main command (see [SAS usage](#as-a-sds-science-application-software-sas) or the [run_sas.sh](examples/run_sas.sh) script).
+These two steps can be run serially as a single workflow via `run_dist_s1_workflow` in the [`workflows.py` file](src/dist_s1/workflows.py). There are associated CLI entrypoints to the functions via the `dist-s1` main command (see [SAS usage](#as-a-sds-science-application-software-sas) or the [run_sas.sh](examples/run_sas.sh) script).
 
-In terms of design, each step of the workflow relies heavily on writing its outputs to disk. This allows for testing of each step via staging of inputs on disk. It also provides a means to visually inspect the outputs of a given step (e.g. via QGIS) without additional boilerplate code to load/serialize in-memory data. There is a Class `RunConfigData` (that can be serialized as a `run_config.yml`) that functions to validate the inputs provided by the user and store the necessary paths for intermediate and output products (including those required for each of the workflow's steps). Storing these paths is quite tedious and each run config instance stores these paths via tables or dictionaries for easier lookup (e.g. by `jpl_burst_id` and acquisition timestamp).
+In terms of design, each step of the workflow relies heavily on writing its outputs to disk. This allows for testing of each step by staging the relevant inputs on disk. It also provides a means to visually inspect the outputs of a given step (e.g. via QGIS) without additional boilerplate code to load/serialize in-memory data. There is a class `RunConfigData` (that can be serialized as a `run_config.yml`) that functions to validate the inputs provided by the user and store the necessary paths for intermediate and output products (including those required for each of the workflow's steps). Storing these paths is quite tedious and each run config instance stores these paths via tables or dictionaries to allow for efficient lookup (e.g. find all the paths for RTC-S1 despeckled inputs by `jpl_burst_id`).
 
 There are also important libraries used to do the core of the disturbance detections including:
 1. [`distmetrics`](https://github.com/opera-adt/distmetrics) which provides an easy interface to compute the disturbance metrics as they relate to a baseline of RTC-S1 inputs and a recent set of acquisition data.
-2. [`dist-s1-enumerator`](https://github.com/opera-adt/dist-s1-enumerator) which provides the functionality to localize the necessary RTC-S1 inputs.
-3. [`tile-mate`](https://github.com/opera-calval/tile-mate) which provides the functionality to localize static tiles including the water mask.
+2. [`dist-s1-enumerator`](https://github.com/opera-adt/dist-s1-enumerator) which provides the functionality to query the OPERA RTC-S1 catalog and localize the necessary RTC-S1 inputs.
+3. [`tile-mate`](https://github.com/opera-calval/tile-mate) which provides the functionality to localize static tiles including the UMD GLAD data used for the water mask.
 
 These are all available via `conda-forge` and maintained by the DIST-S1 team.

From 88d86a39e0b983f3da082f6adeb9a86ebc93ab00 Mon Sep 17 00:00:00 2001
From: Charlie Marshak
Date: Wed, 19 Feb 2025 10:17:45 -0800
Subject: [PATCH 2/3] update readme

---
 CHANGELOG.md | 1 +
 README.md | 16 ++++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0adee42..9461570 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,6 +12,7 @@ and uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - CLI issues with bucket/prefix for S3 upload (resolves [#32](https://github.com/opera-adt/dist-s1/issues/32)).
 - Included `__main__.py` testing for the SAS entrypoint of the CLI; uses the cropped dataset to test the workflow.
 - Includes `dist-s1 run_sas` testing and golden dataset comparison.
+- Updates to README regarding GPU environment setup.
 
 ## [0.0.4]
 
diff --git a/README.md b/README.md
index 4a0aaf3..9d6e711 100644
--- a/README.md
+++ b/README.md
@@ -94,16 +94,16 @@ machine urs.earthdata.nasa.gov
 ### GPU Installation
 
 We have tried to make the environment as open, flexible, and transparent as possible.
-However, GPU compatibility requires us to fix the CUDA version.
-We are abke to use the `conda-forge` distribution of the libraries, including relevant CUDA libraries.
-We have provided an `environment_gpu.yml` which fixes the `cudatoolkit` version to ensure on our GPU systems that GPU is accessible.
+However, ensuring that the GPU is accessible within a Docker container and is consistent with our OPERA GPU server requires us to fix the CUDA version.
+We are able to use the `conda-forge` distribution of the required libraries, including pytorch (even though pytorch is no longer officially supported on conda-forge).
+We have provided such an environment file as `environment_gpu.yml` which fixes the `cudatoolkit` version to ensure that the GPU is accessible on our GPU systems.
 This will *not* be installable on non-Linux systems.
 
 The library `cudatoolkit` is the `conda-forge` distribution of NVIDIA's cuda tool kit (see [here](https://anaconda.org/conda-forge/cudatoolkit)).
-Although pytorch is no long supported officially on conda-forge, we have elected to use the distribution there because our library relies heavily on gdal, which is most easily installed via conda-forge.
-There are likely many ways to accomplish GPU compatibility.
-We can force cuda builds of pytorch via the environment file using regex versions: `- pytorch>=*=cuda118*`.
-There are other ways to accomplish this including `pytorch-gpu`.
-Our approach is motivated by the requirement to have this environment be compatible with our docker setup.
+We have elected to use the distribution there because we use conda to manage our virtual environments and our library relies heavily on gdal, which has in our experience been most easily installed via conda-forge.
+There are likely many ways to accomplish GPU pass-through to the container, but this approach has worked for us.
+Our approach is also motivated by the need to ensure our local server environment is compatible with our Docker setup (so we can confidently run the tests on a workstation rather than in a Docker container).
+Regarding the environment, we highlight that we can force CUDA builds of pytorch using wildcards in the version/build specification: `pytorch>=*=cuda118*`.
+There are other conda-forge packages such as [`pytorch-gpu`](https://anaconda.org/conda-forge/pytorch-gpu) that may also effectively use the same underlying libraries, but we have not investigated the exact differences.
 
 To resolve environment issues related to having access to the GPU, we successfully used `conda-tree` to identify CPU bound dependencies.
 For example,

From 2592e64d8e68121f8e887b732823a16d54f788ce Mon Sep 17 00:00:00 2001
From: Charlie Marshak
Date: Wed, 19 Feb 2025 11:03:16 -0800
Subject: [PATCH 3/3] update to create

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9d6e711..0a5e35e 100644
--- a/README.md
+++ b/README.md
@@ -73,7 +73,7 @@ There sample `run_config.yml` file is provided in the [examples](examples) direc
 We recommend using the mamba/conda package manager and `conda-forge` distributions to install the DIST-S1 workflow, manage the environment, and install the dependencies.
 
 ```
-mamba update -f environment.yml
+mamba env create -f environment.yml # or use mamba env create -f environment_gpu.yml for GPU installation with CUDA 11.8
 conda activate dist-s1-env
 mamba install -c conda-forge dist-s1
 python -m ipykernel install --user --name dist-s1-env
@@ -132,7 +132,7 @@ mamba install jupyterlab ipywidgets black isort jupyterlab_code_formatter
 As above, we recommend using the mamba/conda package manager to install the DIST-S1 workflow, manage the environment, and install the dependencies.
 
 ```
-mamba update -f environment.yml
+mamba env create -f environment_gpu.yml
 conda activate dist-s1-env
 pip install -e .
 # Optional for Jupyter notebook development
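
As a quick sanity check of the GPU setup these patches describe, the following is a minimal sketch, not part of the patches themselves; it assumes the `environment_gpu.yml` file and the `dist-s1-env` environment name that the README already uses, and simply confirms that the conda-forge build of pytorch can see a CUDA device after installation.

```
# Build and activate the GPU environment described in the README (assumed names).
mamba env create -f environment_gpu.yml
conda activate dist-s1-env
# Print the pytorch version and whether a CUDA device is visible; expect "True" on the GPU server.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If this prints `False` on a GPU machine, the environment likely resolved a CPU-only build of pytorch, which is the kind of issue the `conda-tree` inspection mentioned above is meant to diagnose.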