v0.0.5 (#36)
cmarshak authored Feb 20, 2025
2 parents 1756e2f + cb991df commit 759c504
Showing 9 changed files with 446 additions and 55 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docker-build.yml
@@ -24,6 +24,6 @@ jobs:
user: ${{ github.actor }}
release_branch: main
develop_branch: dev
file: Dockerfile
file: Dockerfile.nvidia
secrets:
USER_TOKEN: ${{ secrets.GITHUB_TOKEN }}
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [PEP 440](https://www.python.org/dev/peps/pep-0440/)
and uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.0.5] - 2025-02-19

### Fixed
- CLI issues with bucket/prefix for S3 upload (resolves [#32](https://github.com/opera-adt/dist-s1/issues/32)).
- Included `__main__.py` testing for the SAS entrypoint of the CLI; uses the cropped dataset to test the workflow.
- Includes `dist-s1 run_sas` testing and golden dataset comparison.
- Updates to README regarding GPU environment setup.

## [0.0.4]

### Added
@@ -18,7 +26,7 @@ and uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- Golden dataset test for SAS workflow
- Allow user to specify bucket/prefix for S3 upload - makes library compatible with Hyp3.
- Ensure Earthdata credentials are provided in ~/.netrc and allow for them to be passed as suitable environment variables.
- Create a GPU compatible docker image.
- Create a GPU compatible docker image (ongoing) - use nvidia docker image.
- Ensures pyyaml is in the environment (used for serialization of runconfig).
- Update equality testing for DIST-S1 product comparison.

2 changes: 2 additions & 0 deletions Dockerfile.nvidia
@@ -12,6 +12,8 @@ RUN apt-get update && apt-get install -y \
ENV MINIFORGE_VERSION=23.3.1-0
ENV MINIFORGE_HOME=/opt/miniforge
ENV PATH="$MINIFORGE_HOME/bin:$PATH"
# https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-virtual.html
ENV CONDA_OVERRIDE_CUDA=11.8

RUN wget https://github.com/conda-forge/miniforge/releases/download/${MINIFORGE_VERSION}/Miniforge3-Linux-x86_64.sh -O /tmp/miniforge.sh && \
bash /tmp/miniforge.sh -b -p $MINIFORGE_HOME && \
30 changes: 21 additions & 9 deletions README.md
@@ -73,7 +73,7 @@ A sample `run_config.yml` file is provided in the [examples](examples) directory.
We recommend using the mamba/conda package manager and `conda-forge` distributions to install the DIST-S1 workflow, manage the environment, and install the dependencies.

```
mamba update -f environment.yml
mamba env create -f environment.yml # or use mamba env create -f environment_gpu.yml for GPU installation with CUDA 11.8
conda activate dist-s1-env
mamba install -c conda-forge dist-s1
python -m ipykernel install --user --name dist-s1-env
@@ -94,10 +94,16 @@ machine urs.earthdata.nasa.gov
### GPU Installation

We have tried to make the environment as open, flexible, and transparent as possible.
In particular, we are using the `conda-forge` distribution of the libraries, including relevant python packages for CUDA compatibility.
We have provided an `environment_gpu.yml` which adds a minimum version for the `cudatoolkit` to ensure on our GPU systems that GPU is accessible.
However, ensuring that the GPU is accessible within a Docker container and is consistent with our OPERA GPU server requires us to fix the CUDA version.
We are able to use the `conda-forge` distribution of the required libraries, including pytorch (even though pytorch is no longer officially supported on conda-forge).
We have provided such an environment file as `environment_gpu.yml`, which fixes the `cudatoolkit` version to ensure that the GPU is accessible on our GPU systems.
This will *not* be installable on non-Linux systems.
The library `cudatoolkit` is the `conda-forge` distribution of NVIDIA's cuda tool kit (see [here](https://anaconda.org/conda-forge/cudatoolkit)).
We have elected to use the distribution there because we use conda to manage our virtual environments and our library relies heavily on gdal, which has in our experience been most easily installed via conda-forge.
There are likely many ways to accomplish GPU pass-through to the container, but this approach has worked for us.
Our approach is also motivated by the need to keep our local server environment compatible with our Docker setup (so we can confidently run the tests on a workstation rather than in a Docker container).
Regarding the environment, we highlight that we can force CUDA builds of pytorch using version/build match specifications such as `pytorch>=*=cuda118*`.
There are other conda-forge packages such as [`pytorch-gpu`](https://anaconda.org/conda-forge/pytorch-gpu) that may effectively be utilizing the same libraries, but we have not compared or looked into the exact differences.

To resolve environment issues related to GPU access, we successfully used `conda-tree` to identify CPU-bound dependencies.
For example,
@@ -126,7 +132,7 @@ mamba install jupyterlab ipywidgets black isort jupyterlab_code_formatter
As above, we recommend using the mamba/conda package manager to install the DIST-S1 workflow, manage the environment, and install the dependencies.

```
mamba update -f environment.yml
mamba env create -f environment_gpu.yml
conda activate dist-s1-env
pip install -e .
# Optional for Jupyter notebook development
@@ -151,7 +157,7 @@ Notes:

Make sure you have Docker installed for [Mac](https://docs.docker.com/desktop/setup/install/mac-install/) or [Windows](https://docs.docker.com/desktop/setup/install/windows-install/). We call the docker image `dist-s1-img` for the remainder of this README.
We have two dockerfiles: `Dockerfile` and `Dockerfile.nvidia`.
They both utilize `miniforge`, but the former has a base from `conda-forge` and the latter has a base from `nvidia`.
They both utilize `miniforge`, but the former has a base from `conda-forge` and the latter builds from an NVIDIA CUDA base image.
To build the image on Linux, run:
```
docker build -f Dockerfile -t dist-s1-img .
@@ -161,6 +167,12 @@ On Mac ARM, you can specify the target platform via:
docker buildx build --platform linux/amd64 -f Dockerfile -t dist-s1 .
```

### GPU Docker Image

Getting Docker to work with a GPU-enabled container (i.e. `docker run --gpus all ...`) is a work in progress.
The image utilizes the `Dockerfile.nvidia` file to ensure a specific CUDA version is utilized.
See issue [#22](https://github.com/opera-adt/dist-s1/issues/22) for more details and discussion.
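A minimal sketch of building and smoke-testing the GPU image, assuming the NVIDIA Container Toolkit is installed on the host (the image tag and the `nvidia-smi` check are illustrative, not a prescribed invocation):

```shell
# Build from the NVIDIA CUDA base image (fixed CUDA version)
docker build -f Dockerfile.nvidia -t dist-s1-img .

# Expose all host GPUs to the container; nvidia-smi is a quick
# check that the GPU is actually visible inside the container.
docker run --rm --gpus all dist-s1-img nvidia-smi
```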

### Running the Container Interactively

To run the container interactively:
@@ -206,15 +218,15 @@ There are two main components to the DIST-S1 workflow:
1. Curation and localization of the OPERA RTC-S1 products. This is captured in the `run_dist_s1_sas_prep_workflow` function within the [`workflows.py` file](src/dist_s1/workflows.py).
2. Application of the DIST-S1 algorithm to the localized RTC-S1 products. This is captured in the `run_dist_s1_sas_workflow` function within the [`workflows.py` file](src/dist_s1/workflows.py).

These two steps can be run serially as a single workflow via `run_dist_s1_sas_workflow` in the [`workflows.py` file](src/dist_s1/workflows.py). There are associated CLI entrypoints to the functions via the `dist-s1` main command (see [SAS usage](#as-a-sds-science-application-software-sas) or the [run_sas.sh](examples/run_sas.sh) script).
These two steps can be run serially as a single workflow via `run_dist_s1_workflow` in the [`workflows.py` file](src/dist_s1/workflows.py). There are associated CLI entrypoints to the functions via the `dist-s1` main command (see [SAS usage](#as-a-sds-science-application-software-sas) or the [run_sas.sh](examples/run_sas.sh) script).

In terms of design, each step of the workflow relies heavily on writing its outputs to disk. This allows for testing of each step via staging of inputs on disk. It also provides a means to visually inspect the outputs of a given step (e.g. via QGIS) without additional boilerplate code to load/serialize in-memory data. There is a Class `RunConfigData` (that can be serialized as a `run_config.yml`) that functions to validate the inputs provided by the user and store the necessary paths for intermediate and output products (including those required for each of the workflow's steps). Storing these paths is quite tedious and each run config instance stores these paths via tables or dictionaries for easier lookup (e.g. by `jpl_burst_id` and acquisition timestamp).
In terms of design, each step of the workflow relies heavily on writing its outputs to disk. This allows for testing of each step by staging the relevant inputs on disk. It also provides a means to visually inspect the outputs of a given step (e.g. via QGIS) without additional boilerplate code to load/serialize in-memory data. There is a class `RunConfigData` (that can be serialized as a `run_config.yml`) that functions to validate the inputs provided by the user and store the necessary paths for intermediate and output products (including those required for each of the workflow's steps). Storing these paths is quite tedious and each run config instance stores these paths via tables or dictionaries to allow for efficient lookup (e.g. find all the paths for RTC-S1 despeckled inputs by `jpl_burst_id`).
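The path-bookkeeping pattern described above can be sketched as follows. This is a hypothetical toy, not the real `RunConfigData` API: the class, method, and field names here are invented for illustration; only the idea of keying paths by `jpl_burst_id` and acquisition timestamp comes from the text.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PathTable:
    """Toy stand-in for the path bookkeeping a run config performs."""

    product_dst_dir: Path
    # Paths keyed by (jpl_burst_id, acquisition timestamp) for easy lookup.
    despeckled_paths: dict[tuple[str, str], Path] = field(default_factory=dict)

    def register(self, jpl_burst_id: str, acq_ts: str) -> Path:
        """Record where a despeckled intermediate product will live on disk."""
        path = self.product_dst_dir / 'despeckled' / f'{jpl_burst_id}_{acq_ts}.tif'
        self.despeckled_paths[(jpl_burst_id, acq_ts)] = path
        return path

    def paths_for_burst(self, jpl_burst_id: str) -> list[Path]:
        """Find all despeckled input paths for a single burst id."""
        return [p for (b, _), p in self.despeckled_paths.items() if b == jpl_burst_id]


table = PathTable(Path('out'))
table.register('T137-292318-IW1', '2025-01-01')
table.register('T137-292318-IW1', '2025-01-13')
print(len(table.paths_for_burst('T137-292318-IW1')))  # 2
```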

There are also important libraries used to do the core of the disturbance detections including:

1. [`distmetrics`](https://github.com/opera-adt/distmetrics) which provides an easy interface to compute the disturbance metrics as they relate to a baseline of RTC-S1 inputs and a recent set of acquisition data.
2. [`dist-s1-enumerator`](https://github.com/opera-adt/dist-s1-enumerator) which provides the functionality to localize the necessary RTC-S1 inputs.
3. [`tile-mate`](https://github.com/opera-calval/tile-mate) which provides the functionality to localize static tiles including the water mask.
2. [`dist-s1-enumerator`](https://github.com/opera-adt/dist-s1-enumerator) which provides the functionality to query the OPERA RTC-S1 catalog and localize the necessary RTC-S1 inputs.
3. [`tile-mate`](https://github.com/opera-calval/tile-mate) which provides the functionality to localize static tiles including the UMD GLAD data used for the water mask.

These are all available via `conda-forge` and maintained by the DIST-S1 team.

10 changes: 4 additions & 6 deletions environment_gpu.yml
@@ -7,7 +7,9 @@ dependencies:
- asf_search
- backoff
- click
- cudatoolkit=11.8
- dem_stitcher
- dist_s1_enumerator
- distmetrics
- flake8
- flake8-blind-except
@@ -22,17 +24,13 @@ dependencies:
- pydantic
- pytest
- pytest-cov
- pytorch
- pytorch>=*=cuda118*
- pyyaml
- rasterio
- ruff
- scipy
- setuptools
- setuptools_scm
- shapely
- tile_mate>=0.0.12
- tqdm
- cudatoolkit>=11.8
- pip:
- dist_s1_enumerator
- pyarrow
- tile-mate>=0.0.12
4 changes: 4 additions & 0 deletions src/dist_s1/__main__.py
@@ -139,6 +139,8 @@ def run_sas_prep(
dst_dir: str | Path,
water_mask_path: str | Path | None,
product_dst_dir: str | Path | None,
bucket: str | None,
bucket_prefix: str,
) -> str:
"""Run SAS prep workflow."""
run_config = run_dist_s1_sas_prep_workflow(
@@ -156,6 +158,8 @@ water_mask_path=water_mask_path,
water_mask_path=water_mask_path,
n_lookbacks=n_lookbacks,
product_dst_dir=product_dst_dir,
bucket=bucket,
bucket_prefix=bucket_prefix,
)
run_config.to_yaml(runconfig_path)

8 changes: 8 additions & 0 deletions tests/conftest.py
@@ -51,6 +51,14 @@ def test_opera_golden_dummy_dataset() -> Path:
return golden_dummy_dataset


@pytest.fixture
def cropped_10SGD_dataset_runconfig() -> Path:
"""Fixture to provide the path to the test_out directory."""
test_dir = Path(__file__)
runconfig_path = test_dir.parent / 'test_data' / 'cropped' / 'sample_runconfig_10SGD_cropped.yml'
return runconfig_path


@pytest.fixture
def cli_runner() -> CliRunner:
"""Fixture to provide a Click test runner."""
