Merge pull request #11 from crim-ca/mlm-bands
fmigneault authored May 14, 2024
2 parents 602a274 + 542604b commit 3f76eb9
Showing 14 changed files with 429 additions and 53 deletions.
20 changes: 5 additions & 15 deletions .gitignore
@@ -1,9 +1,3 @@
Untitled.ipynb
/package-lock.json
/node_modules
.vscode
.idea

### ArchLinuxPackages ###
*.tar
*.tar.*
@@ -156,6 +150,7 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/
**/.benchmarks
cover/

# Translations
@@ -179,6 +174,7 @@ target/

# Jupyter Notebook
.ipynb_checkpoints
Untitled.ipynb

# IPython
profile_default/
@@ -261,7 +257,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

### Git ###
# Created by git for backups. To disable backups in Git:
@@ -578,12 +574,7 @@ tags
[._]*.un~

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
.vscode/

# Local History for Visual Studio Code
.history/
@@ -936,6 +927,7 @@ FakesAssemblies/
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
/package-lock.json

# Visual Studio 6 build log
*.plg
@@ -1036,5 +1028,3 @@ FodyWeavers.xsd

### VisualStudio Patch ###
# Additional files built by Visual Studio

# End of https://www.toptal.com/developers/gitignore/api/linux,archlinuxpackages,osx,windows,python,c,django,database,pycharm,visualstudio,visualstudiocode,vim,zsh,git,diff,microsoftoffice,spreadsheet,ssh,certificates
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Add the missing JSON schema `item_assets` definition under a Collection to ensure compatibility with
the [Item Assets](https://github.com/stac-extensions/item-assets) extension, as mentioned in this specification.
- Add `ModelBand` representation using `name`, `format` and `expression` properties to allow derived band references
(fixes [crim-ca/mlm-extension#7](https://github.com/crim-ca/mlm-extension/discussions/7)).

### Changed
- Adds a job to publish.yaml to publish the stac-model package
7 changes: 6 additions & 1 deletion Makefile
@@ -81,6 +81,10 @@ lint:
.PHONY: check-lint
check-lint: lint

.PHONY: format-lint
format-lint:
poetry run ruff --config=pyproject.toml --fix ./

.PHONY: install-npm
install-npm:
npm install
@@ -101,7 +105,8 @@ check-examples: install-npm
format-examples: install-npm
npm run format-examples

fix-%: format-%s
FORMATTERS := lint markdown examples
$(addprefix fix-, $(FORMATTERS)): fix-%: format-%

.PHONY: lint-all
lint-all: lint mypy check-safety check-markdown
50 changes: 38 additions & 12 deletions README.md
@@ -209,18 +209,18 @@ set to `true`, there would be no `accelerator` to contain against. To avoid conf

### Model Input Object

| Field Name | Type | Description |
|-------------------------|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | string | **REQUIRED** Name of the input variable defined by the model. If no explicit name is defined by the model, an informative name (e.g.: `"RGB Time Series"`) can be used instead. |
| bands | \[string] | **REQUIRED** The names of the raster bands used to train or fine-tune the model, which may be all or a subset of bands available in a STAC Item's [Band Object](#bands-and-statistics). If no band applies for one input, use an empty array. |
| input | [Input Structure Object](#input-structure-object) | **REQUIRED** The N-dimensional array definition that describes the shape, dimension ordering, and data type. |
| description | string | Additional details about the input such as describing its purpose or expected source that cannot be represented by other properties. |
| norm_by_channel | boolean | Whether to normalize each channel by channel-wise statistics or to normalize by dataset statistics. If True, use an array of `statistics` of same dimensionality and order as the `bands` field in this object. |
| norm_type | [Normalize Enum](#normalize-enum) \| null | Normalization method. Select an appropriate option or `null` when none applies. Consider using `pre_processing_function` for custom implementations or more complex combinations. |
| norm_clip | \[number] | When `norm_type = "clip"`, this array supplies the value for each `bands` item, which is used to divide each band before clipping values between 0 and 1. |
| resize_type | [Resize Enum](#resize-enum) \| null | High-level descriptor of the rescaling method to change image shape. Select an appropriate option or `null` when none applies. Consider using `pre_processing_function` for custom implementations or more complex combinations. |
| statistics | \[[Statistics Object](#bands-and-statistics)] | Dataset statistics for the training dataset used to normalize the inputs. |
| pre_processing_function | [Processing Expression](#processing-expression) \| null | Custom preprocessing function where normalization, rescaling, and any other significant operations take place. |
| Field Name | Type | Description |
|-------------------------|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | string | **REQUIRED** Name of the input variable defined by the model. If no explicit name is defined by the model, an informative name (e.g.: `"RGB Time Series"`) can be used instead. |
| bands | \[string \| [Model Band Object](#model-band-object)] | **REQUIRED** The raster band references used to train, fine-tune or perform inference with the model, which may be all or a subset of bands available in a STAC Item's [Band Object](#bands-and-statistics). If no band applies for one input, use an empty array. |
| input | [Input Structure Object](#input-structure-object) | **REQUIRED** The N-dimensional array definition that describes the shape, dimension ordering, and data type. |
| description | string | Additional details about the input such as describing its purpose or expected source that cannot be represented by other properties. |
| norm_by_channel | boolean | Whether to normalize each channel by channel-wise statistics or to normalize by dataset statistics. If True, use an array of `statistics` of same dimensionality and order as the `bands` field in this object. |
| norm_type | [Normalize Enum](#normalize-enum) \| null | Normalization method. Select an appropriate option or `null` when none applies. Consider using `pre_processing_function` for custom implementations or more complex combinations. |
| norm_clip | \[number] | When `norm_type = "clip"`, this array supplies the value for each `bands` item, which is used to divide each band before clipping values between 0 and 1. |
| resize_type | [Resize Enum](#resize-enum) \| null | High-level descriptor of the rescaling method to change image shape. Select an appropriate option or `null` when none applies. Consider using `pre_processing_function` for custom implementations or more complex combinations. |
| statistics | \[[Statistics Object](#bands-and-statistics)] | Dataset statistics for the training dataset used to normalize the inputs. |
| pre_processing_function | [Processing Expression](#processing-expression) \| null | Custom preprocessing function where normalization, rescaling, and any other significant operations take place. The `pre_processing_function` should be applied over all available `bands`. For band-specific operations, see [Model Band Object](#model-band-object). |

Fields that accept the `null` value can be considered `null` when omitted entirely for parsing purposes.
However, setting `null` explicitly when this information is known by the model provider can help users understand
@@ -253,6 +253,9 @@ and [Common Band Names][stac-band-names].
Only bands used as input to the model should be included in the MLM `bands` field.
To avoid duplicating the information, MLM only uses the `name` of whichever "Band Object" is defined in the STAC Item.
An input's `bands` definition can either be a plain `string` or a [Model Band Object](#model-band-object).
When a `string` is employed directly, the value should be implicitly mapped to the `name` property of the
explicit object representation.
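
The implicit mapping from a plain `string` to the `name` property can be sketched as a normalization step that a consumer might run before processing an input's `bands`. This is a minimal illustration only; `normalize_band` and `normalize_bands` are hypothetical helpers, not part of the MLM specification or of the `stac-model` package.

```python
from typing import Any, Dict, List, Union

# A `bands` entry is either a plain band name or a Model Band Object (dict).
BandRef = Union[str, Dict[str, Any]]


def normalize_band(ref: BandRef) -> Dict[str, Any]:
    """Return the explicit Model Band Object form of a `bands` entry."""
    if isinstance(ref, str):
        # A plain string is shorthand for {"name": <string>}.
        return {"name": ref}
    if isinstance(ref, dict) and isinstance(ref.get("name"), str):
        # Already an object: return a shallow copy so the input is not mutated.
        return dict(ref)
    raise ValueError(f"invalid band reference: {ref!r}")


def normalize_bands(refs: List[BandRef]) -> List[Dict[str, Any]]:
    """Normalize every entry of an input's `bands` array."""
    return [normalize_band(ref) for ref in refs]
```

After normalization, downstream code only has to handle one shape of band reference, whether or not `format` and `expression` are present.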

One distinction from the [STAC 1.1 - Band Object][stac-1.1-band] in MLM is that [Statistics][stac-1.1-stats] object
(or the corresponding [STAC Raster - Statistics][stac-raster-stats] for STAC 1.0) are not
@@ -269,6 +272,29 @@ properties of the model.
[stac-raster-stats]: https://github.com/stac-extensions/raster?tab=readme-ov-file#statistics-object
[stac-band-names]: https://github.com/stac-extensions/eo?tab=readme-ov-file#common-band-names

#### Model Band Object

| Field Name | Type | Description |
|------------|--------|----------------------------------------------------------------------------------------------------------------------------------------|
| name       | string | **REQUIRED** Name of the band referring to an extended band definition (see [Bands](#bands-and-statistics)).                          |
| format | string | The type of expression that is specified in the `expression` property. |
| expression | \* | An expression compliant with the `format` specified. The expression can be applied to any data type and depends on the `format` given. |

> :information_source: <br>
> Although `format` and `expression` are not required in this context, they are mutually dependent. <br>
> See also [Processing Expression](#processing-expression) for more details and examples.

The `format` and `expression` properties can serve multiple purposes:

1. Applying a band-specific pre-processing step,
in contrast to [`pre_processing_function`](#model-input-object) applied over all bands.
For example, reshaping a band to align its dimensions with other bands before stacking them.

2. Defining a derived-band operation or a calculation that produces a virtual band from other band references.
For example, computing an index (e.g., NDVI) that applies an arithmetic combination of other bands.
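
The derived-band case can be sketched with a small expression evaluator. This assumes an infix arithmetic `expression` such as the `(B08 - B04) / (B08 + B04)` NDVI band in `examples/item_bands_expression.json`; it is not the reference evaluator of any particular `format` (including `rio-calc`), and `eval_band_expression` is a hypothetical name. Band values may be plain numbers or NumPy arrays, since only the standard arithmetic operators are applied to them.

```python
import ast
import operator
from typing import Any, Callable, Dict

# Operators accepted in a derived-band expression (assumption: infix arithmetic only).
_BIN_OPS: Dict[type, Callable[[Any, Any], Any]] = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS: Dict[type, Callable[[Any], Any]] = {
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def eval_band_expression(expression: str, bands: Dict[str, Any]) -> Any:
    """Evaluate an arithmetic expression whose variables are band names.

    Only +, -, *, /, parentheses, numeric literals and names present in
    `bands` are allowed; anything else (calls, attributes...) is rejected.
    """

    def _eval(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.Name):
            if node.id not in bands:
                raise KeyError(f"unknown band reference: {node.id}")
            return bands[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))
```

Walking the parsed syntax tree instead of calling `eval()` keeps arbitrary code out of the expression, which matters if band expressions come from untrusted STAC metadata.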

For a concrete example, see [examples/item_bands_expression.json](examples/item_bands_expression.json).
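
The note above states that `format` and `expression` are optional but mutually dependent. A minimal validation sketch for a single Model Band Object, assuming a plain `dict` representation, could look like this; `validate_model_band` is a hypothetical helper, and the MLM JSON schema remains the authoritative source for these constraints.

```python
from typing import Any, Dict, List


def validate_model_band(band: Dict[str, Any]) -> List[str]:
    """Return the problems found in a Model Band Object (empty list if valid)."""
    errors: List[str] = []
    if not isinstance(band.get("name"), str):
        errors.append("'name' is required and must be a string")
    has_format = "format" in band
    has_expression = "expression" in band
    # Both properties are optional, but providing one requires the other.
    if has_format != has_expression:
        missing = "expression" if has_format else "format"
        errors.append(f"'{missing}' is required when the other is provided")
    if has_format and not isinstance(band["format"], str):
        errors.append("'format' must be a string")
    return errors
```

The type of `expression` is deliberately not checked, since the table above allows any value whose meaning depends on `format`.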

#### Data Type Enum

When describing the `data_type` provided by a [Band](#bands-and-statistics), whether for defining
4 changes: 4 additions & 0 deletions examples/collection.json
@@ -52,6 +52,10 @@
"href": "item_basic.json",
"rel": "item"
},
{
"href": "item_bands_expression.json",
"rel": "item"
},
{
"href": "item_eo_bands.json",
"rel": "item"
204 changes: 204 additions & 0 deletions examples/item_bands_expression.json
@@ -0,0 +1,204 @@
{
"$comment": "Demonstrate the use of MLM and EO for bands description, with EO bands directly in the Model Asset.",
"stac_version": "1.0.0",
"stac_extensions": [
"https://crim-ca.github.io/mlm-extension/v1.1.0/schema.json",
"https://stac-extensions.github.io/eo/v1.1.0/schema.json",
"https://stac-extensions.github.io/raster/v1.1.0/schema.json",
"https://stac-extensions.github.io/file/v1.0.0/schema.json",
"https://stac-extensions.github.io/ml-aoi/v0.2.0/schema.json"
],
"type": "Feature",
"id": "resnet-18_sentinel-2_all_moco_classification",
"collection": "ml-model-examples",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-7.882190080512502,
37.13739173208318
],
[
-7.882190080512502,
58.21798141355221
],
[
27.911651652899923,
58.21798141355221
],
[
27.911651652899923,
37.13739173208318
],
[
-7.882190080512502,
37.13739173208318
]
]
]
},
"bbox": [
-7.882190080512502,
37.13739173208318,
27.911651652899923,
58.21798141355221
],
"properties": {
"description": "Sourced from torchgeo python library, identifier is ResNet18_Weights.SENTINEL2_ALL_MOCO",
"datetime": null,
"start_datetime": "1900-01-01T00:00:00Z",
"end_datetime": "9999-12-31T23:59:59Z",
"mlm:name": "Resnet-18 Sentinel-2 ALL MOCO",
"mlm:tasks": [
"classification"
],
"mlm:architecture": "ResNet",
"mlm:framework": "pytorch",
"mlm:framework_version": "2.1.2+cu121",
"file:size": 43000000,
"mlm:memory_size": 1,
"mlm:total_parameters": 11700000,
"mlm:pretrained_source": "EuroSat Sentinel-2",
"mlm:accelerator": "cuda",
"mlm:accelerator_constrained": false,
"mlm:accelerator_summary": "Unknown",
"mlm:batch_size_suggestion": 256,
"mlm:input": [
{
"name": "RBG+NDVI Bands Sentinel-2 Batch",
"bands": [
{
"name": "B04"
},
{
"name": "B03"
},
{
"name": "B02"
},
{
"name": "NDVI",
"format": "rio-calc",
"expression": "(B08 - B04) / (B08 + B04)"
}
],
"input": {
"shape": [
-1,
4,
64,
64
],
"dim_order": [
"batch",
"channel",
"height",
"width"
],
"data_type": "float32"
}
}
],
"mlm:output": [
{
"name": "classification",
"tasks": [
"segmentation",
"semantic-segmentation"
],
"result": {
"shape": [
-1,
10
],
"dim_order": [
"batch",
"class"
],
"data_type": "float32"
},
"classification_classes": [
{
"value": 1,
"name": "vegetation",
"title": "Vegetation",
"description": "Pixels were vegetation is detected.",
"color_hint": "00FF00",
"nodata": false
},
{
"value": 0,
"name": "background",
"title": "Non-Vegetation",
"description": "Anything that is not classified as vegetation.",
"color_hint": "000000",
"nodata": false
}
],
"post_processing_function": null
}
]
},
"assets": {
"weights": {
"href": "https://example.com/model-rgb-ndvi.pth",
"title": "Pytorch weights checkpoint",
"description": "A vegetation classification model trained on Sentinel-2 imagery and NDVI.",
"type": "application/octet-stream; application=pytorch",
"roles": [
"mlm:model",
"mlm:weights"
],
"$comment": "Following 'eo:bands' is required to fulfil schema validation of 'eo' extension.",
"eo:bands": [
{
"name": "B02",
"common_name": "blue",
"description": "Blue (band 2)",
"center_wavelength": 0.49,
"full_width_half_max": 0.098
},
{
"name": "B03",
"common_name": "green",
"description": "Green (band 3)",
"center_wavelength": 0.56,
"full_width_half_max": 0.045
},
{
"name": "B04",
"common_name": "red",
"description": "Red (band 4)",
"center_wavelength": 0.665,
"full_width_half_max": 0.038
},
{
"name": "B08",
"common_name": "nir",
"description": "NIR 1 (band 8)",
"center_wavelength": 0.842,
"full_width_half_max": 0.145
}
]
}
},
"links": [
{
"rel": "collection",
"href": "./collection.json",
"type": "application/json"
},
{
"rel": "self",
"href": "./item_bands_expression.json",
"type": "application/geo+json"
},
{
"rel": "derived_from",
"href": "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
"type": "application/json",
"ml-aoi:split": "train"
}
]
}
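
Each entry of `bands`, derived or not, presumably feeds one channel of the model input. Under that assumption (MLM does not state it as a rule), a consumer might cross-check an `mlm:input` entry by comparing the size of its `channel` dimension with the number of `bands`. `check_channel_count` is a hypothetical helper.

```python
from typing import Any, Dict


def check_channel_count(model_input: Dict[str, Any]) -> bool:
    """Return True if the `channel` dimension size matches the number of `bands`.

    Assumption: every `bands` entry (plain string or Model Band Object) maps to
    exactly one channel. Variable-size dimensions (-1) are not checked.
    """
    bands = model_input.get("bands", [])
    if not bands:
        return True  # no band applies to this input, nothing to compare
    structure = model_input["input"]
    dim_order = structure["dim_order"]
    if "channel" not in dim_order:
        return True  # no channel dimension to compare against
    size = structure["shape"][dim_order.index("channel")]
    return size == -1 or size == len(bands)
```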
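
On the output side, the `classification_classes` of the example item associate each class `value` with a `name`. Assuming that the position of a score along the `class` dimension of `result` equals the class `value` (the example does not state this), predictions could be decoded as sketched below; `argmax_per_row` and `decode_classes` are hypothetical helpers.

```python
from typing import Any, Dict, List, Sequence


def argmax_per_row(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest score in each row of a [batch, class] result."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in scores]


def decode_classes(values: Sequence[int], classes: List[Dict[str, Any]]) -> List[str]:
    """Map predicted class values to the `name` of the matching classification class."""
    lookup = {c["value"]: c["name"] for c in classes}
    # Values without a matching class are reported as "unknown" (hypothetical convention).
    return [lookup.get(v, "unknown") for v in values]
```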