Release blank / nonblank model (#228)
* add bnb weights and model option

* update docs and add model as a test option

* missing bracket

* correct load validation

* update name and version number

* add official models folder for blank nonblank

* format

* add bnb template and add mdlite image size to td

* reset patience

* always keep blank model if binary and it exists

* update comment

* test real pred for species and blank model

* note default in doc string

* WIP add number of videos to table and include blank nonblank

* simplify redundancy and finish BNB

* update to four models and add bnb

* build docs

* tweak

* lowercase label column before OHE

* fix test because we are now lowercasing species

* test case for blank

* remove model count
ejm714 authored Sep 23, 2022
1 parent 2a4e9fc commit 291b34f
Showing 20 changed files with 364 additions and 59 deletions.
4 changes: 2 additions & 2 deletions docs/docs/configurations.md
@@ -188,7 +188,7 @@ Path to a model checkpoint to load and use for inference. If you train your own

#### `model_name (time_distributed|slowfast|european, optional)`

Name of the model to use for inference. The three model options that ship with `zamba` are `time_distributed`, `slowfast`, and `european`. See the [Available Models](models/species-detection.md) page for details. Defaults to `time_distributed`
Name of the model to use for inference. The model options that ship with `zamba` are `blank_nonblank`, `time_distributed`, `slowfast`, and `european`. See the [Available Models](models/species-detection.md) page for details. Defaults to `time_distributed`

#### `gpus (int, optional)`

@@ -301,7 +301,7 @@ A [PyTorch learning rate schedule](https://pytorch.org/docs/stable/optim.html#ho

#### `model_name (time_distributed|slowfast|european, optional)`

Name of the model to use for inference. The three model options that ship with `zamba` are `time_distributed`, `slowfast`, and `european`. See the [Available Models](models/species-detection.md) page for details. Defaults to `time_distributed`
Name of the model to use for inference. The model options that ship with `zamba` are `blank_nonblank`, `time_distributed`, `slowfast`, and `european`. See the [Available Models](models/species-detection.md) page for details. Defaults to `time_distributed`

#### `dry_run (bool, optional)`

2 changes: 1 addition & 1 deletion docs/docs/debugging.md
@@ -36,7 +36,7 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error

#### Decreasing video size

Resize video frames to be smaller before they are passed to the model. The default for all three models is 240x426 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).
Resize video frames to be smaller before they are passed to the model. The default for all models is 240x426 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).

If you are using MegadetectorLite to select frames (which is the default for the official models we ship with), you can also decrease the size of the frame used at this stage by setting [`frame_selection_height` and `frame_selection_width`](configurations/#frame_selection_height-int-optional-frame_selection_width-int-optional).

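For instance, a YAML file along these lines would shrink the frames (the 120x213 values here are illustrative, not defaults):

```yaml
video_loader_config:
  model_input_height: 120   # hypothetical smaller-than-default values
  model_input_width: 213
```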
4 changes: 2 additions & 2 deletions docs/docs/extra-options.md
@@ -31,7 +31,7 @@ The options for `weight_download_region` are `us`, `eu`, and `asia`. Once a mode

## Video size

When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to superior accuracy in prediction, but will use more memory and take longer to train and/or predict. The default video loading configuration for all three pretrained models resizes images to 240x426 pixels.
When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to superior accuracy in prediction, but will use more memory and take longer to train and/or predict. The default video loading configuration for all pretrained models resizes images to 240x426 pixels.

Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. For example, to resize all images to 150x150 pixels instead of the default 240x426:

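A sketch of such a configuration (only the 150x150 values are given by the text; the structure mirrors the video loader options shown elsewhere in this commit):

```yaml
video_loader_config:
  model_input_height: 150
  model_input_width: 150
```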
@@ -113,7 +113,7 @@ A simple option is to sample frames that are evenly distributed throughout a vid

### MegadetectorLite

You can use a pretrained object detection model called [MegadetectorLite](models/species-detection.md#megadetectorlite) to select only the frames that are most likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the MegadetectorLite model. If `megadetector_lite_config` is None, the MegadetectorLite model will not be used.
You can use a pretrained object detection model called [MegadetectorLite](models/species-detection.md#megadetectorlite) to select only the frames that are most likely to contain an animal. This is the default strategy for all pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the MegadetectorLite model. If `megadetector_lite_config` is None, the MegadetectorLite model will not be used.

For example, to take the 16 frames with the highest probability of detection:

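A configuration along these lines would do it (a sketch mirroring the template settings added in this commit; the exact example may differ):

```yaml
video_loader_config:
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    n_frames: 16
```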
5 changes: 4 additions & 1 deletion docs/docs/index.md
@@ -5,7 +5,10 @@
[![codecov](https://codecov.io/gh/drivendataorg/zamba/branch/master/graph/badge.svg)](https://codecov.io/gh/drivendataorg/zamba)
<!-- [![PyPI](https://img.shields.io/pypi/v/zamba.svg)](https://pypi.org/project/zamba/) -->

<div class="embed-responsive embed-responsive-16by9" width=500> <iframe width=600 height=340 class="embed-responsive-item" src="https://s3.amazonaws.com/drivendata-public-assets/monkey-vid.mp4" frameborder="0" allowfullscreen=""></iframe></div>

<div class="embed-responsive embed-responsive-16by9" width=500>
<iframe width=600 height=340 class="embed-responsive-item" src="https://s3.amazonaws.com/drivendata-public-assets/monkey-vid.mp4"
frameborder="0" allowfullscreen=""></iframe></div>

> *Zamba* means "forest" in Lingala, a Bantu language spoken throughout the Democratic Republic of the Congo and the Republic of the Congo.
Expand Down
79 changes: 54 additions & 25 deletions docs/docs/models/species-detection.md
@@ -1,6 +1,6 @@
# Available models

The algorithms in `zamba` are designed to identify species of animals that appear in camera trap videos. There are three models that ship with the `zamba` package: `time_distributed`, `slowfast`, and `european`. For more details of each, read on!
The algorithms in `zamba` are designed to identify species of animals that appear in camera trap videos. The pretrained models that ship with the `zamba` package are: `blank_nonblank`, `time_distributed`, `slowfast`, and `european`. For more details of each, read on!

## Model summary

@@ -10,34 +10,52 @@ The algorithms in `zamba` are designed to identify species of animals that appea
<th>Geography</th>
<th>Relative strengths</th>
<th>Architecture</th>
<th>Number of training videos</th>
</tr>
<tr>
<td><code>blank_nonblank</code></td>
<td>Central Africa, West Africa, and Western Europe</td>
<td>Just blank detection, without species classification </td>
<td>Image-based <code>TimeDistributedEfficientNet</code></td>
<td>~263,000</td>
</tr>
<tr>
<td><code>time_distributed</code></td>
<td>Central and West Africa</td>
<td>Better than <code>slowfast</code> at duikers, chimps, and gorillas and other larger species</td>
<td>Recommended species classification model for jungle ecologies</td>
<td>Image-based <code>TimeDistributedEfficientNet</code></td>
<td>~250,000</td>
</tr>
<tr>
<td><code>slowfast</code></td>
<td>Central and West Africa</td>
<td>Better than <code>time_distributed</code> at blank detection and small species detection</td>
<td>Potentially better than <code>time_distributed</code> at small species detection</td>
<td>Video-native <code>SlowFast</code></td>
<td>~15,000</td>
</tr>
<tr>
<td><code>european</code></td>
<td>Western Europe</td>
<td>Trained on non-jungle ecologies</td>
<td>Finetuned <code>time_distributed</code> model</td>
<td>~13,000</td>
</tr>
</table>

The models trained on the largest datasets took a couple of weeks to train on a single GPU machine. Some models will be updated in the future, and you can always check the [changelog](../../changelog) to see if there have been updates.

All models support training, fine-tuning, and inference. For fine-tuning, we recommend using the `time_distributed` model as the starting point.

<h2 id="species-classes"></h2>

## What species can `zamba` detect?

`time_distributed` and `slowfast` are both trained to identify 32 common species from Central and West Africa. The output labels in these models are:
The `blank_nonblank` model is trained to do blank detection without species classification. The output labels from this model are:

* `blank`
* `nonblank`

The `time_distributed` and `slowfast` models are both trained to identify 32 common species from Central and West Africa. The output labels in these models are:

* `aardvark`
* `antelope_duiker`
@@ -72,7 +90,7 @@ All models support training, fine-tuning, and inference. For fine-tuning, we rec
* `small_cat`
* `wild_dog_jackal`

`european` is trained to identify 11 common species in Western Europe. The possible class labels are:
The `european` model is trained to identify 11 common species in Western Europe. The possible class labels are:

* `bird`
* `blank`
@@ -86,6 +104,25 @@ All models support training, fine-tuning, and inference. For fine-tuning, we rec
* `weasel`
* `wild_boar`

<a id='blank-nonblank'></a>

## `blank_nonblank` model

### Architecture

The `blank_nonblank` model uses the same [architecture](#time-distributed) as the `time_distributed` model, but it has only one output class because this is a binary classification problem.

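With a single output class, the two labels' probabilities come from one sigmoid rather than a softmax over many species. A minimal sketch of the idea (the function name and the convention that the logit scores "blank" are assumptions for illustration, not zamba's API):

```python
import math

def blank_nonblank_probs(logit: float) -> dict:
    """Turn a single output logit into the two class probabilities.

    Illustrative sketch only -- the name and the assumption that the
    logit scores "blank" are hypothetical, not zamba's actual API.
    """
    p_blank = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return {"blank": p_blank, "nonblank": 1.0 - p_blank}

probs = blank_nonblank_probs(0.0)  # a zero logit leaves both classes at 0.5
```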
### Default configuration

The full default configuration is available on [Github](https://github.com/drivendataorg/zamba/blob/master/zamba/models/official_models/blank_nonblank/config.yaml).

The `blank_nonblank` model uses the same [default configuration](#time-distributed-config) as the `time_distributed` model. For the frame selection, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then the classification model is run on only the 16 frames with the highest predicted probability of detection.

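Following the CLI pattern shown in the quickstart, running this model presumably looks like the following (`example_vids/` is a placeholder directory):

```console
$ zamba predict --data-dir example_vids/ --model blank_nonblank
```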
### Training data

The `blank_nonblank` model was trained on all of the data used for the [`time_distributed`](#time-distributed-training-data) and [`european`](#european-training-data) models.


<a id='time-distributed'></a>

## `time_distributed` model
@@ -98,7 +135,7 @@ The `time_distributed` model was built by re-training a well-known image classif

### Training data

`time_distributed` was trained using data collected and annotated by trained ecologists from Cameroon, Central African Republic, Democratic Republic of the Congo, Gabon, Guinea, Liberia, Mozambique, Nigeria, Republic of the Congo, Senegal, Tanzania, and Uganda, as well as citizen scientists on the [Chimp&See](https://www.chimpandsee.org/) platform.
The `time_distributed` model was trained using data collected and annotated by trained ecologists from Cameroon, Central African Republic, Democratic Republic of the Congo, Gabon, Guinea, Liberia, Mozambique, Nigeria, Republic of the Congo, Senegal, Tanzania, and Uganda, as well as citizen scientists on the [Chimp&See](https://www.chimpandsee.org/) platform.

The data included camera trap videos from:

@@ -197,7 +234,7 @@ The data included camera trap videos from:
</tr>
</table>

The most recent release of trained models took around 2-3 days to train on a single GPU machine on approximately 14,000 1-minute long videos for the African species, and around 13,000 videos for the European species. These models will be updated in the future, and you can always check the [changelog](../../changelog) to see if there have been updates.
<a id='time-distributed-config'></a>

### Default configuration

@@ -218,6 +255,9 @@ video_loader_config:
    confidence: 0.25
    fill_mode: score_sorted
    n_frames: 16
    frame_batch_size: 24
    image_height: 640
    image_width: 640
```
You can choose different frame selection methods and vary the size of the images that are used by passing in a custom [YAML configuration file](../yaml-config.md). The only requirement for the `time_distributed` model is that the video loader must return 16 frames.
@@ -240,7 +280,7 @@ Unlike `time_distributed`, `slowfast` is video native. This means it takes into

### Training data

The `slowfast` model was trained using the same data as the [`time_distributed` model](#time-distributed-training-data).
The `slowfast` model was trained on a subset of the [data used](#time-distributed-training-data) for the `time_distributed` model.

### Default configuration

Expand All @@ -262,6 +302,8 @@ video_loader_config:
confidence: 0.25
fill_mode: score_sorted
n_frames: 32
image_height: 416
image_width: 416
```

You can choose different frame selection methods and vary the size of the images that are used by passing in a custom [YAML configuration file](../yaml-config.md). The two requirements for the `slowfast` model are that:
@@ -275,7 +317,9 @@ You can choose different frame selection methods and vary the size of the images

### Architecture

The `european` model starts from the trained `time_distributed` model, and then replaces and trains the final output layer to predict European species.
The `european` model starts from a previous version of the `time_distributed` model, and then replaces and trains the final output layer to predict European species.

<a id='european-training-data'></a>

### Training data

@@ -285,22 +329,7 @@ The `european` model is finetuned with data collected and annotated by partners

The full default configuration is available on [Github](https://github.com/drivendataorg/zamba/blob/master/zamba/models/official_models/european/config.yaml).

The `european` model uses the same frame selection as the `time_distributed` model. By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `european` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels following frame selection.

The full default video loading configuration is:
```yaml
video_loader_config:
  model_input_height: 240
  model_input_width: 426
  crop_bottom_pixels: 50
  fps: 4
  total_frames: 16
  ensure_total_frames: true
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    n_frames: 16
```
The `european` model uses the same [default configuration](#time-distributed-config) as the `time_distributed` model.

As with all models, you can choose different frame selection methods and vary the size of the images that are used by passing in a custom [YAML configuration file](../yaml-config.md). The only requirement for the `european` model is that the video loader must return 16 frames.

2 changes: 1 addition & 1 deletion docs/docs/predict-tutorial.md
@@ -25,7 +25,7 @@ To run `zamba predict` in the command line, you must specify `--data-dir` and/or
* **`--data-dir PATH`:** Path to the folder containing your videos. If you don't also provide `filepaths`, Zamba will recursively search this folder for videos.
* **`--filepaths PATH`:** Path to a CSV file with a column for the filepath to each video you want to classify. The CSV must have a column for `filepath`. Filepaths can be absolute on your system or relative to the data directory that you provide in `--data-dir`.

All other flags are optional. To choose the model you want to use for prediction, either `--model` or `--checkpoint` must be specified. Use `--model` to specify one of the three [pretrained models](models/species-detection.md) that ship with `zamba`. Use `--checkpoint` to run inference with a locally saved model. `--model` defaults to [`time_distributed`](models/species-detection.md#what-species-can-zamba-detect).
All other flags are optional. To choose the model you want to use for prediction, either `--model` or `--checkpoint` must be specified. Use `--model` to specify one of the [pretrained models](models/species-detection.md) that ship with `zamba`. Use `--checkpoint` to run inference with a locally saved model. `--model` defaults to [`time_distributed`](models/species-detection.md#what-species-can-zamba-detect).

## Basic usage: Python package

2 changes: 1 addition & 1 deletion docs/docs/quickstart.md
@@ -81,7 +81,7 @@ eleph.mp4,elephant
leopard.mp4,leopard
```

There are three pretrained models that ship with `zamba`: `time_distributed`, `slowfast`, and `european`. Which model you should use depends on your priorities and geography (see the [Available Models](models/species-detection.md) page for more details). By default `zamba` will use the `time_distributed` model. Add the `--model` argument to specify one of other options:
Several pretrained models ship with `zamba`: `blank_nonblank`, `time_distributed`, `slowfast`, and `european`. Which model you should use depends on your priorities and geography (see the [Available Models](models/species-detection.md) page for more details). By default `zamba` will use the `time_distributed` model. Add the `--model` argument to specify one of the other options:

```console
$ zamba predict --data-dir example_vids/ --model slowfast
43 changes: 43 additions & 0 deletions templates/blank_nonblank.yaml
@@ -0,0 +1,43 @@
train_config:
  # data_dir: YOUR_DATA_DIR HERE
  # labels: YOUR_LABELS_CSV_HERE
  model_name: blank_nonblank
  backbone_finetune_config:
    backbone_initial_ratio_lr: 0.01
    multiplier: 1
    pre_train_bn: true
    train_bn: false
    unfreeze_backbone_at_epoch: 3
    verbose: true
  early_stopping_config:
    patience: 5
  scheduler_config:
    scheduler: MultiStepLR
    scheduler_params:
      gamma: 0.5
      milestones:
        - 3
      verbose: true

video_loader_config:
  model_input_height: 240
  model_input_width: 426
  crop_bottom_pixels: 50
  fps: 4
  total_frames: 16
  ensure_total_frames: true
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    frame_batch_size: 24
    image_height: 640
    image_width: 640
    n_frames: 16

predict_config:
  # data_dir: YOUR_DATA_DIR HERE
  # or
  # filepaths: YOUR_FILEPATH_CSV_HERE
  model_name: blank_nonblank
  # or
  # checkpoint: YOUR_CKPT_HERE
3 changes: 3 additions & 0 deletions templates/time_distributed.yaml
@@ -29,6 +29,9 @@ video_loader_config:
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    frame_batch_size: 24
    image_height: 640
    image_width: 640
    n_frames: 16

predict_config:
7 changes: 5 additions & 2 deletions tests/test_cli.py
@@ -71,7 +71,7 @@ def test_shared_cli_options(mocker, minimum_valid_train, minimum_valid_predict):
    assert "Config file: None" in result.output

    # check all models options are valid
    for model in ["time_distributed", "slowfast", "european"]:
    for model in ["time_distributed", "slowfast", "european", "blank_nonblank"]:
        result = runner.invoke(app, command + ["--model", model])
        assert result.exit_code == 0

@@ -154,7 +154,8 @@ def test_predict_specific_options(mocker, minimum_valid_predict, tmp_path):  # n
    assert result.exit_code == 0


def test_actual_prediction_on_single_video(tmp_path):  # noqa: F811
@pytest.mark.parametrize("model", ["time_distributed", "blank_nonblank"])
def test_actual_prediction_on_single_video(tmp_path, model):  # noqa: F811
    data_dir = tmp_path / "videos"
    data_dir.mkdir()
    shutil.copy(TEST_VIDEOS_DIR / "data" / "raw" / "benjamin" / "04250002.MP4", data_dir)
@@ -172,6 +173,8 @@ def test_actual_prediction_on_single_video(tmp_path):  # noqa: F811
            "--yes",
            "--save-dir",
            str(save_dir),
            "--model",
            model,
        ],
    )
    assert result.exit_code == 0