Skip to content

Commit

Permalink
TD model performance report (#235)
Browse files Browse the repository at this point in the history
* First draft of td model performance report

* Adding removing blank section

* Figure for performance report

* Moving figures to s3

* Update docs/docs/models/td-full-metrics.md

Co-authored-by: Emily Miller <[email protected]>

* Update docs/docs/models/td-full-metrics.md

Co-authored-by: Emily Miller <[email protected]>

* Update docs/docs/models/td-full-metrics.md

Co-authored-by: Emily Miller <[email protected]>

* Update docs/docs/models/td-full-metrics.md

Co-authored-by: Emily Miller <[email protected]>

* Moving the performance report in the nav bar

* Reordering sections and revising

* Revision

* Error correction

* Response to queries

* Pushing to re-run tests

* Update docs/docs/models/td-full-metrics.md

Co-authored-by: glipstein <[email protected]>

* Edits

* Adding link

Co-authored-by: Emily Miller <[email protected]>
Co-authored-by: glipstein <[email protected]>
  • Loading branch information
3 people authored Sep 29, 2022
1 parent 9b05c55 commit 5bbc33a
Show file tree
Hide file tree
Showing 6 changed files with 172 additions and 0 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
171 changes: 171 additions & 0 deletions docs/docs/models/td-full-metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
# Time-distributed model performance


## African forest model

The African species time-distributed model was trained using almost **250,000 videos from 14 countries** in West, Central, and East Africa.
These videos include examples of 30 animal species, plus some blank videos and some showing humans.
To evaluate the performance of the model, we held out 30,324 videos from 101 randomly-chosen sites.


### Removing blank videos

One use of this model is to identify blank videos so they can be discarded or ignored.
In this dataset, 42% of the videos are blank, so removing them can save substantial amounts of
viewing time and storage space.

The model assigns a probability that each video is blank, so one strategy is to remove videos
if their probability exceeds a given threshold.
Of course, the model is not perfect, so there is a chance we will wrongly remove a video that actually
contains an animal.

To assess this tradeoff, we can use the holdout set to simulate this strategy with a range of thresholds.
For each threshold, we compute the fraction of blank videos correctly discarded and the fraction of non-blank
videos incorrectly discarded.
The following figure shows the results.

<img src="https://s3.amazonaws.com/drivendata-public-assets/zamba/td_full_set_recall_recall_curve.png" alt="" style="width:600px;"/>

The markers indicate three levels of tolerance for losing non-blank videos. For example, if it's acceptable to
lose 5% of non-blank videos, we can choose a threshold that removes 63% of the blank videos.
If we can tolerate a loss of 10%, we can remove 80% of the blanks.
And if we can tolerate a loss of 15%, we can remove 90% of the blanks.
Above that, the percentage of lost videos increases steeply.


### Accuracy

In addition to identifying blank videos, the model also computes a probability that each of 30 animal species appears in each video (plus human and blank).
We can use these probabilities to quantify the accuracy of the model for species classification.
Specifically, we computed:

* Top-1 accuracy, which is the fraction of videos where the species with the highest predicted probability is, in fact, present.

* Top-3 accuracy, which is the fraction of videos where one of the three species the model considered most likely is present.

Over all videos in the holdout set, the **top-1 accuracy is 82%; the top-3 accuracy is 94%**.
As an example, if you choose a video at random and the species with the highest predicted probability is elephant, the probability is 82% that the video contains an elephant, according to the human-generated labels.
If the three most likely species were elephant, hippopotamus, and cattle, the probability is 94% that the video
contains at least one of those species.

These results depend in part on the species represented in a particular dataset. For example, in the small
number of videos from Equatorial Guinea, only three species appear. For these videos, the top-1 accuracy is 97%, much
higher than the overall accuracy.
In the videos from Ivory Coast, 21 species are represented, so the problem is harder. For these
videos, top-1 accuracy is 80%, a little lower than the overall accuracy.


### Recall and precision by species

One of the goals of classification is to help with retrieval, that is, efficiently finding videos containing
a particular species. To evaluate the performance of the model for retrieval, we can use

* Recall, which is the fraction of videos containing a particular species that are correctly classified, and

* Precision, which is the fraction of videos the model labels with a particular species that actually contain that species.

The following figure shows recall and precision for the species in the holdout set,
excluding 11 species where there are too few examples in the holdout set to compute meaningful estimates of these metrics.

<img src="https://s3.amazonaws.com/drivendata-public-assets/zamba/td_full_set_precision_recall_by_species.png" alt="" style="width:800px;"/>

It's clear that we are able to retrieve some species more efficiently than others. For example, elephants are relatively
easy to find. Of the videos that contain elephants, 84% are correctly classified; and of the videos that the model
labels "elephant", 94% contain elephants.
So a researcher using the model to find elephant videos could find a large majority of them while viewing only a
small number of non-elephant videos.

Not surprisingly, smaller animals are harder to find. For example, the recall for rodent videos is only 22%.
However, it is still possible to search for rodents by selecting videos that assign a relatively high probability
to "rodent", even if it assigns a higher probability to another species or "blank".


### Description of the holdout set

The videos in the holdout set are a random sample from the complete set of labeled videos, but they are
selected on a transect-by-transect basis; that is, videos from each transect are assigned entirely to the
training set or entirely to the holdout set.
So the performance of the model on the holdout set should reflect its performance on videos from a transect
the model has never seen.

All 14 countries are represented in the holdout set; the following table shows the number of videos from
each country.
These proportions are roughly consistent with the proportions in the complete set.

<table>
<thead>
<tr>
<th>Country</th>
<th>Number of videos</th>
</tr>
</thead>
<tbody>
<tr>
<th>Ivory Coast</th>
<td>10,987</td>
</tr>
<tr>
<th>Guinea</th>
<td>4,300</td>
</tr>
<tr>
<th>DR Congo</th>
<td>2,750</td>
</tr>
<tr>
<th>Uganda</th>
<td>2,497</td>
</tr>
<tr>
<th>Tanzania</th>
<td>1,794</td>
</tr>
<tr>
<th>Mozambique</th>
<td>1,168</td>
</tr>
<tr>
<th>Senegal</th>
<td>1,131</td>
</tr>
<tr>
<th>Gabon</th>
<td>1,116</td>
</tr>
<tr>
<th>Cameroon</th>
<td>1,114</td>
</tr>
<tr>
<th>Liberia</th>
<td>1,065</td>
</tr>
<tr>
<th>Central African Republic</th>
<td>997</td>
</tr>
<tr>
<th>Nigeria</th>
<td>889</td>
</tr>
<tr>
<th>Congo Republic</th>
<td>678</td>
</tr>
<tr>
<th>Equatorial Guinea</th>
<td>38</td>
</tr>
</tbody>
</table>

The following figure shows the number of videos containing each of 30 animal species, plus some videos showing humans and
a substantial number of blank videos.

<img src="https://s3.amazonaws.com/drivendata-public-assets/zamba/td_full_set_number_videos_by_species.png" alt="" style="width:800px;"/>

One of the challenges of this kind of classification is that some species are
much more common than others. For species that appear in a small number of videos, we expect
the model to be less accurate because it has fewer examples to learn from.
Also, for these species it is hard to assess performance precisely because there are few examples in the holdout set. If you would like to add more examples of the species you work with, see [how to build on the model](../../train-tutorial/).

1 change: 1 addition & 0 deletions docs/mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ nav:
- "Available Models":
- Species detection: "models/species-detection.md"
- DensePose: "models/densepose.md"
- African species performance: "models/td-full-metrics.md"
- "Advanced Options":
- All configuration options: "configurations.md"
- Using YAML configuration files: "yaml-config.md"
Expand Down

0 comments on commit 5bbc33a

Please sign in to comment.