Add MLCube support for Object Detection Benchmark #501

Open
wants to merge 15 commits into
base: master

davidjurado
Contributor

@davidjurado davidjurado commented Jul 23, 2021

Benchmark execution with MLCube

Project setup

# Create Python environment and install MLCube Docker runner 
virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker

# Fetch the Object Detection workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/501/head:feature/object_detection && git checkout feature/object_detection
cd ./object_detection/mlcube

Dataset

The COCO dataset will be downloaded and extracted. Sizes of the dataset in each step:

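| Dataset Step                   | MLCube Task       | Format         | Size     |
|--------------------------------|-------------------|----------------|----------|
| Download (Compressed dataset)  | download_data     | Tar/Zip files  | ~20.5 GB |
| Extract (Uncompressed dataset) | download_data     | Jpg/Json files | ~21.2 GB |
| Total                          | (After all tasks) | All            | ~41.7 GB |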
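
Since the sizes above add up to roughly 41.7 GB, it may be worth checking free disk space before starting the download. The following is a minimal sketch and not part of the PR: the helper name `has_free_space` and the 42 GB threshold in the usage comment are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): succeed only if the filesystem
# that holds DIR has at least REQUIRED_GB gigabytes available.
has_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks; the 4th column
    # of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage (42 is an assumed threshold derived from the ~41.7 GB total):
# has_free_space . 42 || echo "Not enough free disk space for COCO" >&2
```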

Tasks execution

Parameters are defined in these files:

  • MLCube user parameters: mlcube/workspace/parameters.yaml
  • Project user parameters: pytorch/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
  • Project default parameters: pytorch/maskrcnn_benchmark/config/defaults.py
# Download COCO dataset. Default path = ./workspace/data
mlcube run --task=download_data -Pdocker.build_strategy=always

# Run benchmark. Default paths = ./workspace/data
mlcube run --task=train -Pdocker.build_strategy=always

Demo execution

These tasks will use a demo dataset (39M) to execute a faster training workload for a quick demo (~12 min):

# Download subsampled dataset. Default path = ./workspace/demo
mlcube run --task=download_demo -Pdocker.build_strategy=always

# Run benchmark. Default paths = ./workspace/demo and ./workspace/demo_output
mlcube run --task=demo -Pdocker.build_strategy=always

It's also possible to run both tasks with a single command:

mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
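
To make the comma-separated form easier to read, the sketch below prints (without executing anything) one `mlcube run` command per task in the list. It assumes that the combined form behaves like running the listed tasks one after another in order, which this thread does not confirm; the helper name `print_task_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the PR): print one `mlcube run`
# command per task in a comma-separated task list. Nothing is executed.
print_task_commands() {
    local tasks="$1" task
    local IFS=','   # split the list on commas for the loop below
    for task in $tasks; do
        printf 'mlcube run --task=%s -Pdocker.build_strategy=always\n' "$task"
    done
}

print_task_commands "download_demo,demo"
```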

Additional options

Parameters defined in mlcube/mlcube.yaml can be overridden using: --param=input

We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If they are not, try this:

mlcube run ... -Pdocker.build_strategy=always

@github-actions

github-actions bot commented Jul 23, 2021

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@matthew-frank matthew-frank added object_detection object_detection benchmark MLCube labels Dec 2, 2022
@johntran-nv
Contributor

@mmarcinkiewicz are you the right person to review this one?


```bash
mlcube run ... -Pdocker.build_strategy=always
```
Contributor

This is the benchmark README template. So can you please add sections that have been moved to README.old back?

Contributor Author

Done, thanks for pointing this out. I moved the MLCube explanation into the mlcube folder.

curl -O http://images.cocodataset.org/zips/train2017.zip
echo "Extracting train2017.zip:"
n_files=`unzip -l train2017.zip| grep .jpg | wc -l`
unzip train2017.zip | { I=-1; while read; do printf "Progress: $((++I*100/$n_files))%%\r"; done; echo ""; }

# TBD: MD5 verification
# $md5sum *.zip *.tgz
Contributor

Can you please add a checksum verification step to make sure changes do not affect the dataset?

Contributor Author

Done, added the validation inside the download_dataset.sh file.
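
For readers following the thread, a checksum step of the kind requested might look like the sketch below. It is not the code that was added to download_dataset.sh: the helper name `verify_md5` is hypothetical, and the expected digest is left as a placeholder because no checksum values appear in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the code from download_dataset.sh): succeed only
# if FILE's MD5 digest equals EXPECTED_MD5.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    # An empty digest means md5sum failed (for example, the file is missing).
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder, not the real train2017.zip checksum):
# verify_md5 train2017.zip "<expected-md5>" || { echo "Checksum mismatch" >&2; exit 1; }
```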

@@ -0,0 +1,5 @@
SAVE_CHECKPOINTS: "True" # Instead of False use empty value
Contributor

SAVE_CHECKPOINTS should be False by default since that code path is not well tested in the recent past.

Contributor Author

Fixed.
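
The inline comment "Instead of False use empty value" suggests that the setting is read as a string and treated as enabled whenever it is non-empty. That is an assumption, since the code that consumes SAVE_CHECKPOINTS is not shown in this thread. Under that assumption, the string "False" would still enable checkpointing, as this sketch with a hypothetical `is_enabled` helper illustrates:

```shell
#!/usr/bin/env bash
# Hypothetical helper: treat any non-empty string as "enabled".
# This mirrors an ASSUMED consumer of SAVE_CHECKPOINTS, not code from the PR.
is_enabled() {
    [ -n "$1" ]
}

is_enabled "True"  && echo 'True  -> enabled'
is_enabled "False" && echo 'False -> enabled (non-empty string)'
is_enabled ""      || echo 'empty -> disabled'
```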

Contributor

@ShriyaPalsamudram ShriyaPalsamudram left a comment

Can you please share a log from an end-to-end training run using mlcube so it can be compared to the previous workflow?

@nv-rborkar
Contributor

@davidjurado can you please address Shriya's feedback. We can then merge this PR.

@davidjurado davidjurado requested a review from a team as a code owner March 22, 2024 15:48
@davidjurado davidjurado force-pushed the feature/object_detection branch from 2c969eb to 489ed0f Compare November 15, 2024 15:52
Labels
MLCube object_detection object_detection benchmark