Single Stage Detection with MLCube™ [request for feedback] #465
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
class DownloadDataTask(object):
    urls = {
seems like this entire file except these urls is boilerplate copy/paste, so there should just be one copy of the code in the central mlcube distribution. Then these urls can go in the single_stage_detector/mlcube/.mlcube.yaml file.
class DownloadModelTask(object):
    url = "https://download.pytorch.org/models/resnet34-333f7ec4.pth"
also move this to single_stage_detector/mlcube/.mlcube.yaml
cached_archive = cache_dir / archive_name
if not cached_archive.exists():
    print(f"Data ({name}) is not in cache ({cached_archive}), downloading ...")
    os.system(f"cd {cache_dir}; curl -O {DownloadDataTask.urls[name]};")
this should be followed by an md5sum check (and we should have the md5sum for each downloaded file in addition to just the url)
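As a sketch of what that check could look like (the helper names are hypothetical, and keeping a digest next to each URL is the suggestion above, not code from this PR):

```python
import hashlib
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path: Path, expected_md5: str) -> None:
    """Raise if a downloaded archive does not match its expected digest."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise RuntimeError(
            f"MD5 mismatch for {path}: expected {expected_md5}, got {actual}"
        )
```

The expected digests would live alongside the URLs, e.g. in the proposed `.mlcube.yaml`, and `verify_download` would run right after the `curl` call.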
shutil.copyfile(cached_archive, dest_archive)

print(f"Extracting archive ({archive_name}) ...")
os.system(f"cd {data_dir}; unzip {archive_name};")
it would be helpful to have a consistency check here as well (presumably not an md5sum of every file, but some indication that the user has the right data). For this case, perhaps the number of .jpgs?
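A minimal sketch of such a check, assuming the expected per-archive .jpg counts are stored alongside the URLs (the directory names and counts below are illustrative placeholders, not verified against the actual archives):

```python
from pathlib import Path

# Expected .jpg counts per extracted directory. These values are
# assumptions for illustration; real counts belong with the dataset
# metadata (e.g. next to the urls in the proposed .mlcube.yaml).
EXPECTED_JPG_COUNTS = {"train2017": 118287, "val2017": 5000}

def check_extracted(data_dir: Path, name: str) -> None:
    """Sanity-check an extracted archive by counting its .jpg files."""
    expected = EXPECTED_JPG_COUNTS[name]
    actual = sum(1 for _ in (data_dir / name).glob("*.jpg"))
    if actual != expected:
        raise RuntimeError(
            f"{name}: expected {expected} .jpg files, found {actual}"
        )
```

This catches truncated or partially extracted archives cheaply, without hashing every file.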
- export DATASET_DIR="/data/coco2017"
- export TORCH_MODEL_ZOO="/data/torchvision"
+ export DATASET_DIR=${DATASET_DIR:-"/data/coco2017"}
+ export TORCH_MODEL_ZOO=${TORCH_MODEL_ZOO:-"/data/torchvision"}
I believe the TORCH_MODEL_ZOO variable is unnecessary if you are pre-downloading the resnet34-333f7ec4.pth file and using the `--pretrained_backbone` script argument. Also, PyTorch changes the name of this environment variable in almost every version, so you can't really depend on it (which is part of why we implemented the `--pretrained_backbone` script argument).
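To illustrate the point, here is a hedged sketch of how a script-level argument sidesteps the environment variable entirely (the argument name matches the comment above, but the parser wiring is hypothetical, not this repo's actual CLI):

```python
import argparse
from pathlib import Path

def parse_backbone_arg(argv):
    """Parse a --pretrained_backbone path argument.

    Passing the checkpoint path explicitly means no TORCH_MODEL_ZOO /
    TORCH_HOME lookup is needed, so the script is immune to PyTorch
    renaming that environment variable between versions.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pretrained_backbone",
        type=Path,
        default=None,
        help="Path to a pre-downloaded backbone checkpoint "
             "(e.g. resnet34-333f7ec4.pth).",
    )
    return parser.parse_args(argv)
```

The training script can then `torch.load` that path directly instead of letting torchvision resolve a download cache via environment variables.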
This PR refers to the old, retired ssd-v1 benchmark, which was replaced by the Retinanet benchmark. In an effort to do a better job maintaining this repo, we're closing PRs for retired benchmarks. The old benchmark code still exists, but has been moved to https://github.com/mlcommons/training/tree/master/retired_benchmarks/ssd-v1/. If you think there is useful cleanup to be done to the retired_benchmarks subtree, please submit a new PR.
Updates
06/05/2021-02: Adding `--force-reinstall` switch for the `pip install` command in the step-by-step guide below.
06/05/2021-01: Fixing bug: the "docker image exists" check now uses the docker command specified in the docker platform file. In the previous version, the `docker` command was hard-coded for this check.
05/05/2021-01: Adding a missing dependency (unzip) to the MLCube™ docker file.
02/05/2021-01: Fixed errors in the "Current implementation" section related to installing MLCube from the GitHub repository.
01/05/2021-01: The Vision section below now clearly states it's not a working example.
22/04/2021-01: All pending MLCube PRs have been merged into master.
Known problems
05/05/2021: The user environment needs `sudo` to run docker containers. A quick fix could be to replace `command: docker` with `command: sudo docker` in docker.yaml.
Introduction
MLCommons™ Best Practices WG is working towards simplifying the process of running ML workloads, including MLCommons reference training and inference benchmarks. We have developed a prototype of a library that we call MLCube™.
The goal of this PR is to show how MLCube can be used to run MLCommons training and inference workloads, and to gather feedback.
Vision
One possible way of interacting with MLCubes is presented in this section. To simplify the process of running ML models, users need to know only a few commands, such as `mlcube describe`, run in an MLCube directory.
Install MLCube:
virtualenv -p python3 ./mlcube_env
source ./mlcube_env/bin/activate
pip install mlcube
Get the MLCommons SSD reference benchmark:
mlcube pull https://github.com/mlcommons/training --project single_stage_detector
cd ./single_stage_detector
Explore what tasks SSD MLCube supports:
Run SSD benchmark using local Docker runtime:
Current implementation
We'll be updating this section as we merge MLCube PRs and make new MLCube releases.