Improving documentation
gfursin committed Feb 13, 2025
1 parent a43092e commit 8636fa5
Showing 15 changed files with 293 additions and 120 deletions.
19 changes: 19 additions & 0 deletions README.md
@@ -23,6 +23,23 @@ It includes the following sub-projects.

### Common Metadata eXchange framework (CMX, 2024+)

The [Common Metadata eXchange framework (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
was developed to support open science and facilitate
collaborative, reproducible, and reusable research, development,
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data).

It helps users non-intrusively convert their software projects
into file-based repositories of portable and reusable artifacts
(code, data, models, scripts) with extensible metadata,
a unified command-line interface, and a simple Python API.

Such artifacts can be easily chained together into portable
and technology-agnostic automation workflows, enabling users to
rerun, reproduce, and reuse complex experimental setups across diverse and rapidly evolving models, datasets,
software, and hardware.

For example, CMX helps to modularize, automate and customize MLPerf benchmarks.

See the [project page](https://github.com/mlcommons/ck/tree/master/cmx) for more details.

### Collective Mind framework (CM, 2021-2024)
@@ -124,3 +141,5 @@ for their feedback and contributions!

If you found the CM automations helpful, kindly reference this article:
[ [ArXiv](https://arxiv.org/abs/2406.16791) ], [ [BibTex](https://github.com/mlcommons/ck/blob/master/citation.bib) ].

You are welcome to contact the [author](https://cKnowledge.org/gfursin) to discuss long-term plans and potential collaboration.
16 changes: 9 additions & 7 deletions cm/README.CMX.md
@@ -16,18 +16,21 @@
The [Common Metadata eXchange framework (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
was developed to support open science and facilitate
collaborative, reproducible, and reusable research, development,
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data).
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data)
and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reproducibility).

It helps users non-intrusively convert their software projects,
directories, and Git(Hub) repositories into file-based repositories
of portable and reusable artifacts (code, data, models, scripts)
with extensible metadata, a unified command-line interface,
and a simple Python API.

Such artifacts can be easily chained together into portable automation
workflows, enabling users to rerun, reproduce, and reuse complex
experimental setups across diverse and rapidly evolving models, datasets,
software, and hardware.
Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,
enabling users to rerun, reproduce, and reuse complex experimental setups across diverse and rapidly
evolving models, datasets, software, and hardware.

Such workflows, in turn, can be easily integrated with CI/CD pipelines and GitHub Actions
and used to create powerful, portable, modular and GUI-based applications.

For example, you can run image classification and the MLPerf inference benchmark on Linux, macOS,
and Windows using a few CMX commands as follows:
@@ -40,8 +43,7 @@ cmx run script "run-mlperf inference _performance-only _short" --model=resnet50
cmx show cache
```

CMX extends the [Collective Knowledge (CK)](https://learning.acm.org/techtalks/reproducibility)
and [Collective Mind (CM)](https://zenodo.org/records/8105339) concepts,
CMX extends the [Collective Mind (CM) framework](https://zenodo.org/records/8105339),
which have been successfully validated to
[modularize, automate, and modernize MLPerf benchmarks](https://arxiv.org/abs/2406.16791).

57 changes: 25 additions & 32 deletions cmx/README.md
@@ -1,20 +1,22 @@
# Common Metadata eXchange (CMX)

The [Common Metadata eXchange framework (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
was developed to support open science and facilitate
We are developing the [Common Metadata eXchange framework (CMX)](https://github.com/mlcommons/ck/tree/master/cmx)
to support open science and facilitate
collaborative, reproducible, and reusable research, development,
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data).
and experimentation based on [FAIR principles](https://en.wikipedia.org/wiki/FAIR_data)
and the [Collective Knowledge concept](https://learning.acm.org/techtalks/reproducibility).

It helps users non-intrusively convert their software projects,
directories, and Git(Hub) repositories into file-based repositories
of portable and reusable artifacts (code, data, models, scripts)
with extensible metadata, a unified command-line interface,
and a simple Python API.
It helps users non-intrusively convert their software projects
into file-based repositories of portable and reusable artifacts
(code, data, models, scripts) with extensible metadata,
a unified command-line interface, and a simple Python API.

Such artifacts can be easily chained together into portable automation
workflows, enabling users to rerun, reproduce, and reuse complex
experimental setups across diverse and rapidly evolving models, datasets,
software, and hardware.
Such artifacts can be easily chained together into portable and technology-agnostic automation workflows,
enabling users to rerun, reproduce, and reuse complex experimental setups across diverse and rapidly
evolving models, datasets, software, and hardware.

Such workflows, in turn, can be easily integrated with CI/CD pipelines and GitHub Actions
and used to create powerful, portable, modular and GUI-based applications.

For example, you can run image classification and the MLPerf inference benchmark on Linux, macOS,
and Windows using a few CMX commands as follows:
@@ -27,8 +29,7 @@ cmx run script "run-mlperf inference _performance-only _short" --model=resnet50
cmx show cache
```

CMX extends the [Collective Knowledge (CK)](https://learning.acm.org/techtalks/reproducibility)
and [Collective Mind (CM)](https://zenodo.org/records/8105339) concepts,
CMX extends the [Collective Mind (CM) framework](https://zenodo.org/records/8105339),
which have been successfully validated to
[modularize, automate, and modernize MLPerf benchmarks](https://arxiv.org/abs/2406.16791).

@@ -67,16 +68,16 @@ Collective Mind (CM) in the Python cmind package:
*Under preparation*

* [Installation (Linux, Windows, MacOS)](install.md)
* [Getting Started Guide](getting-started.md)
* [MLOps, DevOps and MLPerf automations](https://access.cknowledge.org/playground/?action=scripts)
* [High-level architecture](architecture-4.0.0.png)
* [Python API](https://cknowledge.org/docs/cmx)
* CMX Guide:
* [Understanding CMX](understanding-cmx.md)
* [CMX commands to share and reuse artifacts](commands.md)
* [CMX automation commands](cmx-automations.md)
* [Reusing CMX automations and artifacts for MLOps, DevOps and MLPerf](cmx4mlops.md)
* [CMX Python API](https://cknowledge.org/docs/cmx)
* [CMX internal architecture](architecture-4.0.0.png)
* [Motivation](motivation.md)





## Author

[Grigori Fursin](https://cKnowledge.org/gfursin).
@@ -94,17 +95,9 @@ Copyright (c) 2024-2025 MLCommons

Grigori Fursin and the cTuning foundation donated this project to MLCommons to benefit everyone.

## Concepts

To learn more about the motivation behind this project, please explore the following articles and presentations:

* HPCA'25 article "MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI": [ [Arxiv](https://arxiv.org/abs/2410.12032) ], [ [tutorial to reproduce results using CM/CMX](https://github.com/aryatschand/MLPerf-Power-HPCA-2025/blob/main/measurement_tutorial.md) ]
* "Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments": [ [ArXiv](https://arxiv.org/abs/2406.16791) ]
* ACM REP'23 keynote about the MLCommons CM automation framework: [ [slides](https://doi.org/10.5281/zenodo.8105339) ]
* ACM TechTalk'21 about Collective Knowledge project: [ [YouTube](https://www.youtube.com/watch?v=7zpeIVwICa4) ] [ [slides](https://learning.acm.org/binaries/content/assets/leaning-center/webinar-slides/2021/grigorifursin_techtalk_slides.pdf) ]
* Journal of Royal Society'20: [ [paper](https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0211) ]

## Citation

If you found the CMX automations helpful, kindly reference this article:
If you found the CM/CMX automations for MLOps, DevOps and MLPerf helpful, kindly reference this article:
[ [ArXiv](https://arxiv.org/abs/2406.16791) ], [ [BibTex](https://github.com/mlcommons/ck/blob/master/citation.bib) ].

You are welcome to contact the [author](https://cKnowledge.org/gfursin) to discuss long-term plans and potential collaboration.
7 changes: 7 additions & 0 deletions cmx/commands.md
@@ -0,0 +1,7 @@
[ [Back to documentation](README.md) ]

# CMX commands

## Command Line

## Python API
3 changes: 0 additions & 3 deletions cmx/getting-started.md

This file was deleted.

1 change: 1 addition & 0 deletions cmx/mlperf-inference/v4.1/README.md
@@ -0,0 +1 @@
TBD
1 change: 1 addition & 0 deletions cmx/mlperf-inference/v5.0/README.md
@@ -0,0 +1 @@
TBD
81 changes: 9 additions & 72 deletions cmx/motivation.md
@@ -1,77 +1,14 @@
[ [Back to index](README.md) ]
[ [Back to documentation](README.md) ]

Introduction to the MLCommons Collective Mind (CM) workflow automation framework and its new version, Common Metadata eXchange (CMX).
# CK/CM/CMX motivation

## Introduction
To learn more about the concepts and motivation behind this project, please explore the following articles and presentations:

During the past 10 years, the community has considerably improved
the reproducibility of experimental results from research projects and published papers
by introducing the [artifact evaluation process](https://cTuning.org/ae)
with a [unified artifact appendix and reproducibility checklists](https://github.com/mlcommons/ck/blob/master/docs/artifact-evaluation/checklist.md),
Jupyter notebooks, containers, and Git repositories.
* HPCA'25 article "MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI": [ [Arxiv](https://arxiv.org/abs/2410.12032) ], [ [tutorial to reproduce results using CM/CMX](https://github.com/aryatschand/MLPerf-Power-HPCA-2025/blob/main/measurement_tutorial.md) ]
* "Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments": [ [ArXiv](https://arxiv.org/abs/2406.16791) ]
* ACM REP'23 keynote about the MLCommons CM automation framework: [ [slides](https://doi.org/10.5281/zenodo.8105339) ]
* ACM TechTalk'21 about Collective Knowledge project: [ [YouTube](https://www.youtube.com/watch?v=7zpeIVwICa4) ] [ [slides](https://learning.acm.org/binaries/content/assets/leaning-center/webinar-slides/2021/grigorifursin_techtalk_slides.pdf) ]
* Journal of Royal Society'20: [ [paper](https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0211) ]

On the other hand, [our experience reproducing more than 150 papers](https://www.youtube.com/watch?v=7zpeIVwICa4)
revealed that it still takes weeks and months of painful and
repetitive interactions between researchers and evaluators to reproduce experimental results.
You are welcome to contact the [author](https://cKnowledge.org/gfursin) to discuss long-term plans and potential collaboration.

This effort includes decrypting numerous README files, examining ad-hoc artifacts
and containers, and figuring out how to reproduce computational results.
Furthermore, snapshot containers pose a challenge to optimize algorithms' performance,
accuracy, power consumption and operational costs across diverse
and rapidly evolving software, hardware, and data used in the real world.

![](https://raw.githubusercontent.com/ctuning/ck-guide-images/master/cm-ad-hoc-projects.png)

This practical experience and the feedback from the community motivated
us to establish the [MLCommons Task Force on Automation and Reproducibility](taskforce.md)
and develop a light-weight, technology agnostic, and English-like
workflow automation language called Collective Mind (MLCommons CM).

This language provides a common, non-intrusive and human-readable interface to any software project
transforming it into a collection of [reusable automation recipes (CM scripts)]( https://github.com/mlcommons/ck/tree/master/cm-mlops/script ).
Following [FAIR principles](https://www.go-fair.org/fair-principles), CM automation actions and scripts
are simple wrappers around existing user scripts and artifacts to make them
* findable via human-readable tags, aliases and unique IDs;
* accessible via a unified CM CLI and Python API with JSON/YAML meta descriptions;
* interoperable and portable across any software, hardware, models and data;
* reusable across all projects.

CM is written in simple Python and uses JSON and/or YAML meta descriptions with a unified CLI
to minimize the learning curve and help researchers and practitioners describe, share, and reproduce experimental results
in a unified, portable, and automated way across any rapidly evolving software, hardware, and data
while solving the "dependency hell" and automatically generating unified README files and modular containers.

![](https://raw.githubusercontent.com/ctuning/ck-guide-images/master/cm-unified-projects.png)

Our ultimate goal is to use CM language to facilitate reproducible research for AI, ML and systems projects,
minimize manual and repetitive benchmarking and optimization efforts,
and reduce time and costs when transferring technology to production
across continuously changing software, hardware, models, and data.


## Some projects supported by CM

* [A unified way to run MLPerf inference benchmarks with different models, software and hardware](mlperf/inference). See [current coverage](https://github.com/mlcommons/ck/issues/1052).
* [A unified way to run MLPerf training benchmarks](tutorials/reproduce-mlperf-training.md) *(prototyping phase)*
* [A unified way to run MLPerf tiny benchmarks](tutorials/reproduce-mlperf-tiny.md) *(prototyping phase)*
* A unified CM to run automotive benchmarks *(prototyping phase)*
* [An open-source platform to aggregate, visualize and compare MLPerf results](https://access.cknowledge.org/playground/?action=experiments)
* [Leaderboard for community contributions](https://access.cknowledge.org/playground/?action=contributors)
* [Artifact Evaluation and reproducibility initiatives](https://cTuning.org/ae) at ACM/IEEE/NeurIPS conferences:
* [A unified way to run experiments and reproduce results from ACM/IEEE MICRO'23 and ASPLOS papers](https://github.com/ctuning/cm4research)
* [Student Cluster Competition at SuperComputing'23](https://github.com/mlcommons/ck/blob/master/docs/tutorials/scc23-mlperf-inference-bert.md)
* [CM automation to reproduce IPOL paper](https://github.com/mlcommons/ck/blob/master/cm-mlops/script/reproduce-ipol-paper-2022-439/README-extra.md)
* [Auto-generated READMEs to reproduce official MLPerf BERT inference benchmark v3.0 submission with a model from the Hugging Face Zoo](https://github.com/mlcommons/submissions_inference_3.0/tree/main/open/cTuning/code/huggingface-bert/README.md)
* [Auto-generated Docker containers to run and reproduce MLPerf inference benchmark](../cm-mlops/script/app-mlperf-inference/dockerfiles/retinanet)

## Presentations

* [CK vision (ACM Tech Talk at YouTube)](https://www.youtube.com/watch?v=7zpeIVwICa4)
* [CK concepts (Philosophical Transactions of the Royal Society)](https://doi.org/10.1098/rsta.2020.0211)
* [CM workflow automation introduction (slides from ACM REP'23 keynote)](https://doi.org/10.5281/zenodo.8105339)
* [MLPerf inference submitter orientation (slides)](https://doi.org/10.5281/zenodo.8144274)

## Common Metadata eXchange automation framework (CMX)

Since 2025, we have been developing a new backward-compatible version of CM with simpler
and more intuitive interfaces for automation recipes in MLOps, DevOps, and MLPerf.