guide: move Exp org patterns to Persisting Exps, improve #3178

Closed
wants to merge 18 commits into from
5 changes: 3 additions & 2 deletions content/docs/command-reference/exp/index.md
@@ -46,8 +46,9 @@ positional arguments:
`dvc exp` subcommands provide specialized ways to create and manage data
science/ machine learning experiments.

📖 See [Experiment Management](/doc/user-guide/experiment-management) for more
info.
📖 See
[DVC Experiments Overview](/doc/user-guide/experiment-management/experiments-overview)
for more info.

> ⚠️ Note that DVC assumes that experiments are deterministic (see **Avoiding
> unexpected behavior** in `dvc stage add`).
35 changes: 0 additions & 35 deletions content/docs/user-guide/experiment-management/index.md
@@ -40,41 +40,6 @@ They support support these main approaches:
> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
> introduction to DVC experiments.

### Organization patterns

It's up to you to decide how to organize completed experiments. These are the
main alternatives:

- **Git tags and branches** - use the repo's "time dimension" to distribute your
experiments. This makes the most sense for experiments that build on each
other. Git-based experiment structures are especially helpful along with Git
history exploration tools
[like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).

- **Directories** - the project's "space dimension" can be structured with
directories (folders) to organize experiments. Useful when you want to see all
your experiments at the same time (without switching versions) by just
exploring the file system.

- **Hybrid** - combining an intuitive directory structure with a good repo
branching strategy tends to be the best option for complex projects.
Completely independent experiments live in separate directories (and can be
generated with [`foreach` stages], for example), while their progress can be
found in different branches.

- **Labels** - in general, you can record experiments in a separate system and
structure them using custom labeling. This is typical in dedicated experiment
tracking tools. A possible problem with this approach is that it's easy to
lose the connection between your project history and the experiments logged.

DVC takes care of arranging `dvc exp` experiments and the data
<abbr>cache</abbr> under the hood so there's no need to decide on the above
until your experiments are made [persistent].

[`foreach` stages]:
/doc/user-guide/project-structure/pipelines-files#foreach-stages
[persistent]: /doc/user-guide/experiment-management/persisting-experiments

## Run Cache: Automatic Log of Stage Runs

Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
@@ -94,3 +94,86 @@ files, etc.) can be stored in Git.

> Please note that you need to `dvc push` in order to share or backup the DVC
> cache contents.

## Organization patterns

While internally all experiments are special branches off a baseline (see
[Overview](/doc/user-guide/experiment-management/experiments-overview)), it's up
to you to decide how to organize them once completed. Here are the main
alternatives:

### Git commits, tags, and branches

Use the repo's "time dimension" to distribute your experiments. This makes the
most sense for experiments that build on each other. Git-based experiment
structures are especially helpful along with Git history exploration tools [like
GitHub]. Example:

![](/img/exp-branches.png) _From our [example-dvc-checkpoints] repo_

[example-dvc-checkpoints]:
https://github.com/iterative/example-dvc-checkpoints/network
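
As a rough sketch of moving a finished experiment into the repo's "time
dimension", the hypothetical Python helper below builds the commands that turn
an experiment into a regular Git branch with `dvc exp branch` and, optionally,
tag that branch with `git tag`. The helper names and the experiment, branch, and
tag names are placeholders for illustration; with `dry_run=True` the commands
are only printed.

```python
import shlex
import subprocess
from typing import List, Optional


def persist_commands(
    experiment: str, branch: str, tag: Optional[str] = None
) -> List[List[str]]:
    """Hypothetical helper: build the CLI calls that persist one experiment."""
    # `dvc exp branch <experiment> <branch>` creates a Git branch from the experiment.
    commands = [["dvc", "exp", "branch", experiment, branch]]
    if tag is not None:
        # Tag the tip of the new branch so this milestone stands out in Git history.
        commands.append(["git", "tag", tag, branch])
    return commands


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # only show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    # Placeholder names; substitute an experiment listed by `dvc exp show`.
    run_commands(persist_commands("exp-1a2b3", "cnn-128", tag="v0.1.0"))
```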

### Directories

The project's "space dimension" can be structured with directories (folders) to
organize experiments. This is useful when you want to see all your experiments
at the same time (without switching versions) just by exploring the file system.
Example:

```
├── data
│   └── labels.raw
├── dvc.yaml
└── experiments
    ├── cnn_128
    ├── cnn_64
    └── linear
```

(ℹ️) When your `dvc.yaml` files are organized in nested subdirectories, you can
run all of their pipelines with `dvc exp run --recursive`, passing the parent
directory as the target.
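
To make running pipelines from nested subfolders more concrete, here is a small
Python sketch (not DVC's own implementation) of the discovery step a recursive
run roughly depends on: collecting every `dvc.yaml` below a directory. It
assumes each experiment folder holds its own `dvc.yaml`, unlike the single root
`dvc.yaml` in the tree above; `make_example_tree` is a hypothetical helper that
only builds a throwaway layout to search.

```python
import tempfile
from pathlib import Path
from typing import List


def find_pipelines(root: Path) -> List[Path]:
    """Return every dvc.yaml at or below `root`, in a stable sorted order."""
    return sorted(root.rglob("dvc.yaml"))


def make_example_tree(base: Path) -> Path:
    """Create a hypothetical layout with one dvc.yaml per experiment folder."""
    experiments = base / "experiments"
    for name in ("cnn_128", "cnn_64", "linear"):
        folder = experiments / name
        folder.mkdir(parents=True)
        (folder / "dvc.yaml").write_text("stages: {}\n")  # placeholder pipeline
    return experiments


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        experiments = make_example_tree(Path(tmp))
        for path in find_pipelines(experiments):
            print(path.relative_to(tmp))
```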

> 📖 See also [Running all pipelines]

### Hybrid

Combining an intuitive directory structure with a good repo branching strategy
tends to be the best option for complex projects. Completely independent
experiments live in separate directories, while their progress can be found in
different branches. Example:

<cards>
<card>
v0.1.0

```
└── experiments
    ├── cnn_128
    └── cnn_64
```

</card>
<card>
v0.2.0

```
└── experiments
    ├── cnn_128
    └── cnn_512
```
Contributor

I think instead of naming these `cnn_128`, `cnn_512`, etc., it may be more
understandable to use words for the common steps in an ML pipeline, like data
augmentation, `dataset1`, `dataset2`, etc. CNN is always in the training phase,
and these may just be different versions of the model directory.


</card>
</cards>
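
One way to read the hybrid example is as a set difference between releases. The
Python sketch below hard-codes the experiment folders from the two cards above;
in a real repository you would read each tag's `experiments/` listing from Git
(for example with `git ls-tree`), which is an assumption left out here. The
`LAYOUT` mapping and `compare_versions` helper are hypothetical.

```python
from typing import Dict, Set

# Experiment folders under experiments/ at each tag, copied from the example above.
LAYOUT: Dict[str, Set[str]] = {
    "v0.1.0": {"cnn_128", "cnn_64"},
    "v0.2.0": {"cnn_128", "cnn_512"},
}


def compare_versions(old: Set[str], new: Set[str]) -> Dict[str, Set[str]]:
    """Classify experiment folders as added, removed, or kept between two versions."""
    return {
        "added": new - old,
        "removed": old - new,
        "kept": old & new,
    }


if __name__ == "__main__":
    changes = compare_versions(LAYOUT["v0.1.0"], LAYOUT["v0.2.0"])
    for kind in ("added", "removed", "kept"):
        print(kind, sorted(changes[kind]))
    # added ['cnn_512']
    # removed ['cnn_64']
    # kept ['cnn_128']
```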

### Labels (ad hoc)

In general, you can record experiments in a separate system and structure them
using custom labeling. This is typical in dedicated experiment tracking tools. A
possible problem with this approach is that it's easy to lose the connection
between your project history and the experiments logged.
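
The risk of losing the link to project history can be reduced by storing the Git
commit next to each label. The Python sketch below models a hypothetical
external log; the `LabeledRun` record and helper names are invented for
illustration and are not part of DVC or any particular tracking tool.

```python
import json
import subprocess
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LabeledRun:
    """One record in a hypothetical external experiment log."""

    name: str
    labels: List[str] = field(default_factory=list)
    # Git SHA the run was produced from; an empty string means the link is lost.
    commit: str = ""


def current_commit() -> str:
    """Read the SHA of HEAD so a new record can point back at the project history."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def find_by_label(runs: List[LabeledRun], label: str) -> List[LabeledRun]:
    return [run for run in runs if label in run.labels]


def untraceable(runs: List[LabeledRun]) -> List[LabeledRun]:
    """Records that can no longer be matched to a commit."""
    return [run for run in runs if not run.commit]


def to_json(runs: List[LabeledRun]) -> str:
    return json.dumps([asdict(run) for run in runs], indent=2)
```

Calling `current_commit()` inside a Git repository when a record is created
fills in `commit`; `untraceable()` then flags entries that have drifted away
from the repo.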

[like github]:
https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network
[running all pipelines]:
/doc/user-guide/experiment-management/running-experiments#running-all-pipelines
Binary file added static/img/exp-branches.png