diff --git a/content/docs/command-reference/exp/index.md b/content/docs/command-reference/exp/index.md index c733a2d77e..fbe98b3687 100644 --- a/content/docs/command-reference/exp/index.md +++ b/content/docs/command-reference/exp/index.md @@ -46,8 +46,9 @@ positional arguments: `dvc exp` subcommands provide specialized ways to create and manage data science/ machine learning experiments. -📖 See [Experiment Management](/doc/user-guide/experiment-management) for more -info. +📖 See +[DVC Experiments Overview](/doc/user-guide/experiment-management/experiments-overview) +for more info. > ⚠️ Note that DVC assumes that experiments are deterministic (see **Avoiding > unexpected behavior** in `dvc stage add`). diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index bf4e15ce38..d4fe142e40 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -40,41 +40,6 @@ They support support these main approaches: > 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on > introduction to DVC experiments. -### Organization patterns - -It's up to you to decide how to organize completed experiments. These are the -main alternatives: - -- **Git tags and branches** - use the repo's "time dimension" to distribute your - experiments. This makes the most sense for experiments that build on each - other. Git-based experiment structures are especially helpful along with Git - history exploration tools - [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network). - -- **Directories** - the project's "space dimension" can be structured with - directories (folders) to organize experiments. Useful when you want to see all - your experiments at the same time (without switching versions) by just - exploring the file system. 
- -- **Hybrid** - combining an intuitive directory structure with a good repo - branching strategy tends to be the best option for complex projects. - Completely independent experiments live in separate directories (and can be - generated with [`foreach` stages], for example), while their progress can be - found in different branches. - -- **Labels** - in general, you can record experiments in a separate system and - structure them using custom labeling. This is typical in dedicated experiment - tracking tools. A possible problem with this approach is that it's easy to - lose the connection between your project history and the experiments logged. - -DVC takes care of arranging `dvc exp` experiments and the data -cache under the hood so there's no need to decide on the above -until your experiments are made [persistent]. - -[`foreach` stages]: - /doc/user-guide/project-structure/pipelines-files#foreach-stages -[persistent]: /doc/user-guide/experiment-management/persisting-experiments - ## Run Cache: Automatic Log of Stage Runs Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it diff --git a/content/docs/user-guide/experiment-management/persisting-experiments.md b/content/docs/user-guide/experiment-management/persisting-experiments.md index 9abd46b7f2..278542438b 100644 --- a/content/docs/user-guide/experiment-management/persisting-experiments.md +++ b/content/docs/user-guide/experiment-management/persisting-experiments.md @@ -94,3 +94,86 @@ files, etc.) can be stored in Git. > Please note that you need to `dvc push` in order to share or backup the DVC > cache contents. + +## Organization patterns + +While internally all experiments are special branches off a baseline (see +[Overview](/doc/user-guide/experiment-management/experiments-overview)), it's up +to you to decide how to organize them once completed. 
Here are the main +alternatives: + +### Git commits, tags, and branches + +Use the repo's "time dimension" to distribute your experiments. This makes the +most sense for experiments that build on each other. Git-based experiment +structures are especially helpful along with Git history exploration tools [like +GitHub]. Example: + +![](/img/exp-branches.png) _From our [example-dvc-checkpoints] repo_ + +[example-dvc-checkpoints]: + https://github.com/iterative/example-dvc-checkpoints/network + +### Directories + +The project's "space dimension" can be structured with directories (folders) to +organize experiments. Useful when you want to see all your experiments at the +same time (without switching versions) by just exploring the file system. +Example: + +``` +├── data +│ └── labels.raw +├── dvc.yaml +└── experiments + ├── cnn_128 + ├── cnn_64 + └── linear +``` + +(ℹ️) When your `dvc.yaml` files are organized inside recursive subfolders, you +can run their pipeline(s) using `dvc repro --recursive`. + +> 📖 See also [Running all pipelines] + +### Hybrid + +Combining an intuitive directory structure with a good repo branching strategy +tends to be the best option for complex projects. Completely independent +experiments live in separate directories, while their progress can be found in +different branches. Example: + + + + v0.1.0 + + ``` + └── experiments + ├── cnn_128 + └── cnn_64 + ``` + + + + v0.2.0 + + ``` + └── experiments + ├── cnn_128 + └── cnn_512 + ``` + + + + +### Labels (ad hoc) + +In general, you can record experiments in a separate system and structure them +using custom labeling. This is typical in dedicated experiment tracking tools. A +possible problem with this approach is that it's easy to lose the connection +between your project history and the experiments logged. 
+ +[like github]: + https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network +[running all pipelines]: + /doc/user-guide/experiment-management/running-experiments#running-all-pipelines diff --git a/static/img/exp-branches.png b/static/img/exp-branches.png new file mode 100644 index 0000000000..d8bbd901b4 Binary files /dev/null and b/static/img/exp-branches.png differ