guide: DVC Experiments Overview (#2909)

* guide: add DVC Experiments page and links + some copy edits
* guide: remove checkpoint related changes
* guide: remove `dvc experiments` long cmd autolinks per #2901
* guide: move run-cache section back to Exp Mgmt index bottom per #2909 (review)
* guide: Exp Mgmt/ DVC Exps -> Exps Overview per #2909 (review)
* guide: clear separation between Exp Mgmt index and Overview page rel #2909 (comment)
* guide: single guide for Persisting Exps content and fix links
* guide: begin extracting Exp details from Running to Overview rel #2909 (comment)
* guide: make ToC entry for Run Cache section rel #2909 (comment)
* Update content/docs/user-guide/experiment-management/index.md
  Co-authored-by: Ivan Shcheklein <[email protected]>
* [NESTED] guide: Exp implementation details, naming into Overview (#3006)
* guide: bring exp implementation details and naming from ref. per #2909 (review)
* guide: copy edits to exp naming info.
* guide: emphasize dvc exps are not part of Git tree in overview rel #2909 (review)
* guide: ID->name in dvc exps overview per #2909 (review)
* guide: ID->name in other exp guides rel #2909 (review)
* guide: Visualize->Review in exp/overview/basic-workflow per #2909 (review)
* guide: don't say "cleans the slate" in exp/overview/basic-workflow per #2909 (review)
* guide: soften params description in exps index per #2909 (review)
* guide: generalize dvc exps basic workflow
* guide: Properties section in DVC Exps overview page
* guide: exp init section in Exp Overview page
* guide: clarify dvc exp implementation
* guide: expand on Exp Overview motivation per #2909 (comment)
* guide: direct language in Exp Overview/ workflow intro per #2909 (comment)
* guide: mention metrics in exp init intro (Exp Overview) per #2909 (comment)
* guide: intro exp init before giving specific examples of what it does per #2909 (comment)
* guide: hint foreach stages for hybrid exp org pattern rel. #2909 (comment)
* guide: exp mgmt index copy edits
* guide: mention label-based exp organization rel. https://docs.google.com/presentation/d/1C_owNoC72GvrpyMGlonHEYJ9I2rl2SLHkZQDMx0eT7A/edit#slide=id.gcb78e52e40_0_635
* guide: hide exp naming section in overview page and other details per #2909 (comment) et al.
* guide: mention `exp init -i` in Overview per #2909 (comment)
* guide: typo fix per #2909 (comment)
* ref: exp apply copy edits per #2909 (review)
* ref: mention init before exp init per #2909 (review)
* guide: correct info about exp init in Exp Overview per pending comments in #2909 (review)
* ref: link from exp init to corresponding guide
* guide: make exp intro more concrete per #2909 (comment)
* guide: rewrite exp init section of Exps Overview page per #2909 (review)
* ref: roll back unrelated ref changes (moved to ref/exp-misc)
* guide: roll back unrelated changes (moved to #3080)

Co-authored-by: Ivan Shcheklein <[email protected]>
iterative · Dec 13, 2021 · 359e05f
1 parent 2aa3993
commit 359e05f
Showing 7 changed files with 150 additions and 92 deletions.
diff --git a/content/docs/command-reference/exp/init.md b/content/docs/command-reference/exp/init.md
@@ -3,6 +3,9 @@
 Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run
 [experiments](/doc/user-guide/experiment-management).
 
+> Requires a <abbr>DVC repository</abbr>, created with `git init` and
+> `dvc init`.
+
 ## Synopsis
 
 ```usage
@@ -32,6 +35,11 @@ training of machine learning models.
 This command is intended to be a quick way to start running experiments. To
 create more complex stages and pipelines, use `dvc stage add`.
 
+> 📖 More context in [Experiments Overview].
+
+[experiments overview]:
+  /doc/user-guide/experiment-management/experiments-overview
+
 ### The `command` argument
 
 The `command` argument is optional, if you are using `--interactive` mode. The

diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
@@ -148,6 +148,7 @@
         "slug": "experiment-management",
         "source": "experiment-management/index.md",
         "children": [
+          "experiments-overview",
           "running-experiments",
           "comparing-experiments",
           "sharing-experiments",

diff --git a/content/docs/user-guide/experiment-management/cleaning-experiments.md b/content/docs/user-guide/experiment-management/cleaning-experiments.md
@@ -2,7 +2,9 @@
 
 Although DVC uses minimal resources to keep track of the experiments, they may
 clutter tables and the workspace. DVC lets you remove specific experiments from
-the workspace or delete all not-yet-persisted experiments at once.
+the workspace or delete all not-yet-[persisted] experiments at once.
+
+[persisted]: /doc/user-guide/experiment-management/persisting-experiments
 
 ## Removing specific experiments
 

diff --git a/content/docs/user-guide/experiment-management/experiments-overview.md b/content/docs/user-guide/experiment-management/experiments-overview.md
@@ -0,0 +1,72 @@
+# DVC Experiments Overview
+
+DVC Experiments are captured automatically by DVC when [run]. Each experiment
+creates and tracks a variation of your data science project based on the changes
+in your <abbr>workspace</abbr>.
+
+Experiments preserve a connection to the latest commit in the current branch
+(Git `HEAD`) as their parent or _baseline_, but do not form part of the regular
+Git tree (unless you make them [persistent]). This prevents bloating your repo
+with temporary commits and branches.
+
+[run]: /doc/user-guide/experiment-management/running-experiments
+
+<details>
+
+### ⚙️ How does DVC track experiments?
+
+Experiments are custom [Git references](/blog/experiment-refs) (found in
+`.git/refs/exps`) with one or more commits based on `HEAD`. These commits are
+hidden and not checked out by DVC. Note that these are not pushed to Git remotes
+by default either (see `dvc exp push`).
+
+Each DVC Experiment requires a unique name to identify it. By default, DVC
+auto-generates one, such as `exp-bfe64` (based on the
+experiment's hash). A custom name can be set instead, using the `--name`/`-n`
+option of `dvc exp run`. These names can be used to reference experiments in
+other `dvc exp` subcommands.
+
+</details>
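
To make the naming and storage description above more concrete, here is a minimal Python sketch. It assumes the auto-generated name is `exp-` followed by the first five characters of the experiment's hash, and that refs are grouped under `refs/exps` by baseline commit. The helper names and the exact ref layout are illustrative assumptions, not DVC's API.

```python
import hashlib

# The added text places experiment refs in `.git/refs/exps`.
EXPS_NAMESPACE = "refs/exps"


def default_exp_name(exp_hash: str) -> str:
    """Assumed scheme: 'exp-' plus the first 5 characters of the experiment hash."""
    return f"exp-{exp_hash[:5]}"


def exp_ref(baseline_sha: str, name: str) -> str:
    """Hypothetical ref layout that groups experiments by their baseline commit."""
    return f"{EXPS_NAMESPACE}/{baseline_sha[:2]}/{baseline_sha[2:]}/{name}"


def is_exp_ref(ref: str) -> bool:
    """Experiment refs sit outside refs/heads and refs/tags, so they are
    neither branches nor tags."""
    return ref.startswith(EXPS_NAMESPACE + "/")


if __name__ == "__main__":
    # Stand-in hashes; real ones come from Git and DVC.
    exp_hash = hashlib.sha1(b"params + deps + code").hexdigest()
    baseline = hashlib.sha1(b"baseline commit").hexdigest()
    name = default_exp_name(exp_hash)
    print(name)
    print(exp_ref(baseline, name))
```

Because such refs are neither branches nor tags, they stay out of the regular Git tree, which matches the "hidden" behavior the section describes.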
+
+## Basic workflow
+
+`dvc exp` commands let you automatically track a variation of a project version
+(the baseline). You can create independent groups of experiments this way, as
+well as review, compare, and restore them later. The basic workflow goes like
+this:
+
+- Modify hyperparameters or other dependencies (input data, source code,
+  commands to execute, etc.). Leave these changes un-committed in Git.
+- [Run experiments][run] with `dvc exp run` (instead of `repro`). The results
+  are reflected in your <abbr>workspace</abbr>, and tracked automatically.
+- Review and [compare] experiments with `dvc exp show` or `dvc exp diff`, using
+  [metrics](/doc/command-reference/metrics) to identify the best one(s). Repeat
+  🔄
+- Make certain experiments [persistent] by committing their results to Git. This
+  lets you repeat the process from that point.
+
+[compare]: /doc/user-guide/experiment-management/comparing-experiments
+[persistent]: /doc/user-guide/experiment-management/persisting-experiments
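
The workflow steps listed above can be sketched as a small Python helper that only builds the command lines, without executing them. The parameter `train.lr`, the experiment name `lr-low`, and the helper names are hypothetical. Persisting with `dvc exp apply` followed by a Git commit is one possible route, assumed here for illustration.

```python
import shlex
from typing import Dict, List, Optional


def exp_run_cmd(params: Dict[str, object], name: Optional[str] = None) -> List[str]:
    """Build a `dvc exp run` invocation that overrides parameters on the fly."""
    cmd = ["dvc", "exp", "run"]
    if name:
        cmd += ["--name", name]
    for key, value in params.items():
        cmd += ["--set-param", f"{key}={value}"]
    return cmd


def workflow(params: Dict[str, object], name: str) -> List[List[str]]:
    """Commands for one run / review / persist iteration (assumed sequence)."""
    return [
        exp_run_cmd(params, name),      # run and track the experiment
        ["dvc", "exp", "show"],         # review and compare results
        ["dvc", "exp", "apply", name],  # bring the chosen one to the workspace
        ["git", "commit", "-a", "-m", f"Persist experiment {name}"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(step, check=True) inside a DVC repository to execute.
    for step in workflow({"train.lr": 0.01}, "lr-low"):
        print(shlex.join(step))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when parameter values contain spaces.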
+
+## Initialize DVC Experiments on any project
+
+To use DVC Experiments you need a <abbr>DVC project</abbr> with a minimal
+structure and configuration. To avoid having to bootstrap DVC manually, the
+`dvc exp init` command lets you quickly onboard an existing project to the DVC
+Experiments workflow.
+
+It will create a simple `dvc.yaml` metafile, which codifies your planned
+experiments. This includes the locations for expected <abbr>dependencies</abbr>
+(data, parameters, source code) and <abbr>outputs</abbr> (ML models,
+<abbr>metrics</abbr>, etc.). These assume [sane defaults] but can be customized
+with the options of `dvc exp init`.
+
+💡 We recommend adding the `-i` flag to use its `--interactive` mode. This will
+ask you how to run the experiments, and guide you through customizing the
+aforementioned locations (optional).
+
+You can review the resulting changes to your repo (and commit them to Git) to
+begin using DVC Experiments. Now you can move on to [running experiments][run]
+(next).
+
+[sane defaults]: /doc/command-reference/exp/init#description
diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md
@@ -1,113 +1,90 @@
 # Experiment Management
 
-_New in DVC 2.0 (see `dvc version`)_
-
-Data science and ML are iterative processes that require a large number of
-attempts to reach a certain level of a metric. Experimentation is part of the
-development of data features, hyperspace exploration, deep learning
-optimization, etc. DVC helps you codify and manage all of your
-<abbr>experiments</abbr>, supporting these main approaches:
-
-1. Create [experiments](#experiments) that derive from your latest project
-   version without having to track them manually. DVC does that automatically,
-   letting you list and compare them. The best ones can be made persistent, and
-   the rest archived.
-2. Place in-code [checkpoints](#checkpoints-in-source-code) that mark a series
-   of variations, forming a deep experiment. DVC helps you capture them at
-   runtime, and manage them in batches.
-3. Make experiments or checkpoints [persistent](#persistent-experiments) by
-   committing them to your <abbr>repository</abbr>. Or create these versions
-   from scratch like typical project changes.
-
-   At this point you may also want to consider the different
-   [ways to organize](#organization-patterns) experiments in your project (as
-   Git branches, as folders, etc.).
-
-DVC also provides specialized features to codify and analyze experiments.
-[Parameters](/doc/command-reference/params) are simple values you can tweak in a
-human-readable text file, which cause different behaviors in your code and
-models. On the other end, [metrics](/doc/command-reference/metrics) (and
+Data science and machine learning are iterative processes that require a large
+number of attempts to reach a certain level of a metric. Experimentation is part
+of the development of data features, hyperspace exploration, deep learning
+optimization, etc.
+
+Some of DVC's base features already help you codify and analyze experiments.
+[Parameters](/doc/command-reference/params) are simple values in a formatted
+text file which you can tweak and use in your code. On the other end,
+[metrics](/doc/command-reference/metrics) (and
 [plots](/doc/command-reference/plots)) let you define, visualize, and compare
-meaningful measures for the experimental results.
-
-> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
-> introduction to DVC experiments.
+quantitative measures of your results.
 
-## Experiments
+## Experimentation in DVC
 
-`dvc exp` commands let you automatically track a variation to an established
-[data pipeline](/doc/command-reference/dag). You can create multiple isolated
-experiments this way, as well as review, compare, and restore them later, or
-roll back to the baseline. The basic workflow goes like this:
-
-- Modify stage <abbr>parameters</abbr> or other dependencies (e.g. input data,
-  source code) of committed stages.
-- Use `dvc exp run` (instead of `repro`) to execute the pipeline. The results
-  are reflected in your <abbr>workspace</abbr>, and tracked automatically.
-- Use [metrics](/doc/command-reference/metrics) to identify the best
-  experiment(s).
-- Visualize, compare experiments with `dvc exp show` or `dvc exp diff`. Repeat
-  🔄
-- Use `dvc exp apply` to roll back to the best one.
-- Make the selected experiment persistent by committing its results to Git. This
-  cleans the slate so you can repeat the process.
-
-## Checkpoints in source code
+_New in DVC 2.0 (see `dvc version`)_
 
-To track successive steps in a longer experiment, you can register checkpoints
-from your code at runtime. This allows you, for example, to track the progress
-in deep learning techniques such as evolving neural networks.
+DVC experiment management features build on top of base DVC features to form a
+comprehensive framework to organize, execute, manage, and share ML experiments.
+They support these main approaches:
 
-This kind of experiments track a series of variations (the checkpoints) and its
-execution can be stopped and resumed as needed. You interact with them using
-`dvc exp run` and its `--rev`, `--reset` options (see also the `checkpoint`
-field in `dvc.yaml` `outs`).
+- Compare parameters and metrics of existing project versions (for example
+  different Git branches) against each other or against new, uncommitted results
+  in your workspace. One tool to do so is `dvc exp diff`.
 
-> 📖 To learn more, see the dedicated
-> [Checkpoints](/doc/user-guide/experiment-management/checkpoints) guide.
+- [Run and capture] multiple experiments (derived from any project version as
+  baseline) without polluting your Git history. DVC tracks them for you, letting
+  you compare and share them. 📖 More info in the [Experiments
+  Overview][experiments].
 
-## Persistent experiments
+- Generate [checkpoints] at runtime to keep track of the internal progress of
+  deeper experiments. DVC captures [live metrics](/doc/dvclive), which you can
+  manage in batches.
 
-When your experiments are good enough to save or share, you may want to store
-them persistently as Git commits in your <abbr>repository</abbr>.
+[run and capture]: /doc/user-guide/experiment-management/running-experiments
+[experiments]: /doc/user-guide/experiment-management/experiments-overview
+[checkpoints]: /doc/user-guide/experiment-management/checkpoints
 
-Whether the results were produced with `dvc repro` directly, or after a
-`dvc exp` workflow (refer to previous sections), the `dvc.yaml` and `dvc.lock`
-pair in the <abbr>workspace</abbr> will codify the experiment as a new project
-version. The right <abbr>outputs</abbr> (including
-[metrics](/doc/command-reference/metrics)) should also be present, or available
-via `dvc checkout`.
+> 👨‍💻 See [Get Started: Experiments](/doc/start/experiments) for a hands-on
+> introduction to DVC experiments.
 
 ### Organization patterns
 
-DVC takes care of arranging `dvc exp` experiments and the data
-<abbr>cache</abbr> under the hood. But when it comes to full-blown persistent
-experiments, it's up to you to decide how to organize them in your project.
-These are the main alternatives:
+It's up to you to decide how to organize completed experiments. These are the
+main alternatives:
 
 - **Git tags and branches** - use the repo's "time dimension" to distribute your
   experiments. This makes the most sense for experiments that build on each
   other. Helpful if the Git [revisions](https://git-scm.com/docs/revisions) can
   be easily visualized, for example with tools
   [like GitHub](https://docs.github.com/en/github/visualizing-repository-data-with-graphs/viewing-a-repositorys-network).
+
 - **Directories** - the project's "space dimension" can be structured with
   directories (folders) to organize experiments. Useful when you want to see all
   your experiments at the same time (without switching versions) by just
   exploring the file system.
+
 - **Hybrid** - combining an intuitive directory structure with a good repo
   branching strategy tends to be the best option for complex projects.
-  Completely independent experiments live in separate directories, while their
-  progress can be found in different branches.
+  Completely independent experiments live in separate directories (and can be
+  generated with [`foreach` stages], for example), while their progress can be
+  found in different branches.
+
+- **Labels** - in general, you can record experiments in a separate system and
+  structure them using custom labeling. This is typical in dedicated experiment
+  tracking tools. A possible problem with this approach is that it's easy to
+  lose the connection between your project history and the experiments logged.
+
+DVC takes care of arranging `dvc exp` experiments and the data
+<abbr>cache</abbr> under the hood so there's no need to decide on the above
+until your experiments are made [persistent].
+
+[`foreach` stages]:
+  /doc/user-guide/project-structure/pipelines-files#foreach-stages
+[persistent]: /doc/user-guide/experiment-management/persisting-experiments
 
-## Automatic log of stage runs (run-cache)
+## Run Cache: Automatic Log of Stage Runs
 
-Every time you `dvc repro` pipelines or `dvc exp run` experiments, DVC logs the
-unique signature of each stage run (to `.dvc/cache/runs` by default). If it
-never happened before, the stage command(s) are executed normally. Every
+Every time you [reproduce](/doc/command-reference/repro) a pipeline with DVC, it
+logs the unique signature of each stage run (in `.dvc/cache/runs` by default).
+If it never happened before, the stage command(s) are executed normally. Every
 subsequent time a [stage](/doc/command-reference/run) runs under the same
 conditions, the previous results can be restored instantly, without wasting time
 or computing resources.
 
 ✅ This built-in feature is called <abbr>run-cache</abbr> and it can
-dramatically improve performance. It's enabled out-of-the-box (but can be
-disabled with the `--no-run-cache` command option).
+dramatically improve performance. It's enabled out-of-the-box (can be disabled),
+which means DVC is already saving all of your tests and experiments behind the
+scenes. However, there's no easy way to explore it directly.
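
The run-cache idea described in this section can be illustrated with a conceptual Python sketch: a stage run is identified by a signature of its command and dependency contents, and a repeated signature restores the stored result instead of executing again. This is only an analogy under assumed names (`RunCache`, `signature`); it is not how DVC computes or stores stage signatures.

```python
import hashlib
import json
from typing import Callable, Dict


class RunCache:
    """Toy in-memory log of stage runs, keyed by a signature (illustrative only)."""

    def __init__(self) -> None:
        self._runs: Dict[str, str] = {}
        self.executions = 0  # how many times a stage command really ran

    @staticmethod
    def signature(cmd: str, deps: Dict[str, bytes]) -> str:
        """Hash the command together with the content hash of every dependency."""
        dep_hashes = {
            path: hashlib.sha256(content).hexdigest()
            for path, content in sorted(deps.items())
        }
        payload = json.dumps({"cmd": cmd, "deps": dep_hashes}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, cmd: str, deps: Dict[str, bytes], execute: Callable[[], str]) -> str:
        """Execute only if this exact signature was never seen before."""
        key = self.signature(cmd, deps)
        if key not in self._runs:
            self.executions += 1
            self._runs[key] = execute()
        return self._runs[key]


if __name__ == "__main__":
    cache = RunCache()
    deps = {"data.csv": b"1,2,3"}
    cache.run("python train.py", deps, lambda: "model-a")
    cache.run("python train.py", deps, lambda: "model-a")  # restored, not re-run
    print(cache.executions)
```

Changing either the command or any dependency's content yields a new signature, so the stage executes again.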
diff --git a/content/docs/user-guide/experiment-management/persisting-experiments.md b/content/docs/user-guide/experiment-management/persisting-experiments.md
@@ -1,11 +1,9 @@
 # Persisting Experiments
 
-DVC runs experiments outside of the Git stage/commit cycle for quick iteration.
-When your experiments are good enough to save or share, you may want to store
-them persistently as Git commits in your repository.
-
-In this section, we describe how to bring them to the standard Git workflow with
-`dvc exp branch` and `dvc exp apply`.
+DVC Experiments run outside of the regular Git workflow for faster iteration and
+to avoid polluting your <abbr>repository</abbr>'s history. Once experiments are
+good enough to keep or distribute, you may want to store them persistently as
+Git commits.
 
 ## Create a Git branch from an experiment
 
@@ -73,7 +71,7 @@ $ dvc exp show --include-params=my_param
 
 The results found in the workspace are shown in the respective row. When you
 want to bring another experiment to the workspace, you can reference it using
-it's name or ID, e.g.:
+its name, e.g.:
 
 ```dvc
 $ dvc exp apply exp-e6c97

diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md
@@ -1,8 +1,8 @@
 # Running Experiments
 
-We explain how DVC codifies and executes experiments, setting their parameters,
-using multiple jobs to run them in parallel, and running them in queues, among
-other details.
+We explain how to execute DVC Experiments, setting their parameters, using
+multiple jobs to run them in parallel, and running them in queues, among other
+details.
 
 > 📖 If this is the first time you are introduced to data science
 > experimentation, you may want to check the basics in
@@ -231,7 +231,7 @@ Note that Git-ignored files/dirs are explicitly excluded from queued/temp runs
 to avoid committing unwanted files into Git (e.g. once successful experiments
 are [persisted]).
 
-[persisted]: /doc/user-guide/experiment-management#persistent-experiments
+[persisted]: /doc/user-guide/experiment-management/persisting-experiments
 
 > 💡 To include untracked files, stage them with `git add` first (before
 > `dvc exp run`) and `git reset` them afterwards.