
Commit

Another CR pass - redirects fixes and some copy editing
omesser committed Apr 12, 2023
1 parent 120df9e commit c88fe8a
Showing 4 changed files with 72 additions and 94 deletions.
118 changes: 48 additions & 70 deletions content/docs/start/experiments/experiment-pipelines.md
@@ -7,35 +7,41 @@ description:

# Get Started: Experiment Pipelines

If you've been following the guide in order, you might have gone through the
chapter about [data pipelines](/doc/start/data-management/data-pipelines)
already. Here, we will use the same functionality as a basis for an
experimentation build system.

Running an <abbr>experiment</abbr> is achieved by executing <abbr>DVC
pipelines</abbr>, and the term refers to the set of trackable changes associated
with this execution. This includes code changes and resulting artifacts like
plots, charts, and models. The various `dvc exp` subcommands allow you to
execute, share, and manage experiments in various ways. Below, we'll build an
experiment pipeline and use `dvc exp run` to execute it, with a few very handy
capabilities like experiment queueing and parametrization.
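
For example, several runs can be queued up with different parameter overrides
and then executed together. This is a sketch: `--queue`, `--set-param` (`-S`),
and `--run-all` are real `dvc exp run` flags, but the parameter name below is a
hypothetical stand-in for one defined in the project's params file:

```cli
$ dvc exp run --queue --set-param train.img_size=96
$ dvc exp run --queue --set-param train.img_size=128
$ dvc exp run --run-all
```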

## Stepping up and out of the notebook

After some time spent in your IPython notebook (e.g.
[Jupyter](https://jupyter-notebook.readthedocs.io/en/latest/)) doing data
exploration and basic modeling, managing your notebook cells may start to feel
fragile, and you may want to structure your project and code for reproducible
execution, testing and further automation. When you are ready to
[migrate from notebooks to scripts](https://towardsdatascience.com/from-jupyter-notebook-to-sc-582978d3c0c),
DVC <abbr>Pipelines</abbr> help you standardize your workflow following software
engineering best practices:

- **Modularization**: Split the different logical steps in your notebook into
separate scripts.

- **Parametrization**: Adapt your scripts to decouple the configuration from the
source code.
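
In DVC projects these tunable values typically live in a params file
(`params.yaml` by default). As a rough, stdlib-only sketch of the decoupling
idea — the file name, defaults, and parameter keys below are hypothetical — a
script reads its configuration instead of hard-coding it:

```python
import json
from pathlib import Path

# Hypothetical defaults -- stand-ins for values you would otherwise
# hard-code inside a notebook cell.
DEFAULTS = {"train": {"img_size": 256, "epochs": 10}}


def load_params(path: str = "params.json") -> dict:
    """Read tunable values from a config file, falling back to defaults."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return DEFAULTS


if __name__ == "__main__":
    params = load_params()
    print(f"training for {params['train']['epochs']} epochs")
```

Once parameters live in a file like this, `dvc stage add -p` can register the
relevant keys as stage dependencies, so changing them invalidates exactly the
stages that use them.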

## Creating the experiment pipeline

In our
[example repo](https://github.com/iterative/example-get-started-experiments), we
first extract data preparation logic from the
[original notebook](https://github.com/iterative/example-get-started-experiments/blob/main/notebooks/TrainSegModel.ipynb)
into
[`data_split.py`](https://github.com/iterative/example-get-started-experiments/blob/main/src/data_split.py).

```python
def data_split():
    ...
```

We now use `dvc stage add` commands to transform our scripts into individual
<abbr>stages</abbr>, starting with a `data_split` stage for
[`data_split.py`](https://github.com/iterative/example-get-started-experiments/blob/main/src/data_split.py):

```cli
$ dvc stage add --name data_split \
    ...
    python src/data_split.py
```

A `dvc.yaml` file is automatically generated with the stage details.

<details>

### Expand to see the created `dvc.yaml`

It includes information about the stage we added, like the executable command
(`python src/data_split.py`), its <abbr>dependencies</abbr>,
<abbr>parameters</abbr>, and <abbr>outputs</abbr>:

```yaml
stages:
  data_split:
    ...
    outs:
      - data/test_data
```

</details>

`dvc exp run` will run all stages in the `dvc.yaml` file:

```cli
$ dvc exp run
'data/pool_data.dvc' didn't change, skipping
Running stage 'data_split':
> python src/data_split.py
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
...
```

<admon type="info">

Learn more about [Stages](/doc/user-guide/pipelines/defining-pipelines#stages)

</admon>

## Building a DAG

By using `dvc stage add` multiple times and defining <abbr>outputs</abbr> of a
stage as <abbr>dependencies</abbr> of another, you describe a sequence of
commands which forms a [pipeline](/doc/user-guide/pipelines/defining-pipelines),
also called a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

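
Schematically, the chaining looks like this — a hand-written sketch based on
the stage definitions in this chapter, not the project's literal `dvc.yaml`:

```yaml
stages:
  data_split:
    cmd: python src/data_split.py
    outs:
      - data/train_data
  train:
    cmd: python src/train.py
    deps:
      - data/train_data # produced by data_split above
    outs:
      - models/model.pkl
```

Because `data/train_data` appears in one stage's `outs` and another's `deps`,
DVC knows `train` must run after `data_split`.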
Let's create a `train` stage using
[`train.py`](https://github.com/iterative/example-get-started-experiments/blob/main/src/train.py)
to train the model:

```cli
$ dvc stage add -n train \
    -p base,train \
    -d src/train.py -d data/train_data \
    -o models/model.pkl \
    python src/train.py
```

`dvc exp run` checks the `data_split` stage first and then the `train` stage
since it depends on the <abbr>outputs</abbr> of `data_split`. If a stage has not
changed or has been run before with the same <abbr>dependencies</abbr> and
<abbr>parameters</abbr>, it will be
[skipped](/doc/user-guide/pipelines/run-cache):

```cli
$ dvc exp run
'data/pool_data.dvc' didn't change, skipping
Stage 'data_split' didn't change, skipping
Running stage 'train':
> python src/train.py
...
```
Finally, let's add an `evaluate` stage:

```cli
$ dvc stage add -n evaluate \
    -p base,evaluate \
    -d src/evaluate.py -d models/model.pkl -d data/test_data \
    ...
    python src/evaluate.py
```

<details>

## Visualizing the experiment DAG

As the number of stages grows, the `dvc dag` command becomes handy for
[...]

it by running `dvc exp run` to create and track new experiment runs. This
enables some new features in DVC, like queueing experiments, and a canonical way
to work with parameters and hyperparameters.

</details>

## Modifying parameters

You can modify <abbr>parameters</abbr> from the CLI using
[...]
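
As an illustration — `--set-param` is the real `dvc exp run` flag, while the
parameter name below is a hypothetical stand-in for one defined in the
project's `params.yaml`:

```cli
$ dvc exp run --set-param train.img_size=128
```

This records the override with the experiment run, without editing
`params.yaml` by hand.
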
5 changes: 2 additions & 3 deletions content/docs/start/index.md
@@ -63,9 +63,8 @@ scenarios:
code, and use DVC as a build system for reproducible, data driven pipelines.

- **Experiment Management** - Easily track your experiments and their progress
by only instrumenting your code, and collaborate on ML experiments like
software engineers do for code.

The following chapters are grouped into the two trails above, and each is
pretty self-contained.
@@ -1,17 +1,10 @@
# Discovering and accessing data


Assuming you've learned the basics of how to
[track and version data](/doc/start/data-management/data-versioning) with DVC,
you might wonder: How can we access and use these artifacts _outside_ of the DVC
project? How do we download a model to deploy it? How to download a specific
version of a model? How to reuse datasets across different projects?

<admon type="tip">

[...] instead of the original file name such as `model.pkl` or `data.xml`).

</admon>

<details>

### 🎬 Click to watch a video about sharing data and models

https://youtu.be/EE7Gk84OZY8

</details>

Remember those `.dvc` files `dvc add` generates? Those files (and `dvc.lock`)
have their history in Git. DVC's remote storage config is also saved in Git, and
contains all the information needed to access and download any version of
datasets, files, and models. It means that a Git repository with <abbr>DVC
files</abbr> becomes an entry point, and can be used instead of accessing files
directly.

## Find a file or directory
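
The body of this section is collapsed in the diff above. As a hedged
illustration of the entry-point idea — the repo URL is DVC's public
get-started example, and the output is abridged — `dvc list` can browse a DVC
repository's contents:

```cli
$ dvc list https://github.com/iterative/example-get-started data
data.xml
data.xml.dvc
```

`dvc list` reads the Git repo plus its `.dvc` files, so DVC-tracked artifacts
show up alongside Git-tracked ones.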

6 changes: 3 additions & 3 deletions redirects-list.json
@@ -47,7 +47,7 @@
"^/doc/start/data-and-model-access(/.*)?$ /doc/user-guide/data-management/discovering-and-accessing-data 302",
"^/doc/start/data-pipelines(/.*)?$ /doc/start/data-management/data-pipelines 302",
"^/doc/start/metrics-parameters-plots(/.*)?$ /doc/start/data-management/metrics-parameters-plots 302",
"^/doc/start/experiment-management(/.*)?$ /doc/start/experiments",
"^/doc/tutorial(/.*)?$ /doc/start",
"^/doc/tutorials(/.*)? /doc/start",
"^/doc/tutorials/get-started(/.*)?$ /doc/start",
@@ -98,9 +98,9 @@
"^/doc/command-reference/run$ /doc/command-reference/stage/add",
"^/doc/command-reference/exp/init$ /doc/command-reference/stage/add",

"^/doc/dvclive/dvclive-with-dvc$ /doc/start/experiments",
"^/doc/dvclive/api-reference/$ /doc/dvclive/",
"^/doc/dvclive/get-started$ /doc/start/experiments",

"^/doc/cml(/.*)?$ https://cml.dev/doc$1",

