Commit

initial
omesser committed Apr 10, 2023
1 parent 8a23aaf commit f9dd1ad
Showing 7 changed files with 65 additions and 62 deletions.
4 changes: 2 additions & 2 deletions content/docs/sidebar.json
@@ -44,7 +44,7 @@
"source": false,
"children": [
"data-versioning",
"data-and-model-access",
"discovering-and-accessing-data",
"data-pipelines",
{
"label": "Metrics, Parameters, and Plots",
@@ -53,7 +53,7 @@
]
},
{
"slug": "experiments",
"slug": "experiment-management",
"source": false,
"children": [
"experiment-versioning",
@@ -1,10 +1,10 @@
---
title: 'Get Started: Data and Model Access'
title: 'Get Started: Discovering and accessing data'
description: 'Get started with accessing data and models with DVC. Learn how to
bring, explore, and access data artifacts from outside the project'
---

# Get Started: Data and Model Access
# Get Started: Discovering and accessing data

<details>

98 changes: 50 additions & 48 deletions content/docs/start/index.md
@@ -54,31 +54,50 @@ $ git commit -m "Initialize DVC"

Now you're ready to DVC!

The value of DVC's several feature sets is best understood from different
angles. Pick one of the two trails below to learn about DVC from that
perspective:
## Before You Begin

## Data
To help you understand and use DVC consider those two high level scenarios:

- **[Data and model versioning]** is the base layer of DVC for large files,
datasets, and machine learning models. Use a standard Git workflow, but
without storing large files in the repo. Data is cached by DVC, allowing for
efficient sharing. Think "Git for data".
- **Data Management** - Track and version large amounts of data along with your
code, and use DVC as a build-system for reproducible, data driven pipelines.

- **[Data and model access]** goes over using data artifacts from outside of the
project and importing them from another DVC project. This can help to download
a specific version of an ML model to a deployment server or import a dataset
into another project.
- **Experiment Management** - Track experiments using only Git as a storage
system (no service/DB required). Manage parameters, metrics and plots easily,
and get powerful live tracking capabilities for training jobs via code
instrumentation.

- **[Data pipelines]** describe how models and other data artifacts are built,
and provide an efficient way to reproduce them. Think "Makefiles for data and
ML projects" done right.
The following chapters are categorized into the above 2 trails and are all
pretty self-contained.

- **[Metrics, parameters, and plots]** can be attached to pipelines. These let
you capture, evaluate, and visualize ML projects without leaving Git.
<admon type="tip">

Feel free to "choose your own adventure" and skip to the chapters which answer
your specific needs. In case you're unsure where to start, we recommend going
over the chapters in order.

</admon>

## Data Management

- **[Data and model versioning]** - Manage large files, datasets, and machine
learning models. DVC helps you track your data and couple its versions to your
code, while your data is stored outside of your Git repo.

- **[Discovering and accessing data]** - Accessing and using data artifacts from
outside of the project and importing them from anywhere. This can help to
download a specific version of an ML model to a deployment server or import a
dataset into another project.

- **[Data pipelines]** - Use pipelines to describe how models and other data
artifacts are built, and provide an efficient way to reproduce them. Think
"Makefiles for data and ML projects" done right.

- **[Metrics, parameters, and plots]** - Those are 1st class citizens in DVC
pipelines. Capture, evaluate, and visualize ML projects without leaving Git.

[data and model versioning]: /doc/start/data-management/data-versioning
[data and model access]: /doc/start/data-management/data-and-model-access
[discovering and accessing data]:
/doc/start/data-management/discovering-and-accessing-data
[data pipelines]: /doc/start/data-management/data-pipelines
[metrics, parameters, and plots]:
/doc/start/data-management/metrics-parameters-plots
@@ -95,26 +114,20 @@ The steps and results of some of these chapters are captured in our

</admon>

## Experiments

- **[Experiment versioning]**
## Experiment Management

Track the changes to the code, data, metrics, parameters and plots associated
with each experiment, without bloating your Git repo.
- **[Experiment versioning]** - Track the changes to the code, data, metrics,
parameters and plots associated with each experiment, without bloating your
Git repo.

- **[Experiment management]**
- **[Experiment management]** - Manage experiments and share them with others
using software engineering best practices.

Manage experiments and share them with others using software engineering best
practices.
- **[Building pipelines]** - Split your workflow into stages and build a
pipeline by connecting dependencies and outputs.

- **[Building pipelines]**

Split your workflow into stages and build a pipeline by connecting
dependencies and outputs.

- **[Experiments Iterations]**

Explore the benefits of running experiments using DVC Pipelines.
- **[Experiments Iterations]** - Explore the benefits of running experiments
using DVC Pipelines.

[experiment versioning]: /doc/start/experiments/experiment-versioning
[experiment management]: /doc/start/experiments/experiment-management
@@ -132,18 +145,7 @@ These are captured in our [example-dvc-experiments] repo (see its

</admon>

## Following the Get Started
## Where To Go Next

Each page in the trails above is more or less independent, especially if you're
only reading them to get a general idea of the features in question. For better
learning, try each step yourself from the beginning of any trail. Some of the
preparation steps may be inside collapsed sections you can click on to expand:

<details>

### Click for an example!

Click the header again to collapse this message. Or move on by picking a page
from the list above, left-side navigation, or just click `NEXT` below!

</details>
Picking a page from the list above, left-side navigation, or just click `NEXT`
below!
7 changes: 4 additions & 3 deletions content/docs/use-cases/data-registry/index.md
@@ -3,9 +3,10 @@
One of the main uses of <abbr>DVC repositories</abbr> is the
[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning).
DVC also enables cross-project
[reusability](/doc/start/data-management/data-and-model-access) of these
<abbr>data artifacts</abbr>. This means that your projects can depend on data
from other repositories — like a **package management system for data science**.
[reusability](/doc/start/data-management/discovering-and-accessing-data) of
these <abbr>data artifacts</abbr>. This means that your projects can depend on
data from other repositories — like a **package management system for data
science**.

![](/img/data-registry.png) _Data management middleware_

2 changes: 1 addition & 1 deletion content/docs/use-cases/model-registry.md
@@ -59,6 +59,6 @@ can sync with the state of the artifacts in your registry.

[modeling process]: /doc/start/data-management/data-pipelines
[remote storage]: /doc/user-guide/data-management/remote-storage
[sharing]: /doc/start/data-management/data-and-model-access
[sharing]: /doc/start/data-management/discovering-and-accessing-data
[via cml]: https://cml.dev/doc/cml-with-dvc
[gitops]: https://www.gitops.tech/
2 changes: 1 addition & 1 deletion content/docs/use-cases/versioning-data-and-models/index.md
@@ -69,7 +69,7 @@ Benefits of our approach include:

[remotely]: /doc/user-guide/data-management/remote-storage
[internally]: /doc/user-guide/how-to/share-a-dvc-cache
[reuse]: /doc/start/data-management/data-and-model-access
[reuse]: /doc/start/data-management/discovering-and-accessing-data

- **Data compliance**: Review data modification attempts as Git
[pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
10 changes: 5 additions & 5 deletions redirects-list.json
@@ -43,11 +43,11 @@
"^/doc/start/data-versioning$ /doc/start/data-management",
"^/doc/start/data-and-model-versioning(/.*)?$ /doc/start/data-management 302",
"^/doc/start/sharing-data-and-model-files$ /doc/start/data-management#storing-and-sharing 302",
"^/doc/start/data-access$ /doc/start/data-management/data-and-model-access",
"^/doc/start/data-and-model-access(/.*)?$ /doc/start/data-management/data-and-model-access 302",
"^/doc/start/data-access$ /doc/start/data-management/discovering-and-accessing-data",
"^/doc/start/data-and-model-access(/.*)?$ /doc/start/data-management/discovering-and-accessing-data 302",
"^/doc/start/data-pipelines(/.*)?$ /doc/start/data-management/data-pipelines 302",
"^/doc/start/metrics-parameters-plots(/.*)?$ /doc/start/data-management/metrics-parameters-plots 302",
"^/doc/start/experiment-management(/.*)?$ /doc/start/experiments",
"^/doc/start/experiment-management(/.*)?$ /doc/start/experiment-management",
"^/doc/tutorial(/.*)?$ /doc/start",
"^/doc/tutorials(/.*)? /doc/start",
"^/doc/tutorials/get-started(/.*)?$ /doc/start",
@@ -98,9 +98,9 @@
"^/doc/command-reference/run$ /doc/command-reference/stage/add",
"^/doc/command-reference/exp/init$ /doc/command-reference/stage/add",

"^/doc/dvclive/dvclive-with-dvc$ /doc/start/experiments",
"^/doc/dvclive/dvclive-with-dvc$ /doc/start/experiment-management",
"^/doc/dvclive/api-reference/$ /doc/dvclive/",
"^/doc/dvclive/get-started$ /doc/start/experiments",
"^/doc/dvclive/get-started$ /doc/start/experiment-management",

"^/doc/cml(/.*)?$ https://cml.dev/doc$1",

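Reviewer note: the entries in `redirects-list.json` appear to follow a `pattern target [status]` shape, so a renamed slug can be sanity-checked by resolving a few old paths against the new rules. The sketch below is only an illustration of that reading, not the site's actual redirect code. `parse_rule`, `resolve`, and `DEFAULT_STATUS` are hypothetical names, and the 301 default for rules without an explicit code is an assumption.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumption: rules that omit a status code are treated as permanent redirects.
DEFAULT_STATUS = 301


class Redirect(NamedTuple):
    pattern: re.Pattern
    target: str
    status: int


def parse_rule(rule: str) -> Redirect:
    """Split a 'pattern target [status]' string into its parts."""
    parts = rule.split()
    if len(parts) == 2:
        pattern, target = parts
        status = DEFAULT_STATUS
    elif len(parts) == 3:
        pattern, target, status_text = parts
        status = int(status_text)
    else:
        raise ValueError(f"unexpected rule shape: {rule!r}")
    return Redirect(re.compile(pattern), target, status)


def resolve(path: str, rules: List[Redirect]) -> Optional[Tuple[str, int]]:
    """Return (destination, status) for the first matching rule, else None."""
    for rule in rules:
        match = rule.pattern.search(path)
        if match:
            # Convert $1-style group references into Python's \g<1> form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.target)
            return match.expand(template), rule.status
    return None


# Rules copied from the new side of the diff above.
rules = [
    parse_rule(text)
    for text in [
        "^/doc/start/data-access$ /doc/start/data-management/discovering-and-accessing-data",
        "^/doc/start/data-and-model-access(/.*)?$ /doc/start/data-management/discovering-and-accessing-data 302",
        "^/doc/cml(/.*)?$ https://cml.dev/doc$1",
    ]
]

print(resolve("/doc/start/data-and-model-access/foo", rules))
print(resolve("/doc/cml/usage", rules))
```

First-match-wins ordering is also an assumption here. If the real resolver behaves differently, the order of entries in the JSON array may matter in ways this sketch does not capture.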
