Rewrite the explanation of pipelines in dag command's description ite…
sahilbhosale63 committed Jul 24, 2020
1 parent b2e4983 commit 961a995
Showing 1 changed file with 20 additions and 19 deletions.
39 changes: 20 additions & 19 deletions content/docs/command-reference/dag.md
@@ -15,25 +15,26 @@ positional arguments:

## Description

A data pipeline, in general, is a series of data processing
[stages](/doc/command-reference/run) (for example console commands that take an
input and produce an <abbr>output</abbr>). A pipeline may produce intermediate
data, and has a final result.

Data processing or ML pipelines typically start with large raw datasets,
include intermediate featurization and training stages, and produce a final
model, as well as accuracy [metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
written manually or built using the helper command `dvc run`. This allows DVC to
restore one or more pipelines later (see `dvc repro`).

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.
`dvc dag` command displays the stages of a pipeline up to the target stage. If
`target` is omitted, it will show the full project DAG.
A data pipeline is a series of [stages](/doc/command-reference/run) through
which data moves. Each stage of a pipeline takes some input and produces some
output, which is then passed on to the next stage. This process continues until
the final stage, which produces the final results. A pipeline works much like a
compiler: it takes some data as input and produces an output.

You can create multiple pipelines, and each pipeline can be treated as an
experiment. After completing an experiment, you can commit the changes and add
a tag to it. A tag is a name that you give to your experiment.

Using DVC, you can create a metafile `data.dvc` which allows you to reproduce
each stage of a pipeline using `dvc repro`. At the end of every pipeline, you
can save your output in a metrics file using the `dvc metrics` command. This
file helps you compare the results of every experiment.

DVC provides a `dvc dag` command, which creates a directed acyclic graph
([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) that gives a
pictorial view of a pipeline. It also tells you which stage of a pipeline you
are currently in.
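
As an illustration of what "the stages of a pipeline up to the target stage"
means in a dependency graph, here is a minimal Python sketch. It is not DVC's
implementation: the stage names, the `PIPELINE` dictionary, and the
`stages_up_to` helper are all hypothetical and exist only for this example.

```python
from typing import Dict, List

# Hypothetical pipeline: each stage maps to the stages it depends on.
# These names are illustrative and do not come from any real project.
PIPELINE: Dict[str, List[str]] = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["train", "featurize"],
}


def stages_up_to(pipeline: Dict[str, List[str]], target: str) -> List[str]:
    """Return the target and all of its upstream stages, dependencies first."""
    ordered: List[str] = []
    visiting = set()  # stages on the current path, used to detect cycles

    def visit(stage: str) -> None:
        if stage in ordered:
            return  # already placed in the result
        if stage in visiting:
            raise ValueError(f"cycle detected at stage {stage!r}")
        visiting.add(stage)
        for dep in pipeline[stage]:
            visit(dep)
        visiting.discard(stage)
        ordered.append(stage)

    visit(target)
    return ordered


print(stages_up_to(PIPELINE, "train"))  # ['prepare', 'featurize', 'train']
```

Because the graph is acyclic, a depth-first walk from the target always
terminates, and stages that the target does not depend on (here `evaluate`) are
left out of the result.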

## Options
