Addition of documentation that covers the set up of a simple study (#168)

* Addition of simple Hello World spec.

* Addition of basics page to index.

* Addition of hello_world documentation.

* Additions to hello_world.

* More documentation in single step section.

* Continued edits to Hello World.

* Addition of parameter section.

* Addition of a note about %% token.

* Addition of directory structure.

* Continuation of parameter documentation.

* Removal of the depends key.

* Addition of the env section description.

* Addition of a link to Docker documentation for Dockerfiles.

* Addition of single parameter hello world.

* Correction of double colons.

* Correction of indentation.

* Addition of print out to verify output.

* Addition of sample specifications for multi and single params.

* Addition of more documentation for single param.

* Additional output to show parameter results.

* Correction to formatting.

* Addition of samples.

* Updates to docstrings for data structures.

* Updates to clear Sphinx warnings.

* Removal of escape on the *args because of flake8 failure.

* Clean up of existing hello world specs.

* Addition of multistep example spec.

* Removal of * to fix sphinx errors.

* Correction to some docstrings.

* Tweaks to specs for consistent naming.

* Finished multi-step parameterized example.

* Tweaks to hello world docs.

* Addition of link to examples on GitHub.
Francesco Di Natale authored Jun 26, 2019
1 parent 1d1f8f7 commit 28ea0c0
Showing 15 changed files with 505 additions and 56 deletions.
4 changes: 3 additions & 1 deletion docs/source/getting_started.rst
@@ -2,7 +2,7 @@ Getting Started
================

Maestro Docker Container
-*********************
+************************

In order to set up the Docker container execute the following from the root of the Maestro repository::

@@ -16,6 +16,8 @@ Once inside the Docker container, the following should bring up help::

$ maestro -h

For more information on using Dockerfiles, check out Docker's `documentation <https://docs.docker.com/engine/reference/builder/>`_.

Installing MaestroWF
*********************

284 changes: 284 additions & 0 deletions docs/source/hello_world.rst
@@ -0,0 +1,284 @@
Basics of Study Construction
=============================

Now that you're acquainted with Maestro's interface from running a pre-made example, we'll walk you through the basics of making a simple "Hello, World" specification of your own. This page covers the following:

- A single step "Hello World" without parameterization.
- An introduction to a single parameter "Hello World" study.
- An introduction to a multi-parameter "Hello World" study.
- Adding a "farewell" step to "Hello World".

Maestro's default study description uses general YAML notation. YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a standard data serialization language. For more information, see the `YAML specification <https://yaml.org/spec/1.2/spec.html>`_.

Creating a Single Step Study
*****************************

To start, we will walk through constructing a single step "Hello World" study that simply echoes "Hello, World!" to a file. The first step is to name your study -- in this case we'll settle for something simple and just call our study "Hello World". In your editor of choice, begin by adding the following:

.. code-block:: yaml
    :linenos:

    description:
        name: hello_world
        description: A simple 'Hello World' study.

.. note:: The `description` block is a required section in every study and has two required keys: name, and description. You may add other keys to the description section, but Maestro will not check for them.

Next we will add the `env` section. This section isn't required, but in this case, we want to stash all study workspaces in a common directory. The `env` section can contain a section named `variables`, which can contain a variable named `OUTPUT_PATH`. Maestro recognizes `OUTPUT_PATH` as a keyword and we can use it to have Maestro create new workspaces for this study in a single place. In this case, we want to create the path `./sample_output/hello_world` to collect all "Hello World" studies. To do that, add the `env` section as follows to the specification:

.. code-block:: yaml
    :linenos:

    env:
        variables:
            OUTPUT_PATH: ./sample_output/hello_world

.. note:: The `OUTPUT_PATH` variable is a Maestro recognized keyword that specifies the base path where study output is written.
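
To make the role of `OUTPUT_PATH` more concrete, here is a minimal Python sketch of how a separate workspace per run could be derived from a common base path. It is not Maestro's implementation: the helper name `workspace_for` and the `<study name>_<timestamp>` naming pattern are assumptions made purely for illustration.

```python
import os
from datetime import datetime


def workspace_for(output_path, study_name, when):
    """Return a hypothetical per-run workspace path under output_path.

    The "<study name>_<YYYYmmdd-HHMMSS>" pattern is an assumption made for
    this sketch, not a documented Maestro guarantee.
    """
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return os.path.join(output_path, "{}_{}".format(study_name, stamp))


# Two runs of the same study land side by side under the common base path.
base = "./sample_output/hello_world"
first = workspace_for(base, "hello_world", datetime(2019, 6, 18, 11, 32, 0))
second = workspace_for(base, "hello_world", datetime(2019, 6, 18, 11, 45, 0))
print(first)
print(second)
```

The point of the sketch is only that every run shares one parent directory while keeping its own workspace, which is what collecting all "Hello World" studies in a single place amounts to.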

The final section to add is the `study` section, which will contain only a single step. Below the `env` section in the study file you've created, add the following block:

.. code-block:: yaml
    :linenos:

    study:
        - name: hello_world
          description: Say hello to the world!
          run:
              cmd: |
                echo "Hello, World!" > hello_world.txt

.. note:: The `-` denotes a list item in YAML. To add elements, simply add new elements prefixed with a hyphen. For now, we will keep it simple with one step and will cover adding extra steps later in this guide.

The only required keys for a study step are the name, description, and a run section containing a command (`cmd`). You might notice that, like the study itself, each step requires a `name` and a `description` entry. This requirement is intentional in order to encourage documentation as you develop a study. The following are descriptions of the required keys:

.. glossary::

    name
        A unique name to identify this step by (tip: make this something relevant).

    description
        A human-readable sentence or paragraph describing what this step is meant to achieve.

    run
        A dictionary containing keys that describe what runs in this step.

    cmd
        A string of commands to be executed by this step.
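
The required keys described so far can be summarized as a small check. The following Python sketch operates on a plain dictionary standing in for the parsed YAML specification; the function `missing_keys` and its messages are hypothetical and are not part of Maestro's API.

```python
REQUIRED_DESCRIPTION_KEYS = ("name", "description")
REQUIRED_STEP_KEYS = ("name", "description", "run")


def missing_keys(spec):
    """Return a list of problems found in a study specification dict."""
    problems = []
    description = spec.get("description", {})
    for key in REQUIRED_DESCRIPTION_KEYS:
        if key not in description:
            problems.append("description is missing '{}'".format(key))
    for index, step in enumerate(spec.get("study", [])):
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                problems.append("step {} is missing '{}'".format(index, key))
        # Only look for 'cmd' when the step has a 'run' section at all.
        if "run" in step and "cmd" not in step["run"]:
            problems.append("step {} is missing 'run.cmd'".format(index))
    return problems


# A dictionary mirroring the single step "Hello World" specification.
spec = {
    "description": {
        "name": "hello_world",
        "description": "A simple 'Hello World' study.",
    },
    "study": [
        {
            "name": "hello_world",
            "description": "Say hello to the world!",
            "run": {"cmd": 'echo "Hello, World!" > hello_world.txt\n'},
        }
    ],
}

# The same study with the step's description left out.
incomplete = {
    "description": spec["description"],
    "study": [{"name": "hello_world", "run": spec["study"][0]["run"]}],
}

print(missing_keys(spec))
print(missing_keys(incomplete))
```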

The completed "Hello World" specification should now look like the following:

.. code-block:: yaml
    :linenos:

    description:
        name: hello_world
        description: A simple 'Hello World' study.

    env:
        variables:
            OUTPUT_PATH: ./sample_output/hello_world

    study:
        - name: hello_world
          description: Say hello to the world!
          run:
              cmd: |
                echo "Hello, World!" > hello_world.txt

Now that the single step "Hello World" study is complete, go ahead and save it to the file `hello_world.yaml`. In order to run the study, simply run the following::

    $ maestro run hello_world.yaml

The command above will produce a timestamped folder that contains the output of the above study. If you'd like to know more about Maestro's command line interface and study output, take a look at our :doc:`Quick Start <./quick_start>` guide. The "hello_world" study above produces a directory that looks similar to the following:

.. code-block:: bash

    drwxr-xr-x 6 frank staff 192B Jun 18 11:32 hello_world
    -rw-r--r-- 1 frank staff 1.8K Jun 18 11:32 hello_world.pkl
    -rw-r--r-- 1 frank staff 0B Jun 18 11:32 hello_world.txt
    -rw-r--r-- 1 frank staff 306B Jun 18 11:32 hello_world.yaml
    drwxr-xr-x 3 frank staff 96B Jun 18 11:32 logs
    drwxr-xr-x 5 frank staff 160B Jun 18 11:32 meta
    -rw-r--r-- 1 frank staff 241B Jun 18 11:32 status.csv

From here, change into the "hello_world" subdirectory. Here you'll see that there are four files: the generated "hello_world.sh" shell script, the resulting output "hello_world.txt", a .out log file, and a .err error log. Your directory should look similar to:

.. code-block:: bash

    -rw-r--r-- 1 frank staff 0B Jun 18 11:32 hello_world.err
    -rw-r--r-- 1 frank staff 0B Jun 18 11:32 hello_world.out
    -rwxr--r-- 1 frank staff 53B Jun 18 11:32 hello_world.sh
    -rw-r--r-- 1 frank staff 14B Jun 18 11:32 hello_world.txt

You'll notice that the study directory only contains "hello_world" and the contents for a single run (which corresponds to the singular step above). Maestro detects that the step is not parameterized and uses the workspace that corresponds with the "hello_world" step. If we execute the command `cat hello_world.txt` we see that the output is exactly as specified in the `cmd` portion of the step::

    $ cat hello_world.txt
    Hello, World!

In the next section we cover the basics of how to add a single parameter to the "Hello World" study.

Adding a Single Parameter to "Hello World"
*******************************************

Now that you have a functioning single step study, let's expand "Hello World" to greet multiple people. To add this functionality, you need to add a new section called `global.parameters` to the `hello_world.yaml` study specification. Let's say we want to say hello to Pam, Jim, Michael, and Dwight. The `global.parameters` section would look as follows:

.. code-block:: yaml
    :linenos:

    global.parameters:
        NAME:
            values: [Pam, Jim, Michael, Dwight]
            label: NAME.%%

.. note:: `%%` is a special token that defines where the value in the label is placed. In this case the parameter labels will be `NAME.Pam`, `NAME.Jim`, and so on. The label can take a custom text format, so long as the `%%` token is included to be able to substitute the parameter's value in the appropriate place.
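
As an illustration of how the `%%` token behaves, the following Python sketch builds labels by plain string replacement. The function `make_labels` is a hypothetical helper, and raising an error when the token is missing is an assumption of this sketch rather than documented Maestro behavior.

```python
def make_labels(label_format, values):
    """Substitute each value for the '%%' token in label_format."""
    if "%%" not in label_format:
        # Assumption for this sketch: treat a label without the token as an error.
        raise ValueError("label must contain the '%%' token")
    return [label_format.replace("%%", str(value)) for value in values]


names = ["Pam", "Jim", "Michael", "Dwight"]
labels = make_labels("NAME.%%", names)
print(labels)

# A custom text format works the same way, as long as the token is present.
custom = make_labels("person-%%", names)
print(custom)
```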

In order to use the `NAME` parameter, we simply modify the "hello_world" step as follows:

.. code-block:: yaml
    :linenos:

    study:
        - name: hello_world
          description: Say hello to the world!
          run:
              cmd: |
                echo "Hello, $(NAME)!" > hello_world.txt

.. note:: The `$(NAME)` format is an example of the general format used for variables, parameters, dependency references, and labels. For more examples of referencing values, see the `LULESH study <https://github.com/LLNL/maestrowf/blob/develop/samples/lulesh/lulesh_sample1_unix.yaml>`_ in the samples folder in the Maestro GitHub repository.
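
The effect of a `$(NAME)` reference can be sketched in Python as a token substitution over the command string. This is a simplified illustration, not Maestro's substitution code: the regular expression only handles plain identifiers, and the function `substitute` is a hypothetical name.

```python
import re

# Matches tokens such as $(NAME); only simple identifiers are handled here.
TOKEN = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def substitute(cmd, values):
    """Replace each known $(TOKEN) in cmd and leave unknown tokens untouched."""
    def replace(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return TOKEN.sub(replace, cmd)


cmd = 'echo "Hello, $(NAME)!" > hello_world.txt'
commands = [substitute(cmd, {"NAME": name})
            for name in ["Pam", "Jim", "Michael", "Dwight"]]
for command in commands:
    print(command)
```

Each parameter value yields its own copy of the command, which is why a parameterized step ends up with one workspace per value.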

The full single parameter version of the study specification that says hello to different people is as follows:

.. code-block:: yaml
    :linenos:

    description:
        name: hello_world
        description: A simple 'Hello World' study.

    env:
        variables:
            OUTPUT_PATH: ./sample_output/hello_world

    study:
        - name: hello_world
          description: Say hello to someone!
          run:
              cmd: |
                echo "Hello, $(NAME)!" > hello_world.txt

    global.parameters:
        NAME:
            values: [Pam, Jim, Michael, Dwight]
            label: NAME.%%

If we execute the study and list the contents of the study's workspace, we'll see that the top level looks the same as described above. This time, however, if we change into the `hello_world` directory we'll see that its layout has changed. There are now four directories, one for each parameter value, each containing the `hello_world.txt` output.

.. code-block:: bash

    drwxr-xr-x 6 root root 4096 Mar 25 01:30 ./
    drwxr-xr-x 5 root root 4096 Mar 25 01:30 ../
    drwxr-xr-x 2 root root 4096 Mar 25 01:30 NAME.Dwight/
    drwxr-xr-x 2 root root 4096 Mar 25 01:30 NAME.Jim/
    drwxr-xr-x 2 root root 4096 Mar 25 01:30 NAME.Michael/
    drwxr-xr-x 2 root root 4096 Mar 25 01:30 NAME.Pam/

If we `cat` the output from each directory, we'll see that the value for `$(NAME)` has been substituted::

    $ cat */hello_world.txt
    Hello, Dwight!
    Hello, Jim!
    Hello, Michael!
    Hello, Pam!


Expanding "Hello World" to Multiple Steps
******************************************

Now that we've got our specification set up to say hello to multiple people, let's take a step back and look at our base "Hello World" specification and add "bye_world" as specified below:

.. code-block:: yaml
    :linenos:

    description:
        name: hello_world
        description: A simple 'Hello World' study.

    env:
        variables:
            OUTPUT_PATH: ./sample_output/hello_world

    study:
        - name: hello_world
          description: Say hello to the world!
          run:
              cmd: |
                echo "Hello, World!" > hello_world.txt

        - name: bye_world
          description: Say bye to someone!
          run:
              cmd: |
                echo "Bye, World!" > bye_world.txt
              depends: [hello_world]

After adding this step to your specification, go ahead and run it using `maestro run` as before. Now, if you look at the generated study directory, we see that the study generates an extra directory for the "bye_world" step.

.. code-block:: bash

    drwxr-xr-x 6 frank staff 192B Jun 25 20:54 bye_world
    -rw-r--r-- 1 frank staff 2.3K Jun 25 20:54 hello_bye.pkl
    -rw-r--r-- 1 frank staff 0B Jun 25 20:53 hello_bye.txt
    -rw-r--r-- 1 frank staff 551B Jun 25 20:53 hello_bye_world.yaml
    drwxr-xr-x 6 frank staff 192B Jun 25 20:53 hello_world
    drwxr-xr-x 3 frank staff 96B Jun 25 20:54 logs
    drwxr-xr-x 5 frank staff 160B Jun 25 20:53 meta
    -rw-r--r-- 1 frank staff 383B Jun 25 20:54 status.csv

If you change into the "bye_world" directory, you'll see that a set of files similar to those of the previous "hello_world" step has been created. Executing `cat bye_world.txt` prints out "Bye, World!". Now, to take this a step further -- what if we wanted to say bye to each particular person in our parameterized "hello world" example?

Starting from our parameterized "Hello World" specification, we add the "bye_world" step and make it dependent on the "hello_world" step. The specification below also adds a second parameter, `GREETING`, and you should update the description and study name to something meaningful for the new study.

.. code-block:: yaml

    description:
        name: hello_bye_parameterized
        description: A study that says hello and bye to multiple people.

    env:
        variables:
            OUTPUT_PATH: ./sample_output/hello_world

    study:
        - name: hello_world
          description: Say hello to someone!
          run:
              cmd: |
                echo "$(GREETING), $(NAME)!" > hello_world.txt

        - name: bye_world
          description: Say bye to someone!
          run:
              cmd: |
                echo "Bye, World!" > bye_world.txt
              depends: [hello_world]

    global.parameters:
        NAME:
            values: [Pam, Jim, Michael, Dwight]
            label: NAME.%%

        GREETING:
            values: [Hello, Ciao, Hey, Hi]
            label: GREETING.%%

The study workspace looks the same as the "hello_bye_world" study specified above at the top level; however, like the multi-parameterized "hello_world" study you'll see that each step's workspaces have parameterized folders. The "hello_world" step has the same workspace set up as the previous parameterized study as expected.

.. code-block:: bash

    drwxr-xr-x 6 frank staff 192B Jun 25 22:33 GREETING.Ciao.NAME.Jim
    drwxr-xr-x 6 frank staff 192B Jun 25 22:33 GREETING.Hello.NAME.Pam
    drwxr-xr-x 6 frank staff 192B Jun 25 22:33 GREETING.Hey.NAME.Michael
    drwxr-xr-x 6 frank staff 192B Jun 25 22:33 GREETING.Hi.NAME.Dwight

If you look into the "bye_world" workspace, you'll also notice it has exactly the same set of folders as "hello_world". While this setup might seem weird at first, it is a feature of how Maestro expands the study using parameters. In a later section, we'll describe how Maestro expands the study in a predictable manner -- but for now, it is enough to know that the "bye_world" step was expanded in a 1:1 fashion because the step is dependent on "hello_world" and the parameters it used. Maestro, in this case, cannot make any assumptions and simply expands "bye_world" one to one with each parameterized "hello_world".
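
The expansion described above can be approximated with a short Python sketch. It pairs parameter values by position, which matches the folder names in the listing (for example `GREETING.Ciao.NAME.Jim`), and gives a dependent step the same folders as the parameterized step it depends on. The function names, the alphabetical ordering of label parts, and the equal-length check are assumptions inferred from the listing, not Maestro's actual expansion code.

```python
def expand_parameters(parameters):
    """Pair parameter values by position and build one label per pairing."""
    names = sorted(parameters)  # assumed ordering: alphabetical by parameter name
    lengths = {len(parameters[name]["values"]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all parameters need the same number of values")
    count = lengths.pop() if lengths else 0
    labels = []
    for index in range(count):
        parts = [
            parameters[name]["label"].replace(
                "%%", str(parameters[name]["values"][index]))
            for name in names
        ]
        labels.append(".".join(parts))
    return labels


def step_workspaces(steps, parameters):
    """Map each step name to the parameterized folders it would receive.

    Steps are assumed to be listed after the steps they depend on.
    """
    labels = expand_parameters(parameters)
    result = {}
    for step in steps:
        run = step["run"]
        uses = any("$({})".format(name) in run["cmd"] for name in parameters)
        inherits = any(result.get(dep) for dep in run.get("depends", []))
        result[step["name"]] = list(labels) if (uses or inherits) else []
    return result


parameters = {
    "NAME": {"values": ["Pam", "Jim", "Michael", "Dwight"], "label": "NAME.%%"},
    "GREETING": {"values": ["Hello", "Ciao", "Hey", "Hi"], "label": "GREETING.%%"},
}
steps = [
    {"name": "hello_world",
     "run": {"cmd": 'echo "$(GREETING), $(NAME)!" > hello_world.txt'}},
    {"name": "bye_world",
     "run": {"cmd": 'echo "Bye, World!" > bye_world.txt',
             "depends": ["hello_world"]}},
]

workspaces = step_workspaces(steps, parameters)
for name, folders in workspaces.items():
    print(name, folders)
```

Note that "bye_world" never references a parameter in its own command; in this sketch it receives the parameterized folders solely because it depends on "hello_world".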

.. note:: You can view the sample specifications constructed here in their entirety in Maestro's GitHub repository `here <https://github.com/LLNL/maestrowf/tree/develop/samples/hello_world>`_.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ Welcome to Maestro Workflow Conductor Documentation

getting_started
quick_start
hello_world
lulesh_breakdown
maestro_core

57 changes: 30 additions & 27 deletions maestrowf/datastructures/core/study.py
@@ -139,27 +139,29 @@ class Study(DAG):
 - Creating the global workspace for a study.
 - Setting up the parameterized workspaces for each combination.
 - Acquiring dependencies as specified in the StudyEnvironment.
 - Intelligently constructing the expanded DAG to be able to:
 - Recognize when a step executes in a parameterized workspace
 - Recognize when a step executes in the global workspace
 - Expanding the abstract flow to the full set of specified parameters.
 Future functionality that makes sense to add here:
 - Metadata collection. If we're setting things up here, collect the
-general information. We might even want to venture to say that a set
-of directives may be useful so that they could be placed into
-Dependency classes as hooks for dumping that data automatically.
+general information. We might even want to venture to say that a set
+of directives may be useful so that they could be placed into
+Dependency classes as hooks for dumping that data automatically.
 - A way of packaging an instance of the class up into something that is
-easy to store in the ExecutionDAG class so that an API can be
-designed in whatever class ends up managing all of this to have
-machine learning applications pipe messages to spin up new studies
-using the same environment.
+easy to store in the ExecutionDAG class so that an API can be
+designed in whatever class ends up managing all of this to have
+machine learning applications pipe messages to spin up new studies
+using the same environment.
 - The current solution to this is VERY basic. Currently the plan is
-to write a parameterized specification (not unlike the method of
-using parameterized .dat files for simulators) and just have the
-ML engine string replace those. It's crude because currently we'd
-have to just construct a new environment, with no way to manage
-injecting the new set into an existing workspace.
+to write a parameterized specification (not unlike the method of
+using parameterized .dat files for simulators) and just have the
+ML engine string replace those. It's crude because currently we'd
+have to just construct a new environment, with no way to manage
+injecting the new set into an existing workspace.
"""

def __init__(self, name, description,
@@ -327,7 +329,7 @@ def add_step(self, step):
the order that they will be encountered. The method attempts to be
intelligent and make the intended edge based on the 'depends' entry in
a step. When adding steps out of order it's recommended to just use the
-base class DAG functionality and manually make connections.
+base class DAG functionality and manually make connections.
:param step: A StudyStep instance to be added to the Study instance.
"""
@@ -389,21 +391,22 @@ def setup_environment(self):
def configure_study(self, submission_attempts=1, restart_limit=1,
throttle=0, use_tmp=False, hash_ws=False):
"""
-Perform initial configuration of a study.
-The method is used for going through and actually acquiring each
-dependency, substituting variables, sources and labels.
-:param submission_attempts: Number of attempted submissions before
-marking a step as failed.
-:param restart_limit: Upper limit on the number of times a step with
-a restart command can be resubmitted before it is considered failed.
-:param throttle: The maximum number of in-progress jobs allowed. [0
-denotes no cap].
-:param use_tmp: Boolean value specifying if the generated
-ExecutionGraph dumps its information into a temporary directory.
-:returns: True if the Study is successfully setup, False otherwise.
+Perform initial configuration of a study. \
+The method is used for going through and actually acquiring each \
+dependency, substituting variables, sources and labels. \
+:param submission_attempts: Number of attempted submissions before \
+marking a step as failed. \
+:param restart_limit: Upper limit on the number of times a step with \
+a restart command can be resubmitted before it is considered failed. \
+:param throttle: The maximum number of in-progress jobs allowed. [0 \
+denotes no cap].\
+:param use_tmp: Boolean value specifying if the generated \
+ExecutionGraph dumps its information into a temporary directory. \
+:returns: True if the Study is successfully setup, False otherwise. \
"""

self._submission_attempts = submission_attempts
self._restart_limit = restart_limit
self._submission_throttle = throttle