\section{Cylc Introduction}
\label{Cylc Introduction}
\note{For more information on the topics of this section, see Appendix~\ref{Appendix
Cylc Introduction} and the Cylc User Guide.}
\subsection{The suite.rc File}
A cylc suite is a collection of files in a {\em suite directory} configured by
a single {\em suite.rc} file, which is written in a nested INI format with
section and sub-section (etc.) headings denoted by square brackets.
\begin{lstlisting}[language=suiterc]
[section]
    option = value
    [[sub-section]]
        option = value
\end{lstlisting}
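Deeper nesting simply adds more brackets, and a \lstinline{#} character starts
a comment. In the minimal sketch below the section and option names are
placeholders. Indentation is optional (the brackets alone determine the
nesting) but makes the structure much easier to read:
\begin{lstlisting}[language=suiterc]
# a full-line comment
[section]
    option = value
    [[sub-section]]
        option = value
        [[[sub-sub-section]]]
            option = value # an end-of-line comment
\end{lstlisting}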
The most important top level sections in a suite.rc file are:
\begin{tabular}{ll}
\lstinline=[cylc]= & various suite-level settings\\
\lstinline=[scheduling]= & determines {\em when} tasks are ready to run (e.g.\ dependencies)\\
\lstinline=[runtime]= & defines {\em what} to run when a task is ready, and
{\em where} and {\em how} to run it \\
\end{tabular}
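As an illustration, a minimal suite.rc containing all three sections might look
like the sketch below. The \lstinline=UTC mode= setting is just one example of a
suite-level option, chosen here for illustration:
\begin{lstlisting}[language=suiterc]
[cylc]
    UTC mode = True # suite-level: run the suite in UTC
[scheduling]
    [[dependencies]]
        graph = hello # when: hello depends on nothing, so it runs at once
[runtime]
    [[hello]]
        script = echo "Hello World!" # what: the task's shell script
\end{lstlisting}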
\subsection{Hello World in cylc}
\label{Hello World in cylc}
This suite runs a single task named \lstinline{hello} that prints ``Hello
World!'', sleeps for a few seconds, then exits:
% Call task just "hello" to distinguish the task name from the suite name.
% Sleep for 30 seconds - long enough to be seen in the GUI.
\begin{lstlisting}[language=suiterc]
[scheduling]
    [[dependencies]]
        graph = hello
[runtime]
    [[hello]]
        script = echo "Hello World!"; sleep 30
\end{lstlisting}
The \lstinline{[scheduling]} section says to run the \lstinline{hello} task
immediately on start-up, because it doesn't depend on anything else, and the
\lstinline=[runtime]= section says that the task should run the given inline
shell script.
You can run a new cylc suite like this (but don't do it just yet):
\begin{lstlisting}[language=bash]
$ cylc register hello_world /path/to/hello_world/ # register name "hello_world"
$ cylc validate hello_world # check for configuration errors
$ cylc run hello_world # run the suite
$ cylc gui hello_world & # open a suite control GUI
\end{lstlisting}
In this tutorial, however, we will use the \lstinline{rose suite-run}
command to run suites. It automatically installs, registers and validates the
suite, starts it with \lstinline=cylc run=, and opens the cylc GUI.
The \lstinline{cylc run} command starts a new {\em suite daemon} to run your
suite. It will stay alive even if you log out, until your suite runs to
completion or you tell it to shut down.
\terminology{A cylc \underline{suite daemon} is a light-weight server program
dedicated to managing a single workflow.}
\subsection{Hello World Tutorial}
\begin{shaded*}
If you are running this tutorial on the cylc VM, the raw directory structure of the suite is set up for you. Change directory to {\em \$HOME/tutorial/suites/hello\_world/}.
Inside that directory is a cylc {\em suite.rc} file that looks like this:
\begin{lstlisting}[language=suiterc]
[scheduling]
    [[dependencies]]
        graph = hello
[runtime]
    [[hello]]
        script = echo "Hello World!"; sleep 30
\end{lstlisting}
Now run the following commands:
\begin{lstlisting}[language=bash]
$ rose suite-run # install, register, validate, run the suite, and open the GUI
\end{lstlisting}
If no errors are found in the suite.rc file, your suite will start up and a
GUI window will appear showing the \lstinline{hello} task with a coloured
square representing its state: for example, green means `running' and grey
means `succeeded'. Once the task has succeeded, the suite has no more tasks to
run and will shut down.
Note that ``Hello World!'' is not printed to the terminal. It is printed by
the \lstinline=hello= task, which is launched by the suite daemon as a separate
process (potentially on another machine, although not in this case).
\terminology{A task \underline{job script} is a shell script generated by cylc
to run a task as defined in the suite.rc file.}
The job script and its output are written to a standard job log directory:
\begin{lstlisting}[language=bash]
$ ls $HOME/cylc-run/hello_world/log/job/1/hello/01/
job # task job script
job.out # task standard out
job.err # task standard error
job.status # job status information
job-activity.log # job submission and management activity
\end{lstlisting}
The task job logs are automatically retrieved to the suite host, if the task
runs on another machine.
While a task is visible in the GUI, you can right-click on it to view its log
files. Once it has gone from the GUI, look in its log directory instead:
\begin{lstlisting}[language=bash]
$ cd $HOME/cylc-run/hello_world/log/job/1/hello/01/
$ cat job.out
...
Hello World!
...
\end{lstlisting}
or use the \lstinline{cylc cat-log} command:
\begin{lstlisting}[language=bash]
$ cylc cat-log --help # see "cylc --help" for top level command help
$ cylc cat-log --stdout hello_world hello.1
...
Hello World!
...
\end{lstlisting}
or use the \lstinline{rose suite-log} web-based suite log file viewer (also known as {\em Rose Bush}):
\begin{lstlisting}[language=bash]
$ cd $HOME/tutorial/suites/hello_world
$ rose suite-log
# (now view suite and job logs in your web browser)
\end{lstlisting}
\end{shaded*}
\subsection{Defining Tasks}
A task can run an external program or script, shell scripting inlined in
the suite.rc file, or any combination of the two. The suite
\lstinline{bin} directory is automatically added to the task's
\lstinline{$PATH}, so scripts residing in it can be called by name,
% changed original script name 'get-host-details' - users may wonder if host
% details are somehow communicated back to the suite.
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = run-model # in <suite-dir>/bin/ (or elsewhere in $PATH)
\end{lstlisting}
or use a full file path,
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = /path/to/my-scripts/run-model
\end{lstlisting}
or set \lstinline{$PATH} in the task environment (this makes more sense
if the environment is inherited by multiple tasks - see
Section~\ref{Families}),
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = run-model
        [[[environment]]]
            PATH = /path/to/my-scripts:$PATH
\end{lstlisting}
You can pass information to a script via its command line or environment,
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = run-model --color=green
        [[[environment]]]
            START_TIME = $CYLC_TASK_CYCLE_POINT
\end{lstlisting}
or use custom multi-line scripting, in triple quotes, to do anything you like:
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = """
cat > model-input.txt <<__EOF__
COLOR=green
START_TIME=$CYLC_TASK_CYCLE_POINT
__EOF__
run-model model-input.txt"""
\end{lstlisting}
An optional \lstinline=[[[remote]]]= section determines {\em where} a task
will run (defaults to localhost),
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = run-model
        [[[remote]]]
            host = supercomputer
\end{lstlisting}
(In this case the \lstinline=run-model= script, and any files it needs such as
the model program itself, must be installed on host ``supercomputer''.)
An optional \lstinline=[[[job]]]= sub-section determines {\em how} the task
should be submitted to run (defaults to ``background'', an ordinary shell
subprocess).
\begin{lstlisting}[language=suiterc]
[runtime]
    [[model]]
        script = run-model
        [[[job]]]
            batch system = pbs
\end{lstlisting}
\note{So far we have only configured individual tasks. In fact the
\lstinline=[runtime]= section is a {\em multiple inheritance hierarchy} that
allows all shared configuration to be factored out into {\em families} that are
inherited by multiple tasks - see Section~\ref{Families}.}
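For example, the \lstinline{$PATH} setting shown earlier could be factored out
into a family. In this minimal sketch \lstinline=MODELS=, \lstinline=model_a=,
\lstinline=model_b= and the \lstinline=MODEL_NAME= variable are hypothetical
names; each task picks up the family's \lstinline=script= and environment via
\lstinline=inherit= and adds its own settings:
\begin{lstlisting}[language=suiterc]
[runtime]
    [[MODELS]] # family: settings shared by all model tasks
        script = run-model
        [[[environment]]]
            PATH = /path/to/my-scripts:$PATH
    [[model_a]]
        inherit = MODELS
        [[[environment]]]
            MODEL_NAME = a # hypothetical task-specific variable
    [[model_b]]
        inherit = MODELS
        [[[environment]]]
            MODEL_NAME = b
\end{lstlisting}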
\subsection{Dependency Graphs}
The hello\_world suite contained the {\em graph string}
\lstinline@graph = hello@. Graph strings specify the scheduling logic
that determines when tasks can run: which tasks depend on, or {\em trigger off},
which other tasks, if any. For instance, if we have two tasks \lstinline{foo}
and \lstinline{bar}, and \lstinline{bar} depends on \lstinline{foo} succeeding,
we could write:
\begin{lstlisting}[language=suiterc]
graph = "foo:succeed => bar"
\end{lstlisting}
If \lstinline{foo} fails here, \lstinline{bar} will not run. Tasks can also
trigger off failure and other conditions, but success triggers are the default
so \lstinline=:succeed= can optionally be omitted:
\begin{lstlisting}[language=suiterc]
graph = "foo => bar"
\end{lstlisting}
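As a minimal sketch (task names hypothetical), a qualifier such as
\lstinline=:fail= selects a different outcome of the upstream task; other
qualifiers include \lstinline=:start= and \lstinline=:finish= (succeeded or
failed):
\begin{lstlisting}[language=suiterc]
graph = """
    foo => bar          # bar runs only if foo succeeds
    foo:fail => recover # recover runs only if foo fails
"""
\end{lstlisting}
Only one of \lstinline=bar= and \lstinline=recover= can ever run here, so in a
real suite the branch not taken usually needs extra handling (suicide triggers,
described in the Cylc User Guide).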
A graph string can contain many triggers, and the default success triggers can
be chained together. This,
\begin{lstlisting}[language=suiterc]
graph = "foo => bar => baz"
\end{lstlisting}
is equivalent to this,
\begin{lstlisting}[language=suiterc]
graph = """foo => bar
bar => baz"""
\end{lstlisting}
You can also write conditional triggering logic with \lstinline=&= (AND) and
\lstinline=|= (OR) operators. For example,
\begin{lstlisting}[language=suiterc]
graph = "foo => bar & baz" # foo => bar, AND foo => baz
\end{lstlisting}
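The conditional operators are most useful on the left-hand side of a trigger,
where they combine several upstream tasks. A minimal sketch with placeholder
task names:
\begin{lstlisting}[language=suiterc]
graph = """
    foo & bar => baz # baz waits until foo AND bar have both succeeded
    foo | bar => qux # qux can run once EITHER foo OR bar has succeeded
"""
\end{lstlisting}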
The \lstinline=cylc graph= command plots a suite's dependency graph.
For this graph string,
\begin{lstlisting}[language=suiterc]
graph = """foo => bar => baz & qux => pin
wol"""
\end{lstlisting}
it produces,
\begin{center}
\includegraphics[width=0.2\columnwidth]{resources/intro-1.png} %tex/cylc-graph}
\end{center}
\subsection{Cycling Workflows}
The concept of a workflow of cycling tasks was introduced in Section~\ref{Cylc
Overview}. The following diagram shows the previous workflow (minus the lone
`wol' task) repeated for three integer cycle points 1, 2 and 3.
\begin{center}
\includegraphics[width=0.5\columnwidth]{resources/intro.png} %tex/cylc-cycle-graph}
\end{center}
In cylc these are three distinct workflows that can run concurrently (which is
how it should be if there is no dependence between them!). In fact many more
than three cycle points could run concurrently, but we deliberately limit the
amount of ``runahead''. The default is:
\begin{lstlisting}[language=suiterc]
[scheduling]
    max active cycle points = 3
\end{lstlisting}
A cycling suite needs an {\em initial cycle point} and at least one cycling
sequence with associated dependencies defined in a graph string. An optional
{\em final cycle point} can also be given.
\begin{lstlisting}[language=suiterc]
[cylc]
    cycle point format = %Y-%m
[scheduling]
    initial cycle point = 2000-01
    final cycle point = 2000-05
    [[dependencies]]
        [[[P1M]]]
            graph = model
\end{lstlisting}
This suite definition says to run task \lstinline=model= at each cycle point
on a sequence of date-times with a 1-month interval from 2000-01 to 2000-05
(see~\ref{advanced-cycling} for other kinds of cycling sequence).
With no clock-triggers defined (see~\ref{Clock Triggered Tasks}) these
date-times have no connection to the real-time clock. Each instance of
\lstinline=model= is merely labelled with its cycle point value (which the task
job can use, e.g.\ as a model run start date). In this case
there is no dependence between successive \lstinline=model= instances, so they
can all run concurrently, out to \lstinline=max active cycle points=.
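Cycle points need not be date-times. As a sketch, the workflow in the integer
cycling diagram earlier in this section (cycle points 1, 2 and 3) could be
written using integer cycling, with \lstinline=P1= as the interval between
cycle points:
\begin{lstlisting}[language=suiterc]
[scheduling]
    cycling mode = integer
    initial cycle point = 1
    final cycle point = 3
    [[dependencies]]
        [[[P1]]] # every integer cycle point: 1, 2, 3
            graph = "foo => bar => baz & qux => pin"
\end{lstlisting}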
\subsection{Inter-Cycle Dependence}
If each instance of \lstinline=model= in the previous example really depends
on its own previous instance (for restart files, say), running multiple
models concurrently would result in failure. Here's how to express this
inter-cycle dependence correctly,
\begin{lstlisting}[language=suiterc]
[cylc]
    cycle point format = %Y-%m
[scheduling]
    initial cycle point = 2000-01
    final cycle point = 2000-05
    [[dependencies]]
        [[[P1M]]]
            graph = model[-P1M] => model
\end{lstlisting}
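Here \lstinline=model[-P1M] => model= means that each instance of
\lstinline=model= depends on the instance one month earlier.
\lstinline=P1M= is an ISO8601 duration (see
\url{http://wikipedia.org/wiki/ISO_8601#Durations}): \lstinline=P= denotes a
duration and \lstinline=1M= means one month. Other examples of ISO8601
durations are: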
\begin{itemize}
\item \lstinline{-PT12H} (12 hours ago).
\item \lstinline{-PT6H30M} (6 hours 30 minutes ago).
\item \lstinline{P1W} (1 week in the future).
\end{itemize}
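These offsets work with any date-time sequence. In the minimal sketch below
(the six-hourly sequence and the dates are illustrative choices),
\lstinline=model= runs every six hours and each instance depends on the one
twelve hours earlier. Cylc ignores dependence on cycle points before the
initial cycle point, so the first two instances can start straight away:
\begin{lstlisting}[language=suiterc]
[scheduling]
    initial cycle point = 2000-01-01T00
    final cycle point = 2000-01-02T00
    [[dependencies]]
        [[[PT6H]]] # cycle points at 00, 06, 12 and 18 hours each day
            graph = model[-PT12H] => model
\end{lstlisting}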
\subsection{Cycling Suite Tutorial}
\begin{shaded*}
This demo is an example of a cycling workflow. On the cylc VM, the suite is
located at {\em \$HOME/tutorial/suites/rocket\_cycling/}. \lstinline{cd} to that
directory.
The directory structure should look like this:
\begin{lstlisting}[language=]
|-- bin/
| `-- count-down
|-- rose-suite.conf
`-- suite.rc
\end{lstlisting}
The {\em suite.rc} file should look like this:
\begin{lstlisting}[language=suiterc, title=suite.rc]
[scheduling]
    initial cycle point = 2000-01-01T00
    final cycle point = 2000-01-05T00
    [[dependencies]]
        [[[T00]]]
            graph = """
                blast_off[-P1D] => point_upwards
                point_upwards => load_astronauts & fill_fuel_tank & set_coordinates
                load_astronauts & set_coordinates => count_down
                fill_fuel_tank => light_fuse => count_down
                count_down => blast_off
            """
[runtime]
    [[point_upwards]]
        script = sleep 2; echo "spikey end pointing at sky, flamey end \
pointing at ground"
    [[load_astronauts]]
        script = sleep 1; echo "loaded astronauts"
    [[fill_fuel_tank]]
        script = sleep 5; echo "tank brimmed"
    [[set_coordinates]]
        script = echo "coordinates set for west wallaby st"
    [[light_fuse]]
        script = echo "stand well back"
    [[count_down]]
        script = count-down
    [[blast_off]]
        script = echo "blast off"
\end{lstlisting}
and the {\em bin/count-down} script should look like this:
\begin{lstlisting}[language=bash, title=bin/count-down]
#!/bin/bash
for count in {5..1}; do
    echo $count
    sleep 1
done
\end{lstlisting}
To run this demo suite, enter the following commands:
\begin{lstlisting}[language=bash]
$ rose suite-run
\end{lstlisting}
Again a window should open showing the progress of the suite.
While the suite is running, try switching to graph mode by selecting
\lstinline[language=]{View > 1 - Graph View} from the menubar; this shows
the dependencies between the tasks as they run.
\end{shaded*}