added a guide to writing a platform configuration. #289

Merged
merged 21 commits into from
Oct 13, 2021
Changes from 6 commits
Commits
69aaaba
added a guide to writing a platform configuration.
wxtim Aug 26, 2021
c9c9e6f
Apply suggestions from code review
wxtim Sep 3, 2021
e121f80
Update src/admin-guide/writing-global-configurations/platforms.rst
wxtim Sep 6, 2021
a7575a9
Update src/admin-guide/writing-global-configurations/platforms.rst
wxtim Sep 6, 2021
f4ff67c
reconfigured to be in the reference section
wxtim Sep 7, 2021
fd4cc17
Merge branch 'document.platform.setup' of github.com:wxtim/cylc-doc i…
wxtim Sep 7, 2021
83c9e75
Apply suggestions from code review
wxtim Sep 9, 2021
bede7eb
separted localhost and submit from localhost
wxtim Sep 9, 2021
c6fa66e
refactor platforms how to write admin guide
wxtim Sep 9, 2021
e5d7881
Update src/reference/config/writing-platform-configs.rst
wxtim Sep 10, 2021
442b59e
added a reference to the new writing platforms
wxtim Sep 10, 2021
4e0ca1b
Merge branch 'document.platform.setup' of github.com:wxtim/cylc-doc i…
wxtim Sep 10, 2021
dc8605e
document graph line continuation on boolean
wxtim Sep 10, 2021
e29dbd9
work in progress
wxtim Sep 22, 2021
a7a01f9
added note about not using platforms and groups with the same names; …
wxtim Oct 1, 2021
de2a11d
fix error in glossary
wxtim Oct 5, 2021
94474a5
Merge branch 'document.platform.setup' into selection-of-hosts-and-pl…
wxtim Oct 5, 2021
808ad40
Merge pull request #2 from wxtim/selection-of-hosts-and-platforms
wxtim Oct 5, 2021
2d30c25
Apply suggestions from code review
wxtim Oct 6, 2021
24c8574
Added really simple platform
wxtim Oct 6, 2021
8474281
Added simple server with install target
wxtim Oct 6, 2021
4 changes: 3 additions & 1 deletion src/7-to-8/summary.rst
@@ -144,7 +144,9 @@ Platform Awareness

.. seealso::

:ref:`Platforms at Cylc 8.<majorchangesplatforms>`
:ref:`Platforms at Cylc 8. <majorchangesplatforms>`

:ref:`System admin's guide to writing platforms. <AdminGuide.PlatformConfigs>`

Cylc 7 was aware of individual job hosts.

1 change: 1 addition & 0 deletions src/reference/config/index.rst
@@ -7,3 +7,4 @@ Configuration
workflow
global
types
writing-platform-configs
207 changes: 207 additions & 0 deletions src/reference/config/writing-platform-configs.rst
@@ -0,0 +1,207 @@

.. _AdminGuide.PlatformConfigs:

Writing Platform configurations
===============================

.. versionadded:: 8.0.0

.. seealso::

   - `The original platforms proposal on Cylc Admin <https://github.com/cylc/cylc-admin/blob/master/docs/proposal-platforms.md>`_.
   - :ref:`Platforms Cylc 7 to 8 user upgrade guide <MajorChangesPlatforms>`.
   - :cylc:conf:`flow.cylc[runtime][<namespace>]platform`
   - :cylc:conf:`global.cylc[platforms]`

What are platforms?
-------------------

A platform is a named group of settings, most importantly a set of hosts and a
``job runner`` (formerly a ``batch system``), that tells Cylc where and how to
submit a task job.

Why were platforms introduced?
------------------------------

- Allow a compute cluster with multiple login nodes to be treated as a single
  unit.
- Allow Cylc to elegantly handle failures to communicate with login nodes.
- Configure multiple platforms with the same hosts; for example, you can use
  separate platforms to submit jobs to a batch system and to background on
  ``localhost``.


Example platforms
-----------------

Lots of desktop computers
^^^^^^^^^^^^^^^^^^^^^^^^^

- **Platform names are regular expressions.**
- **You can specify where to install job files.**

.. admonition:: Scenario

   Everyone in your organization has a computer called ``desktopNNN``,
   all with a file system shared with the scheduler host. Many users
   will want a platform to run small jobs on their computer:

Cylc treats platform names as regular expressions, so in this case:

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[desktop\d\d\d]]
           install target = localhost

will set up 1000 platforms, all with the same specification and one host per
platform. Job files can be installed on the workflow host.

.. note::

   Cylc carries out a "fullmatch" regular expression comparison with
   the platform name so ``desktop\d\d\d`` is effectively the same as
   ``^desktop\d\d\d$``.
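
As a minimal sketch of how a workflow might select one of these platforms
(the task name ``small_job`` and the host ``desktop123`` are hypothetical):

.. code-block:: cylc
   :caption: part of a ``flow.cylc`` file

   [runtime]
       [[small_job]]
           # "desktop123" matches the pattern desktop\d\d\d in global.cylc.
           # A name such as "desktop1234" would not match, because the
           # whole name has to match the pattern.
           platform = desktop123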

Cluster with multiple login nodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Platforms can group multiple hosts together.**

.. admonition:: Scenario

   You have a cluster where users submit to a single Slurm job queue from
   either of a pair of identical login nodes:

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[spice_cluster]]
           hosts = login_node_1, login_node_2
           job runner = slurm
           install target = login_node_1
           retrieve job logs = True

If either host is unavailable, Cylc will attempt to start and communicate with
jobs via the other login node.

Since the platform hosts do not share a file system with the scheduler
host, we need to ask Cylc to retrieve job logs.
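
For illustration, a task could be pointed at this platform as sketched below.
The task name ``big_model`` and the directive values are hypothetical; the
``[[[directives]]]`` section holds job runner directives, here for Slurm.

.. code-block:: cylc
   :caption: part of a ``flow.cylc`` file

   [runtime]
       [[big_model]]
           # Cylc submits via login_node_1 or login_node_2.
           platform = spice_cluster
           [[[directives]]]
               # Hypothetical Slurm resource requests.
               --ntasks = 4
               --time = 00:10:00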

Submit background and PBS jobs from localhost
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Platforms can share hosts without sharing job runners.**

**There is a built-in localhost platform.**

.. admonition:: Scenario

   You have a cluster where you can submit jobs from the Cylc scheduler host
   using PBS, but also want to allow users to submit small jobs to the
   scheduler host:

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[pbs_cluster]]
           hosts = localhost
           job runner = pbs
           install target = localhost
       [[scheduler_host\d\d]]
           hosts = localhost
           job runner = background

But ``hosts`` defaults to ``localhost`` so you can simplify
the ``[[pbs_cluster]]`` definition.

If a job doesn't set a platform it will run on the Cylc scheduler host
using a default ``localhost`` platform.

As a result the above configuration can be simplified to:

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[pbs_cluster]]
           job runner = pbs
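
A workflow using this configuration might look like the sketch below. The
task names ``big_job`` and ``small_job`` and the ``script`` value are
hypothetical.

.. code-block:: cylc
   :caption: part of a ``flow.cylc`` file

   [runtime]
       [[big_job]]
           # Submitted to PBS from the scheduler host.
           platform = pbs_cluster
       [[small_job]]
           # No platform set, so this job uses the default
           # localhost platform on the scheduler host.
           script = echo "Hello"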


.. TODO unindent this after you've got platforms from platform groups in

Two similar clusters
^^^^^^^^^^^^^^^^^^^^

**Platform groups allow users to ask for jobs to be run on any
suitable computer.**

.. admonition:: Scenario

   Your site has two mirrored clusters with separate PBS queues and
   file systems. Users don't mind which cluster is used and just
   want to set ``flow.cylc[runtime][mytask]platform = supercomputer``:

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[clusterA]]
           hosts = login_node_A1, login_node_A2
           job runner = pbs
       [[clusterB]]
           hosts = login_node_B1, login_node_B2
           job runner = pbs
   [platform groups]
       [[supercomputer]]
           platforms = clusterA, clusterB

.. note::

   Why not just have one platform with all 4 login nodes?

   Having hosts in a platform means that Cylc can communicate with
   jobs via any host at any time. Platform groups allow Cylc to
   pick a platform when the job is started, but Cylc will not then
   be able to communicate with that job via hosts on another
   platform in the group.
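
On the workflow side, the setting from the scenario would be written as in
this short sketch (``mytask`` is the example task name used above):

.. code-block:: cylc
   :caption: part of a ``flow.cylc`` file

   [runtime]
       [[mytask]]
           # "supercomputer" is a platform group, so Cylc picks
           # clusterA or clusterB when the job is started.
           platform = supercomputer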

Preferred and backup hosts and platforms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **You can set how hosts are selected from platforms.**
- **You can set how platforms are selected from groups.**

.. admonition:: Scenario

   You have an operational cluster and a research cluster.
   You want your operational workflow to run on the operational
   cluster. If it becomes unavailable, you want Cylc to start running
   jobs on the research cluster.

.. code-block:: cylc
   :caption: part of a ``global.cylc`` config file

   [platforms]
       [[operational]]
           hosts = login_node_A1, login_node_A2
           job runner = pbs
           [[[selection]]]
               method = random  # the default anyway
       [[research]]
           hosts = primary, secondary, emergency
           job runner = pbs
           [[[selection]]]
               method = definition order
   [platform groups]
       [[operational_work]]
           platforms = operational, research
           [[[selection]]]
               method = definition order

.. note::

   Random is the default selection method.