Add hybrid workflow documentation for Azure (#5361) [ci skip]

Signed-off-by: adamrtalbot <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Christopher Hakkaart <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
3 people authored Oct 14, 2024
1 parent 0ee29a8 commit f81f27d
Showing 1 changed file with 44 additions and 0 deletions: docs/azure.md
@@ -405,6 +405,50 @@ The value of the setting must be the identifier of a subnet available in the vir
Batch Authentication with Shared Keys does not allow linking external resources (such as Virtual Networks) to the pool. Therefore, Active Directory Authentication must be used in conjunction with the `virtualNetwork` setting.
:::

### Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature lets you deploy hybrid workloads, where some jobs run on the local computer or local computing cluster, while others are offloaded to Azure Batch.

To enable this feature, configure one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the Azure Batch settings to the processes you want to offload. For example:

```groovy
process {
    withLabel: bigTask {
        executor = 'azurebatch'
        queue = 'my-batch-pool'
        container = 'my/image:tag'
    }
}

azure {
    storage {
        accountName = '<YOUR STORAGE ACCOUNT NAME>'
        accountKey = '<YOUR STORAGE ACCOUNT KEY>'
    }
    batch {
        location = '<YOUR LOCATION>'
        accountName = '<YOUR BATCH ACCOUNT NAME>'
        accountKey = '<YOUR BATCH ACCOUNT KEY>'
    }
}
```

With the above configuration, processes with the `bigTask` {ref}`process-label` run on Azure Batch, while the remaining processes run on the local computer.
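
For reference, a process opts into the selector above through its {ref}`process-label` directive. A minimal sketch, with a hypothetical process name and script:

```groovy
process BIG_TASK {
    // The label matches the `withLabel: bigTask` selector above, so this
    // process is submitted to Azure Batch; unlabeled processes keep using
    // the local executor.
    label 'bigTask'

    input:
    path input_file

    output:
    path 'result.txt'

    script:
    """
    my_analysis.sh ${input_file} > result.txt
    """
}
```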

Next, launch the pipeline with the `-bucket-dir` option to specify an Azure Blob Storage path for the jobs running on Azure Batch, and optionally, use the `-work-dir` option to specify local storage for the jobs running locally:

```bash
nextflow run <script or project name> -bucket-dir az://my-container/some/path
```
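
For instance, a variant that also pins the locally executed tasks to a specific local work directory might look like this (paths are placeholders):

```bash
nextflow run <script or project name> \
    -bucket-dir az://my-container/work \
    -work-dir /path/to/local/work
```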

:::{warning}
The Azure Blob Storage path needs to contain at least one sub-directory (e.g. `az://my-container/work` rather than `az://my-container`).
:::

:::{tip}
When using [Fusion](./fusion.md), the `-bucket-dir` option is not required. Fusion implements a distributed virtual file system that allows seamless access to Azure Blob Storage using a standard POSIX interface, enabling direct mounting of remote blob storage as if it were a local file system. This simplifies and speeds up most operations, bridging the gap between cloud-native storage and data analysis workflows.
:::
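
For example, Fusion is typically enabled together with Wave in the Nextflow configuration; a minimal sketch using the standard `fusion` and `wave` config scopes:

```groovy
// Enable the Fusion virtual file system for direct access to Blob Storage
fusion {
    enabled = true
}

// Fusion relies on Wave to provision the containers it needs
wave {
    enabled = true
}
```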

## Microsoft Entra

Using Microsoft Entra for role-based access control is more secure than using access keys and should be used wherever possible. You can authenticate to Microsoft Entra using a Managed Identity when running on resources within the Azure environment, or by authenticating as an Azure Service Principal when running on external resources.
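
As an illustrative sketch, both approaches are configured through the `azure` scope; the example below assumes the `azure.managedIdentity` and `azure.activeDirectory` settings and uses placeholder values, so verify the exact option names against the configuration reference for your Nextflow version:

```groovy
azure {
    // Option 1: a user-assigned Managed Identity, when running on Azure resources
    managedIdentity {
        clientId = '<YOUR MANAGED IDENTITY CLIENT ID>'
    }

    // Option 2: a Service Principal, when running on external resources
    activeDirectory {
        servicePrincipalId     = '<YOUR SERVICE PRINCIPAL CLIENT ID>'
        servicePrincipalSecret = '<YOUR SERVICE PRINCIPAL CLIENT SECRET>'
        tenantId               = '<YOUR TENANT ID>'
    }
}
```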
