toil ignores LoadListingRequirement #5099

Open
adrabent opened this issue Sep 19, 2024 · 8 comments

Comments

@adrabent

adrabent commented Sep 19, 2024

When using toil-cwl-runner, it seems to always use deep_listing when files or directories are mounted for running within Docker or Singularity.
If I add a LoadListingRequirement and set it to shallow_listing, it is simply ignored. With cwltool it works as expected.

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1649

@stxue1
Contributor

stxue1 commented Sep 20, 2024

toil-cwl-runner should be differentiating the listing values internally:

toil/src/toil/cwl/cwltoil.py

Lines 3456 to 3512 in 8faca0f

def determine_load_listing(
    tool: Process,
) -> Literal["no_listing", "shallow_listing", "deep_listing"]:
    """
    Determine the directory.listing feature in CWL.

    In CWL, any input directory can have a DIRECTORY_NAME.listing (where
    DIRECTORY_NAME is any variable name) set to one of the following three
    options:

    1. no_listing: DIRECTORY_NAME.listing will be undefined.
        e.g.
            inputs.DIRECTORY_NAME.listing == unspecified

    2. shallow_listing: DIRECTORY_NAME.listing will return a list one level
       deep of DIRECTORY_NAME's contents.
        e.g.
            inputs.DIRECTORY_NAME.listing == [items in directory]
            inputs.DIRECTORY_NAME.listing[0].listing == undefined
            inputs.DIRECTORY_NAME.listing.length == # of items in directory

    3. deep_listing: DIRECTORY_NAME.listing will return a list of the entire
       contents of DIRECTORY_NAME.
        e.g.
            inputs.DIRECTORY_NAME.listing == [items in directory]
            inputs.DIRECTORY_NAME.listing[0].listing == [items in subdirectory
                if it exists and is the first item listed]
            inputs.DIRECTORY_NAME.listing.length == # of items in directory

    See
    https://www.commonwl.org/v1.1/CommandLineTool.html#LoadListingRequirement
    and https://www.commonwl.org/v1.1/CommandLineTool.html#LoadListingEnum

    DIRECTORY_NAME.listing should be determined first from loadListing.
    If that's not specified, from LoadListingRequirement.
    Else, default to "no_listing" if unspecified.

    :param tool: ToilCommandLineTool
    :return str: One of 'no_listing', 'shallow_listing', or 'deep_listing'.
    """
    load_listing_req, _ = tool.get_requirement("LoadListingRequirement")
    load_listing_tool_req = (
        load_listing_req.get("loadListing", "no_listing")
        if load_listing_req
        else "no_listing"
    )
    load_listing = cast(str, tool.tool.get("loadListing", load_listing_tool_req))

    listing_choices = ("no_listing", "shallow_listing", "deep_listing")
    if load_listing not in listing_choices:
        raise ValueError(
            f"Unknown loadListing specified: {load_listing!r}. Valid choices: {listing_choices}"
        )
    return cast(Literal["no_listing", "shallow_listing", "deep_listing"], load_listing)
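To make the three modes from the docstring above concrete, here is a minimal standalone sketch that builds a CWL-like Directory object from a local path. It is not Toil or cwltool code: `build_directory` is a hypothetical helper, and the objects it returns carry only `class`, `basename` and `listing`, not the full set of fields a real runner fills in.

```python
import json
import os
import tempfile
from typing import Any, Dict


def build_directory(path: str, load_listing: str = "no_listing") -> Dict[str, Any]:
    """Return a simplified CWL-like Directory object for `path` (illustrative only)."""
    choices = ("no_listing", "shallow_listing", "deep_listing")
    if load_listing not in choices:
        raise ValueError(f"Unknown loadListing: {load_listing!r}")

    obj: Dict[str, Any] = {
        "class": "Directory",
        "basename": os.path.basename(os.path.normpath(path)),
    }
    if load_listing == "no_listing":
        return obj  # the "listing" key is left out entirely

    # Subdirectories get their own listing only in deep_listing mode.
    child_mode = "deep_listing" if load_listing == "deep_listing" else "no_listing"
    listing = []
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child):
            listing.append(build_directory(child, child_mode))
        else:
            listing.append({"class": "File", "basename": name})
    obj["listing"] = listing
    return obj


if __name__ == "__main__":
    # Build a small throwaway tree and print the object for each mode.
    with tempfile.TemporaryDirectory() as root:
        top = os.path.join(root, "directory")
        os.makedirs(os.path.join(top, "directory"))
        open(os.path.join(top, "file.txt"), "w").close()
        open(os.path.join(top, "directory", "file2.txt"), "w").close()
        for mode in ("no_listing", "shallow_listing", "deep_listing"):
            print(mode, json.dumps(build_directory(top, mode)))
```

Note that this only models what expressions see in `inputs.DIRECTORY_NAME.listing`; it says nothing about how a runner stages or mounts the directory.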

Though it seems like it's not working.

@mr-c Is there a test in the CWL conformance tests that tests this requirement?

@mr-c
Contributor

mr-c commented Sep 21, 2024

@stxue1

listing_requirement_none and listing_requirement_shallow and listing_requirement_deep

However, they are all single CommandLineTools and not part of workflows. Nor do they use DockerRequirements.

Looks like we should create additional conformance tests once this bug is figured out.

@adrabent , thank you for reporting this!

@stxue1
Contributor

stxue1 commented Sep 24, 2024

Seems like there are conformance tests for LoadListingRequirement: https://github.com/common-workflow-language/cwl-v1.2/blob/15d152dbf04f149845d9348c80694a377c558346/conformance_tests.yaml#L2987-L3069

toil-cwl-runner seems to pass this. @adrabent Could you provide an example of where LoadListingRequirement is not working?

@stxue1
Contributor

stxue1 commented Sep 26, 2024

I've tried testing LoadListingRequirement with a CWL expression:

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  LoadListingRequirement:
    loadListing: shallow_listing
inputs:
  input_directory:
    type: Directory
outputs:
  output_file:
    type: string
    outputBinding:
      outputEval: $(JSON.stringify(inputs.input_directory))
  stdout_file:
    type: stdout
stdout: output.txt
baseCommand: tree
arguments:
  - $(inputs.input_directory)

With a JSON input of

{
    "input_directory": {"class": "Directory", "location": "directory"}
}

And a directory in the current working directory of:

heaucques@pop-os:~/Documents/toil$ tree directory
directory
├── directory
│   └── file2.txt
└── file.txt

1 directory, 2 files

After running toil-cwl-runner shallow_listing.cwl shallow_listing.json > json.txt && jq -r .output_file json.txt | jq ., the expression's view of the directory seems to have the shallow_listing as specified in the LoadListingRequirement:

{
  "class": "Directory",
  "location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==",
  "basename": "directory",
  "listing": [
    {
      "class": "Directory",
      "location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==/directory",
      "basename": "directory",
      "path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory/directory",
      "dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory"
    },
    {
      "class": "File",
      "location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==/file.txt",
      "basename": "file.txt",
      "size": 0,
      "path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory/file.txt",
      "dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory",
      "nameroot": "file",
      "nameext": ".txt"
    }
  ],
  "path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory",
  "dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a"
}
(venv3.12) heaucques@pop-os:~/Documents/toil$ cat output.txt
/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory
|-- directory
|   `-- file2.txt
`-- file.txt

1 directory, 2 files

The tree command should be fully recursive as the binding of the directory into the container is not controlled by LoadListingRequirement.
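As an alternative to reading the JSON by eye, the depth of the listing an expression received can be checked with a few lines of Python. This is a sketch only: `listing_depth` is a hypothetical helper, and the sample object is a trimmed-down version of the output above with the `location` and `path` fields removed.

```python
from typing import Any, Mapping


def listing_depth(obj: Mapping[str, Any]) -> int:
    """Return how many nested levels of "listing" are present under a CWL object.

    0 means there is no "listing" key at all; 1 means the children are listed
    but none of them carries a listing of its own; and so on.
    """
    if "listing" not in obj:
        return 0
    return 1 + max((listing_depth(child) for child in obj["listing"]), default=0)


# Trimmed-down version of the object printed by the expression above.
observed = {
    "class": "Directory",
    "basename": "directory",
    "listing": [
        {"class": "Directory", "basename": "directory"},
        {"class": "File", "basename": "file.txt", "size": 0},
    ],
}

print(listing_depth(observed))  # 1, i.e. consistent with shallow_listing
```

A depth of 0 corresponds to no_listing and 1 to shallow_listing; with deep_listing the depth grows with the directory nesting. A directory that contains only files gives 1 in both shallow and deep mode, so the test directory needs at least one non-empty subdirectory to tell them apart. Since `output_file` is a JSON string inside the runner's JSON output, it has to be decoded twice (the equivalent of the two `jq` calls) before being passed to the helper.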

So I'm unsure how to replicate this for now.

@adrabent
Author

adrabent commented Oct 9, 2024

Dear @stxue1 and @mr-c,

I tried to reproduce the listing behaviour with a minimal example workflow as well.
Interestingly, I need to run the step twice to make the differences visible.

minimal_example.cwl

class: Workflow
cwlVersion: v1.2
id: minimal_example
label: minimal_example
inputs:
  - id: msin
    type: Directory[]
outputs:
  - id: msout
    outputSource:
      - second_pass/msout
    type: Directory[]
steps:
  - id: pass
    in:
      - id: msin
        source: msin
    out:
      - id: msout
    run: pass.cwl
  - id: second_pass
    in:
      - id: msin
        source: pass/msout
    out:
      - id: msout
    run: pass.cwl

pass.cwl

class: CommandLineTool
cwlVersion: v1.2
id: pass
baseCommand: echo
inputs:
  - id: msin
    type: 
      - Directory[]
outputs:
  - id: msout
    type:
      - Directory[]
    outputBinding:
      outputEval: $(inputs.msin)
requirements:
  - class: LoadListingRequirement
    loadListing: no_listing
  - class: InplaceUpdateRequirement
    inplaceUpdate: true
  - class: DockerRequirement
    dockerPull: ubuntu:22.04

Here, I need to use a Docker container as well as InplaceUpdateRequirement to make the behaviour visible.
I have created a directory with some subdirectories and use it as the input:
debug.json

{
    "msin": [{"class": "Directory", "location": "/home/alex/debug/builds"}]
}

Now I can call cwltool minimal_workflow.cwl debug.json and then I get:

[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/i6rom9sl$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/i6rom9sl,target=/UXLgnu \
    --mount=type=bind,source=/tmp/i7ye065p,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgbe41ee57-608b-47a8-86b5-e048c0d62367/builds,readonly \
    --workdir=/UXLgnu \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/_y0zkqv1/20241009110104-450740.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/UXLgnu \
    ubuntu:22.04 \
    echo

[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/tnws_zxh$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/tnws_zxh,target=/UXLgnu \
    --mount=type=bind,source=/tmp/1o8o8v3r,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg239cbc5c-6ff3-4209-b1cb-a8c20ced43f6/builds,readonly \
    --workdir=/UXLgnu \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/h6if9iyv/20241009110105-477120.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/UXLgnu \
    ubuntu:22.04 \
    echo

[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success

Now, if I change no_listing to shallow_listing in pass.cwl, the setting seems to be ignored in the first call but not in the second. As you can see, the second call also starts mounting subdirectories (which is not intended):

[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/o0lujins$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/o0lujins,target=/VfEAOV \
    --mount=type=bind,source=/tmp/07rtl11n,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg92bbd5cd-dfd7-46d6-a9e2-44ecdff76e0f/builds,readonly \
    --workdir=/VfEAOV \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/6mhf0toc/20241009110459-557386.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/VfEAOV \
    ubuntu:22.04 \
    echo

[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/437vdd5t$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/437vdd5t,target=/VfEAOV \
    --mount=type=bind,source=/tmp/81hlfjpv,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg205417f7-4be6-465f-9a99-bdc3ada36b12/builds,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stg205417f7-4be6-465f-9a99-bdc3ada36b12/builds/RD,readonly \
    --workdir=/VfEAOV \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/vfobvl3n/20241009110500-586300.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/VfEAOV \
    ubuntu:22.04 \
    echo

[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success

With deep_listing in pass.cwl, it descends into all subdirectories (but only in the second call of the same step) and mounts every subdirectory separately:

[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/125_lxcg$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/125_lxcg,target=/AfxLQQ \
    --mount=type=bind,source=/tmp/9kxi9_pe,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgb5eacd69-fbce-48fd-93b9-cbc8fcb59312/builds,readonly \
    --workdir=/AfxLQQ \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/zbdj5iz7/20241009110630-262057.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/AfxLQQ \
    ubuntu:22.04 \
    echo

[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/aspholzl$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/aspholzl,target=/AfxLQQ \
    --mount=type=bind,source=/tmp/qqsz9v_s,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD/LINC,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD/LINC,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD/LINC/results,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD/LINC/results,readonly \
    --workdir=/AfxLQQ \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/beno5qnm/20241009110631-291241.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/AfxLQQ \
    ubuntu:22.04 \
    echo

[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success

If I repeat this exercise with toil-cwl-runner, it makes absolutely no difference whether I use no_listing, shallow_listing or deep_listing; it behaves as if I had used deep_listing with cwltool:

[2024-10-09T11:10:12+0200] [MainThread] [I] [cwltool] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Importing input files...
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Importing tool-associated files...
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Creating root job
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Starting workflow
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Working on job 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1) from description 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v2
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Not chaining from job 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v2
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 0.035325 seconds
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Working on job 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1) from description 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1
[2024-10-09T11:10:13+0200] [MainThread] [W] [cwltool] [job minimal_example.pass.pass] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[2024-10-09T11:10:13+0200] [MainThread] [W] [cwltool] [job minimal_example.pass.pass] Skipping Docker software container '--cpus' limit despite presence of ResourceRequirement with coresMin and/or coresMax setting. Consider running with --strict-cpu-limit for increased portability assurance.
[2024-10-09T11:10:13+0200] [MainThread] [I] [cwltool] [job minimal_example.pass.pass] /tmp/tmprlpoc39q$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/tmprlpoc39q,target=/vDiunC \
    --mount=type=bind,source=/tmp/tmp83g29jku,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg975bfb8f-33b6-46c1-bc8a-e94af3a2d8d5/builds,readonly \
    --workdir=/vDiunC \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/tmpdgot8yyw/20241009111013-856420.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/vDiunC \
    ubuntu:22.04 \
    echo
[2024-10-09T11:10:14+0200] [MainThread] [I] [cwltool] [job minimal_example.pass.pass] completed success
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: CWL step complete: minimal_example.pass.pass
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v2
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Chaining from 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v2 to 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-15i34646 v1
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Working on job 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3) from description 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3
[2024-10-09T11:10:14+0200] [MainThread] [W] [cwltool] [job minimal_example.second_pass.pass] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[2024-10-09T11:10:14+0200] [MainThread] [W] [cwltool] [job minimal_example.second_pass.pass] Skipping Docker software container '--cpus' limit despite presence of ResourceRequirement with coresMin and/or coresMax setting. Consider running with --strict-cpu-limit for increased portability assurance.
[2024-10-09T11:10:14+0200] [MainThread] [I] [cwltool] [job minimal_example.second_pass.pass] /tmp/tmpjhmb_xso$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/tmpjhmb_xso,target=/vDiunC \
    --mount=type=bind,source=/tmp/tmp6a70ca7k,target=/tmp \
    --mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD/LINC,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD/LINC,readonly \
    --mount=type=bind,source=/home/alex/debug/builds/RD/LINC/results,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD/LINC/results,readonly \
    --workdir=/vDiunC \
    --read-only=true \
    --net=none \
    --user=1067:200 \
    --rm \
    --cidfile=/tmp/tmpxnebrbyw/20241009111014-914477.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/vDiunC \
    ubuntu:22.04 \
    echo
[2024-10-09T11:10:15+0200] [MainThread] [I] [cwltool] [job minimal_example.second_pass.pass] completed success
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: CWL step complete: minimal_example.second_pass.pass
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v5
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Not chaining from job 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v5
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 2.167389 seconds
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.leader] Issued job 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1 with job batch system ID: 2 and disk: 1.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Working on job 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Loaded body Job('ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1) from description 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Completed body for 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v3
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Not chaining from job 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v3
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 0.009180 seconds
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] Got message from job at time 10-09-2024 11:10:16: CWL step complete: minimal_example.pass.pass
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] Got message from job at time 10-09-2024 11:10:16: CWL step complete: minimal_example.second_pass.pass
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] minimal_example.pass.pass.stdout follows:
=========>
	
<=========
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] minimal_example.second_pass.pass.stdout follows:
=========>
	
<=========
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.leader] Finished toil run successfully.
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Collecting workflow outputs...
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Stored workflow outputs
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Computing output file checksums...
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] CWL run complete!
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/tmp/tmpbr31xux0)

I would expect two things:

  • cwltool should not create mounts based on the selected listing; the second call should behave like the first one
  • toil-cwl-runner should behave like cwltool in that respect, i.e. only mount the parent directory
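The extra mounts can also be picked out of a logged `docker run` command mechanically, which may be handy when comparing runners. The sketch below is an assumption-laden helper, not part of cwltool or Toil: `redundant_mounts` and the `stgEXAMPLE` staging directory are made up for illustration, and "redundant" here just means that a parent mount in the same command already exposes the same directory at the same relative path.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Matches the "--mount=type=bind,source=...,target=..." arguments seen in the logs.
MOUNT_RE = re.compile(r"--mount=type=bind,source=([^,\s]+),target=([^,\s]+)")


def redundant_mounts(docker_command: str) -> List[str]:
    """Return sources of bind mounts that a parent mount already covers.

    A mount counts as redundant when another mount in the same command binds
    one of its ancestor directories and the target sits at the same relative
    path below that ancestor's target.
    """
    mounts = [
        (PurePosixPath(src), PurePosixPath(tgt))
        for src, tgt in MOUNT_RE.findall(docker_command)
    ]
    redundant = []
    for src, tgt in mounts:
        for parent_src, parent_tgt in mounts:
            if parent_src not in src.parents:
                continue
            if parent_tgt / src.relative_to(parent_src) == tgt:
                redundant.append(str(src))
                break
    return redundant


# Shortened command modelled on the second_pass step with shallow_listing;
# "stgEXAMPLE" is a placeholder for the random staging directory name.
command = " ".join([
    "docker run -i",
    "--mount=type=bind,source=/tmp/437vdd5t,target=/VfEAOV",
    "--mount=type=bind,source=/home/alex/debug/builds,"
    "target=/var/lib/cwl/stgEXAMPLE/builds,readonly",
    "--mount=type=bind,source=/home/alex/debug/builds/RD,"
    "target=/var/lib/cwl/stgEXAMPLE/builds/RD,readonly",
    "ubuntu:22.04 echo",
])

print(redundant_mounts(command))  # ['/home/alex/debug/builds/RD']
```

The function expects the text of a single `docker run` invocation. Feeding it a whole log would mix the mounts of different steps.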

@stxue1
Contributor

stxue1 commented Oct 24, 2024

It does look like cwltool is doing the wrong thing.

Regardless of the LoadListingRequirement, though, the Docker mount should just be the top-level directory. For this workflow at least, there is no reason to have anything more than a single mount of the top-level directory.

@mr-c
Is there a reason why cwltool has this behavior? This issue only occurs on the Toil side when --bypass-file-store is passed, and Toil calls into cwltool code. We're not sure what it is about the cwltool/Toil pathmapper setup that results in this behavior.

@unito-bot

➤ Adam Novak commented:

Our options for moving forward with this might be writing a conformance test for the CWL test suite to make sure the right stuff is exposed to expressions, or digging into PathMapper to see why the cwltool one we use when bypassing the file store is making all these mounts. (I’m not sure if a bunch of superfluous mounts is actually non-conformant though.)

@adamnovak
Member

I think if we want to fix this, we need to turn it into a PR against the CWL conformance tests that adds a test which currently fails but should pass.
