Bug report
When running Nextflow with AWS Batch, nxf_s3_download() fails to identify an S3 directory as a directory and attempts to download it as a single file (i.e. cp dir instead of cp dir/*), causing a 404 error.
Expected behavior and actual behavior
Expected behavior: the directory is staged properly to the ECS container running the process.
Actual behavior: the process is killed due to a 404 error.
Steps to reproduce the problem
main.nf:
process ls {
    input:
    path dir

    output:
    stdout

    script:
    """
    ls
    """
}
workflow {
    // S3 directory
    dir = channel.fromPath('s3://ryft-public-sample-data/esRedditJson')
    ls(dir) | view { it }
}
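For completeness, a run like this would use a config along these lines. This is a hypothetical sketch, not the reporter's actual nextflow.config: the queue name and region are placeholders, and the option names (in particular aws.batch.platformType for Fargate) should be verified against the Nextflow AWS Batch documentation for your version.

```groovy
// Hypothetical nextflow.config sketch — queue and region are placeholders
process.executor = 'awsbatch'
process.queue    = 'my-fargate-queue'   // placeholder
aws.region       = 'us-east-1'          // placeholder
// Fargate support (which triggers s5cmd-based staging) requires an edge release
aws.batch.platformType = 'fargate'
```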
Environment
Nextflow version: 24.01.0-edge.5903 (I need edge to run Batch with Fargate)
Java version: java-17-amazon-corretto
Operating system: Linux (Amazon Linux 2023)
Bash version: 5.2.15
s5cmd version: 2.2.2
Additional context
Stepping through .command.run, I traced the relevant error to:
nxf_s3_download() {
    local source=$1
    local target=$2
    local file_name=$(basename $1)
    local is_dir=$(s5cmd ls $source | grep -F "DIR ${file_name}/" -c)
    if [[ $is_dir == 1 ]]; then
        s5cmd cp "$source/*" "$target"
    else
        s5cmd cp "$source" "$target"
    fi
}
In particular, s5cmd ls ${source} is returning "DIR  esRedditJson/" (two spaces between DIR and the name) rather than the "DIR esRedditJson/" (single space) that the grep -F pattern anticipates, so is_dir ends up 0 and the directory is copied as if it were a file.
Here is a simple code bit to reproduce the error. Changing the pattern to a double space appears to fix the issue.
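To make the mismatch concrete, here is a self-contained sketch. The listing line is simulated from the output observed above (leading column padding omitted); no S3 access is needed:

```shell
# Simulated `s5cmd ls` line; the double space after DIR reproduces what
# s5cmd 2.2.2 emitted in the run above (leading column padding omitted).
output='DIR  esRedditJson/'
file_name='esRedditJson'

# The single-space fixed-string pattern from .command.run finds no match:
if echo "$output" | grep -qF "DIR ${file_name}/"; then
    echo "single-space pattern matched"
else
    echo "single-space pattern did NOT match"
fi

# A pattern that tolerates the column spacing matches either way:
if echo "$output" | grep -qE "DIR +${file_name}/"; then
    echo "whitespace-tolerant pattern matched"
fi
```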
source='s3://ryft-public-sample-data/esRedditJson'
target=localdir
file_name=$(basename $source)
is_dir=$(s5cmd ls $source | grep -F "DIR ${file_name}/" -c)
if [[ $is_dir == 1 ]]; then
    s5cmd cp "$source/*" "$target"
else
    s5cmd cp "$source" "$target"
fi
# With a double space in the pattern, the directory is detected correctly:
is_dir=$(s5cmd ls $source | grep -F "DIR  ${file_name}/" -c)
if [[ $is_dir == 1 ]]; then
    s5cmd cp "$source/*" "$target"
else
    s5cmd cp "$source" "$target"
fi