Issue with S3 Download Skipping in Docker Example on SLURM - HPC #479

Open
nukeyyou2 opened this issue Jan 6, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@nukeyyou2

nukeyyou2 commented Jan 6, 2025

Summary

I'm encountering an issue when using the Docker example from the documentation to submit jobs to an HPC cluster managed by SLURM. During execution, I noticed that the S3 download step on the HPC finished (was effectively skipped) almost immediately. As a result, the run reports that res_tmpl failed, and the downloaded file is incomplete.

Additional details

  • ASLPrep version: latest
  • Docker version: n/a (Enroot 3.4.1 used instead of Docker)

What were you trying to do?

I tried different Enroot settings on SLURM, but none of them worked. I then ran the Docker example in my local environment, compared the two sets of logs, and found a difference in how long the S3 download step took.
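The timing comparison described above can be reproduced with a small script. This is a hypothetical sketch (the name log_gaps is illustrative, not part of ASLPrep). It assumes each log line of interest begins with a nipype-style timestamp such as 250106-10:15:30,123 and prints the elapsed seconds since the previous timestamped line, optionally only for gaps at or above a threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the time gap between consecutive timestamped log lines.
# Assumes lines start with a nipype-style timestamp "YYMMDD-HH:MM:SS,ms"
# (e.g. "250106-10:15:30,123"); all other lines are ignored.
# Usage: log_gaps LOGFILE [MIN_GAP_SECONDS]
# Limitation of this sketch: gaps that cross midnight are not handled.
log_gaps() {
  local logfile="$1"
  awk -v min="${2:-0}" '
    /^[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9],/ {
      ts = $1   # first field is the timestamp, e.g. 250106-10:15:30,123
      # seconds since midnight = hours, minutes, seconds, then milliseconds
      t = substr(ts, 8, 2) * 3600 + substr(ts, 11, 2) * 60 + substr(ts, 14, 2) + substr(ts, 17) / 1000
      # print the gap only after the first timestamp, and only if >= threshold
      if (seen && t - prev >= min) printf "%8.3f s  %s\n", t - prev, $0
      prev = t
      seen = 1
    }
  ' "$logfile"
}
```

For example, running log_gaps on the local log and on ./log/aslprep.log and comparing the gaps around the download messages would show whether the HPC run spends noticeably less time there.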

Reproducing the bug

log_exp.txt

#!/usr/bin/env bash

#SBATCH -J enroot
#SBATCH -o ./log/aslprep.log
#SBATCH -e ./log/aslprep_err.log
#SBATCH -N 1
#SBATCH -t 8:00:00
#SBATCH --gres=gpu:tesla_v100-pcie-32gb:1
#SBATCH -c 12
#SBATCH --mem=120GB
#SBATCH --container-writable
#SBATCH --container-mount-home
#SBATCH --container-image /home/yyou/aslprep/pennlinc+aslprep+latest.sqsh 

aslprep \
-v \
--n_cpus=12 \
--skip_bids_validation \
/home/yyou/asldata/240922healthy_aging \
/home/yyou/derivatives/aslprep \
participant \
--participant-label 01 \
--fs-license-file /home/yyou/license/license.txt \
-w /home/yyou/test/work
@nukeyyou2 nukeyyou2 added the bug Something isn't working label Jan 6, 2025
@tsalo
Member

tsalo commented Jan 8, 2025

Do compute nodes on your HPC have internet access?
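One way to answer this is to test outbound HTTPS access from a compute node itself, since login nodes often have network access that compute nodes lack. The sketch below is illustrative: check_url is a hypothetical helper, it assumes curl is installed on the node, and the idea that the S3 download is TemplateFlow fetching from its public bucket (https://templateflow.s3.amazonaws.com) is an assumption not confirmed in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a URL answers from the current node.
# Assumes curl is available. Without -f, curl exits 0 for any HTTP response
# (even 403/404), so "reachable" here means "the host answered at all".
check_url() {
  local url="$1"
  local code
  # -s: silent; -o /dev/null: discard the body; -w: print only the HTTP status;
  # --max-time 20: give up after 20 s so a blocked network does not hang.
  if code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 20 "$url"); then
    echo "reachable: $url (HTTP $code)"
    return 0
  else
    echo "NOT reachable: $url" >&2
    return 1
  fi
}
```

For example, from an interactive job started with srun --pty bash, run check_url https://templateflow.s3.amazonaws.com. Any HTTP status means the host answered; a timeout or connection error suggests the compute node cannot reach it.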

@nukeyyou2
Author

> Do compute nodes on your HPC have internet access?

Thank you for your reply.

The HPC administrator told me that the cluster does have a network connection, but it may be unstable. What I can observe is that the time interval between two consecutive log lines in a normal local run differs from the interval in the HPC logs. However, I'm not sure whether an error message is printed when the download fails; I've noticed that the subsequent graph_flow is still built normally. If this is confirmed to be a network problem, I will raise it with the administrator again.
[Screenshot attached]
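Since it is unclear whether a failed download prints an error, one indirect check is to look for empty files left behind in the download cache. This is a hypothetical sketch: find_empty_files is an illustrative name, and defaulting to $TEMPLATEFLOW_HOME or ~/.cache/templateflow assumes the downloaded files are TemplateFlow templates, which the issue does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list zero-byte files under a cache directory, one common
# symptom of an interrupted download.
# Default directory is an assumption (TemplateFlow's usual cache location);
# pass the real path as the first argument if it differs.
find_empty_files() {
  local dir="${1:-${TEMPLATEFLOW_HOME:-$HOME/.cache/templateflow}}"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 2
  fi
  # -type f: regular files only; -size 0: files with zero length
  find "$dir" -type f -size 0 -print
}
```

A file that was cut off partway but is not empty will not be caught by this check, so an empty result does not prove the cache is complete.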
