--scratch_dir not working in v 1.4.1 #311
"Dirty" workaround: in the file pplacer.py in the gtdbtk/external sub-directory, change lines 138 to 140. This worked for me (HP Z400, 48 GB memory, single Xeon processor, 4 cores, 8 hyperthreads). 100 GB of free space (internal SSD) was enough to classify 200-300 assemblies in about 3 hours (using the --cpus 8 flag). |
Lovely hack! Thanks :) |
Hi, did you update to the new version on conda? |
Yes, I did. I re-installed everything (version 1.4.1) using conda (I kept the previously downloaded database through a symbolic link since that apparently has not yet changed).
Then I re-ran the program using the script below:
#!/bin/bash
source ~/anaconda3/etc/profile.d/conda.sh
export GTDBTK_DATA_PATH=/bigdisk2/progs/gtdbtk/GTDBTk-stable/db_r95
conda activate gtdbtk-1.4.1
gtdbtk classify_wf --extension .fa --cpus 8 --genome_dir ./genomes_2016 --out_dir ./gtdbtk_2016_conda --scratch_dir /disk1/data/projects/pplacer --pplacer_cpus 8
conda deactivate
However the problem remained. The program proceeds until step
==> Step 5 of 9: Caching likelihood information on reference tree.
when the actual system memory used starts to grow and I have to stop the script.
I edited the pplacer.py script from the conda-update version 1.4.1 (lines 72-75):
#CAL if mmap_file:
#CAL args.append('--mmap-file')
#CAL args.append(mmap_file)
#CAL if mmap_file:
args.append('--mmap-file')
args.append('/disk1/data/projects/pplacer/')
and now it works again using the scratch directory (and the virtual shared memory).
I am not very familiar with Python, but I have the feeling that the scratch_dir variable from the main script is somehow not passed down to pplacer.py (or classify.py).
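To illustrate why a lost scratch_dir value would produce exactly this symptom, here is a minimal sketch of the conditional shown in the edited lines above. The helper name build_pplacer_args and its signature are hypothetical and are not GTDB-Tk's actual code; only the --mmap-file and -j flags belong to pplacer itself. The assumption is that the memory-map path reaches this function as None whenever scratch_dir is dropped upstream.

```python
from typing import List, Optional


def build_pplacer_args(cpus: int, mmap_file: Optional[str] = None) -> List[str]:
    """Return a pplacer argument list (hypothetical helper, not GTDB-Tk code)."""
    # -j sets the number of parallel jobs in pplacer.
    args = ["pplacer", "-j", str(cpus)]
    # Only request a memory-mapped file when a path was actually passed down.
    # If mmap_file arrives as None, the flag is silently skipped and pplacer
    # keeps its data in RAM instead of on disk.
    if mmap_file:
        args.extend(["--mmap-file", mmap_file])
    return args


# Path taken from the comment above; any scratch location would do.
print(build_pplacer_args(8, "/disk1/data/projects/pplacer/"))
print(build_pplacer_args(8, None))
```

With a None path no error is raised, which would explain why the run proceeds normally until memory use starts to grow.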
Best regards,
Armin
From: GOUNOT Jean-Sebastien
Sent: Monday, April 5, 2021 12:34
To: Ecogenomics/GTDBTk ***@***.***>
Cc: arminlahm ***@***.***>; Comment ***@***.***>
Subject: Re: [Ecogenomics/GTDBTk] --scratch_dir not working in v 1.4.1 (#311)
|
I found the reason:
In main.py, on line 715 (in gtdbtk version 1.4.1), scratch_dir is explicitly set to None, so whatever is passed on the command line is ignored. Commenting out this line resolves the issue:
options.rnd_seed = None
options.skip_trimming = False
# options.scratch_dir = None
options.recalculate_red = False
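The effect described above can be reproduced in isolation. The following is a minimal sketch using the standard argparse module, not the actual GTDB-Tk code; the function parse_options and its reset_scratch_dir switch are hypothetical and exist only to compare the two behaviours side by side.

```python
import argparse
from typing import List


def parse_options(argv: List[str], reset_scratch_dir: bool) -> argparse.Namespace:
    """Parse argv, optionally overwriting scratch_dir afterwards (illustration only)."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--scratch_dir", default=None)
    options = parser.parse_args(argv)
    if reset_scratch_dir:
        # Assigning after parsing discards whatever the user supplied.
        options.scratch_dir = None
    return options


argv = ["--scratch_dir", "/disk1/data/projects/pplacer"]
print(parse_options(argv, reset_scratch_dir=True).scratch_dir)
print(parse_options(argv, reset_scratch_dir=False).scratch_dir)
```

The first call loses the user's value and the second keeps it, which matches the behaviour before and after commenting out the assignment.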
|
Thanks for the fix, by the way! And sorry for the late reply! |
Hej, thanks once again for GTDB and GTDBtk, big fan.
I updated to version 1.4.1 and it seems like the --scratch_dir option does not work anymore; it just loads everything into RAM (and obviously then crashes my machine, as I only have 128 GB of RAM). I downgraded to version 1.4.0 in the same environment and it works fine, so I don't think it's my env or machine, but I attached the settings anyhow!
regards and greetings from Sweden,
Moritz
Environment
Server information
model name : AMD Ryzen 9 3900X 12-Core Processor
MemTotal: 131896176 kB
NAME="Ubuntu"
VERSION="20.10 (Groovy Gorilla)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.10"
VERSION_ID="20.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=groovy
UBUNTU_CODENAME=groovy