Crashing with TIR-Learner #4
Hi Nick, We identified the TIR-Learner bug that you described in detail. A testing version has been pushed to the EDTA branch named "TIR-Learner1.13". Please try it under the same active EDTA environment (no need to reinstall). In particular, if you just want to test TIR-Learner, you can:
For your other question, yes. To do so, you can run these initial TE finders separately, then feed them to the Please let me know if you still encounter the same issue. Sorry for the inconvenience. Best, |
I pulled from the TIR-Learner 1.13 branch and just ran TIR-Learner as you suggested. Looks like it's still crashing.
|
Hi Nick, Thanks for testing, @weijiaweijia is working on this. I will update you once we have a new version. Best, |
Hi Nick, If you need results urgently, you may run the updated TIR-Learner1.13 branch on your genome. I have temporarily removed the TIR-Learner module from EDTA, so you should be able to run the rest of the pipeline. Note that without TIR-Learner, large TIR elements and autonomous TIR elements will likely be underrepresented in the final library. However, MITE-Hunter should still pick up most short TIR elements and MITEs. Best, |
Hi Shujun, I have a crash that seems similar to Nick's above. Here is the log file:
The issue again seems to be in TIR-Learner. Perhaps it is still a TIR-Learner bug, but one thing worth noting is that I had an issue during installation where scikit-learn=0.19.0 would not install because of a conflict with multiprocess. I got around this by installing the packages in a different order, but I later realised that on my cluster Python 3.7 is installed in the environment by default, followed by a version of multiprocess that is only compatible with Python 3.7. I think the problem with scikit-learn=0.19.0 was that it only works with Python 3.6. So do you think my issue above could be an installation issue, or a bug in TIR-Learner? One more thing: I am working with a large genome of about 5 Gb. The LTR programs completed fine but took about 3 days, so I was wondering whether it is possible to reuse these outputs rather than waiting another 3 days to see if the pipeline passes the next step. Thanks a lot in advance for your help. Best Dan |
Hi Dan, Thanks for testing. Yes, this is an issue in TIR-Learner. We are working on a better version, so please wait a week or two. For the conflicts between Python, scikit-learn, and multiprocess, you may try different versions of Python and multiprocess, but the trained models do require scikit-learn=0.19.0 to work properly. Your last suggestion (reusing existing outputs) is actually on my to-do list. Good idea! Again, I am sorry for the bugs keeping you from getting meaningful results. We hope to resolve this issue in the near future. Best, |
In my case with TIR-Learner, my CentOS installation did not have a installed. I fixed it like this:
#genomeFile=`realpath $rawFile` #the genome file with real path
genomeFile=`readlink -e $rawFile` |
Thanks @philippbayer! I did some research and found this multi-platform solution:
resolve_link() {
  if type -p realpath >/dev/null; then
    realpath "$1"
  elif type -p greadlink >/dev/null; then
    greadlink -f "$1"
  else
    readlink -f "$1"
  fi
}
Ref: basherpm/basher#49 (comment)
Changes will be reflected in the next version. Best, |
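On the Python side of TIR-Learner, the same portability issue can be sidestepped by resolving paths in Python rather than shelling out to `realpath` or `readlink`. This is a minimal sketch, not code from TIR-Learner; the name `resolve_link` is borrowed from the shell function above and the `must_exist` parameter is a hypothetical addition:

```python
from pathlib import Path


def resolve_link(path: str, must_exist: bool = True) -> str:
    """Return an absolute path with every symlink resolved."""
    # strict=True raises FileNotFoundError if the target does not exist
    # (roughly what `readlink -e` does); strict=False resolves as far as
    # it can and never raises for a missing file.
    return str(Path(path).resolve(strict=must_exist))
```

Because `pathlib` is part of the standard library (and `Path.resolve(strict=...)` exists since Python 3.6), this behaves the same on CentOS, other Linux distributions, and macOS, with no dependency on which coreutils happen to be installed.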
@oushujun would it be possible to push these changes to a development branch ahead of the release for your next version? Or do you think you are only 1-2 weeks away from your next release? If my impatience is overwhelming, it might just be easier for me to fix as you and @philippbayer have suggested. @philippbayer after you made those changes to TIR-Learner, did EDTA run properly? |
@Neato-Nick The main branch didn't like my GLIBC, so I switched to the |
Hi @Neato-Nick and @philippbayer, Thank you for waiting patiently, and I am sorry for the prolonged development time. I went to the Evolution meeting 2 weeks ago, so there was some delay there. I am working on a new version of EDTA, which will have much better performance in both speed and quality. The main improvement is in TIR-Learner: @weijiaweijia and I are working together to make an improved, more generalized prediction model that fits most species. The other is in the downstream filtering of TIR elements and Helitrons: I am working to provide more thorough filtering of raw predictions, which will make the final library much smaller and better. I should be able to push these updates in 1-2 weeks if things work well, although our HPC has been down for maintenance for 3 days, so I can do nothing but talk... Again, thank you for your interest and testing. Best, |
No worries @oushujun :) I'm just playing around with this software, the outcome doesn't depend on anything. Take all the time you need!! |
Dear All, Sorry for the delayed response. I just pushed a bulk update to EDTA and have tested it on different servers; it seems to work now. But I have not tested it on macOS, so some tiny differences could cause problems. For testing purposes, please use a small file, e.g., 20 Mb, for faster turnaround. Please let me know if there are any issues. Best, |
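To follow the suggestion of testing on a ~20 Mb file, one option is to keep only the first few whole records of the genome FASTA. A minimal sketch, assuming a standard (possibly multi-line) FASTA; `subset_fasta` and the file names in the note below are hypothetical:

```python
from typing import Iterable, Iterator


def subset_fasta(lines: Iterable[str], max_bases: int) -> Iterator[str]:
    """Yield FASTA lines, whole records only, until max_bases is reached.

    The record that crosses the limit is kept intact, so no sequence is
    cut in the middle.
    """
    total = 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: stop if the budget is already spent.
            if total >= max_bases:
                return
        else:
            # Count sequence characters only, ignoring the newline.
            total += len(line.strip())
        yield line
```

For example, `dst.writelines(subset_fasta(src, 20_000_000))`, with `src` opened on `genome.fa` and `dst` opened for writing on `genome.20Mb.fa`, keeps whole scaffolds until at least 20 Mb of sequence has been written (or the file runs out).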
Thank you for the update! I'll give it a try this weekend :) |
Hi Shujun, I tried the new EDTA release. TIR-Learner still has errors, while LTR, MITE, and Helitron are fine. Could it be that my genome (a 336 Mb eudicot plant) has a low percentage of TIRs? The TIR command is below:
perl /data/software/EDTA/20190802/EDTA_raw.pl -genome genome.fa -species others -type tir -threads 24
Here is the error log:
cat: *-+-DTA.fa: No such file or directory
cat: *-+-DTC.fa: No such file or directory
cat: *-+-DTH.fa: No such file or directory
cat: *-+-DTM.fa: No such file or directory
cat: *-+-DTT.fa: No such file or directory
cat: *-+-NonTIR.fa: No such file or directory
cat: *-+-*-+-*.gff3: No such file or directory
rm: cannot remove ‘*-+-*-+-*.gff3’: No such file or directory
Traceback (most recent call last):
  File "/data/software/EDTA/20190802/bin/TIR-Learner1.19/Module3_New/CombineAll.py", line 90, in <module>
    keep=removeIRFhomo("%s.gff3"%(genome_Name+spliter+dataset),remove,"%sClean.gff3"%(genome_Name+spliter+dataset+spliter))
  File "/data/software/EDTA/20190802/bin/TIR-Learner1.19/Module3_New/CombineAll.py", line 76, in removeIRFhomo
    f=pd.read_csv(file,header=None,sep="\t")
  File "/data/software/Anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/data/software/Anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/software/Anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/data/software/Anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/data/software/Anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Traceback (most recent call last):
  File "/data/software/EDTA/20190802/bin/TIR-Learner1.19/Module3/GetAllSeq.py", line 62, in <module>
    file=open(f,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3' |
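The first traceback is `pandas.read_csv` raising `pandas.errors.EmptyDataError` ("No columns to parse from file"), which is what pandas does when it is handed a zero-byte file; `removeIRFhomo` in CombineAll.py passes the per-dataset GFF3 straight to `read_csv`. The sketch below (standard library only; `read_tsv_rows` is a hypothetical helper, not TIR-Learner's code) shows the kind of guard that turns an empty or missing input into an explicit empty result instead of a crash:

```python
import csv
import os
from typing import List


def read_tsv_rows(path: str) -> List[List[str]]:
    """Read a tab-separated file such as a GFF3 into rows of fields.

    Returns [] when the file is missing or zero bytes long, the situation
    that makes pandas.read_csv raise EmptyDataError.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    rows = []
    with open(path, newline="") as handle:
        # QUOTE_NONE: GFF3 has no quoting rules, so treat quotes literally.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            # Skip blank lines and GFF3 comment/directive lines.
            if row and not row[0].startswith("#"):
                rows.append(row)
    return rows
```

Returning an empty list only hides the symptom: as the reply below points out, an empty GFF3 usually means an earlier step failed, so a real fix would at least log a warning naming the empty upstream file.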
Hi Zhigui
Thanks for testing the program.
It looks like your GFF file is empty. This is either because your genome doesn't
have intact TIRs (less likely) or because one of the previous steps has a problem.
For the second case, could you check whether you have a file ending with
predi.fa-+-200, and what the size of this file is?
Thanks
Weijia
…On Sat, Aug 3, 2019 at 12:08 AM Zhigui Bao ***@***.***> wrote:
|
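Weijia's check can be scripted. A minimal sketch that lists every file under a directory whose name ends with a given suffix, together with its size in bytes; it assumes you start from the TIR-Learner working directory (the `Module3_New/<genome name>` location comes from the replies in this thread), and `find_by_suffix` is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Tuple


def find_by_suffix(root: str, suffix: str) -> List[Tuple[str, int]]:
    """Return sorted (path, size in bytes) pairs for every file under
    `root` whose name ends with `suffix`."""
    hits = []
    for path in Path(root).rglob("*"):
        # endswith() avoids treating characters in the suffix as glob syntax.
        if path.is_file() and path.name.endswith(suffix):
            hits.append((str(path), path.stat().st_size))
    return sorted(hits)
```

For example, `find_by_suffix("Module3_New", "predi.fa-+-200")`. An empty list, or a hit with size 0, suggests that an earlier step did not produce its output.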
Hi Weijia, Thanks for the reply. I didn't find the predi.fa-+-200 file. Where should it be? Only the Module3_New directory has results; the other directories are empty.
.
├── temp
│ ├── TIR-Learner-+-Chr10.fasta
│ ├── TIR-Learner-+-Chr10-+-GRFmite.fa
│ ├── TIR-Learner-+-Chr10-+-GRFmite.fa-+-p
├── TIR-Learner
│ ├── TIR-Learner-+-Chr10-+-GRFmite.fa-+-p
│ ├── TIR-Learner-+-Chr1-+-GRFmite.fa-+-p
│ ├── TIR-Learner-+-Chr2-+-GRFmite.fa-+-p
│ ├── TIR-Learner-+-Chr3-+-GRFmite.fa-+-p |
It should be in a subfolder of Module3_New named after your genome:
Module3_New/YourGenomeName. If you don't have this file, please keep this
issue open and we will check the process.
Thanks
Weijia
…On Sat, Aug 3, 2019 at 12:50 AM Zhigui Bao ***@***.*** wrote:
|
Hi Weijia, Thanks for checking. Module3_New only has temp, TIR-Learner and TIR-Learner-Result. |
Oh, OK... Can you show me the files and their sizes in TIR-Learner?
…On Sat, Aug 3, 2019 at 1:14 AM Zhigui Bao ***@***.*** wrote:
|
Hi Weijia, I find the reason why I miss the Cheers, |
Hi all, Just an update on the testing result: it seems that the new TIR release closes this issue.
Thanks for the development work. Best, |
I just pushed some new updates to EDTA, mainly to fix the TIR-Learner issue. Please reinstall EDTA and rerun it in the same work folder. Existing results will be reused so there is essentially no waste of time. Thank you for your patience and support! |
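Reusing existing results, as described here and relied on later in this thread with `--overwrite 0`, is a checkpoint pattern. The following is a minimal sketch of that pattern under the assumption that a step counts as done when its output file exists and is non-empty; `run_step` is hypothetical and is not EDTA's Perl code:

```python
import os
from typing import Callable


def run_step(output_path: str, step: Callable[[], None], overwrite: bool = False) -> bool:
    """Run `step` unless `output_path` already exists and is non-empty.

    Returns True if the step ran, False if the earlier result was reused.
    """
    done = os.path.isfile(output_path) and os.path.getsize(output_path) > 0
    if done and not overwrite:
        return False
    step()
    # Fail loudly instead of leaving an empty file for the next run to trust.
    if not os.path.isfile(output_path) or os.path.getsize(output_path) == 0:
        raise RuntimeError(f"step finished but {output_path} is missing or empty")
    return True
```

The non-empty check matters: a crashed step can leave behind an empty or partial file, and a checkpoint that only tests for existence would then skip the step forever. That is one reason starting fresh, or overwriting, can clear errors that rerunning in the same folder does not.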
I consider this issue resolved. Please reopen it if it isn't. Thank you all for testing. Shujun |
I can't find TIR-Learner on github, but I'd rather be opening issues there. The errors I'm running into are typically encountered almost exclusively while running TIR-Learner, but EDTA itself is doing just fine. @oushujun Do you know if this is the right repo I should be posting to? https://github.com/weijiaweijia/TIR-Learner-Rice |
Hi Nick,
Yes, that was the original repo, and its version is around v1.09. The
current TIR-Learner is at v1.23 and has been improved substantially,
so please just use the version bundled with EDTA. Please update EDTA and
try again. We push updates quite frequently at this point as we
improve these programs. You may open issues at the EDTA repo if you
encounter any more problems.
Thanks!
Shujun
…On Fri, Aug 30, 2019, 1:23 PM Nick Carleson ***@***.***> wrote:
|
Hello, I installed EDTA following the instructions for a conda install. I have run EDTA using the following command:
perl ../EDTA.pl --genome $GENOME --cds $CDS --curatedlib $CURATEDLIB --overwrite 0 --sensitive 1 --anno 1 --species Rice --evaluate 1 --threads 10
It works for LTR, but it crashes at the TIR step. Please see the error below:
Species: Rice
Traceback (most recent call last):
  File "/hpcfs/users/a1779884/rice_genomics/EDTA/bin/TIR-Learner2.5/Module1/Fullcov.py", line 58, in <module>
    ProcessHomology(genome_Name)
  File "/hpcfs/users/a1779884/rice_genomics/EDTA/bin/TIR-Learner2.5/Module1/Fullcov.py", line 47, in ProcessHomology
    f = pd.read_csv(blast, header=None, sep="\t")
  File "/hpcfs/users/a1779884/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/hpcfs/users/a1779884/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/hpcfs/users/a1779884/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 946, in __init__
    self._make_engine(self.engine)
  File "/hpcfs/users/a1779884/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1178, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/hpcfs/users/a1779884/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 2008, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
cat: *DTC-+-select.fa: No such file or directory
cat: *DTH-+-select.fa: No such file or directory
cat: *DTM-+-select.fa: No such file or directory
cat: *DTT-+-select.fa: No such file or directory
I have read this thread, but have not found anything to help me (I am a novice, so maybe that is why). Can someone please help me understand what is going on here, and help me figure out how to fix it? Thank you, |
Hi Aaron,
Can you provide the conda command you used to install EDTA? Also, the
output of the following command, executed under the EDTA env, will be
helpful:
conda list > edta.env.list
Best,
Shujun
…On Sat, Jan 16, 2021 at 9:54 AM aaronphillips7493 ***@***.*** wrote:
|
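Once `conda list > edta.env.list` has been produced, the version pins discussed earlier in this thread can be checked mechanically. A minimal sketch, assuming the default whitespace-separated `conda list` columns (Name, Version, Build, Channel); `parse_conda_list`, `check_pins`, and the pin values are illustrative. scikit-learn 0.19.0 is the version Shujun says the trained models require; Python 3.6 is an assumption based on Dan's report and the python3.6 paths in the tracebacks, not an official requirement:

```python
from typing import Dict, Iterable


def parse_conda_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map package name -> version from `conda list` output.

    Skips '#' header lines; assumes Name and Version are the first two
    whitespace-separated columns.
    """
    packages = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        packages[fields[0]] = fields[1]
    return packages


def check_pins(packages: Dict[str, str], pins: Dict[str, str]) -> Dict[str, str]:
    """Return {name: problem} for each pinned package that is missing or
    whose version is neither equal to the pin nor a sub-release of it."""
    problems = {}
    for name, wanted in pins.items():
        found = packages.get(name)
        if found is None:
            problems[name] = "missing"
        elif not (found == wanted or found.startswith(wanted + ".")):
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

For example, `check_pins(parse_conda_list(open("edta.env.list")), {"python": "3.6", "scikit-learn": "0.19.0"})` returns an empty dict when both pins are satisfied, and otherwise names each offending package.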
Hi Shujun, Thank you for your quick reply! To install, I did the following on Thursday 14th Jan 2021:
And then I ran the test, which worked. Please find the list of packages in the EDTA env attached here. I have just refreshed the EDTA page and the installation instructions appear to be different now. Were they recently updated, and perhaps that is why I am having issues? Thank you again, |
Hi Aaron, Thanks for the details. Something may be conflicted with Best, |
Hey Shujun, I reinstalled EDTA using the .yml file. I re-ran my analyses with the overwrite option switched off (to avoid redoing the LTR finding) and I got the same errors again. I am now trying to rerun EDTA with the overwrite option switched on, so will let you know how that goes. Thanks again for your suggestions, |
Hi Aaron, That is one of my thoughts too: you may have run EDTA multiple times in the same folder, and some erroneous runs may have left corrupted files that prevent new runs from proceeding. Overwriting the existing files would be a good choice. If you want to save the LTR results, you can run EDTA_raw with Best, |
Hey Shujun, LTR step worked, but TIR failed again with the same errors. I noticed that when I try to do just TIR with Do you have any other suggestions? Thank you again, |
Hi Aaron,
Did you use overwrite 1 on existing folders or start fresh? You may try the
latter.
Shujun
…On Sun, Jan 17, 2021 at 11:22 AM aaronphillips7493 ***@***.***> wrote:
|
Hi,
I copied and pasted the installation instructions from the README and am running the script in the active EDTA environment. It seems that the EDTA.pl script chokes when trying to use TIR-Learner. Looking at my output, all the correct folders are there. After the crash, the Helitron, MITE, and TIR folders are empty but the LTR folder is not. The only file in the parent output folder is genome.fasta.LTR.raw.fa. Is there a way to run the Perl pipeline script without TIR-Learner, or even without calling TIRs at all? I'm still interested in the other features, and I would be happy even if I could just use EDTA for Helitrons, LTRs, MITEs, filtering, consensus calling, and repeat classification.
The lines before the crash start with what's seen in #2 (comment). Then there's a traceback starting from
~/bin/EDTA/bin/TIR-Learner1.12/Module1/Fullcov.py, line 52, in <module> ProcessHomology(genome_Name)
After that, there are some cryptic errors, including:
cat: '*DTA-+-select.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
There are a few more error traces after that, with each Traceback followed by various errors from files not being found by rm, cp, mv, and cat. Lastly, in the last few lines before the crash, I get these lines, which tell me that it certainly is a problem with TIR-Learner:
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn.fa': No such file or directory
cp: cannot stat 'TIR-Learner-Result/TIR-Learner_FinalAnn.fa': No such file or directory
Error: TIR results not found!
ERROR: Raw TIR results not found in genome.fasta.EDTA.raw/genome.fasta.TIR.raw.fa at ~bin/EDTA/EDTA.pl line 145.
While bug testing I've just been using the first two scaffolds of my genome. That file is attached.
Thanks!
PR-102_JGI_twoscafs.fasta.zip
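The cascade of `cat: '*DTA-+-select.fa': No such file or directory` lines is a symptom rather than a cause: when an earlier step produces no files, a shell glob matches nothing, and by default bash passes the pattern literally to `cat`, which then reports it as a missing file. The useful clue is the first traceback (Fullcov.py). A minimal sketch of a concatenation step that fails with a clear message instead, assuming the file naming seen in the logs; `concat_matching` is a hypothetical helper, not TIR-Learner's code:

```python
import glob
from typing import List


def concat_matching(pattern: str, out_path: str) -> List[str]:
    """Concatenate all files matching `pattern` into `out_path`.

    Raises FileNotFoundError naming the pattern when nothing matches,
    instead of passing the literal pattern on the way an unmatched shell
    glob does. Returns the matched paths in sorted order.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "w") as out:
        for path in matches:
            with open(path) as src:
                out.write(src.read())
    return matches
```

Raising at the first empty glob stops the pipeline next to the real failure, instead of letting it continue into a series of `rm`, `mv`, and `cp` errors that end in "TIR results not found".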