rna-seq build fails because of missing bowtie2 index files #30
Comments
Even though the refseq build has a bunch of index files, they date from before we had a formal software result for aligner indexes, and in theory they are no longer used. This theory clearly fails. :( Probably the best thing is to grab the few files we die on and put them in the FTP staging location, then redo the sync after the files stage.
Yeah, I was thinking that is the short-term fix as well. Also, building the indexes takes a long time, so having them pre-generated for the aligners used in the tutorial exercise will make that go faster for the user.
For clarity, I am going to place the files in the TGI staging dir here:
I will get the files from the TGI reference annotation build here:
The files we are missing are:
The copy command is as follows:
Monitor progress of staging here:
Once staging is complete I will run the rsync command again:
And these missing files should appear here on my VM system:
This worked. I am relaunching the rna-seq build to see if I can get past the index-building step now.
When the new build launched, the first thing I noticed was that it is still trying to generate indices:
Finding or generating reference build index for aligner per-lane-tophat version 2.0.4 params -p 4 --bowtie-version=2.0.0-beta7 refbuild 106942997
This step completed successfully before, but perhaps because the overall step crashed, the result was not correctly logged in the DB? Unfortunately, this means I will now have to wait many hours before I know whether the same crash is going to happen.
Talking to Jason, it seems like this issue has been solved in 'gms-core master' before the last merge into 'gms-core pub'. See here for details:
This is yet another example of why it would be great to merge from master into pub more regularly. The merge is already an active issue: #23
One possible short-term fix that might work is to selectively merge the bug fixes made to the following module from 'gms-core master' into 'gms-core pub': lib/perl/Genome/InstrumentData/AlignmentResult/PerLaneTophat.pm
This should be a moot point now that master is merged into gms-pub, right?
I think so. Need to confirm still. I definitely got past this point in my
The merge is complete, but a fresh rna-seq build still has the same issue:
2013-11-27 16:38:01-0600 clia1: Finding or generating reference build index for aligner per-lane-tophat version 2.0.4 params -p 4 --bowtie-version=2.0.0-beta7 refbuild 106942997
Currently it seems that we are not able to find software results for bwa indexes in reference-alignment or for bowtie indexes in rna-seq alignments, despite the fact that the actual data files seem to be present in both cases. This slows down testing, and even if we get a small data set (i.e. TST2 with a small number of reads), the total run time of the demonstration analysis will still be high because generating fresh reference indexes is slow.
This is successfully resolved with the closing of issue 15.
It seems that the reference sequence build that gets incorporated into GMS1 is missing many files compared to the same build on the TGI filesystem. Some critical files are missing, and rna-seq builds fail, for example, when they expect bowtie2 indices to be there but they are not:
Compare the contents of this (46 Gb):
ls /opt/gms/GMS1/fs/ams1102/info/model_data/2869585698/build106942997/
With the contents of this (5.9 Gb):
ls /gscmnt/ams1102/info/model_data/2869585698/build106942997/
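The two listings can also be compared file by file rather than by total size. A minimal Python sketch, assuming both directories are readable from the same machine; `list_relative_files` and `compare_builds` are hypothetical helper names for illustration, not part of GMS:

```python
import os


def list_relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_builds(dir_a, dir_b):
    """Return (files only in dir_a, files only in dir_b), each sorted."""
    files_a = list_relative_files(dir_a)
    files_b = list_relative_files(dir_b)
    return sorted(files_a - files_b), sorted(files_b - files_a)


if __name__ == "__main__":
    import sys

    # Usage (assumed): python compare_builds.py <dir_a> <dir_b>
    if len(sys.argv) == 3:
        only_a, only_b = compare_builds(sys.argv[1], sys.argv[2])
        for path in only_a:
            print("only in first: " + path)
        for path in only_b:
            print("only in second: " + path)
```

Passing the two build directories listed above as the arguments would show exactly which files are present in one copy of the build but not the other.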
Maybe these files do not need to be there anyway because it seems like the bowtie2 index files were created during the rna-seq build attempt and stored here:
/opt/gms/HU9D538/fs/HU9D538/info/model_data/ref_build_aligner_index_data/2869585698/build106942997/aligner-index-precise64-vagrant-13502-4bf701b63ced11e3b0cc080027880ca6/bowtie/2_0_0_beta7/
But the following command during the rna-seq build is looking for them here:
Error: Could not find Bowtie 2 index files. /opt/gms/GMS1/fs/ams1102/info/model_data/2869585698/build106942997/all_sequences.fa.*.bt2
/usr/bin/tophat2.0.4 -p 4 --transcriptome-only --transcriptome-index '/tmp/9.tmpdir/gm-genome_sys-2013-10-24_20_45_57--iyBS/annotation-index-precise64-vagrant-13502-513acabe3d0f11e3a0dc080027880ca6/all_sequences' -G /opt/gms/GMS1/fs/gc12001/info/model_data/2772828715/build124434505/annotation_data/rna_annotation/106942997-all_sequences.gtf --output-dir '/tmp/9.tmpdir/gm-genome_sys-2013-10-24_20_45_57--iyBS/anonymous1' /opt/gms/GMS1/fs/ams1102/info/model_data/2869585698/build106942997/all_sequences.fa /tmp/9.tmpdir/gm-genome_sys-2013-10-24_20_45_57--iyBS/anonymous0/fake_reads.fastq
Incidentally, this is kind of a 'dummy' rna-seq alignment run, perhaps performed for the purpose of creating genome and transcriptome indices?
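Before launching a long build, it may help to check whether the index files TopHat expects are actually present next to the reference FASTA. The sketch below assumes the standard six-file layout of a small Bowtie 2 index (large indexes use a `.bt2l` extension instead and are not covered here); `missing_bowtie2_index_files` is a hypothetical helper, not a GMS or Bowtie function:

```python
import os

# Suffixes of a standard (small) Bowtie 2 index built for a given base name.
BOWTIE2_SUFFIXES = (
    ".1.bt2",
    ".2.bt2",
    ".3.bt2",
    ".4.bt2",
    ".rev.1.bt2",
    ".rev.2.bt2",
)


def missing_bowtie2_index_files(index_base):
    """Return the expected index files for index_base that do not exist.

    index_base is the path the aligner is given as the index name,
    e.g. the path to all_sequences.fa in the error message above.
    """
    expected = [index_base + suffix for suffix in BOWTIE2_SUFFIXES]
    return [path for path in expected if not os.path.isfile(path)]
```

An empty list means all six files are in place; anything else names the files that would need to be staged or generated.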