I am trying to use your tool with GRCm39. I have been able to build successfully with GRCm38 and GRCh38. However, for GRCm39 with Ensembl v109, an error is thrown:
Command and error:
singularity run how_are_we_stranded_here_1.0.1--pyhfa5458b_0.sif check_strandedness --gtf /<REDACTED_PATH>/ensembl/v109/Mus_musculus.GRCm39.109.gtf -r1 RNASEQ.R1.FASTQ.gz_filtered -r2 RNAEQ.R1.FASTQ.gz_filtered -fa /<REDACTED_PATH>/ensembl/GRCm39/Mus_musculus.GRCm39.cds.all.fa -p
Results stored in: stranded_test_RNASEQ.R1.FASTQ_filtered_trimmed
converting gtf to bed
running command: gtf2bed --gtf /<REDACTED_PATH>/ensembl/v109/Mus_musculus.GRCm39.109.gtf --bed RNASEQ.R1.FASTQ_filtered/Mus_musculus.GRCm39.109.bed
Checking if fasta headers and bed file transcript_ids match...
Can't find transcript ids from /<REDACTED_PATH>/ensembl/GRCm39/Mus_musculus.GRCm39.cds.all.fa in stranded_test_RNASEQ.R1.FASTQ_filtered/Mus_musculus.GRCm39.109.bed
Trying to converting fasta header format to match transcript ids to the BED file...
running command: sed 's/[|]/ /g' /<REDACTED_PATH>/ensembl/GRCm39/Mus_musculus.GRCm39.cds.all.fa > stranded_test_RNASEQ.R1.FASTQ_filtered/transcripts.fa
Can't find any of the first 10 BED transcript_ids in fasta file... Check that these match
Here is the GTF, FASTA direct from Ensembl, and converted bed:
Looking at this, it seems that Ensembl no longer writes all transcripts to the cds.all.fa file. I can remake this file for all transcripts in the GTF, but it would be great if the tool worked with Ensembl resources directly. Is it possible to make the parsing more flexible for cases like this?
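To check how far the two files actually diverge, rather than relying on only the first 10 BED transcript_ids, something like the sketch below could help. It is not the tool's own logic. The helper names (`strip_version`, `fasta_ids`, `bed_ids`, `shared_fraction`) are hypothetical, and it assumes the transcript ID is the first token of each FASTA header and sits in the fourth BED column, which may not hold for every gtf2bed output.

```python
from typing import Iterable, Set


def strip_version(transcript_id: str) -> str:
    """Drop a trailing '.N' version suffix if one is present."""
    base, dot, version = transcript_id.rpartition(".")
    return base if dot and version.isdigit() else transcript_id


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect unversioned transcript IDs from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token, delimited by whitespace or '|'.
            tokens = line[1:].replace("|", " ").split()
            if tokens:
                ids.add(strip_version(tokens[0]))
    return ids


def bed_ids(lines: Iterable[str], name_column: int = 3) -> Set[str]:
    """Collect unversioned transcript IDs from a tab-separated BED file."""
    ids = set()
    for line in lines:
        if line.startswith(("#", "track", "browser")):
            continue  # skip comment and header lines
        fields = line.rstrip("\n").split("\t")
        # Assumption: the transcript ID is in the BED name column (index 3).
        if len(fields) > name_column:
            ids.add(strip_version(fields[name_column]))
    return ids


def shared_fraction(fasta_lines: Iterable[str], bed_lines: Iterable[str]) -> float:
    """Fraction of BED transcript IDs that also appear in the FASTA."""
    in_fasta = fasta_ids(fasta_lines)
    in_bed = bed_ids(bed_lines)
    return len(in_fasta & in_bed) / len(in_bed) if in_bed else 0.0
```

Passing two open file handles (the cds.all.fa and the converted BED) to `shared_fraction` gives the share of BED transcripts that have a FASTA entry. A value well below 1.0 would be consistent with the CDS FASTA covering only a subset of the transcripts in the GTF.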
MikeWLloyd changed the title from "Ensembl v109" to "Ensembl GRCm39 v109 GTF / ...cds.all.fa Parse Issue" on May 1, 2023