Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aborted (core dumped) occurred while doing translate #50

Closed
chtsai0105 opened this issue Aug 20, 2023 · 3 comments
Closed

Aborted (core dumped) occurred while doing translate #50

chtsai0105 opened this issue Aug 20, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@chtsai0105
Copy link

Hi - I was trying to load a dna coding sequence and want to further translate to peptide sequence for hmmsearch. I loaded the gzipped fasta dna sequence with pyhmmer.easel.SequenceFile and everything is fine so far, the Alphabet correctly identify it as dna. However when I do translate my jupyter kernel keep crashing. I also tried it with regular script but still got Aborted (core dumped) failure.

I did a quick test to find out what going on. I found that when I can successfully translate the SequenceBlock with the first 5 entries. I can also successfully translate the block from 6th to 20th entries. But when I do it with 0 to 6, it crashed. I also checked the fasta file but cannot found anything that is suspicious.

And this failure seems to be vary between machines. I did the test on another host and it success for the 0 to 6 block. But when I tried to translate the entire block it will eventually failed.

Not sure whether there is anything that I didn't noticed in my fasta so I included the fasta below.
Actinomucor_elegans_CBS_100.09.cds.fasta.gz

And here is my test script - it usually failed at the last line but in would failed in seqs[0:6].translate() when running on jupyter. I've already updated the pyhmmer to the latest version 0.10.1.

import pyhmmer

print(pyhmmer.__version__)
seq_file = pyhmmer.easel.SequenceFile("Actinomucor_elegans_CBS_100.09.cds.fasta.gz", digital=True)
seqs = seq_file.read_block()
peps = seqs[0:5].translate()
print("Try to print peptide")
print(peps[-1].sequence)
peps = seqs[6:20].translate()
print("Try to print peptide")
print(peps[0].sequence)
peps = seqs[0:6].translate()
print("Try to print peptide")
print(peps[-1].sequence)
@althonos althonos added the bug Something isn't working label Aug 20, 2023
@althonos
Copy link
Owner

Hi @chtsai0105, thank you for the bug report. I'm also getting a segfault using the test snippet you provided, I'll try to understand what's going on.

@althonos
Copy link
Owner

The DigitalSequenceBlock.translate method was occasionally writing to the wrong buffer, namely the sequence of the last protein of the block. Depending to how you'd slice, you could end up writing out of the buffer bounds, and get a segfault later! That's why you were seeing different behaviour based on the sequence coordinates.

I have patched the bug, updated tests to pick up any regression, and will make a new release with the patch.

@chtsai0105
Copy link
Author

Thank you for such a quick update! Have confirmed that it behave as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants