Aborted (core dumped) occurred while doing translate #50

chtsai0105 · 2023-08-20T07:41:45Z

Hi - I was trying to load a dna coding sequence and want to further translate to peptide sequence for hmmsearch. I loaded the gzipped fasta dna sequence with pyhmmer.easel.SequenceFile and everything is fine so far, the Alphabet correctly identify it as dna. However when I do translate my jupyter kernel keep crashing. I also tried it with regular script but still got Aborted (core dumped) failure.

I did a quick test to find out what going on. I found that when I can successfully translate the SequenceBlock with the first 5 entries. I can also successfully translate the block from 6th to 20th entries. But when I do it with 0 to 6, it crashed. I also checked the fasta file but cannot found anything that is suspicious.

And this failure seems to be vary between machines. I did the test on another host and it success for the 0 to 6 block. But when I tried to translate the entire block it will eventually failed.

Not sure whether there is anything that I didn't noticed in my fasta so I included the fasta below.
Actinomucor_elegans_CBS_100.09.cds.fasta.gz

And here is my test script - it usually failed at the last line but in would failed in seqs[0:6].translate() when running on jupyter. I've already updated the pyhmmer to the latest version 0.10.1.

import pyhmmer

print(pyhmmer.__version__)
seq_file = pyhmmer.easel.SequenceFile("Actinomucor_elegans_CBS_100.09.cds.fasta.gz", digital=True)
seqs = seq_file.read_block()
peps = seqs[0:5].translate()
print("Try to print peptide")
print(peps[-1].sequence)
peps = seqs[6:20].translate()
print("Try to print peptide")
print(peps[0].sequence)
peps = seqs[0:6].translate()
print("Try to print peptide")
print(peps[-1].sequence)

The text was updated successfully, but these errors were encountered:

althonos · 2023-08-20T14:32:07Z

Hi @chtsai0105, thank you for the bug report. I'm also getting a segfault using the test snippet you provided, I'll try to understand what's going on.

althonos · 2023-08-20T20:21:20Z

The DigitalSequenceBlock.translate method was occasionally writing to the wrong buffer, namely the sequence of the last protein of the block. Depending to how you'd slice, you could end up writing out of the buffer bounds, and get a segfault later! That's why you were seeing different behaviour based on the sequence coordinates.

I have patched the bug, updated tests to pick up any regression, and will make a new release with the patch.

chtsai0105 · 2023-08-21T05:15:56Z

Thank you for such a quick update! Have confirmed that it behave as expected.

althonos added the bug Something isn't working label Aug 20, 2023

chtsai0105 closed this as completed Aug 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aborted (core dumped) occurred while doing translate #50

Aborted (core dumped) occurred while doing translate #50

chtsai0105 commented Aug 20, 2023

althonos commented Aug 20, 2023

althonos commented Aug 20, 2023

chtsai0105 commented Aug 21, 2023

Aborted (core dumped) occurred while doing translate #50

Aborted (core dumped) occurred while doing translate #50

Comments

chtsai0105 commented Aug 20, 2023

althonos commented Aug 20, 2023

althonos commented Aug 20, 2023

chtsai0105 commented Aug 21, 2023