Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Pickling and unpickling an HMM object changes the cutoff values #67

Closed
jolespin opened this issue Apr 24, 2024 · 5 comments
Closed
Labels
bug Something isn't working

Comments

@jolespin
Copy link

jolespin commented Apr 24, 2024

It took me a while to figure out what was happening or how to get a minimum reproducible example but I think I finally have one.

import pyhmmer
from pyhmmer.plan7 import HMMFile
from pyhmmer.easel import SequenceFile
from pyhmmer import hmmsearch
import sys, os, gzip,  pickle

# Versions
sys.version
# '3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:36:46) \n[Clang 16.0.6 ]'

print(pyhmmer.__version__)
# 0.10.11

# Load in HMMs
name_to_hmm = dict()
with HMMFile("/Users/jolespin/Databases/Pfam/Pfam-A.hmm.gz") as hmm_file:
    for hmm in list(hmm_file):
        name_to_hmm[hmm.accession.decode()] = hmm

# Before pickling
query_hmm = name_to_hmm["PF09847.13"]
print(query_hmm.cutoffs.gathering)
# (33.20000076293945, 33.20000076293945)

# Write pickle
with gzip.open("PF09847.13.pkl.gz", "wb") as f:
    pickle.dump(query_hmm, f)

# Read pickle
with gzip.open("PF09847.13.pkl.gz", "rb") as f:
    query_hmm_reloaded = pickle.load(f)

# After pickling
print(query_hmm_reloaded.cutoffs.gathering)
# (0.07795137166976929, 0.010516392067074776)

For some reason, when I'm pickling my HMMs the cutoffs change and I'm not sure exactly why or how this could happen.

Can you give it a try on your system?

I'm just using Pfam-A.hmm.gz from https://www.ebi.ac.uk/interpro/download/Pfam/

Let me know if I can provide anymore context.

@jolespin jolespin changed the title [Bug] Pickling and unpickling an HMM object changes the cutoffs [Bug] Pickling and unpickling an HMM object changes the cutoff values Apr 24, 2024
@althonos
Copy link
Owner

Hi @jolespin, thanks for the code to reproduce, I'm getting the same issue on my local system. What's strange is that the pickling of the cutoff is tested (I'm using PF02826, and there it works), but indeed for PF09847 the cutoffs get messed up. I'll have to figure out what's going on 😓

@althonos althonos added the bug Something isn't working label Apr 25, 2024
@althonos
Copy link
Owner

The culprits:

pyhmmer/pyhmmer/plan7.pyx

Lines 2528 to 2534 in c5234e5

memcpy(&self._hmm.evparam[0], &evparam[0], p7_NEVPARAM * sizeof(float))
memcpy(&self._hmm.cutoff[0], &cutoff[0], p7_NCUTOFFS * sizeof(float))
if self._hmm.flags & libhmmer.p7_hmm.p7H_COMPO:
compo = state["compo"]
assert compo.ndim == 1
assert compo.shape[0] == K
memcpy(&self._hmm.cutoff[0], &compo[0], K * sizeof(float))

In the last memcpy, there is a typo, the model compositions are actually copied into the cutoffs, and overwrites the actual cutoffs. This probably wasn't an issue with PF02826 because it doesn't have a model composition set.

@althonos
Copy link
Owner

Fixed in v0.10.12.

@jolespin
Copy link
Author

Excellent! Thank you for handling this so quickly. I was sitting with it for a day or two trying to figure out if I was making an obvious mistake. Glad the deep dive wasn't for nothing and could help with your development.

Took a look at your PR, why does the switch from t2pks and RREFam fix the pickling issue (IIRC the .h3m are one of the indexed HMM files)?

"with pyhmmer.plan7.HMMFile(\"data/hmms/bin/t2pks.h3m\") as hmms:\n",
"with pyhmmer.plan7.HMMFile(\"data/hmms/bin/RREFam.h3m\") as hmms:\n",

@althonos
Copy link
Owner

Ah no, the switch is for something different, I just wanted to reduce the size of the test data (mostly because I'm starting to hit the PyPI.org project size limit given that the wheels are getting big and I have 30+ wheels per release) so I used smaller HMMs.

The fix is just in 7b1ffb8 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants