Releases: althonos/pyhmmer

v0.8.0

01 May 14:58

PyHMMER has been accepted for publication in Bioinformatics. The paper is available at doi:10.1093/bioinformatics/btad214.

Added

  • pyhmmer.hmmer.jackhmmer function to run several JackHMMER iterative searches in parallel using multithreading (#35, by @zdk123).
  • HMM.to_profile shortcut method to allocate and configure a new Profile object.

Fixed

  • Type annotations of Pipeline.iterate_seq and Pipeline.iterate_hmm.
  • Potential memory leak on exceptions raised by HMMPressedFile.read.
  • Offsets.profile not recording offsets properly, causing pyhmmer.hmmer.hmmpress to produce invalid pressed files (#37).

Changed

  • HMM.__init__ and HMM.sample now take the Alphabet as the first argument, for consistency with the rest of the API.
  • HMM now requires a name argument.

Removed

  • Deprecated ignore_gaps argument in SequenceFile.__init__.
  • Deprecated Sequence.taxonomy_id property.

v0.7.4

14 Apr 14:16

Added

  • Recipes page to the documentation with a code example for loading multiple HMM files (#24, by @zdk123).

Fixed

  • TraceAligner methods causing a segfault when passed an uninitialized HMM (#36).

Changed

  • HMM default constructor now always creates a valid HMM (with respect to probability arrays).
  • TraceAligner now validates the input HMM before calling the HMMER code.
  • Use stack allocation for all error buffers instead of creating empty bytearray objects where applicable.

v0.7.3

24 Mar 22:03

Fixed

  • Wrong argument type in IterativeSearch.iterate_hmm method (#34, by @zdk123).

v0.7.2

17 Feb 11:32

Added

  • easel.GeneticCode class wrapping an ESL_GENCODE struct for configuring translation.
  • DigitalSequence.translate method to translate a nucleotide sequence to a protein sequence. Metadata is copied from the source sequence to its translation (#31, by @valentynbez).

Deprecated

  • Sequence.taxonomy_id property, as it is not used by Easel and its implementation is not consistent (see EddyRivasLab/easel#68).

v0.7.1

15 Dec 12:52

Added

  • Missing __reduce__ method to TopHits.
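For readers unfamiliar with `__reduce__`, here is a minimal, generic sketch of how a class can make itself picklable (and copyable) by round-tripping through a byte payload. `ScoreList` and its methods are hypothetical names invented for this example; this is not the TopHits implementation.

```python
import pickle
import struct


class ScoreList:
    """Hypothetical container that serializes itself to a compact byte payload."""

    def __init__(self, scores):
        self.scores = list(scores)

    def to_bytes(self):
        # Little-endian layout: a 32-bit count followed by that many doubles.
        return struct.pack(f"<I{len(self.scores)}d", len(self.scores), *self.scores)

    @classmethod
    def from_bytes(cls, payload):
        (count,) = struct.unpack_from("<I", payload)
        return cls(struct.unpack_from(f"<{count}d", payload, 4))

    def __reduce__(self):
        # pickle and copy call this: (callable, args) rebuilds the object.
        return (type(self).from_bytes, (self.to_bytes(),))


if __name__ == "__main__":
    original = ScoreList([12.5, 3.0])
    restored = pickle.loads(pickle.dumps(original))
    print(restored.scores)
```

Because `copy.copy` and `copy.deepcopy` fall back on the same protocol, defining `__reduce__` also gives the class working copies without extra code.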

Fixed

  • Build detection of available platform functions in setup.py.

v0.7.0

04 Dec 15:59

Added

  • Bitfield.zeros and Bitfield.ones classmethods for constructing an empty bitfield of known size.
  • Bitfield.copy method to copy a bitfield object.
  • SequenceBlock and OptimizedProfileBlock classes to store Python objects next to a contiguous array of pointers for iterating with the GIL released.
  • SequenceFile.read_block method to read a whole sequence block from a file.
  • HMM.sample class method to generate an HMM at random given a Randomness source.
  • hmmscan function to scan a profile database with sequence queries.
  • deepcopy implementations to HMM, Profile and OptimizedProfile classes of plan7.
  • rewind method to HMMFile, HMMPressedFile and SequenceFile to reset a file back to its initial position.
  • name attribute to HMMFile, HMMPressedFile, MSAFile and SequenceFile to expose the path of a file (when it was created from a path).
  • local property to Profile and OptimizedProfile, indicating whether a profile is in local or global mode.
  • multihit property to Profile and OptimizedProfile, indicating whether a profile is in unihit or multihit mode, with a setter taking care of the reconfiguration.
  • Domain.included and Domain.reported settable properties to report the inclusion and reporting status of a single domain.
  • TopHits.included and TopHits.reported sized iterators to iterate only on included and reported hits.
  • Domains.included and Domains.reported sized iterators to iterate only on included and reported domains.

Changed

  • Bitfield, Vector and Matrix can now be created from an iterable.
  • Pipeline search methods now expect a DigitalSequenceBlock or a SequenceFile for the target sequence database.
  • Pipeline scan methods now expect an OptimizedProfileBlock or a HMMPressedFile for the target profile database.
  • TraceAligner now expects a DigitalSequenceBlock for the sequences to align to the HMM.
  • Profile.configure now uses a default value of 400 for the L argument.
  • hmmsearch, nhmmer and phmmer support being given a single query instead of requiring an iterable.
  • HMMPressedFile can now be created, closed and used as a context manager directly without having to manage the source HMMFile.
  • Renamed Profile.optimized method to Profile.to_optimized.
  • Replaced Randomness.is_fast method with the Randomness.fast property.
  • Rewrite handling of Hit flags using settable properties (Hit.included, Hit.reported, Hit.new, Hit.dropped, Hit.duplicate) instead of methods.
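The move from flag methods to settable properties can be pictured with a short, generic sketch. `FlaggedHit`, `_flag_property` and the bit values below are all hypothetical and chosen for illustration; they are not claimed to match the HMMER constants or the PyHMMER implementation.

```python
def _flag_property(mask):
    """Build a boolean property that reads and writes one bit of self.flags."""

    def getter(self):
        return bool(self.flags & mask)

    def setter(self, value):
        # Set the bit when value is truthy, clear it otherwise.
        self.flags = (self.flags | mask) if value else (self.flags & ~mask)

    return property(getter, setter)


class FlaggedHit:
    """Hypothetical stand-in for an object storing its status in a bit mask."""

    included = _flag_property(1 << 0)
    reported = _flag_property(1 << 1)
    dropped = _flag_property(1 << 2)

    def __init__(self, flags=0):
        self.flags = flags


hit = FlaggedHit()
hit.included = True   # replaces a method call such as manually_include()
hit.dropped = True
hit.included = False  # the same property also clears the flag
```

Compared with a pair of one-way methods, a property lets callers both query and reset each flag through a single name.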

Fixed

  • Memory leak in the LongTargetsPipeline search loop.
  • PyPy behaviour change of readinto methods now expecting unsigned char* instead of char* memoryview.
  • NULL-pointer dereference in Pipeline.search_hmm when given a query without name.
  • LongTargetsPipeline not recording the query name and accession.
  • Memory leak caused by using a non-default prior scheme when constructing a Builder.

Removed

  • PipelineSearchTargets, replaced in functionality with easel.DigitalSequenceBlock.
  • is_local and is_multihit methods of Profile and OptimizedProfile, replaced with equivalent properties.
  • Hit.manually_drop and Hit.manually_include methods, replaced with the different Hit properties.

v0.6.3

09 Sep 13:09

Fixed

  • Error not being raised on alphabet detection failure in SequenceFile or MSAFile.
  • Add check in DigitalSequence constructor to make sure encoded characters are in valid range (#25).

Added

  • SequenceFile.guess_alphabet and MSAFile.guess_alphabet to guess the alphabet from an open file.
  • Alphabet.encode and Alphabet.decode to convert raw sequences between digital and text format.

v0.6.2

12 Aug 11:24

Changed

  • hmmsearch, phmmer and nhmmer functions will reduce the requested number of threads to the number of queries, if it can be detected using operator.length_hint.
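The thread-count reduction described above can be sketched with the standard library alone. `effective_threads` is a hypothetical helper written for this example, not a PyHMMER function, and treating a request of 0 as "use every CPU" is an assumption of the sketch.

```python
import operator
import os


def effective_threads(queries, requested=0):
    """Pick a worker count no larger than the number of queries, when known.

    Hypothetical helper for illustration; not part of the PyHMMER API.
    """
    # Assumption: 0 (or a negative value) means "use every available CPU".
    cpus = requested if requested > 0 else (os.cpu_count() or 1)
    # length_hint tries len(), then __length_hint__(), then returns -1.
    hint = operator.length_hint(queries, -1)
    if hint >= 0:
        cpus = max(1, min(cpus, hint))
    return cpus
```

A list or a list iterator exposes its length, so the count is clamped; a generator gives no hint, so the requested number of threads is kept as-is.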

Added

  • Documentation for loading all HMMs from an HMMFile object at once (#23).
  • List of projects depending on PyHMMER to the Examples page of the documentation.

v0.6.1

28 Jun 20:38

Added

  • pickle protocol support for TopHits objects, using the HMMER network serialization.
  • TopHits.write method to write hits to a file in tabular format.
  • query_name and query_accession properties to TopHits objects to access the name and accession of the query that produced the hits.

Fixed

  • Extraction of filename from file-like objects in the HMMFile constructor.
  • Use os.cpu_count instead of multiprocessing.cpu_count where applicable to preserve OS scheduling.
  • Wrong return type in docstring of HMM.insert_emissions.
  • TopHits.searched_nodes returning the searched number of residues instead of the searched number of model nodes.
  • Unsound decoding of pickled MatrixF or VectorF when data comes from a source of different endianness.
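The endianness issue in the last item can be illustrated generically: a raw float buffer only decodes correctly if the reader knows the byte order it was written in. The helpers below are hypothetical and use `struct` with 32-bit floats; they are a sketch of the idea, not the MatrixF or VectorF pickling code.

```python
import struct
import sys


def pack_floats(values, byteorder=sys.byteorder):
    """Serialize 32-bit floats and return the byte order next to the payload."""
    prefix = "<" if byteorder == "little" else ">"
    return byteorder, struct.pack(f"{prefix}{len(values)}f", *values)


def unpack_floats(byteorder, payload):
    """Decode a payload using the byte order recorded by the writer."""
    prefix = "<" if byteorder == "little" else ">"
    count = len(payload) // 4  # four bytes per 32-bit float
    return list(struct.unpack(f"{prefix}{count}f", payload))


# Data written on a big-endian machine, read back with the recorded order.
order, payload = pack_floats([1.0, 2.5, -3.0], byteorder="big")
values = unpack_floats(order, payload)
```

Storing the byte order alongside the payload is what lets the reader swap bytes when needed instead of silently producing wrong numbers.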

Changed

  • Rewrite pyhmmer.hmmer threading code using collections.deque instead of queue.Queue to store the queries and results.
  • Reduce memory consumption of pyhmmer.hmmer by reducing the number of semaphores and event flags used concurrently.
  • Make pyhmmer.hmmer main threads block on query insertion rather than result retrieval to make sure worker threads are never idling.
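The threading changes above follow a common producer/consumer pattern, which can be sketched with a double-ended queue and two semaphores. `run_parallel` is a hypothetical function written for this example; it omits error handling and is not the pyhmmer.hmmer code.

```python
import collections
import threading


def run_parallel(func, queries, workers=2):
    """Apply func to every query using worker threads fed from a bounded deque."""
    pending = collections.deque()              # FIFO of (index, query) items
    free_slots = threading.Semaphore(workers)  # producer blocks when the deque is full
    available = threading.Semaphore(0)         # workers block when the deque is empty
    results = {}

    def worker():
        while True:
            available.acquire()
            item = pending.popleft()
            free_slots.release()
            if item is None:                   # sentinel: no more queries
                return
            index, query = item
            results[index] = func(query)

    def submit(item):
        free_slots.acquire()                   # the main thread blocks on insertion
        pending.append(item)
        available.release()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for index, query in enumerate(queries):
        submit((index, query))
    for _ in threads:
        submit(None)                           # one sentinel per worker
    for thread in threads:
        thread.join()
    # Return results in the order the queries were submitted.
    return [results[i] for i in range(len(results))]
```

Because the producer only waits when the deque is full, workers always find a query ready as soon as they finish the previous one.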

v0.6.0

01 May 12:57

Added

  • pyhmmer.daemon module with a client implementation to communicate with an hmmpgmd server.
  • Pipeline.arguments method to get a list of CLI arguments from the parameters used to initialize the Pipeline.
  • Setters for name, accession and description properties of plan7.Hit.
  • Constructor for individual plan7.Trace objects outside a plan7.Traces list.
  • plan7.Trace.from_sequence constructor to create a faux trace from a single sequence.
  • manually_include and manually_drop methods to plan7.Hit for manually selecting the inclusion status of a Hit in a TopHits instance.
  • compare_ranking method to plan7.TopHits for comparing the ranking of the hits with that of a previous run on the same targets, stored in an easel.KeyHash object.
  • Pipeline.iterate_seq and Pipeline.iterate_hmm to run iterative queries like JackHMMER.
  • repr implementations for easel.MSAFile, easel.SequenceFile and easel.HMMFile showing the path or file object they were created from.
  • repr implementation for easel.Randomness showing the seed and the RNG algorithm in use.
  • str implementation for plan7.Alignment using HMMER original code to display a domain alignment like in search/scan results.

Changed

  • plan7.Trace.posterior_probabilities property may now be None in case no memory is allocated for the posteriors in the P7_TRACE struct.
  • TopHits.to_msa can now add additional sequences passed as arguments to the alignment.
  • plan7.HMMPressedFile now raises an exception on attempts to create a new instance manually.
  • ignore_gaps argument of easel.SequenceFile is now deprecated.
  • repr implementations for easel types now use the fully qualified class name.

Fixed

  • easel.SequenceFile.readinto docstring not rendering properly in documentation.
  • Type annotations of hits_included and hits_reported of plan7.TopHits marking these properties as bool instead of int.
  • Setters of name, accession, description and author properties of easel.MSA crashing when given None values.
  • Exception value raised from Easel code not being properly extracted.
  • Plain strings being used in example for easel.TextSequence and easel.TextMSA constructors where byte strings are expected (#20).