ProteinFoldingResult, XYZ files and plotting of folded Proteins for ProteinFoldingProblem. (#655)

* Adding files modified. Dependencies not Checked

* Fixed one Dependency

* Passing ProteinFoldingProblem as attribute to ProteinFoldingResult. Also simplified some classes

* Some details

* Created more test cases in test_protein_folding_result

* Added comments in the unittest

* Quick changes

* Separated unittests, and small modifications

* Fixed bug side turns

* Added letters to the plotter

* Fixed format and documentation

* Added release note

* Corrected indentation and typos

* Changed legend location

* Updated header's date

* Updated header's date

* Update qiskit_nature/results/protein_folding_tools/__init__.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_tools/protein_decoder.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_tools/protein_decoder.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_tools/protein_plotter.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Updated header's date

* Updated header's date

* Update protein_decoder.py

* Changed peptide getter docstring.

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Fixed multiple things

* Update qiskit_nature/results/protein_folding_tools/protein_xyz.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_tools/protein_xyz.py

Co-authored-by: dlasecki <[email protected]>

* Update releasenotes/notes/protein-folding-result-b344ac3c7f48e3ca.yaml

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_tools/protein_xyz.py

Co-authored-by: dlasecki <[email protected]>

* Modularized ProteinPlotter

* Changed documentation

* pylint

* Exposed main_turns

* Fixed make

* Fixed codeb

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Changed class description

* Reformatted Unittests

* Refactored unittests

* Adding matplotlib

* Added Matplotlib requirement

* Accidentally removed pylintdict

* Fixed circular import

* Update qiskit_nature/results/utils/protein_decoder.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/utils/protein_decoder.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/utils/protein_decoder.py

Co-authored-by: dlasecki <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: dlasecki <[email protected]>

* Fixed lint

* Refactored unittest

* best_sequence -> turns_sequence

* Update qiskit_nature/problems/sampling/protein_folding/protein_folding_problem.py

Co-authored-by: Max Rossmannek <[email protected]>

* Update test/results/test_protein_folding_result.py

Co-authored-by: Max Rossmannek <[email protected]>

* Update setup.py

Co-authored-by: Max Rossmannek <[email protected]>

* Update qiskit_nature/results/protein_folding_result.py

Co-authored-by: Max Rossmannek <[email protected]>

* Fixes

* black

* minor changes

* Fixed indentations

* Removed TODO comment

* minor changes

* Changed header XYZ file

* fixed style

* Return figure instead of plotting

* Removed files

* Returns in doc plotter

* Fixed codeblocks and reno

* RENO

* changed matplotlib requirements

* Added unittest for creating files

* Changed header xyz files

* changed directory name

* typo

* Make matplotlib optional

* Fixed comments

* Changed type hint

* typehint

* This version gives a circular import error. The reason is explained in a comment

* Attempts

* Fix errors

* Changed Documentation and get_figure

* Made more changes to RENO

* RENO

* reno

* Reno

* Fixed make html

* Fix sphinx

* Update releasenotes/notes/protein-folding-result-b344ac3c7f48e3ca.yaml

* Fixed bug with letters on plot

* Not plotting side chains when there are no side chains

* Fixed default values for plotting

* Make black

* Changed import order

* Fixes

* Apply suggestions from code review

Co-authored-by: Max Rossmannek <[email protected]>

* Fixed spelling and writing bug

* Changes in notebook

* Add .editorconfig (#685)

* Overwriting files

* Notebook

* Format

* Max's comments

* Fixed Notebook and changed name get_xyz_data()

* turn_sequence and tempfile

* Trailing Whitespace

* make black

* Changed notebook

* Documentation git_result...

* black

* Removed file

* Fixes

* Changed reno

* Changed capital letters on testfiles

* Update releasenotes/notes/protein-folding-result-b344ac3c7f48e3ca.yaml

Co-authored-by: dlasecki <[email protected]>
Co-authored-by: Max Rossmannek <[email protected]>
Co-authored-by: Steve Wood <[email protected]>
Co-authored-by: Manoel Marques <[email protected]>
Co-authored-by: Max Rossmannek <[email protected]>
Co-authored-by: woodsp-ibm <[email protected]>
7 people authored Jun 16, 2022
1 parent 673be9f commit d1ff719
Showing 20 changed files with 1,709 additions and 414 deletions.
6 changes: 6 additions & 0 deletions docs/apidocs/qiskit_nature.results.utils.rst
@@ -0,0 +1,6 @@
.. _qiskit_nature-results-utils:

.. automodule:: qiskit_nature.results.utils
   :no-members:
   :no-inherited-members:
   :no-special-members:
910 changes: 589 additions & 321 deletions docs/tutorials/09_Protein_Folding.ipynb

Large diffs are not rendered by default.

@@ -1,6 +1,6 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
# (C) Copyright IBM 2021, 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
@@ -10,17 +10,22 @@
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.
"""Defines a protein folding problem that can be passed to algorithms."""
from typing import Union, List
from __future__ import annotations

from qiskit.opflow import PauliSumOp, PauliOp
from typing import Union, List, TYPE_CHECKING

from qiskit.opflow import PauliSumOp, PauliOp
from qiskit.algorithms import MinimumEigensolverResult
from .peptide.peptide import Peptide
from .interactions.interaction import Interaction
from .penalty_parameters import PenaltyParameters
from .qubit_op_builder import QubitOpBuilder
from .qubit_utils import qubit_number_reducer
from ..sampling_problem import SamplingProblem

if TYPE_CHECKING:
    from qiskit_nature.results.protein_folding_result import ProteinFoldingResult


class ProteinFoldingProblem(SamplingProblem):
    """Defines a protein folding problem that can be passed to algorithms. Example initialization:
@@ -30,11 +35,8 @@ class ProteinFoldingProblem(SamplingProblem):
        penalty_terms = PenaltyParameters(15, 15, 15)
        main_chain_residue_seq = "SAASSASAAG"
        side_chain_residue_sequences = ["", "", "A", "A", "A", "A", "A", "A", "S", ""]
        peptide = Peptide(main_chain_residue_seq, side_chain_residue_sequences)
        mj_interaction = MiyazawaJerniganInteraction()
        protein_folding_problem = ProteinFoldingProblem(peptide, mj_interaction, penalty_terms)
        qubit_op = protein_folding_problem.qubit_op()
    """
@@ -88,12 +90,34 @@ def _qubit_op_full(self) -> Union[PauliOp, PauliSumOp]:
        qubit_operator = self._qubit_op_builder._build_qubit_op()
        return qubit_operator

    # TODO will be implemented in another issue, including the type hint
    def interpret(self):
        pass
    def interpret(self, raw_result: MinimumEigensolverResult) -> "ProteinFoldingResult":
        """
        Interprets the raw algorithm result, in the context of this problem, and returns a
        ProteinFoldingResult. The returned class can plot the protein and generate a
        .xyz file with the coordinates of each of its atoms.
        Args:
            raw_result: The raw result of solving the protein folding problem.
        Returns:
            A :class:`~qiskit_nature.results.ProteinFoldingResult`
            instance that contains the protein folding result.
        """
        from qiskit_nature.results import ProteinFoldingResult

        best_turn_sequence = max(raw_result.eigenstate, key=raw_result.eigenstate.get)
        return ProteinFoldingResult(
            unused_qubits=self.unused_qubits,
            peptide=self.peptide,
            turn_sequence=best_turn_sequence,
        )
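
The selection step in the new `interpret` can be sketched in isolation. This assumes the eigenstate behaves like a plain dictionary mapping bitstrings to probabilities (the diff only shows that it supports `.get`); the bitstrings and probabilities below are made-up illustration values, not output from a real run.

```python
# Hypothetical sampled eigenstate: bitstring -> probability (made-up values).
eigenstate = {
    "101100011": 0.12,
    "101100010": 0.61,
    "001100011": 0.27,
}

# max() iterates over the dictionary keys; key=eigenstate.get ranks each key
# by its associated value, so the most probable bitstring is returned.
best_turn_sequence = max(eigenstate, key=eigenstate.get)
print(best_turn_sequence)
```

If two bitstrings share the highest probability, `max` returns the first one encountered in iteration order, so the choice is then determined by how the dictionary was built.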

    @property
    def unused_qubits(self) -> List[int]:
        """Returns the list of indices for qubits in the original problem formulation that were
        removed during compression."""
        return self._unused_qubits

    @property
    def peptide(self) -> Peptide:
        """Returns the peptide defining the protein subject to the folding problem."""
        return self._peptide
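
The diff for this file imports `ProteinFoldingResult` under `TYPE_CHECKING` and again inside `interpret`, which is a common way to break a circular import. The following is a sketch of that pattern using only the standard library: `Fraction` and `half` are stand-ins chosen for illustration and are not part of Qiskit Nature, and the sketch uses a quoted annotation rather than `from __future__ import annotations`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime, so it
    # cannot trigger an import cycle when this module is loaded.
    from fractions import Fraction


def half(value: int) -> "Fraction":
    # Deferred import: resolved when the function is called, by which time
    # both modules of a would-be cycle have finished loading.
    from fractions import Fraction

    return Fraction(value, 2)


print(half(3))
```

The quoted return annotation keeps the name unresolved at definition time, which is why the module-level import can stay behind the `TYPE_CHECKING` guard.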
8 changes: 5 additions & 3 deletions qiskit_nature/problems/sampling/sampling_problem.py
@@ -1,6 +1,6 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
# (C) Copyright IBM 2021, 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
@@ -14,6 +14,9 @@
from typing import Union

from qiskit.opflow import PauliSumOp, PauliOp
from qiskit.algorithms import MinimumEigensolverResult

from qiskit_nature.results import EigenstateResult


class SamplingProblem(ABC):
@@ -24,8 +27,7 @@ def qubit_op(self) -> Union[PauliOp, PauliSumOp]:
        """Returns a qubit operator that represents a Hamiltonian encoding the sampling problem."""
        pass

    # TODO type hint will be addressed later on
    @abstractmethod
    def interpret(self):
    def interpret(self, raw_result: MinimumEigensolverResult) -> EigenstateResult:
        """Interprets results of an optimization."""
        pass
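
The effect of declaring `interpret` as an abstract method can be illustrated with a small standalone sketch. The class names `SamplingProblemSketch` and `RankedStates` are hypothetical, and a plain dictionary stands in for the `MinimumEigensolverResult` used in the diff.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SamplingProblemSketch(ABC):
    """Hypothetical stand-in for an abstract sampling problem."""

    @abstractmethod
    def interpret(self, raw_result: Dict[str, float]) -> List[str]:
        """Turns a raw result into a problem-specific answer."""


class RankedStates(SamplingProblemSketch):
    """Concrete subclass: orders bitstrings from most to least probable."""

    def interpret(self, raw_result: Dict[str, float]) -> List[str]:
        return sorted(raw_result, key=raw_result.get, reverse=True)


ranking = RankedStates().interpret({"00": 0.1, "01": 0.7, "11": 0.2})
print(ranking)
```

Instantiating `SamplingProblemSketch` directly raises `TypeError`, which is how the base class forces every concrete problem to supply its own `interpret`.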
@@ -25,11 +25,11 @@
from qiskit.utils import optionals as _optionals

if _optionals.HAS_NETWORKX:
    # pylint: disable=unused-import
    # pylint: disable=import-error,unused-import
    import networkx as nx

if _optionals.HAS_MATPLOTLIB:
    # pylint: disable=unused-import
    # pylint: disable=import-error,unused-import
    from matplotlib.axes import Axes
    from matplotlib.colors import Colormap

@@ -225,7 +225,7 @@ def _mpl(graph: PyGraph, self_loop: bool, **kwargs):
    Raises:
        MissingOptionalLibraryError: Requires matplotlib.
    """
    # pylint: disable=unused-import
    # pylint: disable=import-error,unused-import
    from matplotlib import pyplot as plt

    if not self_loop:
12 changes: 12 additions & 0 deletions qiskit_nature/results/__init__.py
@@ -19,6 +19,7 @@
Qiskit Nature results such as for electronic and vibrational structure. Algorithms
may extend these to provide algorithm specific aspects in their result.

Results
=======
@@ -31,6 +32,15 @@
   ElectronicStructureResult
   VibrationalStructureResult
   LatticeModelResult
   ProteinFoldingResult

Protein Folding Result support classes
--------------------------------------

.. autosummary::
   :toctree:

   utils
"""

@@ -39,6 +49,7 @@
from .electronic_structure_result import DipoleTuple, ElectronicStructureResult
from .vibrational_structure_result import VibrationalStructureResult
from .lattice_model_result import LatticeModelResult
from .protein_folding_result import ProteinFoldingResult

__all__ = [
    "BOPESSamplerResult",
@@ -47,4 +58,5 @@
    "ElectronicStructureResult",
    "VibrationalStructureResult",
    "LatticeModelResult",
    "ProteinFoldingResult",
]
165 changes: 141 additions & 24 deletions qiskit_nature/results/protein_folding_result.py
@@ -1,6 +1,6 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
# (C) Copyright IBM 2021, 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
@@ -9,54 +9,171 @@
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.

"""The protein folding result."""
from typing import Union

from qiskit.opflow import PauliOp
from typing import List, Optional
from qiskit.utils import optionals as _optionals
from ..problems.sampling.protein_folding.peptide.peptide import Peptide
from .eigenstate_result import EigenstateResult
from .utils.protein_plotter import ProteinPlotter
from .utils.protein_shape_decoder import ProteinShapeDecoder
from .utils.protein_shape_file_gen import ProteinShapeFileGen


from qiskit_nature.problems.sampling.protein_folding.protein_folding_problem import (
    ProteinFoldingProblem,
)
from qiskit_nature.results import EigenstateResult
if _optionals.HAS_MATPLOTLIB:
    # pylint: disable=import-error,unused-import
    from matplotlib.pyplot import figure


class ProteinFoldingResult(EigenstateResult):
    """The protein folding result."""
    """
    The Protein Folding Result.
    This class interprets a bitstring encoding the turns of a protein from
    :class:`~qiskit_nature.problems.sampling.protein_folding_problem.ProteinFoldingProblem`
    and decodes it. One can generate a .xyz file
    (using :meth:`~qiskit_nature.results.ProteinFoldingResult.save_xyz_file`), which is a file
    containing the cartesian coordinates of each atom in the protein. This kind of file can be
    used with other software to generate plots of the molecule.
    Alternatively, one can use :meth:`~qiskit_nature.results.ProteinFoldingResult.get_figure`.
    Note that `matplotlib` needs to be installed in order to generate such a figure.
    """

    def __init__(
        self,
        protein_folding_problem: ProteinFoldingProblem,
        best_sequence: Union[str, PauliOp],
        peptide: Peptide,
        unused_qubits: List[int],
        turn_sequence: str,
    ) -> None:
        """
        Args:
            peptide: The peptide defining the protein subject to the folding problem.
            unused_qubits: The list of indices for qubits in the original problem formulation that were
                removed during compression.
            turn_sequence: The bit sequence encoding the turns of the shape of the protein.
        """
        super().__init__()
        self._protein_folding_problem = protein_folding_problem
        self._best_sequence: str = best_sequence

        self._turn_sequence = turn_sequence
        self._unused_qubits = unused_qubits
        self._peptide = peptide
        self._main_chain_length = len(self._peptide.get_main_chain.main_chain_residue_sequence)
        self._side_chain_hot_vector = self._peptide.get_side_chain_hot_vector()

        self._protein_shape_decoder = ProteinShapeDecoder(
            turn_sequence=self._turn_sequence,
            side_chain_hot_vector=self._side_chain_hot_vector,
            fifth_bit=5 in self._unused_qubits[:6],
        )

        self._protein_shape_file_gen = ProteinShapeFileGen(
            self.protein_shape_decoder.main_turns,
            self.protein_shape_decoder.side_turns,
            self._peptide,
        )

    @property
    def protein_shape_decoder(self) -> ProteinShapeDecoder:
        """Returns the :class:`ProteinShapeDecoder` of the result.
        This class will interpret the result bitstring and return the encoded information."""
        return self._protein_shape_decoder

    @property
    def protein_folding_problem(self) -> ProteinFoldingProblem:
        """Returns the protein folding problem."""
        return self._protein_folding_problem
    def protein_shape_file_gen(self) -> ProteinShapeFileGen:
        """Returns the :class:`ProteinShapeFileGen` of the result."""
        return self._protein_shape_file_gen

    @property
    def best_sequence(self) -> str:
    def turn_sequence(self) -> str:
        """Returns the best sequence."""
        return self._best_sequence
        return self._turn_sequence

    def get_result_binary_vector(self) -> str:
        """Returns a string that encodes a solution of the ProteinFoldingProblem.
        The ProteinFoldingProblem uses a compressed optimization problem that does not match the
        """Returns a string that encodes a solution of the
        :class:`~qiskit_nature.problems.sampling.protein_folding_problem.ProteinFoldingProblem`.
        The :class:`~qiskit_nature.problems.sampling.protein_folding_problem.ProteinFoldingProblem`
        uses a compressed optimization problem that does not match the
        number of qubits in the original objective function. This method calculates the original
        version of the solution vector. Bits that can take any value without changing the
        solution are denoted by '*'."""
        unused_qubits = self._protein_folding_problem.unused_qubits
        solution are denoted by '_'.

        This string is read from right to left, and every pair of bits encodes a turn ranging from 0
        to 3:

        * The first 4 bits correspond to the first 2 turns in the sequence. These 2 turns can
          arbitrarily be set to any value due to rotation symmetry. Therefore the first 4 bits will
          be unused.
        * If there is no secondary chain going out from the 2nd bead in the main chain, another
          symmetry argument makes it such that the 3rd turn has effectively only 2 options. Therefore
          the 5th qubit can sometimes be unused as well.
        * The remaining pairs of qubits will encode the remaining turns of the main chain and then
          the turns of the secondary chains in that order.

        Example: In the context of a protein of length 5 with secondary chains in the 2nd and 4th
        position ``10110110`` encodes the most efficient configuration. We start by flipping the
        string and pairing up the bits ``01-10-11-01``. Note that in this case we have an even number
        of bits. This is only due to the fact that we have a secondary chain in the second position.
        Since the first 2 turns on the main chain were arbitrarily set (in Qiskit we chose to set
        them to ``[1,0]`` respectively) the sequence of turns in the main chain is ``[0,1,1,2]``.
        The remaining pairs of bits indicate that the turns from the secondary chains in the 2nd
        and 4th position are ``3`` and ``1`` respectively.

        For more information see: `<https://doi.org/10.1038/s41534-021-00368-4>`__.
        """
        unused_qubits = self._unused_qubits
        result = []
        offset = 0
        size = len(self._best_sequence)
        size = len(self._turn_sequence)
        for i in range(size):
            index = size - 1 - i
            while i + offset in unused_qubits:
                result.append("*")
                result.append("_")
                offset += 1
            result.append(self._best_sequence[index])
            result.append(self._turn_sequence[index])

        return "".join(result[::-1])
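
A standalone sketch of the two steps described in this method: re-inserting placeholders for the unused qubits, and the flip-and-pair reading from the docstring example. The function names `expand_bitstring` and `pair_turns` are hypothetical, the first input is made up for illustration, and `pair_turns` only mirrors the docstring's pairing step; it is not the `ProteinShapeDecoder` logic, which this diff does not show.

```python
from typing import List


def expand_bitstring(turn_sequence: str, unused_qubits: List[int]) -> str:
    """Mirrors the loop above: walk the compressed string from the right and
    emit '_' for every qubit index that was removed during compression."""
    result = []
    offset = 0
    size = len(turn_sequence)
    for i in range(size):
        index = size - 1 - i  # position counted from the right-hand end
        while i + offset in unused_qubits:
            result.append("_")
            offset += 1
        result.append(turn_sequence[index])
    return "".join(result[::-1])


def pair_turns(bitstring: str) -> List[int]:
    """Flip the string, then read each pair of bits as an integer turn."""
    flipped = bitstring[::-1]
    return [int(flipped[i : i + 2], 2) for i in range(0, len(flipped), 2)]


# Made-up compressed result with qubits 0-3 and 5 removed.
expanded = expand_bitstring("101", [0, 1, 2, 3, 5])
# Bitstring taken from the docstring example above.
turns = pair_turns("10110110")
print(expanded, turns)
```

For the docstring's bitstring the pairs come out as 1, 2, 3, 1: the first two are the free main-chain turns and the last two are the side-chain turns quoted in the example.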

    def save_xyz_file(
        self, name: Optional[str] = None, path: str = "", comment: str = "", replace: bool = False
    ) -> None:

        """
        Generates a .xyz file.
        Args:
            name: Name of the file to be generated. If the name is ``None`` the
                name of the file will be the letters of the amino acids on the main_chain.
                If a file of the same name already exists then the action taken is dependent
                on the `replace` arg.
            path: Path where the file will be generated. If left empty the file will
                be saved in the working directory.
            comment: Comment to be added to the second line of the file. By default, the line will
                be left blank.
            replace: If ``True``, the file will be overwritten if it already exists.
        Raises:
            FileExistsError: If the file already exists and replace is ``False``.
        """
        if name is None:
            name = str(self._peptide.get_main_chain.main_chain_residue_sequence)
        self.protein_shape_file_gen.save_xyz_file(
            name=name, path=path, comment=comment, replace=replace
        )
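
The behaviour documented for `save_xyz_file` (a comment on the second line, `FileExistsError` unless `replace` is set) can be sketched with the standard library alone. This is an assumption-laden illustration of the conventional XYZ layout (atom count, comment line, then one labelled coordinate row per atom); the helper `write_xyz`, its parameters, and the sample labels and coordinates are hypothetical and are not the `ProteinShapeFileGen` implementation, which this diff does not show.

```python
import os
import tempfile
from typing import Sequence, Tuple


def write_xyz(
    path: str,
    labels: Sequence[str],
    coords: Sequence[Tuple[float, float, float]],
    comment: str = "",
    replace: bool = False,
) -> None:
    """Hypothetical writer for the conventional XYZ text layout."""
    if len(labels) != len(coords):
        raise ValueError("labels and coords must have the same length")
    # Mode "x" makes open() raise FileExistsError if the file already exists.
    mode = "w" if replace else "x"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(f"{len(labels)}\n")  # line 1: number of atoms
        handle.write(f"{comment}\n")  # line 2: free-form comment
        for label, (x, y, z) in zip(labels, coords):
            handle.write(f"{label} {x:.6f} {y:.6f} {z:.6f}\n")


# Made-up two-bead example written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "demo.xyz")
    write_xyz(target, ["S", "A"], [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], comment="demo")
    with open(target, encoding="utf-8") as handle:
        demo_lines = handle.read().splitlines()
print(demo_lines)
```

Using the exclusive-creation mode keeps the existence check and the write in a single step, so there is no window in which another process could create the file between the two.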

    @_optionals.HAS_MATPLOTLIB.require_in_call
    def get_figure(
        self, title: str = "Protein Structure", ticks: bool = False, grid: bool = False
    ) -> "figure":
        """
        Generates a figure of the molecule in 3D.
        Args:
            title: The title of the plot.
            ticks: Boolean for showing ticks in the graphic.
            grid: Boolean for showing the grid in the graphic.
        Returns:
            A figure with the folded protein.
        """
        return ProteinPlotter(self.protein_shape_file_gen).get_figure(
            title=title, ticks=ticks, grid=grid
        )
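
The idea behind guarding `get_figure` with `require_in_call` is that a missing optional dependency should fail only when the plotting method is actually called. A rough standard-library analogue is sketched below; the decorator `require_module`, the demo functions, and the module name assumed to be absent are all hypothetical, and this is not how Qiskit's `optionals` machinery is implemented.

```python
import functools
import importlib.util


def require_module(module_name: str):
    """Hypothetical decorator: raise at call time if `module_name` is missing."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be located.
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"'{module_name}' is required to call {func.__name__}()"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_module("json")  # standard-library module, always available
def available() -> str:
    return "ok"


@require_module("surely_missing_module_for_demo")  # assumed not to be installed
def unavailable() -> str:
    return "never reached"


print(available())
```

Because the check runs inside the wrapper, importing a module that defines `unavailable` still succeeds; only calling it raises `ImportError`.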