automatic official FASTA file fetching, several new utility functions related to structure including flexible 3d alignment that supports different length chains to be aligned! #101

YoelShoshan · 2024-02-21T08:41:48Z

No description provided.

mosheraboh

Looks great.
Minor comments inline.

mosheraboh · 2024-05-26T06:36:16Z

fusedrug/data/protein/structure/extract_chains_to_pdbs.py

+from typing import Optional
+
+
+def main(


What is this script for? Please explain it in a comment or docstring.

Added this description to the docstring:

"Takes an input PDB file and splits it into separate files, one per described chain, allowing the chains to be renamed if desired"
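For readers unfamiliar with the idea, here is a minimal sketch of the behavior that docstring describes. It is not the code in extract_chains_to_pdbs.py; the function names `split_pdb_by_chain` and `write_chains` and the `rename` mapping are hypothetical. It relies only on the fixed-column PDB format, where the chain ID of an ATOM/HETATM record sits in column 22.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional


def split_pdb_by_chain(
    pdb_text: str,
    rename: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Group ATOM/HETATM records by chain ID, optionally renaming chains.

    Hypothetical helper: `rename` maps an original chain ID to a new one.
    Returns a mapping from (possibly renamed) chain ID to PDB-formatted text.
    """
    rename = rename or {}
    per_chain: Dict[str, List[str]] = defaultdict(list)
    for line in pdb_text.splitlines():
        # Only coordinate records carry a chain ID in column 22 (index 21).
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 22:
            continue
        old_id = line[21]
        new_id = rename.get(old_id, old_id)
        per_chain[new_id].append(line[:21] + new_id + line[22:])
    return {cid: "\n".join(lines) + "\nEND\n" for cid, lines in per_chain.items()}


def write_chains(chains: Dict[str, str], out_dir: str, prefix: str) -> List[Path]:
    """Write one file per chain, named <prefix>_<chain>.pdb (naming is an assumption)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for cid, text in chains.items():
        path = out / f"{prefix}_{cid}.pdb"
        path.write_text(text)
        paths.append(path)
    return paths
```

A call such as `split_pdb_by_chain(text, rename={"B": "H"})` would emit chain B's atoms under the ID H. The sketch ignores multi-model files, TER records, and mmCIF input, all of which a real tool would need to handle.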

mosheraboh · 2024-05-26T06:38:28Z

fusedrug/data/protein/structure/flexible_align_chains_structure.py

+        mask=None,  # TODO: check
+    )
+
+    # apply_on_atom_pos = apply_rigid_on_dynamic_concat['atom_positions']


Do you need the commented-out code below?

mosheraboh · 2024-05-26T06:39:32Z

fusedrug/data/protein/structure/protein_complex.py

@@ -22,18 +22,26 @@ def __init__(self, verbose: bool = True) -> None:
        self.chains_data = {}  # maps from chain description (e.g. ('7vux', 'A')) to
        self.flattened_data = {}

+        #
+        self.per_chain_most_frequent_residue_part = {}


Is the key chain_id?

Each key is a tuple of the form (pdb_id, chain_id).

I added a comment above it describing this.

mosheraboh · 2024-05-26T06:40:25Z

fusedrug/data/protein/structure/protein_complex.py

    def add(
        self,
-        pdb_id: str,
+        pdb_id_or_filename: str,
+        pdb_id: Optional[str] = None,


Why do you keep pdb_id? Is backward compatibility important here?

pdb_id_or_filename may be a path to a local PDB file, so pdb_id lets the user specify the PDB ID explicitly, since there is currently no implemented way to extract it automatically.

I'll look into ways to extract it automatically from the PDB file itself, and check whether it's important to let the user override that.

That will be a separate PR.

As for your question: backward compatibility isn't critical here, since we can likely find all usage locations and port them to the new signature.
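One possible direction for that follow-up, sketched under stated assumptions: in the legacy fixed-column PDB format, the HEADER record stores the 4-character ID code in columns 63-66. The helper name `infer_pdb_id` is hypothetical and is not part of this PR; lowercasing the extracted ID is also an assumption, chosen to match keys such as ('7vux', 'A').

```python
from pathlib import Path
from typing import Optional


def infer_pdb_id(pdb_id_or_filename: str, pdb_id: Optional[str] = None) -> Optional[str]:
    """Resolve a PDB ID, preferring an explicit user-supplied value.

    Hypothetical helper, not the PR's implementation.
    """
    if pdb_id is not None:
        # An explicit pdb_id always wins.
        return pdb_id
    path = Path(pdb_id_or_filename)
    if not path.is_file():
        # Not a local file: assume the string is already a PDB ID.
        return pdb_id_or_filename
    with path.open() as handle:
        for line in handle:
            if line.startswith("HEADER"):
                # Columns 63-66 (1-based) hold the ID code.
                id_code = line[62:66].strip()
                if id_code:
                    return id_code.lower()
                break
    # No usable HEADER record; the caller must supply pdb_id.
    return None
```

Locally generated or predicted structures often lack a HEADER record, which is one reason an explicit override may still be worth keeping. The sketch does not cover mmCIF or compressed files.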

mosheraboh · 2024-05-26T06:54:04Z

fusedrug/data/protein/structure/structure_io.py

@@ -174,7 +176,7 @@ def load_protein_structure_features(
    chain_id_type: str = "author_assigned",
    device: str = "cpu",
    max_allowed_file_size_mbs: float = None,
-    also_return_mmcif_object: bool = False,
+    # also_return_mmcif_object: bool = False,


mosheraboh · 2024-05-26T06:54:55Z

fusedrug/data/protein/structure/structure_io.py

    ):
        if not m_res:
            continue
        aa_idx = aa_idx.item()
-        p_res = p_res.clone().detach().cpu()  # fixme: this looks slow
-        if aa_idx == 21:
+        # if torch.is_tensor(p_res):


YoelShoshan added 30 commits March 8, 2023 05:30

adding deduplication and cluster generation generic tool

7563e3e

renamed few arguments

bef1c25

...

7140781

PR comments

68647da

printing key generating output files in cluster

3011c2c

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

8647a06

static code checker fixes

b3bcd11

reduced dependencies and did some cleanup

1485738

added visualizations utils for antibodies

91a98fd

static code check fixes

f77178b

black mypy flake8 fixes

a7bccd3

dealing with large fasta files

8db8791

returning a consistent amount of elements in tuple

2305539

when clustering with mmseqs2, now also outputting a FASTA file with the clusters centers/representatives including the sequence. this is useful for create splits etc.

8705282

moved all mmseqs DB to a workspace to avoid clutter

19bd0fe

renamed

bed6a90

solved conflicts

9a0b4a4

better conflict merge

bfe4ba5

added splitting based on cluster.tsv

5da1694

better docstring

9aeeeb2

static checkers fixes

a3f6033

PR coments

41546df

balanced sampling and mmap lines reader

56fafdd

...

8579b30

solved a flipped file creations in cluster_using_mmseqs and refactoring few parts in structure_io

50c3dc8

added proper caching of return answer from cluster

63f1cb4

solved conflict

c31a5f1

splits and clusters

075b4c8

pdb clustering related and also adding requirements

1fa7507

pdb prepare_data related

cb01191

YoelShoshan and others added 25 commits January 10, 2024 06:43

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

c4d9c65

...

769b249

advancing on flexible multi chain alignment

35fa988

flexible alignment

d808fca

flexible structure alignment is working well!

fcceb44

...

2b8fdca

...

25600ab

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

fe6105e

flexible align on multiple with table as input

7ddd58e

advanced on flexible align

e4b76bc

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

00b17f5

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

e9f6140

flexible multiple align and also extracting and optionally renaming chains from pdbs

ae667b6

...

1307305

supporting saving pdb when given atom37 pos as input

8a280b3

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

70d722d

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

fb0cfe1

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

6adcb61

complex protein

72ca1f0

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

7c22f43

...

a635acc

Merge branch 'yoels' of https://github.com/BiomedSciAI/fuse-drug into yoels

77f27ad

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

87e3ff5

static code tests fixes

34c865a

fixed handling of bfactors - they are per atom, not per residue!

72b2328

YoelShoshan requested a review from mosheraboh May 5, 2024 06:08

Merge branch 'main' of https://github.com/BiomedSciAI/fuse-drug into yoels

a0a6c02

mosheraboh approved these changes May 26, 2024

PR comments

43a9a34

YoelShoshan merged commit 4a1208a into main May 26, 2024
5 checks passed
