Merge pull request #10 from Fudenberg-Research-Group/updates

Updates
Fudenberg-Research-Group · Jan 20, 2025 · 43db1a7 · 43db1a7
2 parents 7c1e829 + 9702a02
commit 43db1a7
Showing 5 changed files with 63 additions and 156 deletions.
diff --git a/README.md b/README.md
@@ -1,85 +1,54 @@
+# Chromoscores
 
-# Python Project Template
+![Alt Text](./docs/representations.png)
 
-A low dependency and really simple to start project template for Python Projects.
+A Python package for quantitative analysis of simulated Hi-C maps, providing tools to capture, process, and evaluate chromatin interaction patterns such as Topologically Associating Domains (TADs), flames, and peaks.
 
-See also 
-- [Flask-Project-Template](https://github.com/rochacbruno/flask-project-template/) for a full feature Flask project including database, API, admin interface, etc.
-- [FastAPI-Project-Template](https://github.com/rochacbruno/fastapi-project-template/) The base to start an openapi project featuring: SQLModel, Typer, FastAPI, JWT Token Auth, Interactive Shell, Management Commands.
 
-### HOW TO USE THIS TEMPLATE
+### Requirement 📃
+- numpy
 
-> **DO NOT FORK** this is meant to be used from **[Use this template](https://github.com/rochacbruno/python-project-template/generate)** feature.
+
+### Structure of the repository
+The repository is organized as follows:
+- maputils: functions for processing maps, such as computing observed over expected or piling up snippets with specific features.
+- scorefunctions: functions for quantitative analysis of features.
+- snipping: functions for capturing snippets containing specific features.
+- analysis: notebooks and code as tutorials for analyzing simulated data.
 
-1. Click on **[Use this template](https://github.com/rochacbruno/python-project-template/generate)**
-3. Give a name to your project  
-   (e.g. `my_awesome_project` recommendation is to use all lowercase and underscores separation for repo names.)
-3. Wait until the first run of CI finishes  
-   (Github Actions will process the template and commit to your new repo)
-4. If you want [codecov](https://about.codecov.io/sign-up/) Reports and Automatic Release to [PyPI](https://pypi.org)  
-  On the new repository `settings->secrets` add your `PYPI_API_TOKEN` and `CODECOV_TOKEN` (get the tokens on respective websites)
-4. Read the file [CONTRIBUTING.md](CONTRIBUTING.md)
-5. Then clone your new project and happy coding!
-
-> **NOTE**: **WAIT** until first CI run on github actions before cloning your new project.
-
-### What is included on this template?
-
-- 🖼️ Templates for starting multiple application types:
-  * **Basic low dependency** Python program (default) [use this template](https://github.com/rochacbruno/python-project-template/generate)
-  * **Flask** with database, admin interface, restapi and authentication [use this template](https://github.com/rochacbruno/flask-project-template/generate).
-  **or Run `make init` after cloning to generate a new project based on a template.**
-- 📦 A basic [setup.py](setup.py) file to provide installation, packaging and distribution for your project.  
-  Template uses setuptools because it's the de-facto standard for Python packages, you can run `make switch-to-poetry` later if you want.
-- 🤖 A [Makefile](Makefile) with the most useful commands to install, test, lint, format and release your project.
-- 📃 Documentation structure using [mkdocs](http://www.mkdocs.org)
-- 💬 Auto generation of change log using **gitchangelog** to keep a HISTORY.md file automatically based on your commit history on every release.
-- 🐋 A simple [Containerfile](Containerfile) to build a container image for your project.  
-  `Containerfile` is a more open standard for building container images than Dockerfile, you can use buildah or docker with this file.
-- 🧪 Testing structure using [pytest](https://docs.pytest.org/en/latest/)
-- ✅ Code linting using [flake8](https://flake8.pycqa.org/en/latest/)
-- 📊 Code coverage reports using [codecov](https://about.codecov.io/sign-up/)
-- 🛳️ Automatic release to [PyPI](https://pypi.org) using [twine](https://twine.readthedocs.io/en/latest/) and github actions.
-- 🎯 Entry points to execute your program using `python -m <chromoscores>` or `$ chromoscores` with basic CLI argument parsing.
-- 🔄 Continuous integration using [Github Actions](.github/workflows/) with jobs to lint, test and release your project on Linux, Mac and Windows environments.
-
-> Curious about architectural decisions on this template? read [ABOUT_THIS_TEMPLATE.md](ABOUT_THIS_TEMPLATE.md)  
-> If you want to contribute to this template please open an [issue](https://github.com/rochacbruno/python-project-template/issues) or fork and send a PULL REQUEST.
-
-[❤️ Sponsor this project](https://github.com/sponsors/rochacbruno/)
-
-<!--  DELETE THE LINES ABOVE THIS AND WRITE YOUR PROJECT README BELOW -->
-
----
-# chromoscores
-
-[![codecov](https://codecov.io/gh/Fudenberg-Research-Group/chromoscores/branch/main/graph/badge.svg?token=chromoscores_token_here)](https://codecov.io/gh/Fudenberg-Research-Group/chromoscores)
-[![CI](https://github.com/Fudenberg-Research-Group/chromoscores/actions/workflows/main.yml/badge.svg)](https://github.com/Fudenberg-Research-Group/chromoscores/actions/workflows/main.yml)
-
-Awesome chromoscores created by Fudenberg-Research-Group
-
-## Install it from PyPI
+
+### Installation 📦
+First, 
 
+```
+git clone https://github.com/Fudenberg-Research-Group/chromoscores.git
+```
+then
 ```bash
 pip install chromoscores
 ```
 
 ## Usage
 
 ```py
-from chromoscores import BaseClass
 from chromoscores import base_function
-
-BaseClass().base_method()
-base_function()
 ```
 
-```bash
-$ python -m chromoscores
-#or
-$ chromoscores
-```
+### Analysis 📊
+Observable features can be quantified, including:
+
+- Observed over expected
+- TADs (Topologically Associating Domains)
+- flames
+- Dots (loops between barriers)
+
+
+See tutorials in `./jupyter_notebooks`.
+
+
+
+[![codecov](https://codecov.io/gh/Fudenberg-Research-Group/chromoscores/branch/main/graph/badge.svg?token=chromoscores_token_here)](https://codecov.io/gh/Fudenberg-Research-Group/chromoscores)
+[![CI](https://github.com/Fudenberg-Research-Group/chromoscores/actions/workflows/main.yml/badge.svg)](https://github.com/Fudenberg-Research-Group/chromoscores/actions/workflows/main.yml)
+
 
-## Development
 
-Read the [CONTRIBUTING.md](CONTRIBUTING.md) file.
diff --git a/chromoscores/maputils.py b/chromoscores/maputils.py
@@ -5,13 +5,13 @@ def get_diagonal_pileup(contact_map, boundary_list, window_size = 10):
     """
     parameters
     ----------
-    contact_map: contact map
-    boundary_list: list of the boundary elements positions on the diagonal
-    window_size: size of the window
+    contact_map: contact map (2D array)
+    boundary_list: list of the boundary elements' positions on the diagonal
+    window_size: size of the window (must be odd for center)
 
     Returns
     -------
-    a stackup of snippts around the boundary elements
+    a stackup of snippets around the boundary elements
     """
 
     if window_size <= 0 or window_size > len(contact_map):
@@ -75,9 +75,7 @@ def get_offdiagonal_pileup_binlist(
     ----------
     contact_map: contact map
     boundary_list: list of the boundary elements positions on the diagonal
-    min_dist: minimum distance from the diagonal
-    max_dist: maximum distance from the diagonal
-    bin_num: number of bins
+    binlist : exact list of bin boundaries 
     window_size: size of the window for the pileup
 
     Returns
@@ -113,15 +111,13 @@ def get_offdiagonal_pileup_binlist_orientation(
     contact_map: contact map
     boundary_list: list of the boundary elements positions on the diagonal
     orientation: list of the boundary element orientations
-    min_dist: minimum distance from the diagonal
-    max_dist: maximum distance from the diagonal
-    bin_num: number of bins
+    binlist: exact list of bin boundaries
     window_size: size of the window for the pileup
 
     Returns
     -------
-    a list of pileups as numpy arrays around the feature (e.g., peaks) as a function of distance from the diagonal
-    and orientation
+    a list of pileups as numpy arrays around the feature (e.g., peaks) as a function of distance from the diagonal,
+    orientation between barriers, and the number of snippets at each range.
     """
     bin_border_int = binlist
     bin_num = len(bin_border_int)
@@ -136,7 +132,10 @@ def get_offdiagonal_pileup_binlist_orientation(
         mat_tandn = np.zeros((window_size, window_size))
 
         dist = (bin_border_int[i] + bin_border_int[i + 1]) / 2
-
+        n_conv = 0
+        n_dive = 0
+        n_tand_p = 0
+        n_tand_n = 0
         for i_element in boundary_list:
                 for j_element in boundary_list:
                     if bin_border_int[i] <= (j_element - i_element) < bin_border_int[i + 1]:
@@ -146,98 +145,36 @@ def get_offdiagonal_pileup_binlist_orientation(
                         ]
                         if orientation[np.flatnonzero(boundary_list==np.max([i_element, j_element]))] == '+':
                             if orientation[np.flatnonzero(boundary_list==np.min([i_element, j_element]))] == '-':
+                                n_conv += 1
                                 mat_conv += contact_map[
                                     i_element - window_size // 2 : i_element + window_size // 2,
                                     j_element - window_size // 2 : j_element + window_size // 2,
                                 ]
                             else:
+                                n_tand_p += 1
                                 mat_tandp += contact_map[
                                     i_element - window_size // 2 : i_element + window_size // 2,
                                     j_element - window_size // 2 : j_element + window_size // 2,
                                 ]
                         else:
                             if orientation[np.flatnonzero(boundary_list==np.min([i_element, j_element]))] == '+':
+                                n_dive += 1
                                 mat_dive += contact_map[
                                     i_element - window_size // 2 : i_element + window_size // 2,
                                     j_element - window_size // 2 : j_element + window_size // 2,
                                 ]
                             else:
+                                n_tand_n += 1
                                 mat_tandn += contact_map[
                                     i_element - window_size // 2 : i_element + window_size // 2,
                                     j_element - window_size // 2 : j_element + window_size // 2,
                                 ]
-
-        pile_ups.extend([[['+-',dist,mat_conv],['-+',dist,mat_dive],['++',dist,mat_tandp],['--',dist,mat_tandn],['all',dist,mat]]])
+        n_tot = n_conv + n_dive + n_tand_p + n_tand_n
+        pile_ups.extend([[['+-',dist,mat_conv, n_conv],['-+',dist,mat_dive, n_dive],['++',dist,mat_tandp, n_tand_p],['--',dist,mat_tandn, n_tand_n],['all',dist,mat, n_tot]]])
 
     return pile_ups
 
 
-
-
-def get_offdiagonal_pileup_orientation(contact_map, boundary_list, orientation, binlist, window_size=10):
-    """
-    Parameters
-    ----------
-    contact_map : np.array
-        Contact map.
-    boundary_list : list
-        List of the boundary elements positions on the diagonal.
-    orientation : list
-        List of the boundary element orientations.
-    binlist : list
-        List of bin edges.
-    window_size : int, optional
-        Size of the window for the pileup (default is 10).
-
-    Returns
-    -------
-    pile_ups : list
-        A list of pileups as numpy arrays around the feature (e.g., peaks) as a function of distance from the diagonal.
-    """
-
-    bin_num = len(binlist)
-
-    # Initialize matrices for storing pileups
-    pile_ups = []
-
-    for i in range(bin_num - 1):
-        dist = (binlist[i] + binlist[i + 1]) / 2
-
-        mat = np.zeros((window_size, window_size))
-        mat_conv = np.zeros((window_size, window_size))
-        mat_dive = np.zeros((window_size, window_size))
-        mat_tandp = np.zeros((window_size, window_size))
-        mat_tandn = np.zeros((window_size, window_size))
-
-        for i_element in boundary_list:
-            for j_element in boundary_list:
-                if binlist[i] <= (j_element - i_element) < binlist[i + 1]:
-                    window_i_start = i_element - window_size // 2
-                    window_i_end = i_element + window_size // 2
-                    window_j_start = j_element - window_size // 2
-                    window_j_end = j_element + window_size // 2
-
-                    mat += contact_map[window_i_start:window_i_end, window_j_start:window_j_end]
-
-                    max_orientation = orientation[np.flatnonzero(boundary_list == max(i_element, j_element))]
-                    min_orientation = orientation[np.flatnonzero(boundary_list == min(i_element, j_element))]
-
-                    if max_orientation == '+':
-                        if min_orientation == '-':
-                            mat_conv += contact_map[window_i_start:window_i_end, window_j_start:window_j_end]
-                        else:
-                            mat_tandp += contact_map[window_i_start:window_i_end, window_j_start:window_j_end]
-                    else:
-                        if min_orientation == '+':
-                            mat_dive += contact_map[window_i_start:window_i_end, window_j_start:window_j_end]
-                        else:
-                            mat_tandn += contact_map[window_i_start:window_i_end, window_j_start:window_j_end]
-
-        pile_ups.append([['+-', dist, mat_conv], ['-+', dist, mat_dive], ['++', dist, mat_tandp], ['--', dist, mat_tandn], ['all', dist, mat]])
-
-    return pile_ups
-
-
 def get_observed_over_expected(contact_map):
     """
     parameters
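
Note on the new return value of `get_offdiagonal_pileup_binlist_orientation`: each entry now carries a snippet count next to the summed matrix, so callers can convert sums into per-snippet averages. The sketch below is one hedged way to do that. It assumes only the `[label, dist, matrix, count]` layout visible in this diff; the helper name `average_pileups` and the toy input are hypothetical and not part of the package.

```python
import numpy as np


def average_pileups(pile_ups):
    """Convert summed pileups into per-snippet averages.

    Assumes each distance bin is a list of entries shaped like
    [label, dist, summed_matrix, n_snippets], as in the updated return value.
    (Hypothetical helper, not part of chromoscores.)
    """
    averaged = []
    for per_bin in pile_ups:
        bin_result = {}
        for label, dist, mat, n in per_bin:
            # Categories with no snippets become NaN instead of dividing by zero.
            if n > 0:
                mean = mat / n
            else:
                mean = np.full_like(mat, np.nan, dtype=float)
            bin_result[label] = (dist, mean, n)
        averaged.append(bin_result)
    return averaged


# Toy input with one distance bin; the values are made up for illustration.
toy = [[
    ['+-', 15.0, np.full((4, 4), 6.0), 3],
    ['-+', 15.0, np.zeros((4, 4)), 0],
    ['++', 15.0, np.full((4, 4), 2.0), 1],
    ['--', 15.0, np.zeros((4, 4)), 0],
    ['all', 15.0, np.full((4, 4), 8.0), 4],
]]
result = average_pileups(toy)
```

Returning NaN for empty categories is a design choice of this sketch: it keeps "no snippets in this range" distinguishable from a genuine zero-contact average.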

diff --git a/chromoscores/scorefunctions.py b/chromoscores/scorefunctions.py
@@ -5,7 +5,7 @@
 
 
 def peak_score_upperRight(
-    peak_snippet, peak_width=3, background_width=10, pseudo_count=0
+    peak_snippet, peak_width = 3, background_width = 10, pseudo_count = 0
 ):
     """
     parameters
@@ -41,7 +41,7 @@ def peak_score_upperRight(
 
 
 def peak_score_lowerRight(
-    peak_snippet, peak_width=3, background_width=10, pseudo_count=0
+    peak_snippet, peak_width = 3, background_width = 10, pseudo_count = 0
 ):
     """
     parameters
@@ -77,7 +77,7 @@ def peak_score_lowerRight(
 
 
 def peak_score_upperLeft(
-    peak_snippet, peak_width=3, background_width=10, pseudo_count=0
+    peak_snippet, peak_width = 3, background_width = 10, pseudo_count = 0
 ):
     """
     parameters
@@ -113,7 +113,7 @@ def peak_score_upperLeft(
 
 
 def peak_score_lowerLeft(
-    peak_snippet, peak_width=3, background_width=10, pseudo_count=0
+    peak_snippet, peak_width = 3, background_width = 10, pseudo_count = 0
 ):
     """
     parameters
@@ -207,7 +207,7 @@ def _get_isolation_areas(contact_map, delta=1, diag_offset=3, max_distance=10, s
     delta: distance from the border between in_tad and out_tad
     diag_offset: distance of the snippet from the diagonal. This also determines the size of the snippet.
     max_distance: maximum distance from the diagonal
-    state: 1 for triangle snippets, 0 for square snippets
+    snippet_shapes: shape of the snippets for taking the average. 
 
     returns
     -------
@@ -271,7 +271,7 @@ def isolation_score(snippet, delta, diag_offset, max_dist, snippet_shapes , pseu
            flames when extracting in_tad and out_tad areas.
     diag_offset: distance from the diagonal. This also determines the size of the snippet.
     max_distance: maximum distance from the diagonal
-    state: 1 for triangle snippets, 0 for square snippets
+    snippet_shapes: shape of the snippet for taking the average
     pseudo_count: pseudo count to avoid division by zero
 
     returns
@@ -292,7 +292,7 @@ def isolation_score(snippet, delta, diag_offset, max_dist, snippet_shapes , pseu
 
 
 def flame_score_vertical(
-    flame_snippet, flame_thickness, background_thickness, pseudo_count=1
+    flame_snippet, flame_thickness, background_thickness, pseudo_count = 1
 ):
     """
     parameters
@@ -322,7 +322,7 @@ def flame_score_vertical(
 
 
 def flame_score_horizontal(
-    snippet, flame_thickness, background_thickness, pseudo_count=1
+    snippet, flame_thickness, background_thickness, pseudo_count = 1
 ):
     """
     parameters

diff --git a/chromoscores/snipping.py b/chromoscores/snipping.py
@@ -58,11 +58,12 @@ def tad_snippet_sectors(
     parameters
     ----------
     contact_map: snippet of a contact map around a boundary element
+    boundary_list: list of the boundary elements' positions on the diagonal
+    index: index of the boundary element in the boundary_list. This should be in the range of boundary_list.
     delta: distance from the border between in_tad and out_tad. is defined to exclude
            flames when extracting in_tad and out_tad areas.
     diag_offset: distance from the diagonal. This also determines the size of the snippet.
     max_distance: maximum distance from the diagonal
-    state: 1 for triangle snippets, 0 for square snippets
 
     returns
     -------

diff --git a/docs/representations.png b/docs/representations.png