Footprinting (#43)
Feat: Add a footprinting command and make improvements to pyft
mrvollger authored Mar 13, 2024
1 parent b0b9b6b commit 0429ff0
Showing 26 changed files with 1,628 additions and 94 deletions.
7 changes: 3 additions & 4 deletions .readthedocs.yaml
@@ -10,14 +10,13 @@ build:
apt_packages:
- cmake
tools:
python: "3.10"
rust: "1.70"
python: "3.12"
rust: "1.75"

python:
install:
- requirements: py-ft/docs/requirements.txt
- method: pip
path: ./py-ft

#conda:
# environment: py-ft/docs/environment.yml
# environment: py-ft/docs/environment.yml
3 changes: 3 additions & 0 deletions Cargo.toml
@@ -64,6 +64,9 @@ assert_cmd = "2.0.11"
predicates = "3.0.3"

[profile.dev]
opt-level = 0

[profile.test]
opt-level = 2

[profile.release]
56 changes: 56 additions & 0 deletions docs/footprint.md
@@ -0,0 +1,56 @@
# `ft-footprint`

## Usage
```bash
ft footprint [OPTIONS] <BAM> <BED> <YAML> <OUT>
```
The `BAM` file is an indexed Fiber-seq BAM file.

The `BED` file lists the motifs you'd like to test for footprints. It should include the strand of each motif.

The `YAML` file describes the modules within the motif that can be footprinted. For example, a `yaml` file for CTCF, which has multiple binding sites, might look like:
```yaml
modules:
- [0, 8]
- [8, 16]
- [16, 23]
- [23, 29]
- [29, 35]
```
Modules must start at zero, end at the length of the motif, be sorted, and be contiguous with one another. At most 15 modules are allowed, and the intervals are 0-based, half-open (like `BED`).
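The module rules above can be checked before running `ft footprint`. Below is a minimal sketch in Python, assuming the modules have already been loaded from the YAML into a list of `[start, end]` pairs and that the motif length is known (for example, from the BED interval). `validate_modules` is a hypothetical helper, not part of fibertools-rs or pyft, and rejecting zero-length modules is this sketch's own assumption.

```python
from typing import List, Sequence

MAX_MODULES = 15  # limit stated in the ft-footprint docs


def validate_modules(modules: Sequence[Sequence[int]], motif_length: int) -> List[str]:
    """Return a list of problems with a module list (hypothetical helper).

    Each module is a 0-based, half-open [start, end) interval, as in BED.
    An empty result means the modules satisfy the documented rules.
    """
    if not modules:
        return ["no modules given"]
    problems: List[str] = []
    if len(modules) > MAX_MODULES:
        problems.append(f"{len(modules)} modules given; at most {MAX_MODULES} are allowed")
    if modules[0][0] != 0:
        problems.append("first module must start at 0")
    if modules[-1][1] != motif_length:
        problems.append(f"last module must end at the motif length ({motif_length})")
    for i, (start, end) in enumerate(modules):
        # Assumption: zero-length or reversed modules are invalid; together with
        # the contiguity check below this also guarantees the modules are sorted.
        if start >= end:
            problems.append(f"module {i} [{start}, {end}) is empty or reversed")
    for i in range(1, len(modules)):
        prev_end = modules[i - 1][1]
        start = modules[i][0]
        # Contiguous: each module starts exactly where the previous one ends.
        if start != prev_end:
            problems.append(f"module {i} starts at {start}, but module {i - 1} ends at {prev_end}")
    return problems
```

For the CTCF example above with a 35 bp motif, `validate_modules([[0, 8], [8, 16], [16, 23], [23, 29], [29, 35]], 35)` returns an empty list.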

## Description of output columns

The footprinting output is a tab-separated table with one row per entry in the input BED file and the following columns:


| Column | Description |
| -------------------- | ------------------------------------------------------------------ |
| chrom | Chromosome |
| start | The start position of the motif |
| end | The end position of the motif |
| strand | The strand of the motif. |
| n_spanning_fibers | The number of fibers that span the motif. |
| n_spanning_msps | The number of MSPs that span the motif. |
| n_overlapping_nucs | The number of fibers that have an intersecting nucleosome. |
| module_X | The number of fibers that are footprinted in module X. The number of module columns is determined by the footprinting yaml. |
| footprint_codes | Comma-separated list of footprint codes, one per fiber. See details below. |
| fire_quals | Comma-separated list of FIRE qualities, one per fiber; -1 if the fiber has no MSP spanning the motif. Note that all fire_quals will be 0 or -1 if FIRE has not been applied to the BAM. |
| fiber_names | Comma-separated list of the names of fibers that span the motif. Names are in the same order as footprint_codes and fire_quals, so they can be matched by index. |
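Because the last three columns are parallel comma-separated lists, a row can be unpacked into per-fiber records. The sketch below is a hypothetical parser, not part of pyft. It assumes the columns appear in the order listed in the table, that every column between `n_overlapping_nucs` and `footprint_codes` is a `module_X` count, that the file has no header line (skip it first if yours does), and that an empty list field means no fibers.

```python
from dataclasses import dataclass
from typing import List

N_LEADING = 7   # chrom, start, end, strand, n_spanning_fibers, n_spanning_msps, n_overlapping_nucs
N_TRAILING = 3  # footprint_codes, fire_quals, fiber_names


@dataclass
class FootprintRow:
    chrom: str
    start: int
    end: int
    strand: str
    n_spanning_fibers: int
    n_spanning_msps: int
    n_overlapping_nucs: int
    module_counts: List[int]
    footprint_codes: List[int]
    fire_quals: List[float]
    fiber_names: List[str]


def _split_list(field: str) -> List[str]:
    # Assumption: an empty field means no fibers; the real output may differ.
    return [x for x in field.split(",") if x]


def parse_footprint_line(line: str) -> FootprintRow:
    """Parse one tab-separated output row (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < N_LEADING + N_TRAILING:
        raise ValueError(f"expected at least {N_LEADING + N_TRAILING} columns, got {len(fields)}")
    lead = fields[:N_LEADING]
    modules = fields[N_LEADING:len(fields) - N_TRAILING]  # the module_X columns
    codes, quals, names = fields[len(fields) - N_TRAILING:]
    return FootprintRow(
        chrom=lead[0],
        start=int(lead[1]),
        end=int(lead[2]),
        strand=lead[3],
        n_spanning_fibers=int(lead[4]),
        n_spanning_msps=int(lead[5]),
        n_overlapping_nucs=int(lead[6]),
        module_counts=[int(x) for x in modules],
        footprint_codes=[int(x) for x in _split_list(codes)],
        fire_quals=[float(x) for x in _split_list(quals)],
        fiber_names=_split_list(names),
    )
```

With a parsed row, `zip(row.fiber_names, row.footprint_codes, row.fire_quals)` yields one `(name, code, quality)` tuple per fiber.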

## Footprint codes
Each footprint code is a bit flag, similar to the FLAG field that `samtools` uses for filtering. If the first (lowest) bit is set (value 1), an MSP spans the footprint. Each following bit is set if the corresponding module is footprinted by that fiber: counting modules from 1, module k corresponds to the bit `1 << k`.

Here are some examples in Python of how to test a footprint code:
```python
fp_code = 0b1001 # this is a value of 9, but in binary it is 1001
# test if the first bit is set, there is a spanning MSP, true in this example
(fp_code & 1) > 0
# test if the first module is footprinted, false in this example
(fp_code & (1 << 1)) > 0
# test if the third module is footprinted, true in this example
(fp_code & (1 << 3)) > 0
```
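The same bit tests can be applied to every module at once. The hypothetical helper below, a sketch rather than part of pyft, decodes a code into whether an MSP spans the footprint and which modules are footprinted. It numbers modules from 1 to match the bit positions in the example above (`1 << 1` for the first module), and it assumes the number of modules is taken from the YAML used for footprinting.

```python
from typing import List, NamedTuple


class DecodedFootprint(NamedTuple):
    has_spanning_msp: bool
    footprinted_modules: List[int]  # 1-based module numbers (bit k <-> module k)


def decode_footprint_code(fp_code: int, n_modules: int) -> DecodedFootprint:
    """Decode a footprint bit flag (hypothetical helper)."""
    if fp_code < 0:
        raise ValueError("footprint codes are non-negative bit flags")
    # Bit 0 (value 1): an MSP spans the footprint.
    has_msp = (fp_code & 1) > 0
    # Bit k (value 1 << k): module k is footprinted, for k = 1..n_modules.
    modules = [m for m in range(1, n_modules + 1) if fp_code & (1 << m)]
    return DecodedFootprint(has_msp, modules)
```

For the example code `0b1001`, `decode_footprint_code(0b1001, 5)` returns `DecodedFootprint(has_spanning_msp=True, footprinted_modules=[3])`.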
7 changes: 5 additions & 2 deletions py-ft/DEV-NOTES.md
@@ -8,5 +8,8 @@ source .env/bin/activate
```


# Remote pip install of dev branch
pip install -e 'git+https://github.com/fiberseq/fibertools-rs.git@refactor#egg=pyft&subdirectory=py-ft'
# Remote pip install of another branch
In this case, the other branch is called `footprint`.
```bash
pip install -e 'git+https://github.com/fiberseq/fibertools-rs.git@footprint#egg=pyft&subdirectory=py-ft'
```
16 changes: 16 additions & 0 deletions py-ft/docs/api.rst
@@ -0,0 +1,16 @@

API Reference
=================

.. automodule:: pyft
   :members:
   :undoc-members:
   :show-inheritance:
   :member-order: bysource

.. automodule:: utils
   :members:
   :undoc-members:
   :show-inheritance:
   :member-order: bysource

5 changes: 4 additions & 1 deletion py-ft/docs/conf.py
@@ -8,7 +8,8 @@

# import sphinx
# import sphinx.ext.autosummary as autosummary
sys.path.insert(0, os.path.abspath("../"))
# sys.path.insert(0, os.path.abspath("../"))
sys.path.insert(0, os.path.abspath("../python/pyft"))
import pyft

# -- Project information -----------------------------------------------------
@@ -33,6 +34,8 @@
"sphinx.ext.intersphinx",
# "edit_on_github",
"m2r2",
# include markdown
"nbsphinx",
]

# source_suffix = '.rst'
48 changes: 21 additions & 27 deletions py-ft/docs/index.rst
@@ -1,13 +1,9 @@
.. pyft documentation master file, created by
   sphinx-quickstart on Wed Jul 19 20:00:42 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
.. toctree::
   :maxdepth: 2
   :caption: Contents:
Getting started
---------------

pyft: python bindings for fibertools-rs
=======================================
---------------------------------------

.. image:: https://readthedocs.org/projects/py-ft/badge/?version=latest
   :target: https://py-ft.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status
@@ -18,35 +14,33 @@ pyft: python bindings for fibertools-rs

**pyft** provides a Python API for the Rust library `fibertools-rs <https://github.com/fiberseq/fibertools-rs>`_. The goal of this API is to make analysis in Python easier and faster, so it supports only extracting data from a Fiber-seq BAM, not writing one.


Install
=======
.. code-block:: bash

   pip install pyft
Example
=======
.. highlight:: python
.. include:: ../example.py
   :literal:
.. highlight:: none

API Reference
==================

.. automodule:: pyft
   :members:
   :undoc-members:
   :show-inheritance:
   :member-order: bysource



Vignettes
=========
The `vignettes <vignettes/index.rst>`_ are a good starting point for understanding the capabilities of pyft.

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: pyft

   self
   api.rst
   vignettes/index.rst


7 changes: 5 additions & 2 deletions py-ft/docs/requirements.txt
@@ -1,7 +1,10 @@
m2r2
sphinx_bootstrap_theme
sphinx<7.0.0
sphinx-autodoc-typehints
sphinx-autodoc-typehints==1.23.0
sphinx-rtd-theme==1.2.2
docutils
docutils==0.18.1
setuptools==69.1.1
nbsphinx
pypandoc_binary
#sphinxawesome-theme