Commit

updated readme
Niko Papadopoulos committed Apr 18, 2024
1 parent 4ad579a commit d2cccaf
Showing 1 changed file with 53 additions and 73 deletions.
126 changes: 53 additions & 73 deletions README.md
@@ -18,17 +18,14 @@ black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://gith
<img src="https://raw.githubusercontent.com/galicae/comandos/main/.github/tardi_head.png" height="256" />
</p>

As a long-time user of SAMap and a connoisseur of cross-species
comparisons, I was always frustrated by the lack of useful visualization
downstream of SAMap results. We don’t need fancy software to figure out
that ciliated cells are similar to ciliated cells, and muscle to muscle,
but what about those pesky “unknown_sensory_2” and “ciliated?\_2”
clusters? ComAnDOS is a collection of plotting functions and assorted
utilities that I hacked together to help myself and collaborators make
sense of SAMap results. It is conceived as a SAMap add-on, but it
doesn’t require it per se. It mostly works with two “primitives”:
AnnData, and Pandas DataFrames, so it should be relatively easy to adapt
to your use case.
As a long-time user of SAMap and a connoisseur of cross-species comparisons, I was always frustrated
by the lack of useful visualization downstream of SAMap results. We don’t need fancy software to
figure out that ciliated cells are similar to ciliated cells, and muscle to muscle, but what about
those pesky “unknown_sensory_2” and “ciliated?\_2” clusters? ComAnDOS is a collection of plotting
functions and assorted utilities that I hacked together to help myself and collaborators make sense
of SAMap results. It is conceived as a SAMap add-on, but it doesn’t require it per se. It mostly
works with two “primitives”: AnnData, and Pandas DataFrames, so it should be relatively easy to
adapt to your use case.

It currently includes:

@@ -43,77 +40,61 @@ It currently includes:
click to expand
</summary>

Single-cell RNA-seq (scRNA-seq) is a powerful tool to study the
transcriptome of individual cells. As the technology matured, it became
possible to use it on non-model organisms, facilitating cell type
comparison across species. Early methods for this task subsetted gene
expression matrices to one-to-one orthologous genes, assuming that
sequence conservation also implies conservation of location, magnitude,
and timing of gene expression. Not only is this assumption not true, but
it also requires us to discard a large amount of data.

[SAMap](https://elifesciences.org/articles/66747) was the first method
to try and include many-to-one orthology relations. In alternating
steps, it optimises a cell graph and a gene graph, using the former to
inform the latter and vice versa. The result is a converged
low-dimensional embedding that contains the cells of both species,
allowing for direct comparison.

SAMap comes with a small number of visualization tools, but, as I had to
find out myself, they are not sufficient for in-depth analysis. In
particular:

- Sankey diagrams, SAMap’s default visualization for cluster-cluster
relationships, obscure the fact that cell types are hierarchically
organized. They also make it harder to quantify just how similar two
cell types are according to SAMap.
- Tarashansky *et al.* used network diagrams to demonstrate highly
connected cell type families. I found that these diagrams are not very
informative, as they are hard to read and do not scale well to large
datasets.
- In the publication, heatmaps are used once, but not to their full
extent.
- Overlapping dimplots with corresponding violin plots are used to
demonstrate co-expression across species. This is a good idea, but
results in overloaded plots.
- The authors use dotplots to show gene expression across species,
color-coding the species. This loses one of the dotplots’ dimensions,
where color usually encodes expression level, and forces the use of an
additional axis to show expression magnitude. Furthermore, the
relationships between the plotted genes in the different species are
hard to visualize and need to be described in text.

These visualisations have two additional shortcomings: First, they are
not easily reproducible, as they are not part of SAMap but rather custom
solutions for very specific use cases. Second, they are extremely
specific in what they show, and thus not useful for exploratory data
analysis.
Single-cell RNA-seq (scRNA-seq) is a powerful tool to study the transcriptome of individual cells.
As the technology matured, it became possible to use it on non-model organisms, facilitating cell
type comparison across species. Early methods for this task subsetted gene expression matrices to
one-to-one orthologous genes, assuming that sequence conservation also implies conservation of
location, magnitude, and timing of gene expression. Not only is this assumption not true, but it
also requires us to discard a large amount of data.
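
To make the data loss concrete, here is a minimal sketch in plain Python. It assumes a made-up list of
orthology pairs; the gene names and the `one_to_one_pairs` helper are hypothetical and are not part of
ComAnDOS or SAMap. It keeps only pairs in which each gene occurs exactly once, which is roughly what
restricting an analysis to one-to-one orthologs amounts to:

```python
from collections import Counter


def one_to_one_pairs(pairs):
    """Keep only pairs whose genes each occur in exactly one pair."""
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    return [
        (a, b)
        for a, b in pairs
        if left_counts[a] == 1 and right_counts[b] == 1
    ]


# Hypothetical orthology pairs: (species A gene, species B gene).
orthologs = [
    ("A_g1", "B_g1"),  # one-to-one
    ("A_g2", "B_g2"),  # A_g2 has two orthologs in species B
    ("A_g2", "B_g3"),
    ("A_g3", "B_g4"),  # B_g4 has two orthologs in species A
    ("A_g4", "B_g4"),
]

kept = one_to_one_pairs(orthologs)
print(f"kept {len(kept)} of {len(orthologs)} pairs")
```

In this toy input only one of the five pairs survives the filter; every gene involved in a
one-to-many or many-to-one relation is dropped.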

[SAMap](https://elifesciences.org/articles/66747) was the first method to try and include
many-to-one orthology relations. In alternating steps, it optimises a cell graph and a gene graph,
using the former to inform the latter and vice versa. The result is a converged low-dimensional
embedding that contains the cells of both species, allowing for direct comparison.

SAMap comes with a small number of visualization tools, but, as I had to find out myself, they are
not sufficient for in-depth analysis. In particular:

- Sankey diagrams, SAMap’s default visualization for cluster-cluster relationships, obscure the fact
that cell types are hierarchically organized. They also make it harder to quantify just how
similar two cell types are according to SAMap.
- Tarashansky *et al.* used network diagrams to demonstrate highly connected cell type families. I
found that these diagrams are not very informative, as they are hard to read and do not scale well
to large datasets.
- In the publication, heatmaps are used once, but not to their full extent.
- Overlapping dimplots with corresponding violin plots are used to demonstrate co-expression across
species. This is a good idea, but results in overloaded plots.
- The authors use dotplots to show gene expression across species, color-coding the species. This
loses one of the dotplots’ dimensions, where color usually encodes expression level, and forces
the use of an additional axis to show expression magnitude. Furthermore, the relationships between
the plotted genes in the different species are hard to visualize and need to be described in text.

These visualisations have two additional shortcomings: First, they are not easily reproducible, as
they are not part of SAMap but rather custom solutions for very specific use cases. Second, they are
extremely specific in what they show, and thus not useful for exploratory data analysis.

</details>

## Documentation

This package was developed using [nbdev](https://nbdev.fast.ai/), which
means the source code was generated from Jupyter notebooks using the
[literate programming
paradigm](https://en.wikipedia.org/wiki/Literate_programming). You can
see the exported function signatures and assorted explanations
[online](https://galicae.github.io/comandos/). I am currently working on
tutorials for the most important use cases.
You can see the exported function signatures and assorted explanations
[here](https://galicae.github.io/comandos/). I am currently working on tutorials for the most
important use cases.

For questions or requests please open an issue on
[GitHub](https://github.com/galicae/comandos/issues/new). I will be
communicating updates, if any, on
[Twitter](https://twitter.com/galicae).
[GitHub](https://github.com/galicae/comandos/issues/new). I will be communicating updates, if any,
on [Twitter](https://twitter.com/galicae).

Example data is available on
[Zenodo](https://zenodo.org/record/8143110).
Example data is available on [Zenodo](https://zenodo.org/record/8143110).

This package was developed using [nbdev](https://nbdev.fast.ai/), which means the source code was
generated from Jupyter notebooks using the [literate programming
paradigm](https://en.wikipedia.org/wiki/Literate_programming).

## Install

It is good practice to set up a virtual environment for your Python
projects. I recommend `conda`, or `mamba` if you want faster package
installation.
It is good practice to set up a virtual environment for your Python projects. I recommend `conda`,
or `mamba` if you want faster package installation.

``` bash
conda create -n comandos python=3.9
@@ -128,8 +109,7 @@ First install dependencies:
pip install scanpy jupyterlab
```
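
Before moving on, it can help to confirm that the dependencies are importable from the active
environment. The following is a small sketch using only the standard library; the `missing_packages`
helper is hypothetical and not part of ComAnDOS:

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Module names assumed to match the pip package names installed above.
missing = missing_packages(["scanpy", "jupyterlab"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All dependencies found.")
```

Run it with the Python interpreter of the environment you just created, so that it checks the right
installation.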

After installing dependencies, clone the latest version from GitHub and
install it:
After installing dependencies, clone the latest version from GitHub and install it:

``` bash
cd /directory/of/choice