updated readme

galicae · Apr 18, 2024 · d2cccaf · d2cccaf
1 parent 4ad579a
commit d2cccaf
Showing 1 changed file with 53 additions and 73 deletions.
diff --git a/README.md b/README.md
@@ -18,17 +18,14 @@ black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://gith
 <img src="https://raw.githubusercontent.com/galicae/comandos/main/.github/tardi_head.png" height="256" />
 </p>
 
-As a long-time user of SAMap and a connoisseur of cross-species
-comparisons, I was always frustrated by the lack of useful visualization
-downstream of SAMap results. We don’t need fancy software to figure out
-that ciliated cells are similar to ciliated cells, and muscle to muscle,
-but what about those pesky “unknown_sensory_2” and “ciliated?\_2”
-clusters? ComAnDOS is a collection of plotting functions and assorted
-utilities that I hacked together to help myself and collaborators make
-sense of SAMap results. It is conceived as a SAMap add-on, but it
-doesn’t require it per se. It mostly works with two “primitives”:
-AnnData, and Pandas DataFrames, so it should be relatively easy to adapt
-to your use case.
+As a long-time user of SAMap and a connoisseur of cross-species comparisons, I was always frustrated
+by the lack of useful visualization downstream of SAMap results. We don’t need fancy software to
+figure out that ciliated cells are similar to ciliated cells, and muscle to muscle, but what about
+those pesky “unknown_sensory_2” and “ciliated?\_2” clusters? ComAnDOS is a collection of plotting
+functions and assorted utilities that I hacked together to help myself and collaborators make sense
+of SAMap results. It is conceived as a SAMap add-on, but it doesn’t require it per se. It mostly
+works with two “primitives”: AnnData, and Pandas DataFrames, so it should be relatively easy to
+adapt to your use case.
 
 It currently includes:
 
@@ -43,77 +40,61 @@ It currently includes:
 click to expand
 </summary>
 
-Single-cell RNA-seq (scRNA-seq) is a powerful tool to study the
-transcriptome of individual cells. As the technology matured, it became
-possible to use it on non-model organisms, facilitating cell type
-comparison across species. Early methods for this task subsetted gene
-expression matrices to one-to-one orthologous genes, assuming that
-sequence conservation also implies conservation of location, magnitude,
-and timing of gene expression. Not only is this assumption not true, but
-it also requires us to discard a large amount of data.
-
-[SAMap](https://elifesciences.org/articles/66747) was the first method
-to try and include many-to-one orthology relations. In alternating
-steps, it optimises a cell graph and a gene graph, using the former to
-inform the latter and vice versa. The result is a converged
-low-dimensional embedding that contains the cells of both species,
-allowing for direct comparison.
-
-SAMap comes with a small number of visualization tools, but, as I had to
-find out myself, they are not sufficient for in-depth analysis. In
-particular:
-
-- Sankey diagrams, SAMap’s default visualization for cluster-cluster
-  relationships, obscure the fact that cell types are hierarchically
-  organized. They also make it harder to quantify just how similar two
-  cell types are according to SAMap.
-- Tarashansky *et al.* used network diagrams to demonstrate highly
-  connected cell type families. I found that these diagrams are not very
-  informative, as they are hard to read and do not scale well to large
-  datasets.
-- In the publication, heatmaps are used once, but not to their full
-  extent.
-- Overlapping dimplots with corresponding violin plots are used to
-  demonstrate co-expression across species. This is a good idea, but
-  results in overloaded plots.
-- The authors use dotplots to show gene expression across species,
-  color-coding the species. This loses one of the dotplots’ dimensions,
-  where color usually encodes expression level, and forces the use of an
-  additional axis to show expression magnitude. Furthermore, the
-  relationships between the plotted genes in the different species are
-  hard to visualize and need to be described in text.
-
-These visualisations have two additional shortcomings: First, they are
-not easily reproducible, as they are not part of SAMap but rather custom
-solutions for very specific use cases. Second, they are extremely
-specific in what they show, and thus not useful for exploratory data
-analysis.
+Single-cell RNA-seq (scRNA-seq) is a powerful tool to study the transcriptome of individual cells.
+As the technology matured, it became possible to use it on non-model organisms, facilitating cell
+type comparison across species. Early methods for this task subsetted gene expression matrices to
+one-to-one orthologous genes, assuming that sequence conservation also implies conservation of
+location, magnitude, and timing of gene expression. Not only is this assumption not true, but it
+also requires us to discard a large amount of data.
+
+[SAMap](https://elifesciences.org/articles/66747) was the first method to try and include
+many-to-one orthology relations. In alternating steps, it optimises a cell graph and a gene graph,
+using the former to inform the latter and vice versa. The result is a converged low-dimensional
+embedding that contains the cells of both species, allowing for direct comparison.
+
+SAMap comes with a small number of visualization tools, but, as I had to find out myself, they are
+not sufficient for in-depth analysis. In particular:
+
+- Sankey diagrams, SAMap’s default visualization for cluster-cluster relationships, obscure the fact
+  that cell types are hierarchically organized. They also make it harder to quantify just how
+  similar two cell types are according to SAMap.
+- Tarashansky *et al.* used network diagrams to demonstrate highly connected cell type families. I
+  found that these diagrams are not very informative, as they are hard to read and do not scale well
+  to large datasets.
+- In the publication, heatmaps are used once, but not to their full extent.
+- Overlapping dimplots with corresponding violin plots are used to demonstrate co-expression across
+  species. This is a good idea, but results in overloaded plots.
+- The authors use dotplots to show gene expression across species, color-coding the species. This
+  loses one of the dotplots’ dimensions, where color usually encodes expression level, and forces
+  the use of an additional axis to show expression magnitude. Furthermore, the relationships between
+  the plotted genes in the different species are hard to visualize and need to be described in text.
+
+These visualisations have two additional shortcomings: First, they are not easily reproducible, as
+they are not part of SAMap but rather custom solutions for very specific use cases. Second, they are
+extremely specific in what they show, and thus not useful for exploratory data analysis.
 
 </details>
 
 ## Documentation
 
-This package was developed using [nbdev](https://nbdev.fast.ai/), which
-means the source code was generated from Jupyter notebooks using the
-[literate programming
-paradigm](https://en.wikipedia.org/wiki/Literate_programming). You can
-see the exported function signatures and assorted explanations
-[online](https://galicae.github.io/comandos/). I am currently working on
-tutorials for the most important use cases.
+You can see the exported function signatures and assorted explanations
+[here](https://galicae.github.io/comandos/). I am currently working on tutorials for the most
+important use cases.
 
 For questions or requests please open an issue on
-[GitHub](https://github.com/galicae/comandos/issues/new). I will be
-communicating updates, if any, on
-[Twitter](https://twitter.com/galicae).
+[GitHub](https://github.com/galicae/comandos/issues/new). I will be communicating updates, if any,
+on [Twitter](https://twitter.com/galicae).
 
-Example data is available on
-[Zenodo](https://zenodo.org/record/8143110).
+Example data is available on [Zenodo](https://zenodo.org/record/8143110).
+
+This package was developed using [nbdev](https://nbdev.fast.ai/), which means the source code was
+generated from Jupyter notebooks using the [literate programming
+paradigm](https://en.wikipedia.org/wiki/Literate_programming).
 
 ## Install
 
-It is good practice to set up a virtual environment for your Python
-projects. I recommend `conda`, or `mamba` if you want faster package
-installation.
+It is good practice to set up a virtual environment for your Python projects. I recommend `conda`,
+or `mamba` if you want faster package installation.
 
 ``` bash
 conda create -n comandos python=3.9
@@ -128,8 +109,7 @@ First install dependencies:
 pip install scanpy jupyterlab
 ```
 
-After installing dependencies, clone the latest version from GitHub and
-install it:
+After installing dependencies, clone the latest version from GitHub and install it:
 
 ``` bash
 cd /directory/of/choice