-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmain.tex
122 lines (97 loc) · 7.64 KB
/
main.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
\documentclass{article}
\usepackage{hyperref}
\begin{document}
\title{ComAnDOS: advanced visualizations for cross-species comparisons}
\author{Nikolaos Papadopoulos}
\date{\today}
\maketitle
\begin{abstract}
Recent experimental progress has made it possible to obtain single-cell transcriptomes for
entire animals, reigniting the debate about the definition of cell types and enabling their
study in an evolutionary context. The SAMap algorithm has been instrumental in this endeavor,
allowing cross-species comparisons without discarding many-to-one orthologous genes. However,
SAMap's lack of downstream analysis and visualization tools effectively requiring advanced
programming skills to unlock the full potential of its analysis. Here, we present ComAnDOS
(COMparative ANalysis Downstream Of Samap), a Python package that bundles many useful downstream
analysis and visualization tools for SAMap. ComAnDOS is freely available at
\url{https://github.com/galicae/comandos}.
\end{abstract}
\section{Introduction}
Even before single-cell people had noticed that cells came in groups that looked similar; and that
cells in different parts of the body did different things. They guessed that cells that looked
similar probably worked in a similar way. It was wild to see that you could find the same cell types
(as in: very, very similarly looking) in different parts of the same body, never mind very, very
similar cells in different species; even ones as remotely related as humans and jellyfish.
As molecular techniques and knowledge about DNA came along we even figured out that many of the very
similar cells used many similar genes; in fact, in many cases even the regulatory apparatus that
governed what these cells were supposed to become was shared. This was a big deal, because it meant
that possibly the ancestor of these animals, who must have had cell types that did similar things,
was already using this regulatory mechanism to build these cell types - maybe things like neurons,
vision, muscles, and so on were invented once. The limitation here was that we could only look at a
one or maybe a handful of genes at once, so lots of effort had to go into figuring out just one cell
type or just one regulatory relationship.
Single-cell RNA sequencing (scRNA-seq) changed all that, because we can now look at the cell type
complement of entire organisms. People mostly looked at human and mouse stuff, because that's what
was easy and medically (financially) interesting, but as time went on and things became cheaper, you
could suddenly single-cell sequence entire weird animals. And people did.
And then smart people like Günter and Detlev noticed that there seems to be a hierarchy in cell
types - not only in their morphology, location in the body, etc. but also in their gene expression
profiles. Duplication and divergence is a very plausible and widespread mechanism for evolution to
make new things, so why could it not have been the source of cell type evolution too? They proposed
that cooperating groups of transcription factors are the key to cell type evolution, and that these
should be conserved between species.
Seeing as we can't query evolution but rather its phenotypic footprint in extant animals, going from
A to B in that theory is not straightforward, but the hope is that by looking at many animals at
once we might be able to see the red thread that connects it all. By comparing phenotypic similarity
between cell types and seeing who is similar to whom, we hope to figure out which cell types in
extant animals used to be the same cell type in their last common ancestor. By seeing which genes
are still used by both cell types today, we hope to figure out what function the ancestral cell type
was performing.
Originally, people just kept one-to-one orthologs and reduced the scRNA-seq data of both species to
that. This has the problem that it assumes too many things - A) that the orthologs are conserved not
only in sequence, but also in function, location, timing, and magnitude of expression, and B) that
all the subfunctionalisation that happened throughout evolution is basically meaningless and we'll
get the important details despite discarding something like 60-80\% of the information. This worked
surprisingly well \cite{welch2019single}, but it was clear that it was not the best way to do it.
This changed with SAMap \cite{tarashansky2021mapping}, which takes a BLAST map of genes as input,
and takes turns optimizing a cell-cell graph and a gene-gene graph until convergence. SAMap was a
game-changer for cross-species comparisons, and is now the de-facto standard for this kind of work.
However, SAMap is not without its limitations. The two most important ones are that it is rather
hard to install and use (also owing to suboptimal documentation), and that it doesn't offer
intuitive downstream analysis tools. The Sankey plot is fit for infographics but nothing more, and
circle plots were never useful for anything in the recorded history of mankind.
\section{Advanced visualizations for cross-species comparisons}
\subsection{Replacing Sankey plots with annotated heatmaps}
One of the main outputs of SAMap is a table of mapping scores between query and target clusters.
This table naturally lends itself to a heatmap visualization. We provide wrappers around the heatmap
functions of the \texttt{seaborn} Python library \cite{bisong2019matplotlib} to automate the
plotting of annotated heatmaps that can visualize cell type mapping scores along with a coarse level
clustering, such as tissue or cell type family. By visualizing all pairwise cluster mapping scores
the hierarchical nature of cell type relationships becomes evident, with cell type families across
species clustering together in the heatmap. This is in contrast to the Sankey plot, which only shows
the higher mapping scores, implying mostly one-to-one relationships between cell types.
\subsection{Visualizing cross-species gene expression}
One of the more challenging aspects of cross-species comparisons is finding the conserved gene
expression patterns that will lend credence to hypothesized evolutionary relationships between cell
types. SAMap proposes pairs of genes that show correlated expression between cell types, but offers
no way to visualize this. ComAnDOS offers a re-implementation of the popular dotplot visualization
that allows plotting the expression of multiple genes for two species at once. The dotplots of the
query and target species are arranged to the left and right, accordingly. Furthermore, linking lines
can be drawn between genes, visualizing relationships between them, such as homology (e.g. solid
lines connecting orthologs and dashed lines connecting paralogs).
\subsection{High-quality documentation and modularity for easy extension}
The nature of scRNA-seq data analysis in general and cross-species comparisons in particular is
exceedingly exploratory. Comparative plots must reflect this exploratory nature, and be easy to
repurpose and extend. To this end, the functions in ComAnDOS are deliberately kept generic, exposed
to the users, and documented extensively. For example, the paired dotplot function can be used to
visualize any combination of gene groups, connected in any way desired by the user.
\section{}
\section{Conclusions}
ComAnDOS offers tools that facilitate common plotting tasks in scRNA-seq cross-species comparisons.
This unlocks the full potential of SAMap, the premiere tool for cross-species comparisons. By
visualizing cluster relationships and gene expression patterns, ComAnDOS allows the exploration of
evolutionary relationships between cell types in a more intuitive way.
\bibliographystyle{plain}
\bibliography{main.bib}
\include{bibliography}
\end{document}