Genomic \acfp{sv} are commonly defined as variants affecting more than
50 consecutive base pairs of the DNA. The main purpose of this definition is to
distinguish \acp{sv} from smaller indel variants or multi-nucleotide substitutions
(i.e. blocks of consecutive \acp{snv}) \citep{Alkan2011}. Indels are variants
(of up to 50~bp) that insert or delete nucleotides, which is the same situation
seen from different perspectives (hence the neologism \textit{indel}).
A more appealing definition than the arbitrary 50~bp threshold is that indels
are detectable inside a contiguously mapped DNA sequencing read (introduced in
\cref{sec:mps}), whereas \acp{sv} are detectable across alignments. Yet this
definition, too, no longer fully applies in light of novel long-read
sequencing technologies (\cref{sec:long_read_seq}). Fortunately, a clear
distinction is not biologically relevant. \Acp{sv} come in many different
flavors, the major ones of which are described below.
\Acp{sv} in the human genome are of particular relevance for health and disease.
For example, they are implicated in various Mendelian diseases and in cancer
\citep{Weischenfeldt2013}.
Later, in \cref{sec:balancer_background}, I specifically discuss the phenotypic
impact of \acp{sv} and present a study in which I investigated a particular
aspect of the functional consequences of \acp{sv}.

\subsection{Different classes of structural variation}
\label{sec:sv_classes}
The spectrum of \aclp{sv} is broad. The major \sv classes are generally
divided into \aclp{cnv}, such as deletions and
duplications, and balanced rearrangements, such as inversions and translocations,
yet a series of other \sv forms is known. Below, I introduce the major classes
of \acp{sv} that are relevant in this work.
\emph{\Acfp{cnv}} describe the focal loss or gain of genetic material. They are
termed \emph{imbalanced}, as they do not leave the balance of the two homologues
intact. A loss of DNA is called a \emph{deletion}; a gain is called a
\emph{duplication} or \emph{triplication}, or is simply described by its \emph{copy number}.
For example, a heterozygous deletion has a copy number of one instead of the expected copy
number of two in a diploid organism. A duplication that arises on one of the
homologues leads to a total copy number of three, and so on.
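
As a small worked illustration, the total copy number $C$ of a locus in a
diploid cell can be written as the sum of the copies carried by the two
homologues. The symbols $C$, $c_{\mathrm{mat}}$ and $c_{\mathrm{pat}}$ are
introduced here only for this example and are not used elsewhere in this work:
\[
  C = c_{\mathrm{mat}} + c_{\mathrm{pat}}, \qquad
  \begin{array}{ll}
    \mbox{reference state:}                & C = 1 + 1 = 2 \\
    \mbox{deletion on one homologue:}      & C = 0 + 1 = 1 \\
    \mbox{duplication on one homologue:}   & C = 2 + 1 = 3 \\
    \mbox{deletion on both homologues:}    & C = 0 + 0 = 0
  \end{array}
\]
Which homologue carries the variant does not affect $C$; the assignment to the
maternal or paternal copy above is arbitrary.
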
Duplications are in \emph{tandem} when the additional copy is inserted in direct
proximity to the original locus instead of somewhere else in the genome. The
latter is referred to as \emph{interspersed} duplication (\cref{fig:SV_classes}).
The introduction of new sequence is called an \emph{insertion}; however,
depending on the source of the incorporated DNA, insertions can be assigned to
one of several classes, only one of which is briefly mentioned later---they are
typically not counted as \acp{cnv} though.
The loss or gain of whole (or major parts of) chromosomes, historically visible
under a microscope, is summarized as \emph{aneuploidy}.
Aneuploidy can range from a single chromosome (or at least the majority of the
chromosome) being lost or gained, up to a complete increase or decrease of the
ploidy level (of all chromosomes). The expected \explain{ploidy}{In humans,
$N$ equals 23, meaning that we carry 46 chromosomes in our cells.
Interestingly, this number was falsely believed to be 48 for three decades
before it was corrected by \citet{Tijo1956}}
in diploid organisms is $2N$, where $N$ is the number of chromosomes in a single (haploid) set and $2$ the
number of homologous copies. Ploidy can aberrantly increase to \emph{triploidy}
($3N$), \emph{tetraploidy} ($4N$) or even higher states covered by the general term
\emph{polyploidy} (\cref{fig:SV_classes}). Cells can also be in a purely \emph{haploid} state ($1N$),
but they are rarely viable due to problems with chromosome segregation. Mixed
states, where only some chromosomes increase their copy number, are sometimes
also referred to as hyperploidy.
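
As a simple worked calculation, using the human value $N = 23$, these ploidy
levels correspond to the following chromosome counts per cell:
\[
  1N = 23, \qquad 2N = 46, \qquad 3N = 69, \qquad 4N = 92 .
\]
A mixed state is not a whole multiple of $N$. A hypothetical, otherwise diploid
human cell that gained one extra copy of a single chromosome would, for
instance, carry $2N + 1 = 47$ chromosomes.
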
Other types of \acp{sv} do not change the total copy number of a locus. Notably,
\emph{inversions} reverse the orientation of a locus, but generally do not
include gains or losses (\cref{fig:SV_classes}). That said, even an inversion can
introduce (or co-locate with) \acp{cnv} depending on its mechanism of formation,
which is one of the major findings of \cref{sec:complex_invs}.
In cases where multiple \sv classes occur within the same allele we term them
\emph{complex}. The most prominent examples are \emph{inverted duplications},
which are duplications that insert in reverse orientation into the genome
(\cref{fig:SV_classes}). Nevertheless, non-complex, i.e. \emph{simple inversions}
are the prime example of balanced \acp{sv} as they re-structure the genome
without gaining or losing genetic material.
\figuretextwidth[t]{SVs.pdf}{SV_classes}{Types of structural variants}
{Each case is depicted by the original locus on the left and the affected
locus on the right, where dashed lines are used to highlight the orientation.
\subpanel{A} Different types of focal \acp{sv} of a genomic locus (red)
within double-stranded DNA (represented by gray line). \subpanel{B}
Chromosomes are depicted by double oval shapes. In the (balanced)
translocation, chromosomes 12 and 17 are chosen as examples to stress that the
exchange happens between non-homologous chromosomes. In a \acl{loh},
though, the maternal and paternal homologue of the same chromosome are
shown. \subpanel{C} Ideograms of a normal and aneuploid cell are shown.
For the sake of demonstration, the affected cell carries a haploid,
a triploid and a tetraploid chromosome.}
Another class falling into the category of balanced \acp{sv} are
\emph{translocations}. In a translocation event, genetic
material is exchanged between two non-homologous chromosomes. A \emph{reciprocal
translocation} is balanced because the total amount of genetic material does
not change, just the assignment of certain loci (potentially of whole chromosomal arms)
to chromosomes.
But \emph{imbalanced translocations} can arise, too. Here, one chromosome remains
largely unchanged but a part of the homologue is duplicated and added to another
chromosome, which might itself lose genetic material at the same time. This can
involve whole chromosome arms, but also smaller loci, which is also covered by
the definition of translocation. Typically though, translocation refers to the
special case of a reciprocal translocation as shown in \cref{fig:SV_classes}.
Furthermore, when cells lack one of the two alleles within a larger genomic region,
this is called a \emph{\acf{loh}}. \loh is an immediate consequence of the
(partial) loss of a homologue during a deletion.
However, there are also copy-neutral \loh events in which the same haploid
genotype is present in two copies. This might for example occur when an
individual inherits two copies of a chromosome from one parent, and none from the
other parent (uniparental disomy), but it can also occur via other mechanisms
(\cref{sec:mechanisms}). \Ac{loh} is often not observed directly, but indirectly
by looking at smaller variants (notably \acp{snv}) in a given genomic
region---the absence of heterozygous variants is an indicator of \loh.
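
This indirect readout can be made slightly more explicit. As a rough sketch,
with notation that is introduced here for illustration only, let
$n_{\mathrm{het}}$ be the number of heterozygous \acp{snv} and
$n_{\mathrm{hom}}$ the number of homozygous non-reference \acp{snv} called
within a genomic region $R$. Provided that at least one \ac{snv} was called in
$R$, the fraction of heterozygous sites is
\[
  h(R) = \frac{n_{\mathrm{het}}}{n_{\mathrm{het}} + n_{\mathrm{hom}}} ,
\]
and a value of $h(R)$ close to zero over a region that contains sufficiently
many \acp{snv} is compatible with \loh. What counts as ``close to zero'' and
``sufficiently many'' depends on the data at hand, for example on genotyping
error rates, so no fixed threshold is implied here.
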
Finally, various other forms of \acp{sv} exist that are of less relevance for
this work. One exception, which shall briefly be mentioned here, is the class of
\emph{\acfp{mei}}. Mobile elements, notably \emph{transposons}, are DNA elements
that can ``jump'' within a host genome. A large fraction of the human genome consists of
the remnants of such elements \citep{Haubold2006}, which are largely
prevented from active transposition by repressive mechanisms in the host cell.
A \mei may occur in a cut-and-paste or a copy-and-paste fashion and, although
such events in principle resemble duplications or translocations, they are seen as a
separate class due to their fundamentally different mechanisms of formation.

\subsection{Molecular mechanisms underlying the formation of SVs}
\label{sec:mechanisms}
To truly comprehend \aclp{sv}, it is important to understand how they
originate. Because we understand certain mechanisms of formation, we
can today explain, for example, why \acp{sv} are not evenly distributed across the genome or
why they recur independently in specific locations \citep{Hastings2009}.
Conversely, increasingly accurate discovery of \acp{sv} has led to
a better understanding of the functioning and impact of these mechanisms
\citep{Hastings2009,Abyzov2015}. Based on specific scars around the breakpoints
of \acp{sv}, the mechanism that introduced a \sv can sometimes be unraveled in
retrospect. This is exactly the idea I apply in \cref{sec:complex_invs} to find
out how the complex \acp{sv} we find were formed. Here, the major molecular
mechanisms involved in formation of aneuploidy, focal copy number changes and
inversions shall be introduced. They have previously been described in
great detail by James Lupski and colleagues \citep{Hastings2009,Carvalho2016}.
%I explicitly exclude the pathways behind MEIs, for which I recommend \citet{Levin2011}.
Aneuploidy occurs through missegregation of single chromosomes during cell
division. During meiosis, in either oocytes or spermatocytes, this can lead to
inheritable aneuploidy, which was estimated to occur in 5\% of human pregnancies
\citep{Templado2013}. Missegregation occurs via nondisjunction of chromosomes in
meiosis I, when homologues fail to separate, or in meiosis II and mitosis, when
sister chromatids are not separated properly. Alternatively, it occurs as a
consequence of anaphase lag, in which a chromosome is not incorporated into either
daughter nucleus due to its delayed movement in anaphase \citep{webAneuploidy}.
Polyploidy arises differently, for example when an egg is fertilized by two
sperm cells simultaneously or a fertilized ovum fuses with a sperm cell
\citep{webAneuploidy}. Missegregation of chromosomes occurring during mitosis may
lead to somatic aneuploidy, which is observed in many cancer types
\citep{Gordon2012}. Polyploidy can also occur somatically, for example via
repeated rounds of DNA replication without subsequent mitosis or with partial
mitosis without subsequent cytokinesis. This occurs naturally, for example
in the polytene chromosomes of insect salivary glands or in hepatocytes of the
human liver, but also spontaneously, as frequently seen in different types of
cancer \citep{Davoli2011}.
Focal \acp{sv} arise either during replication or after a break of the
double-stranded DNA backbone \citep{Hastings2009}. Double strand breaks and
replication errors occur stochastically or result from cellular stress, but the
cell actively counters such errors through its powerful repair mechanisms. DNA
repair is not always faithful, though, sometimes leading to the formation of
\acp{sv}. A major mechanism of \sv formation employs the homologous
recombination machinery, which uses homologous sequence (from the sister
chromatid) as a template to repair a break. Homologous recombination during
meiosis can lead to \emph{gene conversion}, which results in the replacement of
an allele by the second allele (\loh).
However, given the repetitive nature of
the human genome, homology (or near identical sequence) might not only be present at the
respective locus but also in other, non-allelic loci. This \emph{\acf{nahr}}
can create various types of \acp{sv}, including deletions, duplications,
inversions and even translocations, depending on the position and orientation of the
ectopic homologous sequence \citep{Carvalho2016}. The presence of homology,
notably of large segmental duplications of more than 90\% sequence identity and
several kilobases in size, predisposes the human genome to the formation of
recurrent \acp{sv} via \nahr \citep{Carvalho2016}. Conversely, when
near identical sequence is detected flanking \sv breakpoints on both ends
(20~bp is a common lower threshold), such an \sv is believed to have formed via
\nahr \citep{Onishi-Seebacher2011}.
Other repair mechanisms do not require homology. \emph{\Acf{nhej}} is the
dominant pathway during G0/G1-phase to efficiently re-ligate the ends of DNA
double strands, usually leaving traces of not more than a few deleted or
inserted base pairs at the junction \citep{Lieber2008}. Importantly, \nhej is
available to the cell before sister chromatids are present and is a fast way to
react to double strand breaks. When multiple double strand breaks arise
simultaneously, \nhej can falsely ligate genomic loci in the wrong order and
thus introduce \acp{sv}. A related mechanism uses sequence identity of only a few
base pairs (as few as 1--4), also known as \explain{\textit{micro-homology}}{It is
debatable whether the term \emph{homology} is correct here, i.e. whether the
short stretches of identical DNA on both sides of an \sv breakpoint in fact
share a common evolutionary ancestry}, to initiate re-ligation via a
mechanism called \emph{micro-homology-mediated end joining} \citep{Hastings2009}.
Replication of DNA, which happens prior to each cell division, is also
susceptible to errors. For example, through a process called \textit{replication
slippage} smaller deletions and duplications can arise between stretches of
homology within a replication fork, limited by the size of an Okazaki fragment
\citep{Hastings2009}. Furthermore, the DNA backbone can break within a
replication fork, leaving a single-ended double strand break. Such a break can
be faithfully resolved by the \emph{\acf{bir}} mechanism: a single strand of the
unfinished DNA molecule anneals to homologous sequence in the template DNA to
restart replication, which can continue up to hundreds of kilobases from there
\citep{Carvalho2016}. Again, this search for homology may go astray: the strand
may anneal either to the homologous chromosome (instead of the sister chromatid),
leading to extended stretch\-es of \loh, or to an ectopic locus, resulting in one
of several possible \sv types including \acp{cnv} and inversions.
Other replicative mechanisms of \sv formation require no homology, or only short
stretches of micro-homology. Notably, a version of \bir that can operate
independently of the homologous recombination machinery was described, which
relies only on micro-homology (4--15~bp) to invade template DNA. The mechanism
was consequently called \emph{\acf{mmbir}} \citep{Hastings2009a}.
Moreover, homology-independent rearrangements occurring during replication can
include multiple complex rearrangements and are more prone to copy gains than
losses, in concordance with a model of \acf{fostes} \citep{Zhang2009a,Hastings2009}.
In summary, errors during cell division, replication or double-strand break repair
can introduce various forms of \aclp{sv}. Especially \acp{cnv} and inversions may
arise via many different mechanisms, which can sometimes, but not always, be
inferred in retrospect based on the nucleotide sequence around their breakpoints.