---
title: "Position of the Course"
author: "Lieven Clement"
date: "[statOmics](https://statomics.github.io)"
output:
html_document:
code_download: true
toc: true
toc_float: true
highlight: tango
number_sections: true
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
</br>
# What is Life?
## Schrödinger and Prigogine
Book: "What Is Life? The Physical Aspect of the Living Cell" (Schrödinger, 1944).
Life is
1. an open system that can generate order from chaos by exploiting external energy sources (entropy is disorder),
2. with the capacity to transmit its own specific blueprint from generation to generation (reproductive invariance).
Note that the structure of DNA and its role as the carrier of genetic information were not yet established when Schrödinger wrote his seminal book.
</br>
### How is this possible
- Second law of thermodynamics: a closed (isolated) system always tends towards maximal entropy
- Entropy is a measure for disorder
<center>![](https://www.handysquad.com/wp-content/uploads/2020/10/tie-up-cables.jpg){width=30%} $\rightarrow$ ![](https://eadn-wc05-201489.nxedge.io/cdn/wp-content/uploads/2018/09/EntropyChaos2-700-467.jpg){width=30%}
</center>
</br>
- Life is
- based on chemical reactions
- an open system
- solar energy
- chaos of molecules on earth
- While chemical reactions produce structure, a lot of energy is lost as heat: dissipation.
$\rightarrow$ Increase of entropy
</br>
Prigogine studied chemical systems far from equilibrium.
- Open systems: influx of matter and energy
- Reactions consist of many feedback loops
- They dissipate incoming energy while producing structure
- He called them dissipative structures.
</br>
Our planet lies in a dissipative zone
- Life produces entropy by dissipating energy from photons (light and UV) to heat through organic pigments (e.g. chlorophyll)
- This heat induces secondary dissipative processes like the water cycle, wind and sea currents etc.
- So life also transforms its environment.
</br>
## Definition according to de Duve
Book: "Life Evolving - Molecules, Mind and Meaning" (de Duve, 2002). The first chapters give a very good overview; the later chapters get rather speculative.
Life is
1. one
2. chemistry
3. information
</br>
In bioinformatics we will work with the information part of life. However, we should not forget that life is more than that!
</br>
Origin of life:
- first energy
- then chemistry and
- subsequently information
</br>
### Life is one
#### All organisms consist of cells
- unicellular organisms
![](./figs/Prochlorococcus_marinus.jpg){width=50%}
(Cyanobacterium, source: Chisholm Lab, wikipedia)
</br>
- Essential: a membrane that separates them from their environment and enables interaction with that environment
</br>
- Multicellular organisms
![](https://mrssmithsbiology.files.wordpress.com/2019/10/picture4.jpg)
(source: mrssmithsbiology)
</br>
#### LUCA (Last Universal Common Ancestor)
- All cells originate from the same population of ancestral cells living 3.5-3.8 billion years ago.
![](./figs/Phylogenetic_tree.svg)
(Source: wikipedia)
</br>
#### Universal fuel for life (is actually chemistry)
![](./figs/ATP-ADP.png)
(Source: adapted from wikipedia)
- ATP: adenosine triphosphate
- Note that AMP is one of the building blocks of RNA.
- Link between energy and information!
</br>
#### Same building blocks for all living organisms
1. Lipids: membranes
2. Carbohydrates (sugars): energy + building blocks
3. Amino acids: building blocks of proteins, which are the workhorses of a cell
4. Nucleic Acids: building blocks of RNA and DNA
Note that polycyclic aromatic hydrocarbons (PAHs) are omnipresent in interstellar space
- Cat's paw nebula
- Green regions are regions where the radiation of hot stars induces fluorescence of PAHs.
![](./figs/orionWithPAH.jpeg){width=200%}
</br>
- They are further transformed in interstellar space
- hydrogenation: addition of hydrogen,
- oxidation: addition of oxygen,
- hydroxylation (OH)
- ...
$\rightarrow$ the first steps towards amino acids (building blocks of proteins) and nucleotides (building blocks of RNA and DNA) are already there, abundant in outer space.
##### Lipids
![](./figs/Cell_membrane_detailed_diagram_4.svg)
(Source: Doug Hatfield, wikipedia)
</br>
##### Carbohydrates
![](https://thebiologynotes.com/wp-content/uploads/2020/11/Carbohydrates-definition-classification-with-structure-and-functions.jpeg)
(Source: thebiologynotes.com)
</br>
##### Amino Acids
![](https://thebiologynotes.com/wp-content/uploads/2020/12/Amino-acids-and-Proteins.jpeg)
(Source: thebiologynotes.com)
</br>
##### Nucleic Acids
- Building blocks for the molecules that store the information that we inherit from our parents: RNA and DNA
![](./figs/RNA-Nucleobases.svg){width=50%}
(RNA, Source: wikipedia)
</br>
![](./figs/DNA_Nucleotides.jpeg)
(DNA, Source: OpenStax, wikipedia)
</br>
![](./figs/Difference_DNA_RNA-EN.svg)
(Source: Sponk, wikipedia)
</br>
#### Same genetic code (See Life is information)
</br>
### Life is Chemistry
[![](./figs/roche_pathways.jpg)](http://biochemical-pathways.com/#/map/1)
(Source: Dr. Gerhard Michal, Roche)
<!--
Not working http in https
<div style="position:relative;padding-top:56.25%;">
<iframe src= "http://googleweblight.com/?liteurl=https://biochemical-pathways.com/#/map/1" frameborder="0" allowfullscreen
style="position:absolute;top:0;left:0;width:100%;height:100%;">
</iframe>
</div> -->
#### Energy
![](./figs/ATP-ADP.png)
(Source: wikipedia)
</br>
#### Catalysis
![](./figs/Citric_acid_cycle_with_aconitate_2.svg)
(Source: Narayanese, wikipedia)
</br>
<div style="position: relative;width: 100%;height: 0;padding-bottom: 56.25%;">
<iframe
src="https://www.youtube.com/embed/yk14dOOvwMk?start=8&end=60"
frameborder="0"
style=" position:absolute;top: 0;left: 0;width: 100%;height: 100%;"
allow="autoplay; encrypted-media" allowfullscreen data-external="1" start=8></iframe>
</div>
</br>
"Any living organism is a reflection of its enzyme arsenal"
- Catalysis: a large number of chemical reactions would never happen if we only mixed the molecules without a catalyst.
- Catalyst: chemical substance that helps a reaction to take place without being consumed itself.
- Biological catalysts are referred to as enzymes
- Enzymes are proteins that
- fish certain molecules out of the complex mixture in a cell,
- which consists of thousands of chemical compounds
- generally at low concentrations;
- through binding sites, bring these molecules (substrates) close together so that they can react and form a new compound.
</br>
#### Self-organisation
- Some proteins also give structure to a cell
- They can spontaneously form structure
- See the video in which cytoplasm ("liquid in cell") was homogenised and subsequently organises itself into cell-like structures.
</br>
<div style="position: relative;width: 100%;height: 0;padding-bottom: 56.25%;">
<iframe
src="https://www.youtube.com/embed/prq1Occu22s?start=0&end=7&loop=1"
frameborder="0"
style=" position:absolute;top: 0;left: 0;width: 100%;height: 100%;"
allow="autoplay; encrypted-media" allowfullscreen data-external="1" start=8></iframe>
</div>
(Source: Science DOI: 10.1126/science.aav7793)
</br>
Researchers found that the following was required for this:
- ATP: the energy source of a cell
- filamentous proteins (microtubules)
- Dynein, a kind of motor protein
![](./figs/Tubulin_Infographic.jpeg)
(Source: Pakorn Kanchanawong, wikipedia)
</br>
Proteins play a central role in life: catalysis + structure
A cell thus inherits not only genetic information but also its spatial organisation from a mother cell.
</br>
### Life is information
</br>
- Gene: unit of genetic material, a DNA sequence that encodes the synthesis of a gene product: a protein or a functional RNA.
![](./figs/gene.svg)
(Source: Thomas Shafee, wikipedia)
</br>
- DNA: 4 letter code (4 bases: ACGT)
- RNA: 4 letter code (4 bases: ACGU)
![](./figs/Difference_DNA_RNA-EN.svg){width=50%}
(Source: Sponk, wikipedia)
</br>
![](https://aholdencirm.files.wordpress.com/2016/06/transcription_2.jpg)
(Source: [tokresources.org](http://www.tokresource.org/tok_classes/biobiobio/biomenu/transcription_translation/))
</br>
- Principle: hybridisation of complementary bases!
- Transfer RNA (tRNA): carries an anticodon (a triplet of 3 bases) that pairs with the codon for the amino acid it transports
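
The base-pairing principle above can be sketched in a few lines of Python. This is only an illustrative sketch: the toy sequence and the function names `reverse_complement`, `transcribe` and `codons` are assumptions made for this example and are not part of the course material.

```python
# Watson-Crick base pairing in DNA, the basis of hybridisation
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the complementary DNA strand, read in the opposite direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand):
    """Return the mRNA of a coding strand: same letters, with T replaced by U."""
    return coding_strand.replace("T", "U")


def codons(rna):
    """Split an RNA sequence into consecutive triplets (an incomplete tail is dropped)."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


coding = "ATGGCCTAA"  # hypothetical toy sequence, not a real gene
print(reverse_complement(coding))  # TTAGGCCAT
print(transcribe(coding))          # AUGGCCUAA
print(codons(transcribe(coding)))  # ['AUG', 'GCC', 'UAA']
```

Reading the mRNA in triplets is what links the 4-letter code of nucleic acids to the amino acids of a protein.
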
</br>
- Protein: chain of amino acids
- there exist more than 400 amino acids
- only 20 were selected by life to build proteins.
![](./figs/Aminoacids_table.svg){width=50%}
(Source: wikipedia)
</br>
- The code has evolved so that many mutations give rise to
- synonymous codons (same amino acid) or
- codons for similar amino acids
$\rightarrow$ protein function is conserved
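
The redundancy of the code can be made concrete with a small sketch. With 4 bases and triplets there are 4^3 = 64 possible codons for only 20 amino acids, so several codons must map to the same amino acid. The dictionary below is only a small excerpt of the standard genetic code, and the helper name `classify_mutation` is an assumption made for this illustration.

```python
# Excerpt of the standard genetic code (RNA codons); the full table has 64 entries
CODON_TABLE = {
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAU": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
    "GUG": "Val",
}


def classify_mutation(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous."""
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    if aa_before == aa_after:
        return "synonymous"
    return f"non-synonymous ({aa_before} -> {aa_after})"


print(4 ** 3)                           # 64 possible triplets
print(classify_mutation("GCU", "GCC"))  # synonymous
print(classify_mutation("GAU", "GAA"))  # non-synonymous (Asp -> Glu)
print(classify_mutation("GAG", "GUG"))  # non-synonymous (Glu -> Val)
```

In this excerpt a change in the third position of GCU never alters the amino acid, and Asp and Glu are both acidic amino acids, so the second mutation swaps in a chemically similar residue.
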
</br>
- DNA is the carrier of genetic material (cf. hard drive)
- RNA plays a more central role:
- Messenger RNA (cf. RAM)
- Ribozymes: catalytic function
- transfer RNA for translation of proteins
- in ribosomes for translation of proteins
- a lot of other ribozymes (catalytic RNA)
- RNA primer essential to copy DNA
- Carrier of genetic material (e.g. coronavirus)
</br>
# Evolution and phylogenesis
</br>
</br>
![](./figs/Phylogenetic_tree.svg)
(Source: wikipedia)
</br>
## Evolution
### Variability and selection
Bacteria & Archaea
![](./figs/Binary_Fission_2.svg){width=70%}
(Source: Ecoddington, wikipedia)
Eukaryota
![](./figs/Meiosis_Stages.svg)
(Source: Ali Zifan, wikipedia)
</br>
<center>
![](./figs/DNA_polymerase.svg){width=40%}
(Source: wikipedia)
</center>
</br>
- Error rate of DNA replication: about 1 error per billion base pairs copied (diploid human genome: about 6.4 billion base pairs)
- Insertions/deletions: base pairs that are added or removed
- Recombination: reshuffling of genetic traits, e.g. during sexual reproduction (e.g. recombination of paternal and maternal segments during meiosis).
$\rightarrow$ Mutations
$\rightarrow$ Natural variability
- Most mutations are neutral $\rightarrow$ Molecular/Genetic clock
- But not always
</br>
![](./figs/sickleCellWikipedia2.png)
![](./figs/Sickle_Cells_wiki.jpeg){width=45%}
![](./figs/Sickle_Cell_Anemia_wiki3.png){width=45%}
(Source: Thomas Samuel (1), OpenStax College (2), BruceBlaus(3), wikipedia)
</br>
- The sickle-cell mutation is common in Africa
- Why does this mutation remain?
- Selection by ecofactors: malaria resistance
</br>
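
The replication error rate quoted in this section can be turned into a back-of-the-envelope calculation. The sketch below uses the two numbers given above (1 error per billion base pairs, 6.4 billion base pairs). The Poisson step is an added assumption: it treats copy errors as rare and independent, which is a simplification and not something the notes state.

```python
import math

ERROR_RATE = 1e-9    # errors per copied base pair (figure quoted above)
GENOME_SIZE = 6.4e9  # base pairs in the human genome (figure quoted above)


def expected_errors(error_rate, n_bases):
    """Expected number of copy errors when n_bases are replicated once."""
    return error_rate * n_bases


def prob_error_free(error_rate, n_bases):
    """Probability of a perfect copy, assuming independent errors (Poisson model)."""
    return math.exp(-expected_errors(error_rate, n_bases))


mean_errors = expected_errors(ERROR_RATE, GENOME_SIZE)
print(f"Expected errors per genome copy: {mean_errors:.1f}")
print(f"Probability of an error-free copy: {prob_error_free(ERROR_RATE, GENOME_SIZE):.4f}")
```

Even with such a low error rate, each cell division is expected to introduce a handful of new mutations, which is one source of the natural variability discussed here.
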
### Evolution
- Natural process that forms the basis of the origin of species (plants, animals, bacteria, fungi, ... and hominids)
- Driven by two opposing forces: **variation** and **selection**
- Variation by spontaneous copy errors in the genetic code: mutations
- Selection by ecofactors: is a mutation beneficial or harmful for a particular organism in its specific environment?
- The odds of fixation of a mutation depend on reproductive success
- The process of mutation and selection can eventually lead to new species over many generations.
</br>
### Genetic drift
- Genetic drift: random fluctuations in allele frequencies
- Particularly strong in small populations
- As opposed to selection it is not adaptive.
- New species will originate more quickly when a small fraction of the population gets isolated in a new environment.
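
Why drift is stronger in small populations can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher-style model (each generation resamples all gene copies at random from the previous one, without selection or mutation). That model, the function name `simulate_drift` and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Return the allele frequency per generation under pure random drift."""
    rng = random.Random(seed)  # fixed seed so that a run is reproducible
    n_copies = 2 * pop_size    # diploid population: two gene copies per individual
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool
        count = sum(rng.random() < freq for _ in range(n_copies))
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


for pop_size in (10, 1000):
    final = simulate_drift(pop_size, start_freq=0.5, generations=100)[-1]
    print(f"N = {pop_size}: allele frequency after 100 generations = {final:.3f}")
```

In a small population the frequency typically wanders far from 0.5 and often ends at 0 or 1 (loss or fixation), whereas in a large population it usually stays close to its starting value.
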
</br>
### Horizontal gene transfer
- Non sexual transfer of genetic information
- Very common between prokaryotes (eubacteria and archaea)
- e.g. exchange of antibiotic resistance.
- between eukaryotes (mainly in protists, unicellular organisms with a nucleus)
- between prokaryotes (eubacteria and archaea) and eukaryotes (protists, fungi, plants and animals)
</br>
### Teleonomy
- There is only the primitive goal to maintain and reproduce the species.
- Evolution has no purpose or direction
- When complex organs and organisms originate it seems as if there is a direction/purpose, but that is not the case.
</br>
![](./figs/evolutionEye.png){width=50%}
(Source: Matticus78, wikipedia)
</br>
- The eye is not developed by evolution with the purpose to see.
- The eye only has the function to see
- It is the result of a gradual process where each adaptation gave a reproductive advantage in a particular environment.
- In another environment it may no longer be functional and then it might disappear, e.g. the mole's eye
- The origin of a species is the result of evolution but not the purpose of evolution.
- Evolution is adaptation with maintenance and reproduction as its only goal
</br>
#### Evolution has no direction
- Distribution of number of species and complexity
![](./figs/selectionNoDirectionDef.png)
(Source: Stephen J. Gould, 1996, Full House: The Spread of Excellence from Plato to Darwin)
- Distribution of carbon mass fixed in different types of species.
![](https://www.pnas.org/cms/10.1073/pnas.1711842115/asset/44253c25-afd5-4fa1-b060-31591f934f5d/assets/graphic/pnas.1711842115fig01.jpeg)
(Mass in gigatonnes of carbon. Source: doi.org/10.1073/pnas.1711842115)
</br>
- Note the large error margin for bacteria (can be a factor 10 larger).
- Number of bacterial cells in our body (Source: doi.org/10.1371/journal.pbio.1002533):
- \#bacterial cells/\#human cells earlier estimated as $\pm$ 10/1
- recent estimate $\pm$ 1/1.
- A human of 70 kg: $\pm$ 38 trillion bacterial cells / 30 trillion human cells (trillion = 1000 billion = 10$^{12}$!).
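
The ratio behind the "roughly 1/1" estimate follows directly from the two cell counts above. A minimal check, using only the numbers quoted in the list (the variable names are chosen here for illustration):

```python
TRILLION = 10 ** 12

bacterial_cells = 38 * TRILLION  # estimate for a human of 70 kg (quoted above)
human_cells = 30 * TRILLION      # estimate for a human of 70 kg (quoted above)

ratio = bacterial_cells / human_cells
print(f"bacterial cells per human cell: {ratio:.2f}")  # 1.27
```

So the recent estimate is about 1.3 bacterial cells per human cell, an order of magnitude below the earlier 10/1 figure.
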
</br>
## Phylogenesis
Origin of all species by evolution
![](./figs/Phylogenetic_tree.svg)
(Source: wikipedia)
</br>
### Timescale
![](https://naturedocumentaries.org/wp-content/uploads/2017/12/liferockystartstrip.jpg)
| 4.5 BYA | 4.3 BYA | 3.8 BYA | 3.5 BYA | 540 MYA | 520 MYA |
|---------|---------|---------|---------|---------|---------|
| | | | | | |
(Source: naturedocumentaries.org)
- Black Earth (4.5 BYA): hot basalt rock and dust in a cold vacuum
- Grey Earth (4.3 BYA): granite
- Blue Earth (3.8 BYA): water
- Red Earth (3.5 BYA): Radical change due to life.
- Cyanobacteria $\rightarrow$ photosynthesis $\rightarrow$ oxygen
- All iron in the ocean precipitates as iron oxide (rust, red)
- 250 $\rightarrow$ > 5000 minerals.
- Mass extinction
- White Earth (540 MYA)
- Large ice age
- Mass extinction
- Volcanic activity comes to the rescue: greenhouse gases
- Green Earth (520 MYA)
- Explosion of life
- from unicellular to more complex life.
</br>
### Changepoint: Genesis of eukaryotic cell
Two archetypes: prokaryotes (simple cells, 0.1 to 5.0 $\mu m$) and eukaryotes (larger and more complex cells, 10-100 $\mu m$)
![](./figs/prokaryoteCell.svg){width=30%}
![](./figs/animalCell.svg){width=30%}
![](./figs/plantCell.svg){width=30%}
(Source: Ali Zifan (1) & Mariana Ruiz Villarreal (2 & 3), wikipedia)
</br>
- 3.5 BYA - 520 MYA only prokaryote cells in fossils
</br>
Genesis of Eukaryotic cell by endosymbiosis:
![](./figs/endosymbiosis.svg)
(Source: Kelvinsong, wikipedia)
</br>
- Prokaryotes: reproduction by cell division, a mutation is fixed in all daughter cells.
</br>
- Eukaryotes: nearly all have a phase of sexual reproduction
- much more variation: recombination of chromosomes
- diploid organisms have two copies of each gene (father and mother) $\rightarrow$ successive mutations in 1 copy possible in the presence of another functional copy of the gene.
</br>
- Eukaryotes evolve further in
- protists (unicellular)
- fungi
- plants
- animals
</br>
Genetic information of a species can be seen as a record of the environments and development that it underwent up to this point
</br>
## Evolution of evolution
1. Chemical evolution: selection of building blocks and complex chemistry
2. Biological evolution: cell/organism $\rightarrow$ selection genetic information and function
3. Cultural evolution can bypass natural evolution:
- artificial selection: breeding of plants, pets, cattle, genetic manipulation, etc.
- Technology: fast adaptation to new environment
</br>
# Ontogenesis
- Ontogenesis: development of an organism from fertilized egg cell to adult individual until death
- Each cell (except egg and sperm cells) of a multicellular organism has the same genetic material!
- Why are cells of the same organism so morphologically diverse?
</br>
## Epigenetics
![](./figs/Epigenetic_mechanisms.png)
(Source: NIH, wikipedia)
</br>
![](./figs/DNA_methylation_reprogramming.png)
(Source: Mariuswalter, wikipedia)
</br>
- Differentiation $\rightarrow$ epigenetics
- Epigenetics: epigenetic markers on DNA and histones $\rightarrow$ gene can be transcribed or not.
- Epigenetics is driven by ecofactors.
- Identical twins have almost the same genome (small differences have built up in the womb), but it gets easier to tell them apart over time: epigenetics
</br>
<center>
![](https://www.researchgate.net/profile/Tara-Hogenson/publication/320386487/figure/fig1/AS:783847914471429@1563895314907/dentical-twins-with-phenotypic-discordance-due-to-environmental-exposure-Although-MZ.png){width=50%}
</center>
(Difference due to eco-factor UV-exposure, Source: Swab & Hogenson, DOI: 10.1007/978-3-319-31143-2_65-1)
</br>
![](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/3501579/bin/nihms402187f3.jpg)
(Source doi: 10.1111/j.1526-4637.2012.01488.x)
</br>
- Epigenetics is very important in development of the brain and for learning.
</br>
# Statistical Genomics
- Genomics is the study of all genetic information of an organism
together: specific code, effects, functions and interactions
- With current high throughput methods we can measure the entire
- genome
- transcriptome: expressed transcript of all genes
- proteome: all translated proteins
- epigenome
- ...
- With statistical genomics we will try to make sense of this wealth of data so that we can find biomarkers and 'omics patterns that are reproducible.
# Closing remarks
- Life is
- one
- chemistry
- information
</br>
- Note, however, that we should not reduce organisms to their genome
- Indeed, organisms and life cannot be studied without considering their intimate relation to their environment/ecosystem
- molecules do not work alone but in large networks
- environmental conditions are important for self organisation
- selective evolution: selection by eco-factors
- organisms in turn adapt/shape their environment and eco-factors: e.g. cyanobacteria
- individual evolution: eco-factors $\leftrightarrow$ epigenetics
- eco-factors are also determined by other organisms: eco-system
- Organisms interact and collaborate in their ecosystem
- Genetic information can be considered as the record of all conditions in which our ancestral cells have lived.
$\rightarrow$ "Web of Life"
# Organization of the course
## Module I: Quantitative Proteomics
1. Identification and quantification of peptides and proteins
2. Data exploration and quality control using plots
3. Preprocessing: log-transformation, Filtering, Normalization,
Summarization
4. Dealing with batch effects and other confounders
5. Statistical Concepts
1. Linear models/Linear mixed models
2. Trade-off between biological relevance/effect size vs statistical
significance
3. Empirical Bayes Methods
4. Multiple testing
## Module II: Next generation sequencing (NGS, Transcriptomics)
1. NGS Data exploration
2. Preprocessing/normalization
3. Additional Statistical Concepts
1. Generalized linear models (GLM) for binary data
2. GLM for count data
3. Overdispersion
## Details
1. Theory and Tutorials are blended
- Module I: week 1-5
- Module II: week 6-10
- Project: week 1-10 via small assignments + week 11-12
2. Communication and submission of projects via Ufora
3. All tutorials from week 2 onwards are based on
R/Bioconductor via RStudio
Scripts are made in R Markdown: a file format to combine
text, R code and R output.