---
title: "High Dimensional Data Analysis (HDDA)"
output:
bookdown::html_document2:
code_download: false
toc: true
number_sections: false
code_folding: "none"
---
```{r setup, include=FALSE, cache=FALSE}
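# Hide R code chunks by default in the rendered page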
knitr::opts_chunk$set(echo = FALSE)
```
***
```{r}
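# Display the course graphic (figures/wpGraph.jpeg) on the landing page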
knitr::include_graphics("./figures/wpGraph.jpeg")
```

## Course Description

Modern high-throughput technologies easily generate data on thousands of variables, e.g. in genomics, chemometrics and environmental monitoring. Conventional statistical methods are no longer suited for effectively analysing such high-dimensional data. Multivariate statistical methods may be used, but often the dimensionality of the data set is much larger than the number of (biological) samples. Modern advances in statistical data analysis allow for the appropriate analysis of such data. Methods for the analysis of high-dimensional data rely heavily on multivariate statistics. Therefore, a large part of the course content is devoted to multivariate methods, with a focus on high-dimensional settings and issues. Multivariate statistical analysis covers many methods, and only a few of them are discussed in this course. The selection is based on our experience of which techniques are frequently used in industry and research institutes (e.g. principal component analysis, cluster analysis, classification methods). Central to the course are applications from different fields (analytical chemistry, ecology, biotechnology, genomics, ...).
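
To give a first flavour of this setting, the minimal sketch below (not part of the course material; the data, dimensions and chunk name are chosen purely for illustration) runs a principal component analysis with base R's `prcomp()` on simulated data with many more variables than samples.

```{r pca-illustration, echo=TRUE}
# Minimal illustration (not course material): PCA on simulated data with
# many more variables (p) than samples (n).
set.seed(1)
n <- 20    # number of samples
p <- 1000  # number of variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# prcomp() centers the data and computes the principal components via an SVD;
# with p much larger than n there are at most n - 1 informative components.
pca <- prcomp(X, center = TRUE)
summary(pca)$importance[, 1:5]  # variance explained by the first 5 components
```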

## Prerequisites

The prerequisites for the High Dimensional Data Analysis course are the successful completion of a basic statistics course covering data exploration and descriptive statistics, statistical modeling, and inference: linear models, confidence intervals, t-tests, F-tests, ANOVA and the chi-squared test.

The basic concepts can be revisited in my online course <https://gtpb.github.io/PSLS20/> (English) and in <https://statomics.github.io/statistiekCursusNotas/> (Dutch).

A primer on R and on data visualisation in R can be found in:

- R Basics: <https://dodona.ugent.be/nl/courses/335/>
- R Data Exploration: <https://dodona.ugent.be/nl/courses/345/>
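
As a quick self-check of this background, the sketch below (illustration only, not evaluated here) fits a linear model and runs a two-sample t-test on R's built-in `mtcars` data; the data set and variables are chosen purely as an example.

```{r prerequisites-selfcheck, echo=TRUE, eval=FALSE}
# Quick self-check (illustration only): basic inference on R's built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)  # linear model: fuel consumption vs. weight
summary(fit)                        # coefficient estimates, t-tests and F-test
confint(fit)                        # confidence intervals for the model parameters

t.test(mpg ~ am, data = mtcars)     # two-sample t-test: mpg by transmission type
```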

## Software requirements

See the [installation instructions](./installation-instructions.html).

## Organisation

This course is a 5-credit ECTS course [C003549](https://studiekiezer.ugent.be/studiefiche/en/C003549/2022) in the Master of Statistical Data Analysis at Ghent University: [Organisation of ECTS Course](./organisationC003549.html).

The course is also taught as an intensive short course, e.g. in the [UGAIN / IVPW Academy](https://ugainacademy.ugent.be/programma/postacademische-opleiding/2021-2022-data2122-data-analysis-2021-2022) at Ghent University.

If you encounter any problems with the course material (e.g. package installation problems, bugs in the code, typos, ...), please consider [posting an issue on GitHub](https://github.com/statOmics/HDDA/issues).

Questions on the course contents can be asked by contacting the teachers by email or during the lectures and practical sessions, and, for the C003549 ECTS course, by posting on [UFora](https://ufora.ugent.be/d2l/home/444226).

---

## Topics

### 1. Introduction

- [Introduction](./intro.html) [[PDF](./intro.pdf)]
- [Introduction to RMarkdown](./Introduction-RMarkdown.html)
- [Introduction to matrices in R](./Introduction-Matrices-R.html)

### 2. Singular value decomposition

- [Lecture 2-3: Singular value decomposition](./svd.html) [[PDF](./svd.pdf)]
- [MDS Link Gram Distance Matrix](./MDS_linkGramDistanceMatrix.html)
- [Lab 1: Introduction and SVD applications](./Lab1-Intro-SVD.html)
- [Lab 2: PCA](./Lab2-PCA.html)

### 3. Prediction with High Dimensional Predictors

- [Lecture 4-5: Prediction Theory](./prediction.html) [[PDF](./prediction.pdf)]
- [Lab 3: Penalized regression and prediction](./Lab3-Penalized-Regression.html)
- [AIC vs BIC illustrated](./AICvsBIC.html)

### 4. Sparse Singular Value Decomposition

- [Lecture 5: Sparse SVD](./sparseSvd.html) [[PDF](./sparseSvd.pdf)]

### 5. Linear discriminant analysis

- [Lecture 5: Linear Discriminant Analysis Theory](./lda.html) [[PDF](./lda.pdf)]
- [Lab 4: Sparse PCA and LDA](./Lab4-Sparse-PCA-LDA.html)

### 6. Clustering

- [Lecture 6: Introduction to Clustering](./hclust.html) [[PDF](./hclust.pdf)]
- [Lecture 6: Paper - Fraley and Raftery (1998). How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8):578-588.](https://sites.stat.washington.edu/people/raftery/Research/PDF/fraley1998.pdf)
- [Lecture 6: EM algorithm](./em.html) [[PDF](./em.pdf)]
- [Lab 5: Clustering](./Lab5-Clustering.html)

### 7. Large Scale Inference

- [Lecture 7: Large Scale Inference](./lsi.html) [[PDF](./lsi.pdf)]
- [Lab 6: Large Scale Inference](./Lab6-Large-Scale-Inference.html)

---

## Homework assignments

- [Homework: Canonical Correlation Analysis](./HW-CCA.html) [[PDF](./HW-CCA.pdf)]

<!-- - [Group Project](./Project.html) [[PDF](./Project.pdf)] -->

---

## Instructors

- [Lieven Clement](./instructors.html)
- [Milan Malfait](./instructors.html)