Computer Science 499/599 at Northern Arizona University, Fall 2021
Topic: Unsupervised Learning
Dates: Aug 23 - Dec 10.
Meeting time/place:
- CS499, MWF 8-8:50AM, Engineering building, room 314.
- CS599, TuTh 8-9:15AM, Health Professions building, room 229.
Syllabus: Google Doc.
These provide background/theory about the algorithms we study in this class.
MLAPP by Murphy
- Author’s web page https://www.cs.ubc.ca/~murphyk/MLbook/
- Full book online, describing many machine learning algorithms from a computer science perspective.
ESL by Hastie, Tibshirani, Friedman
- Free PDF available from the authors’ web page https://web.stanford.edu/~hastie/ElemStatLearn/; describes many machine learning algorithms from a statistics perspective.
These provide practical advice about how to write the R code necessary for the homeworks.
Impatient R by Burns
Getting Started in R: Tinyverse Edition by Saghir Bashir and Dirk Eddelbuettel.
Tao Te Programming by Burns
- selected chapters from the book about how to become a good programmer.
- web page with details about how to purchase the full book.
To do the homeworks you need to install the most recent version of R (4.1.1) with either the RStudio IDE (for beginners) or the ESS IDE (for students who already know/use Emacs, or who want to learn it with my Emacs tutorials).
Folder of all class recordings and code demos (useful for homework) from last year on Google Drive. Yes, you can copy and modify these code demos for your homework, since they are part of the class material. In general, however, copying code for your homework from classmates or internet sources is strictly forbidden and will be pursued as an academic integrity violation.
This General Usage Rubric will be used to grade the code quality/style/efficiency in each of your homeworks.
Folder with all code demos from this year.
Homework topics and readings for each week are listed below; each week is labeled with the date of its Monday. Each homework is due Friday of that week at 11:59PM.
- Aug 23, Homework 1: installing R, reading CSV, data visualization using ggplot2.
- introductory slides, intro to R video, ggplot intro, Data visualization chapter of R for Data Science, Grammar of graphics chapter of Animint2 Manual.
- Quizzes due Tues Aug 24: R Basics 1, R ggplot 1. Due Thurs Aug 26: R ggplot 2, R ggplot 3.
- Aug 30, Homework 2: K-means.
- Slides, Introduction to clustering, MLAPP-25.1. Clustering evaluation, MLAPP-25.1.2. K-means is discussed in ESL-14.3.6, MLAPP-11.4.2.5.
- Quizzes due Sun Aug 29: Clustering 1. Due Tues Aug 31: Kmeans 1.
- Sept 6, Labor Day 9/6. Homework 3: Gaussian mixture models
- ESL-14.3.7, MLAPP-11.4.2. mclust model names figure.
- Sept 13, Homework 4: Hierarchical Clustering
- ESL-14.3.12, MLAPP-25.5.1.
- Sept 20, Homework 5: Clustering model selection
- Estimating the number of clusters, ESL-14.3.11. Model selection for latent variable models, MLAPP-11.5.
- Sept 27, Review and exam
- Oct 4, Homework 6: Binary segmentation
- Intro to changepoint detection, Truong et al. sections 1-2. Binary segmentation, section 5.2.2. Estimating the number of changes, section 6.
- Oct 11, Homework 7: Optimal segmentation
- Optimal detection, Truong et al. section 5.1. Dynamic programming slides PDF. Models and cost functions, Truong et al. section 4.
- Oct 18, Homework 8: Hidden Markov Models
- depmixS4 vignette section 2. Markov Models, MLAPP-17.2. Hidden Markov Models, MLAPP-17.3-5. Learning for HMMs, MLAPP-17.5.
- Oct 25, Homework 9: Segmentation model selection
- AIC/BIC, MLAPP-5.3.2.4, ESL-7.5. Changepoint ROC curve interactive data viz.
- Nov 1, Review and exam.
- Nov 8, Veterans Day 11/11. Homework 10: Principal Components Analysis
- Principal Components Analysis, ESL-14.5. MLAPP-12.2.
- Nov 15, Homework 11: Auto-encoders
- Deep generative models, MLAPP-28.2 to 28.3. Deep auto-encoders, MLAPP-28.3.2. MLAPP-28.4.2 to 28.4.3.
- Nov 22, Thanksgiving 11/25-26. Homework 12: t-SNE.
- Nov 29, Reading week.
- Final exams: CS499 Mon Dec 6, 7:30-9:30. CS599 Thurs Dec 9, 7:30-9:30.
- Grad student R package coding project description.
- Can I do my homework with an older version of R? Maybe; try it if you want. However, homeworks typically require R packages, which are only tested with the most recent versions of R, so if you get errors with an old version of R, try upgrading to the most recent version.
Before class, you should prepare by doing the suggested readings/videos. As you do, write a summary of every section in your own words. Also write down any questions that arise during your reading so you can ask them in class or office hours.
During class, take notes by writing what you understood in your own words. I also suggest asking questions in class as soon as you need clarification.
After class, you should review your notes with one of your classmates (for example, a student who seems to answer many questions correctly). Ask each other questions and try to teach/summarize some of the material to each other; that is one of the best ways to learn.
Finally after doing all of the above, please come to office hours (see syllabus), or email me to schedule a meeting.