Machine learning and Databases at CAUP/IA in 2019

jorgehumberto/MLD2019

We have started!

Course overview

This is an advanced course held at CAUP during March and April 2019. Lectures take place on Mondays at 14:00 and practical classes on Thursdays at 10:00. Both last 2 hours, with a short break.

The aim of this course is to give you a good practical grasp of machine learning. I will not spend much time on the details of the algorithms; instead I will focus on how to use them in Python and discuss which methods are useful for which type of scientific question or research goal.

March 4 - Managing data and simple regression
  • Covering git and SQL
  • Introducing machine learning through regression techniques.
March 11 - Visualisation and inference methods
  • Visualisation of data, dos and don'ts
  • Classical inference
  • Bayesian inference
  • MCMC
March 18 - Density estimation and model choice
  • Estimating densities, parametric & non-parametric
  • Bias-variance trade-off
  • Cross-validation
  • Classification
March 25 - Dimensionality reduction
  • Standardising data
  • Principal Component Analysis
  • Manifold learning
April 8 - Ensemble methods, neural networks, deep learning
  • Local regression methods
  • Random forests and boosting methods
  • Neural networks & deep learning

Literature for the course

I expect that you have read through these two documents:

  • A couple of Python & Topcat pointers. This is a very basic document and might not contain much that is new to you. It does have a couple of tasks to try out; you can find the solutions in the ProblemSets/0 - Pyton and Topcat directory.

  • A reminder/intro to relevant math. This summarises some basic facts from linear algebra and probability theory that are useful for this course.

Below you can find some useful books. The links on the titles take you to the Amazon page. Where a free version of a book is legally available online, I include a link to that as well.

  • "Elements of Statistical Learning" by Hastie et al. is a more advanced version of "Introduction to Statistical Learning", with much the same authors. It is also freely available on the web.

Making a copy of the repository that you can edit

In this case you will want to fork the repository rather than just clone it. You can follow the instructions below (credit to Alexander Mechev) to create a fork of the repository:

Software you need for the course

The course uses Python throughout, so you need a recent version of Python installed. I use Python 3 by default but will try to make all scripts compatible with both Python 2 and Python 3. I recommend installing at least these libraries:

  • numpy - for numerical calculations
  • astropy - because we are astronomers
  • scipy - because we are scientists
  • sklearn - machine learning library, full name scikit-learn
  • matplotlib - plotting (you can use alternatives of course)
  • pandas - nice handling of data
  • seaborn - nice plots

(The last two are really "nice to have", but if you can install the others then these are easy to add.)
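A quick way to see which of the libraries listed above are already installed is to try importing each one and print its version. The snippet below is only a minimal sketch: the helper name `installed_versions` and the two list names are made up for illustration, and it assumes each package exposes a `__version__` attribute (it prints "unknown" when that is not the case).

```python
import importlib

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "astropy", "scipy", "sklearn", "matplotlib"]
NICE_TO_HAVE = ["pandas", "seaborn"]


def installed_versions(names):
    """Return a dict mapping each name to its version, or None if the import fails.

    Hypothetical helper for illustration; not part of the course material.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown" otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED + NICE_TO_HAVE).items():
        print("{:<12s} {}".format(name, version if version else "NOT INSTALLED"))
```

Anything reported as NOT INSTALLED can then be added with your usual package manager.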

You should also get astroML which has a nice web page at XX and a git repository at https://github.com/astroML/astroML

It turns out that the astroML distribution that is often picked up when you install it using a package manager (maybe also pip?) is outdated and does not work with new versions of sklearn. To check whether you have a problem, try:

from astroML.datasets import fetch_sdss_sspp

If this crashes with a complaint about a module GMM, you have the old version. The best fix is probably to check out the git version of astroML linked above, e.g.:

git clone https://github.com/astroML/astroML.git

To use astroML in Anaconda you need to get it from the astropy channel. For a one-off you can do:

conda install -c astropy astroML

If you want to add the astropy channel permanently (which probably is a good idea), you can do:

conda config --add channels astropy
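Whichever route you use to install astroML, it can be handy to rerun the import check from above as a small script. This is a sketch under some assumptions: it requires Python 3, the function name `check_astroml` is made up for illustration, and it treats any `ImportError` raised by the import (such as the GMM complaint) as a sign of a broken or outdated installation.

```python
import importlib.util


def check_astroml():
    """Return 'missing', 'broken' or 'ok' for the local astroML installation.

    Hypothetical helper for illustration; not part of astroML itself.
    """
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec("astroML") is None:
        return "missing"
    try:
        # Same import as in the check above. Nothing is downloaded here,
        # because the function is only imported, not called.
        from astroML.datasets import fetch_sdss_sspp  # noqa: F401
    except ImportError as exc:
        print("astroML import failed:", exc)
        return "broken"
    return "ok"


if __name__ == "__main__":
    print("astroML status:", check_astroml())
```

A result of "broken" suggests you have the outdated distribution described above.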

Lecture 1 - links and information

The slides are available in the Lectures directory. You can find some files for creating tables in the ProblemSets/MakeTables directory.
