Qucy/UdacityMachineLearning

Udacity Machine Learning (Advanced)

This repository contains my projects for the machine learning course I took at Udacity China.

Projects

P0 - Titanic survival exploration

  • ENV : Anaconda + Jupyter notebook + python 2.7
  • Write a decision tree by hand and reach a prediction accuracy greater than 80%
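
A hand-written decision tree is essentially a chain of nested conditions. The following is a minimal sketch, not the notebook's actual solution: the passenger records, the feature names (`sex`, `age`, `pclass`) and the split rules are all made up for illustration.

```python
# Hypothetical passenger records; NOT taken from the real Titanic data set.
passengers = [
    {"sex": "female", "age": 29, "pclass": 1, "survived": 1},
    {"sex": "male", "age": 40, "pclass": 3, "survived": 0},
    {"sex": "male", "age": 8, "pclass": 2, "survived": 1},
    {"sex": "female", "age": 35, "pclass": 3, "survived": 1},
    {"sex": "male", "age": 52, "pclass": 1, "survived": 0},
]


def predict(p):
    """Hand-written decision tree: each `if` is one split."""
    if p["sex"] == "female":
        # Illustrative rule: predict death only for third-class women.
        return 0 if p["pclass"] == 3 else 1
    # Illustrative rule: predict survival for boys under 10.
    return 1 if p["age"] < 10 else 0


def accuracy(records):
    """Fraction of records whose prediction matches the true label."""
    correct = sum(predict(p) == p["survived"] for p in records)
    return correct / len(records)


print("accuracy: {:.0%}".format(accuracy(passengers)))
```

Adding or reordering the conditions and re-running `accuracy` is the whole tuning loop for this kind of tree.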

P1 - Boston housing

  • ENV : Anaconda + Jupyter notebook + python 2.7
  • Regression problem (supervised learning)
  • Use the numpy API to calculate max, min, median and std (do not use the pandas API: its std defaults to the sample standard deviation, ddof=1, while numpy's defaults to the population standard deviation, ddof=0).
  • Understanding the features
  • Split the data into training and test sets
  • R2 score
  • Understand learning curves and when a model overfits
  • Understand K-fold cross-validation
  • Understand and use the scikit-learn API GridSearchCV to find the best parameters
  • Understand and use quartile calculations to analyze the data
  • Train your first model
  • Predict on the test data and use the R2 score to validate your predictions
  • Check whether your model is robust
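
Two points from the list above are easy to check by hand: why numpy's and pandas' std disagree, and what the R2 score measures. This is a plain-Python sketch: `std` and `r2_score` are helper functions written for this example (not the library functions), and the numbers are made up rather than taken from the Boston housing data.

```python
import math


def std(values, ddof=0):
    """Standard deviation with a numpy/pandas-style `ddof` parameter."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


values = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up numbers
population_std = std(values, ddof=0)  # numpy's default behaviour (ddof=0)
sample_std = std(values, ddof=1)      # pandas' default behaviour (ddof=1)
print(population_std, sample_std)

# Made-up targets and predictions; 1.0 would be a perfect fit.
r2 = r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(r2)
```

The sample version divides by n - 1 instead of n, so it is always slightly larger; the gap shrinks as the data set grows.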

P2 - Finding donors

  • ENV : Anaconda + Jupyter notebook + python 2.7
  • Binary classification problem (supervised learning)
  • Take a first look at your data: how many records in total, how many categories in total, etc.
  • Using pandas API to extract your features and labels from pandas dataframe
  • Explore your data and visualize the features that have very large and very small values
  • Using a log transform to handle the skewed features found in the previous step
  • Using scikit-learn MinMaxScaler to normalize your data
  • Using pandas get_dummies for one-hot encoding
  • Understand recall, precision, F-score and use F-score to evaluate your model
  • Understand basic classification models such as decision tree, Naive Bayes, Bagging, Boosting, SVM and NN.
  • Understand the pros and cons of the above models.
  • Choose 3 models and train them, then pick the best model according to F-score and accuracy
  • Using GridSearchCV to find the best parameters for your best model
  • Using model.feature_importances_ to find the 5 most important features
  • Train and validate the model using only the 5 most important features
  • Evaluate the model on the test data
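
The preprocessing and scoring steps above can be sketched in plain Python to show what the library calls compute. Assumptions: the log transform is taken as log(1 + x) so that zero values stay valid, `beta` is left as a parameter because the list does not say which F-score is used, and all data below is made up.

```python
import math


def log_transform(values):
    """Compress a skewed feature; log(1 + x) is an assumption of this sketch."""
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale to [0, 1], the default range of scikit-learn's MinMaxScaler."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fbeta(y_true, y_pred, beta=1.0):
    """F-beta from precision and recall.

    Assumes at least one predicted positive and one actual positive,
    otherwise the divisions below fail.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


skewed = [0, 10, 100, 10000]  # made-up skewed feature
scaled = min_max_scale(log_transform(skewed))
print(scaled)

y_true = [1, 0, 1, 1, 0, 1]   # made-up labels
y_pred = [1, 0, 0, 1, 0, 1]   # made-up predictions
f1 = fbeta(y_true, y_pred, beta=1.0)
f_half = fbeta(y_true, y_pred, beta=0.5)  # beta < 1 weights precision more
print(f1, f_half)
```

In this example precision is perfect and recall is not, so the precision-weighted score (`beta=0.5`) comes out higher than F1.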

P3 - Creating customer segments

  • ENV : Anaconda + Jupyter notebook + python 2.7
  • Clustering problem (unsupervised learning)
  • Using the pandas API dataframe.describe to get an overview of the data
  • Understand KDE and find correlations between features
  • Understand correlation
  • Using Tukey's method to find and remove outliers
  • Understand PCA and analyze PCA
  • Use PCA to transform data
  • Understand and explain the biplot
  • Understand and compare K-means with Gaussian Mixture
  • Choose one as your model, then train and predict
  • Using the cluster result as a feature may help improve model performance (unsupervised learning can help supervised learning)
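
Tukey's method flags a value as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third. Here is a minimal sketch using only the standard library (Python 3.8+ for `statistics.quantiles`); the function name `tukey_outliers` and the data are made up, and the notebook itself may compute the quartiles differently.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    step = k * (q3 - q1)
    return [v for v in values if v < q1 - step or v > q3 + step]


spending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up feature values
outliers = tukey_outliers(spending)
cleaned = [v for v in spending if v not in outliers]
print(outliers, cleaned)
```

With several features, one common choice is to run the check per feature and decide separately whether to drop rows flagged in one feature or only those flagged in several.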

P4 - Qlearning robot new

  • ENV : Anaconda + Jupyter notebook + python 3.5
  • Reinforcement learning problem
  • Understand Markov decision process
  • Understand environment, state, action and reward
  • Understand Q-learning
  • Implement Q-learning in python
  • Understand parameters such as epsilon, gamma, alpha and epoch
  • Train models with different parameters
  • Understand how the parameters affect each other
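
The roles of epsilon, alpha and gamma are easiest to see in the Q-learning update itself. The following is a minimal tabular sketch, not the project's robot code: the action names, the grid-style states and the reward values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]  # hypothetical action set

# Q-table: state -> {action: value}; every value starts at 0.0.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    values = q_table[state]
    return max(values, key=values.get)


def update(state, action, reward, next_state, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# Two identical made-up transitions: the estimate moves toward the reward
# by a fraction alpha on each update.
update((0, 0), "right", reward=1.0, next_state=(0, 1), alpha=0.5, gamma=0.9)
update((0, 0), "right", reward=1.0, next_state=(0, 1), alpha=0.5, gamma=0.9)
print(q_table[(0, 0)])
```

Alpha sets how far each update moves the estimate, gamma sets how much the best value of the next state counts, and epsilon sets how often the agent ignores the table and explores. A typical training loop lowers epsilon over the epochs.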

P5 - Dog project

  • ENV : Anaconda + Jupyter notebook + python 3.5
  • Understand TensorFlow and Keras
  • Understand convolutional layers, max pooling layers, average pooling layers and dense layers
  • Understand transfer learning
  • Construct your CNN via the Keras API
  • Using transfer learning to train your model
  • Evaluate your model using other images
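
To make the pooling layers concrete, here is a plain-Python sketch of what they compute on a single feature map. It is an illustration only, not Keras code, and it assumes the "averaging layer" in the list refers to (global) average pooling. The helper names and the input values are made up.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def global_average_pool(feature_map):
    """Collapse a whole feature map into its mean value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


feature_map = [  # made-up 4x4 activations
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
average = global_average_pool(feature_map)
print(pooled, average)
```

Max pooling halves each spatial dimension while keeping the strongest activation in every window. Global average pooling reduces each feature map to one number, which is a common way to feed convolutional features into a dense layer.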

P6 - Capstone project dog_vs_cat

  • ENV : Anaconda + Jupyter notebook + python 3.5
  • Write proposal
  • Write your own code
  • Write your report

Nano Degree
