This repository contains my work from the machine learning course I took at Udacity China.
- ENV: Anaconda + Jupyter Notebook + Python 2.7
- Write our own decision tree and reach a prediction accuracy above 80%
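The decision-tree item above can be illustrated with a minimal from-scratch sketch: a greedy tree that chooses each split by Gini impurity. It is a generic example written for Python 3, not the project's own code. The two features (sex encoded as 0/1, and age) and the eight labeled rows are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical passengers: [sex (0 = male, 1 = female), age] -> survived (0/1)
X = [[1, 29], [0, 2], [0, 30], [1, 25], [0, 45], [1, 60], [0, 5], [0, 22]]
y = [1, 1, 0, 1, 0, 1, 1, 0]

tree = build_tree(X, y)
print(tree)
print("training accuracy:", accuracy(y, [predict(tree, r) for r in X]))
```

Perfect accuracy on its own training rows says nothing about the 80% target, which should be measured on data the tree has not seen.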
- ENV: Anaconda + Jupyter Notebook + Python 2.7
- Regression problem (supervised learning)
- Use the NumPy API to calculate max, min, median and std (do not use the pandas API here: pandas' std defaults to the sample standard deviation with ddof=1, while NumPy's std defaults to the population standard deviation with ddof=0)
- Understand the features
- Split the data into training and test sets
- Understand the R2 score
- Understand learning curves and use them to tell when the model starts to overfit
- Understand K-fold cross-validation
- Understand and use scikit-learn's GridSearchCV to find the best parameters
- Understand and use a quartile calculator to analyze the data
- Train your first model
- Predict on the test data and use the R2 score to validate your predictions
- Check whether your model is robust
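Several of the steps above fit into one short script. The sketch below is written for Python 3 and assumes a recent scikit-learn (older releases exposed these tools under different module names). It shows NumPy's population std next to pandas' sample std, quartiles via np.percentile, the R2 score written from its definition 1 - SS_res / SS_tot and cross-checked against sklearn.metrics.r2_score, and a K-fold GridSearchCV over a DecisionTreeRegressor's max_depth. The price values, the y_true/y_pred pairs and the make_regression data are made-up stand-ins, and performance_metric and fit_model are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# --- Descriptive statistics: NumPy vs pandas standard deviation ---
prices = np.array([1.0, 2.0, 3.0, 4.0])  # made-up values for illustration
print("max/min/median:", np.max(prices), np.min(prices), np.median(prices))
print("quartiles:", np.percentile(prices, [25, 50, 75]))
print("np.std, ddof=0:", np.std(prices))           # population std
print("pandas std, ddof=1:", pd.Series(prices).std())  # sample std


# --- R2 score written out from its definition ---
def performance_metric(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0, 4.2]
y_pred = [2.5, 0.0, 2.1, 7.8, 5.3]
print("R2:", performance_metric(y_true, y_pred), r2_score(y_true, y_pred))

# --- Train/test split, then K-fold grid search over tree depth ---
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


def fit_model(X, y):
    """Pick max_depth by 5-fold cross-validated R2 and return the refitted best tree."""
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                        param_grid={"max_depth": list(range(1, 11))},
                        scoring=make_scorer(performance_metric),
                        cv=cv)
    grid.fit(X, y)
    return grid.best_estimator_


model = fit_model(X_train, y_train)
print("best max_depth:", model.max_depth)
print("test R2:", performance_metric(y_test, model.predict(X_test)))
```

Plotting the cross-validated score for each max_depth in the grid gives the complexity curve that shows where the tree starts to overfit.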
- ENV: Anaconda + Jupyter Notebook + Python 2.7
- Binary classification problem (supervised learning)
- Take a first look at the data, e.g. the total number of records, the number of categories, etc.
- Use the pandas API to extract the features and labels from a pandas DataFrame
- Explore the data and visualize features whose values span a very wide range (both very large and very small numbers)
- Apply a log transform to the skewed features found in the previous step
- Use scikit-learn's MinMaxScaler to normalize the data
- Use pandas get_dummies to one-hot encode the categorical features
- Understand recall, precision and F-score, and use the F-score to evaluate your model
- Understand basic classification models such as decision trees, Naive Bayes, bagging, boosting, SVMs and neural networks
- Understand the pros and cons of the models above
- Choose 3 models, train them, and pick the best one according to F-score and accuracy
- Use GridSearchCV to find the best parameters for the best model
- Use model.feature_importances_ to find the 5 most important features
- Train and validate the model using only the 5 most important features
- Evaluate the model on the test data
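The preprocessing and evaluation steps above might look like this in a minimal Python 3 sketch. Everything about the data is an assumption: the DataFrame is randomly generated, its column names are hypothetical, and the label is a toy rule. RandomForestClassifier stands in for whichever of the three candidate models wins, evaluate is a hypothetical helper, and beta=0.5 is one possible F-score weighting.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a census-style dataset (column names are hypothetical)
rng = np.random.RandomState(0)
n = 400
data = pd.DataFrame({
    "age": rng.randint(18, 70, size=n).astype(float),
    "hours-per-week": rng.randint(10, 60, size=n).astype(float),
    "capital-gain": rng.exponential(scale=2000.0, size=n) * (rng.rand(n) < 0.2),
    "workclass": rng.choice(["Private", "Self-emp", "Gov"], size=n),
    "education": rng.choice(["HS", "Bachelors", "Masters"], size=n),
})
income = ((data["age"] > 40) & (data["capital-gain"] > 0)).astype(int)  # toy label

# First look at the data
print(len(data), "records, positive rate %.3f" % income.mean())

features = data.copy()
# Log-transform the heavily skewed column; log1p copes with the many zeros
features["capital-gain"] = np.log1p(features["capital-gain"])
# Scale the numerical columns to [0, 1]
numerical = ["age", "hours-per-week", "capital-gain"]
features[numerical] = MinMaxScaler().fit_transform(features[numerical])
# One-hot encode the categorical (string) columns
features = pd.get_dummies(features)

X_train, X_test, y_train, y_test = train_test_split(
    features, income, test_size=0.2, random_state=0)


def evaluate(clf, X_tr, X_te, y_tr, y_te):
    """Fit clf, then return (accuracy, F-beta with beta=0.5) on the test split."""
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    # beta < 1 weights precision more heavily than recall
    return accuracy_score(y_te, pred), fbeta_score(y_te, pred, beta=0.5)


full_model = RandomForestClassifier(n_estimators=100, random_state=0)
acc, f05 = evaluate(full_model, X_train, X_test, y_train, y_test)
print("all features: accuracy=%.3f F0.5=%.3f" % (acc, f05))

# Rank features by importance and retrain on the top 5 only
ranked = np.argsort(full_model.feature_importances_)[::-1]
top5 = list(X_train.columns[ranked[:5]])
small_model = RandomForestClassifier(n_estimators=100, random_state=0)
acc5, f05_5 = evaluate(small_model, X_train[top5], X_test[top5], y_train, y_test)
print("top-5", top5, "accuracy=%.3f F0.5=%.3f" % (acc5, f05_5))
```

Comparing the two printed lines shows how much performance is lost, if any, when training on the reduced feature set.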
- ENV: Anaconda + Jupyter Notebook + Python 2.7
- Clustering problem (unsupervised learning)
- Use the pandas DataFrame.describe API to get an overview of the data
- Understand KDE (kernel density estimation) and find correlations between features
- Understand correlation
- Use Tukey's method to find and remove outliers
- Understand PCA and analyze the principal components
- Use PCA to transform the data
- Understand and explain a biplot
- Understand and compare K-means with Gaussian Mixture Models
- Choose one of them as your model, then train and predict
- Using the cluster assignment as an extra feature may help improve model performance (unsupervised learning can help supervised learning)
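A compact Python 3 sketch of the unsupervised workflow above, assuming a recent scikit-learn (GaussianMixture). The two-group log-normal data and the column names f0 to f3 are synthetic placeholders, not the project's dataset. It uses describe() and corr() for a first look, Tukey's rule (more than 1.5 times the IQR outside Q1 or Q3) for outliers, PCA for the projection (pca.components_ holds the loadings that a biplot draws as arrows), a Gaussian Mixture for clustering with the silhouette score as a quality check, and finally appends the cluster label as a new feature.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: two groups of skewed "spending" data, hypothetical column names
rng = np.random.RandomState(0)
cols = ["f0", "f1", "f2", "f3"]
data = pd.DataFrame(np.vstack([rng.lognormal(7, 1, size=(150, 4)),
                               rng.lognormal(9, 1, size=(150, 4))]), columns=cols)

log_data = np.log(data)    # reduce skew before looking at distributions/correlations
print(log_data.describe())
print(log_data.corr())     # pairwise Pearson correlations

# Tukey's method: flag a row if, for any feature, it lies more than
# 1.5 * IQR below Q1 or above Q3
outliers = set()
for col in cols:
    q1, q3 = np.percentile(log_data[col], [25, 75])
    step = 1.5 * (q3 - q1)
    is_out = (log_data[col] < q1 - step) | (log_data[col] > q3 + step)
    outliers.update(log_data.index[is_out])
good_data = log_data.drop(sorted(outliers)).reset_index(drop=True)

# PCA: variance explained per component, loadings, then a 2-D projection
pca = PCA(n_components=2).fit(good_data)
print(pca.explained_variance_ratio_)
print(pca.components_)     # loadings used for the arrows in a biplot
reduced = pca.transform(good_data)

# Gaussian Mixture gives soft cluster memberships; K-means would give hard ones
gmm = GaussianMixture(n_components=2, random_state=0).fit(reduced)
labels = gmm.predict(reduced)
score = silhouette_score(reduced, labels)
print("silhouette: %.3f" % score)

# The cluster label can be appended as an extra feature for a supervised model
good_data["cluster"] = labels
```

Swapping GaussianMixture for sklearn.cluster.KMeans(n_clusters=2) and comparing silhouette scores is one simple way to compare the two methods on the same projection.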
- ENV: Anaconda + Jupyter Notebook + Python 3.5
- Reinforcement learning problem
- Understand Markov decision processes
- Understand environment, state, action and reward
- Understand Q-learning
- Implement Q-learning in Python
- Understand parameters such as epsilon, gamma, alpha and the number of epochs
- Train models with different parameters
- Understand how the parameters affect each other
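Tabular Q-learning with the epsilon, gamma, alpha and episode-count parameters can be sketched in plain Python 3. The five-state corridor below (move left or right, reward 1 for reaching the last state) is a hypothetical MDP chosen only so the example runs on its own; it is not the project's environment. The update is Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), with the future term dropped at the terminal state.

```python
import random

N_STATES = 5        # states 0..4 on a line; state 4 is the goal (toy MDP)
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic transition: reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # break ties randomly so an untrained agent does not always pick the same move
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # terminal states have no future value
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print("greedy action per state:", policy)
```

Re-running q_learning with different epsilon, gamma, alpha or episode values and printing the Q-table is a quick way to explore how the parameters interact.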
- ENV: Anaconda + Jupyter Notebook + Python 3.5
- Understand TensorFlow and Keras
- Understand convolutional layers, max pooling layers, average pooling layers and dense layers
- Understand transfer learning
- Build your own CNN with the Keras API
- Train your model using transfer learning
- Evaluate your model on other images
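To make the layer types above concrete without depending on TensorFlow, here is a NumPy-only sketch (Python 3.5+) of a single forward pass: a "valid" convolution (implemented, as in CNN libraries, as cross-correlation), ReLU, 2x2 max pooling, global average pooling and a dense layer with softmax. In Keras these roughly correspond to the Conv2D, MaxPooling2D, GlobalAveragePooling2D and Dense layers. The 6x6 input, the two filters and the random dense weights are illustrative assumptions, and nothing here is trained.

```python
import numpy as np


def conv2d(image, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))


def global_average_pool(maps):
    """Average each feature map to one number: (channels, H, W) -> (channels,)."""
    return maps.mean(axis=(1, 2))


def dense(x, weights, bias):
    return weights @ x + bias


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernels = [np.ones((3, 3)) / 9.0,                  # averaging (blur) filter
           np.array([[1.0, 0.0, -1.0]] * 3)]       # vertical-edge filter

# Conv -> ReLU -> 2x2 max pool per filter, stacked as channels: shape (2, 2, 2)
maps = np.stack([max_pool2d(relu(conv2d(image, k))) for k in kernels])
features = global_average_pool(maps)  # shape (2,)

rng = np.random.RandomState(0)
W, b = rng.randn(3, 2), np.zeros(3)   # untrained dense head, 3 hypothetical classes
probs = softmax(dense(features, W, b))
print(maps.shape, features, probs)
```

In transfer learning, the convolution and pooling stages would typically come from a network pretrained on a large image dataset and be kept frozen, while only the dense head (W and b here) is trained on the new images.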
- ENV: Anaconda + Jupyter Notebook + Python 3.5
- Write proposal
- Write your own code
- Write your report