From 677a6f9fef8eea4bf04490900ebe057a44befba2 Mon Sep 17 00:00:00 2001 From: rickiepark Date: Thu, 18 Oct 2018 11:29:30 +0900 Subject: [PATCH 01/12] Add Korean --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 0c6dbadd2..601ea172d 100644 --- a/README.md +++ b/README.md @@ -22,14 +22,14 @@ This repository aims at collaboratively translating our [Machine Learning cheats |Linear algebra|not started|not started|not started|done|not started| -|Cheatsheet topic|Polski|Suomi|Català|Українська| -|:---|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started| -|Unsupervised learning|not started|not started|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/64)| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started| +|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| +|:---|:---:|:---:|:---:|:---:|:---:| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|not started| +|Unsupervised learning|not started|not started|not started|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|not started| +|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/64)|not started| +|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|not started| If your favorite language is missing, please feel free to add it! 
From 68d957bbcfc421d0e688a17abe3ae64d676c9142 Mon Sep 17 00:00:00 2001 From: rickiepark Date: Fri, 19 Oct 2018 16:15:12 +0900 Subject: [PATCH 02/12] add ko folder --- ko/cheatsheet-deep-learning.md | 321 ++++++++++ ...tsheet-machine-learning-tips-and-tricks.md | 285 +++++++++ ko/cheatsheet-supervised-learning.md | 567 ++++++++++++++++++ ko/cheatsheet-unsupervised-learning.md | 340 +++++++++++ ko/refresher-linear-algebra.md | 339 +++++++++++ ko/refresher-probability.md | 381 ++++++++++++ 6 files changed, 2233 insertions(+) create mode 100644 ko/cheatsheet-deep-learning.md create mode 100644 ko/cheatsheet-machine-learning-tips-and-tricks.md create mode 100644 ko/cheatsheet-supervised-learning.md create mode 100644 ko/cheatsheet-unsupervised-learning.md create mode 100644 ko/refresher-linear-algebra.md create mode 100644 ko/refresher-probability.md diff --git a/ko/cheatsheet-deep-learning.md b/ko/cheatsheet-deep-learning.md new file mode 100644 index 000000000..a5aa3756c --- /dev/null +++ b/ko/cheatsheet-deep-learning.md @@ -0,0 +1,321 @@ +**1. Deep Learning cheatsheet** + +⟶ + +
+ +**2. Neural Networks** + +⟶ + +
+ +**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** + +⟶ + +
+ +**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** + +⟶ + +
+ +**5. [Input layer, hidden layer, output layer]** + +⟶ + +
+ +**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ + +
+ +**7. where we note w, b, z the weight, bias and output respectively.** + +⟶ + +
+ +**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** + +⟶ + +
+ +**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** + +⟶ + +
+ +**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
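As a quick numeric illustration of the loss above, here is a minimal NumPy sketch of the binary cross-entropy L(z,y)=−[y log(z)+(1−y) log(1−z)]; the function name and the example probabilities are illustrative choices, not part of the cheatsheet.

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Binary cross-entropy L(z, y) = -[y*log(z) + (1-y)*log(1-z)].

    z is the predicted probability in (0, 1), y the true label in {0, 1};
    eps guards against log(0)."""
    z = np.clip(z, eps, 1.0 - eps)
    return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

# A confident correct prediction gives a small loss, a confident wrong one a large loss.
print(cross_entropy(0.9, 1))   # ~0.105
print(cross_entropy(0.1, 1))   # ~2.303
```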
+ +**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ + +
+ +**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** + +⟶ + +
+ +**13. As a result, the weight is updated as follows:** + +⟶ + +
+ +**14. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ +**15. Step 1: Take a batch of training data.** + +⟶ + +
+ +**16. Step 2: Perform forward propagation to obtain the corresponding loss.** + +⟶ + +
+ +**17. Step 3: Backpropagate the loss to get the gradients.** + +⟶ + +
+ +**18. Step 4: Use the gradients to update the weights of the network.** + +⟶ + +
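The four steps above can be sketched end to end on a toy two-layer network; the architecture, the ReLU/sigmoid choices, the learning rate and the toy batch below are assumptions made purely for illustration, not prescribed by the cheatsheet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: take a batch of training data (a tiny toy batch).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# A small 2 -> 3 (ReLU) -> 1 (sigmoid) network.
W1, b1 = rng.normal(scale=0.5, size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(scale=0.5, size=(3, 1)), np.zeros((1, 1))
alpha = 0.5  # learning rate

for step in range(2000):
    # Step 2: forward propagation and the corresponding loss.
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)                      # ReLU
    z2 = a1 @ W2 + b2
    a2 = 1.0 / (1.0 + np.exp(-z2))                # sigmoid output
    p = np.clip(a2, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Step 3: backpropagate the loss to get the gradients (chain rule).
    m = X.shape[0]
    dz2 = (a2 - y) / m                            # gradient of the loss w.r.t. z2
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (z1 > 0)                 # ReLU derivative
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Step 4: use the gradients to update the weights.
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

print(f"final loss: {loss:.4f}")
```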
**19. Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.**

⟶

<br>
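A minimal sketch of the masking described above, using the common "inverted dropout" convention; the 1/(1−p) rescaling is an extra implementation detail assumed here, not stated in the entry, and p is the drop probability.

```python
import numpy as np

def dropout(a, p, training=True, seed=0):
    """Drop each unit of activation `a` with probability p, keep it with probability 1 - p."""
    if not training or p == 0.0:
        return a
    rng = np.random.default_rng(seed)
    mask = rng.random(a.shape) >= p          # True = kept
    return a * mask / (1.0 - p)              # rescale so the expected activation is unchanged

print(dropout(np.ones((2, 5)), p=0.5))
```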
+ +**20. Convolutional Neural Networks** + +⟶ + +
**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding and S the stride, the number of neurons N that fit in a given volume is such that:**

⟶

<br>
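The constraint above is commonly written N=(W−F+2P)/S+1; here is a small helper that checks it (the function name and the example numbers are illustrative).

```python
def conv_output_size(W, F, P, S):
    """Number of neurons that fit along one dimension: N = (W - F + 2P) / S + 1."""
    n, remainder = divmod(W - F + 2 * P, S)
    if remainder != 0:
        raise ValueError("W, F, P, S do not tile the input volume exactly")
    return n + 1

# A 32-wide input with 5-wide filters, padding 2 and stride 1 gives 32 output positions.
print(conv_output_size(W=32, F=5, P=2, S=1))  # 32
```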
**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:**

⟶

<br>
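A NumPy sketch of the normalization step above for a (batch × features) array; the epsilon value and the toy input are illustrative choices.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize the batch with its own mean/variance, then scale and shift by gamma, beta."""
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(8, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # roughly 0 mean and unit std
```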
+ +**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
+ +**24. Recurrent Neural Networks** + +⟶ + +
+ +**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** + +⟶ + +
+ +**26. [Input gate, forget gate, gate, output gate]** + +⟶ + +
+ +**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** + +⟶ + +
+ +**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** + +⟶ + +
+ +**29. Reinforcement Learning and Control** + +⟶ + +
+ +**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** + +⟶ + +
+ +**33. S is the set of states** + +⟶ + +
+ +**34. A is the set of actions** + +⟶ + +
+ +**35. {Psa} are the state transition probabilities for s∈S and a∈A** + +⟶ + +
+ +**36. γ∈[0,1[ is the discount factor** + +⟶ + +
+ +**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** + +⟶ + +
+ +**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** + +⟶ + +
+ +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** + +⟶ + +
+ +**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** + +⟶ + +
**41. Bellman equation ― The optimal Bellman equations characterize the value function Vπ∗ of the optimal policy π∗:**

⟶

<br>
+ +**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** + +⟶ + +
+ +**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** + +⟶ + +
+ +**44. 1) We initialize the value:** + +⟶ + +
+ +**45. 2) We iterate the value based on the values before:** + +⟶ + +
+ +**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** + +⟶ + +
+ +**47. times took action a in state s and got to s′** + +⟶ + +
+ +**48. times took action a in state s** + +⟶ + +
+ +**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** + +⟶ + +
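A sketch of one tabular Q-learning update consistent with the description above; the learning rate, discount factor, table size and the single observed transition are arbitrary illustration choices.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One model-free update: Q[s, a] <- Q[s, a] + alpha * (r + gamma * max_a' Q[s', a'] - Q[s, a])."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))                         # 3 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)   # one observed transition (s, a, r, s')
print(Q)
```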
+ +**50. View PDF version on GitHub** + +⟶ + +
+ +**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** + +⟶ + +
+ +**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** + +⟶ + +
+ +**53. [Recurrent Neural Networks, Gates, LSTM]** + +⟶ + +
+ +**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** + +⟶ diff --git a/ko/cheatsheet-machine-learning-tips-and-tricks.md b/ko/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..9712297b8 --- /dev/null +++ b/ko/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,285 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ + +
+ +**2. Classification metrics** + +⟶ + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ + +
+ +**5. [Predicted class, Actual class]** + +⟶ + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ + +
+ +**8. Overall performance of model** + +⟶ + +
+ +**9. How accurate the positive predictions are** + +⟶ + +
+ +**10. Coverage of actual positive sample** + +⟶ + +
+ +**11. Coverage of actual negative sample** + +⟶ + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ + +
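The metrics in the table above (accuracy, precision, recall, F1) can be computed directly from confusion-matrix counts; the function name and the example counts below are illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                     # a.k.a. sensitivity / TPR
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
# (0.85, 0.8, 0.888..., 0.842...)
```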
**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:**

⟶

<br>
+ +**14. [Metric, Formula, Equivalent]** + +⟶ + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ + +
+ +**16. [Actual, Predicted]** + +⟶ + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ + +
+ +**22. Model selection** + +⟶ + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ + +
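A minimal sketch of the k-fold split described above, producing the (train, validation) index pairs that the k rounds would use; the shuffling seed and fold count are illustrative.

```python
import numpy as np

def k_fold_indices(n, k=5, seed=0):
    """Split n example indices into k folds; yield (train_idx, val_idx) for each round."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Each round trains on k-1 folds and validates on the remaining one.
for train_idx, val_idx in k_fold_indices(n=10, k=5):
    print(len(train_idx), "train /", len(val_idx), "validation")
```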
**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:**

⟶

<br>
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ +**35. Diagnostics** + +⟶ + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ + +
+ +**44. Regression metrics** + +⟶ + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶ + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶ + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ diff --git a/ko/cheatsheet-supervised-learning.md b/ko/cheatsheet-supervised-learning.md new file mode 100644 index 000000000..a6b19ea1c --- /dev/null +++ b/ko/cheatsheet-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Supervised Learning** + +⟶ + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶ + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶ + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ + +
+ +**10. Notations and general concepts** + +⟶ + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ + +
+ +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ + +
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶ + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ + +
+ +**21. Linear models** + +⟶ + +
+ +**22. Linear regression** + +⟶ + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ + +
**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:**

⟶

<br>
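A NumPy check of the closed-form solution above, θ=(XᵀX)⁻¹Xᵀy, on synthetic data; the simulated coefficients and noise level are arbitrary, and np.linalg.solve is used instead of an explicit inverse for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # design matrix with an intercept column
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=50)

# theta = (X^T X)^{-1} X^T y, solved as a linear system rather than with an explicit inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # close to [2, -3]
```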
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ + +
+ +**28. Classification and logistic regression** + +⟶ + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ + +
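A sketch of the class probabilities used by softmax regression, exp(θiᵀx)/∑j exp(θjᵀx), applied to a vector of scores; subtracting the maximum score is a standard numerical-stability trick assumed here, not part of the definition.

```python
import numpy as np

def softmax(scores):
    """Map scores s_i to probabilities exp(s_i) / sum_j exp(s_j)."""
    shifted = scores - scores.max()      # avoids overflow without changing the result
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())                # the largest score gets the largest probability; sums to 1
```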
+ +**33. Generalized Linear Models** + +⟶ + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ + +
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**

⟶

<br>
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ + +
+ +**40. Support Vector Machines** + +⟶ + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ + +
+ +**44. such that** + +⟶ + +
+ +**45. support vectors** + +⟶ + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶ + +
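A direct transcription of the Gaussian kernel above, K(x,z)=exp(−‖x−z‖²/(2σ²)); the sample vectors and σ are illustrative.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    diff = x - z
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(gaussian_kernel(x, z))   # small for distant points
print(gaussian_kernel(x, x))   # 1.0 for identical points
```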
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ + +
+ +**54. Generative Learning** + +⟶ + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ + +
+ +**59. Naive Bayes** + +⟶ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ + +
+ +**63. Tree-based and ensemble methods** + +⟶ + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ + +
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.**

⟶

<br>
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶ + +
+ +**71. Weak learners trained on remaining errors** + +⟶ + +
+ +**72. Other non-parametric approaches** + +⟶ + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
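A minimal k-NN classifier matching the description above, with Euclidean distance and a majority vote; the toy training set and the value of k are illustrative choices.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))   # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))   # 1
```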
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
+ +**75. Learning Theory** + +⟶ + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶ + +
+ +**82. the training examples are drawn independently** + +⟶ + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶ + +
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +⟶ + +
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +⟶ + +
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +⟶ + +
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ + +
+ +**93. [Trees and ensemble methods, CART, Random forest, Boosting]** + +⟶ + +
+ +**94. [Other methods, k-NN]** + +⟶ + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..827d815a3 --- /dev/null +++ b/ko/cheatsheet-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ + +
+ +**5. Clustering** + +⟶ + +
+ +**6. Expectation-Maximization** + +⟶ + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**

⟶

<br>
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
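A plain NumPy sketch of the two alternating steps above (cluster assignment, then mean update); the toy blobs, iteration count and seed are illustrative, and empty clusters simply keep their previous centroid.

```python
import numpy as np

def k_means(X, k=2, n_iter=20, seed=0):
    """Alternate cluster assignment and centroid updates from random initial centroids."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]                  # means initialization
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)                                       # cluster assignment
        mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                       for j in range(k)])                             # means update
    return c, mu

X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (20, 2)),
               np.random.default_rng(2).normal(3, 0.3, (20, 2))])
c, mu = k_means(X, k=2)
print(mu.round(2))   # one centroid should land near (0, 0) and the other near (3, 3)
```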
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
+ +**19. Hierarchical clustering** + +⟶ + +
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.**

⟶

<br>
**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:**

⟶

<br>
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
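The four PCA steps above, sketched with an eigendecomposition of the empirical covariance matrix; the toy data shape and the choice of k are illustrative.

```python
import numpy as np

def pca(X, k=2):
    """Project X onto its k principal components."""
    # Step 1: normalize each feature to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) Z^T Z, symmetric with real eigenvalues.
    sigma = Z.T @ Z / len(Z)
    # Step 3: eigenvectors of the k largest eigenvalues (eigh returns them in ascending order).
    _, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, ::-1][:, :k]
    # Step 4: project the data on span(u1, ..., uk).
    return Z @ U

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, k=2).shape)   # (100, 2)
```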
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
+ +**51. The Machine Learning cheatsheets are now available in Japanese.** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md new file mode 100644 index 000000000..a6b440d1e --- /dev/null +++ b/ko/refresher-linear-algebra.md @@ -0,0 +1,339 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ + +
+ +**2. General notations** + +⟶ + +
+ +**3. Definitions** + +⟶ + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ + +
+ +**7. Main matrices** + +⟶ + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:**

⟶

<br>
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:**

⟶

<br>
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**

⟶

<br>
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
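A quick NumPy check of the factorization above; note that np.linalg.svd returns Vᵀ rather than V, and the m×n Σ has to be rebuilt from the returned vector of singular values.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))
U, s, Vt = np.linalg.svd(A)               # U: 4x4, s: singular values, Vt: 3x3

Sigma = np.zeros(A.shape)                 # rebuild the 4x3 "diagonal" Sigma
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))     # True: A is recovered from its factors
```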
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md new file mode 100644 index 000000000..5c9b34656 --- /dev/null +++ b/ko/refresher-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶ + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ + +
**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.**

⟶

<br>
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ + +
+ +**12. Conditional Probability** + +⟶ + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ + +
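A small numeric illustration of Bayes' rule in its extended form, with P(B) expanded over a two-event partition; the prior, sensitivity and false-positive rate below are made-up numbers used only for illustration.

```python
p_a = 0.01                # prior P(A)
p_b_given_a = 0.95        # P(B|A)
p_b_given_not_a = 0.05    # P(B|not A)

# P(B) = P(B|A)P(A) + P(B|not A)P(not A), then P(A|B) = P(B|A)P(A) / P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # 0.161: even after observing B, A remains fairly unlikely
```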
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ + +
+ +**19. Random Variables** + +⟶ + +
+ +**20. Definitions** + +⟶ + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ + +
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

⟶

<br>

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

⟶

<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ From 7e024b338069a608e373bfe92f975cf049289518 Mon Sep 17 00:00:00 2001 From: rickiepark Date: Fri, 19 Oct 2018 16:18:23 +0900 Subject: [PATCH 03/12] add ko contributor --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index afd1d1f12..daf51b02a 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -65,6 +65,10 @@ --hi +--ko + + Haesun Park (translation of deep learning) + --ja --pt From 388ecfcc04df5af3c5f86c7406b77aa29447a537 Mon Sep 17 00:00:00 2001 From: rickiepark Date: Sat, 20 Oct 2018 12:33:40 +0900 Subject: [PATCH 04/12] draft of deep-learning --- ko/cheatsheet-deep-learning.md | 108 ++++++++++++++++----------------- 1 file changed, 54 insertions(+), 54 deletions(-) diff --git a/ko/cheatsheet-deep-learning.md b/ko/cheatsheet-deep-learning.md index a5aa3756c..f75d17c54 100644 --- a/ko/cheatsheet-deep-learning.md +++ b/ko/cheatsheet-deep-learning.md @@ -1,300 +1,300 @@ **1. Deep Learning cheatsheet** -⟶ +⟶ 딥러닝 치트시트
**2. Neural Networks** -⟶ +⟶ 신경망
**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ +⟶ 신경망(neural network)은 층(layer)으로 구성된 모델입니다. 널리 사용되는 신경망에는 합성곱 신경망(convolutional neural network)과 순환 신경망(recurrent neural network)이 있습니다.
**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ +⟶ 구조 - 신경망 구조에 관한 용어를 다음 그림에 표현했습니다:
**5. [Input layer, hidden layer, output layer]** -⟶ +⟶ [입력층, 은닉층, 출력층]
**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ +⟶ i 는 네트워크의 i 번째 층을 나타내고 j 는 각 층의 j 번째 은닉 유닛을 지칭합니다:
**7. where we note w, b, z the weight, bias and output respectively.** -⟶ +⟶ 여기에서 w, b, z 는 각각 가중치(weight), 절편(bias), 출력입니다.
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ +⟶ 활성화 함수 - 활성화 함수는 은닉 유닛 다음에 추가하여 모델에 비선형성을 추가합니다. 자주 사용하는 함수들은 다음과 같습니다:
**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** -⟶ +⟶ [시그모이드(Sigmoid), 하이퍼볼릭탄젠트(Tanh), 렐루(ReLU), Leaky 렐루(Leaky ReLU)]
**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +⟶ 크로스 엔트로피(cross-entropy) 손실 - 신경망에서 널리 사용되는 크로스 엔트로피 손실 함수 L(z,y)는 다음과 같이 정의합니다:
**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ +⟶ 학습률 - 학습률은 종종 α 또는 η로 표시합니다. 이는 가중치 업데이트 양을 조절합니다. 학습률을 고정하거나 적응적으로 바꿀 수도 있습니다. 적응적 학습률 방법인 Adam이 현재 가장 인기가 많습니다.
**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ +⟶ 역전파(backpropagation) - 역전파는 실제 출력과 기대 출력을 비교하여 신경망의 가중치를 업데이트하는 방법입니다. 가중치 w에 대한 도함수는 연쇄 법칙(chain rule)을 사용해 구할 수 있으며 다음과 같습니다:
**13. As a result, the weight is updated as follows:** -⟶ +⟶ 결국 가중치 업데이트 식은 다음과 같습니다:
**14. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +⟶ 가중치 업데이트 - 신경망에서 가중치는 다음 단계를 따라 업데이트됩니다:
**15. Step 1: Take a batch of training data.** -⟶ +⟶ 1 단계: 훈련 데이터의 배치를 만듭니다.
**16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ +⟶ 2 단계: 정방향 계산을 수행하여 배치에 해당하는 손실을 얻습니다.
**17. Step 3: Backpropagate the loss to get the gradients.** -⟶ +⟶ 3 단계: 손실을 역전파하여 그래디언트(gradient)를 구합니다.
**18. Step 4: Use the gradients to update the weights of the network.** -⟶ +⟶ 4 단계: 그래디언트를 사용해 네트워크의 가중치를 업데이트합니다.
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ +⟶ 드롭아웃(dropout) - 드롭아웃은 신경망의 유닛을 꺼서 훈련 데이터에 과대적합(overfitting)되는 것을 막는 기법입니다. 실전에서는 확률 p로 유닛을 끄거나 확률 1-p로 유닛을 작동시킵니다.
**20. Convolutional Neural Networks** -⟶ +⟶ 합성곱 신경망
**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ +⟶ 합성곱 층의 조건 - 입력 크기를 W, 합성곱 층의 커널(kernel) 크기를 F, 제로 패딩(padding)을 P, 스트라이드(stride)를 S라 했을 때 필요한 뉴런의 수 N은 다음과 같습니다:
**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ +⟶ 배치 정규화(batch normalization) - 하이퍼파라미터 γ,β로 배치 {xi}를 정규화하는 단계입니다. 조정하려는 배치의 평균과 분산을 각각 μB,σ2B라고 했을 때 배치 정규화는 다음과 같습니다:
**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ +⟶ 보통 완전 연결(fully connected) 층이나 합성곱 층과 층의 활성화 함수 사이에 위치합니다. 배치 정규화를 적용하면 학습률을 높일 수 있고 초기화에 따른 의존성을 줄일 수 있습니다.
**24. Recurrent Neural Networks** -⟶ +⟶ 순환 신경망
**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** -⟶ +⟶ 게이트(gate) 종류 - 전형적인 순환 신경망에서 볼 수 있는 게이트 종류는 다음과 같습니다:
**26. [Input gate, forget gate, gate, output gate]** -⟶ +⟶ [입력 게이트, 삭제 게이트, 게이트, 출력 게이트]
**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** -⟶ +⟶ [셀(cell)에 기록할지 여부, 셀을 삭제할지 여부, 셀의 입력 조절, 셀의 출력 조절]
**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ +⟶ LSTM - 장 단기 메모리(long short-term memory, LSTM) 네트워크는 삭제 게이트를 추가하여 그래디언트 소실 문제를 완화한 RNN 모델입니다.
**29. Reinforcement Learning and Control** -⟶ +⟶ 강화 학습
**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** -⟶ +⟶ 강화 학습의 목표는 주어진 환경에서 진화할 수 있는 에이전트를 학습시키는 것입니다.
**31. Definitions** -⟶ +⟶ 정의
**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** -⟶ +⟶ 마르코프 결정 과정(Markov decision process) - 마르코프 결정 과정(MDP)는 다섯 개의 요소 (S,A,{Psa},γ,R)로 구성됩니다:
**33. S is the set of states** -⟶ +⟶ S는 상태의 집합입니다.
**34. A is the set of actions** -⟶ +⟶ A는 행동의 집합입니다.
**35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ +⟶ {Psa}는 상태 전이 확률입니다. s∈S, a∈A 입니다.
**36. γ∈[0,1[ is the discount factor** -⟶ +⟶ γ∈[0,1]는 할인 계수(discount factor)입니다.
**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** -⟶ +⟶ R:S×A⟶R or R:S⟶R 는 알고리즘이 최대화하려는 보상 함수(reward function)입니다.
**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ +⟶ 정책(policy) - 정책 π는 상태와 행동을 매핑한 함수 π:S⟶A 입니다.
**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** -⟶ +⟶ 참고: 주어진 상태 s에서 행동 a=π(s)를 얻었을 때 정책 π를 실행한다고 말합니다.
**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** -⟶ +⟶ 가치 함수(value function) - 정책 π와 상태 s가 주어졌을 때 가치 함수 Vπ를 다음과 같이 정의합니다:
**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ +⟶ 벨만(Bellman) 방정식 - 벨만 최적 방정식은 가치 함수 Vπ∗와 최적의 정책 π∗로 표현됩니다:
**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ +⟶ 참고: 주어진 상태 s에서 최적 정책 π∗는 다음과 같이 나타냅니다:
**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** -⟶ +⟶ 가치 반복 알고리즘 - 가치 반복 알고리즘은 두 단계를 가집니다:
**44. 1) We initialize the value:** -⟶ +⟶ 1) 가치를 초기화합니다:
**45. 2) We iterate the value based on the values before:** -⟶ +⟶ 2) 이전 가치를 기반으로 가치를 반복합니다:
**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ +⟶ 최대 가능도 추정 - 상태 전이 함수를 위한 최대 가능도 추정은 다음과 같습니다:
**47. times took action a in state s and got to s′** -⟶ +⟶ 상태 s에 있는 행동 a를 선택하여 s′를 얻을 횟수
**48. times took action a in state s** -⟶ +⟶ 상태 s에 있는 행동 a를 선택한 횟수
**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ +⟶ Q-러닝(learning) - Q-러닝은 Q의 모델-프리(model-free) 추정으로 다음과 같습니다:
**50. View PDF version on GitHub** -⟶ +⟶ 깃허브(GitHub)에서 PDF 버전으로 보기
@@ -302,20 +302,20 @@ ⟶ -
+
[신경망, 구조, 활성화 함수, 역전파, 드롭아웃] **52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ +⟶ [합성곱 신경망, 합성곱 층, 배치 정규화]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ +⟶ [순환 신경망, 게이트, LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ +⟶ [강화 학습, 마르코프 결정 과정, 가치/정책 반복, 근사 동적 계획법, 정책 탐색] From 5bd4e5226850954f1114d40b759e797c46a7f11f Mon Sep 17 00:00:00 2001 From: rickiepark Date: Sat, 20 Oct 2018 23:51:42 +0900 Subject: [PATCH 05/12] proofreading --- ko/cheatsheet-deep-learning.md | 56 +++++++++++++++++----------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/ko/cheatsheet-deep-learning.md b/ko/cheatsheet-deep-learning.md index f75d17c54..07b072566 100644 --- a/ko/cheatsheet-deep-learning.md +++ b/ko/cheatsheet-deep-learning.md @@ -12,37 +12,37 @@ **3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ 신경망(neural network)은 층(layer)으로 구성된 모델입니다. 널리 사용되는 신경망에는 합성곱 신경망(convolutional neural network)과 순환 신경망(recurrent neural network)이 있습니다. +⟶ 신경망(neural network)은 층(layer)으로 구성되는 모델입니다. 합성곱 신경망(convolutional neural network)과 순환 신경망(recurrent neural network)이 널리 사용되는 신경망입니다.
**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ 구조 - 신경망 구조에 관한 용어를 다음 그림에 표현했습니다: +⟶ 구조 - 다음 그림에 신경망 구조에 관한 용어가 표현되어 있습니다:
**5. [Input layer, hidden layer, output layer]** -⟶ [입력층, 은닉층, 출력층] +⟶ [입력층(input layer), 은닉층(hidden layer), 출력층(output layer)]
**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ i 는 네트워크의 i 번째 층을 나타내고 j 는 각 층의 j 번째 은닉 유닛을 지칭합니다: +⟶ i는 네트워크의 i 번째 층을 나타내고 j는 각 층의 j 번째 은닉 유닛(hidden unit)을 지칭합니다:
**7. where we note w, b, z the weight, bias and output respectively.** -⟶ 여기에서 w, b, z 는 각각 가중치(weight), 절편(bias), 출력입니다. +⟶ 여기에서 w, b, z는 각각 가중치(weight), 절편(bias), 출력입니다.
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ 활성화 함수 - 활성화 함수는 은닉 유닛 다음에 추가하여 모델에 비선형성을 추가합니다. 자주 사용하는 함수들은 다음과 같습니다: +⟶ 활성화 함수(activation function) - 활성화 함수는 은닉 유닛 다음에 추가하여 모델에 비선형성을 추가합니다. 다음과 같은 함수들을 자주 사용합니다:
@@ -60,19 +60,19 @@ **11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ 학습률 - 학습률은 종종 α 또는 η로 표시합니다. 이는 가중치 업데이트 양을 조절합니다. 학습률을 고정하거나 적응적으로 바꿀 수도 있습니다. 적응적 학습률 방법인 Adam이 현재 가장 인기가 많습니다. +⟶ 학습률 - 학습률은 종종 α 또는 η로 표시하며 가중치 업데이트 양을 조절합니다. 학습률을 일정하게 고정하거나 적응적으로 바꿀 수도 있습니다. 적응적 학습률 방법인 Adam이 현재 가장 인기가 많습니다.
**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ 역전파(backpropagation) - 역전파는 실제 출력과 기대 출력을 비교하여 신경망의 가중치를 업데이트하는 방법입니다. 가중치 w에 대한 도함수는 연쇄 법칙(chain rule)을 사용해 구할 수 있으며 다음과 같습니다: +⟶ 역전파(backpropagation) - 역전파는 실제 출력과 기대 출력을 비교하여 신경망의 가중치를 업데이트하는 방법입니다. 연쇄 법칙(chain rule)으로 표현된 가중치 w에 대한 도함수는 다음과 같이 쓸 수 있습니다:
**13. As a result, the weight is updated as follows:** -⟶ 결국 가중치 업데이트 식은 다음과 같습니다: +⟶ 결국 가중치는 다음과 같이 업데이트됩니다:
@@ -84,13 +84,13 @@ **15. Step 1: Take a batch of training data.** -⟶ 1 단계: 훈련 데이터의 배치를 만듭니다. +⟶ 1 단계: 훈련 데이터의 배치(batch)를 만듭니다.
**16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ 2 단계: 정방향 계산을 수행하여 배치에 해당하는 손실을 얻습니다. +⟶ 2 단계: 정방향 계산을 수행하여 배치에 해당하는 손실(loss)을 얻습니다.
@@ -126,13 +126,13 @@ **22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ 배치 정규화(batch normalization) - 하이퍼파라미터 γ,β로 배치 {xi}를 정규화하는 단계입니다. 조정하려는 배치의 평균과 분산을 각각 μB,σ2B라고 했을 때 배치 정규화는 다음과 같습니다: +⟶ 배치 정규화(batch normalization) - 하이퍼파라미터 γ,β로 배치 {xi}를 정규화하는 단계입니다. 조정하려는 배치의 평균과 분산을 각각 μB,σ2B라고 했을 때 배치 정규화는 다음과 같이 계산됩니다:
**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ 보통 완전 연결(fully connected) 층이나 합성곱 층과 층의 활성화 함수 사이에 위치합니다. 배치 정규화를 적용하면 학습률을 높일 수 있고 초기화에 따른 의존성을 줄일 수 있습니다. +⟶ 보통 완전 연결(fully connected)/합성곱 층과 활성화 함수 사이에 위치합니다. 배치 정규화를 적용하면 학습률을 높일 수 있고 초기화에 대한 의존도를 줄일 수 있습니다.
@@ -156,7 +156,7 @@ **27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** -⟶ [셀(cell)에 기록할지 여부, 셀을 삭제할지 여부, 셀의 입력 조절, 셀의 출력 조절] +⟶ [셀(cell) 정보의 기록 여부, 셀 정보의 삭제 여부, 셀의 입력 조절, 셀의 출력 조절]
@@ -168,7 +168,7 @@ **29. Reinforcement Learning and Control** -⟶ 강화 학습 +⟶ 강화 학습(reinforcement learning)
@@ -186,25 +186,25 @@ **32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** -⟶ 마르코프 결정 과정(Markov decision process) - 마르코프 결정 과정(MDP)는 다섯 개의 요소 (S,A,{Psa},γ,R)로 구성됩니다: +⟶ 마르코프 결정 과정(Markov decision process) - 마르코프 결정 과정(MDP)은 다섯 개의 요소 (S,A,{Psa},γ,R)로 구성됩니다:
**33. S is the set of states** -⟶ S는 상태의 집합입니다. +⟶ S는 상태(state)의 집합입니다.
**34. A is the set of actions** -⟶ A는 행동의 집합입니다. +⟶ A는 행동(action)의 집합입니다.
**35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ {Psa}는 상태 전이 확률입니다. s∈S, a∈A 입니다. +⟶ {Psa}는 상태 전이 확률(state transition probability)입니다. s∈S, a∈A 입니다.
@@ -216,19 +216,19 @@ **37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** -⟶ R:S×A⟶R or R:S⟶R 는 알고리즘이 최대화하려는 보상 함수(reward function)입니다. +⟶ R:S×A⟶R 또는 R:S⟶R 는 알고리즘이 최대화하려는 보상 함수(reward function)입니다.
**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ 정책(policy) - 정책 π는 상태와 행동을 매핑한 함수 π:S⟶A 입니다. +⟶ 정책(policy) - 정책 π는 상태와 행동을 매핑하는 함수 π:S⟶A 입니다.
**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** -⟶ 참고: 주어진 상태 s에서 행동 a=π(s)를 얻었을 때 정책 π를 실행한다고 말합니다. +⟶ 참고: 상태 s가 주어졌을 때 정책 π를 실행하여 행동 a=π(s)를 선택한다고 말합니다.
@@ -246,7 +246,7 @@ **42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ 참고: 주어진 상태 s에서 최적 정책 π∗는 다음과 같이 나타냅니다: +⟶ 참고: 주어진 상태 s에 대한 최적 정책 π∗는 다음과 같이 나타냅니다:
@@ -264,31 +264,31 @@ **45. 2) We iterate the value based on the values before:** -⟶ 2) 이전 가치를 기반으로 가치를 반복합니다: +⟶ 2) 이전 가치를 기반으로 다음 가치를 반복합니다:
**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ 최대 가능도 추정 - 상태 전이 함수를 위한 최대 가능도 추정은 다음과 같습니다: +⟶ 최대 가능도 추정 - 상태 전이 함수를 위한 최대 가능도(maximum likelihood) 추정은 다음과 같습니다:
**47. times took action a in state s and got to s′** -⟶ 상태 s에 있는 행동 a를 선택하여 s′를 얻을 횟수 +⟶ 상태 s에서 행동 a를 선택하여 s′를 얻을 횟수
**48. times took action a in state s** -⟶ 상태 s에 있는 행동 a를 선택한 횟수 +⟶ 상태 s에서 행동 a를 선택한 횟수
**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ Q-러닝(learning) - Q-러닝은 Q의 모델-프리(model-free) 추정으로 다음과 같습니다: +⟶ Q-러닝(learning) - Q-러닝은 다음과 같은 Q의 모델-프리(model-free) 추정입니다:
From da32daf9e288545711431fbda147218b6b5b88c8 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 14:39:56 -0800 Subject: [PATCH 06/12] Delete refresher-probability.md --- ko/refresher-probability.md | 381 ------------------------------------ 1 file changed, 381 deletions(-) delete mode 100644 ko/refresher-probability.md diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/ko/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -<br>
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
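The counts P(n,r) and C(n,r) above can be checked with the Python standard library; the values n=10 and r=3 are arbitrary.

```python
import math

n, r = 10, 3
P = math.perm(n, r)    # n! / (n-r)!      -> ordered arrangements
C = math.comb(n, r)    # n! / (r!(n-r)!)  -> unordered selections
print(P, C, P >= C)    # 720 120 True, consistent with P(n,r) >= C(n,r)
```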
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
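A small numerical sketch of the law of total probability and the extended form of Bayes' rule on a two-event partition; all probabilities below are made-up illustrative numbers.

```python
# Partition {A1, A2} with A2 the complement of A1.
p_A = [0.01, 0.99]           # P(A1), P(A2)
p_B_given_A = [0.95, 0.05]   # P(B|A1), P(B|A2)

# law of total probability: P(B) = sum_i P(B|Ai) P(Ai)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))

# extended Bayes' rule: P(A1|B) = P(B|A1) P(A1) / sum_i P(B|Ai) P(Ai)
p_A1_given_B = p_B_given_A[0] * p_A[0] / p_B
print(round(p_A1_given_B, 4))   # ~0.161
```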
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a<X⩽B)=F(b)−F(a)** - -⟶ - -<br> - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
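A quick sketch of the sample mean and the unbiased sample variance computed by hand; the data values are arbitrary, and dividing by n−1 rather than n is what makes E[s2]=σ2.

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)

x_bar = sum(xs) / n                               # estimates the true mean mu
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # unbiased estimate of sigma^2
print(x_bar, s2)                                  # 5.0 4.571428571428571
```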
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ From 3ccb2bc0c514465bc98058fb4ec2e1d169ed36cd Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 14:40:12 -0800 Subject: [PATCH 07/12] Delete refresher-linear-algebra.md --- ko/refresher-linear-algebra.md | 339 --------------------------------- 1 file changed, 339 deletions(-) delete mode 100644 ko/refresher-linear-algebra.md diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/ko/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -<br>
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -<br>
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
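The transpose, inverse, trace and determinant identities above can be checked numerically with NumPy; the matrices below are random Gaussian matrices (and therefore almost surely invertible).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                   # (AB)^T = B^T A^T
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))    # (AB)^-1 = B^-1 A^-1
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))        # tr(AB) = tr(BA)
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))     # |AB| = |A||B|
```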
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -<br>
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
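A short NumPy sketch of the spectral theorem and of the singular-value decomposition; the matrices are random and only used to check that the factorizations reconstruct them.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A_sym = (M + M.T) / 2                        # a symmetric matrix

# spectral theorem: A_sym = U diag(lambda) U^T with U orthogonal
lam, U = np.linalg.eigh(A_sym)
print(np.allclose(A_sym, U @ np.diag(lam) @ U.T))

# singular-value decomposition of an m x n matrix
A = rng.normal(size=(5, 3))
U2, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U2 @ np.diag(s) @ Vt))
```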
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ From 55b9a719b822f8b8100381703a8192879e04d1ca Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 14:40:34 -0800 Subject: [PATCH 08/12] Delete cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 340 ------------------------- 1 file changed, 340 deletions(-) delete mode 100644 ko/cheatsheet-unsupervised-learning.md diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 827d815a3..000000000 --- a/ko/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
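A minimal k-means sketch that alternates the two steps above (cluster assignment, then means update) and reports the distortion J; the toy data, the choice k=2 and the random initialization are illustrative.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]       # random centroid init
    for _ in range(n_iters):
        # cluster assignment step: c[i] = index of the closest centroid
        c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
        # means update step: each centroid becomes the mean of its assigned points
        mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                       for j in range(k)])
    c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
    distortion = ((X - mu[c]) ** 2).sum()                    # J(c, mu)
    return c, mu, distortion

data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(size=(50, 2)), data_rng.normal(size=(50, 2)) + 5.0])
print(k_means(X, k=2)[2])
```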
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -<br>
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -<br>
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -<br>
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
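A minimal sketch of the four PCA steps above via an eigendecomposition of the empirical covariance matrix; the data matrix and the choice k=2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy features
k = 2

# Step 1: normalize the data to zero mean and unit standard deviation
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: empirical covariance Sigma = (1/m) sum_i x(i) x(i)^T
Sigma = Xn.T @ Xn / len(Xn)

# Step 3: the k orthogonal eigenvectors with the largest eigenvalues
eigval, eigvec = np.linalg.eigh(Sigma)    # eigh: symmetric input, ascending eigenvalues
U = eigvec[:, -k:]

# Step 4: project the data on span(u1, ..., uk)
Z = Xn @ U
print(Z.shape)                            # (200, 2)
```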
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Japanese.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ From cbc95dfc86c4d0b3874a354b53f1d0323fd1b174 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 14:40:54 -0800 Subject: [PATCH 09/12] Delete cheatsheet-supervised-learning.md --- ko/cheatsheet-supervised-learning.md | 567 --------------------------- 1 file changed, 567 deletions(-) delete mode 100644 ko/cheatsheet-supervised-learning.md diff --git a/ko/cheatsheet-supervised-learning.md b/ko/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/ko/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
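A tiny sketch of the gradient descent update rule θ←θ−α∇θJ(θ) on a one-dimensional quadratic cost; the learning rate and the number of iterations are arbitrary.

```python
alpha = 0.1                       # learning rate

def grad_J(theta):                # J(theta) = (theta - 3)^2, so dJ/dtheta = 2(theta - 3)
    return 2.0 * (theta - 3.0)

theta = 0.0
for _ in range(200):
    theta = theta - alpha * grad_J(theta)   # theta <- theta - alpha * grad J(theta)
print(theta)                      # converges to the minimizer theta = 3
```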
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -<br>
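A short NumPy sketch of the normal equations for linear regression; the synthetic data, the true coefficients and the noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, size=m)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=m)   # true intercept 2, slope 3

X = np.column_stack([np.ones(m), x])           # design matrix with a bias column
theta = np.linalg.solve(X.T @ X, X.T @ y)      # solves (X^T X) theta = X^T y
print(theta)                                   # approximately [2, 3]
```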
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
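A minimal k-nearest-neighbors sketch using a majority vote among the k closest training points; the toy data and k=3 are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)             # distances to all training points
    nearest = np.argsort(d)[:k]                          # indices of the k closest ones
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))   # -> 0
```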
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ From 8eb764f634fcccd1caac950445b939c317b3eac4 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 14:41:09 -0800 Subject: [PATCH 10/12] Delete cheatsheet-machine-learning-tips-and-tricks.md --- ...tsheet-machine-learning-tips-and-tricks.md | 285 ------------------ 1 file changed, 285 deletions(-) delete mode 100644 ko/cheatsheet-machine-learning-tips-and-tricks.md diff --git a/ko/cheatsheet-machine-learning-tips-and-tricks.md b/ko/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/ko/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
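The main classification metrics above can be computed directly from the confusion-matrix counts; the TP/FP/FN/TN values below are made up.

```python
tp, fp, fn, tn = 40, 10, 5, 45

accuracy    = (tp + tn) / (tp + tn + fp + fn)    # overall performance of the model
precision   = tp / (tp + fp)                     # how accurate the positive predictions are
recall      = tp / (tp + fn)                     # coverage of actual positive samples (TPR)
specificity = tn / (tn + fp)                     # coverage of actual negative samples (TNR)
f1 = 2 * precision * recall / (precision + recall)   # hybrid metric for unbalanced classes

print(accuracy, precision, recall, specificity, round(f1, 3))
```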
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -<br>
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -<br>
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
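A hand-written k-fold cross-validation loop following the description above; fit and error are placeholder callables standing in for whatever model training and evaluation routine is being assessed.

```python
import numpy as np

def k_fold_cv_error(X, y, fit, error, k=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errs = []
    for i in range(k):
        val = folds[i]                                    # held-out fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                   # train on the k-1 other folds
        errs.append(error(model, X[val], y[val]))         # assess on the remaining one
    return float(np.mean(errs))                           # cross-validation error
```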
- -**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -<br>
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ From e2cd78004b09dacc8a97d41e802b06c0bd3957b9 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 17:03:05 -0800 Subject: [PATCH 11/12] Add contributors --- CONTRIBUTORS | 447 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 367 insertions(+), 80 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index daf51b02a..ea1bc89f8 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,106 +1,393 @@ --ar +cs-229-deep-learning + Amjad Khatabi (translation) + Zaid Alyafeai (review) + +cs-229-linear-algebra + Zaid Alyafeai (translation) + Amjad Khatabi (review) + Mazen Melibari (review) + +cs-229-machine-learning-tips-and-tricks + Fares Al-Qunaieer (translation) + Zaid Alyafeai (review) + +cs-229-probability + Mahmoud Aslan (translation) + Fares Al-Qunaieer (review) + +cs-229-supervised-learning + Fares Al-Qunaieer (translation) + Zaid Alyafeai (review) + +cs-229-unsupervised-learning + Redouane Lguensat (translation) + Fares Al-Qunaieer (review) --de +cs-229-deep-learning + Philip Düe (translation) + Bettina Schlager (review) + +--es +cs-229-deep-learning + Erick Gabriel Mendoza Flores (translation) + Fernando Diaz (review) + Fernando González-Herrera (review) + Mariano Ramirez (review) + Juan P. Chavat (review) + Alonso Melgar López (review) + Gustavo Velasco-Hernández (review) + Juan Manuel Nava Zamudio (review) + +cs-229-linear-algebra + Fernando González-Herrera (translation) + Fernando Diaz (review) + Gustavo Velasco-Hernández (review) + Juan P. Chavat (review) + +cs-229-machine-learning-tips-and-tricks + David Jiménez Paredes (translation) + Fernando Diaz (translation) + Gustavo Velasco-Hernández (review) + Alonso Melgar-Lopez (review) + +cs-229-probability + Fermin Ordaz (translation) + Fernando González-Herrera (review) + Alonso Melgar López (review) + +cs-229-supervised-learning + Juan P. Chavat (translation) + Fernando Gonzalez-Herrera (review) + Fernando Diaz (review) + Alonso Melgar-Lopez (review) + +cs-229-unsupervised-learning + Jaime Noel Alvarez Luna (translation) + Alonso Melgar López (review) + Fernando Diaz (review) + +--et +cs-229-machine-learning-tips-and-tricks + kenkyusha (translation) + stemajo (review) ---es - Erick Gabriel Mendoza Flores (translation of deep learning) - Fernando Diaz (review of deep learning) - Fernando González-Herrera (review of deep learning) - Mariano Ramirez (review of deep learning) - Juan P. Chavat (review of deep learning) - Alonso Melgar López (review of deep learning) - Gustavo Velasco-Hernández (review of deep learning) - Juan Manuel Nava Zamudio (review of deep learning) - - Fernando González-Herrera (translation of linear algebra) - Fernando Diaz (review of linear algebra) - Gustavo Velasco-Hernández (review of linear algebra) - Juan P. Chavat (review of linear algebra) - - David Jiménez Paredes (translation of machine learning tips and tricks) - Fernando Diaz (translation of machine learning tips and tricks) - Gustavo Velasco-Hernández (review of machine learning tips and tricks) - Alonso Melgar-Lopez (review of machine learning tips and tricks) - - Fermin Ordaz (translation of probabilities and statistics) - Fernando González-Herrera (review of probabilities and statistics) - Alonso Melgar López (review of probabilities and statistics) - - Juan P. 
Chavat (translation of supervised learning) - Fernando Gonzalez-Herrera (review of supervised learning) - Fernando Diaz (review of supervised learning) - Alonso Melgar-Lopez (review of supervised learning) - - Jaime Noel Alvarez Luna (translation of unsupervised learning) - Alonso Melgar López (review of unsupervised learning) - Fernando Diaz (review of unsupervised learning) - --fa - AlisterTA (translation of deep learning) - Mohammad Karimi (review of deep learning) - Erfan Noury (review of deep learning) - - Erfan Noury (translation of linear algebra) - Mohammad Karimi (review of linear algebra) - - AlisterTA (translation of machine learning tips and tricks) - Mohammad Reza (translation of machine learning tips and tricks) - Erfan Noury (review of machine learning tips and tricks) - Mohammad Karimi (review of machine learning tips and tricks) - - Erfan Noury (translation of probabilities and statistics) - Mohammad Karimi (review of probabilities and statistics) - - Amirhosein Kazemnejad (translation of supervised learning) - Erfan Noury (review of supervised learning) - Mohammad Karimi (review of supervised learning) - - Erfan Noury (translation of unsupervised learning) - Mohammad Karimi (review of unsupervised learning) - +cs-229-deep-learning + AlisterTA (translation) + Mohammad Karimi (review) + Erfan Noury (review) + +cs-229-linear-algebra + Erfan Noury (translation) + Mohammad Karimi (review) + +cs-229-machine-learning-tips-and-tricks + AlisterTA (translation) + Mohammad Reza (translation) + Erfan Noury (review) + Mohammad Karimi (review) + +cs-229-probability + Erfan Noury (translation) + Mohammad Karimi (review) + +cs-229-supervised-learning + Amirhosein Kazemnejad (translation) + Erfan Noury (review) + Mohammad Karimi (review) + +cs-229-unsupervised-learning + Erfan Noury (translation) + Mohammad Karimi (review) + +cs-230-convolutional-neural-networks + AlisterTA (translation) + Ehsan Kermani (translation) + Erfan Noury (review) + +cs-230-deep-learning-tips-and-tricks + AlisterTA (translation) + Erfan Noury (review) + +cs-230-recurrent-neural-networks + AlisterTA (translation) + Erfan Noury (review) + --fr - Original authors +Original authors --he --hi +--id +cs-229-linear-algebra + Prasetia Utama Putra (translation) + Gunawan Tri (review) + Jimmy The Lecturer (review) + +cs-229-probability + Prasetia Utama Putra (translation) + Jimmy The Lecturer (review) + +cs-230-convolutional-neural-networks + Prasetia Utama Putra (translation) + Gunawan Tri (review) + Jimmy The Lecturer (review) + +--it +cs-229-linear-algebra + Alessandro Piotti (translation) + Nicola Dall'Asen (review) + +cs-229-probability + Nicola Dall'Asen (translation) + Alessandro Piotti (review) + --ko +cs-229-deep-learning + Haesun Park (translation) + Danny Toeun Kim (review) + +cs-229-linear-algebra + Soyoung Lee (translation) - Haesun Park (translation of deep learning) +cs-229-machine-learning-tips-and-tricks + Wooil Jeong (translation) + +cs-229-probability + Wooil Jeong (translation) + +cs-229-unsupervised-learning + Kwang Hyeok Ahn (translation) + +cs-230-convolutional-neural-networks + Soyoung Lee (translation) + Jack Kang (review) --ja +cs-229-deep-learning + Taichi Kato (translation) + Dan Lillrank (review) + Yoshiyuki Nakai (review) + Yuki Tokyo (review) + +cs-229-linear-algebra + Robert Altena (translation) + Kamuela Lau (review) + +cs-229-machine-learning-tips-and-tricks + UMU (translation) + Hiroki Mori (review) + H. 
Hamano (review) + Tian-Jian Jiang (review) + Yuta Kanzawa (review) + +cs-229-probability + Takatoshi Nao (translation) + Yuta Kanzawa (review) + +cs-229-supervised-learning + Yuta Kanzawa (translation) + Tran Tuan Anh (review) + +cs-229-unsupervised-learning + Tran Tuan Anh (translation) + Yoshiyuki Nakai (review) + Yuta Kanzawa (review) + Dan Lillrank (review) + +cs-230-convolutional-neural-networks + Tran Tuan Anh (translation) + Yoshiyuki Nakai (review) + Linh Dang (review) + +cs-230-deep-learning-tips-and-tricks + Kamuela Lau (translation) + Yoshiyuki Nakai (review) + Hiroki Mori (review) + +cs-230-recurrent-neural-networks + H. Hamano (translation) + Yoshiyuki Nakai (review) --pt - Gabriel Fonseca (translation of deep learning) - Leticia Portella (review of deep learning) +cs-229-deep-learning + Gabriel Fonseca (translation) + Leticia Portella (review) + Renato Kano (review) + +cs-229-linear-algebra + Gabriel Fonseca (translation) + Leticia Portella (review) + +cs-229-machine-learning-tips-and-tricks + Fernando Santos (translation) + Leticia Portella (review) + Gabriel Fonseca (review) - Gabriel Fonseca (translation of linear algebra) - Leticia Portella (review of linear algebra) +cs-229-probability + Leticia Portella (translation) + Flavio Clesio (review) - Leticia Portella (translation of probability) - Flavio Clesio (review of probability) +cs-229-supervised-learning + Leticia Portella (translation) + Gabriel Fonseca (review) + Flavio Clesio (review) - Leticia Portella (translation of supervised learning) - Gabriel Fonseca (review of supervised learning) - Flavio Clesio (review of supervised learning) - - Gabriel Fonseca (translation of unsupervised learning) - Tiago Danin (review of unsupervised learning) +cs-229-unsupervised-learning + Gabriel Fonseca (translation) + Tiago Danin (review) + +cs-230-convolutional-neural-networks + Leticia Portella (translation) + Gabriel Aparecido Fonseca (review) --tr - Ekrem Çetinkaya (translation of deep learning) - Omer Bukte (review of deep learning) - - Kadir Tekeli (translation of linear algebra) - Ekrem Çetinkaya (review of linear algebra) - +cs-221-logic-models + Ayyüce Kızrak (translation) + Başak Buluz (review) + +cs-221-reflex-models + Yavuz Kömeçoğlu (translation) + Ayyüce Kızrak (review) + +cs-221-states-models + Cemal Gurpinar (translation) + Başak Buluz (review) + +cs-221-variables-models + Başak Buluz (translation) + Ayyüce Kızrak (review) + +cs-229-deep-learning + Ekrem Çetinkaya (translation) + Omer Bukte (review) + +cs-229-linear-algebra + Kadir Tekeli (translation) + Ekrem Çetinkaya (review) + +cs-229-machine-learning-tips-and-tricks + Seray Beşer (translation) + Ayyüce Kızrak (review) + Yavuz Kömeçoğlu (review) + +cs-229-probability + Ayyüce Kızrak (translation) + Başak Buluz (review) + +cs-229-supervised-learning + Başak Buluz (translation) + Ayyüce Kızrak (review) + +cs-229-unsupervised-learning + Yavuz Kömeçoğlu (translation) + Başak Buluz (review) + +cs-230-convolutional-neural-networks + Ayyüce Kızrak (translation) + Yavuz Kömeçoğlu (review) + +cs-230-deep-learning-tips-and-tricks + Ayyüce Kızrak (translation) + Yavuz Kömeçoğlu (review) + +cs-230-recurrent-neural-networks + Başak Buluz (translation) + Yavuz Kömeçoğlu (review) + +--uk +cs-229-probability + Gregory Reshetniak (translation) + Denys (review) + +--vi +cs-221-logic-models + Hoàng Minh Tuấn (translation) + Đàm Minh Tiến (review) + +cs-229-deep-learning + Trần Tuấn Anh (translation) + Phạm Hồng Vinh (review) + Đàm Minh Tiến (review) + Nguyễn Khánh Hưng (review) + 
Hoàng Vũ Đạt (review) + Nguyễn Trí Minh (review) + +cs-229-linear-algebra + Hoàng Minh Tuấn (translation) + Phạm Hồng Vinh (review) + +cs-229-machine-learning-tips-and-tricks + Trần Tuấn Anh (translation) + Nguyễn Trí Minh (review) + Vinh Pham (review) + Đàm Minh Tiến (review) + +cs-229-probability + Hoàng Minh Tuấn (translation) + Hung Nguyễn (review) + +cs-229-supervised-learning + Trần Tuấn Anh (translation) + Đàm Minh Tiến (review) + Hung Nguyễn (review) + Nguyễn Trí Minh (review) + +cs-229-unsupervised-learning + Trần Tuấn Anh (translation) + Đàm Minh Tiến (review) + +cs-230-convolutional-neural-networks + Phạm Hồng Vinh (translation) + Đàm Minh Tiến (review) + +cs-230-deep-learning-tips-and-tricks + Hoàng Minh Tuấn (translation) + Trần Tuấn Anh (review) + Đàm Minh Tiến (review) + +cs-230-recurrent-neural-networks + Trần Tuấn Anh (translation) + Đàm Minh Tiến (review) + Hung Nguyễn (review) + Nguyễn Trí Minh (review) + --zh - Wang Hongnian (translation of supervised learning) - Xiaohu Zhu (朱小虎) (review of supervised learning) - Chaoying Xue (review of supervised learning) +cs-229-supervised-learning + Wang Hongnian (translation) + Xiaohu Zhu (朱小虎) (review) + Chaoying Xue (review) --zh-tw - kevingo (translation of deep learning) - TobyOoO (review of deep learning) +cs-229-deep-learning + kevingo (translation) + TobyOoO (review) + +cs-229-linear-algebra + kevingo (translation) + Miyaya (review) + +cs-229-probability + kevingo (translation) + johnnychhsu (review) + +cs-229-supervised-learning + kevingo (translation) + accelsao (review) + +cs-229-unsupervised-learning + kevingo (translation) + imironhead (review) + johnnychhsu (review) + +cs-229-machine-learning-tips-and-tricks + kevingo (translation) + kentropy (review) +cs-230-convolutional-neural-networks + kentropy (translation) + kevingo (review) From 040304be3baca4162a04dcc9bc3b4d884623333b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 14 Nov 2020 17:04:37 -0800 Subject: [PATCH 12/12] Mark ko/cs-229-deep-learning translation as done --- README.md | 130 ++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 92 insertions(+), 38 deletions(-) diff --git a/README.md b/README.md index 8975eae86..affbe8550 100644 --- a/README.md +++ b/README.md @@ -1,55 +1,109 @@ # Translation of VIP Cheatsheets ## Goal -This repository aims at collaboratively translating our [Machine Learning cheatsheets](https://github.com/afshinea/stanford-cs-229-machine-learning) into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! 
-
-## Progression
-|Cheatsheet topic|Español|فارسی|Français|日本語|Português|简体中文|
-|:---|:---:|:---:|:---:|:---:|:---:|:---:|
-|Deep learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|
-|Supervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/52)|
-|Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|
-|ML tips and tricks|done|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/57)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|
-|Probabilities and Statistics|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|
-|Linear algebra|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)|
-
-|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|
-|:---|:---:|:---:|:---:|:---:|:---:|
-|Deep learning|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|
-|Supervised learning|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|
-|Unsupervised learning|not started|not started|not started|not started|not started|
-|ML tips and tricks|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/39)|not started|
-|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/26)|not started|not started|not started|not started|
-|Linear algebra|not started|not started|not started|done|not started|
-
-
-|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|
-|:---|:---:|:---:|:---:|:---:|:---:|
-|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started|
-|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|not started|
-|Unsupervised learning|not started|not started|not started|not started|not started|
-|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|not started|
-|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/64)|not started|
-|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|not started|
-
-If your favorite language is missing, please feel free to add it!
+This repository aims at collaboratively translating our [Machine Learning](https://github.com/afshinea/stanford-cs-229-machine-learning), [Deep Learning](https://github.com/afshinea/stanford-cs-230-deep-learning) and [Artificial Intelligence](https://github.com/afshinea/stanford-cs-221-artificial-intelligence) cheatsheets into a ton of languages, so that this content can be enjoyed by anyone from any part of the world!

## Contribution guidelines
-Please first check for [existing pull requests](https://github.com/shervinea/cheatsheet-translation/pulls) before submitting yours. Also, please propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process.
+The translation process of each cheatsheet consists of two steps:
+- the **translation** step, where contributors follow a template of items to translate,
+- the **review** step, where contributors go through each expression translated by their peers and add their suggestions and remarks.
+
+### Translators
+0. Check for [existing pull requests](https://github.com/shervinea/cheatsheet-translation/pulls) to see which cheatsheets have yet to be translated.

1. Fork the repository.

2. Copy [the template](https://github.com/shervinea/cheatsheet-translation/tree/master/template) of the cheatsheet you wish to translate into the language folder with a naming that follows the [ISO 639-1 notation](https://www.loc.gov/standards/iso639-2/php/code_list.php) (e.g. `[es]` for Spanish, `[zh]` for Mandarin Chinese).

3. Translate sentences by keeping the following structure:
> 34. **English blabla**
>
> ⟶ Translated blabla

4. Commit the changes to your forked repository.

5. Submit a [pull request](https://help.github.com/articles/creating-a-pull-request/) and call it `[language code] file-name`. For example, the PR for the Spanish translation of the `template/cs-229-deep-learning.md` cheatsheet will be entitled `[es] cs-229-deep-learning`.
+
+### Reviewers
+1. Go to the [list of pull requests](https://github.com/shervinea/cheatsheet-translation/pulls) and filter them by your native language.
+
+2. Locate pull requests where help is needed. Those carry the tag `reviewer wanted`.
+
+3. Review the content line by line and add comments and suggestions where necessary.
+
+### Important note
+Please make sure to propose the translation of **only one** cheatsheet per pull request -- it greatly simplifies the review process.
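As a minimal command-line sketch of the translator workflow above, the steps map onto the git commands below. It assumes you have already forked the repository on GitHub and have `git` installed; `<your-username>`, the `es` language code, the cheatsheet name and the branch name are placeholders to adapt.

```bash
# Sketch of the translator workflow (placeholders: <your-username>, es, cs-229-deep-learning)

# 1. Clone your fork of the repository.
git clone https://github.com/<your-username>/cheatsheet-translation.git
cd cheatsheet-translation

# 2. Copy the template of the cheatsheet you wish to translate into the folder
#    named after the ISO 639-1 code of your language (here: Spanish).
mkdir -p es
cp template/cs-229-deep-learning.md es/cs-229-deep-learning.md

# 3. Edit es/cs-229-deep-learning.md, filling in each "⟶" item with your translation.

# 4. Commit the changes to your forked repository and push them.
git checkout -b es-cs-229-deep-learning
git add es/cs-229-deep-learning.md
git commit -m "[es] cs-229-deep-learning"
git push origin es-cs-229-deep-learning

# 5. On GitHub, open a pull request entitled "[es] cs-229-deep-learning".
```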
+
+## Progression
+### CS 221 (Artificial Intelligence)
+| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|[Logic models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-logic-models.md)|
+|:---|:---:|:---:|:---:|:---:|
+|**Deutsch**|not started|not started|not started|not started|
+|**Español**|not started|not started|not started|not started|
+|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started|
+|**Français**|done|done|done|done|
+|**עִבְרִית**|not started|not started|not started|not started|
+|**Italiano**|not started|not started|not started|not started|
+|**日本語**|not started|not started|not started|not started|
+|**한국어**|not started|not started|not started|not started|
+|**Português**|not started|not started|not started|not started|
+|**Türkçe**|done|done|done|done|
+|**Tiếng Việt**|not started|not started|not started|done|
+|**简体中文**|not started|not started|not started|not started|
+|**繁體中文**|not started|not started|not started|not started|
+
+### CS 229 (Machine Learning)
+| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)|
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|
+|**العَرَبِيَّة**|done|done|done|done|done|done|
+|**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|
+|**Deutsch**|done|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|
+|**Ελληνικά**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/209)|not started|not started|
+|**Español**|done|done|done|done|done|done|
+|**Eesti**|not started|not started|not started|done|not started|not started|
+|**فارسی**|done|done|done|done|done|done|
+|**Suomi**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started|not started|not started|
+|**Français**|done|done|done|done|done|done|
+|**עִבְרִית**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|not started|not started|not started|not started|not started|
+|**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started|
+|**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|
+|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|done|done|
+|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done|
+|**日本語**|done|done|done|done|done|done|
+|**한국어**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done|
+|**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/208)|not started|
+|**Português**|done|done|done|done|done|done|
+|**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started|
+|**Türkçe**|done|done|done|done|done|done|
+|**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|
+|**Tiếng Việt**|done|done|done|done|done|done|
+|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)|
+|**繁體中文**|done|done|done|done|done|done|
+### CS 230 (Deep Learning) +| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| +|:---|:---:|:---:|:---:| +|**العَرَبِيَّة**|not started|not started|not started| +|**Català**|not started|not started|not started| +|**Deutsch**|not started|not started|not started| +|**Español**|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/210)| +|**فارسی**|done|done|done| +|**Suomi**|not started|not started|not started| +|**Français**|done|done|done| +|**עִבְרִית**|not started|not started|not started| +|**हिन्दी**|not started|not started|not started| +|**Magyar**|not started|not started|not started| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Italiano**|not started|not started|not started| +|**日本語**|done|done|done| +|**한국어**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| +|**Polski**|not started|not started|not started| +|**Português**|done|not started|not started| +|**Русский**|not started|not started|not started| +|**Türkçe**|done|done|done| +|**Українська**|not started|not started|not started| +|**Tiếng Việt**|done|done|done| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/212)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements -Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching/cs-229.html). +Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching).