The least squares method is a procedure for finding the function that best matches the data by minimizing the sum of the squared errors. With it, unknown quantities (parameters) can be estimated so that the sum of squared errors between the estimated values and the actual data is as small as possible. The least squares method is also used for curve fitting, and some other optimization problems can be expressed in least squares form as well.
In 1801, the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of tracking, Piazzi lost sight of Ceres as it moved behind the Sun. Scientists all over the world then tried to recover Ceres using Piazzi's observational data, but most of them had no success. Gauss, then 24 years old, also calculated the orbit of Ceres, and the German astronomer Heinrich Olbers rediscovered Ceres based on the orbit Gauss had calculated.
The method of least squares used by Gauss was published in 1809 in his book "Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium". The French scientist Legendre had independently invented the least squares method in 1806, but his work was little known at the time. Legendre later had a dispute with Gauss over who had first established the least squares method.
In 1829, Gauss provided a proof that the least squares method achieves a stronger optimization effect than other (linear unbiased) estimation methods. This result is called the Gauss-Markov theorem.
Suppose we have m samples (x₁, y₁), …, (x_m, y_m). Linear regression tries to learn a linear function of the input:

f(xᵢ) = w·xᵢ + b    (formula 1)

to make:

f(xᵢ) ≈ yᵢ    (formula 2)

Among them, w and b are the parameters to be learned, xᵢ is the input value of a sample, and yᵢ is the corresponding true value. How to learn w and b? We measure the gap between f(xᵢ) and yᵢ with the squared error, summed over all m samples:

E(w, b) = Σᵢ₌₁ᵐ (yᵢ − w·xᵢ − b)²    (formula 3)

The smaller E(w, b) is, the better the line fits the samples.
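To make this concrete, the following small sketch evaluates this error for one candidate pair of w and b; the sample data here is made up purely for illustration:

import numpy as np

# Hypothetical sample data (any x/y pairs would do).
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([5.1, 6.9, 9.2, 10.8])

def squared_error(X, Y, w, b):
    # E(w, b): sum of squared vertical residuals over all samples
    return np.sum((Y - w * X - b) ** 2)

print(squared_error(X, Y, 2.0, 3.0))  # error of the candidate line y = 2x + 3

A smaller value of this error means the candidate line fits the samples better.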
Figure 4-3 Evaluation principle of the mean square error function
In Figure 4-3, the circular points are the samples and the straight line is the current fitting result. As shown in the figure on the left, one option is to compute the perpendicular distance from each sample point to the line: we must locate the foot of the perpendicular based on the slope of the line and then compute the distance, which is slow. In engineering, we usually use the method on the right instead: the vertical distance between the sample point and the line, i.e. the difference along the y-axis, because this is just a subtraction and very convenient to compute.
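To see the difference concretely, here is a small sketch (my own illustration, not the book's code) computing both distances from one sample point to the line y = w·x + b:

import numpy as np

def perpendicular_distance(x, y, w, b):
    # Distance to the line w*x - y + b = 0; finding the foot of the
    # perpendicular amounts to dividing by sqrt(w^2 + 1).
    return abs(w * x - y + b) / np.sqrt(w * w + 1)

def vertical_distance(x, y, w, b):
    # Difference along the y-axis: just a subtraction.
    return abs(y - (w * x + b))

print(perpendicular_distance(3.0, 10.0, 2.0, 3.0))  # about 0.447
print(vertical_distance(3.0, 10.0, 2.0, 3.0))       # 1.0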
Suppose a preliminary fitting result is the dashed line shown in the figure; is this line appropriate? We can calculate the distance from each point to this line and add these values up (they are all positive, so they cannot cancel each other out) to form an overall error.
Because the points in the figure do not lie on one straight line, no single line can pass through all of them at the same time. Therefore, all we can do is keep changing the angle and position of the line to make the overall error as small as possible (it can never be reduced to 0).
If we want to minimize the value of the error, we can take the partial derivatives of E(w, b) with respect to w and b and set them to 0 (all sums below run over the m samples, i = 1, …, m). The derivation process is as follows. The derivative with respect to w is:

∂E/∂w = 2·(w·Σxᵢ² − Σ(yᵢ − b)·xᵢ)    (formula 4)

Let formula 4 be equal to 0:

w·Σxᵢ² = Σ(yᵢ − b)·xᵢ    (formula 5)

The derivative with respect to b is:

∂E/∂b = −2·Σ(yᵢ − w·xᵢ − b)    (formula 6)

Let formula 6 be equal to 0:

Σ(yᵢ − w·xᵢ − b) = 0    (formula 7)

From formula 7, we can get (suppose there are m samples, so summing the constant b over all of them gives m·b):

Σyᵢ − w·Σxᵢ − m·b = 0    (formula 8)

Divide both sides by m:

ȳ − w·x̄ − b = 0,  that is,  b = ȳ − w·x̄    (formula 9)

Among them:

x̄ = (1/m)·Σxᵢ,  ȳ = (1/m)·Σyᵢ    (formula 10)

are the means of x and y over the samples.
Substitute formula 9 into formula 5 to eliminate b:

w·Σxᵢ² = Σ(yᵢ − ȳ + w·x̄)·xᵢ

and solve for w:

w = (Σxᵢyᵢ − ȳ·Σxᵢ) / (Σxᵢ² − x̄·Σxᵢ)    (formula 11)

Substitute formula 10 into formula 11:

w = (Σxᵢyᵢ − Σxᵢ·Σyᵢ/m) / (Σxᵢ² − (Σxᵢ)²/m)    (formula 12)

Multiply both the numerator and the denominator by m:

w = (m·Σxᵢyᵢ − Σxᵢ·Σyᵢ) / (m·Σxᵢ² − (Σxᵢ)²)    (formula 13)

Once w is known, b follows directly from formula 8:

b = (1/m)·Σ(yᵢ − w·xᵢ)    (formula 14)
In fact, there are many variants of formula 13; you will see different versions in different articles, which can easily be confusing. For example, the following two formulas are also correct solutions:

w = Σyᵢ·(xᵢ − x̄) / (Σxᵢ² − (Σxᵢ)²/m)    (formula 15)

w = Σxᵢ·(yᵢ − ȳ) / (Σxᵢ² − x̄·Σxᵢ)    (formula 16)

For the above two formulas, if you substitute formula 10 into them, you should get the same answer as formula 13, but it takes some calculation skill. For example, many people don't know this magic formula:

Σ x̄·yᵢ = Σ xᵢ·ȳ = Σ x̄·ȳ = m·x̄·ȳ = (Σxᵢ·Σyᵢ)/m
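As a quick numerical sanity check (my own addition, not from the book), this identity can be verified on arbitrary data:

import numpy as np

# Random sample data; the identity holds for any values.
rng = np.random.default_rng(0)
x = rng.random(10)
y = rng.random(10)
m = len(x)

print(np.sum(x.mean() * y))        # Σ x̄·yᵢ
print(np.sum(x * y.mean()))        # Σ xᵢ·ȳ
print(m * x.mean() * y.mean())     # m·x̄·ȳ
print(np.sum(x) * np.sum(y) / m)   # (Σxᵢ·Σyᵢ)/m

All four printed values are equal (up to floating point error).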
Now let's use Python to implement the above calculation process:
# formula 15: w = Σ yᵢ·(xᵢ − x̄) / (Σ xᵢ² − (Σ xᵢ)²/m)
def method1(X, Y, m):
    x_mean = X.mean()
    p = sum(Y * (X - x_mean))               # numerator
    q = sum(X * X) - sum(X) * sum(X) / m    # denominator
    w = p / q
    return w

# formula 16: w = Σ xᵢ·(yᵢ − ȳ) / (Σ xᵢ² − x̄·Σ xᵢ)
def method2(X, Y, m):
    x_mean = X.mean()
    y_mean = Y.mean()
    p = sum(X * (Y - y_mean))               # numerator
    q = sum(X * X) - x_mean * sum(X)        # denominator
    w = p / q
    return w

# formula 13: w = (m·Σ xᵢyᵢ − Σ xᵢ·Σ yᵢ) / (m·Σ xᵢ² − (Σ xᵢ)²)
def method3(X, Y, m):
    p = m * sum(X * Y) - sum(X) * sum(Y)    # numerator
    q = m * sum(X * X) - sum(X) * sum(X)    # denominator
    w = p / q
    return w
With the help of the function library, we don't need to implement basic operations such as sum() and mean() by hand.
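As an aside (this is not part of the book's code), NumPy can also solve the whole fit for us: np.polyfit with degree 1 performs the same least squares fit and returns the slope and intercept, so it can serve as an extra cross-check:

import numpy as np

# np.polyfit(X, Y, 1) fits y = w*x + b by least squares and
# returns the coefficients in the order [w, b].
def method_polyfit(X, Y):
    w, b = np.polyfit(X, Y, 1)
    return w, b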
# formula 14: b = (1/m)·Σ (yᵢ − w·xᵢ)
def calculate_b_1(X, Y, w, m):
    b = sum(Y - w * X) / m
    return b

# formula 9: b = ȳ − w·x̄
def calculate_b_2(X, Y, w):
    b = Y.mean() - w * X.mean()
    return b
The above methods all give consistent final results, which serves as a cross-check:
w1=2.056827, b1=2.965434
w2=2.056827, b2=2.965434
w3=2.056827, b3=2.965434
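To tie everything together, here is a minimal driver sketch that calls the functions defined above; the sample data in it is synthetic and hypothetical, so the values it prints will differ from the numbers above:

import numpy as np

# Synthetic data scattered around a known line, for illustration only.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 10.0, 50)
Y = 2.0 * X + 3.0 + rng.normal(0.0, 0.5, size=X.shape)
m = len(X)

w1 = method1(X, Y, m)
w2 = method2(X, Y, m)
w3 = method3(X, Y, m)
b1 = calculate_b_1(X, Y, w1, m)
b2 = calculate_b_2(X, Y, w2)
b3 = calculate_b_1(X, Y, w3, m)

print("w1=%f, b1=%f" % (w1, b1))
print("w2=%f, b2=%f" % (w2, b2))
print("w3=%f, b3=%f" % (w3, b3))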
ch04, Level1