Linear Regression Scenarios
- House price prediction
- Sales forecasting
- Loan amount prediction
Linear Regression Definition
Linear Regression is a statistical method that uses a regression equation (function) to model the relationship between one or more independent variables (features) and a dependent variable (target).
Characteristics: when there is only one independent variable, it is called simple (univariate) regression; when there are multiple independent variables, it is called multiple (multivariate) regression.
The relationship between features and target comes in two main forms: linear relationships and non-linear relationships.
Simple Linear Relationship (One Variable)
Example of simple linear relationship.
Multiple Linear Relationship (Multiple Variables)
Example of multiple linear relationship.
Non-linear Relationships
Example of non-linear relationships.
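As a minimal sketch of a simple linear relationship, the code below fits a straight line to made-up house-price data (the areas, the slope 0.8, and the intercept 10 are all hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical data: house area (m^2) and price, assuming price = 0.8 * area + 10
area = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
price = 0.8 * area + 10.0

# Fit a degree-1 polynomial, i.e. simple linear regression: price ≈ k * area + b
k, b = np.polyfit(area, price, deg=1)
print(round(k, 3), round(b, 3))
```

Because the sample data is exactly linear, the fit recovers the assumed slope and intercept.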
Loss Function and Optimization
Suppose there’s a study score example where the real data follows this relationship:
Real relationship: Final Score = 0.5 × Regular Score + 0.3 × Final Exam Score
Now we hypothesize a relationship:
Predicted Score = 0.45 × Regular Score + 0.2 × Final Exam Score
As you can see, there is an error between the actual result and our prediction.
Given that this error exists, how do we measure it?
Loss Function
Total loss function:
J(w) = ∑ᵢ (yᵢ - h(xᵢ))²
- yᵢ is the true value of the i-th training sample
- h(xᵢ) is the predicted value computed from the i-th sample's features
- The sum runs over all training samples
- This criterion is also known as least squares
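The score example above can be plugged straight into this loss. A small sketch, where the coefficients come from the text but the four students' scores are made-up sample data:

```python
# Hypothetical student scores for illustration
regular = [90, 80, 70, 60]        # regular scores
final_exam = [85, 75, 95, 65]     # final exam scores

# Real rule and hypothesized rule, both taken from the text
y_true = [0.5 * r + 0.3 * f for r, f in zip(regular, final_exam)]
y_pred = [0.45 * r + 0.2 * f for r, f in zip(regular, final_exam)]

# Total squared loss: sum of (y_i - h(x_i))^2 over all samples
total_loss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
print(round(total_loss, 2))
```

A better hypothesis (coefficients closer to 0.5 and 0.3) would drive this total loss toward zero.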
Optimization Algorithms
Goal: find the weights W in the model that minimize the loss, i.e., the W value at which the loss function reaches its minimum.
Two commonly used optimization algorithms for linear regression:
- Analytical Solution (Normal Equation)
- Gradient Descent
Normal Equation:
w = (XᵀX)⁻¹ Xᵀ y
Understanding: X is the feature matrix and y is the target vector; the optimal weights are obtained directly in a single computation.
Disadvantage: when there are many features, inverting XᵀX (roughly O(n³)) becomes too slow, and if XᵀX is not invertible there is no direct solution.
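A minimal sketch of the normal equation in NumPy, using hypothetical data generated from the score rule in the text (no noise and no intercept term, for simplicity):

```python
import numpy as np

# Hypothetical training data: columns are regular score and final exam score
rng = np.random.default_rng(0)
X = rng.uniform(50, 100, size=(20, 2))
y = X @ np.array([0.5, 0.3])          # real rule from the text

# Normal equation: solve (X^T X) w = X^T y
# (np.linalg.solve is numerically safer than forming the explicit inverse)
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(w, 3))
```

Since the targets were generated exactly from the rule, the solve recovers the true weights 0.5 and 0.3.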
Gradient Descent (GD):
The basic idea of gradient descent can be compared to walking down a mountain. A person is stuck on a mountainside and needs to reach the lowest point (the valley), but thick fog makes visibility very low, so the full path down cannot be seen. Instead, they must use the local terrain: at each position, find the steepest downhill direction, take a step that way, and repeat.
Gradient is an important concept in calculus:
- In single-variable functions, the gradient is actually the derivative, representing the slope of the tangent line at a specific point
- In multi-variable functions, the gradient is a vector with direction - the gradient’s direction specifies the fastest upward direction at a given point
- In calculus, taking partial derivatives of parameters in a multi-variable function and writing them as a vector gives the gradient
The gradient descent update rule is: θ ← θ - α · ∇J(θ)
The meaning of α: in gradient descent, α is called the learning rate or step size; it controls how far each update step moves.
Why the gradient is multiplied by a negative sign:
The negative sign before the gradient means moving in the direction opposite to the gradient. As mentioned earlier, the gradient points in the direction of fastest increase at a point, so to move downhill we must step in the negative gradient direction, hence the minus sign.
Gradient Descent for Single Variable Functions
- Suppose there's a single-variable function J(θ) = θ², whose gradient (derivative) is J′(θ) = 2θ
- Initialize at θ0 = 1
- Learning rate α = 0.4
- Each update is θ ← θ - 0.4 · 2θ = 0.2θ, giving θ1 = 0.2, θ2 = 0.04, θ3 = 0.008, θ4 = 0.0016
After four iterations, we have essentially reached the function's minimum at θ = 0.
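The iterations above can be sketched directly in code, using exactly the setup from the example (J(θ) = θ², θ0 = 1, α = 0.4):

```python
# Gradient descent on J(θ) = θ^2; the gradient is J'(θ) = 2θ
theta = 1.0
alpha = 0.4
history = [theta]
for _ in range(4):
    grad = 2 * theta
    theta = theta - alpha * grad   # θ ← θ - α·∇J(θ), i.e. θ shrinks to 0.2θ
    history.append(theta)
print(history)
```

Each step multiplies θ by 0.2, so the sequence 1, 0.2, 0.04, 0.008, 0.0016 rapidly approaches the minimum at 0.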
Gradient Descent for Multi-Variable Functions
- Objective function: J(θ) = θ1² + θ2²
- Starting point: θ0 = (1, 3)
- Learning rate: α = 0.1
- Gradient of the function: ∇J(θ) = ⟨2θ1, 2θ2⟩
After multiple iterations, gradient descent will approach the function’s minimum point (0, 0).
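The same loop works for the multi-variable case; a sketch with the setup from the example (start (1, 3), α = 0.1):

```python
# Gradient descent on J(θ) = θ1^2 + θ2^2, with ∇J(θ) = (2θ1, 2θ2)
theta = [1.0, 3.0]
alpha = 0.1
for _ in range(100):
    grad = [2 * theta[0], 2 * theta[1]]
    theta = [t - alpha * g for t, g in zip(theta, grad)]
print(theta)
```

Each step multiplies both coordinates by (1 - 2α) = 0.8, so after enough iterations θ is numerically indistinguishable from the minimum (0, 0).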
Comparison: Gradient Descent vs Normal Equation
| Feature | Gradient Descent | Normal Equation |
|---|---|---|
| Advantages | Scales to many features and large datasets; widely applicable | Direct closed-form solution; no learning rate or iteration needed |
| Disadvantages | Requires choosing a learning rate and iterating | Slow with many features (matrix inversion is roughly O(n³)); fails if XᵀX is not invertible |
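To make the comparison concrete, the sketch below fits the same hypothetical dataset with both methods (targets generated from the score rule in the text; features scaled to [0, 1] so a fixed learning rate works for gradient descent):

```python
import numpy as np

# Hypothetical data: y = 0.5*x1 + 0.3*x2, no noise, no intercept
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(50, 2))
y = X @ np.array([0.5, 0.3])
m = len(y)

# Normal equation: one direct solve, no hyperparameters
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean squared loss: needs a learning rate and iterations
w_gd = np.zeros(2)
alpha = 0.5
for _ in range(2000):
    grad = (2 / m) * X.T @ (X @ w_gd - y)   # ∇ of (1/m)·Σ(x_i·w - y_i)^2
    w_gd = w_gd - alpha * grad
print(np.round(w_ne, 3), np.round(w_gd, 3))
```

Both reach the same weights here; the practical difference shows up when the feature count grows (the solve becomes expensive) or when the learning rate is poorly chosen (gradient descent diverges or crawls).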