Linear Regression Scenarios

  • House price prediction
  • Sales volume prediction
  • Loan amount prediction

Linear Regression Definition

Linear Regression is an analytical method that uses regression equations (functions) to model the relationship between one or more independent variables and a dependent variable.

Characteristics: when there is only one independent variable it is called univariate regression; when there is more than one independent variable it is called multiple regression.

Linear regression involves two main kinds of relationships between the variables: linear relationships and non-linear relationships.

Univariate Linear Relationship

A single independent variable with a linear relationship to the dependent variable.

Multivariate Linear Relationship

Multiple independent variables with linear relationships to the dependent variable.

Non-linear Relationship

When the relationship between variables cannot be expressed as a straight line.

Loss and Optimization in Linear Regression

Suppose we have a study score example where the true relationship between variables is:

True relationship: Final score = 0.5 × Regular score + 0.3 × Final exam score

Now suppose we guess a relationship:

Guessed relationship: Predicted final score = 0.45 × Regular score + 0.2 × Final exam score

There will be some error between the true result and our predicted result.

Since this error exists, how do we measure it?
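As a minimal sketch, the gap between the true rule and the guessed rule can be seen by evaluating both on a few made-up sample scores (the numbers below are illustrative):

```python
# Hypothetical students: (regular score, final exam score) pairs.
regular = [80, 90, 70]
final_exam = [85, 75, 95]

for r, f in zip(regular, final_exam):
    true_score = 0.5 * r + 0.3 * f      # true relationship
    guessed = 0.45 * r + 0.2 * f        # guessed relationship
    print(f"true={true_score:.1f}  guessed={guessed:.1f}  "
          f"error={true_score - guessed:.1f}")
```

For the first student, for example, the true score is 65.5 while the guess gives 53.0, an error of 12.5; the loss function below aggregates such errors across all samples.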

Loss Function

The total loss function is:

∑ᵢ (yᵢ − h(xᵢ))²

  • yᵢ is the true value of the i-th training sample
  • h(xᵢ) is the predicted value computed from the features of the i-th training sample
  • Minimizing this sum of squared errors is known as the least squares method
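A minimal sketch of this loss (the function name is illustrative):

```python
def squared_error_loss(y_true, y_pred):
    """Sum of squared differences between true and predicted values."""
    return sum((y - h) ** 2 for y, h in zip(y_true, y_pred))

# Errors of 0.5, -0.5, and 1.0 give 0.25 + 0.25 + 1.0 = 1.5.
print(squared_error_loss([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))  # 1.5
```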

Optimization Algorithms

The goal of optimization is to find the model weights W at which the loss is minimized.

Two optimization algorithms commonly used in linear regression:

  1. Analytical solution (normal equation)
  2. Gradient descent

Analytical Solution (Normal Equation):

Understanding: W = (XᵀX)⁻¹ XᵀY, where X is the feature matrix and Y is the target vector; the optimal weights are computed directly in closed form.

Disadvantage: when there are many features, the solution is too slow (inverting XᵀX costs roughly O(n³) in the number of features), and no result can be obtained if XᵀX is not invertible.
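As a minimal sketch (assuming NumPy is available), the normal equation can be applied to toy data generated from y = 2x + 1, with a column of ones for the intercept:

```python
import numpy as np

# Feature matrix: first column is the bias (all ones), second is x.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])   # y = 2*x + 1 exactly

# Normal equation: W = (XᵀX)⁻¹ Xᵀ y
W = np.linalg.inv(X.T @ X) @ X.T @ y
print(W)  # intercept ≈ 1, slope ≈ 2
```

In practice `np.linalg.solve(X.T @ X, X.T @ y)` is preferred over computing the inverse explicitly, but the version above mirrors the formula term by term.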

Gradient Descent:

The basic idea of gradient descent can be compared to descending a mountain. A person is trapped on a mountain and needs to reach the bottom (i.e., find the lowest point, the valley), but heavy fog has reduced visibility so much that the path down cannot be seen. They must therefore rely on local information: at each position, check which direction slopes downward most steeply, take a step that way, and repeat until they reach the valley.

Gradient is an important concept in calculus:

  • In a single-variable function, the gradient is the derivative of the function, representing the slope of the tangent at a given point
  • In a multi-variable function, the gradient is a vector with direction; the gradient direction points in the direction of steepest ascent at a given point
  • In calculus, taking partial derivatives ∂ of a multivariate function with respect to each parameter and writing them as a vector gives the gradient

Meaning of α: in gradient descent, α is called the learning rate or step size; it controls how far each update step moves.

Why multiply the gradient by a negative sign:

Adding a negative sign before the gradient means moving in the opposite direction of the gradient. As mentioned, the gradient direction is the direction of steepest ascent, so the negative gradient direction is the direction of steepest descent — hence the negative sign.
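Putting the learning rate and the negative sign together gives the update rule θ ← θ − α · ∇J(θ). A minimal sketch of one such step (the function name is illustrative):

```python
def gd_step(theta, grad, alpha):
    """One gradient-descent update: move against the gradient,
    scaled by the learning rate alpha."""
    return theta - alpha * grad

# A positive gradient means the function is rising, so the
# negative sign moves theta downhill (from 1.0 toward 0).
print(f"{gd_step(1.0, 2.0, 0.4):.1f}")  # prints 0.2
```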

Gradient Descent for Single-Variable Functions

  • Suppose a single-variable function J(θ) = θ²
  • Initialize starting point θ0 = 1
  • Learning rate α = 0.4

After four iterations, we basically reach the minimum of the function.
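The four iterations can be reproduced directly; for J(θ) = θ² the derivative is 2θ, so each step is θ ← θ − α · 2θ:

```python
theta, alpha = 1.0, 0.4   # starting point and learning rate from above
for i in range(4):
    theta = theta - alpha * 2 * theta   # theta <- theta - alpha * J'(theta)
    print(f"iteration {i + 1}: theta = {theta:.4f}")
# iterations give 0.2000, 0.0400, 0.0080, 0.0016 — near the minimum at 0
```

Note that each step simply multiplies θ by (1 − 2α) = 0.2, which is why convergence here is so fast.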

Gradient Descent for Multi-Variable Functions

  • Objective function: J(θ) = θ1² + θ2²
  • Starting point: θ0 = (1, 3)
  • Learning rate: α = 0.1
  • Function gradient: ∇J(θ) = ⟨2θ1, 2θ2⟩

After multiple iterations, gradient descent approaches the minimum point (0, 0) of the function.
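A plain-Python sketch of the same iterations, updating both components with the gradient ⟨2θ1, 2θ2⟩:

```python
theta = [1.0, 3.0]   # starting point (theta1, theta2)
alpha = 0.1          # learning rate

for _ in range(50):
    grad = [2 * theta[0], 2 * theta[1]]
    theta = [t - alpha * g for t, g in zip(theta, grad)]

print(theta)  # both components are now very close to 0
```

Each iteration scales both components by (1 − 2α) = 0.8, so the trajectory shrinks geometrically toward the minimum at (0, 0).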

Comparison: Gradient Descent vs Normal Equation

Feature        Gradient Descent                   Normal Equation
Advantage      Applicable to various scenarios    Direct, closed-form solution
Disadvantage   Requires iteration                 Slow when there are many features