Big Data 211 - Scikit-Learn Logistic Regression Implementation

This section continues the discussion of LogisticRegression's parameters from the previous section.

max_iter

The mathematical goal of logistic regression is to solve for the parameter vector w that best fits the model, that is, the w that minimizes the loss function J(w). For binary logistic regression, several methods can solve for w; the most common are gradient descent, coordinate descent, and Newton's method (the Newton-Raphson method), of which gradient descent is the best known. Each method rests on involved mathematical principles, but the computational work they perform is quite similar.
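To make the gradient-descent idea concrete, here is a minimal sketch (not sklearn's actual implementation): it repeatedly steps w against the gradient of the binary logistic loss. The step size eta, the iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, eta=0.1, n_iter=1000):
    """Minimize the binary logistic loss J(w) by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)              # predicted probabilities
        grad = X.T @ (p - y) / len(y)   # gradient of J(w)
        w -= eta * grad                 # step against the gradient
    return w

# Toy data: two well-separated clusters, plus a bias column
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])   # bias term
y = np.array([0] * 50 + [1] * 50)

w = gradient_descent(X, y)
pred = (sigmoid(X @ w) > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```

Each iteration here corresponds to one of the steps that max_iter limits in sklearn: the more iterations allowed, the closer w can get to the minimum of J(w).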

Let's look at the learning curve over max_iter on the breast cancer dataset:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression as LR
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Define lists to store results
l2 = []
l2test = []

# Split training and test set
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)

# Train with different max iterations
for i in np.arange(1, 201, 10):
    # Use L2 regularization, change max iterations
    lrl2 = LR(penalty="l2", solver="liblinear", C=0.9, max_iter=i)
    lrl2 = lrl2.fit(Xtrain, Ytrain)
    l2.append(accuracy_score(Ytrain, lrl2.predict(Xtrain)))
    l2test.append(accuracy_score(Ytest, lrl2.predict(Xtest)))

# Draw training and test set results as chart
graph = [l2, l2test]
color = ["black", "gray"]
label = ["L2", "L2test"]

plt.figure(figsize=(20, 5))

# Draw chart
for i in range(len(graph)):
    plt.plot(np.arange(1, 201, 10), graph[i], color=color[i], label=label[i])

plt.legend(loc=4)  # Legend position bottom right
plt.xticks(np.arange(1, 201, 10))  # Set x-axis ticks
plt.xlabel('Max Iterations')
plt.ylabel('Accuracy')
plt.title('Accuracy vs Max Iterations')
plt.show()

# Print the number of iterations the solver actually used
lr = LR(penalty="l2", solver="liblinear", C=0.9, max_iter=300).fit(Xtrain, Ytrain)
print("Number of iterations:", lr.n_iter_)

If the steps allowed by max_iter are exhausted before logistic regression has found the minimum of the loss function, the parameters w have not converged, and sklearn raises a ConvergenceWarning. The wording varies by solver, but the meaning is the same: the parameters did not converge; please increase max_iter.

But we don't have to obey sklearn. A large max_iter corresponds to small step sizes and a longer-running model. Gradient descent pursues the minimum of the loss function, but reaching it may also mean the model overfits (performing very well on the training set but poorly on the test set). So even when max_iter triggers the warning, if the model already trains and predicts well, there is no need to increase max_iter. In the end everything rests on the model's predictive performance: as long as the model predicts well and runs fast, all is well.
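To see this in practice, the following sketch deliberately sets max_iter far too low so that liblinear fails to converge, then shows that the model can still score reasonably despite the ConvergenceWarning. The tiny max_iter value is an assumption chosen only to trigger the warning.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Record (rather than print) any warnings raised during fitting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    lr = LogisticRegression(solver="liblinear", max_iter=2).fit(X, y)

print("warnings raised:", [w.category.__name__ for w in caught])
print("training accuracy despite the warning:", lr.score(X, y))
```

If the accuracy is already acceptable for your task, the warning can be treated as informational rather than as an instruction to keep raising max_iter.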

Classification Method Selection Parameter

The multi_class parameter determines the classification strategy. It takes two values, ovr and multinomial, and defaults to ovr.

ovr is the one-vs-rest (OvR) approach mentioned earlier; multinomial is many-vs-many (MvM). For binary logistic regression there is no difference between ovr and multinomial; the difference appears mainly in multiclass logistic regression.

The idea behind OvR is simple: a logistic regression over any number of classes can be treated as a set of binary logistic regressions. Concretely, for the K-th class in the classification decision, treat all samples of class K as positive and all remaining samples as negative, then fit a binary logistic regression to obtain the model for class K. The models for the other classes are obtained the same way.
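sklearn packages exactly this procedure in OneVsRestClassifier; a short sketch on the iris data (three classes, so three binary models are fitted) might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary logistic regression per class: "this class" vs "all the rest"
ovr = OneVsRestClassifier(
    LogisticRegression(solver="liblinear")).fit(X, y)

print("number of binary models:", len(ovr.estimators_))
print("training score:", ovr.score(X, y))
```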

MvM is comparatively complex; here we explain it through its special case, one-vs-one (OvO). If the model has T classes, each round we select two classes, T1 and T2, from the T classes, gather all samples whose label is T1 or T2, treat T1 as positive and T2 as negative, and fit a binary logistic regression to obtain that pair's model parameters. In total T(T-1)/2 binary classifiers are needed.
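The OvO counterpart in sklearn is OneVsOneClassifier; with the T = 3 iris classes it fits T(T-1)/2 = 3 pairwise models, as this sketch shows:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)

# One binary logistic regression per pair of classes
ovo = OneVsOneClassifier(
    LogisticRegression(solver="liblinear")).fit(X, y)

T = len(set(y))
print("pairwise models:", len(ovo.estimators_))
print("expected T(T-1)/2:", T * (T - 1) // 2)
print("training score:", ovo.score(X, y))
```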

From the description above, OvR is simpler, but its classification performance is usually somewhat worse (this holds for most sample distributions; for some distributions OvR may actually be better). MvM classifies more accurately, but is not as fast as OvR.

If you select ovr, you can choose any of the four loss-optimization solvers: liblinear, newton-cg, lbfgs, and sag. But if you select multinomial, liblinear cannot be used. The following compares the two strategies with the sag solver on the iris data:

from sklearn.linear_model import LogisticRegression as LR
from sklearn.datasets import load_iris

iris = load_iris()

for multi_class in ('multinomial', 'ovr'):
    clf = LR(solver='sag', max_iter=100, random_state=42,
             multi_class=multi_class).fit(iris.data, iris.target)
    # Print the training score under each multi_class mode
    print("training score : %.3f (%s)" % (clf.score(iris.data, iris.target),
                                          multi_class))

Execution result: multinomial gives a training score of 0.980, ovr gives 0.960.

Error Quick Reference

Symptom: "max_iter did not converge" warning
Root cause: Too few iterations; the model has not converged
Fix: Increase max_iter, balancing training speed against overfitting

Symptom: Overfitting (high training accuracy, low test accuracy)
Root cause: Too many iterations; the model learns too much detail
Fix: Keep max_iter in a reasonable range; avoid excessive iteration

Symptom: Unsatisfactory classification performance
Root cause: Inappropriate multi_class setting
Fix: For multiclass problems, use multinomial or adjust the OvR settings