Big Data 210 - Implementing Logistic Regression in Scikit-Learn, with L1 and L2 Regularization in Detail
TL;DR
- Scenario: implement logistic regression with Scikit-Learn and tune the regularization (L1 and L2) to optimize the model.
- Conclusion: L1 regularization drives some coefficients to exactly zero (a sparse model) while L2 only shrinks them; on this dataset the two perform similarly across C values.
- Outcome: know how to choose a regularization method and how to tune model performance through the C value.
Version Matrix
| Version | Verified | Description |
|---|---|---|
| 0.24.x | ✅ | Supports the L1 and L2 penalties (plus saga and elastic-net) |
| 0.22.x and earlier | ✅ | Core LogisticRegression API is the same in most older releases |
| 1.x (current) | ✅ | Same regularization options; some legacy parameters (e.g. multi_class) are deprecated |
Implementing Logistic Regression in Scikit-Learn
Parameters in Detail
class sklearn.linear_model.LogisticRegression(
    penalty='l2',
    dual=False,
    tol=0.0001,
    C=1.0,
    fit_intercept=True,
    intercept_scaling=1,
    class_weight=None,
    random_state=None,
    solver='lbfgs',
    max_iter=100,
    multi_class='auto',
    verbose=0,
    warm_start=False,
    n_jobs=None,
    l1_ratio=None
)
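Before going through the parameters one by one, here is a minimal end-to-end sketch of fitting and predicting with the defaults (the synthetic data and all values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)
clf.fit(X, y)

print(clf.predict(X[:3]))        # class labels for the first 3 samples
print(clf.predict_proba(X[:3]))  # per-class probabilities, shape (3, 2)
```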
penalty
The regularization parameter. LogisticRegression applies regularization by default. penalty accepts 'l1' or 'l2', corresponding to L1 and L2 regularization; the default is 'l2'.
When tuning, if the goal is simply to control overfitting, L2 regularization is usually enough. If the model still overfits with L2 (predictive performance stays poor), consider L1. L1 is also useful when the model has very many features and you want the coefficients of unimportant features driven to zero, yielding a sparse model.
The choice of penalty constrains the choice of solver, the optimization algorithm for the loss function. With L2 regularization, all four algorithms discussed here (newton-cg, lbfgs, liblinear, sag) are available; with L1, liblinear is the only option among them (newer scikit-learn versions also provide saga, which supports L1 as well).
The reason is that the L1-regularized loss function is not continuously differentiable, while newton-cg, lbfgs, and sag all require first- or second-order continuous derivatives; liblinear has no such requirement.
For either penalty, the regularization strength C can be tuned with a learning curve.
Build two logistic regressions, and the difference between L1 and L2 regularization becomes clear at a glance:
from sklearn.linear_model import LogisticRegression as LR
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
import numpy as np
# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Standardize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Logistic Regression with L1 regularization
lrl1 = LR(penalty="l1", solver="liblinear", C=0.5, max_iter=1000)
# Logistic Regression with L2 regularization
lrl2 = LR(penalty="l2", solver="liblinear", C=0.5, max_iter=1000)
# Train L1 model
lrl1 = lrl1.fit(X_scaled, y)
# Print L1 coefficients
print(lrl1.coef_)
# Count non-zero coefficients
print((lrl1.coef_ != 0).sum(axis=1))
# Train L2 model
lrl2 = lrl2.fit(X_scaled, y)
# Print L2 coefficients
print(lrl2.coef_)
With L1 regularization, many coefficients are set exactly to 0; those features simply drop out of the final model. L2 regularization keeps a (shrunken) coefficient for every feature.
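To see exactly which features survive under L1, the nonzero coefficients can be mapped back to feature names. A self-contained sketch that refits the same L1 model as above (C=0.5 is the illustrative value used earlier):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)

# Refit the L1 model and keep only features with nonzero coefficients
lrl1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5, max_iter=1000)
lrl1.fit(X_scaled, data.target)
kept = np.array(data.feature_names)[lrl1.coef_.ravel() != 0]
print(len(kept), "of", len(data.feature_names), "features kept:")
print(kept)
```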
So which penalty performs better, or are they about the same? Let's compare with a learning curve over C:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

l1 = []
l2 = []
l1test = []
l2test = []
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)
# Sweep C from 0.05 to 1 and record train/test accuracy for both penalties
for i in np.linspace(0.05, 1, 19):
    lrl1 = LR(penalty="l1", solver="liblinear", C=i, max_iter=1000)
    lrl2 = LR(penalty="l2", solver="liblinear", C=i, max_iter=1000)
    lrl1 = lrl1.fit(Xtrain, Ytrain)
    l1.append(accuracy_score(Ytrain, lrl1.predict(Xtrain)))
    l1test.append(accuracy_score(Ytest, lrl1.predict(Xtest)))
    lrl2 = lrl2.fit(Xtrain, Ytrain)
    l2.append(accuracy_score(Ytrain, lrl2.predict(Xtrain)))
    l2test.append(accuracy_score(Ytest, lrl2.predict(Xtest)))
# Plot the four learning curves
graph = [l1, l2, l1test, l2test]
color = ["green", "black", "lightgreen", "gray"]
label = ["L1", "L2", "L1test", "L2test"]
plt.figure(figsize=(6, 6))
for i in range(len(graph)):
    plt.plot(np.linspace(0.05, 1, 19), graph[i], color[i], label=label[i])
plt.legend(loc=4)
plt.show()
As the plot shows, on the breast cancer dataset the two penalties differ little. As C increases, the regularization weakens, and performance on both the training and test sets trends upward, until around C=0.8 training accuracy keeps rising while accuracy on unseen data starts to fall: that is overfitting. A value around C=0.8 is therefore a reasonable choice.
In practice, L2 regularization is the usual default; if the results are unsatisfactory, try L1.
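Instead of a hand-rolled learning curve, scikit-learn's LogisticRegressionCV can also search for C by cross-validation. A minimal sketch (the Cs grid size, fold count, and penalty choice are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)

# Try 10 C values on a log-scaled grid, scored by 5-fold cross-validation
clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l2",
                           solver="liblinear", max_iter=1000)
clf.fit(X_scaled, data.target)
print("best C:", clf.C_)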
solver
The solver parameter selects the optimization algorithm for the logistic regression loss function. Four algorithms are discussed here:
- liblinear: wraps the open-source LIBLINEAR library; internally uses coordinate descent to iteratively optimize the loss function.
- lbfgs: a quasi-Newton method; uses the second-derivative (Hessian) matrix of the loss function to iteratively optimize it.
- newton-cg: also in the Newton family; likewise uses the Hessian of the loss function.
- sag: Stochastic Average Gradient descent, a variant of gradient descent. Unlike ordinary gradient descent, each iteration uses only a sample of the data (with averaged past gradients) to compute the update, which makes it suitable for large datasets.
As noted above, newton-cg, lbfgs, and sag require first- or second-order continuous derivatives of the loss, so they cannot handle the non-smooth L1 penalty and work only with L2. liblinear supports both L1 and L2 (as does saga in newer versions). Meanwhile, because sag uses only a sample of the data per iteration, avoid it when the dataset is small; for very large datasets, say over 100k samples, sag is the first choice.
But sag cannot use L1 regularization, so with a large dataset that needs L1 you must make a choice: subsample to shrink the data, fall back to L2 regularization, or use saga where available.
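These constraints are enforced at fit time: asking an incompatible solver for L1 raises a ValueError. A quick illustration on toy data (names and shapes are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.randint(0, 2, 40)

# lbfgs supports only the L2 (or no) penalty, so this fit fails
err = None
try:
    LogisticRegression(penalty="l1", solver="lbfgs").fit(X, y)
except ValueError as e:
    err = e
print("ValueError:", err)
```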
Given all these restrictions on newton-cg, lbfgs, and sag, why not simply use liblinear whenever the dataset is not large? Because liblinear has a weakness of its own. Logistic regression can be binary or multiclass, and for the multiclass case the common strategies are one-vs-rest (OvR) and many-vs-many (MvM); MvM is usually more accurate than OvR. liblinear supports only OvR, not MvM. This means that when you need a relatively accurate multiclass logistic regression you cannot choose liblinear, and, among these four solvers, that also rules out L1 regularization for the multiclass case (newer versions' saga solver lifts this restriction).
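To see the multiclass difference concretely, compare liblinear (OvR only) with lbfgs (which fits a true multinomial model for multiclass targets in current releases) on the iris dataset. A sketch; the exact scores depend on the version:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# liblinear: decomposes the 3-class problem into one-vs-rest fits
ovr = LogisticRegression(solver="liblinear", max_iter=1000).fit(X, y)
# lbfgs: optimizes a single multinomial loss over all 3 classes
mn = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

print("liblinear (OvR) accuracy:", ovr.score(X, y))
print("lbfgs (multinomial) accuracy:", mn.score(X, y))
```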
Error Quick Reference
| Symptom | Root Cause | Fix |
|---|---|---|
| Cannot use L1 regularization | The chosen solver doesn't support L1 | Set solver to liblinear (or saga in newer versions); the other solvers are L2-only |
| Model overfits | C too large, i.e. regularization too weak | Decrease C gradually and check with a learning curve |
| sag trains slowly or rejects L1 | sag needs large samples and supports only L2 | For L1, switch the solver to liblinear (or saga); otherwise keep L2 |