Gleam Lab · Tag Archive

Tag: sklearn

13 articles collected by topic for tutorials, cases, engineering practice, and research notes.

sklearn KMeans Key Attributes & Evaluation: cluster_centers_, inertia_, metrics

Scenario: using sklearn for KMeans clustering, you want to interpret the centroids and loss and use metrics to select K.

11/9/2024

Big Data 216 - KMeans n_clusters Selection

KMeans n_clusters selection method: compute silhouette_score and silhouette_samples for each candidate number of clusters (e.g.

11/9/2024

Big Data 213 - Python Hand-Written K-Means Clustering

Scenario: implement K-Means by hand with NumPy/Pandas, cluster Iris.txt into 3 clusters, and output the centroids and cluster assignments.

11/8/2024

Big Data 214 - K-Means Clustering Practice: Self-Implemented Algorithm vs sklearn

An engineering workflow for K-Means clustering that is 'verifiable, reproducible, and debuggable': first verify the algorithm on the 2D testSet dataset.

11/8/2024

Big Data 211 - Scikit-Learn Logistic Regression Implementation

When using LogisticRegression in scikit-learn, max_iter caps the number of solver iterations, which affects whether the model converges and therefore its accuracy.

11/7/2024

Big Data 212 - K-Means Clustering Guide

The K-Means clustering algorithm, with a comparison of supervised and unsupervised learning (whether labels Y are needed).

11/7/2024

Big Data 210 - How to Implement Logistic Regression in Scikit-Learn, with Regularization in Detail (L1 and L2)

As C gradually increases, regularization weakens, and model performance on both the training and test sets trends upward until around C=0.8.

11/6/2024

Big Data 207 - How to Handle Multicollinearity

How to handle multicollinearity in the least squares method when using scikit-learn for linear regression.

11/5/2024

Big Data 203 - sklearn Decision Tree Pruning Parameters

Common parameters for decision tree pruning (pre-pruning) in engineering: max_depth, min_samples_leaf, min_samples_split, max_features, min_impurity_decrease.

11/2/2024

Big Data 204 - Confusion Matrix to ROC: Imbalanced Binary Classification Metrics in sklearn

Confusion matrix (TP, FP, FN, TN) with unified metrics: Accuracy, Precision, Recall (Sensitivity), F1 Measure, ROC curve, AUC value, and practical business interpretation...

11/2/2024

Big Data 201 - Decision Tree from Split to Pruning

A decision tree is a tree-structured supervised learning model commonly used for classification and regression tasks.

11/1/2024

Big Data 202 - sklearn Decision Tree Practice: criterion, Graphviz Visualization & Pruning

Complete workflow for DecisionTreeClassifier on the load_wine dataset, from data splitting and model evaluation to decision tree visualization (2026 version).

11/1/2024

Big Data 196 - scikit-learn KNN Practice: KNeighborsClassifier, kneighbors & Learning Curves

Since being initiated in 2007 by David Cournapeau, scikit-learn (sklearn) has become one of the most important machine learning libraries in the Python ecosystem.

10/29/2024