Tag: sklearn

13 articles

sklearn KMeans Key Attributes & Evaluation: cluster_cente...

This scikit-learn (sklearn) KMeans guide (2026) explains the three most commonly used attributes: cluster_centers_ (cluster centers), inertia_ (within-cluster sum of squares),...

11/9/2024

KMeans n_clusters Selection: Silhouette Score Practice + ...

How to select n_clusters for KMeans: compute silhouette_score and silhouette_samples for candidate cluster counts (e.g., 2/4/6/8), then determine the optimal k by...

11/9/2024

Python Hand-Written K-Means Clustering on Iris Dataset: F...

Python K-Means clustering implementation: using NumPy broadcasting to compute squared Euclidean distance (distEclud), initializing centroids via uniform...

11/8/2024

K-Means Clustering Practice: Self-Implemented Algorithm V...

An engineering workflow for K-Means clustering that is 'verifiable, reproducible, and debuggable': first use the 2D testSet dataset for algorithm verification...

11/8/2024

Scikit-Learn Logistic Regression Implementation: max_iter...

In scikit-learn's LogisticRegression, max_iter caps the number of solver iterations, which affects whether the model converges and, in turn, its accuracy. If training doesn't...

11/7/2024

K-Means Clustering Guide: From Unsupervised Concepts to I...

An introduction to the K-Means clustering algorithm that compares supervised and unsupervised learning (whether labels Y are needed), with engineering applications in customer...

11/7/2024

How to Implement Logistic Regression in Scikit-Learn and ...

As C gradually increases, regularization strength decreases and model performance on both the training and test sets trends upward until, around C=0.8, training...

11/6/2024

How to Handle Multicollinearity: Common Problems & Soluti...

How to handle multicollinearity in the least squares method when using scikit-learn for linear regression. Multicollinearity may cause instability in regression...

11/5/2024

sklearn Decision Tree Pruning Parameters: max_depth/min_s...

Common parameters for decision tree pruning (pre-pruning) in engineering: max_depth, min_samples_leaf, min_samples_split, max_features, min_impurity_decrease...

11/2/2024

Confusion Matrix to ROC: Complete Review of Imbalanced Bi...

Confusion matrix (TP, FP, FN, TN) with unified metrics: Accuracy, Precision, Recall (Sensitivity), F1 Measure, ROC curve, AUC value, and practical business interpretation for classification models.

11/2/2024

Decision Tree from Split to Pruning: Information Gain/Gai...

The complete chain from 'split' to 'pruning': explains why tree construction usually uses a greedy algorithm that settles for a 'local optimum', and the differences in splitting criteria between...

11/1/2024

sklearn Decision Tree Practice: criterion, Graphviz Visua...

The complete flow of DecisionTreeClassifier on the load_wine dataset, from data splitting and model evaluation to decision tree visualization (2026 version). Focus on...

11/1/2024

scikit-learn KNN Practice: KNeighborsClassifier, kneighbo...

From the unified API (fit/predict/transform/score) to using kneighbors to find the K nearest neighbors of test samples, then using a learning curve/parameter curve to select...

10/29/2024