Tag: Sklearn
13 articles
sklearn KMeans Key Attributes & Evaluation: cluster_centers_, inertia_, metrics
Scenario: you are clustering with sklearn's KMeans and want to interpret the centroids and the loss, then use metrics to select K.
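A minimal sketch of reading those attributes, assuming a tiny synthetic dataset of two well-separated blobs (the data and the choice of n_clusters=2 are illustrative, not taken from the article):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data: two square blobs far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

centers = model.cluster_centers_   # array of shape (n_clusters, n_features)
inertia = model.inertia_           # sum of squared distances to the nearest centroid
score = silhouette_score(X, model.labels_)  # in [-1, 1]; higher means better separated

print(centers)
print(inertia, score)
```

Inertia always shrinks as K grows, so on its own it cannot pick K; a label-free score such as the silhouette coefficient is one common complement.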
Big Data 216 - KMeans n_clusters Selection
KMeans n_clusters selection method: calculate silhouette_score and silhouette_samples on candidate cluster numbers (e.g.
Big Data 213 - Python Hand-Written K-Means Clustering
Scenario: implement K-Means by hand with NumPy/Pandas, cluster Iris.txt into 3 clusters, and output the centroids and the cluster assignments.
Big Data 214 - K-Means Clustering Practice: Self-Implemented Algorithm vs sklearn
K-Means clustering provides an engineering workflow that is 'verifiable, reproducible, and debuggable': first verify the algorithm on the 2D testSet dataset.
Big Data 211 - Scikit-Learn Logistic Regression Implementation
When using Logistic Regression in Scikit-Learn, max_iter sets the maximum number of solver iterations, which affects whether the model converges and, in turn, its accuracy.
Big Data 212 - K-Means Clustering Guide
K-Means clustering algorithm, comparing supervised vs unsupervised learning (whether labels Y are needed).
Big Data 210 - How to Implement Logistic Regression in Scikit-Learn, with Regularization (L1 and L2) in Detail
As C increases, regularization strength decreases, and model performance on both the training and test sets trends upward until around C=0.8.
Big Data 207 - How to Handle Multicollinearity
How to handle multicollinearity in the least squares method when using scikit-learn for linear regression.
Big Data 203 - sklearn Decision Tree Pruning Parameters
Common parameters for decision tree pruning (pre-pruning) in engineering: max_depth, min_samples_leaf, min_samples_split, max_features, min_impurity_decrease.
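A minimal sketch of passing these pre-pruning parameters to DecisionTreeClassifier, assuming the load_wine dataset and arbitrary illustrative values (none of the numbers below are recommendations from the article):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Unconstrained tree for comparison.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: every value here is an illustrative assumption.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                # stop growing below this depth
    min_samples_leaf=5,         # each leaf must keep at least 5 samples
    min_samples_split=10,       # a node needs at least 10 samples to be split
    max_features=None,          # consider all features at each split
    min_impurity_decrease=0.0,  # minimum impurity reduction required to split
    random_state=0,
).fit(X_train, y_train)

print("full  :", full_tree.get_depth(), full_tree.score(X_test, y_test))
print("pruned:", pruned_tree.get_depth(), pruned_tree.score(X_test, y_test))
```

Comparing training and test scores for the two trees is one simple way to see whether the constraints reduce overfitting on a given split.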
Big Data 204 - Confusion Matrix to ROC: Imbalanced Binary Classification Metrics in sklearn
Confusion matrix (TP, FP, FN, TN) with unified metrics: Accuracy, Precision, Recall (Sensitivity), F1 Measure, ROC curve, AUC value, and practical business interpretation...
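A minimal sketch of computing these metrics with sklearn.metrics, assuming hand-made labels and scores and a 0.5 decision threshold (all of the data below is hypothetical):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth and predicted probabilities for the positive class.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.65, 0.35])
y_pred = (y_score >= 0.5).astype(int)  # threshold of 0.5 is an assumption

# For binary labels, ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)         # uses scores, not thresholded labels

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1, auc)
```

Note that AUC is computed from the raw scores, so it does not change with the threshold, while precision and recall do.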
Big Data 201 - Decision Tree from Split to Pruning
A decision tree is a tree-structured supervised learning model, commonly used for classification and regression tasks.
Big Data 202 - sklearn Decision Tree Practice: criterion, Graphviz Visualization & Pruning
Complete flow of DecisionTreeClassifier on the load_wine dataset, from data splitting and model evaluation to decision tree visualization (2026 version).
Big Data 196 - scikit-learn KNN Practice: KNeighborsClassifier, kneighbors & Learning Curves
Since being initiated in 2007 by David Cournapeau, scikit-learn (sklearn) has become one of the most important machine learning libraries in the Python ecosystem.