Blog

Technical exploration and thoughts · 655 articles

All big-datajavaaiartificial-intelligenceprogrammer-lifemachine-learningmysqldata-engineeringbackenddistributeddata-warehouseflinkarchitecturepythonroboticssparkhivellmdistributed-systemkafkadatabasescalaembodied-aihdfsdeep-learningspringmessage-queuelangchainsystem-architecturemybatisperformance-optimizationelasticsearchmongodbhealthredisspring-bootrabbitmqmqhadoopelkflumestream-processingtransactionmessagingrpctutorialsklearncachingcachedubbojava-rabbitmqclickhousehbasekylinneo4jmicroservicessqlindextomcatprogrammermultimodalzookeeperdruidcanalmllibormiocnutritionlarge-modelrobot-armteslaindie-devnginxdataxshardingshardingspherefastdfsrocketmqtime-managementapplicationscareer-growthdockeretlguavajava-rocketmqoptimizationlearningquantizationdeploymentkudulogstashdecision-treesqoopairflowrealtime-warehousemycatstorage-engineconsistencyfat-lossgptproduct-managercoffeebusiness-analysisautomationalgorithmcareer-and-growthmiddlewarecomputer-visionautonomous-drivingfsdqwenmapreducecrudmonitoringdatabase-shardingdistributed-transactionconcurrencytransaction-pitfallsgraph-databasememcachednettyinnodbsalarycareer-developmentcold-showerrunningefficiencyluckinindustrymedicalindustriallfplfp-batterybatteryevfitnesscareer-personal-growthocrdeepseekdeepseek-ocromnicloud-nativeyarndatastreamjdbcolapknnlinear-regressionnumpyzipper-tablegriffindevopskubernetesdata-mappingdesign-patternshigh-availabilityread-write-separationsharding-jdbcsagasecurityreplica-setcqlsource-code-analysisevcacheservletaopload-balancinghandwrittenniomindfulnessmeditationexercisereinforcement-learningagentconflictevaluationmoney-managementconsumptionsavingssocial-mediadatingmemoryprice-warcottiptqqatqloraqwen2.5-vlmultivitamincalciumevolutiontechnologyindustrial-robotagriculturehardwaresimulationroslarge-language-modeldegradationslamvisual-inspectionprogramming-languagelinuxwindowraftkibanaaggregationregularizationlogistic-regressionprometheusexporteratlasstate-managementmavenacidannotation-developmentmaster-slave-replicationflexible-transactionxacap2pc3pcbsonexplainb+treeslow-queryauthenticationclusterossaliyunsource-codeasyncnetflixjmspaxosrmiengineeringphysiologyhot-showerpractical-guidemuscle-buildingtransformertensorflowreportstechnical-sharingproductentrepreneurshipmethodologyteam-collaborationconflict-resolutioncollaborationgtdtoolsusage-timehealth-managementchina-usculturemarriagepartnercoffee-beverage-trendhomemade-coffeetasteperformancefine-tuningblip-2minigpt-4llavaalibabavitaminsfish-oilvitamin-cironfolatechronic-diseasesupplementstraditional-chinese-medicinewestern-medicineintegrated-medicinedevelopment-historytech-evolutionlakehousedata-meshserverlesstalenttech-selectionhistoryunimatehydraulic-driveai-collaborationcategoriesservice-robothumanoid-robotlogisticscareerskillstrendsservicescaracobotmotorreducersensorplcmpccontroltrajectory-planningvisioncore-technologyperceptiondecision-makinghomedatamarketchallengescommercializationfuture-trendsmeta-learninglifestylenmc-batterybody-fat-percentagebody-shapingmuscle-gainstrength-trainingbody-fatmetabolismsympathetic-nerveparasympathetic-nerveautonomic-nervous-systemhrvtesting-platformapi-integrationautomotive3dmodel-yopen-sourceimitation-learningvisual-algorithmsresearchjava-21kotlingolangrustjavascriptnodek8sgeminicepsourcesinkdatasetmergetreeik-analyzerdslterm-queryfilterinverted-indexnrtgrokfilebeattezdata-miningcross-validationnormalizationevaluation-metricsridge-regressionlassogradient-descentgrafanavisualizationodsscddimension-tabledwddwsadsrealtimememory-managementparallelismharborcontaineresp32home-assistantjenkinsgitlabcicdessaywebsiteastrofrontendxml-mappingdynamic-sqlsqlsessionhigh-concurrencymhafailoverdistributed-primary-keyscalingbinding-tablessql-optimizationbinding-tabletccseatadata-maskingdistributed-databasesharding-proxysharding-strategye-r-shardingconfiguration-filetransaction-isolation-levelschema.xmlpropagationdeclarative-transactionprogrammatic-transactiontransactionalplugindatabase-operationsnosqljsonpipelinepaginationwriteconcernpagehelpergeneric-mapperb-treeuse-casesselection-guidetemplaterepositorywiredtigerinmemorycontainerizationdata-modelingembeddedreferenceoplogelectionpermissionssharded-clustergraph-theoryeuler-pathproxy-patternembedded-databasebackupaccess-controldynamic-proxycloud-storagelruconcurrenthashmapoomdistributed-cachespymemcachedactivemqblockingqueuemessage-storagequeue-indexerlanghandwritten-frameworkjdkreverse-proxyprocessconfigurationclass-loadingssljvmioheartbeat-detectionspiroutingstorage-structureundoredothread-modeltablespacebinlogreplicationclustered-indexlockmvccsortingpipofflinepandasvoice

How to Handle Multicollinearity: Common Problems & Soluti...

When using scikit-learn for linear regression, how to handle multicollinearity in least squares method. Multicollinearity may cause instability in regression...

Ridge Regression and Lasso Regression: Differences, Appli...

Ridge Regression and Lasso Regression are two commonly used linear regression regularization methods for solving overfitting and multicollinearity in machine...

Linear Regression Machine Learning Perspective: Matrix Re...

Linear Regression core chain: unify prediction function y=Xw in matrix form, treat parameter vector w as only unknown; use loss function to characterize...

NumPy Matrix Multiplication Hand-written Multivariate Lin...

pandas DataFrame and NumPy matrix multiplication hand-written multivariate linear regression (linear regression implementation). Core idea is to form normal...

sklearn Decision Tree Pruning Parameters: max_depth/min_s...

Common parameters for decision tree pruning (pre-pruning) in engineering: max_depth, min_samples_leaf, min_samples_split, max_features, min_impurity_decrease...

Confusion Matrix to ROC: Complete Review of Imbalanced Bi...

Confusion matrix (TP, FP, FN, TN) with unified metrics: Accuracy, Precision, Recall (Sensitivity), F1 Measure, ROC curve, AUC value, and practical business interpretation for classification models.

Spark Standalone Mode: Architecture & Performance Tuning

Comprehensive explanation of Spark Standalone cluster four core components, application submission flow, SparkContext internal architecture, Shuffle evolution history and RDD optimization strategies.

SparkSQL Introduction: SQL & Distributed Computing Fusion

Systematic introduction to SparkSQL evolution history, core abstractions DataFrame/Dataset, Catalyst optimizer principle, and practical usage of multi-data source integration with Hive/HDFS.

Decision Tree from Split to Pruning: Information Gain/Gai...

Complete chain from 'split' to 'pruning', explain why usually uses greedy algorithm to form 'local optimum', and differences in splitting criteria between...

sklearn Decision Tree Practice: criterion, Graphviz Visua...

Complete flow of DecisionTreeClassifier on load_wine dataset from data splitting, model evaluation to decision tree visualization (2026 version). Focus on...

Decision Tree Model Detailed: Node Structure, Conditional...

Decision Tree model systematic overview for classification tasks: three types of nodes (root/internal/leaf), recursive split flow from root to leaf, and...

Decision Tree Information Gain Detailed: Information Entr...

Decision tree information gain (Information Gain) explained, first using information entropy (Entropy) to explain impurity, then explaining why when splitting...

K-Fold Cross-Validation Practice: sklearn Look at Mean/Va...

Random train/test split causes evaluation metrics to be unstable, and gives engineering solution: K-Fold Cross Validation. Through sklearn's cross_val_score to...

KNN Must Normalize First: Min-Max Correct Method, Data Le...

In scikit-learn machine learning training pipeline, distance-based models like KNN are extremely sensitive to inconsistent feature scales: Euclidean distance...

Spark RDD Fault Tolerance: Checkpoint Principle & Best Pr...

Detailed explanation of Spark Checkpoint execution flow, core differences with persist/cache, partitioner strategies, and best practices for iterative algorithms and long dependency chain scenarios.

Spark Broadcast Variables: Efficient Shared Read-Only Data

Detailed explanation of Spark broadcast variable working principle, configuration parameters and best practices, and performance optimization solution using broadcast to implement MapSideJoin inste...

KNN/K-Nearest Neighbors Algorithm Practice: Euclidean Dis...

KNN/K-Nearest Neighbors Algorithm: From Euclidean distance calculation, distance sorting, TopK voting to function encapsulation, giving reproducible Python...

scikit-learn KNN Practice: KNeighborsClassifier, kneighbo...

From unified API (fit/predict/transform/score) to kneighbors to find K nearest neighbors of test samples, then using learning curve/parameter curve to select...

Apache Tez Practice: Hive on Tez Installation & Configura...

Apache Tez (example version Tez 0.9.x) as execution engine alternative to MapReduce on Hadoop2/YARN, providing DAG (Directed Acyclic Graph) execution model for...

Data Mining: From Wine Classification to Machine Learning...

2025's most commonly used machine learning concept framework: supervised learning (classification/regression), unsupervised learning (clustering/dimensionality...