Blog

Technical exploration and thoughts · 655 articles


Spark Streaming Introduction: From DStream to Structured ...

An introduction to Spark's two generations of real-time computing frameworks: the architecture and limitations of the DStream micro-batch model, and how Structured Streaming solves EventTime process...

11/13/2024

big-data spark scala stream-processing data-engineering

Spark Streaming Data Sources: File Stream, Socket, RDD Qu...

A walkthrough of the three basic Spark Streaming data sources: file streams (directory monitoring), socket streams (TCP ingestion), and RDD queue streams for test simulation, with complete Scala code exam...

11/13/2024

big-data spark scala stream-processing kafka data-engineering

MyBatis Deep Dive - Level 1 Cache, Code Testing, and Sour...

A detailed look at how the MyBatis level 1 cache works, with code tests, invalidation scenarios, and source code analysis. The level 1 cache is enabled by default in MyBatis with SqlSession-lev...

11/13/2024

Java MyBatis Cache

MyBatis Level 2 Cache - Testing and Source Code Analysis

A detailed look at how the MyBatis level 2 cache works, with its enabling configuration, code tests, and source code analysis. The level 2 cache is scoped to the Mapper namespace, and multiple SqlSessions...

11/13/2024

Java MyBatis Cache

Grafana 11.3.0 Installation & Startup: YUM Install RPM, s...

For ops engineers and developers still running CentOS/RHEL (or compatible distributions) in 2026, this guide covers Grafana 11.3.0 (grafana-enterprise-11.3.0-1.x86_64.rpm) direct YUM...

11/12/2024

big-data grafana monitoring visualization

Data Warehouse Introduction: Four Characteristics, OLTP v...

Core concepts and implementation concerns for data warehouses as practiced in 2026: starting from enterprise data silos, explaining four...

11/12/2024

big-data data-warehouse olap etl

Prometheus 2.53.2 Installation & Configuration Practice: ...

A reusable deployment process for Prometheus 2.53.2 (still common in existing environments in 2025/2026): download and extract the binary on monitoring...

11/11/2024

big-data prometheus monitoring exporter

Prometheus Node Exporter 1.8.2 + Pushgateway 1.10.0: Down...

A common Prometheus monitoring deployment: install node_exporter-1.8.2 on Rocky Linux to expose host metrics, add it to the Prometheus scrape config, and visualize the metrics in Grafana dashboards.

11/11/2024

big-data prometheus monitoring exporter

sklearn KMeans Key Attributes & Evaluation: cluster_cente...

Explains the three most commonly used attributes of scikit-learn (sklearn) KMeans (2026): cluster_centers_ (cluster centers), inertia_ (within-cluster sum of squares),...

11/9/2024

big-data machine-learning sklearn python

KMeans n_clusters Selection: Silhouette Score Practice + ...

How to select n_clusters for KMeans: compute silhouette_score and silhouette_samples for candidate cluster counts (e.g., 2/4/6/8) and determine the optimal k by...

11/9/2024

big-data machine-learning sklearn python

SparkSQL Statements: DataFrame Operations, SQL Queries & ...

A comprehensive guide to core SparkSQL usage, including DataFrame API operations, SQL query syntax, lateral view explode, and Hive integration via enableHiveSupport for metadata and table operations.

11/9/2024

big-data spark scala sql hive data-engineering

SparkSQL Kernel: Five Join Strategies & Catalyst Optimize...

A deep dive into the selection conditions and use cases of SparkSQL's five Join execution strategies (BHJ, SHJ, SMJ, Cartesian, BNLJ), along with the complete processing flow of the Catalyst optimizer from SQL ...

11/9/2024

big-data spark scala sql distributed-system data-engineering

Python Hand-Written K-Means Clustering on Iris Dataset: F...

A K-Means clustering implementation in Python: using NumPy broadcasting to compute squared Euclidean distance (distEclud), initializing centroids via uniform...

11/8/2024

big-data machine-learning sklearn python

K-Means Clustering Practice: Self-Implemented Algorithm V...

An engineering workflow for K-Means clustering that is 'verifiable, reproducible, and debuggable': first use the 2D testSet dataset for algorithm verification...

11/8/2024

big-data machine-learning sklearn python

Scikit-Learn Logistic Regression Implementation: max_iter...

When using Logistic Regression in Scikit-Learn, max_iter sets the maximum number of solver iterations, which affects whether the model converges and how accurate it is. If training doesn't...

11/7/2024

big-data machine-learning sklearn python

K-Means Clustering Guide: From Unsupervised Concepts to I...

An introduction to the K-Means clustering algorithm, contrasting supervised and unsupervised learning (whether labels Y are needed), with engineering applications in customer...

11/7/2024

big-data machine-learning sklearn python

Deep Understanding of Logistic Regression & Gradient Desc...

Logistic Regression (LR) is an important classification algorithm in machine learning, widely used in binary classification tasks like sentiment analysis,...

11/6/2024

big-data machine-learning logistic-regression gradient-descent python

How to Implement Logistic Regression in Scikit-Learn and ...

As C increases, regularization weakens and model performance on both the training and test sets trends upward, until around C=0.8, training...

11/6/2024

big-data machine-learning logistic-regression sklearn regularization

SparkSQL Core Abstractions: RDD, DataFrame, Dataset & Spa...

An in-depth comparison of the features and use cases of Spark's three data abstractions (RDD, DataFrame, and Dataset), an introduction to SparkSession as the unified entry point, and a demonstration of conversion methods between...

11/6/2024

big-data spark scala sql data-engineering

SparkSQL Operators: Transformation & Action Operations

A systematic review of SparkSQL Transformation and Action operators, covering the select, filter, join, groupBy, and union operations, with practical test cases demonstrating usage and performance optimizat...

11/6/2024

big-data spark scala sql data-engineering