Blog

Technical exploration and thoughts · 655 articles


Spark MLlib Decision Tree Pruning: Pre-pruning, Post-prun...

This article systematically introduces decision tree pre-pruning and post-pruning principles, compares core differences between three mainstream algorithms...

5/29/2025

big-data · spark · machine-learning · mllib · scala

Spark MLlib Decision Tree: Classification Principles, Gin...

This article introduces the basic concepts and classification principles of decision trees. A decision tree is a non-linear...

5/28/2025

big-data · spark · machine-learning · mllib · scala

Spark MLlib Logistic Regression: Input Function, Sigmoid,...

This article introduces the basic principles and application scenarios of logistic regression and its implementation in Spark MLlib. Logistic regression is an efficient binary classification algorithm wi...

5/27/2025

big-data · spark · machine-learning · mllib · scala

Spark MLlib Linear Regression: Scenarios, Loss Function a...

Linear regression uses regression equations to model relationships between independent and dependent variables. This article covers regression scenarios (house...

4/11/2025

big-data · spark · machine-learning · mllib · scala

Big Data #268: Real-time Warehouse ODS Layer - Writing Ka...

Writing dimension tables (DIM) from Kafka typically involves reading real-time or batch data from Kafka topics and updating dimension tables based on the data...

1/3/2025

big-data · realtime-warehouse · flink · kafka · canal · hbase · scala

Big Data #269: Real-time Warehouse DIM, DW and ADS Layer ...

DW (Data Warehouse layer) is built from DWD, DWS, and DIM layer data, completing data architecture and integration, establishing consistent dimensions, and...

1/3/2025

big-data · realtime-warehouse · flink · kafka · canal · hbase · scala

Spark MLlib Logistic Regression: Sigmoid, Loss Function a...

Logistic regression is a classification model in machine learning: an efficient binary classification algorithm widely used in ad click-through rate...

1/3/2025

big-data · spark · machine-learning · mllib · scala

Big Data #266: Canal Integration with Kafka - Real-time D...

This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing the MySQL binlog. It demonstrates how to integrate...

1/2/2025

big-data · realtime-warehouse · flink · kafka · canal · hbase · scala

Realtime Warehouse - ODS Lambda Architecture Kappa Archit...

In internet companies, common ODS data includes business log data (Log) and business DB data. For business DB data, collecting data from relational databases...

1/2/2025

big-data · realtime-warehouse · flink · kafka · canal · hbase · scala

Spark MLlib Linear Regression: Scenarios, Loss Function a...

Linear Regression is an analytical method that uses regression equations to model the relationship between one or more independent variables and a dependent...

1/2/2025

big-data · machine-learning · spark · mllib · scala

Canal Deployment: Installation, Service Startup and Commo...

Canal is an open-source data synchronization tool from Alibaba for MySQL database incremental log parsing and synchronization. It simulates the MySQL slave...

12/31/2024

big-data · data-warehouse · canal

Canal Working Principle: Workflow and MySQL Binlog Introd...

Canal is an open-source tool for MySQL database binlog incremental subscription and consumption, primarily used for data synchronization and distributed...

12/30/2024

big-data · data-warehouse · canal

MySQL Binlog Deep Dive: Storage Directory, Change Records...

MySQL's binary log (binlog) records all change operations performed on the database (SELECT and SHOW queries are excluded). It is...

12/30/2024

big-data · data-warehouse · mysql

Canal Data Sync: Introduction, Background, Principles and...

Alibaba B2B's cross-region business between domestic sellers and overseas buyers drove the need for data synchronization between Hangzhou and US data centers.

12/29/2024

big-data · data-warehouse · canal

Spring In-Depth: AOP Aspect Enhancement Core Concepts Pro...

In-depth introduction to Spring AOP aspect enhancement covering core concepts, related terminology, and proxy configuration with practical examples.

12/29/2024

Java · Spring · Backend · Spring Boot · AOP

Realtime Warehouse - Business Database Table Structure: T...

A real-time data warehouse differs from a traditional batch-processing data warehouse by emphasizing low latency, high throughput,...

12/28/2024

big-data · realtime-warehouse · flink · kafka · canal · hbase · scala

Spring In-Depth: IoC Container Circular Dependency Protot...

In-depth analysis of Spring IoC container system covering circular dependency, prototype beans, prototype scope, and lazy ObjectFactory with solutions.

12/28/2024

Java · Spring · Backend · Spring Boot · IoC

Real-time Data Warehouse: Background, Architecture, Requi...

Real-time data processing capability has become a key competitive factor for enterprises. Initially, each new requirement spawned a separate real-time task,...

12/27/2024

big-data · data-warehouse · realtime

Spring In-Depth: IoC Container System BeanFactory Analysi...

In-depth analysis of Spring IoC container system covering BeanFactory process analysis and Bean lazy loading mechanism with practical examples.

12/27/2024

Java · Spring · Backend · Spring Boot · IoC

Apache Griffin Configuration: pom.xml, sparkProperties an...

Apache Griffin is an open-source data quality management framework designed to help organizations monitor and improve data quality in big data environments.

12/25/2024

big-data · data-warehouse · griffin