Blog

Technical exploration and thoughts · 655 articles

Sqoop Incremental Import and CDC Change Data Capture Prin...

Introduces Sqoop's --incremental append import mechanism and explains CDC (Change Data Capture) in depth: core concepts, a comparison of capture methods, and modern solutions like Flink CDC, Deb...

7/27/2024

big-data sqoop etl data-engineering

ZooKeeper Distributed Coordination Framework Introduction...

An introduction to ZooKeeper's core concepts, the Leader/Follower/Observer roles, and the principles of the ZAB protocol, with a walkthrough of installing and configuring a 3-node cluster.

7/27/2024

big-data zookeeper distributed-system

Sqoop Partial Import: --query, --columns, --where Three F...

A detailed explanation of three ways to import a subset of MySQL data into HDFS with Sqoop: a custom query (--query), selected columns (--columns), and WHERE filtering (--where), with applicable scenarios and caveats.

7/24/2024

big-data sqoop etl data-engineering

Sqoop and Hive Integration: MySQL ↔ Hive Bidirectional Da...

Demonstrates importing MySQL data directly into a Hive table with Sqoop and exporting Hive data back to MySQL, covering key parameters such as --hive-import and --create-hive-table.

7/24/2024

big-data sqoop hive etl data-engineering

Sqoop Data Migration ETL Tool Introduction and Installation

An introduction to Apache Sqoop's core principles and use cases, plus installation and configuration steps on a Hadoop cluster, to help you get started quickly with batch data migration between MySQL and HDFS/Hive.

7/20/2024

big-data sqoop etl data-engineering

Sqoop Practice: MySQL Full Data Import to HDFS

A complete example of importing MySQL table data into HDFS with Sqoop, covering the core parameters, the MapReduce parallelism mechanism, and verification of the results.

7/20/2024

big-data sqoop hadoop etl data-engineering

Flume Collect Hive Logs to HDFS

Uses a Flume exec source to tail Hive log files in real time, buffers events in a memory channel, and configures an HDFS sink that writes with time-based partitioning, so log data lands in HDFS automatically.

7/17/2024

big-data flume hdfs data-engineering

Flume Dual Sink: Write Logs to Both HDFS and Local File

Uses Flume's replication mode (Replicating Channel Selector) and a cascaded three-Agent architecture to write the same log data to both HDFS and a local file, meeting both offline analysis and re...

7/17/2024

big-data flume hdfs data-engineering

Apache Flume Architecture and Core Concepts

An introduction to Apache Flume: its positioning, core components (Source, Channel, Sink), event model, common data flow topologies, and installation and configuration.

7/13/2024

big-data flume data-engineering distributed-system

Flume Hello World: NetCat Source + Memory Channel + Logge...

Flume's simplest Hello World case: a netcat source listens on a port, a memory channel buffers events, and a logger sink prints them to the console, demonstrating the complete Source→Channel→Sink data flow.

7/13/2024

big-data flume data-engineering

Hive Metastore Three Modes and Remote Deployment

A detailed explanation of Hive Metastore's embedded, local, and remote deployment modes, with complete steps to configure a high-availability remote Metastore on a three-node cluster.

7/10/2024

big-data hive data-engineering

HiveServer2 Configuration and Beeline Remote Connection

Introduces the HiveServer2 architecture and its role, configures a Hadoop proxy user and WebHDFS, and sets up cross-node JDBC access to Hive through the Beeline client.

7/10/2024

big-data hive data-engineering

Hive DDL and DML Operations

A systematic explanation of Hive DDL (database/table creation, internal and external tables) and DML (data loading, insertion, query) operations, with complete HiveQL examples and configuration optim...

7/8/2024

big-data hive sql data-engineering

Hive HQL Advanced: Data Import/Export and Query Practice

A deep dive into Hive's data import methods (LOAD, INSERT, external tables, Sqoop), its data export methods, and practical HQL queries using aggregation, filtering, and sorting.

7/8/2024

big-data hive sql data-engineering

MapReduce JOIN Four Implementation Strategies

A deep dive into four JOIN strategies in MapReduce (Reduce-Side Join, Map-Side Join, Semi-Join, and Bloom Join), covering their principles and Java implementations, with analysis of applicable scenarios and performan...

7/4/2024

big-data hadoop mapreduce java

Hive Introduction: Architecture and Cluster Installation

An introduction to the Hive data warehouse: core concepts, architecture components, and pros and cons, with detailed steps to install and configure Hive 2.3.9 on a three-node Hadoop cluster.

7/4/2024

big-data hive hadoop data-engineering

HDFS Java Client Practice: Upload/Download Files, Directo...

File operations with the Hadoop HDFS Java Client API: Maven dependency configuration, the core FileSystem/Path/Configuration classes, and implementations of file upload, download, delete, list scan and progress ba...

7/3/2024

big-data hdfs hadoop java

Java Implementation MapReduce WordCount Complete Code

Implements Hadoop MapReduce WordCount from scratch: a detailed explanation of the Hadoop serialization mechanism, writing the Mapper, Reducer, and Driver components, Maven project configuration, local and clus...

7/3/2024

big-data hadoop mapreduce java

HDFS Distributed File System Read/Write Principle

A deep dive into HDFS architecture: the NameNode, DataNode, and Client roles, the Block storage mechanism, the file read/write process (pipeline writes and nearest-replica reads), and basic HDFS commands.

7/2/2024

big-data hdfs hadoop distributed-system

HDFS CLI Practice Complete Command Guide

A hands-on HDFS CLI guide: common hadoop fs commands for directory operations, file upload/download, and permission management, with a live demo on a three-node cluster.

7/2/2024

big-data hdfs hadoop linux