Blog
Technical exploration and thoughts · 655 articles
Offline Data Warehouse: Hive ODS Layer Table Creation and...
Sync MySQL data to a specified HDFS directory via DataX, then create ODS external tables in Hive with unified dt string partitioning. Enables fast queries of raw transaction records within 7 days, de...
Offline Data Warehouse: E-commerce Core Transaction Incre...
Using DataX (MySQLReader + HDFSWriter) to extract daily incremental data from MySQL order tables, order detail tables, and product information tables into...
MyBatis Design Patterns - Proxy Pattern and Source Code A...
Detailed introduction to proxy design pattern concepts, classifications, and their manifestation in MyBatis, including static and dynamic proxy code implementation and MapperProxy source code analy...
Neo4j + Spring Boot Practice: Integration from Driver to ...
Complete guide to integrating Spring Boot with Neo4j: Java Driver, Neo4jTemplate, and the Repository pattern, with practical examples of graph database CRUD operations and relationship queries.
MyBatis Design Patterns - Builder Pattern, Factory Patter...
Detailed introduction to design patterns used in MyBatis source code including builder pattern, factory method pattern, singleton pattern, proxy pattern, composite pattern, and their manifestation ...
Offline Data Warehouse Practice: E-commerce Core Transact...
Focuses on three core metrics (order count, product count, and payment amount), broken down by sales region and product type (3-level category).
Flink State and Checkpoint: State Management, Fault Toler...
Explains stateful computation in Flink: Keyed State, Operator State, Checkpoint configuration, Savepoint backup and recovery, and production environment practices.
Offline Data Warehouse Advertising Business Hive ADS Prac...
Complete solution for exporting Hive ADS layer data to MySQL using DataX. Covers ADS loading, DataX configuration, MySQL table creation, Shell script parameterized execution, and common error diagn...
Neo4j Access Modes: Embedded vs Server with Java Examples
Compares Neo4j's embedded and server modes, with Java API access examples. Covers both the underlying principles and practical applications.
Offline Data Warehouse Advertising Business: Flume Import...
Uses a Flume Agent to collect event logs and write them to HDFS, then uses Hive scripts to load the ODS and DWD layers by date. Content covers Flume Agent's Source, Channel, Sink basic structu...
Neo4j Backup/Recovery + Warm-up and Execution Plan Practice
Covers Neo4j database backup and recovery, data warm-up, and execution plan analysis, from the underlying principles to practical applications.
Offline Data Warehouse Advertising Business Hive Analysis...
Implementation of advertising impression, click, purchase hourly statistics based on Hive offline data warehouse, completing CTR, CVR and advertising effect...
Flink Streaming Introduction: DataStream API & Program St...
Flink DataStream API getting started guide, program execution flow, environment acquisition, data source definition, operator chaining and execution mode details, demonstrating stream processing pr...
Flink Window and Watermark: Time Windows, Tumbling/Slidin...
Comprehensive analysis of the Flink Window mechanism: tumbling, sliding, and session windows, Watermark principles and generation strategies, and late data handling.
Offline Data Warehouse Hive Advertising Business Practice...
Hive offline data warehouse advertising business practice, combined with typical pipeline of Flume + Hive + UDF + Parquet, demonstrates how to map raw event...
Offline Data Warehouse Member Metrics Verification, DataX...
Offline data warehouse practice based on Hadoop + Hive + HDFS + DataX + MySQL, covering member metrics testing (active/new/retention), HDFS export, DataX sync to MySQL, and advertising business ODS...
Neo4j Transaction, Index and Constraint Practice: Syntax,...
Covers Neo4j transaction handling, index creation, constraint settings, and troubleshooting of concurrency issues, from the underlying principles to practical applications.
Offline Data Warehouse Practice: Flume+HDFS+Hive Building...
Demonstrates a complete pipeline from log collection to member metric analysis, covering Flume Taildir monitoring, HDFS partition storage, Hive external table loading, ODS/DWD/DWS/ADS layered proce...
Flink Installation & Deployment: Local, Standalone, YARN ...
Complete tutorial for Apache Flink installation and deployment in three modes: Local, Standalone cluster, and YARN integration, including environment configuration, parameter tuning, and common iss...
Flink on YARN Deployment: Environment Preparation, Resour...
Detailed explanation of the three Flink deployment modes on a YARN cluster (Session, Application, and Per-Job), along with Hadoop dependency configuration, YARN resource requests, and the job submission process.