Blog

Technical exploration and thoughts · 655 articles

Sqoop Incremental Import and CDC (Change Data Capture) Principles

Introduces Sqoop's incremental import mechanism (--incremental append) and explains the core concepts of CDC (Change Data Capture), capture method comparisons...

ZooKeeper Distributed Coordination Framework Introduction and ZAB Protocol

Introduction to ZooKeeper core concepts, Leader/Follower/Observer role division, ZAB protocol principles, and demonstration of 3-node cluster installation and configurati...

Big Data 23 - Sqoop Partial Import: --query, --columns and --where

Explains three ways Sqoop can import a subset of MySQL data into HDFS: a custom query (--query), selected columns (--columns), and WHERE-clause filtering (--where), with applicable s...

Sqoop and Hive Integration: MySQL ↔ Hive Bidirectional Data Transfer

Demonstrates importing MySQL data directly into a Hive table with Sqoop and exporting Hive data back to MySQL, covering key parameters like --hive-import, --create-hive-table usa...

Sqoop Data Migration ETL Tool Introduction and Installation

Introduces Apache Sqoop's core principles, use cases, and installation and configuration steps on a Hadoop cluster, to help you get started quickly with batch data migration bet...

Sqoop Practice: MySQL Full Data Import to HDFS

A complete example of importing MySQL table data into HDFS with Sqoop, covering the core parameters, the MapReduce parallelism mechanism, and execution result verifica...

Collecting Hive Logs to HDFS with Flume

Uses a Flume exec source to tail Hive log files in real time, buffers events in a memory channel, and configures an HDFS sink that writes with time-based partitioning, to implement automatic log dat...

Flume Dual Sink: Write Logs to Both HDFS and Local File

This is article 20 in the Big Data series. It demonstrates Flume's replication mode with a dual-sink architecture, in which the same data is written to both HDFS and the local filesystem.

Apache Flume Architecture and Core Concepts

Introduces Apache Flume's positioning, core components (Source, Channel, Sink), event model, common data flow topologies, and installation and configuration.

Flume Hello World: NetCat Source + Memory Channel + Logger Sink

Flume's simplest Hello World case: a netcat source listens on a port, a memory channel buffers events, and a logger sink prints to the console, demonstrating the complete Source→...

Hive Metastore: Three Modes and Remote Deployment

Explains Hive Metastore's embedded, local, and remote deployment modes, with complete steps to configure a high-availability remote Metastore on a three-node cl...

HiveServer2 Configuration and Beeline Remote Connection

Introduces HiveServer2's architecture and role, configures a Hadoop proxy user and WebHDFS, and sets up cross-node remote JDBC access to Hive through the Beeline client.

Hive DDL and DML Operations

Systematic explanation of Hive DDL (database/table creation, internal and external tables) and DML (data loading, insertion, query) operations.

Hive HQL Advanced: Data Import/Export and Query Practice

Deep dive into Hive's multiple data import methods (LOAD/INSERT/External Table/Sqoop), data export methods, and practical usage of HQL query operations like aggregation...

Four Strategies for Implementing JOIN in MapReduce

This is article 11 in the Big Data series. It introduces four classic strategies for implementing multi-table JOINs in the MapReduce framework, with Java implementations.

Hive Introduction: Architecture and Cluster Installation

Introduces the core concepts, architecture components, and pros and cons of the Hive data warehouse, with detailed steps to install and configure Hive 2.3.9 on a three-node Hadoop clu...

HDFS Java Client Practice: Upload/Download Files, Directory Operations and API Usage

This is article 9 in the Big Data series. Learn to operate HDFS from Java code and master Hadoop's Java Client API.

MapReduce WordCount in Java: Complete Code

Implement Hadoop MapReduce WordCount from scratch: a detailed explanation of Hadoop's serialization mechanism, and writing the three components Mapper, Reducer, and Driver...

HDFS Distributed File System Read/Write Principle

Deep dive into HDFS architecture: the NameNode, DataNode, and Client roles, the block storage mechanism, the file read/write process (pipeline write and nearest-replica read), and HDFS basic com...

HDFS CLI Practice: Complete Command Guide

A complete HDFS CLI walkthrough: common hadoop fs commands for directory operations, file upload and download, and permission management, with a live demo on a three-node cluster.