Blog

Technical exploration and thoughts · 655 articles

Spark MLlib Decision Tree Pruning: Pre-pruning, Post-prun...

This article systematically introduces decision tree pre-pruning and post-pruning principles, compares core differences between three mainstream algorithms...

Spark MLlib Decision Tree: Classification Principles, Gin...

This article introduces the basic concepts and classification principles of decision trees. A decision tree is a non-linear...

Spark MLlib Logistic Regression: Input Function, Sigmoid,...

This article introduces the basic principles and application scenarios of logistic regression and its implementation in Spark MLlib. Logistic regression is an efficient binary classification algorithm wi...

Spark MLlib Linear Regression: Scenarios, Loss Function a...

Linear regression uses regression equations to model relationships between independent and dependent variables. This article covers regression scenarios (house...

Big Data #268: Real-time Warehouse ODS Layer - Writing Ka...

Writing dimension tables (DIM) from Kafka typically involves reading real-time or batch data from Kafka topics and updating dimension tables based on the data...

Big Data #269: Real-time Warehouse DIM, DW and ADS Layer ...

DW (Data Warehouse layer) is built from DWD, DWS, and DIM layer data, completing data architecture and integration, establishing consistent dimensions, and...

Spark MLlib Logistic Regression: Sigmoid, Loss Function a...

Logistic regression is a classification model in machine learning — an efficient binary classification algorithm widely used in ad click-through rate...

Big Data #266: Canal Integration with Kafka - Real-time D...

This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing the MySQL binlog. It demonstrates how to integrate...

Realtime Warehouse - ODS Lambda Architecture Kappa Archit...

In internet companies, common ODS data includes business log data (Log) and business DB data. For business DB data, collecting data from relational databases...

Spark MLlib Linear Regression: Scenarios, Loss Function a...

Linear Regression is an analytical method that uses regression equations to model the relationship between one or more independent variables and a dependent...

Canal Deployment: Installation, Service Startup and Commo...

Canal is an open-source data synchronization tool from Alibaba for MySQL database incremental log parsing and synchronization. It simulates the MySQL slave...

Canal Working Principle: Workflow and MySQL Binlog Introd...

Canal is an open-source tool for MySQL database binlog incremental subscription and consumption, primarily used for data synchronization and distributed...

MySQL Binlog Deep Dive: Storage Directory, Change Records...

MySQL's binary log (binlog) records all change operations performed on the database (excluding SELECT and SHOW queries). It is...

Canal Data Sync: Introduction, Background, Principles and...

Alibaba B2B's cross-region business between domestic sellers and overseas buyers drove the need for data synchronization between Hangzhou and US data centers.

Spring In-Depth: AOP Aspect Enhancement Core Concepts Pro...

In-depth introduction to Spring AOP aspect enhancement covering core concepts, related terminology, and proxy configuration with practical examples.

Realtime Warehouse - Business Database Table Structure: T...

A real-time data warehouse differs from a traditional batch-processing data warehouse by emphasizing low latency, high throughput,...

Spring In-Depth: IoC Container Circular Dependency Protot...

In-depth analysis of Spring IoC container system covering circular dependency, prototype beans, prototype scope, and lazy ObjectFactory with solutions.

Real-time Data Warehouse: Background, Architecture, Requi...

Real-time data processing capability has become a key competitive factor for enterprises. Initially, each new requirement spawned a separate real-time task,...

Spring In-Depth: IoC Container System BeanFactory Analysi...

In-depth analysis of Spring IoC container system covering BeanFactory process analysis and Bean lazy loading mechanism with practical examples.

Apache Griffin Configuration: pom.xml, sparkProperties an...

Apache Griffin is an open-source data quality management framework designed to help organizations monitor and improve data quality in big data environments.