I. Financial Industry

Industry Characteristics and Data Requirements

  • Petabyte-scale (PB) data generated daily
  • Transaction data: hundreds of millions of transactions per day
  • Customer data: hundreds of millions of users
  • Market data: millisecond-level tick data

Core Business Scenarios

Scenario | Technology | Effect
Credit scoring | Logistic Regression/XGBoost | Integrates PBOC credit data
Real-time fraud detection | Kafka+Flink+Redis | Latency <100ms
Intelligent investment advisory | KDB+ time-series database | Nanosecond-level market analysis
Claims automation | Image recognition | 92% accuracy
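
As one illustration of the real-time fraud detection row, the sketch below applies a sliding-window "velocity" rule: a card is flagged when it makes too many transactions within a short window. This is only a single-process stand-in for the idea. The class name `VelocityRule`, the 60-second window, and the limit of 3 transactions are assumptions for illustration, and an in-memory dict plays the role that keyed state in Flink or Redis would play in the stack named above.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag a card that makes too many transactions inside a sliding time window."""

    def __init__(self, window_seconds=60.0, max_txns=3):
        # Both thresholds are illustrative assumptions, not values from the text.
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        # Per-card timestamps; a dict stands in for keyed state in Flink or Redis.
        self._recent = defaultdict(deque)

    def is_suspicious(self, card_id, timestamp):
        """Record one transaction and return True if the card exceeds the limit."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_txns


rule = VelocityRule(window_seconds=60.0, max_txns=3)
events = [
    ("card-A", 0.0), ("card-A", 10.0), ("card-A", 20.0),
    ("card-A", 30.0),   # fourth transaction within 60 s
    ("card-B", 31.0),
    ("card-A", 200.0),  # earlier card-A activity has expired by now
]
flags = [rule.is_suspicious(card, ts) for card, ts in events]
print(flags)  # [False, False, False, True, False, False]
```

Each check touches only the timestamps of one card, so the cost per event stays small, which is the property a sub-100ms latency target depends on.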

Technical Architecture

  • Data source → Kafka → Flink → Spark → Redis/Elasticsearch → Hadoop → BI
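
The flow above can be pictured as a chain of stages, each consuming the output of the previous one. The following is a toy, single-process sketch of that shape, not of the real systems: plain Python functions and containers stand in for Kafka, Flink, Spark, Redis/Elasticsearch, and Hadoop, and the event fields (`user`, `amount`) and function names are hypothetical.

```python
from collections import defaultdict


def stream_stage(events):
    """Per-event processing (the Flink role): drop malformed records, normalise types."""
    for event in events:
        if event.get("amount") is not None:
            yield {"user": event["user"], "amount": float(event["amount"])}


def batch_stage(records):
    """Aggregation over accumulated records (the Spark role): total amount per user."""
    totals = defaultdict(float)
    for record in records:
        totals[record["user"]] += record["amount"]
    return dict(totals)


# A list stands in for a Kafka topic fed by the data sources.
raw_events = [
    {"user": "u1", "amount": "10.5"},
    {"user": "u2", "amount": None},   # malformed, dropped by the stream stage
    {"user": "u1", "amount": 4.5},
    {"user": "u2", "amount": "3"},
]

cleaned = list(stream_stage(raw_events))
serving_store = batch_stage(cleaned)  # stands in for Redis/Elasticsearch
archive = list(cleaned)               # stands in for long-term storage on Hadoop

print(serving_store)  # {'u1': 15.0, 'u2': 3.0}
```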

Talent Demand

Position | Skills | Salary
Data Engineer | Flink/Spark, SQL | 400K-800K CNY
Quantitative Analyst | Python, Financial Engineering | 600K-1.2M CNY
Risk Model Expert | Machine Learning | 500K-1M CNY

II. E-commerce Industry

Data Sources

  • Page browsing trails, search keywords, purchase records, review feedback
  • Data scale: TB to PB

Core Applications

Application | Technology | Effect
Recommendation system | Collaborative filtering | Millisecond-level filtering across hundreds of millions of products
Inventory management | Demand forecasting | Stockout rate <5%
Precision marketing | Real-time bidding | 500K+ bids/second
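
To make the collaborative filtering entry concrete, here is a minimal item-based sketch: items are compared by the cosine similarity of their user-rating vectors, and a user is recommended the unseen items most similar to what they already rated. The data, the function names, and the brute-force scoring loop are all assumptions for illustration. A catalogue of hundreds of millions of products would need precomputed similarities or approximate nearest-neighbour search rather than this direct comparison.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity to items they have rated."""
    vectors = item_vectors(ratings)
    owned = ratings[user]
    scores = defaultdict(float)
    for candidate, candidate_vec in vectors.items():
        if candidate in owned:
            continue
        for item, rating in owned.items():
            # Weight each similarity by how much the user liked the owned item.
            scores[candidate] += cosine(vectors[item], candidate_vec) * rating
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"phone": 5, "case": 4},
    "bob": {"phone": 5, "case": 5, "charger": 4},
    "carol": {"phone": 4, "charger": 5, "laptop": 2},
    "dave": {"case": 3},
}
print(recommend(ratings, "alice"))  # ['charger', 'laptop']
```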

Technical Architecture

  • Real-time layer: Kafka+Flink (Double 11 peak QPS 100M+)
  • Offline layer: Hadoop/MaxCompute (EB-level data per day)
  • Storage: Redis for hot data + HBase for user profiles + Elasticsearch (ES) for product search
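
The split between a hot cache and a profile store is commonly used with a cache-aside read: look in the cache first, fall back to the store on a miss, then populate the cache. The sketch below shows that pattern under stated assumptions. `HotCache` and `load_profile` are hypothetical names, a dict with expiry times stands in for Redis, another dict stands in for HBase, and the 30-second TTL is arbitrary.

```python
import time


class HotCache:
    """Tiny TTL cache standing in for Redis; `clock` is injectable for testing."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)


def load_profile(user_id, cache, profile_store):
    """Cache-aside read: returns the profile and where it was served from."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached, "cache"
    profile = profile_store[user_id]  # stands in for an HBase row lookup
    cache.set(user_id, profile)
    return profile, "store"


# Demo with a fake clock so expiry can be shown without sleeping.
now = [0.0]
cache = HotCache(ttl_seconds=30.0, clock=lambda: now[0])
profiles = {"u1": {"segment": "frequent-buyer"}}

sources = [load_profile("u1", cache, profiles)[1]]      # miss, loaded from store
sources.append(load_profile("u1", cache, profiles)[1])  # hit
now[0] = 31.0                                           # TTL has elapsed
sources.append(load_profile("u1", cache, profiles)[1])  # miss again
print(sources)  # ['store', 'cache', 'store']
```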

III. Technology and Internet

Main Categories

Type | Representative Products | Core Technology
Social | Facebook, WeChat | Social graphs, content recommendation
Search | Google, Baidu | NLP models, real-time indexing
Video | YouTube, Douyin | Multi-objective recommendation, AI moderation
O2O | Uber, Meituan | Intelligent dispatching, dynamic pricing

Typical Tech Stack

  • Storage: HDFS/S3 + HBase/Cassandra
  • Compute: Spark/Flink + Presto
  • Scheduling: Airflow/Dagster

IV. Other Industries

Communications

  • Network optimization, package design, customer churn prediction, 5G network planning

Manufacturing

  • Predictive maintenance, quality tracking, supply chain optimization, digital twins

Healthcare

  • Precision medicine, new drug development, epidemic monitoring, medical resource optimization

Education

  • Personalized learning, teaching effectiveness evaluation, campus safety management

Technology Stack Evolution

Phase | Technology
Early | Hadoop distributed computing
Middle | Spark in-memory acceleration
Current | Flink real-time stream processing
Future | Lakehouse unified architecture

Emerging Directions

  • Lakehouse: Data lake + data warehouse
  • Federated learning: Cross-institutional modeling with privacy protection
  • Data Fabric: Unified metadata management
  • Edge computing: Processing data close to where it is generated to reduce latency
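
The federated learning item above can be illustrated with federated averaging: each institution trains on its own data, and only model parameters, never raw records, are sent to a coordinator that averages them. The sketch below is a deliberately small instance with a one-parameter model y ≈ w·x. The function names, learning rate, epoch and round counts, and the two client datasets are assumptions for illustration. Real deployments typically add secure aggregation or differential privacy, which this sketch omits.

```python
def local_update(weight, data, lr=0.01, epochs=20):
    """Gradient descent on one client's private (x, y) pairs for y ≈ weight * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight, clients, rounds=10):
    """Each round: clients train locally, the server averages by sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight


# Two hypothetical institutions; their raw data never leaves `local_update`.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
weight = federated_average(0.0, clients)
print(round(weight, 4))  # 2.0, since every pair satisfies y = 2x
```

Weighting each client's update by its sample count keeps a small institution from pulling the shared model as strongly as a large one.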