I. Financial Industry
Industry Characteristics and Data Requirements
- Petabyte-scale (PB) data generated daily
- Transaction data: hundreds of millions of transactions per day
- Customer data: hundreds of millions of users
- Market data: millisecond-level tick data
Core Business Scenarios
| Scenario | Technology | Effect |
|---|---|---|
| Credit scoring | Logistic Regression/XGBoost | Integrates PBOC (People's Bank of China) credit data |
| Real-time fraud detection | Kafka+Flink+Redis | Latency <100ms |
| Intelligent investment advisory | kdb+ time-series database | Nanosecond-level market analysis |
| Claims automation | Image recognition | 92% accuracy |
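The real-time fraud detection row can be illustrated with a sliding-window "velocity" rule, one common kind of check in such pipelines. The sketch below is a single-process stand-in written in plain Python: in the Kafka+Flink+Redis stack named in the table, the per-card state would live in a keyed stream operator or an external store rather than in a local dictionary. The class name `VelocityChecker`, the 60-second window, and the three-transaction limit are illustrative assumptions, not values from the source.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Flags a card that makes too many transactions inside a sliding time window.

    Assumption: events for a given card arrive in timestamp order.
    """

    def __init__(self, window_seconds: float = 60.0, max_txns: int = 3):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        # card_id -> timestamps of recent transactions, oldest first
        self._recent = defaultdict(deque)

    def is_suspicious(self, card_id: str, timestamp: float) -> bool:
        window = self._recent[card_id]
        # Evict timestamps that have fallen out of the window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        window.append(timestamp)
        return len(window) > self.max_txns


# Hypothetical event stream: (card_id, timestamp in seconds)
events = [
    ("card-1", 0.0),
    ("card-1", 10.0),
    ("card-1", 20.0),
    ("card-1", 30.0),   # fourth transaction within 60 s
    ("card-2", 31.0),
    ("card-1", 95.0),   # earlier card-1 activity has expired by now
]

checker = VelocityChecker(window_seconds=60.0, max_txns=3)
flags = [checker.is_suspicious(card, ts) for card, ts in events]
print(flags)
```

Each check touches only one card's short deque, which is the property that keeps this kind of rule cheap enough for tight latency budgets.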
Technical Architecture
- Data source → Kafka → Flink → Spark → Redis/Elasticsearch → Hadoop → BI
Talent Demand
| Position | Skills | Salary |
|---|---|---|
| Data Engineer | Flink/Spark, SQL | 400K-800K CNY |
| Quantitative Analyst | Python, Financial Engineering | 600K-1.2M CNY |
| Risk Model Expert | Machine Learning | 500K-1M CNY |
II. E-commerce Industry
Data Sources
- Page browsing paths, search keywords, purchase records, review feedback
- Data volume ranging from terabytes (TB) to petabytes (PB)
Core Applications
| Application | Technology | Effect |
|---|---|---|
| Recommendation system | Collaborative filtering | Millisecond-level filtering across hundreds of millions of products |
| Inventory management | Demand forecasting | Stockout rate <5% |
| Precision marketing | Real-time bidding | 500K+ bid requests/second |
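To make the collaborative filtering row concrete, here is a minimal item-based variant using cosine similarity over a toy rating matrix. The users, items, and ratings are invented for illustration, and the brute-force loop over all items would not scale to a production catalogue, where candidate retrieval typically relies on precomputed similarities or approximate nearest-neighbour indexes.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, for illustration only.
ratings = {
    "alice": {"phone": 5, "case": 4, "charger": 1},
    "bob": {"phone": 4, "case": 5},
    "carol": {"charger": 5, "cable": 4},
    "dave": {"phone": 5, "cable": 1, "case": 4},
}


def item_vectors(ratings):
    """Invert the data into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity to the items the user already rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, candidate_vector in vectors.items():
        if candidate in seen:
            continue
        # Weight each rated item's score by its similarity to the candidate.
        score = sum(
            cosine(candidate_vector, vectors[item]) * rating
            for item, rating in seen.items()
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("bob", ratings))
```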
Technical Architecture
- Real-time layer: Kafka+Flink (Double 11 shopping festival peak of 100M+ QPS)
- Offline layer: Hadoop/MaxCompute (EB-level data per day)
- Storage: Redis hot data + HBase user profiles + ES product search
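The split between a hot-data cache and a profile store is often served through a cache-aside read path: check the cache first, fall back to the backing store on a miss, then populate the cache with a time-to-live. The sketch below uses plain dictionaries as stand-ins for Redis and HBase so that it runs without any services. `ProfileStore`, the TTL value, and the injected clock are hypothetical names and parameters chosen for illustration.

```python
import time


class ProfileStore:
    """Cache-aside reads: an in-memory dict stands in for Redis, another for HBase."""

    def __init__(self, backing_store, ttl_seconds=300.0, clock=time.monotonic):
        self._backing = backing_store   # stand-in for the HBase profile table
        self._cache = {}                # stand-in for Redis: key -> (expires_at, value)
        self._ttl = ttl_seconds
        self._clock = clock             # injectable so expiry can be tested
        self.backing_reads = 0          # counts how often the slow path is taken

    def get(self, user_id):
        now = self._clock()
        entry = self._cache.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]             # cache hit, still fresh
        # Cache miss or expired entry: read from the backing store.
        self.backing_reads += 1
        value = self._backing.get(user_id)
        if value is not None:
            self._cache[user_id] = (now + self._ttl, value)
        return value


# Usage with a fake clock so the TTL behaviour is deterministic.
fake_now = [0.0]
store = ProfileStore(
    {"u1": {"segment": "frequent_buyer"}},
    ttl_seconds=10.0,
    clock=lambda: fake_now[0],
)
first = store.get("u1")    # miss: goes to the backing store
second = store.get("u1")   # hit: served from the cache
fake_now[0] = 11.0
third = store.get("u1")    # entry expired: goes to the backing store again
print(store.backing_reads)
```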
III. Technology and Internet
Main Categories
| Type | Representative Products | Core Technology |
|---|---|---|
| Social | Facebook, WeChat | Social graphs, content recommendation |
| Search | Google, Baidu | NLP models, real-time indexing |
| Video | YouTube, Douyin | Multi-objective recommendation, AI moderation |
| O2O | Uber, Meituan | Intelligent dispatching, dynamic pricing |
Typical Tech Stack
- Storage: HDFS/S3 + HBase/Cassandra
- Compute: Spark/Flink + Presto
- Scheduling: Airflow/Dagster
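Schedulers such as Airflow and Dagster model a pipeline as a directed acyclic graph of tasks and run each task only after its upstream tasks finish. That core idea can be sketched with the standard-library `graphlib` module (Python 3.9+). The task names below are hypothetical, and a real scheduler adds retries, time-based triggers, and distributed execution on top of this ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> set of tasks it depends on.
dependencies = {
    "ingest_events": set(),
    "clean_events": {"ingest_events"},
    "build_user_features": {"clean_events"},
    "build_item_features": {"clean_events"},
    "train_ranker": {"build_user_features", "build_item_features"},
}


def run_order(deps):
    """Return batches of tasks; tasks within one batch could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = run_order(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```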
IV. Other Industries
Communications
- Network optimization, service plan design, customer churn prediction, 5G network planning
Manufacturing
- Predictive maintenance, quality tracking, supply chain optimization, digital twins
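One simple building block for predictive maintenance is flagging sensor readings that deviate sharply from recent history. The sketch below applies a rolling z-score to a made-up vibration series. The window size, the threshold, and the data are illustrative assumptions, and production systems usually combine signals like this with learned failure models.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings more than `threshold` standard deviations
    away from the mean of the preceding `window` readings."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip the check when history is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical vibration amplitudes from one machine, with a spike at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 4.0, 1.0, 1.1]
anomalies = find_anomalies(vibration)
print(anomalies)
```

Note that the spike itself enters the history window afterwards and inflates the standard deviation for the next few readings. A more robust variant might exclude flagged points from the history or use the median and median absolute deviation.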
Healthcare
- Precision medicine, new drug development, epidemic monitoring, medical resource optimization
Education
- Personalized learning, teaching effectiveness evaluation, campus safety management
V. Technology Development Trends
Technology Stack Evolution
| Phase | Technology |
|---|---|
| Early | Hadoop distributed computing |
| Middle | Spark in-memory acceleration |
| Current | Flink real-time stream processing |
| Future | Lakehouse unified architecture |
Emerging Directions
- Lakehouse: Data lake + data warehouse
- Federated learning: Cross-institutional modeling with privacy protection
- Data Fabric: Unified metadata management
- Edge computing: Processing data close to its source to reduce latency
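The federated learning item can be illustrated by its aggregation step. In the common federated averaging approach, each institution trains locally and shares only model weights, which a coordinator combines in proportion to each participant's sample count. The sketch below shows that weighted average only. The weights and sample counts are invented, and privacy mechanisms such as secure aggregation or differential privacy are not shown.

```python
def federated_average(client_updates):
    """Combine client model weights, weighting each client by its sample count.

    client_updates: list of (weights, num_samples) pairs, where weights is a
    list of floats of equal length across clients. Raw training data never
    appears here; only the locally trained weights are shared.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")
    dimension = len(client_updates[0][0])
    averaged = [0.0] * dimension
    for weights, num_samples in client_updates:
        if len(weights) != dimension:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * num_samples / total_samples
    return averaged


# Hypothetical updates from two institutions: (model weights, local sample count)
updates = [
    ([0.2, 1.0], 100),
    ([0.4, 2.0], 300),
]
global_weights = federated_average(updates)
print(global_weights)
```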