1. International Big Data Development History
Beginning 1997
| Year | Milestone |
|---|---|
| 1997 | NASA researchers Michael Cox and David Ellsworth first used the term “big data” |
| 2001 | Doug Laney of META Group (later acquired by Gartner) proposed the “3V” model (Volume, Variety, Velocity) |
| 2003 | Google published the GFS paper (distributed file system) |
| 2004 | Google published the MapReduce paper |
| 2005 | Hadoop framework originated from the Nutch project (Doug Cutting and Mike Cafarella) |
| 2006 | Google published the Bigtable paper |
Turning Point 2008
- Hadoop became an Apache top-level project
- Facebook's data processing volume reached 15 PB per month
- An ecosystem formed around Hadoop:
  - Storage: HBase, Cassandra
  - Processing: Hive, Pig, Spark
  - Collection: Flume, Sqoop
  - Coordination and scheduling: ZooKeeper, Oozie
  - Machine learning: Mahout
Mainstream After 2011
| Year | Milestone |
|---|---|
| 2011 | Apache Kafka open sourced |
| 2012 | Spark's RDD paper published; in-memory computing claimed up to 100x faster than Hadoop MapReduce on some workloads |
| 2014 | Spark became an Apache top-level project |
Diversification Since 2015
- Computing framework diversification:
  - Batch processing: Hadoop MapReduce, Spark
  - Interactive analysis: Presto, Impala
  - Real-time stream computing: Spark Streaming, Flink, Storm
- Market size: grew from $10.3 billion in 2013 to $193.1 billion in 2019
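To put the two market-size figures in perspective, the growth rate they imply can be worked out with the standard compound annual growth rate (CAGR) formula. The sketch below is illustrative only: it assumes both figures measure the same market in the same currency, which the source does not confirm, and the function name `cagr` is a hypothetical helper introduced here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions of US dollars.
# Assumption: both numbers describe the same market definition.
market_2013 = 10.3
market_2019 = 193.1

rate = cagr(market_2013, market_2019, years=2019 - 2013)
print(f"Implied CAGR 2013-2019: {rate:.1%}")
```

Under that assumption, the quoted figures imply growth of roughly 63% per year over the six-year span.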
2. Domestic (China) Big Data Industry Development
Chinese Enterprise Open Source Contributions
- Apache Kylin: developed by eBay's China team; in 2015 it became the first Apache top-level project led by a Chinese team
- Apache Flink: Alibaba has been deeply involved since 2016 and has contributed over 50% of the code
Localized Platforms
- Alibaba MaxCompute: processes EB-scale data daily and handled over 100 PB during the Double 11 shopping festival
- Huawei FusionInsight: supports PB-scale data management and thousand-node clusters
Future Outlook
- Advancing the “East Data, West Computing” project
- Cultivating a market for data as a factor of production
- Moving from “following” to “running alongside” international leaders
Technology Evolution Summary
| Stage | Time | Core Technology |
|---|---|---|
| Concept formation | 1997-2005 | 3V model, GFS, MapReduce |
| Open source ecosystem | 2005-2012 | Hadoop ecosystem |
| In-memory computing | 2012-2014 | Spark |
| Diversification | 2015-present | Batch-stream integration, cloud-native, real-time data warehouse |