1. International Big Data Development History

Beginning 1997

| Year | Milestone |
| --- | --- |
| 1997 | NASA researchers first used the term "big data" |
| 2001 | Gartner proposed the "3V" model (Volume, Variety, Velocity) |
| 2003 | Google published the GFS paper (distributed file system) |
| 2004 | Google published the MapReduce paper |
| 2005 | Hadoop framework born (Doug Cutting) |
| 2006 | Google published the Bigtable paper |
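
The MapReduce model from the 2004 paper can be illustrated with a toy word count in plain Python — a sketch of the programming model only, not Google's distributed implementation:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group values by key; Reducer: sum the counts per word
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big ideas", "data flows"]
counts = reduce_phase(map_phase(docs))
# counts == {'big': 2, 'data': 2, 'ideas': 1, 'flows': 1}
```

In a real cluster the map and reduce phases run on many machines in parallel, with the framework handling the shuffle, fault tolerance, and data locality.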

Turning Point 2008

  • Hadoop officially became a top-level Apache project
  • Facebook data processing reached 15PB/month
  • Ecosystem formed:
    • Storage: HBase, Cassandra
    • Processing: Hive, Pig, Spark
    • Collection: Flume, Sqoop
    • Coordination and scheduling: ZooKeeper, Oozie
    • Machine learning: Mahout

Mainstream After 2011

| Year | Milestone |
| --- | --- |
| 2011 | Apache Kafka open-sourced |
| 2012 | Apache Spark launched (in-memory computing, up to 100x speedup) |
| 2014 | Spark became a top-level Apache project |

Diversification 2015

  • Computing framework diversification:
    • Batch processing: Hadoop MapReduce, Spark
    • Interactive analysis: Presto, Impala
    • Real-time stream processing: Spark Streaming, Flink, Storm
  • Market size: $10.3 billion in 2013 → $193.1 billion in 2019

2. Domestic Big Data Industry Development

Chinese Enterprise Open Source Contributions

  • Apache Kylin: created by the eBay China team; in 2015 it became the first top-level Apache project led by a Chinese team
  • Apache Flink: Alibaba has been deeply involved since 2016, contributing over 50% of the code

Localized Platforms

  • Alibaba MaxCompute: processes EB-scale data daily; handled over 100 PB during Double 11
  • Huawei FusionInsight: Supports PB-level data management, thousand-node clusters

Future Outlook

  • “East Data West Computing” project promotion
  • Data factor market cultivation
  • From “following” to “running alongside”

Technology Evolution Summary

| Stage | Period | Core Technology |
| --- | --- | --- |
| Concept formation | 1997-2005 | 3V model, GFS, MapReduce |
| Open-source ecosystem | 2005-2012 | Hadoop ecosystem |
| In-memory computing | 2012-2014 | Spark |
| Diversification | 2015-present | Batch-stream integration, cloud-native, real-time data warehouses |