Big Data Top 100

The Big Data Top 100 List initiative is an open community-based effort for benchmarking big data systems.

The objective is to develop an end-to-end application-layer benchmark for big data applications to enable ranking of big data systems according to a well-defined, verifiable/audited performance metric, with an accompanying efficiency metric.

With “big data” becoming a major force of innovation across enterprises of all sizes, new platforms for managing big data sets are being announced regularly, each with a growing list of features. The Big Data Top 100 initiative is interested in developing metrics that enable comparison among such platforms.


The BigData Top100 List is a new collaborative effort between academia and industry to develop definitions and specifications for big data benchmarking and support publication of benchmark results, blending approaches from high performance computing and from transaction processing and database systems (Transaction Processing Performance Council, TPC).

The activity was launched by the Center for Large-Scale Data Systems Research (CLDS), San Diego Supercomputer Center (SDSC), University of California San Diego, which organized the First Workshop on Big Data Benchmarking (WBDB2012) - supported by the National Science Foundation and industry sponsorship - held in May 2012 in San Jose, CA. The Second WBDB meeting was held in December 2012 in Pune, India. The Third WBDB meeting will be held in July 2013 in Xi'an, China.

An initial board of directors has been formed to steer this activity, coordinated by the San Diego Supercomputer Center, UC San Diego.

Board of Directors

  • Chaitanya Baru, San Diego Supercomputer Center
  • Milind Bhandarkar, Pivotal
  • Dhruba Borthakur, Facebook
  • Kshitij Doshi, Intel
  • Eyal Gutkind, Mellanox
  • Jian Li, IBM
  • Raghunath Nambiar, Cisco
  • Ken Osterberg, Seagate
  • Scott Pearson, Brocade
  • Meikel Poess, Oracle
  • Tilmann Rabl, University of Toronto
  • Richard Treadway, NetApp
  • Jerry Zhao, Google

An article on the BigData Top100 List initiative was published in the inaugural, March 2013 issue of the Big Data Journal [ PDF ]. The initiative will be presented at the Strata Conference in Santa Clara, CA on February 28, 2013.

Interested parties are encouraged to contact one of the board members. You may also join the Big Data Benchmarking Community (BDBC) to participate in biweekly video seminars. Click on the Join tab. The seminar schedule and presentations from prior seminars are available from



  • Overview paper: “Setting the Direction for Big Data Benchmark Standards” by C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, and T. Rabl, published in Selected Topics in Performance Evaluation and Benchmarking, Lecture Notes in Computer Science, Volume 7755, 2013, pp. 197-208. [ Abstract ] [ Full Paper ]
  • Presentations on BigData Top100 List initiative:
    • Bhandarkar and Baru at the Strata Conference, February 26-28, 2013, Santa Clara, CA [pdf attached]
    • Baru, Database Seminar, CSE Dept, UC San Diego, March 1, 2013 [ pdf ]

Benchmark specifications currently under consideration

  • Data Analytics Pipeline

  • BigBench

    A proposal to extend the TPC-DS specification to include unstructured and semi-structured data, modify the TPC-DS query set to include operations on these data, and incorporate data mining procedures in some of the queries. A data model for BigBench was proposed at the First WBDB workshop by Ghazal [ PDF ]. This was expanded with a set of associated queries at the Second WBDB workshop by Ghazal et al. [ PDF ].

Related Links

Benchmark initiatives