List of current projects that researchers in the ACID group are involved in.
Selected Research Areas
Big Data Benchmarking
An end-to-end, application-layer benchmark for big data applications, designed to enable ranking of big data systems according to a well-defined, verifiable/audited performance metric, with an accompanying efficiency metric.
The ACID group at SDSC keeps close tabs on the latest advances in compute hardware, memory, storage, and networking, along with the latest techniques to manage data and computation. We benchmark systems to look deeply into how they work, where they run into bottlenecks, and how to improve their performance.
Technical expertise in industry-standard benchmarks such as the TPC benchmarks, SPECweb, and VMmark.
Fig: Benchmarking of array operations on Intel’s Xeon Phi show the impact of cache and memory latency (the stair steps) and the processor’s prefetcher (zig-zag lines under the stair steps).
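The measurement technique behind the figure is the classic pointer-chasing latency sweep: each load depends on the previous one, so the per-access time jumps whenever the working set outgrows a cache level. A minimal sketch in Python follows; note that pure-Python timings are dominated by interpreter overhead and will not reproduce the clean stair steps the figure shows on hardware, so treat this as the shape of the method rather than a faithful microbenchmark (a C version of the same loop is what exposes the effect cleanly).

```python
import random
import time

def time_per_access(working_set_bytes, iters=200_000):
    """Estimate time per dependent access over a working set of the
    given size, using pointer chasing so each load depends on the
    previous one (the classic memory-latency benchmark shape)."""
    n = max(2, working_set_bytes // 8)     # roughly 8 bytes per slot
    order = list(range(n))
    random.shuffle(order)                  # visit slots in scattered order
    chain = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        chain[a] = b                       # one cycle through all slots
    idx = 0
    start = time.perf_counter()
    for _ in range(iters):
        idx = chain[idx]                   # dependent chain of loads
    return (time.perf_counter() - start) / iters

# Sweep working-set sizes; in a compiled language the per-access time
# jumps at each cache-capacity boundary (the "stair steps" in the figure).
for kb in (4, 64, 1024, 16384):
    print(f"{kb:6d} KiB: {time_per_access(kb * 1024) * 1e9:7.1f} ns/access")
```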
- Baru C, Bhandarkar M, Curino C, Danisch M, Frank M, Gowda B, Jacobsen H-A, Jie H, Kumar D, Nambiar R, et al. 2015. Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data. Performance Characterization and Benchmarking. Traditional to Big Data. 8904:44-63.
- Baru C, Bhandarkar M, Curino C, Danisch M, Frank M, Gowda B, Huang J, Jacobsen H-A, Kumar D, Nambiar R, et al. 2014. An Analysis of the BigBench Workload. Sixth TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC 2014) in conjunction with VLDB 2014.
- Rabl T, Poess M, Baru C, Jacobsen H-A. 2014. Specifying Big Data Benchmarks. Lecture Notes in Computer Science. 8163.
- Baru C, Bhandarkar M, Nambiar R, Poess M, Rabl T. 2013. Benchmarking Big Data Systems and the BigData Top100 List. Big Data. 1:60–64.
Exploration of new strategies and technologies for data-intensive cloud computing; investigation of application profiles that benefit from this paradigm; and development of corresponding applications.
Software Tools: Open Earth Framework
SDSC’s Open Earth Framework visualization tools integrate world terrain maps with earthquake epicenter logs and subsurface models for the western US to reveal correlations between observed events and subsurface structures.
Fig: Open Earth Framework
Use Case: Data integration in the Geosciences
Many scientific discoveries today are a result of collaborations between researchers sharing data and resources. By allowing scientists to share data and tools (services) via the web, we can enable interactions between a larger group of researchers working on a common problem.
A data integration framework in the realm of the geosciences was developed in response to the pressing need to interlink and share multi-disciplinary datasets in order to understand the complex dynamics of Earth systems. Creating an infrastructure to integrate, analyze, and model geoscientific data poses many challenges, due to the extreme heterogeneity of geoscience data formats, storage and computing systems, and, most importantly, the ubiquity of differing conventions, terminologies, and ontological frameworks across disciplines. The data integration framework is distinct from other efforts in scientific data management due to:
Resource registration strategy
Our solution requires data and service providers to register their resources with the framework. Instead of explicitly mapping resources to each other, as is done in mediation systems, we implicitly map the sources to a common metadata framework by describing each resource with the 4-tuple (metadata descriptions, ontology mappings, spatial extent, temporal extent). This 4-tuple turns each resource into a point in a 4D space, enabling efficient discovery of resources via queries formulated over that space. The ontology mappings help overcome heterogeneity across local schemas.
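The registration-and-discovery idea above can be sketched in a few lines: each registered resource carries the 4-tuple, and discovery is a query that intersects the query's constraints with each resource's ontology terms and spatial/temporal extents. This is a minimal illustration, not the framework's actual API; all names, fields, and sample resources are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """A registered resource described by the 4-tuple from the text.
    Field names are illustrative, not the framework's actual schema."""
    name: str
    metadata: dict            # free-form metadata descriptions
    ontology_terms: set       # terms mapped into a shared ontology
    spatial_extent: tuple     # (min_lon, min_lat, max_lon, max_lat)
    temporal_extent: tuple    # (start_year, end_year)

def overlaps(a, b):
    """1-D interval overlap test: intervals a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def discover(registry, terms=None, bbox=None, years=None):
    """Return names of resources whose 4-tuple intersects the query."""
    hits = []
    for r in registry:
        if terms and not (r.ontology_terms & terms):
            continue  # no shared ontology terms
        if bbox and not (overlaps(bbox[0::2], r.spatial_extent[0::2]) and
                         overlaps(bbox[1::2], r.spatial_extent[1::2])):
            continue  # longitude or latitude ranges do not intersect
        if years and not overlaps(years, r.temporal_extent):
            continue  # temporal extents do not intersect
        hits.append(r.name)
    return hits

# Hypothetical registered resources, for illustration only.
registry = [
    Resource("SoCal seismic catalog", {"format": "CSV"},
             {"earthquake", "epicenter"},
             (-121.0, 32.0, -114.0, 36.5), (1980, 2015)),
    Resource("Western US crustal model", {"format": "netCDF"},
             {"velocity-model", "subsurface"},
             (-125.0, 31.0, -102.0, 49.0), (2005, 2012)),
]

print(discover(registry, terms={"earthquake"},
               bbox=(-120.0, 33.0, -115.0, 35.0), years=(2000, 2010)))
# -> ['SoCal seismic catalog']
```

A production system would index the 4D points with a spatial index rather than scanning the registry, but the query semantics are the same.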
The framework we have developed combines the capabilities of a data warehouse (data providers can store their datasets) and a data mediation system (users can define views spanning multiple distributed databases). Furthermore, by supporting integration of both data and services, our framework provides the unique capability to perform both data-driven and application-driven integration.
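The warehouse/mediation duality can be illustrated with a toy view that spans both kinds of source: one table stored locally (warehouse-style) and one pulled on demand from a registered service at view-evaluation time (mediation-style). Every name and record here is invented for illustration; the point is only that a single user-defined view can draw on both.

```python
# Warehouse side: a dataset a provider has stored with the framework.
warehouse = {
    "epicenters_2014": [
        {"lat": 33.9, "lon": -117.9, "mag": 4.1},
    ],
}

def remote_epicenter_service(year):
    """Stand-in for a registered web service that a mediation system
    would query at view-evaluation time (no data copied in advance)."""
    catalog = {2015: [{"lat": 35.7, "lon": -117.5, "mag": 5.2}]}
    return catalog.get(year, [])

def epicenter_view(min_mag):
    """A user-defined view spanning the warehoused table and the
    mediated service, filtered by magnitude."""
    rows = list(warehouse["epicenters_2014"]) + remote_epicenter_service(2015)
    return [r for r in rows if r["mag"] >= min_mag]

print(epicenter_view(min_mag=5.0))
# -> [{'lat': 35.7, 'lon': -117.5, 'mag': 5.2}]
```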