CloudStor: Data Intensive Computing in the Cloud

The objective of the CloudStor project is to explore new strategies and technologies for data-intensive cloud computing, investigate application profiles that benefit from this paradigm, and develop corresponding applications. The CloudStor group is interested in evaluating the performance and price/performance of alternative, dynamic strategies for provisioning data-intensive applications, comparing implementations based on parallel database systems with those based on Hadoop.

We are currently investigating applications involving remotely sensed LiDAR data, in conjunction with the OpenTopography project. These applications allow users to (i) subset remote sensing data (stored as “point cloud” data sets), (ii) process the data subsets in multiple steps, using various algorithms, and (iii) visualize the output. The project is running a series of performance evaluation experiments. Cloud platforms with thousands of processors and access to hundreds of terabytes of storage provide a natural environment for implementing OpenTopography processing routines, which are highly data-parallel in nature.
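As a rough illustration of why this workflow is data-parallel, the sketch below subsets a point cloud by bounding box, partitions the subset into square tiles, and processes each tile independently. It is a minimal sketch under stated assumptions, not the OpenTopography implementation: the function names (`subset`, `partition`, `mean_elevation`, `process`) are hypothetical, the per-tile mean elevation is a stand-in for the actual processing algorithms, and a local thread pool stands in for cluster workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def subset(points, xmin, ymin, xmax, ymax):
    """Keep only (x, y, z) points inside the bounding box (inclusive)."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def partition(points, tile_size):
    """Group points into square tiles keyed by (column, row) index."""
    tiles = defaultdict(list)
    for x, y, z in points:
        tiles[(int(x // tile_size), int(y // tile_size))].append((x, y, z))
    return dict(tiles)


def mean_elevation(tile_points):
    """Hypothetical per-tile step: average the z values of one tile."""
    return sum(z for _, _, z in tile_points) / len(tile_points)


def process(points, bbox, tile_size, workers=4):
    """Subset, partition, then process every tile independently."""
    tiles = partition(subset(points, *bbox), tile_size)
    keys = sorted(tiles)
    # Each tile depends only on its own points, so tiles can be mapped
    # over workers with no coordination between them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        means = list(pool.map(mean_elevation, (tiles[k] for k in keys)))
    return dict(zip(keys, means))


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 10.0), (0.7, 0.2, 20.0), (1.5, 0.5, 30.0), (5.0, 5.0, 99.0)]
    print(process(cloud, bbox=(0, 0, 2, 2), tile_size=1.0))
```

Because no tile needs data from any other tile, the same structure maps onto either a MapReduce-style job or a partitioned parallel database query, which is the kind of comparison the project is concerned with.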

Our studies will contribute to the understanding of the performance tradeoffs and feasibility of dynamic provisioning strategies for serving large scientific data sets. A possible outcome is a reassessment of how data archives are implemented and how data sets are served to a broad user community, using on-demand and dynamic approaches for provisioning data sets as opposed to the current static approach. Cloud-based implementations can be made available to the user community via a Service-Oriented Architecture (SOA), such as that employed at the GEON Portal, thereby bringing the benefits of massively scaled computing resources to a large community of users.

For our studies, we have access to the UCSD Triton Resource, the FutureGrid platform, the Google-IBM CluE cluster, and Amazon AWS, in addition to small in-house clusters for software development and testing.

Principal Investigators: 

Chaitan Baru
Co-Investigator: Sriram Krishnan

Past ACID Participants: 
Funding Source: 

National Science Foundation (NSF) (Award# IIS-0844530)
SDSC Triton Research Opportunities (TRO)
Amazon Web Services (AWS) in Education

Technologies: