Chaitan Baru

(Currently on assignment as Senior Advisor for Data Science in the Computer and Information Science & Engineering Directorate at the National Science Foundation)

Chaitan Baru is a Distinguished Scientist and Associate Director of Data Initiatives at the San Diego Supercomputer Center (SDSC), UC San Diego, where he works on applied and applications-oriented research problems related to data management and data analytics. He has participated in a number of "data cyberinfrastructure" initiatives, including as Principal Investigator (PI) of the OpenTopography project; Cyberinfrastructure Lead, Tropical Ecology, Assessment and Monitoring network (TEAM); Co-Investigator of the Cyberinfrastructure for Comparative Effectiveness Research project (CYCORE); Member of the founding Senior Management Team of the National Ecological Observatory Network (NEON) and Co-PI of the NEON Cyberinfrastructure Testbed; Co-PI of the CUAHSI Hydrologic Information Systems (CUAHSI-HIS); Director, NEES Cyberinfrastructure Center (NEESit); PI/Project Director, Geosciences Network (GEON); and member of the How Much Information? project.

Baru leads the Advanced Cyberinfrastructure Development (ACID) Group at SDSC and is also Director of the Center for Large-scale Data Systems research (CLDS).

Prior to joining SDSC in 1996, Baru was at IBM, where he led one of the development teams for DB2 Parallel Edition Version 1 (released Dec 1995); and at the University of Michigan, where he served on the faculty of the EECS Department. He received his B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras, and M.E. and Ph.D. in Electrical Engineering from the University of Florida, Gainesville.


2014 –             Senior Advisor for Data Science, Computer and Information Science and Engineering Directorate, National Science Foundation.

2013 –             Associate Director, Data Initiatives, San Diego Supercomputer Center, UC San Diego.

2011 –             Director, Center for Large-scale Data Systems research (CLDS), SDSC.

2007 –             Distinguished Scientist and Director, Advanced Cyberinfrastructure Development (ACID) Group, SDSC.

2004 – 2007    Division Director, Science R&D Division, SDSC.

2004 –             Member of SDSC Senior Management Team.

2004 –             Member, California Institute for Information Technology and Telecommunication, Calit2, now Qualcomm Institute.

2001 – 04        Co-Director, Data and Knowledge Systems Program, SDSC.

2000 – 01        Assistant Director, Data Intensive Computing Environments (DICE) group.

1996–2000      Senior Principal Scientist, Data Intensive Computing Environments (DICE) group, SDSC, General Atomics.

1992 – 95        Advisory Programmer, Database Technology Institute, IBM Almaden Research Labs, San Jose, CA (1995). Advisory Development Analyst and Group Lead, Database Technology Group, IBM Toronto Labs (1992-95).

1985 – 92        Assistant Professor, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor.

  • Member of Advanced Computer Architecture Lab (ACAL)
  • Member of Software Systems Research Lab (SSRL)
  • Member, Executive Committee, Univ. of Michigan Human Genome Center, led by Dr. Francis Collins.

Cyberinfrastructure Leadership Activities

  1. Project Director, The Geosciences Network (GEON), 2002-2010. PI of a large NSF Information Technology Research (ITR) grant involving 12 collaborating institutions. The project was renewed as GEON 2.0 and also resulted in another spinoff activity, viz.,
  2. Director of Cyberinfrastructure for NSF National Earthquake Engineering Simulations (NEESit), 2007-2009. Initially served as the NEESit Cyberinfrastructure Advisor, 2006.
  3. Cyberinfrastructure Lead/PI for the Tropical Ecology, Assessment and Monitoring Network (TEAM), 2007-present. TEAM is a project managed by Conservation International, originally funded by the Moore Foundation.
  4. Member of Senior Management Team, NSF National Ecological Observatory Network (NEON), 2005-2007. Cyberinfrastructure Lead for the NEON Testbed, and co-PI of the NEON Cyberinfrastructure Diagnostic Testbed.
  5. Lead, KatrinaSafe Database Project. Worked in close collaboration with American Red Cross to develop KatrinaSafe, a database to assist victims of Hurricane Katrina. This led to the development of DisasterSafe, hosted at SDSC—a standard service offered by Red Cross for victims of any disaster.
  6. Member, IRIS Data Management System Standing Committee, 2007-2009. IRIS is the NSF data archives for seismological data.
  7. Co-Director, National Laboratory for Advanced Data Research (NLADR). Joint activity with the National Computational Science Alliance (NCSA), with Dr. Michael Welge as the other co-Director.
  8. Executive Director, SDSC/Calit2 Synthesis Center, 2005-2008. Joint facility consisting of SDSC staff and equipment located at Calit2.
  9. Member of Cyberinfrastructure Advisory Committee, Long-Term Ecological Research Network (LTER), 2006.
  10. Member of Advisory Board, CLEANER Project Office, 2005-2006.
  11. Co-Convener, NSF Earth Science CyberInfrastructure (ES-CI) Task Force (with Lee Allison and Tom Jordan), 2004.
  12. SDSC PI for CUAHSI Hydrologic Information System, 2004-2008.
  13. Member of Leadership Team, Biomedical Informatics Research Network (BIRN), 2001-2004. One of the co-Investigators of the original BIRN Coordinating Center (BIRN-CC).


  1. CARTA: Cyberinfrastructure and bioinformatics lead for the UCSD / Salk Institute-led ORU Center for Advanced Research and Training in Anthropogeny led by Profs. Ajit Varki, Margaret Schoeninger, and Rusty Gage (Salk). Funded by the Mathers Foundation. Duration: 2007—ongoing.
  2. CYCORE: Co-PI of Cyberinfrastructure for Comparative Effectiveness Research project funded by NIH. Project is led by Dr. Kevin Patrick (SOM & Calit2) in collaboration with M.D. Anderson Cancer Center, Houston. Duration: October 2009—September 2010.
  3. CISA3: Co-PI with Profs. Tom Levy and Falko Kuester of the Mediterranean Archaeology Network (MedArchNet). Funded by the UCSD Chancellor's Collaboratory initiative for the 2009-2010 academic year.
  4. 911: PI of NSF-funded project on Spatiotemporal Analysis of 911 Call Stream Data, with Prof. William Hodgkiss, SIO, as co-PI. Duration: 2004-2008.
  5. Hazards: Coordinated a hazards initiative on campus with funding from OVCR, JSOE, SIO, and SDSC. Pre-proposal on Cyberinfrastructure Center for Urgent Response to Emergencies (CICURE) submitted to the NSF STC program (not selected). Other proposal planning activities are under way.
  6. WIISARD: Co-investigator (with Prof. Leslie Lenert as PI) on the original Wireless Internet Information System for Medical Response in Disasters project. Funded by NIH, 2005-2007. Was responsible for the data management component.
  7. BIRN: Collaborated with Prof. Mark Ellisman as co-Investigator on the original BIRN-CC project, funded by NIH, 2001. Was responsible for the data integration component.
  8. I2T: PI of the NSF-funded Information Integration Testbed (I2T) project with Prof. Yannis Papakonstantinou (CSE) as co-PI. Duration: 2002-2004.
  9. UC-SGH: Co-PI on a proposal for a Center of Excellence on Disasters to the UC School of Global Health with Prof. Craig Van Dyke, UCSF (PI), and Profs. Gretchen Kalonji (UCOP) and Nicholas Sitar (UCB). (not selected).
  10. RISC MRU: Co-PI on a multi-campus research unit (MRU) proposal on Rapid Information for Science during Catastrophes (RISC), led by Prof. Emily Brodsky, UCSC. (not selected).

Software Development

  1. One of the group leaders and developers of IBM DB2 Parallel Edition Version 1.0, released commercially in December 1995.
  2. One of the designers of the SDSC Storage Resource Broker (SRB). Version 1 was released in September 1997.
  3. One of the designers of the Data Integration Cart™ technology for ontology-based data integration (invention disclosure filed: 2007).

U.S. Patents

  1. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 7,349,915, March 25, 2008. Licensed to Nirvana Storage.
  2. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 6,963,875, November 8, 2005. Licensed to Nirvana Storage.
  3. System and method for construction, storage, and transport of presentation-independent multimedia content, C. Baru, J. Chase, T. Elvins, R. Fassett, E. Nebel, Patent No. 7,028,252, March 22, 2001. Assigned to Oracle Corporation.
  4. Method and apparatus for achieving uniform data distribution in a parallel database system, C. Baru and F. Koo, Patent No.US5970495, IBM, Oct.19, 1999.
  5. Method and apparatus for implementing partial declustering in a parallel database system, C. Baru, G. Fecteau, J. Kirton, L. Kollar, F. Koo, Patent No. US5878409, IBM, March 2, 1999.

Ph.D. Committees Chaired

  • Ophir Frieder. Dissertation title: "Database processing on a cube-connected multicomputer system," EECS Dept., University of Michigan, Dec. 1987.  Recipient of IBM Graduate Fellowship Award. Currently, Chaired Professor, Georgetown University, Washington, DC.
  • Piyush Goel. Dissertation title: "Dataflow query processing and optimization," EECS Dept., University of Michigan, May 1992. Co-Founder of, San Jose, CA.
  • Sriram Padmanabhan. Dissertation title: "Data placement in shared-nothing parallel database systems," EECS Dept., University of Michigan, July 1992. Recipient of IBM Graduate Fellowship Award. Currently, Distinguished Engineer, IBM Silicon Valley Labs.

Committee Memberships

  • Co-chair, SPEC Research Group on Big Data Benchmarking, 2014 - .
  • Lead, TeraGrid Data Working Group, 2001–2002.
  • Member of Review Committee, Canada Research Chairs program, Natural Sciences and Engineering Research Council (NSERC) of Canada, 2000–2002.
  • Member of the Architecture Working Group, California Digital Library, University of California, Office of the President, Oakland, CA, 1998–2000.
  • Member of the Grants Selection Committee (GSC) for Computer and Information Sciences, Natural Sciences and Engineering Research Council (NSERC) of Canada, 1994-97.  (The GSC is responsible for annually reviewing grant proposals from computer science faculty in Canada and making funding decisions).
  • IBM representative on the Transaction Processing Performance Council's TPC-D Benchmark Standard Subcommittee, 1993-95. Participated in drafting the original TPC-D specification.
Parallel database systems; Scientific data management; Big Data Benchmarking.