Chaitan Baru



Chaitan Baru is Senior Advisor, Data Science Research Initiatives, University of California San Diego, with the Office of Research Affairs, San Diego Supercomputer Center, and the Halicioglu Data Science Institute.

From August 2014 to August 2018, he served as the first Senior Advisor for Data Science in the Computer and Information Science and Engineering Directorate (CISE) at the National Science Foundation (NSF), Alexandria, VA, where he co-chaired the NSF Harnessing the Data Revolution Big Idea, played a leadership role in the NSF BIGDATA program, and advised the NSF Big Data Regional Innovation Hubs and TRIPODS programs. He was instrumental in establishing the partnership between the NSF BIGDATA program and the public cloud providers (AWS, Google, and Microsoft), beginning in 2017. IBM joined this partnership in 2018.

Prior to joining NSF, Baru was Associate Director of Data Initiatives at the San Diego Supercomputer Center (SDSC). He established the Advanced Cyberinfrastructure Development lab at SDSC, and also created the Center for Large-scale Data Systems Research (CLDS). His research interests are in translational data science and related areas, which include applied and applications-oriented research in data management and data analytics.

He has led as well as participated in a number of data cyberinfrastructure initiatives, including as Principal Investigator (PI) of the OpenTopography project; Cyberinfrastructure Lead, Tropical Ecology, Assessment and Monitoring network (TEAM); Co-Investigator of the Cyberinfrastructure for Comparative Effectiveness Research project (CYCORE); Member of the founding Senior Management Team of the National Ecological Observatory Network (NEON) and Co-PI of the NEON Cyberinfrastructure Testbed; Co-PI of the CUAHSI Hydrologic Information Systems (CUAHSI-HIS); Director, NEES Cyberinfrastructure Center (NEESit); PI/Project Director, Geosciences Network (GEON); and member of the How Much Information? project.

Prior to joining SDSC in 1996, Baru was at IBM, where he led one of the development teams for DB2 Parallel Edition Version 1 (released Dec 1995); and at the University of Michigan, where he served on the faculty of the EECS Department. He received his B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras, and M.E. and Ph.D. in Electrical Engineering from the University of Florida, Gainesville.


2018 -              Senior Advisor, Data Science Research Initiatives, UC San Diego

2014 -  2018    Senior Advisor for Data Science, Computer and Information Science and Engineering Directorate, National Science Foundation.

2013 –  2018          Associate Director, Data Initiatives, San Diego Supercomputer Center, UC San Diego.

2011 –             Director, Center for Large-scale Data Systems Research (CLDS), SDSC.

2007 –             Distinguished Scientist and Director, Advanced Cyberinfrastructure Development (ACID) Group, SDSC.

2004 – 2007    Division Director, Science R&D Division, SDSC.

2004 -             Member of SDSC Senior Management Team.

2004 –             Member, California Institute for Information Technology and Telecommunication, Calit2, now Qualcomm Institute.

2001 – 04        Co-Director, Data and Knowledge Systems Program, SDSC.

2000 – 01        Assistant Director, Data Intensive Computing Environments (DICE) group.

1996–2000      Senior Principal Scientist, Data Intensive Computing Environments (DICE) group, SDSC, General Atomics.

1992 – 95        Advisory Programmer, Database Technology Institute, IBM Almaden Research Labs, San Jose, CA (1995). Advisory Development Analyst and Group Lead, Database Technology Group, IBM Toronto Labs (1992-95).

1985 – 92        Assistant Professor, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor.

  • Member of Advanced Computer Architecture Lab (ACAL)
  • Member of Software Systems Research Lab (SSRL)
  • Member, Executive Committee, Univ. of Michigan Human Genome Center, led by Dr. Francis Collins.

Cyberinfrastructure Leadership Activities

  1. Project Director, The Geosciences Network (GEON), 2002-2010. PI of a large NSF Information Technology Research (ITR) grant involving 12 collaborating institutions. The project was renewed as GEON 2.0 and also resulted in another spinoff activity, viz.,
  2. Director of Cyberinfrastructure for NSF National Earthquake Engineering Simulations (NEESit), 2007-2009. Initially served as the NEESit Cyberinfrastructure Advisor, 2006.
  3. Cyberinfrastructure Lead/PI for the Tropical Ecology, Assessment and Monitoring Network (TEAM), 2007-present. TEAM is a project managed by Conservation International, originally funded by the Moore Foundation.
  4. Member of Senior Management Team, NSF National Ecological Observatory Network (NEON), 2005-2007. Cyberinfrastructure Lead for the NEON Testbed, and co-PI of the NEON Cyberinfrastructure Diagnostic Testbed.
  5. Lead, KatrinaSafe Database Project. Worked in close collaboration with American Red Cross to develop KatrinaSafe, a database to assist victims of Hurricane Katrina. This led to the development of DisasterSafe, hosted at SDSC—a standard service offered by Red Cross for victims of any disaster.
  6. Member, IRIS Data Management System Standing Committee, 2007-2009. IRIS is the NSF data archives for seismological data.
  7. Co-Director, National Laboratory for Advanced Data Research (NLADR). Joint activity with the National Computational Science Alliance (NCSA), with Dr. Michael Welge as the other co-Director.
  8. Executive Director, SDSC/Calit2 Synthesis Center, 2005-2008. Joint facility consisting of SDSC staff and equipment located at Calit2.
  9. Member of Cyberinfrastructure Advisory Committee, Long-Term Ecological Research Network (LTER), 2006.
  10. Member of Advisory Board, CLEANER Project Office, 2005-2006.
  11. Co-Convener, NSF Earth Science CyberInfrastructure (ES-CI) Task Force (with Lee Allison and Tom Jordan), 2004.
  12. SDSC PI for CUAHSI Hydrologic Information System, 2004-2008.
  13. Member of Leadership Team, Biomedical Informatics Research Network (BIRN), 2001-2004. One of the co-Investigators of the original BIRN Coordinating Center (BIRN-CC).


  1. CARTA: Cyberinfrastructure and bioinformatics lead for the UCSD / Salk Institute-led ORU Center for Advanced Research and Training in Anthropogeny led by Profs. Ajit Varki, Margaret Schoeninger, and Rusty Gage (Salk). Funded by the Mathers Foundation. Duration: 2007—ongoing.
  2. CYCORE: Co-PI of Cyberinfrastructure for Comparative Effectiveness Research project funded by NIH. Project is led by Dr. Kevin Patrick (SOM & Calit2) in collaboration with M.D. Anderson Cancer Center, Houston. Duration: October 2009—September 2010.
  3. CISA3: Co-PI with Profs. Tom Levy and Falko Kuester of the Mediterranean Archaeology Network (MedArchNet). Funded by the UCSD Chancellor's Collaboratory initiative, for the 2009-2010 academic year.
  4. 911: PI of NSF-funded project on Spatiotemporal Analysis of 911 Call Stream Data, with Prof. William Hodgkiss, SIO, as co-PI. Duration: 2004-2008.
  5. Hazards: Coordinated a hazards initiative on campus with funding from OVCR, JSOE, SIO, and SDSC. Pre-proposal on Cyberinfrastructure Center for Urgent Response to Emergencies (CICURE) submitted to the NSF STC program (not selected). Other proposal planning activities are under way.
  6. WIISARD: Co-investigator (with Prof. Leslie Lenert as PI) on the original Wireless Internet Information System for Medical Response in Disasters project. Funded by NIH, 2005-2007. Was responsible for the data management component.
  7. BIRN: Collaborated with Prof. Mark Ellisman as co-Investigator on the original BIRN-CC project, funded by NIH, 2001. Was responsible for the data integration component.
  8. I2T: PI of the NSF-funded Information Integration Testbed (I2T) project with Prof. Yannis Papakonstantinou (CSE) as co-PI. Duration: 2002-2004.
  9. UC-SGH: Co-PI on a proposal for a Center of Excellence on Disasters to the UC School of Global Health with Prof. Craig Van Dyke, UCSF (PI), and Profs. Gretchen Kalonji (UCOP) and Nicholas Sitar (UCB). (not selected).
  10. RISC MRU: Co-PI on a multi-campus research unit (MRU) proposal on Rapid Information for Science during Catastrophes (RISC), led by Prof. Emily Brodsky, UCSC. (not selected).

Software Development

  1. One of the group leaders and developers of IBM DB2 Parallel Edition Version 1.0, released commercially in December 1995.
  2. One of the designers of the SDSC Storage Resource Broker (SRB). Version 1 was released in September 1997.
  3. One of the designers of the Data Integration Cart™ technology for ontology-based data integration (invention disclosure filed: 2007).

U.S. Patents

  1. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 7,349,915, March 25, 2008. Licensed to Nirvana Storage.
  2. Persistent Archives, R. Moore, A. Rajasekar, C. Baru, B. Ludaescher, A. Gupta, R. Marciano, US Patent 6,963,875, November 8, 2005. Licensed to Nirvana Storage.
  3. System and method for construction, storage, and transport of presentation-independent multimedia content, C. Baru, J. Chase, T. Elvins, R. Fassett, E. Nebel, Patent No. 7,028,252, March 22, 2001. Assigned to Oracle Corporation.
  4. Method and apparatus for achieving uniform data distribution in a parallel database system, C. Baru and F. Koo, Patent No. US5970495, IBM, Oct. 19, 1999.
  5. Method and apparatus for implementing partial declustering in a parallel database system, C. Baru, G. Fecteau, J. Kirton, L. Kollar, F. Koo, Patent No. US5878409, IBM, March 2, 1999.

Ph.D. Committees Chaired

  • Ophir Frieder. Dissertation title: "Database processing on a cube-connected multicomputer system," EECS Dept., University of Michigan, Dec. 1987.  Recipient of IBM Graduate Fellowship Award. Currently, Chaired Professor, Georgetown University, Washington, DC.
  • Piyush Goel. Dissertation title: "Dataflow query processing and optimization," EECS Dept., University of Michigan, May 1992. Co-Founder of, San Jose, CA.
  • Sriram Padmanabhan. Dissertation title: "Data placement in shared-nothing parallel database systems," EECS Dept., University of Michigan, July 1992. Recipient of IBM Graduate Fellowship Award. Currently, Distinguished Engineer, IBM Silicon Valley Labs.

Committee Memberships

  • Co-chair, SPEC Research Group on Big Data Benchmarking, 2014 - .
  • Lead, TeraGrid Data Working Group, 2001-2002.
  • Member of Review Committee, Canada Research Chairs program, Natural Sciences and Engineering Research Council (NSERC) of Canada, 2000-2002.
  • Member of the Architecture Working Group, California Digital Library, University of California, Office of the President, Oakland, CA, 1998–2000.
  • Member of the Grants Selection Committee (GSC) for Computer and Information Sciences, Natural Sciences and Engineering Research Council (NSERC) of Canada, 1994-97.  (The GSC is responsible for annually reviewing grant proposals from computer science faculty in Canada and making funding decisions).
  • IBM representative on the Transaction Processing Performance Council's TPC-D Benchmark Standard Subcommittee, 1993-95. Participated in drafting the original TPC-D specification.
Data science
data management
knowledge networks
data analytics