In the spring of 2011, the UC San Diego Research Cyberinfrastructure (RCI) Implementation Team invited researchers and research teams to participate in the Research Curation and Data Management Pilot program. Twenty applications were received, and after due deliberation the RCI Oversight Committee selected eight pilot projects: five that are curation-intensive and three that are storage-intensive.
The pilot participants will receive assistance with the creation of metadata to make data discoverable and available for future re-use; with the ingest of data into the San Diego Supercomputer Center’s (SDSC) new Cloud Storage system, which is accessible via high-speed networks; and with the movement of data into Chronopolis, a geographically-dispersed preservation system.
Data curation plays an essential role in the University’s Research Cyberinfrastructure initiative, which is critical for supporting and advancing academic research in the Digital Age. In addition to the University’s own need to manage research data, federal funding agencies are requiring that the data generated by publicly-funded research be easily discoverable and accessible. The National Institutes of Health (NIH) has required researchers to share their data since 2003, and other funding agencies have begun to require similar action.
The first five below are data-curation projects; the last three are storage-intensive projects:
The Brain Observatory
was created and directed by Dr. Jacopo Annese in the Department of Radiology. The project will create the infrastructure to preserve and curate the digital version of the brain of patient HM, the most studied neuropsychological patient in modern medicine. HM became amnesic after undergoing experimental surgery in 1953 for the relief of epileptic seizures. During more than five decades of ensuing studies, his case generated more than 2,000 scientific publications and elucidated most of the concepts on human memory function. The project goal is to preserve the anatomy relative to this case and to make the data available to the largest possible number of researchers worldwide.
The National Science Foundation (NSF) OpenTopography Facility
is managed by Dr. Chaitan Baru of SDSC’s Advanced Cyberinfrastructure Development Group. This production-level data facility supports the NSF Earth science community and facilitates community access to high-resolution, Earth science-oriented lidar topography data and related tools and resources. In this pilot project, OpenTopography will explore ways to leverage the UCSD RCI Research Data Management and Curation Program to provide fully curated, DOI-referenced, long-term hosting for the OpenTopography data archive.
The UCSD Levantine Archaeology Laboratory
is a joint program of the Division of Social Sciences and the Center of Interdisciplinary Science for Art, Architecture and Archaeology (CISA3), directed by Anthropology Professor Thomas Levy. The increasing availability and relatively low cost of digital data collection technologies have created a data avalanche for archaeologists, who are collaborating with computer scientists to develop new visualization and analysis tools. This pilot project will develop the infrastructure needed to curate cultural heritage data that, spurred by the increasing use of these tools, grows exponentially each year.
Scripps Institution of Oceanography Geological Collections
is managed by Geologist Richard Norris. The geological collections hold physical specimens in the form of ~7,000 deep ocean cores, 4,000 dredges from the deep sea, and ~40,000 slides of marine microfossils that are associated with digital data sets, photographs, and metadata. This is one of the largest collections of marine geology samples in the United States, used by an international community of marine geologists, biologists, and oceanographers. Goals for the project include the creation of a searchable Web presence with a graphical interface; the means to automatically transfer digital holdings; and the creation of a more user-friendly Web form system.
The Laboratory for Computational Astrophysics
is a joint program of SDSC, the Center for Astrophysics and Space Sciences, and the Department of Physics, managed by Rick Wagner. The work of this lab encompasses large-scale simulations of astrophysical phenomena in cosmology, star formation, and turbulence. Its emphasis is on high-resolution grid methods for modeling complex physics, including radiation transport and magnetohydrodynamics. For the pilot project, this group is interested in using data management and curation to improve its collaborations with other UCSD researchers and to support publishing its simulations.
The CineGrid Exchange
is led by the Director of Visualization and senior research scientist Dr. Tom DeFanti, Calit2, Center for Networked Systems (CNS). The CineGrid Exchange is a distributed digital media repository designed to support user-driven testbeds for digital media asset management, distribution, and preservation. The CineGrid Exchange node at Calit2 is currently about 30TB, but is expected to grow three to ten times once it is proven (one of the international sites, in Amsterdam, has 100TB now).
The Center for Research in Biological Systems (CRBS)
and its National Center for Microscopy and Imaging Research are led by Neurosciences and Bioengineering Professor Mark H. Ellisman. Building on early successes with the integrative development of the next generation of probes for correlated light and electron microscopy, the CRBS research team is now fielding one of the world’s largest and most distinctive collections of data-generating platforms of its kind, most notably the nation's only serial block-face scanning electron microscopy (3View) installations for biomedical research. The data being produced are not only unprecedented in description and value to the biological community, but also unprecedented in scale and complexity.
The Center for Research in Computing and the Arts
is led by Theater Professor Shahrokh Yadegari, whose research focuses on an audio/video synthesis system that is used both as a theoretical research tool in the study of the perceptual boundary between sound and music, and as a production tool in artistic presentations. The generated data are composed of high-resolution, multi-channel audio and video files, as well as analysis files. Yadegari's interests include experimenting with the availability issues (i.e., constant network bandwidth) of using curated data within a production and/or presentation context where large amounts of data are re-mixed in real time in multiple remote locations in synchrony with each other.
For more information about the University’s Research Cyberinfrastructure initiatives, visit the RCI website. For more information about the Libraries' contributions to UCSD's RCI, visit the UCSD Libraries Research Data Curation Services page.
Note: This article first appeared in the Fall 2011 issue of Faculty File, the UC San Diego Libraries' newsletter for the faculty.