Ron Taylor, Principal Investigator

Researchers at Pacific Northwest National Laboratory (PNNL) are creating a common infrastructure to share and integrate diverse data sets focused on a common biological system. The ultimate goal of this data integration effort is to gain new insights into cellular processes.

We are designing, developing, and implementing a computational infrastructure to provide researchers with the tools necessary to support data acquisition, metadata tracking, data storage, data retrieval, and analysis capabilities in a structured framework. As part of this endeavor, three tools are being integrated: Complex Queries (CQ), Integrated Database for Experiment and Analysis (IDEA), and Computational Cell Environment (CCE). Together, these tools will provide a mechanism for common data storage, organization and management of resources, and integration and interrogation of data.

The CQ tool is a database that stores biological information and lets users pose complex questions and receive meaningful answers. Because this biological information comes from heterogeneous data sets, the data must be unified behind a common interface, referred to as an abstraction layer, before it can be queried. In our research to develop a prototype of the complex queries interface, we helped identify the different types of data needed to answer complicated queries from the end user’s perspective. This need is being addressed through the design and development of the IDEA database.

IDEA will serve as a centralized framework for managing projects, defining and designing experiments, cataloging resources, tracking samples sent for different analytical techniques, and storing results. All these data will be tracked in a way that can be easily queried. For example, a researcher could store information from a family of time-course experiments, including western blot, flow cytometry, microarray, and proteomics, then query this database about changes in the expression levels of a protein and its associated gene over time.
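As a rough illustration of the kind of time-course query described above, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. It is not the IDEA schema: the table name `measurement`, its columns, the target names `GENE_X` and `GENE_Y`, and all numeric values are hypothetical placeholders chosen only to make the example runnable.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical measurement table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE measurement (
            target    TEXT,     -- hypothetical gene/protein name
            molecule  TEXT,     -- 'protein' or 'mrna'
            technique TEXT,     -- analytical technique used
            time_min  INTEGER,  -- time point in minutes
            level     REAL      -- relative expression level
        )
        """
    )
    # Placeholder values for illustration only; not experimental data.
    rows = [
        ("GENE_X", "protein", "western blot", 0, 1.0),
        ("GENE_X", "protein", "western blot", 30, 1.4),
        ("GENE_X", "protein", "western blot", 60, 2.2),
        ("GENE_X", "mrna", "microarray", 0, 1.0),
        ("GENE_X", "mrna", "microarray", 30, 2.1),
        ("GENE_X", "mrna", "microarray", 60, 1.8),
        ("GENE_Y", "mrna", "microarray", 0, 0.9),
    ]
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def expression_over_time(conn, target):
    """Return (time_min, protein_level, mrna_level) tuples for one target."""
    cur = conn.execute(
        """
        SELECT time_min,
               MAX(CASE WHEN molecule = 'protein' THEN level END),
               MAX(CASE WHEN molecule = 'mrna' THEN level END)
        FROM measurement
        WHERE target = ?
        GROUP BY time_min
        ORDER BY time_min
        """,
        (target,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for time_min, protein, mrna in expression_over_time(db, "GENE_X"):
        print(time_min, protein, mrna)
```

Placing the protein and transcript levels side by side per time point is one simple way to compare a protein with its associated gene; a production schema would likely also track projects, samples, and experiment metadata.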

The CQ project team is currently working with the IDEA team to release the first version of the IDEA database, to which users will be able to upload experimental data. The CQ researchers are simultaneously building abstraction layers that translate high-level queries into the form needed by each individual database and return results relevant to the higher-level biological context.
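One way to picture an abstraction layer of this kind is as a set of adapters that each translate a shared query object into a backend's native lookup and return results in one common shape. The sketch below is a hypothetical illustration only, not the CQ design: the classes `LevelQuery`, `ProteomicsStore`, and `MicroarrayStore`, the function `run_query`, and all field names and values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelQuery:
    """High-level question: what is the level of `target` at `time_min`?"""
    target: str
    time_min: int


class ProteomicsStore:
    """Hypothetical backend that keys values by (protein, time) tuples."""

    def __init__(self, data):
        self._data = data

    def fetch(self, query):
        # Translate the shared query into this backend's tuple-key lookup.
        value = self._data.get((query.target, query.time_min))
        if value is None:
            return []
        return [{"source": "proteomics", "target": query.target,
                 "time_min": query.time_min, "level": value}]


class MicroarrayStore:
    """Hypothetical backend that holds row dicts with its own field names."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, query):
        # Translate the shared query into a filter over 'gene' and 't' fields.
        return [
            {"source": "microarray", "target": row["gene"],
             "time_min": row["t"], "level": row["signal"]}
            for row in self._rows
            if row["gene"] == query.target and row["t"] == query.time_min
        ]


def run_query(query, backends):
    """Send one high-level query to every backend and merge the results."""
    results = []
    for backend in backends:
        results.extend(backend.fetch(query))
    return results


if __name__ == "__main__":
    # Placeholder values for illustration only; not experimental data.
    backends = [
        ProteomicsStore({("GENE_X", 30): 1.4}),
        MicroarrayStore([{"gene": "GENE_X", "t": 30, "signal": 2.1}]),
    ]
    for record in run_query(LevelQuery("GENE_X", 30), backends):
        print(record)
```

The caller only ever sees the common record shape, so a new data source could be added by writing another adapter with a `fetch` method, without changing the query code.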

PNNL's CCE is a problem-solving environment that provides user-friendly access to an extensible set of heterogeneous data sources. The CQ team will eventually build web-service interfaces to the CCE, enabling biologists and bioinformaticists to query across multiple experiments and integrate their results with other publicly available data through a common abstraction layer.

