Data Integration and Pattern Recognition

Katrina Waters, Principal Investigator

[Graphic: proteomic data sets. Researchers at PNNL are developing bioinformatic capabilities to integrate microarray and proteomic data sets.]

The greatest challenge in using new high-throughput molecular profiling technologies is developing bioinformatic approaches for data integration and analysis, built on semi-automated routines that facilitate data mining and interpretation. The microarray and proteomic data sets that already exist or are being generated at Pacific Northwest National Laboratory (PNNL) are driving our efforts to develop such approaches. These efforts are carried out by PNNL’s Data Integration and Pattern Recognition project team.

We are creating bioinformatic capabilities to integrate heterogeneous data sets, such as gene expression and protein levels, in a statistically rigorous way across multiple platforms and experiments. We are prototyping statistical routines for data merging and pattern recognition and optimizing them for use by other scientists. Data merging tools will bring in publicly available information on biological function and interactions, so that data can be mined and interpreted in full context. Pattern recognition tools will allow scientists to identify significant sets of genes or proteins whose expression is regulated in a time-dependent or ligand-specific manner, and so to better understand how biological systems respond to their environment. Statistical data fusion clustering algorithms will enable scientists to handle missing data and to meta-cluster disparate data while taking into account the uncertainties of the individual data sources. We will incorporate our new methods into existing computational frameworks and databases so that scientists can gain greater insight from their experiments more efficiently. We are also developing novel visualization tools to facilitate comprehension of extremely large data sets.
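To make the idea of data merging concrete, the following is a minimal sketch and not the project team's actual routines. It assumes that microarray and proteomic measurements are available as time-course profiles keyed by a shared gene identifier, and that missing measurements are marked as `None`. The gene names, the values, and the helper functions `zscore`, `merge_by_gene`, and `pearson_complete` are all hypothetical. Each profile is standardized so the two platforms are on a comparable scale, the tables are joined on the identifiers they share, and a Pearson correlation over the jointly observed time points gives a simple measure of mRNA and protein concordance.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(values):
    """Standardize a profile, leaving missing values (None) in place."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mu, sigma = mean(observed), pstdev(observed)
    if sigma == 0:
        # A flat profile carries no shape information; map it to zeros.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]


def merge_by_gene(mrna, protein):
    """Inner-join two {gene_id: profile} tables on shared identifiers."""
    shared = sorted(set(mrna) & set(protein))
    return {g: (zscore(mrna[g]), zscore(protein[g])) for g in shared}


def pearson_complete(x, y):
    """Pearson correlation over time points observed in both profiles.

    Returns None when fewer than three paired points are available or
    when either profile is constant over the paired points.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return None
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: four time points per gene, one missing protein value.
mrna = {
    "GENE_A": [1.0, 2.0, 4.0, 8.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0],
    "GENE_C": [1.0, 1.5, 1.2, 1.1],
}
protein = {
    "GENE_A": [10.0, None, 41.0, 79.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
    "GENE_D": [3.0, 3.0, 3.0, 3.0],
}

merged = merge_by_gene(mrna, protein)
concordance = {g: pearson_complete(m, p) for g, (m, p) in merged.items()}
```

In this toy example only the genes measured on both platforms survive the merge, and the missing protein value is skipped rather than imputed. A statistically rigorous treatment of the kind described above would also need to model platform-specific measurement uncertainty, which this sketch does not attempt.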

