Software Environment for Biological Network Inference (SEBINI)

Ron Taylor, Principal Investigator

Figure 1. A block diagram of the SEBINI–CABIN system.

One of the core tasks in systems biology is the reconstruction of the regulatory, interaction, and signaling networks in an organism. Pacific Northwest National Laboratory's (PNNL's) Software Environment for BIological Network Inference (SEBINI) [1] project team created a software platform that provides an interactive environment for the deployment and evaluation of algorithms used to reconstruct the structure of biological regulatory and interaction networks. SEBINI aids in more accurate reconstruction of biological networks, with less effort and in less time.

Figure 2. The start of a SEBINI table showing the nodes and undirected edges for an inferred protein-protein interaction network for the bacterium R. palustris.

SEBINI provides a framework in which a user can analyze high-throughput gene expression, protein abundance, or protein activation data via a suite of state-of-the-art network inference algorithms. It also allows algorithm developers to compare and train network inference methods on artificial networks and simulated gene expression perturbation data. SEBINI can, therefore, be used by software developers wishing to evaluate, refine, or combine inference techniques, as well as by bioinformaticians analyzing experimental data. Networks inferred by one of the algorithms in the SEBINI platform can be automatically passed on to a second tool, PNNL's Collective Analysis of Biological Interaction Networks (CABIN) [2], for further analysis, that is, for network validation or for network expansion using public bioinformatics data.

Figure 3. The protein-protein interaction network shown here was inferred using the BEPro algorithm, operating on a set of 854 bait-prey experiments for the bacterium R. palustris. The Cytoscape window shows part of the network graph invoked from a SEBINI webpage for the network. The Cytoscape attribute browser pane shows information passed from the SEBINI database to Cytoscape on the proteins (nodes) of this inferred network.

SEBINI provides a framework that employs a standard three-tier Web architecture. SEBINI has a Web-based client user interface, with some use of AJAX technology. The middle tier application logic consists of a suite of Java servlets and auxiliary Java programs. Lastly, a relational database stores the data required by the middle tier. The database permanently stores the inferred networks, as well as the raw data, the processed data (processing may mean binning of microarray data, or peptide-to-protein collapse for mass spectrometry bait-prey data), and the algorithm parameter selections used to generate the networks. The stored networks can be visualized as graphs, using Cytoscape [3], with Cytoscape being invoked in its own window via Java Web Start from a launch button on the SEBINI webpage for the inferred network. The SEBINI software can also perform topological and statistical analysis on the inferred networks, and export them in a human-readable or program-specific format. Any kind of executable program can be used as an inference algorithm or data processing algorithm. A Java handler class is written for each new algorithm to handle communication between the algorithm, the invocation webpage, and the database. Security is based on a project organization, with passwords assigned to the project owner and project users.
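As an illustration of the kind of topological analysis mentioned above, the following minimal Python sketch computes node degrees and graph density for an undirected edge list. It is not SEBINI's own code (the platform's middle tier is written in Java); the function names `degree_table` and `density` and the node labels are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def degree_table(edges):
    """Return {node: degree} for an undirected edge list.

    Duplicate edges (in either orientation) are counted once and
    self-loops are ignored, so a node that appears only in
    self-loops is absent from the result.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def density(edges):
    """Fraction of all possible undirected edges that are present."""
    degrees = degree_table(edges)
    n = len(degrees)
    if n < 2:
        return 0.0
    m = sum(degrees.values()) / 2  # each edge contributes to two degrees
    return m / (n * (n - 1) / 2)


# Hypothetical inferred network; ("P4", "P2") duplicates ("P2", "P4").
edges = [("P1", "P2"), ("P2", "P3"), ("P2", "P4"), ("P4", "P2")]
print(degree_table(edges))
print(density(edges))
```

Storing neighbors as sets makes the result independent of edge orientation and duplication, which suits undirected networks such as protein-protein interaction networks.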

Each inferred network can be displayed as a table of nodes and edges on its own webpage. Also, the details for inferred edges and nodes are available on web pages, with a separate page for each edge and each node. The raw and processed state values across the entire set of experiments for the two nodes of each inferred edge are stored with the edge. These values can be viewed by the user on the edge’s webpage, or exported with the inferred network topology for use in dynamic modeling of regulatory networks via equation fitting. Each inferred network can also be viewed as a graph within Cytoscape, as noted above, and further analyzed or annotated with CABIN. We can also export the edges of an inferred network in Cytoscape SIF file format.
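To make the SIF export concrete, here is a small Python sketch that writes an undirected edge list in Cytoscape's simple interaction format, where each line has the form "source, interaction type, target" separated by tabs. The function name `to_sif`, the node labels, and the choice of `pp` as the interaction type are assumptions for this example; the source does not say which interaction label SEBINI writes.

```python
def to_sif(edges, interaction="pp"):
    """Serialize undirected edges as tab-delimited SIF text.

    Each edge becomes one line: <node> <interaction> <node>.
    Because the edges are undirected, the two node names are sorted
    so that (A, B) and (B, A) collapse into a single line.
    """
    seen = set()
    lines = []
    for a, b in edges:
        key = tuple(sorted((a, b)))
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"{key[0]}\t{interaction}\t{key[1]}")
    return "".join(line + "\n" for line in lines)


# Hypothetical protein identifiers.
sif_text = to_sif([("protB", "protA"), ("protA", "protB"), ("protA", "protC")])
print(sif_text, end="")
```

For a directed (causal) network the sorting step would have to be dropped, since the order of source and target carries meaning there.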

The central database in SEBINI can store networks of many types, and the SEBINI platform can incorporate any algorithm that is applied to pieces of evidence attached to nodes in a potential network or applied to evidence attached to the set of nodes as a whole. At present, algorithms in SEBINI are typically directed towards searching for networks of causal influence, where the state of one node affects the state of another node. The best-known examples are transcriptional regulatory networks derived from correlations in mRNA expression levels measured in microarray experiments. For this purpose, SEBINI has incorporated algorithms from classical statistics (e.g., Pearson correlation) and static and dynamic Bayesian network structure learning algorithms (e.g., the BANJO toolkit from Duke University [4, 5]). Information-theoretic algorithms based on mutual information have also been added: basic no-frills mutual information [6], the CLR algorithm from Boston University [7], and the ARACNE algorithm from Columbia University [8-11].
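The two simplest scoring approaches named above can be sketched in a few lines of Python. This is a minimal illustration, not SEBINI's implementation: it computes Pearson correlation on raw expression profiles, a plug-in mutual information estimate on already-binned profiles, and keeps gene pairs whose absolute correlation meets a threshold. The function names, the gene labels, and the threshold value of 0.9 are assumptions for this example.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def mutual_information(x, y):
    """Plug-in mutual information (in bits) of two binned sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def infer_edges(profiles, threshold=0.9):
    """Return (gene, gene, r) for every pair with |r| >= threshold."""
    genes = sorted(profiles)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges


# Hypothetical expression profiles across four experiments.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [1, -1, 1, -1],
}
print(infer_edges(profiles))
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

Methods such as CLR and ARACNE start from pairwise mutual information scores of this kind and then apply further filtering steps, which are not shown here.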

SEBINI's toolkit also includes the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro) algorithm [12-15]. This algorithm was developed at PNNL to infer undirected, non-causal protein-protein interaction networks from sets of mass spectrometry bait-prey experiments. Inference of protein interactions from bait-prey data sets is not based on state correlation in the same sense that correlation is used in the analysis of microarray experiments to infer transcriptional regulatory edges. Rather, evidence from a set of mass spectrometry experiments can be tied to the set of proteins, uploaded into the SEBINI platform, processed within SEBINI via a peptide-to-protein evidence collapse program, and finally passed to the BEPro algorithm, which will then return a set of interactions that together form the network topology.

References

(1) Taylor RC, Shah A, Treatman C, Blevins M. 2006. "SEBINI: Software Environment for BIological Network Inference." Bioinformatics 22(21):2706-2708.

(2) Singhal M, Domico K. 2007. "CABIN: Collective Analysis of Biological Interaction Networks." Computational Biology and Chemistry 31:222-225.

(3) Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. "Cytoscape: a software environment for integrated models of biomolecular interaction networks." Genome Research 13:2498-2504.

(4) Hartemink AJ, Gifford DK, Jaakkola TS, Young RA. 2002. "Combining location and expression data for principled discovery of genetic regulatory network models." Pacific Symposium on Biocomputing 7:37-43.

(5) Hartemink A. 2005. "Banjo: Bayesian Network Inference with Java Objects."

(6) Cover TM, Thomas JA. 1991. Elements of Information Theory. 1st edn. New York: John Wiley & Sons.

(7) Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. 2007. "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles." PLoS Biology 5:54-66.

(8) Margolin AA, Wang K, Lim WK, Kustagi M, Nemenman I, Califano A. 2006. "Reverse engineering cellular networks." Nature Protocols 1:663-672.

(9) Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla-Favera R, Califano A. 2006. "ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context." BMC Bioinformatics 7:S1-7.

(10) Hartemink AJ. 2005. "Reverse engineering gene regulatory networks." Nature Biotechnology 23:554-555.

(11) MDeC Bioinformatics Core Facility at the Columbia Genome Center: "ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks)."

(12) Gilchrist MA, Salter LA, Wagner A. 2004. "A statistical framework for combining and interpreting proteomic datasets." Bioinformatics 20:689-700.

(13) Gilmore J, Auberry DL, White AM, Sharp JL, Anderson KK, Daly DS. 2006. Bayesian Estimator of Protein-Protein Association Probabilities (BEPro) website.

(14) Sharp JL, Anderson KK, Daly DS, Auberry DL, Cannon WR, White AM, Kery V: "Inferring protein-protein associations with pull-down LC-MS assay experiments." Journal of Proteome Research. Submitted.

(15) Gilmore J, Auberry D, Sharp J, White A, Anderson K, Daly D: "A Bayesian Estimator of Protein-Protein Association Probabilities." Bioinformatics. Submitted October 2007.

Visit the SEBINI demonstration website. The Java source code and PostgreSQL database schema are available for free for non-commercial use.