
Systems Biology: The Next Frontier for Bioinformatics

Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and, increasingly over the past decade, (iii) the use of near-comprehensive data sets derived from ‘omics platform technologies, in particular technologies “downstream” of genome sequencing, including transcriptomics, proteomics and metabolomics. Future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that provide high-resolution data for systems analyses. To date, particularly successful we…
Advances in Bioinformatics via MedWorm.com

The necessity of adjusting tests of protein category enrichment in discovery proteomics

Motivation: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. Fisher’s exact test is commonly used. We specifically examined the associations that the Fisher test produces between the identification of proteins by mass spectrometry discovery proteomics and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics because mass spectrometry preferentially identifies proteins based on their biochemical properties. False inferences about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression than to experimental bias.
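
A minimal sketch of the unadjusted enrichment test discussed above, assuming a one-sided Fisher's exact test computed as a hypergeometric tail probability. The function name and the counts in the usage lines are hypothetical illustrations, not values from the yeast dataset.

```python
from math import comb


def fisher_enrichment_p(k, n_term, n_identified, n_total):
    """One-sided Fisher's exact test for over-representation.

    k            -- proteins that are both identified and annotated with the GO term
    n_term       -- proteins annotated with the GO term
    n_identified -- proteins identified by mass spectrometry
    n_total      -- all proteins considered

    Returns P(X >= k) under the hypergeometric null of no association.
    """
    denom = comb(n_total, n_identified)
    upper = min(n_term, n_identified)
    tail = sum(
        comb(n_term, i) * comb(n_total - n_term, n_identified - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 34 of 50 annotated proteins are among 94 identified out of 200.
p_value = fisher_enrichment_p(k=34, n_term=50, n_identified=94, n_total=200)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value here says only that annotated proteins are identified more often than expected by chance; it cannot distinguish true expression differences from identification bias, which is the problem the abstract raises.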

Results: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value ≤0.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.
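
The idea of adjusting an enrichment test with logistic regression can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: it replaces the five peptide properties with a single hypothetical binary covariate (`detectable`), uses made-up grouped counts, and fits the model with a small hand-written Newton-Raphson routine.

```python
import math


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(groups, n_iter=25):
    """Fit a logistic regression to grouped binomial data by Newton-Raphson.

    groups -- list of (features, n_proteins, n_identified)
    Returns [intercept, coefficient per feature].
    """
    dim = len(groups[0][0]) + 1
    beta = [0.0] * dim
    for _ in range(n_iter):
        grad = [0.0] * dim
        hess = [[0.0] * dim for _ in range(dim)]
        for feats, n, k in groups:
            x = [1.0] + list(feats)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(dim):
                grad[i] += x[i] * (k - n * p)
                for j in range(dim):
                    hess[i][j] += n * p * (1 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Hypothetical data: ((in_go_term, detectable), n_proteins, n_identified).
# Identification depends only on detectability (20% vs 80%), but GO-term
# members are mostly detectable, which creates a spurious marginal association.
groups = [((0, 0), 100, 20), ((1, 0), 10, 2), ((0, 1), 50, 40), ((1, 1), 40, 32)]

unadjusted = fit_logistic([((go,), n, k) for (go, _), n, k in groups])
adjusted = fit_logistic(groups)
print(f"GO-term log-odds, unadjusted: {unadjusted[1]:.3f}")
print(f"GO-term log-odds, adjusted:   {adjusted[1]:.3f}")
```

In this toy setup the GO-term coefficient is clearly positive when the covariate is ignored and shrinks to essentially zero once it is included, which mirrors the abstract's point that some apparently significant associations vanish after adjustment.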

Contact: eugene.kolker@seattlechildrens.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Bioinformatics – recent issues

High-throughput prediction of protein antigenicity using protein microarray data

Motivation: Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.

Results: Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.

Availability: ANTIGENpro is integrated in the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu.

Contact: pfbaldi@ics.uci.edu

Bioinformatics – recent issues

Discovering Genomics, Proteomics and Bioinformatics (2nd Edition)

KEY BENEFIT: Discovering Genomics is the first genomics text that combines web activities and case studies with a problem-solving approach to teach upper-level undergraduates and first-year graduate students the fundamentals of genomic analysis. More of a workbook than a traditional text, Discovering Genomics, Second Edition allows students to work with real genomic data in solving problems and provides the user with an active learning experience.

KEY TOPICS: Genomic Medicine Case Study: What’s wrong with my child? Genome Sequence Acquisition and Analysis, Comparative Genomics in Evolution and Medicine, Genome Variations, Genomic Medicine Case Study: Why Can’t I Just Take a Pill to Lose Weight? Basic Research with DNA Microarrays, Applied Research with DNA Microarrays, Proteomics, Genomic Medicine Case Study: Why Can’t We Cure More Diseases? Genomic Circuits in Single Genes, Integrated Genomic Circuits, Modeling Whole-Genome Circuits.

MARKET: For all readers interested in genomics.

List Price: $ 115.00

Price: $ 66.97

Machine learning based prediction for peptide drift times in Ion Mobility Spectrometry.

Publication Date: 2010 May 21 PMID: 20495001
Authors: Shah, A. R. – Agarwal, K. – Baker, E. S. – Singhal, M. – Mayampurath, A. M. – Ibrahim, Y. M. – Kangas, L. J. – Monroe, M. E. – Zhao, R. – Belov, M. E. – Anderson, G. A. – Smith, R. D.
Journal: Bioinformatics

Motivation: Ion mobility spectrometry (IMS) has gained significant traction over the past few years as a proven technique for rapid, high-resolution separations of analytes based upon gas-phase ion structure with significant impact in the field of proteomic analysis. IMS coupled with mass spectrometry (MS) affords multiple improvements over traditional proteomics techniques such as the elucidation of secondary structure information, identification of post-translational modifications, as well as higher identification rates with reduced experiment times. The high-throughput nature of this technique calls for accurate calculation of cross sections, mobilities and associated drift times of peptides, thereby enabling downstream data analysis. Here we present a model that uses physicochemical properties of peptides to accurately predict a peptide’s drift time directly from its amino acid sequence. This model is used in conjunction with two mathematical techniques, a partial least squares regression (PLS) and a support vector regression (SVR) setting.

Results: When tested on an experimentally created high confidence database of 8675 peptide sequences with measured drift times, both techniques statistically significantly outperform the intrinsic size parameters-based calculations, the currently held practice in the field, on all charge states (+2, +3 and +4).

Availability: The software executable, imPredict, is available for download from http://omics.pnl.gov/software/imPredict.php.
