Biological assessment of robust noise models in microarray data analysis
March 28, 2011 by Bioinformatics Computational Biology
Motivation: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest.
Results: To this end, we propose assessing different noise models on a large number of microarray experiments. The goodness of fit of noise models is quantified by a hierarchical Bayesian analysis of variance model, which predicts normalized expression values as a mixture of a Gaussian density and t-distributions with adjustable degrees of freedom. Inference of differentially expressed genes is taken into consideration at a second mixing level. To attain far-reaching validity, our investigations cover a wide range of analysis platforms and experimental settings. As the most striking result, we find that, irrespective of the chosen preprocessing and normalization method, a heavy-tailed noise model is a better fit than a simple Gaussian in all experiments. Further investigations revealed that an appropriate choice of noise model has a considerable influence on biological interpretations drawn at the level of inferred genes and gene ontology terms. We conclude from our investigation that neglecting the overdispersed noise in microarray data can mislead scientific discovery and suggest that the convenience of Gaussian-based modelling should be replaced by non-parametric approaches or other methods that account for heavy-tailed noise.
Contact: peter.sykacek@boku.ac.at
Availability: http://bioinf.boku.ac.at/alexp/robmca.html.
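The contrast between Gaussian and heavy-tailed noise described in the Results can be illustrated with a much simpler comparison than the hierarchical Bayesian model of the paper. The sketch below is not the authors' method: it simulates hypothetical residuals (Gaussian noise with a small fraction of wide outliers, an assumption made purely for illustration), fits a Gaussian by maximum likelihood and a Student-t by a coarse grid search, and compares the log-likelihoods. All function names and parameter values are illustrative.

```python
import math
import random
import statistics


def gaussian_loglik(xs, mu, sigma):
    """Log-likelihood of xs under a Gaussian N(mu, sigma^2)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in xs)


def student_t_loglik(xs, mu, scale, nu):
    """Log-likelihood of xs under a location-scale Student-t with nu degrees of freedom."""
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return sum(const - (nu + 1.0) / 2.0 * math.log1p(((x - mu) / scale) ** 2 / nu)
               for x in xs)


def simulate_residuals(n=1000, outlier_frac=0.05, seed=0):
    """Hypothetical residuals: unit-variance Gaussian noise plus a few wide outliers."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 5.0 if rng.random() < outlier_frac else 1.0)
            for _ in range(n)]


def fit_gaussian(xs):
    """Maximum-likelihood Gaussian fit; returns (loglik, mu, sigma)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs, mu)
    return gaussian_loglik(xs, mu, sigma), mu, sigma


def fit_student_t(xs, nus=(1, 2, 3, 5, 10, 30), n_scales=40):
    """Coarse grid search over scale and nu, with the location fixed at the median."""
    mu = statistics.median(xs)
    best = None
    for nu in nus:
        for i in range(n_scales):
            scale = 0.3 + 0.05 * i  # candidate scales from 0.30 to 2.25
            ll = student_t_loglik(xs, mu, scale, nu)
            if best is None or ll > best[0]:
                best = (ll, mu, scale, nu)
    return best  # (loglik, mu, scale, nu)


residuals = simulate_residuals()
g_ll, _, _ = fit_gaussian(residuals)
t_ll, _, t_scale, t_nu = fit_student_t(residuals)
print(f"Gaussian log-likelihood:  {g_ll:.1f}")
print(f"Student-t log-likelihood: {t_ll:.1f} (nu={t_nu}, scale={t_scale:.2f})")
```

On data with outliers of this kind, the Student-t fit should reach a clearly higher log-likelihood than the Gaussian fit. A fair comparison on real data would also penalise the extra parameter, for example with an information criterion or, as in the paper, a Bayesian treatment.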
Bioinformatics – recent issues
High-throughput prediction of protein antigenicity using protein microarray data
November 30, 2010 by Bioinformatics Computational Biology
Motivation: Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.
Results: Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.
Availability: ANTIGENpro is integrated in the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu.
Contact: pfbaldi@ics.uci.edu
Qlucore Omics Explorer Supports Vital Work in Cancer Research
October 22, 2010 by Bioinformatics Computational Biology
Qlucore, a leader in the development of bioinformatics software, has today announced that Dr. Helena Carén has been using its Qlucore Omics Explorer to research tumour growth and treatment whilst at the Department of Clinical Genetics, Institute of Biomedicine, The Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden. The Sahlgrenska Academy is the faculty of health sciences at the University of Gothenburg, where education and research are conducted within the fields of pharmacy, medicine, odontology and health care sciences.
Dr. Carén has been using the Qlucore software to analyse methylation data from the Illumina platform in order to identify patterns that could help to classify tumours by severity, and also to predict how a tumour is likely to develop.
“It is often very difficult to find meaningful patterns in very large datasets like these, but Qlucore’s software has made it much easier for me to understand the relevance of the data produced during my methylation analysis,” says Helena Carén, PhD. “The 3D graphics, in particular, have been very helpful, since it is easier to spot important patterns when you can view your results as a 3D image, and even rotate the image, if needed, directly on the computer screen.”
The ultimate goal of the methylation study is to identify a set of genes whose methylation profile can accurately determine how aggressive a tumour is, as well as the most effective method of treatment. In the longer term, these studies will also help to identify the specific genes that have contributed to formation of the tumour itself.
Qlucore’s highly intuitive Qlucore Omics Explorer application adds increased creativity to this kind of research, thanks to the software’s impressive speed and statistical capability. Founded in 2007, Qlucore has already delivered its solutions to leading research organisations and pharmaceutical companies all over the world.
“Qlucore Omics Explorer has become a highly respected name in the bioinformatics market, and we are very proud to support such exciting work in this important area,” says Carl-Johan Ivarsson, President, Qlucore. “With so many biologists, researchers and scientists now using our software to study crucial areas like disease prevention, it is very rewarding to know that we are helping them to conduct more creative research and to achieve truly outstanding results.”
Qlucore Omics Explorer has already gained a global reputation within the scientific community, as it allows the actual researchers – the people with the most biological insight – to study the data and to look for patterns and structures. In addition, because Qlucore Omics Explorer allows researchers to explore different hypotheses and alternative scenarios within seconds, the software is already helping to play a key role in unveiling important new discoveries.
“One of the best aspects of Qlucore Omics Explorer is that it has allowed me to manipulate all of my data myself, which means that it wasn’t necessary to consult bioinformatics specialists every time I wanted to consider a new theory,” Dr. Carén adds. “Plus, not only is it very easy for biologists to identify patterns in the data set very quickly by themselves, it is also easy to produce impressive charts and figures, which is very useful when presenting important findings for publication.”
About Qlucore
Qlucore started as a collaborative research project at Lund University, Sweden, supported by researchers at the Departments of Mathematics and Clinical Genetics, in order to address the vast amount of high-dimensional data generated with microarray gene expression analysis. As a result, it was recognised that an interactive scientific software tool was needed to conceptualise the ideas evolving from the research collaboration.
The basic concept behind the software is to provide a tool that can take full advantage of the most powerful pattern recogniser that exists – the human brain. The result is a core software engine that lets the user handle and filter data and at the same time instantly visualise it in 3D. This will aid the user in identifying hidden structures and patterns. Over the last two years major efforts have been made to optimise the early ideas and to develop a core software engine that is extremely fast, allowing the user to explore and analyse high-dimensional data sets with the use of a normal PC, interactively and in real time.
Qlucore was founded in early 2007 and the first product released was the “Qlucore Gene Expression Explorer 1.0”. The latest version of this software, now called Qlucore Omics Explorer, represents a major step forward with advanced statistics support. All user action is at most two mouse clicks away. The company’s early customers are mainly from the Life-science and Biotech industries, but solutions for other industries are currently under development.
One of the key methods used by Qlucore Omics Explorer to visualise data is dynamic principal component analysis (PCA), an innovative way of combining PCA with immediate user interaction. PCA works by projecting high-dimensional data down to lower dimensions. The specific projections of the high-dimensional data are chosen in order to maintain as much variance as possible in the projected data set. With Qlucore Omics Explorer, data is projected and plotted on the two-dimensional computer screen and then rotated manually or automatically.
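The variance-preserving projection behind PCA can be sketched in a few lines. The example below is a generic textbook illustration and makes no claim about how Qlucore Omics Explorer is implemented: it centers the data, builds the sample covariance matrix, and extracts the leading directions by power iteration with deflation. The simulated samples and all function names are assumptions made for illustration.

```python
import math
import random


def center(rows):
    """Subtract the per-variable mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def power_iteration(matrix, iterations=300, seed=0):
    """Approximate the dominant eigenpair of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v


def pca(rows, k=2):
    """Return eigenvalues, components, projected rows and explained-variance fraction."""
    centered = center(rows)
    cov = covariance(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    eigenvalues, components = [], []
    for _ in range(k):
        lam, v = power_iteration(cov)
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove the direction just found so the next one can emerge.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return eigenvalues, components, projected, sum(eigenvalues) / total_variance


def simulate_samples(n=200, seed=1):
    """Hypothetical 5-variable data driven by two latent factors plus small noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.5)
        rows.append([a + rng.gauss(0.0, 0.1), a + rng.gauss(0.0, 0.1),
                     b + rng.gauss(0.0, 0.1), b + rng.gauss(0.0, 0.1),
                     rng.gauss(0.0, 0.1)])
    return rows


eigenvalues, components, projected, explained = pca(simulate_samples(), k=2)
print(f"Fraction of variance kept by 2 components: {explained:.3f}")
```

Passing `k=3` would give coordinates suitable for a rotatable three-dimensional view. For real expression matrices a linear algebra library would be a more practical choice than this pure-Python version.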
Discovering Genomics, Proteomics and Bioinformatics (2nd Edition)
October 19, 2010 by Bioinformatics Computational Biology
KEY BENEFIT: Discovering Genomics is the first genomics text that combines web activities and case studies with a problem-solving approach to teach upper-level undergraduates and first-year graduate students the fundamentals of genomic analysis. More of a workbook than a traditional text, Discovering Genomics, Second Edition allows students to work with real genomic data in solving problems and provides the user with an active learning experience.
KEY TOPICS: Genomic Medicine Case Study: What’s wrong with my child? Genome Sequence Acquisition and Analysis, Comparative Genomics in Evolution and Medicine, Genome Variations, Genomic Medicine Case Study: Why Can’t I Just Take a Pill to Lose Weight? Basic Research with DNA Microarrays, Applied Research with DNA Microarrays, Proteomics, Genomic Medicine Case Study: Why Can’t We Cure More Diseases? Genomic Circuits in Single Genes, Integrated Genomic Circuits, Modeling Whole-Genome Circuits.
MARKET: For all readers interested in genomics.
List Price: $ 115.00
Price: $ 66.97
Module-based prediction approach for robust inter-study predictions in microarray data
October 19, 2010 by Bioinformatics Computational Biology
Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies because they are not robust to expression measurement noise or to genes that go missing when studies are matched across platforms. It is common that some of the genes in a prediction model established in a training study cannot be matched to a test study because a different platform is applied. The failure of inter-study predictions has severely hindered the clinical application of microarrays. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering.
Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. A conventional univariate or multivariate feature selection procedure is applied, and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as at least some of the genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and that the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and by application to real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while being considerably more robust than traditional GBP.
Availability: http://www.biostat.pitt.edu/bioinfo/
Contact: ctseng@pitt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
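The module-building steps described in the Results (K-means clustering, merging small modules into their nearest neighbors, choosing a representative gene) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say how the representative gene is chosen, so taking the gene closest to the module centroid is an assumption, as are the toy profiles, the gene names and every function name.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(vectors):
    """Element-wise mean of a list of profiles."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(profiles, k, iterations=50, seed=0):
    """Plain K-means on {gene: profile}; returns a list of gene modules."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    centroids = [profiles[g] for g in rng.sample(genes, k)]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            g: min(range(k), key=lambda c: dist2(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centroids[c] = mean_profile(members)
    modules = {}
    for g, c in assignment.items():
        modules.setdefault(c, []).append(g)
    return list(modules.values())


def module_centroid(module, profiles):
    return mean_profile([profiles[g] for g in module])


def merge_small_modules(modules, profiles, min_size):
    """Merge every module smaller than min_size into its nearest neighbor."""
    modules = [list(m) for m in modules]
    while len(modules) > 1:
        small = [m for m in modules if len(m) < min_size]
        if not small:
            break
        smallest = min(small, key=len)
        modules.remove(smallest)
        centroid = module_centroid(smallest, profiles)
        nearest = min(modules,
                      key=lambda m: dist2(centroid, module_centroid(m, profiles)))
        nearest.extend(smallest)
    return modules


def representative(module, profiles, available=None):
    """Gene closest to the module centroid (an assumed selection rule).

    If `available` is given, only genes present in the test study are
    considered; None is returned when the whole module is missing.
    """
    candidates = [g for g in module if available is None or g in available]
    if not candidates:
        return None
    centroid = module_centroid(module, profiles)
    return min(candidates, key=lambda g: dist2(profiles[g], centroid))


# Hypothetical training profiles: expression of each gene across four samples.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0], "g2": [1.1, 2.1, 2.9, 4.2], "g3": [0.9, 1.8, 3.1, 3.9],
    "g4": [4.0, 3.0, 2.0, 1.0], "g5": [4.1, 2.9, 2.2, 0.8], "g6": [3.9, 3.1, 1.9, 1.1],
    "g7": [2.5, 2.4, 2.6, 2.5],
}
toy_modules = merge_small_modules(kmeans(toy_profiles, k=3), toy_profiles, min_size=2)
for module in toy_modules:
    print(sorted(module), "->", representative(module, toy_profiles))
```

Passing the set of genes measured in a test study as `available` shows why the module-based model stays usable: a module only fails when none of its genes can be matched, which is less likely once small modules have been merged.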




