Finding Biomarker Signatures in Pooled Sample Designs: A Simulation Framework for Methodological Comparisons
July 8, 2010 by BioinformaticsDirectory.com
Detection of discriminating patterns in gene expression data can be accomplished by using various methods of statistical learning. It has been proposed that sample pooling in this context would have negative effects; however, pooling cannot always be avoided. We propose a simulation framework to explicitly investigate the parameters of patterns, experimental design, noise, and choice of method in order to find out which effects on classification performance are to be expected. We use a two-group classification task and simulated gene expression data with independent differentially expressed genes as well as bivariate linear patterns and the combination of both. Our results show a clear increase of prediction error with pool size. For pooled training sets powered partial least squares discr…
Advances in Bioinformatics via MedWorm.com
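The pooled-sample design in the abstract above can be pictured with a small simulation. The following is a minimal sketch and not the authors' framework. It assumes that pooling is an arithmetic mean of individual profiles on the simulated scale, that noise is unit-variance Gaussian, and that a 1-nearest-neighbour classifier stands in for the methods compared in the paper. All function names and parameter values (`simulate_group`, `pool`, `run`, the gene counts, the shift) are hypothetical.

```python
import random
from statistics import mean


def simulate_group(n_samples, n_genes, n_de, shift, rng):
    """Draw Gaussian profiles; the first n_de genes have mean `shift`, the rest 0."""
    return [
        [rng.gauss(shift if g < n_de else 0.0, 1.0) for g in range(n_genes)]
        for _ in range(n_samples)
    ]


def pool(samples, pool_size):
    """Average consecutive blocks of pool_size samples gene by gene.

    Assumption: pooling behaves like an arithmetic mean on this scale.
    Leftover samples that do not fill a complete pool are dropped.
    """
    pooled = []
    for start in range(0, len(samples) - pool_size + 1, pool_size):
        block = samples[start:start + pool_size]
        pooled.append([mean(column) for column in zip(*block)])
    return pooled


def one_nn_error(train, test):
    """Misclassification rate of a 1-nearest-neighbour rule (squared Euclidean)."""
    def dist2(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    wrong = 0
    for x, label in test:
        _, predicted = min(train, key=lambda item: dist2(x, item[0]))
        if predicted != label:
            wrong += 1
    return wrong / len(test)


def run(pool_size, seed=0, n_train=24, n_test=50, n_genes=20, n_de=5, shift=1.0):
    """Train on pooled samples, test on unpooled individuals, return the error."""
    rng = random.Random(seed)
    train, test = [], []
    for label, mu in ((0, 0.0), (1, shift)):
        individuals = simulate_group(n_train, n_genes, n_de, mu, rng)
        train += [(v, label) for v in pool(individuals, pool_size)]
    for label, mu in ((0, 0.0), (1, shift)):
        test += [(v, label) for v in simulate_group(n_test, n_genes, n_de, mu, rng)]
    return one_nn_error(train, test)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(f"pool size {k}: test error {run(k):.3f}")
```

Here the number of individuals per group is fixed, so a larger pool size leaves fewer training profiles. Whether the error rises with pool size in this toy setting depends on the chosen parameters and classifier, so a single run should not be read as reproducing the reported result. Averaging over many seeds would be the natural next step.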
A Comprehensive Study of Progressive Cytogenetic Alterations in Clear Cell Renal Cell Carcinoma and a New Model for ccRCC Tumorigenesis and Progression
July 8, 2010 by BioinformaticsDirectory.com
We present a comprehensive study of cytogenetic alterations that occur during the progression of clear cell renal cell carcinoma (ccRCC). We used high-density high-throughput Affymetrix 100 K SNP arrays to obtain the whole genome SNP copy number information from 71 pretreatment tissue samples with RCC tumors; of those, 42 samples were of human ccRCC subtype. We analyzed patterns of cytogenetic loss and gain from different RCC subtypes and in particular, different stages and grades of ccRCC tumors, using a novel algorithm that we have designed. Based on patterns of cytogenetic alterations in chromosomal regions with frequent losses and gains, we inferred the involvement of candidate genes from these regions in ccRCC tumorigenesis and development. We then proposed a new model of ccRC…
Advances in Bioinformatics via MedWorm.com
Homolonto: Generating homology relationships by pairwise alignment of ontologies and application to vertebrate anatomy.
June 5, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Jun 2 PMID: 20519284
Authors: Parmentier, G. – Bastian, F. B. – Robinson-Rechavi, M.
Journal: Bioinformatics
MOTIVATION: The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relations between ontologies describing different species. RESULTS: We present a new algorithm, and its implementation in the software Homolonto, to create new relationships between anatomical ontologies, based on the homology concept. Homolonto uses a supervised ontology alignment approach. Several alignments can be merged, forming homology groups. We also present an algorithm to generate relationships between these homology groups. This has been used to build a multi-species ontology, for the database of gene expression evolution Bgee. AVAILABILITY: download section of the Bgee website http://bgee.unil.ch/
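The Homolonto abstract above says that several pairwise alignments can be merged into homology groups. One simple way to illustrate such a merge is a transitive closure over aligned term pairs, sketched below with a union-find structure. This is an assumption for illustration only: the abstract does not describe Homolonto's actual merging rule, and the function name `merge_alignments` and the term identifiers are made up.

```python
def merge_alignments(pairs):
    """Merge pairwise term alignments into groups by transitive closure.

    `pairs` is an iterable of (term_a, term_b) tuples, each meaning that the
    two anatomical terms were aligned as homologous. Hypothetical helper,
    not part of Homolonto.
    """
    parent = {}

    def find(term):
        # Register unseen terms as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    # Sort for a deterministic, readable result.
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up identifiers in the form "species:structure".
    alignments = [
        ("mouse:heart", "zebrafish:heart"),
        ("zebrafish:heart", "human:heart"),
        ("mouse:limb", "human:limb"),
    ]
    for group in merge_alignments(alignments):
        print(group)
```

Note that a purely transitive merge treats every alignment as equally trustworthy, so one wrong pair can fuse two unrelated groups. A supervised approach such as the one the abstract mentions would presumably let a curator accept or reject pairs before merging.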
A computational genomics pipeline for prokaryotic sequencing projects.
June 5, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Jun 2 PMID: 20519285
Authors: Kislyuk, A. O. – Katz, L. S. – Agrawal, S. – Hagen, M. S. – Conley, A. B. – Jayaraman, P. – Nelakuditi, V. – Humphrey, J. C. – Sammons, S. A. – Govil, D. – Mair, R. D. – Tatti, K. M. – Tondella, M. L. – Harcourt, B. H. – Mayer, L. W. – Jordan, I. K.
Journal: Bioinformatics
MOTIVATION: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation, and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. RESULTS: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and Implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell, and MySQL and is compatible with Linux and other Unix systems. CONTACT: king.jordan@biology.gatech.edu SUPPLEMENTARY INFORMATION: See http://nbase.biology.gatech.edu.
Structure-Based Variable Selection for Survival Data.
June 5, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Jun 2 PMID: 20519286
Authors: Lagani, V. – Tsamardinos, I.
Journal: Bioinformatics
MOTIVATION: Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suitable for the analysis of high-dimensional, right-censored data, called Survival Max-Min Parents and Children (SMMPC). The algorithm is conceptually simple, scalable, based on the theory of Bayesian Networks and the Markov Blanket, and extends the corresponding algorithm (MMPC) for classification tasks. The selected variables have a structural interpretation: if T is the survival time (in general the time-to-event), SMMPC returns the variables adjacent to T in the Bayesian Network representing the data distribution. The selected variables also have a causal interpretation that we discuss. RESULTS: We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. SMMPC selects on average the smallest variable subsets (fewer than a dozen per dataset), while statistically significantly outperforming all of the methods in the study that return a manageable number of genes that could be inspected by a human expert. AVAILABILITY: Matlab and R code are freely available from http://www.mensxmachina.org CONTACT: vlagani@ics.forth.gr.


