Repitools: an R package for the analysis of enrichment-based epigenomic data.
May 13, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 May 10 PMID: 20457667
Authors: Statham, A. L. – Strbenac, D. – Coolen, M. W. – Stirzaker, C. – Clark, S. J. – Robinson, M. D.
Journal: Bioinformatics
SUMMARY: Epigenetics, the study of heritable somatic phenotypic changes not related to DNA sequence, has emerged as a critical component of the landscape of gene regulation. The epigenetic layers, such as DNA methylation, histone modifications and nuclear architecture, are now being extensively studied in many cell types and disease settings. Few software tools exist to summarize and interpret these datasets. We have created a toolbox of procedures to interrogate and visualize epigenomic data (both array- and sequencing-based) and make available a software package for the cross-platform R language. AVAILABILITY: The package is freely available under LGPL from the R-Forge web site (http://repitools.r-forge.r-project.org/). CONTACT: mrobinson@wehi.edu.au.
LCE: A Link-based Cluster Ensemble Method for Improved Gene Expression Data Analysis.
May 8, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 May 5 PMID: 20444838
Authors: Iam-On, N. – Boongoen, T. – Garrett, S.
Journal: Bioinformatics
MOTIVATION: It is far from trivial to select the most effective clustering method and its parameterisation for a particular set of gene expression data, because there are a very large number of possibilities. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research solves this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarise sample-cluster co-occurrence statistics, and relations within an ensemble are encapsulated only at a coarse level, while those existing amongst clusters are completely neglected. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering. RESULTS: The link-based cluster ensemble (LCE) method, presented here, implements these ideas and demonstrates outstanding performance. Experimental results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms the existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) generates excellent, robust performance across different types of data, especially in the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering. AVAILABILITY: Online supplementary material and an implementation are available at: http://users.aber.ac.uk/nii07/bioinformatics2010 CONTACT: nii07@aber.ac.uk.
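The sample-cluster association matrix mentioned in the abstract can be illustrated with a small sketch. This is not the LCE implementation: it only builds the plain binary matrix that conventional ensembles start from (one row per sample, one column per cluster of every base clustering), which is the coarse representation the abstract says LCE goes on to refine. The function name `binary_association` and the toy partitions are assumptions made for illustration.

```python
def binary_association(partitions):
    """Build a binary sample-by-cluster matrix from several clusterings.

    `partitions` is a list of label lists, one per base clustering, all of
    the same length. Entry [i][k] is 1 when sample i belongs to the k-th
    cluster (clusters from all base clusterings are laid out side by side).
    """
    n_samples = len(partitions[0])
    # One column per (clustering index, cluster label) pair.
    columns = []
    for p, labels in enumerate(partitions):
        for label in sorted(set(labels)):
            columns.append((p, label))
    matrix = [
        [1 if partitions[p][i] == label else 0 for (p, label) in columns]
        for i in range(n_samples)
    ]
    return columns, matrix


# Toy ensemble (hypothetical): three clusterings of four samples.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
columns, matrix = binary_association(partitions)
for row in matrix:
    print(row)
```

Each row contains exactly one 1 per base clustering, so every cluster a sample does not belong to is scored 0 regardless of how similar that cluster is to the sample's own. That is the missing information the abstract refers to.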
PepC: Proteomics software for identifying differentially expressed proteins based on spectral counting.
April 25, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Apr 22 PMID: 20413636
Authors: Heinecke, N. L. – Pratt, B. S. – Vaisar, T. – Becker, L.
Journal: Bioinformatics
SUMMARY: Identifying biologically significant changes in protein abundance between two conditions is a key issue when analyzing proteomic data. One widely used approach centers on spectral counting, a label-free method that sums all the tandem mass spectra for a protein observed in an analysis. To assess the significance of the results, we recently combined the t-test and G-test with random permutation analysis, and we validated this approach biochemically. To automate the statistical method, we developed PepC, a software program that balances the tradeoff between the number of differentially expressed proteins identified and the false discovery rate. This tool can be applied to a wide range of proteomic datasets, making data analysis rapid, reproducible, and easily interpretable by proteomics specialists and non-specialists alike. AVAILABILITY AND IMPLEMENTATION: The software is implemented in Java. It has been added to the Trans Proteomic Pipeline project’s “Petunia” web interface, but can also be run as a command line program. The source code is LGPL-licensed and the program is freely available on the web at http://sashimi.svn.sourceforge.net/viewvc/sashimi/trunk/trans_proteomic_pipeline/src/Quantitation/Pepc CONTACT: levb@u.washington.edu or brian.pratt@insilicos.com.
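As a rough illustration of spectral counting and the G-test named in the abstract, the sketch below tallies spectra per protein and computes a two-condition G statistic. It is not PepC's code (PepC is written in Java) and it leaves out the t-test, the permutation analysis and the false discovery rate trade-off. It also assumes both conditions have comparable total spectra, so the expected count under the null hypothesis is simply the mean of the two observed counts. The function names and protein identifiers are hypothetical.

```python
import math
from collections import Counter


def spectral_counts(identifications):
    """Count tandem mass spectra per protein.

    `identifications` holds one protein identifier per identified spectrum.
    """
    return Counter(identifications)


def g_statistic(count_a, count_b):
    """G statistic for one protein observed in two conditions.

    Assumes equal expected counts in both conditions (an illustrative
    simplification). Zero counts contribute nothing to the sum.
    """
    total = count_a + count_b
    if total == 0:
        return 0.0
    expected = total / 2.0
    g = 0.0
    for observed in (count_a, count_b):
        if observed > 0:
            g += observed * math.log(observed / expected)
    return 2.0 * g


# Toy data (hypothetical identifiers and counts).
control = spectral_counts(["protein_A"] * 10 + ["protein_B"] * 20)
treated = spectral_counts(["protein_A"] * 10 + ["protein_B"] * 5)

for protein in sorted(set(control) | set(treated)):
    g = g_statistic(control[protein], treated[protein])
    print(protein, control[protein], treated[protein], round(g, 3))
```

A protein with identical counts in both conditions scores 0, and the statistic grows as the counts diverge. Normalising for unequal total spectra would be needed in practice.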
Inference of Combinatorial Boolean Rules of Synergistic Gene Sets from Cancer Microarray Datasets.
April 24, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Apr 21 PMID: 20410052
Authors: Park, I. – Lee, K. H. – Lee, D.
Journal: Bioinformatics
MOTIVATION: Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. RESULTS: We propose a new approach to inferring combinatorial Boolean rules of gene sets for a better understanding of the cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. AVAILABILITY: Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the supplementary file. CONTACT: dhlee@biosoft.kaist.ac.kr.
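The idea of scoring a Boolean rule by the class entropy of the samples it selects can be sketched as follows. This is an illustrative reading of the abstract, not the authors' scripts: the abstract does not say how entropy is turned into a significance level, so the sketch stops at the entropy itself. The sample records, the "up"/"down" activity encoding and the function names are assumptions, and the toy data are invented.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_samples(samples, rule):
    """Return samples whose gene set activities satisfy every condition.

    `rule` maps a gene set name to the required activity state; the
    conditions are combined with logical AND.
    """
    return [
        s for s in samples
        if all(s["activity"][gene_set] == state for gene_set, state in rule.items())
    ]


GLUT = "glutathione metabolism"
PROS = "prostaglandin synthesis regulation"

# Toy samples (invented for illustration only).
samples = [
    {"label": "cancer", "activity": {GLUT: "down", PROS: "down"}},
    {"label": "cancer", "activity": {GLUT: "down", PROS: "down"}},
    {"label": "normal", "activity": {GLUT: "up", PROS: "down"}},
    {"label": "normal", "activity": {GLUT: "up", PROS: "up"}},
]

rule = {GLUT: "down", PROS: "down"}
selected = select_samples(samples, rule)
print(class_entropy([s["label"] for s in samples]))   # entropy of all samples
print(class_entropy([s["label"] for s in selected]))  # entropy of selected samples
```

A rule that selects samples from a single phenotypic class has entropy 0, while a rule that selects an even mix of two classes has entropy 1 bit, so lower entropy indicates a rule that separates the classes more cleanly.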
Inferring combined CNV/SNP haplotypes from genotype data.
April 23, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Apr 20 PMID: 20406911
Authors: Su, S. Y. – Asher, J. E. – Jarvelin, M. R. – Froguel, P. – Blakemore, A. I. – Balding, D. J. – Coin, L. J.
Journal: Bioinformatics
MOTIVATION: Copy number variants (CNVs) are increasingly recognised as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few methods have been designed for inferring CNV haplotypic phase and none of these are applicable at genomewide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phase of CNVs and SNPs is inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. RESULTS: We generated diploid phase-known CNV-SNP genotype datasets by pairing male X chromosome CNV-SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration, and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset, a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap’s accuracy extends to real-life datasets. AVAILABILITY: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin CONTACT: l.coin@imperial.ac.uk.
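The evaluation strategy described in the results, pairing male X chromosome haplotypes to obtain diploid data with known phase, can be sketched in a few lines. This is not polyHap code and does not touch the hidden Markov model. The numeric encoding (copy number per haplotype at a CNV site, 0/1 allele at a SNP site), the function name and the toy haplotypes are assumptions for illustration.

```python
from itertools import combinations


def build_diploids(haplotypes):
    """Pair haploid haplotypes into phase-known diploid genotypes.

    `haplotypes` maps an individual identifier to a list of per-site values.
    The unphased genotype at each site is the sum over the two haplotypes;
    the original pair is kept as the known phase to score predictions against.
    """
    diploids = []
    for (id_a, hap_a), (id_b, hap_b) in combinations(sorted(haplotypes.items()), 2):
        genotype = [a + b for a, b in zip(hap_a, hap_b)]
        diploids.append({
            "pair": (id_a, id_b),
            "phase": (hap_a, hap_b),
            "genotype": genotype,
        })
    return diploids


# Toy haplotypes (invented): site 0 is a CNV copy number, sites 1-2 are SNP alleles.
haplotypes = {
    "male_1": [1, 0, 1],
    "male_2": [0, 1, 1],
    "male_3": [2, 0, 0],
}

diploids = build_diploids(haplotypes)
for d in diploids:
    print(d["pair"], d["genotype"])
```

Because a male carries a single X chromosome, each haplotype is observed directly, so a phasing method given only the summed genotypes can be checked against the stored pair.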
















