PRIMe: a method for characterization and evaluation of pleiotropic regions from multiple genome-wide association studies
May 1, 2011 by Bioinformatics Computational Biology
Motivation: The concept of pleiotropy was proposed a century ago, yet few efforts have been made to design robust statistics and software for visualizing and evaluating pleiotropy at a regional level. The Pleiotropic Region Identification Method (PRIMe) was developed to evaluate potentially pleiotropic loci using data from multiple genome-wide association studies (GWAS).
Methods: We first provide a software tool to systematically identify and characterize genomic regions where low association P-values are observed with multiple traits. We use the term Pleiotropy Index to denote the number of traits with low association P-values at a particular genomic region. For GWAS assumed to be uncorrelated, we adopted the binomial distribution to approximate the statistical significance of the Pleiotropy Index. For GWAS conducted on traits with known correlation coefficients, simulations are performed to derive the statistical distribution of the Pleiotropy Index under the null hypothesis of no genotype–phenotype association. For six hematologic and three blood pressure traits for which full GWAS results were available from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, we estimated the trait correlations and applied the simulation approach to examine genomic regions with statistical evidence of pleiotropy. We then applied the approximation approach to explore GWAS summarized in the National Human Genome Research Institute (NHGRI) GWAS Catalog.
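The two ways of assessing the Pleiotropy Index (PI) can be illustrated with a short Python sketch. This is not the PRIMe Perl/R code. It rests on three assumptions of our own:

- A region's PI is the number of traits whose smallest P-value in that region falls below a threshold.
- The binomial approximation treats each of the N traits as independently reaching that threshold with probability q under the null.
- The simulation collapses the region to a single SNP. Its null z-scores across traits are drawn from a multivariate normal with the trait correlation matrix.

All function names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical helpers illustrating the Pleiotropy Index (PI); not the PRIMe scripts.

def pleiotropy_index(region_pvalues, threshold):
    """Count traits whose smallest P-value in the region is below `threshold`.

    `region_pvalues` maps trait name -> list of SNP P-values in the region.
    """
    return sum(1 for ps in region_pvalues.values() if ps and min(ps) < threshold)

def binomial_pi_pvalue(k, n_traits, q):
    """P(PI >= k) assuming PI ~ Binomial(n_traits, q), i.e. uncorrelated traits."""
    return sum(math.comb(n_traits, i) * q**i * (1 - q) ** (n_traits - i)
               for i in range(k, n_traits + 1))

def _cholesky(corr):
    """Lower-triangular L with L @ L.T == corr (corr must be positive definite)."""
    n = len(corr)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(corr[i][i] - s)
            else:
                low[i][j] = (corr[i][j] - s) / low[j][j]
    return low

def simulated_pi_pvalue(k, corr, alpha, n_sims=20000, seed=1):
    """Monte Carlo estimate of P(PI >= k) under the null for correlated traits.

    Simplification (assumption): one SNP per region, with null z-scores
    across traits drawn from a multivariate normal with correlation `corr`.
    """
    low = _cholesky(corr)
    n = len(corr)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent normals: z = L e.
        z = [sum(low[i][m] * e[m] for m in range(i + 1)) for i in range(n)]
        # Two-sided P-value of a standard-normal z-score is erfc(|z| / sqrt(2)).
        pi = sum(1 for zi in z if math.erfc(abs(zi) / math.sqrt(2)) < alpha)
        if pi >= k:
            hits += 1
    return hits / n_sims
```

Take nine uncorrelated traits with q = 0.05. Then `binomial_pi_pvalue(2, 9, 0.05)` is about 0.071. With an identity correlation matrix, `simulated_pi_pvalue` should approach the same value. Positive trait correlations make a high PI more likely by chance, which is why the correlated case calls for simulation rather than the binomial formula.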
Results: By simulation, we identified pleiotropic regions, including SH2B3 and BRAP (12q24.12), for hematologic and blood pressure traits. By approximation, we confirmed the genome-wide significant pleiotropy of these two regions in the GWAS Catalog data; our exploration of other regions also highlighted the FTO, GCKR and ABO regions.
Availability and Implementation: The Perl and R scripts are available at http://www.framinghamheartstudy.org/research/gwas_pleiotropictool.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics – recent issues
aCGH.Spline–an R package for aCGH dye bias normalization
May 1, 2011 by Bioinformatics Computational Biology
Motivation: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH—usually Cy5 and Cy3—can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries.
Results: We developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which removes a large portion of the dye bias from large aCGH datasets quickly and efficiently.
Conclusions: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to accurately detect both copy number variation (CNV) and copy number alterations (CNA).
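Spline-based dye-bias removal can be illustrated with a minimal Python sketch. We do not know aCGH.Spline's exact procedure, so this block uses a common way of modelling intensity-dependent dye bias instead:

- It computes log-ratios M = log2(Cy5/Cy3) and average log-intensities A.
- It fits a natural cubic spline through the medians of M within A-ordered bins.
- It subtracts the fitted curve from every M value.

The block does not reproduce the package's outlier-interpolation step. Points outside the knot range are simply extended linearly from the spline's end slopes. The function names and the bin count are hypothetical.

```python
import bisect
import math
import statistics

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs, ys).

    xs must be strictly increasing with at least two points. Outside
    [xs[0], xs[-1]] the curve is extended linearly using the end slopes.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives; natural boundary: m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for k in range(1, len(diag)):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * len(diag)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(len(diag) - 2, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n - 1] = sol

    def evaluate(t):
        if t <= xs[0]:  # linear extension to the left
            slope = (ys[1] - ys[0]) / h[0] - m[1] * h[0] / 6.0
            return ys[0] + slope * (t - xs[0])
        if t >= xs[-1]:  # linear extension to the right
            slope = (ys[-1] - ys[-2]) / h[-1] + m[-2] * h[-1] / 6.0
            return ys[-1] + slope * (t - xs[-1])
        i = bisect.bisect_right(xs, t) - 1  # xs[i] <= t < xs[i + 1]
        hs, a, b = h[i], xs[i + 1] - t, t - xs[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * hs)
                + (ys[i] / hs - m[i] * hs / 6.0) * a
                + (ys[i + 1] / hs - m[i + 1] * hs / 6.0) * b)

    return evaluate

def remove_dye_bias(cy5, cy3, n_bins=10):
    """Return dye-bias-corrected log2 ratios (hypothetical helper).

    Assumes at least n_bins probes and distinct bin medians of A.
    """
    m_vals = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a_vals = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    order = sorted(range(len(a_vals)), key=a_vals.__getitem__)
    size = len(order) // n_bins
    knots_x, knots_y = [], []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        knots_x.append(statistics.median(a_vals[i] for i in idx))
        knots_y.append(statistics.median(m_vals[i] for i in idx))
    curve = natural_cubic_spline(knots_x, knots_y)
    return [mv - curve(av) for mv, av in zip(m_vals, a_vals)]
```

Binned medians keep the fitted curve robust to individual aberrant probes. True copy number changes should therefore survive as deviations from the curve rather than being absorbed into it.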
Contact: l.larcombe@cranfield.ac.uk;
Supplementary information: Supplementary data are available at Bioinformatics online.
Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software
April 14, 2011 by Bioinformatics Computational Biology
Summary: There is a strong and growing need in the biology research community for accurate, automated image analysis. Here, we describe CellProfiler 2.0, which has been engineered to meet the needs of its growing user base. It is more robust and user-friendly, with new algorithms and features to facilitate high-throughput work. ImageJ plugins can now be run within a CellProfiler pipeline.
Availability and Implementation: CellProfiler 2.0 is free and open source, available at http://www.cellprofiler.org under the GPL v. 2 license. It is available as a packaged application for Macintosh OS X and Microsoft Windows and can be compiled for Linux.
Contact: anne@broadinstitute.org
Supplementary information: Supplementary data are available at Bioinformatics online.
BiC: a web server for calculating bimodality of coexpression between gene and protein networks
April 14, 2011 by Bioinformatics Computational Biology
Summary: Bimodal patterns of expression have recently been shown to be useful not only for prioritizing genes that distinguish phenotypes but also for prioritizing network models that correlate with proteomic evidence. In particular, subgroups of strongly coexpressed gene pairs increase the variance of the correlation distribution. This variance, a measure of association between sets of genes (or proteins), can be summarized as the bimodality of coexpression (BiC). We developed an online tool to calculate the BiC for user-defined gene lists and associated mRNA expression data. BiC is a comprehensive application that lets researchers analyze both publicly available and user-collected array data.
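A small Python sketch can show the quantity that BiC summarizes: the spread of pairwise gene-gene correlations within a gene list. It rests on two assumptions of our own. Correlation is taken as Pearson correlation across samples, and the plain population variance of those correlations stands in for the statistic. The web server's exact BiC definition may differ, and `coexpression_variance` is a hypothetical name.

```python
import itertools
import math
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coexpression_variance(expr):
    """Population variance of all pairwise Pearson correlations.

    `expr` maps gene -> expression values measured on the same samples.
    """
    corrs = [pearson(expr[g1], expr[g2])
             for g1, g2 in itertools.combinations(expr, 2)]
    return statistics.pvariance(corrs)

# Two tightly coexpressed pairs that are uncorrelated with each other.
genes = {
    "g1": [1, -1, 1, -1], "g2": [2, -2, 2, -2],
    "g3": [1, 1, -1, -1], "g4": [6, 6, 4, 4],
}
print(coexpression_variance(genes))  # 2/9, about 0.222
```

In the example, the pairwise correlations are 1, 1 and four zeros, a two-peaked distribution whose variance is 2/9. A gene list with uniformly moderate correlations would give a variance near zero.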
Availability: The freely available web service and the documentation can be accessed at http://gurkan.case.edu/software.
Contact: gurkan@case.edu
SiGN-SSM: open source parallel software for estimating gene networks with state space models
April 14, 2011 by Bioinformatics Computational Biology
Summary: SiGN-SSM is open-source gene network estimation software that runs in parallel on PCs and on massively parallel supercomputers. The software estimates a state space model (SSM), a statistical dynamic model suited to analyzing short and/or replicated time-series gene expression profiles. SiGN-SSM implements a novel parameter constraint that effectively stabilizes the estimated models. On a supercomputer, it can also determine the gene network structure by a statistical permutation test within a practical time. SiGN-SSM can be used not only to analyze temporal regulatory dependencies between genes but also to extract differentially regulated genes from time-series expression profiles.
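What "estimating a state space model" involves can be sketched by evaluating the likelihood of a toy SSM with the Kalman filter. This assumes the linear-Gaussian form x_t = f x_{t-1} + v_t, y_t = h x_t + w_t, reduced to a single hidden state and a single observed gene. SiGN-SSM's own model is multivariate and adds the parameter constraint described above, so this is an illustration, not the package's implementation. The function names and parameter values are hypothetical.

```python
import math
import random

def kalman_loglik(ys, f, h, q, r, x0=0.0, p0=1.0):
    """Log-likelihood of ys under a scalar linear-Gaussian SSM:
    x_t = f * x_{t-1} + v_t, v_t ~ N(0, q);  y_t = h * x_t + w_t, w_t ~ N(0, r).
    """
    x, p, ll = x0, p0, 0.0
    for y in ys:
        # Predict the hidden state one step ahead.
        x_pred = f * x
        p_pred = f * f * p + q
        # Innovation (prediction error) and its variance.
        s = h * h * p_pred + r
        innov = y - h * x_pred
        ll += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Update the state estimate with the Kalman gain.
        k = p_pred * h / s
        x = x_pred + k * innov
        p = (1.0 - k * h) * p_pred
    return ll

def simulate(n, f, h, q, r, seed=0):
    """Draw n observations from the same scalar SSM (for illustration)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(n):
        x = f * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(h * x + rng.gauss(0.0, math.sqrt(r)))
    return ys

ys = simulate(300, f=0.9, h=1.0, q=1.0, r=0.25)
print(kalman_loglik(ys, 0.9, 1.0, 1.0, 0.25) > kalman_loglik(ys, -0.9, 1.0, 1.0, 0.25))
```

Estimation would then choose f, h, q and r to maximize this log-likelihood, for example with the EM algorithm. In the multivariate case, the fitted transition and observation matrices encode the temporal dependencies between genes.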
Availability: SiGN-SSM is distributed under the GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. Pre-compiled binaries for some architectures are available in addition to the source code, and pre-installed binaries are available on the Human Genome Center supercomputer system. The online manual and supplementary information for SiGN-SSM are available on our website.
Contact: tamada@ims.u-tokyo.ac.jp