A computational genomics pipeline for prokaryotic sequencing projects.
June 5, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Jun 2 PMID: 20519285
Authors: Kislyuk, A. O. – Katz, L. S. – Agrawal, S. – Hagen, M. S. – Conley, A. B. – Jayaraman, P. – Nelakuditi, V. – Humphrey, J. C. – Sammons, S. A. – Govil, D. – Mair, R. D. – Tatti, K. M. – Tondella, M. L. – Harcourt, B. H. – Mayer, L. W. – Jordan, I. K.
Journal: Bioinformatics
MOTIVATION: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation, and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. RESULTS: We present a self-contained, automated, high-throughput, open-source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes, combining the output of multiple gene predictors, and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. AVAILABILITY AND IMPLEMENTATION: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell, and MySQL and is compatible with Linux and other Unix systems. CONTACT: king.jordan@biology.gatech.edu SUPPLEMENTARY INFORMATION: See http://nbase.biology.gatech.edu.
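The abstract mentions combining gene predictors but does not say how predictions are reconciled. A minimal sketch, assuming a simple majority vote in which a gene call counts only if enough predictors report exactly the same (contig, start, end, strand); the function name `combine_predictions`, the `min_votes` threshold and the predictor labels are hypothetical and are not taken from the pipeline's Perl code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A gene call: (contig, start, end, strand). Coordinates are illustrative only.
Gene = Tuple[str, int, int, str]


def combine_predictions(calls: Dict[str, List[Gene]], min_votes: int = 2) -> List[Gene]:
    """Keep gene calls reported identically by at least `min_votes` predictors.

    Hypothetical majority-vote rule; a real combiner may also reconcile
    differing start codons or weight predictors by accuracy.
    """
    votes: Counter = Counter()
    for genes in calls.values():
        # set() so one predictor cannot vote twice for the same call
        votes.update(set(genes))
    return sorted(g for g, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    calls = {
        "predictorA": [("ctg1", 100, 400, "+"), ("ctg1", 900, 1500, "-")],
        "predictorB": [("ctg1", 100, 400, "+"), ("ctg1", 2000, 2300, "+")],
        "predictorC": [("ctg1", 100, 400, "+"), ("ctg1", 900, 1500, "-")],
    }
    print(combine_predictions(calls))
    # prints [('ctg1', 100, 400, '+'), ('ctg1', 900, 1500, '-')]
```

Exact-coordinate voting is the simplest possible rule; it is strict about start sites, which is where prokaryotic gene predictors most often disagree.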
EuGene-maize: a web site for maize gene prediction.
April 21, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Apr 16 PMID: 20400755
Authors: Montalent, P. – Joets, J.
Journal: Bioinformatics
MOTIVATION: A large part of the maize B73 genome sequence is now available, and emerging sequencing technologies will offer cheap and easy ways to sequence areas of interest from many other maize genotypes. One of the steps required to turn these sequences into valuable information is gene content prediction. To date, there is no publicly available gene predictor specifically trained for maize sequences. We therefore trained the EuGene software, which can combine several sources of evidence into a consolidated gene model prediction, on maize sequences. AVAILABILITY: http://genome.jouy.inra.fr/eugene/cgi-bin/eugene_form.pl CONTACT: joets@moulon.inra.fr SUPPLEMENTARY INFORMATION: The training gene set.
Structural Variation Analysis with Strobe Reads.
April 11, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Apr 8 PMID: 20378554
Authors: Ritz, A. – Bashir, A. – Raphael, B. J.
Journal: Bioinformatics
MOTIVATION: Structural variation, including deletions, duplications and rearrangements of DNA sequence, is an important contributor to genome variation in many organisms. In humans, many structural variants are found in complex and highly repetitive regions of the genome, making their identification difficult. A new sequencing technology called strobe sequencing generates strobe reads containing multiple subreads from a single contiguous fragment of DNA. Strobe reads thus generalize the concept of paired reads, or mate pairs, that have been routinely used for structural variant detection. Strobe sequencing holds promise for unraveling complex variants that have been difficult to characterize with current sequencing technologies. RESULTS: We introduce an algorithm for identification of structural variants using strobe sequencing data. We consider strobe reads from a test genome that have multiple possible alignments to a reference genome due to sequencing errors and/or repetitive sequences in the reference. We formulate the combinatorial optimization problem of finding the minimum number of structural variants in the test genome that are consistent with these alignments. We solve this problem using an integer linear program. Using simulated strobe sequencing data, we show that our algorithm has better sensitivity and specificity than paired read approaches for structural variation identification. CONTACT: braphael@brown.edu.
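To make the optimization objective concrete: each strobe read has several candidate alignments, each candidate implies some set of structural variants, and the goal is to pick one alignment per read so that the total number of distinct variants is as small as possible. The authors solve this with an integer linear program; the sketch below swaps in exhaustive search, which is only practical for a handful of reads. The input encoding (each candidate alignment reduced to a set of variant labels such as `del1`) and the function name `min_variant_explanation` are assumptions for illustration.

```python
from itertools import product
from typing import FrozenSet, Optional, Sequence, Tuple


def min_variant_explanation(
    reads: Sequence[Sequence[FrozenSet[str]]],
) -> Tuple[FrozenSet[str], Tuple[int, ...]]:
    """Choose one candidate alignment per read minimizing the union of implied variants.

    reads[i][j] is the (hypothetical) set of variants implied if read i takes
    alignment j; an empty set means the alignment agrees with the reference.
    Returns (variants, chosen alignment index per read).
    Exhaustive search: exponential in the number of reads, toy inputs only.
    """
    if any(len(cands) == 0 for cands in reads):
        raise ValueError("every read needs at least one candidate alignment")
    best_vars: Optional[FrozenSet[str]] = None
    best_choice: Tuple[int, ...] = ()
    # product over index ranges enumerates every joint assignment of alignments
    for choice in product(*(range(len(cands)) for cands in reads)):
        implied = frozenset().union(*(reads[i][j] for i, j in enumerate(choice)))
        if best_vars is None or len(implied) < len(best_vars):
            best_vars, best_choice = implied, choice
    assert best_vars is not None  # product() yields at least one tuple here
    return best_vars, best_choice


if __name__ == "__main__":
    reads = [
        [frozenset({"del1"}), frozenset({"inv1"})],
        [frozenset({"del1"}), frozenset({"dup2"})],
        [frozenset({"inv1", "dup2"}), frozenset({"del1"})],
    ]
    print(min_variant_explanation(reads))
    # prints (frozenset({'del1'}), (0, 0, 1))
```

In this toy input a single deletion explains all three reads, so the parsimonious answer prefers it over the inversion and duplication that individual alignments suggest. An ILP encodes the same objective with a binary variable per read-alignment pair and per candidate variant, which scales far better than enumeration.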
Genome Analysis and Bioinformatics: A Practical Approach
March 23, 2010 by BioinformaticsDirectory.com
Product Description
With the decoding of the whole-genome sequences of many organisms, new avenues of research have emerged in computational biology. The scientific community has free access to genome sequence data through public databases, but it is often hard to make sense of such large volumes of DNA and protein sequence. Bioinformatics tools are therefore used to handle, store and analyze genome sequence data for the benefit of mankind. The book has been written as simply as possible so that everyone can understand the basic concepts of genome sequence analysis and bioinformatics. It is structured so that readers first learn how whole-genome sequences are generated with high-throughput DNA sequencing technologies and then stored in biological databases. The second part covers the basic principles of sequence analysis and the application of software tools, with practical exercises. The third part discusses data mining approaches for the discovery of genes and DNA markers. A glossary of important terms and an introduction to basic bioinformatics software are also included. The book will serve as a textbook for B. Tech (Bioinformatics & Biotechnology) students and as a useful reference for postgraduate students and research scientists working in the life sciences, genomics, biotechnology and molecular biology, as well as for Master of Computer Applications (MCA) students interested in bioinformatics.
BEDTools: a flexible suite of utilities for comparing genomic features.
March 10, 2010 by BioinformaticsDirectory.com
Publication Date: 2010 Mar 15 PMID: 20110278
Authors: Quinlan, A. R. – Hall, I. M.
Journal: Bioinformatics
MOTIVATION: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. RESULTS: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. AVAILABILITY AND IMPLEMENTATION: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools CONTACT: aaronquinlan@gmail.com; imh4y@virginia.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
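The core question BEDTools answers, which features in one file overlap features in another, can be illustrated in a few lines. A minimal sketch, assuming BED's 0-based, half-open coordinates and reporting each feature of `a` once if it overlaps anything in `b` (similar in spirit to the suite's intersect tool with `-u`). The names `parse_bed` and `overlapping` are hypothetical, and the indexing used here (sorted starts plus a running maximum of end coordinates per chromosome) is a substitute for whatever BEDTools does internally, not its implementation.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end): 0-based, half-open as in BED


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns; skip blank, comment, track and browser lines."""
    out: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        out.append((chrom, int(start), int(end)))
    return out


def overlapping(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return features of `a` that overlap at least one feature of `b` on the same chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, s, e in b:
        by_chrom[chrom].append((s, e))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # max_end[i] = largest end among the first i+1 intervals sorted by start
        max_end = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_end)
    hits: List[Interval] = []
    for chrom, s, e in a:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        k = bisect_left(starts, e)  # intervals [0, k) start before this feature ends
        # half-open overlap: some b has b.start < e and b.end > s
        if k and max_end[k - 1] > s:
            hits.append((chrom, s, e))
    return hits


if __name__ == "__main__":
    a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
    b = [("chr1", 150, 160), ("chr1", 400, 500), ("chr2", 50, 60)]
    print(overlapping(a, b))
    # prints [('chr1', 100, 200)]
```

The second and third features of `a` only touch features of `b` at a shared coordinate, which half-open intervals treat as non-overlapping. Because the output is again a list of intervals, calls like this compose much as the abstract describes chaining the tools with one another.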