Saturday, February 25, 2012

AGBT 2012

Like half of all blog posts on the internet, this one has to start with an apology for neglecting to update the blog regularly. No reason to get into details, so I'll go straight to the post.

Last week, while I was learning to tango in Buenos Aires, my colleagues presented a poster we put together on the procedure we spent many hours establishing for hybrid de novo assemblies with Illumina and 454 sequencing data. The experience was great and was perhaps my first front-to-back computational research project. We achieved some synergistic results and have since incorporated the procedures into our own analysis offerings. We are receiving some flattering attention from the Broad Institute as well as from around the web. Below is the abstract and here is the poster.


Evaluation of Strategies for De Novo Assembly of Genomes and Transcriptomes Using Combined Illumina and Roche 454 Sequencing Data.

Jon R. Armstrong, Jarret I. Glasscock, Ian Schillebeeckx, and Navish Dadighat.
Cofactor Genomics, St. Louis, MO

The emergence of next-generation sequencing platforms has made low-cost sequencing an attractive approach for de novo assembly of genomes and transcriptomes. The two most widely used next-generation platforms for de novo assembly are the Illumina and the Roche 454, and each system has particular strengths and weaknesses. The Illumina generates short reads (100 bp) while the Roche 454 FLX+ produces read lengths close to Sanger (800 bp); however, the price per base on the Roche 454 machine is approximately 50x higher than on the Illumina platform. Few studies exist that employ assembly of both read types [1, 2], and the methods used are not explicitly described. If strategies were determined to assemble data from both platforms, capitalizing on the strengths of each system while minimizing the weaknesses and cost, researchers could apply them during future de novo assembly projects. To this end, we evaluated multiple strategies for de novo assembly of combined Illumina and Roche 454 sequencing data, originating from several genomes and transcriptomes of varying size and origin. Our analysis shows that assembly of the Roche 454 reads, prior to combining with Illumina raw reads, produces the best assembly metrics for genomes and transcriptomes; however, depending on the type of assembly, it may be detrimental to assemble Illumina reads prior to combining with Roche 454 data. We will also present the types of assemblers used and workflows specific to genome and transcriptome assemblies.

1. Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL. De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res. 2009 Feb;19(2):294–305.

2. Nowrousian M, Stajich JE, Chu M, Engh I, Espagne E, Halliday K, Kamerewerd J, Kempken F, Knab B, Kuo HC, Osiewacz HD, Pöggeler S, Read ND, Seiler S, Smith KM, Zickler D, Kück U, Freitag M. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis. PLoS Genet. 2010 Apr 8;6(4):e1000891.
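
The abstract compares strategies by their "assembly metrics" without naming them. One widely used metric for this kind of comparison is N50: the contig length at which contigs of that length or longer cover at least half of the total assembled bases. Below is a minimal Python sketch, assuming the contig lengths are already available as a list of integers. The function name `n50` and the example lengths are illustrative only and do not come from the poster.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for empty input).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Made-up contig lengths for two hypothetical assembly strategies.
strategy_a = [2, 2, 2, 3, 3, 4, 8, 8]
strategy_b = [100, 200, 300, 400]

print("strategy A N50:", n50(strategy_a))
print("strategy B N50:", n50(strategy_b))
```

N50 alone says nothing about correctness, so in practice it is usually read alongside other numbers such as contig count and total assembled length.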