Next-Generation DNA Sequencing: Competition, Confusion and Fireworks in Florida

by Richard Wintle, PhD

There is considerable interest in "next-generation" DNA sequencing (NGS) technologies, so named as they represent a sea change over the existing "generation" of capillary electrophoresis (CE) instruments. The latter include the market-leading Applied Biosystems 3730xl and competing instruments from Beckman Coulter and GE Healthcare. The bulk of the Human Genome Project was accomplished with CE, as was the recent "Venter" diploid human genome [1].

More recently, the popular and scientific presses [2] have latched on to the idea of a "$1,000 genome", fuelled in large part by the overlapping goals of the Personal Genome Project (PGP) [3] and the Archon X-Prize for Genomics [4]. However, there seems little to no justification for this figure, other than that it fits into the price range currently charged for other genomic tests such as exon sequencing or array-CGH analysis. Sequencing an entire human genome to a reasonable depth of coverage is estimated to cost tens of millions of dollars using conventional Sanger capillary electrophoresis [5], and still costs hundreds of thousands of dollars using NGS [6].

In order to accomplish cheap and rapid whole-genome sequencing and many other large-scale sequencing projects such as metagenomic analysis, transcript profiling, and genome-wide chromatin immunoprecipitation sequencing, NGS technologies will need to be adopted. Some are already on the market, and funding initiatives by the US National Institutes of Health (NIH) as well as the X-Prize and PGP have been helping to drive innovation in this area.

The market leaders in the NGS field are clearly Roche, whose GS-FLX instrument relies on pyrosequencing technology, and Illumina, with its Genome Analyzer.
Applied Biosystems (AB) appears to be a solid third with its SOLiD platform, having essentially caught up in capability but not yet in market penetration. Other contenders, such as Helicos Biosciences, are well behind, and so far only the Roche, Illumina and AB systems are on the market. Roche is dominating the scientific literature, with more than 100 publications to date [7]; the Illumina system has now achieved some traction with about a dozen primary research publications [8].

However, for the researcher considering a purchase of any of these sequencing systems, the landscape is confusing: comparisons of run costs, read accuracy, and throughput are a labyrinth to navigate, and all of these systems, once shipping, duty and taxes, ancillary equipment and reagents are included, cost in the neighbourhood of half a million dollars (and in some cases much more). Choosing the wrong one would be an expensive mistake indeed.

The leading technologies all share a number of features: random ("shotgun") sequencing of target molecules (for example, an entire genome, a population of cDNAs, or pools of selected DNA fragments); a reliance on many sequencing reactions proceeding in parallel, spatially separated on a substrate such as a glass slide or microarray; a reliance on enzymatic chemistries and/or fluorescent labels to report DNA sequence; an imaging system to capture sequence information base-by-base, in real time as the reactions progress; and finally, significant back-end databasing and computational infrastructure required to re-assemble these many small fragments of the genome into coherent sequences long enough for further analysis (for example, genomic scaffolds or entire exon sequences). Notably, all of these technologies are very expensive at the present time, with single runs costing thousands of dollars.
All use read lengths shorter than conventional CE, with useful reads of about 30-35 bases for the Illumina and AB systems, and around 220 bases for the Roche GS-FLX. All are aiming for longer reads in future, and claims of up to 70 bases for Illumina and more than 400 for Roche have been made recently. Whether the error rate in the longest portions of these reads is acceptable is unclear.

Like Sanger sequencing, most of these technologies rely on amplification of the input DNA sample by the polymerase chain reaction (PCR), in order to achieve sufficient signal for detection. A growing number of technologies, most notably that of Helicos Biosciences, choose to rely on unamplified, single molecules of DNA, in order to both speed up the process and avoid any errors that might be introduced by PCR. Some, such as RNA polymerase-mediated sequencing [9], the zero-mode waveguide approach of Pacific Biosciences [10] and various nanopore-based approaches, also have the advantage of taking miniaturization to its extreme, with the aim of reducing cost as much as possible.

One main problem in comparing the performance of these systems is the confusing array of statistics provided: numbers of raw vs. "quality filtered" reads, number of bases vs. number of runs, and different methods of measuring and presenting error rates. In the highly competitive environment surrounding these technologies, each manufacturer is working hard to present its data in the best light, and the results are obfuscating at best, and at worst completely bewildering.

Nowhere was this more apparent than at the recent Advances in Genome Biology and Technology (AGBT) conference in Marco Island, Florida, which might well be renamed the "Next-Generation DNA Sequencing Conference". Other technologies were remarkably thin on the ground, and the intense sponsorship presence of the leading players (Applied Biosystems, Illumina and Roche) was a clear message that this was to be the focus of the conference.
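One way to put the read lengths and read counts quoted above on a common footing is to convert them into average depth of coverage: total bases sequenced divided by genome size. The short Python sketch below illustrates that arithmetic only. The function names, the 3-billion-base genome size and the 10-fold target are assumptions chosen for illustration, not figures from any vendor, and the calculation treats every read as usable.

```python
import math

# Assumption: a haploid human genome of roughly 3 billion bases.
HUMAN_GENOME_BASES = 3_000_000_000


def average_depth(n_reads, read_length, genome_size=HUMAN_GENOME_BASES):
    """Average depth of coverage = total bases sequenced / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(depth, read_length, genome_size=HUMAN_GENOME_BASES):
    """Smallest number of usable reads giving at least `depth`-fold coverage."""
    return math.ceil(depth * genome_size / read_length)


if __name__ == "__main__":
    # Read lengths are taken from the text; the 10-fold target is illustrative.
    for label, read_length in [("35-base reads", 35), ("220-base reads", 220)]:
        n = reads_needed(10, read_length)
        print(f"{label}: about {n / 1e6:.0f} million usable reads for 10-fold coverage")
```

In practice the number of raw reads required would likely be higher than this estimate, since reads lost to quality filtering and uneven coverage across the genome are not modelled here.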
Of existing NGS platforms, none was clearly dominant, although the Illumina Genome Analyzer has remarkable traction (in large part due to multi-instrument purchases by a number of different Genome Centers in the USA and abroad). Applied Biosystems' SOLiD is still obviously in third place, although competing claims about latest throughputs from Illumina and AB seem to indicate that these systems are relatively close in performance. Paired-end capability, long available on the 454/Roche platform and beginning to be available for the other two leading platforms, will be key for many applications including de novo genome sequencing and re-sequencing of complex genomes. Roche's capabilities here are already well established, and the rest of the meeting took on some feeling of a dogfight between AB and Illumina.

The biggest splash of the meeting, however, was undoubtedly provided by Pacific Biosciences, who chose to finally speak openly about their Zero-Mode-Waveguide based DNA sequencing technology. Using nanofabricated pores, each containing a single modified DNA polymerase, and reading out light signals from bases added to single template molecules in real time, the technology looks very promising and clearly generated excitement among the audience. It is not yet at the stage of sequencing real-world, complex template mixtures, but will be one to keep an eye on in years to come. With an A-list dinner panel discussion, and a fireworks show on the beach, the feeling certainly was that this year's AGBT was PacBio's coming-out party.

Long the "fourth player" in NGS, Helicos Biosciences does not appear to have developed their HeliScope instrument much in recent months. Throughputs are hardly approaching Helicos' heady predictions of a billion bases an hour, and it seems very unlikely that the promise of sequencing unamplified, single-molecule templates will offset the million-dollar-plus entry point for this instrument.
Helicos apparently still does not have beta instruments in the field (despite earlier predictions of a commercial release in mid-2007), which leaves one wondering whether this technology will ever be commercially viable.

By contrast, there seems to be widespread confusion surrounding Danaher Motion-Dover's "open-source" Polonator instrument, co-developed with George Church's academic group [11]. It is difficult to understand the business model behind this (admittedly cheap to purchase) platform and its concept of "open, affordable sequencing". Most researchers I spoke with were concerned about this platform's lack of track record, and about the ability of the vendor to service the instrument and provide applications support. The concept of open, freely-adaptable protocols and reagent sources was also confusing to many.

Interestingly, Shimadzu Corporation also had a presence at AGBT, and is apparently still working on a microfabricated, very high throughput Sanger-based sequencing platform. This idea has been around at least since Shimadzu's press release in early 2002 [12], and it will be interesting to see if it is ever commercialized, particularly since the major players in CE are not making any obvious moves toward higher-throughput, Sanger-chemistry based instruments.

One major pitfall of NGS technologies is their inability to provide targeted sequencing of specific regions, relying rather on shotgun sequencing of whatever DNA molecules are loaded on the instrument. Accordingly, approaches to select or target specific genomic regions prior to sequencing garnered a lot of attention, with recently published Nimblegen array capture technology dominating the discussions [13,14]. Other approaches, including bead-based solution capture using RNA copies of DNA oligo "baits", and a variety of long-PCR tiling approaches, were also presented.
One clear message is that even the best approaches result in at least 10-fold differences in representation of different captured regions (for example, exons of a gene, or adjacent parts of a contiguous genomic region). For de novo mutation detection, this may not be a problem provided sufficient depth of coverage exists in any given region studied; for quantitation (for example, measuring genomic copy number) this is clearly an issue with no obvious solution.

Interestingly, there was remarkably little discussion of a previous "hot button" topic: homopolymer tracts and the relative ability of the various instruments to read through them. This perhaps indicates that this problem has been solved to everyone's satisfaction, or (more likely) that it has been accepted as an unavoidable limitation of the technologies and that some degree of user fatigue with this topic has set in.

Finally, it is clear from discussions with users that there is still a real need for better assembly algorithms and programs for short-read NGS technologies, and that every user is struggling with the tremendous volumes of data generated, even from a single instrument. In the very near future, the biggest challenges facing researchers embracing these new technologies will not be in thinking of interesting experiments to do, nor even in performing them - they will be in finding places to put the terabytes of generated data, in choosing how long to archive the enormous primary image files generated, in building or accessing high-performance cluster computing resources for their analysis, and in creating efficient, manageable pipelines for the analysis of hundreds of thousands of short sequencing reads.

These are exciting times for genomics researchers, but with exciting times usually come difficult challenges and barriers to entry. How tractable these challenges are to individual researchers, core facilities or large genome centres will become more apparent in the very near future.

1.
Levy S, Sutton G, Ng PC, et al. (2007). The diploid genome sequence of an individual human. PLoS Biol. 5(10):e254.
2. Service RF (2006). The race for the $1000 genome. Science 311:1544-1546.
3. Church GM (2006). Genomes for all. Sci. Amer. 294(1):46-54.
4. Perkel JM (2006). Who wants the X Prize? The Scientist 20(12):65, December 2006.
5. Berka J (2006). Enabling routine sequencing of individual genomes. Genetic Engineering News, June 2006 suppl.:12-15.
6. Karow J (2008). Illumina, ABI testing how their next-gen tools can sequence a human genome. In Sequence, February 26, 2008.
7. Data from http://www.454.com/news-events/publications.asp
8. Data from http://www.illumina.com/pagesnrn.ilmn?ID=93
9. Greenleaf WJ and Block SM (2006). Single-molecule, motion-based DNA sequencing using RNA polymerase. Science 313:801.
10. Levene MJ, Korlach J, Turner SW, et al. (2003). Zero-mode waveguides for single-molecule analysis at high concentrations. Science 299:682-686.
11. Information at http://www.polonator.org
12. Shimadzu announces development of next-generation DNA sequencer. Shimadzu Corporation press release, JCN Newswire, March 19, 2002.
13. Albert TJ, Molla MN, Muzny DM, et al. (2007). Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4(11):903-5.
14. Hodges E, Xuan Z, Balija V, et al. (2007). Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 39(12):1522-7.