Genomes for All



Next-generation technologies that make reading DNA fast, cheap and widely accessible are coming in less than a decade. Their potential to revolutionize research and bring about the era of truly personalized medicine means the time to start preparing is now.

By George M. Church

SCIENTIFIC AMERICAN, JANUARY 2006. COPYRIGHT 2005 SCIENTIFIC AMERICAN, INC.

When the World Wide Web launched in 1993, it seemed to catch on and spread overnight, unlike most new technologies, which typically take at least a decade to move from first “proof of concept” to broad acceptance. But the Web did not really emerge in a single year. It built on infrastructure, including the construction of the Internet between 1965 and 1993, as well as a sudden recognition that resources, such as personal computers, had passed a critical threshold. Vision and market forces also push the development and spread of new technologies. The space program, for example, started with a government vision, and only much later did military and civilian uses for satellites propel the industry to commercial viability.

Looking forward to the next technological revolution, which may be in biotechnology, one can begin to imagine what markets, visions, discoveries and inventions may shape its outcome and what critical thresholds in infrastructure and resources will make it possible.

In 1984 and 1985, I was among a dozen or so researchers who proposed a Human Genome Project (HGP) to read, for the first time, the entire instruction book for making and maintaining a human being contained within our DNA. The project’s goal was to produce one full human genome sequence for $3 billion between 1990 and 2005.

We managed to finish the easiest 93 percent a few years early and to leave a legacy of useful technologies and methods. Their ongoing refinement has brought the street price of a human genome sequence accurate enough to be useful down to about $20 million today. Still, that rate means large-scale genetic sequencing is mostly confined to dedicated sequencing centers and reserved for big, expensive research projects.

READING DNA
Many techniques for decoding genomes capitalize on the complementary base-pairing rule of DNA. The genomic alphabet contains only four letters, elemental units called bases: adenine (A), cytosine (C), guanine (G) and thymine (T). They pair with each other (A with T; C with G) to form the rungs of the classic DNA ladder. The message encoded in the sequence of bases along a strand of DNA is effectively written twice, because knowing the identity of a base on one strand reveals its complement on the other strand. Living cells use this rule to copy and repair their own DNA molecules (below), and it can be exploited to copy and label DNA of interest, as in the sequencing technique developed by Frederick Sanger in the 1970s that is still the basis of most sequencing performed today.

DNA COPYING IN CELLS
When a cell copies its own DNA, the strands separate and an enzyme called polymerase uses each original strand as a template to synthesize new complementary chains of nucleotides. A second enzyme called ligase stitches these fragments into a continuous strand, matching them to the original.
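
The pairing rule in the box is simple enough to state as code. The following is a minimal Python sketch added for illustration, not part of the original article: the function names are our own, and the reversal step reflects the standard fact that the two strands of the ladder run in opposite directions, which the box does not discuss.

```python
# Pairing rule from the box: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Partner strand, base by base, under the pairing rule (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (the two strands are antiparallel)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    template = "ACGTTG"
    print(complement(template))          # TGCAAC
    print(reverse_complement(template))  # CAACGT
    # The message is "written twice": complementing twice restores the original.
    assert complement(complement(template)) == template
```

Because the rule is a fixed one-to-one mapping, reading either strand is enough to reconstruct the other, which is the property the sequencing methods below exploit.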

The “$1,000 genome” has become shorthand for the promise of DNA-sequencing capability made so affordable that individuals might think the once-in-a-lifetime expenditure to have a full personal genome sequence read to a disk for doctors to reference is worthwhile. Cheap sequencing technology will also make that information more meaningful by multiplying the number of researchers able to study genomes and the number of genomes they can compare to understand variations among individuals in both sickness and health.

“Human” genomics extends beyond humans, as well, to an environment full of pathogens, allergens and beneficial microbes in our food and our bodies. Many people attend to weather maps; perhaps we might one day benefit from daily pathogen and allergen maps. The rapidly growing fields of nanotechnology and industrial biotechnology, too, might accelerate their mining of biomes for new “smart” materials and microbes that can be harnessed for manufacturing or bioremediation of pollution.

The barrier to these applications and many more, including those we have yet to imagine, remains cost. Two National Institutes of Health funding programs for “Revolutionary Genome Sequencing Technologies” challenge scientists to achieve a $100,000 human genome by 2009 and a $1,000 genome by 2014. An X Prize–style cash reward for the first group to attain such benchmarks is also a possibility. And these goals are already close.

A survey of the new approaches in development for reading genomes illustrates the potential for breakthroughs that could produce a $20,000 human genome as soon as four years from now, and it brings to light some considerations that will arise once it arrives.

Overview/DNA Revolutions
■ Biotechnology’s full potential may only be realized when its tools, such as genome-reading technology, are as inexpensive and accessible as personal computers today.
■ New approaches to reading DNA reduce costs by cutting preparatory steps, radically miniaturizing equipment and sequencing millions of molecules simultaneously.
■ Reaching the goal of low-cost sequencing will raise new questions about how abundant personal genetic information is best used and by whom. The Personal Genome Project is an attempt to begin exploring these issues.

[Illustrations: Terese Winslow; Stuart Bradford]

Reinventing Gene Reading

With any sequencing method, the size, structure and function of DNA itself can present obstacles or be turned into advantages.

The human genome is made up of three billion pairs of nucleotide molecules. Each nucleotide contains one of four types of bases, abbreviated A, C, G and T, that represent a genomic alphabet encoding the information stored in DNA. Bases typically pair off according to strict rules to form the rungs in the ladderlike DNA structure. Because of these pairing rules, reading the sequence of bases along one half of the ladder reveals the complementary sequence on the other side as well.

Our three-billion-base-long genome is broken into 23 separate chromosomes. People usually have two full sets of these, one from each parent, that differ by 0.01 percent, so that an individual’s personal genome can really be said to contain six billion base pairs. Identifying individual bases in a stretch of the genome requires a sensor that can detect the subnanometer-scale differences between the four base types. Scanning tunneling microscopy is one physical method that can visualize these tiny structures and their subtle distinctions. For reading millions or billions of bases, however, most sequencing techniques rely at some stage on chemistry.

A method developed by Frederick Sanger in the 1970s became the workhorse of the HGP and is still the basis of most sequencing performed today. Sometimes described as sequencing by separation, the technique requires several rounds of duplication to produce large numbers of copies of the genome stretch of interest. The final round yields copy fragments of varying lengths, each terminating with a fluorescently tagged base. Separating these fragments by size in a process called electrophoresis, then reading the fluorescent signal of each terminal tag as it passes by a viewer, provides the sequence of bases in the original strand [see box “Reading DNA”].

Reliability and accuracy are advantages of Sanger sequencing, although even with refinements over the years, the method remains time-consuming and expensive.

READING DNA (continued)
1. During sequencing, an original DNA strand is broken into smaller fragments and cloned within colonies of Escherichia coli bacteria. Once extracted from the bacteria, the DNA fragments will undergo another massive round of copying, known as amplification, by a process called polymerase chain reaction (PCR).
2. In PCR, fragments are first heated so they will separate into single strands. A short nucleotide sequence called a primer is then annealed to each original template. Starting at the primer, polymerase links free-floating nucleotides (called dNTPs) into new complementary strands. The process is repeated over and over to generate millions of copies of each fragment.
3. Single-stranded fragments are next tagged in a process similar to PCR but with fluorescently labeled terminator nucleotides (ddNTPs) added to the mixture of primers, polymerase and dNTPs. Complementary strands are built until by chance a ddNTP is incorporated, halting synthesis. The resulting copy fragments have varying lengths and a tagged nucleotide at one end.
4. Capillary electrophoresis separates the fragments, which are negatively charged, by drawing them toward a positively charged pole. Because the shortest fragments move fastest, their order reflects their size and their ddNTP terminators can thus be “read” as the template’s base sequence. Laser light activates the fluorescent tags as the fragments pass a detection window, producing a color readout that is translated into a sequence.
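
Steps 3 and 4 can be mimicked in a few lines. This Python sketch is an idealization we have added: it assumes that, across millions of copies, a terminator stops synthesis at every position, so one fragment per stopping point represents the whole mixture; it ignores strand direction and electrophoresis resolution, and the function names are hypothetical.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def chain_terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Step 3 (idealized): one tagged fragment per possible stopping point.

    Polymerase extends from the primer, adding the complement of each template
    base; a ddNTP halts some copy at every position. Each fragment is recorded
    as (length, fluorescent tag of its terminal base).
    """
    fragments = []
    for length in range(1, len(template) + 1):
        terminal_base = PAIRS[template[length - 1]]  # the ddNTP that stopped synthesis
        fragments.append((length, terminal_base))
    return fragments


def electrophoresis_readout(fragments: list[tuple[int, str]]) -> str:
    """Step 4: shortest fragments pass the detector first; read each terminal tag."""
    return "".join(tag for _, tag in sorted(fragments))


if __name__ == "__main__":
    template = "GATTACACCGTA"
    mixture = chain_terminated_fragments(template)
    random.Random(1).shuffle(mixture)  # the reaction holds an unordered mix of fragments
    new_strand = electrophoresis_readout(mixture)
    print(new_strand)                             # CTAATGTGGCAT
    print("".join(PAIRS[b] for b in new_strand))  # GATTACACCGTA
```

Shuffling stands in for the unordered reaction mixture and sorting by length plays the role of the capillary; the string read off the tags is the complement of the template, from which the template follows by the pairing rule.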

SEQUENCING BY SYNTHESIS
Most new sequencing techniques simulate aspects of natural DNA synthesis to identify the bases on a DNA strand of interest either by “base extension” or “ligation” (below). Both approaches depend on repeated cycles of chemical reactions, but the technologies lower sequencing costs and increase speed by miniaturizing equipment to reduce the amount of chemicals used in all steps and by reading millions of DNA fragments simultaneously (see “Multiplexing,” below).

BASE EXTENSION
(a) A single-stranded DNA fragment, known as the template, is anchored to a surface with the starting point of a complementary strand, called the primer, attached to one of its ends.
(b) When fluorescently tagged nucleotides (dNTPs) and polymerase are exposed to the template, a base complementary to the template will be added to the primer strand.
(c) Remaining polymerase and dNTPs are washed away, then laser light excites the fluorescent tag, revealing the identity of the newly incorporated nucleotide. Its fluorescent tag is then stripped away, and the process starts anew.

Pyrophosphate detection uses bioluminescence, instead of fluorescence, to signal base-extension events. A pyrophosphate molecule is released when a base is added to the complementary strand, causing a chemical reaction with a luminescent protein that produces a flash of light.

LIGATION
(a) An “anchor primer” is attached to a single-stranded template to designate the beginning of an unknown sequence.
(b) Short, fluorescently labeled “query primers” are created with degenerate DNA, except for one nucleotide at the query position bearing one of the four base types.
(c) The enzyme ligase joins one of the query primers to the anchor primer, following base-pairing rules to match the base at the query position in the template strand. The anchor-query-primer complex is then stripped away and the process repeated for a different position in the template.

[Illustration: Terese Winslow]
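
The base-extension cycle in the box, steps (a) through (c), can be sketched as a loop in which each pass reads exactly one base. The dye colors and function names below are hypothetical, and the chemistry (incorporation, washing, tag removal) is reduced to a dictionary lookup.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye assignment: one fluorescent color per nucleotide type.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}


def base_extension_run(template: str, primer_len: int) -> list[str]:
    """Color imaged in each cycle for the template bases beyond the primer.

    One loop iteration is one cycle: polymerase adds the single tagged dNTP that
    pairs with the next template base (b); excess reagents are washed away and
    the laser reveals the tag (c); stripping the tag lets the next cycle add
    exactly one more base.
    """
    colors = []
    for template_base in template[primer_len:]:
        added = PAIRS[template_base]  # only the complementary dNTP is incorporated
        colors.append(DYE[added])     # the camera records the tag's color
    return colors


def decode_colors(colors: list[str]) -> str:
    """Color -> incorporated base -> template base (its complement)."""
    return "".join(PAIRS[BASE_FOR_DYE[c]] for c in colors)


if __name__ == "__main__":
    template = "TTGCA" + "GATC"  # primer site followed by the unknown region
    colors = base_extension_run(template, primer_len=5)
    print(colors)                 # ['blue', 'red', 'green', 'yellow']
    print(decode_colors(colors))  # GATC
```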

Most alternative approaches to sequencing therefore seek to increase speed and reduce costs by cutting out the slow separation steps, miniaturizing components to reduce chemical volumes, and executing reactions in a massively parallel fashion so that millions of sequence fragments are read simultaneously.

Many research groups have converged on methods often lumped together under the heading of sequencing by synthesis because they exploit high-fidelity processes that living systems use to copy and repair their own genomes. When a cell is preparing to divide, for example, its DNA ladder splits into single strands, and an enzyme called polymerase moves along each of these. Using the old strands as templates and following base-pairing rules, polymerase catalyzes the addition of nucleotides into complementary sequences. Another enzyme called ligase then joins these pieces into whole complementary strands while matching them to the original templates.

Sequencing-by-synthesis methods simulate parts of this process on a single DNA strand of interest. As bases are added by polymerase to the starting point of a new complementary strand, known as a primer, or recognized by ligase as a match, the template’s sequence is revealed.
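
The ligation route described in the box can be sketched in the same style. In this hypothetical Python model, each query primer is reduced to the single base at its query position plus a made-up dye color, and ligase is modeled as accepting only the primer whose query base pairs with the template at that offset; the function names are ours.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dyes: each query primer's identifying base carries its own color.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}


def ligation_query(template: str, anchor_len: int, offset: int) -> str:
    """One ligation cycle: the dye of the query primer that ligase joins.

    The anchor primer covers template[:anchor_len]. Query primers are
    degenerate except `offset` bases past the anchor (1-based), where each
    carries one specific base. Ligase accepts only the correctly paired one.
    """
    target = template[anchor_len + offset - 1]
    for query_base, dye in DYE.items():
        if PAIRS[query_base] == target:  # base-pairing check made by ligase
            return dye
    raise ValueError(f"unexpected base {target!r}")


def read_by_ligation(template: str, anchor_len: int, n_positions: int) -> str:
    """Repeat the cycle for each position, stripping the complex in between."""
    colors = [ligation_query(template, anchor_len, k) for k in range(1, n_positions + 1)]
    return "".join(PAIRS[BASE_FOR_DYE[c]] for c in colors)


if __name__ == "__main__":
    template = "ACGTA" + "CCGAT"  # anchor site followed by the unknown region
    print(read_by_ligation(template, anchor_len=5, n_positions=5))  # CCGAT
```

Unlike base extension, each ligation cycle here interrogates one chosen position directly rather than extending the strand, which is why the complex can be stripped and the next position queried from scratch.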

How such events are detected varies, but one of two signal types is usually involved. If a fluorescent molecule is attached to the added bases, the color signal it gives off can be seen using optical microscopy. Fluorescence detection is employed in sequencing by many groups, including those of Michael Metzker and his colleagues at Baylor University, Robi Mitra of Washington University in St. Louis, my own lab at Harvard Medical School, and Agencourt Bioscience Corporation.

An alternative method uses bioluminescent proteins, such as the firefly enzyme luciferase, to detect pyrophosphate released when a base attaches to the primer strand. Developed by Mostafa Ronaghi, who is now at Stanford University, this system is used by Pyrosequencing/Biotage and 454 Life Sciences.

Both forms of detection usually require multiple instances of the matching reaction to happen at the same time to produce a signal strong enough to be seen, so many copies of the sequence of interest are tested simultaneously. Some investigators, however, are working on ways to detect fluorescent signals emitted from just one template strand molecule. Stephen Quake of the California Institute of Technology and scientists at Helicos Biosciences and Nanofluidics are all taking this single-molecule approach, intended to save time and costs by eliminating the need to make copies of the template to be sequenced.

Detecting single fluorescent molecules remains extremely challenging. Because some 5 percent of these signals are missed, more “reads” must be performed to fill in the resulting gap errors.
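
The cost of that 5 percent miss rate can be estimated with one line of probability. This sketch assumes, as a simplification not stated in the article, that each read misses a given base independently, so a base is missed by all k reads with probability 0.05 to the power k; the genome size is the six billion base pairs of a personal genome given earlier, and the function names are hypothetical.

```python
def expected_gaps(genome_bases: float, miss_rate: float, reads: int) -> float:
    """Expected number of bases missed by every one of `reads` independent reads."""
    return genome_bases * miss_rate ** reads


def reads_needed(genome_bases: float, miss_rate: float, max_gaps: float = 1.0) -> int:
    """Fewest reads per base that bring the expected number of gaps below max_gaps."""
    reads = 1
    while expected_gaps(genome_bases, miss_rate, reads) >= max_gaps:
        reads += 1
    return reads


if __name__ == "__main__":
    genome = 6e9  # base pairs in a personal (two-copy) genome, from the text
    miss = 0.05   # fraction of single-molecule signals missed, from the text
    for k in range(1, 9):
        print(f"{k} reads: {expected_gaps(genome, miss, k):,.1f} expected gaps")
    print(reads_needed(genome, miss))  # 8
```

Under these assumptions, three reads still leave about 750,000 gaps per personal genome, and it takes eight before fewer than one is expected, which fits the article’s point that most groups first amplify the template.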

That is why most groups first copy, or amplify, the single DNA template of interest by a process called polymerase chain reaction (PCR). In this step, too, a variety of approaches have emerged that make the use of bacteria to generate DNA copies unnecessary.

One cell-free amplification method, developed by Eric Kawashima of the Serono Pharmaceutical Research Institute in Geneva, Alexander Chetverin of the Russian Academy of Sciences, and Mitra when he was at Harvard, creates individual polymerase colonies, or polonies, freely arrayed directly on the surface of a microscope slide or a layer of gel. A single template molecule undergoes PCR within each polony, producing millions of copies, which grow rather like a bacterial colony from the central original template. Because each resulting polony cluster is one micron wide and one femtoliter in volume, billions of them can fit onto a single slide. A variation on this system first produces polonies on tiny beads inside droplets within an emulsion. After the reaction, millions of such beads, each bearing copies of a different template, can be placed in individual wells or immobilized by a gel where sequencing is performed on all of them simultaneously.

These methods of template amplification and of sequencing by base extension or by ligation are just a few representative examples of the approaches dozens of different academic and corporate research groups are taking to sequencing by synthesis.

SEQUENCING BY SYNTHESIS (continued)

AMPLIFICATION
Because light signals are difficult to detect at the scale of a single DNA molecule, both base-extension and ligation reactions are often performed on millions of copies of the same template strand simultaneously. Cell-free methods (a and b) for making these copies involve PCR on a miniaturized scale.
(a) Polonies (polymerase colonies) created directly on the surface of a slide or gel each contain a primer, which a template fragment can find and bind to. PCR within each polony produces a cluster containing millions of template copies.
(b) Droplets containing polymerase within an oil emulsion can serve as tiny PCR chambers to produce bead polonies. When a template fragment attached to a bead is added to each droplet, PCR produces 10 million copies of the template, all attached to the bead.

MULTIPLEXING
Sequencing thousands or millions of template fragments in parallel maximizes speed. A single-molecule base-extension system using fluorescent-signal detection, for example, places hundreds of millions of different template fragments on a single array. Another method immobilizes millions of bead polonies on a gel surface for simultaneous sequencing by ligation with fluorescence signals; the accompanying image of bead polonies represents 0.01 percent of the total slide area.

[Images: single-molecule array; bead polonies. Illustrations: Terese Winslow. Bead polonies from Jay Shendure et al. in Science, Vol. 309; September 9, 2005, with permission by AAAS.]

Still another technique, sequencing by hybridization, also uses fluorescence to generate a visible signal and, like sequencing by ligation, exploits the tendency of DNA strands to bind, or hybridize, with their complementary sequences and not with mismatched sequences. This system, employed by Affymetrix, Perlegen Sciences and Illumina, is already in widespread commercial use, primarily to look for variations in known gene sequences. It requires synthesizing short single strands of DNA in every possible combination of base sequences and then arranging them on a large slide. When copies of the template strand whose sequence is unknown are washed across this array, they will bind to their complementary sequences. The best match produces the brightest fluorescent signal. Illumina also adds a base-extension step to this test of hybridization specificity.
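
A toy version of sequencing by hybridization shows both how the brightest spot identifies the sequence and why the approach suits known sequences best: an array of every possible k-base probe has 4 to the power k spots (about a million for k = 10). The brightness model below, one unit per correctly paired base, is a hypothetical stand-in for real hybridization chemistry; strand orientation is ignored and the function names are ours.

```python
from itertools import product

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def probe_array(k: int) -> list[str]:
    """Every possible k-base probe, one per spot on the slide (4**k spots)."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


def signal(probe: str, target: str) -> int:
    """Hypothetical brightness: number of positions where probe and target pair."""
    return sum(PAIRS[p] == t for p, t in zip(probe, target))


def read_by_hybridization(target: str) -> str:
    """Wash the target over the array, take the brightest spot, complement it."""
    probes = probe_array(len(target))
    brightest = max(probes, key=lambda probe: signal(probe, target))
    return "".join(PAIRS[base] for base in brightest)


if __name__ == "__main__":
    print(len(probe_array(4)))            # 256 spots for 4-base probes
    print(read_by_hybridization("GATC"))  # GATC
```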

One final technique with great long-term promise takes an entirely different approach to identifying the individual bases in a DNA molecule. Grouped under the heading of nanopore sequencing, these methods focus on the physical differences between the four base types to produce a readable signal. When a single strand of DNA passes through a 1.5-nanometer pore, it causes fluctuations in the pore’s electrical conductance. Each base type produces a slightly different conductance change that can be used to identify it [see box “Nanopore Sequencing”]. Devised by Dan Branton of Harvard, Dave Deamer of the University of California, Santa Cruz, and me, this method is in development now by Agilent Technologies and others with interesting variations, such as fluorescent signal detection.

NANOPORE SEQUENCING
Like electrophoresis, this technique draws DNA toward a positive charge. To get there, the molecule must cross a membrane by going through a pore whose narrowest diameter of 1.5 nanometers will allow only single-stranded DNA to pass (a). As the strand transits the pore, nucleotides block the opening momentarily, altering the pore’s electrical conductance, which is tracked as the ionic current through the pore in picoamperes (pA). Physical differences between the four base types produce blockades of different degrees and durations (b). A close-up of a blockade event measurement shows a conductance change when a 150-nucleotide strand of a single base type passed through the pore (c). Refining this method to improve its resolution to single bases could produce a sequence readout such as the hypothetical example (d) and yield a sequencing technique capable of reading a whole human genome in just 20 hours without expensive DNA copying steps and chemical reactions.

[Figure: (a) single-stranded DNA passing through a 1.5 nm nanopore in a membrane; (b) current in picoamperes versus time in seconds, open pore at −120 pA; (c) close-up of one blockade event, from −120 pA (open pore) to about −15 pA, 500-microsecond scale; (d) hypothetical readout AAAATTTTTTCCCC.]

[Illustration: Terese Winslow. Sources: “Nanopores and Nucleic Acids: Prospects for Ultrarapid Sequencing,” by David W. Deamer and Mark Akeson, in TIBTECH, Vol. 18; April 2000 (© 2000 Elsevier Science Ltd. All rights reserved); and “Moving Smaller in Drug Discovery and Delivery,” by David A. LaVan, David M. Lynn and Robert Langer, in Nature Reviews Drug Discovery, Vol. 1, No. 1; January 2002.]
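
A readout like the hypothetical AAAATTTTTTCCCC in the box would come from assigning each blockade to the base with the nearest characteristic current. The Python sketch below assumes the signal has already been split into one averaged current per nucleotide, which is exactly the single-base resolution the box says still has to be achieved. The per-base current levels are invented for illustration; only the roughly −120 pA open-pore level and the −15 pA blockade scale come from the box.

```python
# Hypothetical mean blockade currents (picoamperes) for each base. The box shows
# only an open pore near -120 pA and a single-base-type blockade near -15 pA.
BLOCKADE_PA = {"A": -15.0, "C": -25.0, "G": -35.0, "T": -45.0}


def call_base(current_pa: float) -> str:
    """Base whose reference blockade level is closest to the measured current."""
    return min(BLOCKADE_PA, key=lambda base: abs(BLOCKADE_PA[base] - current_pa))


def call_events(event_currents: list[float], open_threshold_pa: float = -100.0) -> str:
    """Turn one averaged current per nucleotide event into a sequence.

    Readings at or below open_threshold_pa (near the -120 pA open pore) mean
    nothing is in the pore and are skipped; all others are called as bases.
    """
    return "".join(call_base(x) for x in event_currents if x > open_threshold_pa)


if __name__ == "__main__":
    trace = [-120.0, -16.0, -14.5, -44.0, -46.0, -26.0, -120.0]
    print(call_events(trace))  # AATTC
```

Nearest-level classification is the simplest possible decoder; it works only if the four levels are well separated relative to measurement noise, which is the resolution problem the box describes.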
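
Lowering Cost

Evaluating these next-generation sequencing systems against one another and against the Sanger method illustrates some of the factors that will influence their usefulness. For example, two research groups, my own at Harvard and one from 454 Life Sciences, recently published peer-reviewed descriptions of genome-scale sequencing projects that allow for a direct comparison.

My colleagues and I described a sequencing-by-ligation system that used polony bead amplification of the template DNA and a common digital microscope to read fluorescent signals. The 454 group used a similar oil-emulsion PCR for amplification followed by base-extension sequencing with pyrophosphate detection in an array of wells.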

Both groups read about the same amount of sequence, 30 million base pairs, in each sequencing run. Our system read about 400 base pairs a second, whereas 454 read 1,700 a second. Sequencing usually involves performing multiple runs to produce a more accurate consensus sequence. With 43-times coverage (43×) of the target genome, meaning each base was sequenced an average of 43 times, 454 achieved accuracy of one error per 2,500 base pairs. The Harvard group had less than one error per three million base pairs with 7× coverage. To handle templates, both teams employed capture beads, whose size affects the amount of expensive reagents consumed. Our beads were one micron in diameter, whereas 454 used 28-micron beads in 75-picoliter wells.
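
The comparison above reduces to a few divisions. This sketch uses only figures quoted in the text (30 million bases per run, 400 versus 1,700 bases a second, 7× coverage, 1- versus 28-micron beads); it assumes the read rates held for an entire run, and the personal-genome scale-up is our own hypothetical extrapolation, not a result from either paper. Function names are ours.

```python
def run_hours(bases_per_run: float, bases_per_second: float) -> float:
    """Hours of read time for one run at a steady read rate (assumed constant)."""
    return bases_per_run / bases_per_second / 3600.0


def runs_for_coverage(genome_bases: float, coverage: float, bases_per_run: float) -> float:
    """Runs needed so that each base is read `coverage` times on average."""
    return genome_bases * coverage / bases_per_run


if __name__ == "__main__":
    per_run = 30e6
    print(f"Harvard: {run_hours(per_run, 400):.1f} h per run")   # 20.8 h
    print(f"454:     {run_hours(per_run, 1700):.1f} h per run")  # 4.9 h
    # Hypothetical scale-up: a 6-billion-base personal genome at 7x coverage.
    print(runs_for_coverage(6e9, 7, per_run))                   # 1400.0
    # Bead volume scales with diameter cubed: 28-micron versus 1-micron beads.
    print((28 / 1) ** 3)                                        # 21952.0
```

At these rates a run spans roughly 21 hours of reading for the Harvard system and about 5 for 454, and a 6-billion-base genome at 7× would take on the order of 1,400 such runs. Because bead volume grows with the cube of the diameter, a 28-micron bead occupies about 22,000 times the volume of a 1-micron bead, one way to see why bead size bears on reagent consumption.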

THE PERSONAL GENOME PROJECT
Every baby born in the U.S. today is tested for at least one genetic disease, phenylketonuria, before he or she leaves the hospital. Certain lung cancer patients are tested for variations in a gene called EGFR to see if they are likely to respond to the drug Iressa. Genetic tests indicating how a patient will metabolize other drugs are increasingly used to determine the drugs’ dosage. Beginnings of the personalized medicine that will be possible with low-cost personal genomes can already be glimpsed, and demand for it is growing.

Beyond health concerns, we also want to know our genealogy. How closely are we related to Genghis Khan or to each other? We want to know what interaction of genes with other genes and with the environment shapes our faces, our bodies, our dispositions. Thousands or millions of data sets comprising individuals’ whole genome and phenome (the traits that result from instructions encoded in the genome) will make it possible to start unraveling some of those complex pathways.

Yet the prospect of this new type of personal information suddenly becoming widely available also prompts worries about how it might be misused by insurers, employers, law-enforcement agents, friends, neighbors, commercial interests or criminals.

No one can predict what living in an era of personal genomics will be like until the waters are tested. That is why my colleagues and I recently launched the Personal Genome Project (PGP). With this natural next step after the Human Genome Project, we hope to explore possible rewards and risks of personal genomics by recruiting volunteers to make their own genome and phenome data openly available.

These resources will include full (46-chromosome) genome sequences, digital medical records, as well as information that could one day be part of a personal health profile, such as comprehensive data about RNA and proteins, body and facial measurements, and MRI and other cutting-edge imagery. We will also create and deposit human cell lines representing each subject in the Coriell repository of the National Institute of General Medical Sciences. Our purpose is to make all this genomic and trait information broadly accessible so that anyone can mine it to test their own hypotheses and algorithms, and be inspired to come up with new ones.

A recent incident provides a simple example of what might happen. A few PGP medical records (my own) are already publicly available online, which prompted a hematologist on the other side of the country to notice, and inform me, that I was long overdue for a follow-up test of my cholesterol medication. The tip led to a change in my dose and diet and consequently to a dramatic lowering of at least one type of risk. In the future this kind of experience would not rely on transcontinental serendipity but could spawn a new industry of third-party genomic software tools.

The PGP has approval from the Harvard Medical School Internal Review Board, and like all human research subjects, participants must be informed of potential risks before consenting to provide their data. Every newly recruited PGP volunteer will also be able to review the experience of previous subjects before giving informed consent. The project’s open nature, including fully identifying subjects with their data, will be less risky both to the subjects and the project than the alternative of promising privacy and risking accidental release of information or access by hackers.

Like the free data access policy established by the HGP, the openness of the PGP is designed to maximize potential for discovery. In addition to providing a scientific resource, the project also offers an experiment in public access and insurance coverage. In its early stages, private donors will help to insure a diverse set of human subjects against the event that they experience genetic discrimination as a consequence of the PGP. This charity-driven mechanism has the advantage of not needing to be profitable at first, but insurance companies may nonetheless be very interested in its outcome.

Details of the PGP can be found at …

G.M.C.

[Photo: GEORGE M. CHURCH, shown with images of fluorescent polonies, is one of a group of volunteers planning to open their genomes to public scrutiny. Credit: John Soares]

The best available electrophoresis-based sequencing methods average 150 base pairs per dollar for “finished” sequence. The 454 group did not publish a project cost, but the Harvard team’s finished sequence cost of 1,400 base pairs per $1 represents a ninefold reduction in price. These and other new techniques are expected very soon to bring the cost of sequencing the six billion base pairs of a personal genome down to $100,000.
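
The cost figures in this paragraph follow from dividing bases by bases per dollar. In the sketch below the rates come from the text; reading the $20 million street price quoted earlier as three billion bases at 150 base pairs per dollar is our own consistency check, not the article’s derivation, and the function names are hypothetical.

```python
def cost_usd(bases: float, bases_per_dollar: float) -> float:
    """Sequencing cost for a given number of finished bases."""
    return bases / bases_per_dollar


def bases_per_dollar_needed(bases: float, budget_usd: float) -> float:
    """Efficiency required to sequence `bases` within `budget_usd`."""
    return bases / budget_usd


if __name__ == "__main__":
    sanger, harvard = 150, 1400                   # finished bp per dollar, from the text
    print(harvard / sanger)                       # about 9.3, the "ninefold" reduction
    print(cost_usd(3e9, sanger))                  # 20000000.0 for one 3-billion-base copy
    print(cost_usd(6e9, harvard))                 # about 4.3 million for 6 billion bases
    print(bases_per_dollar_needed(6e9, 100_000))  # 60000.0 for the $100,000 genome
    print(bases_per_dollar_needed(6e9, 1_000))    # 6000000.0 for the $1,000 genome
```

The $100,000 target for a six-billion-base genome implies about 60,000 base pairs per dollar, roughly 40 times the Harvard figure, and the $1,000 genome implies six million.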
But many rates of one billion bytes (a gigabyte) per minute, and com of the really big questions remain, such as how to ensure pri - - puters can process the information at a speed of several bil vacy and fairness in the use of personal genetic information by - scientists, insurers, employers, courts, schools, adoption agen lion operations a second. Therefore, any imaging device lim - - We have much work in a short time . LOW-COST GENOMES to get ready for cies, the government, or individuals making clinical and re ited by a slow physical or chemical process, such as electro - - phoresis or enzymatic reaction, or one that is not tightly productive decisions. packed in space and time, making every pixel count, will be These difficult and important questions need to be re - correspondingly more costly to operate per unit DNA base searched as rigorously as the technological and biological dis - determined. covery aspects of human genomics. My colleagues and I have Another consideration in judging emerging sequencing see box on pre therefore initiated a Personal Genome Project [ - technologies is how they will be used. Newer methods tend to ceding page ] to begin exploring the potential risks and re - have short read-lengths of five to 400 base pairs, compared wards of living in an age of personal genomics. with typical Sanger read-lengths of 800 base pairs. Sequencing When we invest in stocks or real estate or relationships, we and piecing together a previously unknown genome from understand that nothing is a sure thing. We think probabilisti - scratch is therefore much harder with the new techniques. If cally about risk versus value and accept that markets, like life, medicine is the primary driver of widespread sequencing, how are complex. 
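
Raising Value

Beyond developing these new sequencing technologies, we have much work to do in a short amount of time to get ready for the advent of low-cost genome reading. Software will be needed to process sequence information so that it is manageable by doctors, for example. They will need a method to derive an individualized priority list for each patient of the top 10 or so genetic variations likely to be important.

Equally essential will be assessing the effects of widespread access to this technology on people. From its outset, the HGP established a $10-million-a-year program to study and address the ethical, legal and social issues that would be raised by human genome sequencing. Participants in the effort agreed to make all our data publicly available with unprecedented speed, within one week of discovery, and we rose to fend off attempts to commercialize human nature. Special care was also taken to protect the anonymity of the public genomes (the “human genome” we produced is a mosaic of several people’s chromosomes). But many of the really big questions remain, such as how to ensure privacy and fairness in the use of personal genetic information by scientists, insurers, employers, courts, schools, adoption agencies, the government, or individuals making clinical and reproductive decisions.

These difficult and important questions need to be researched as rigorously as the technological and biological discovery aspects of human genomics. My colleagues and I have therefore initiated a Personal Genome Project [see box “The Personal Genome Project”] to begin exploring the potential risks and rewards of living in an age of personal genomics.

When we invest in stocks or real estate or relationships, we understand that nothing is a sure thing. We think probabilistically about risk versus value and accept that markets, like life, are complex. Just as personal digital technologies have caused economic, social and scientific revolutions unimagined when we had our first few computers, we must expect and prepare for similar changes as we move forward from our first few genomes.

THE AUTHOR
GEORGE M. CHURCH is professor of genetics at Harvard Medical School and director of the Harvard-Lipper Center for Computational Genetics, U.S. Department of Energy Genome Technology Laboratory, and the National Institutes of Health Centers of Excellence in Genomic Science. His research spans and integrates technologies for analyzing and synthesizing biomolecules and cells. He holds 10 U.S. patents and has been scientific adviser to more than 20 companies.

MORE TO EXPLORE
Advanced Sequencing Technologies: Methods and Goals. Jay Shendure, Robi D. Mitra, Chris Varma and George M. Church in Nature Reviews Genetics, Vol. 5, pages 335–344; May 2004.
How Sequencing Is Done. DOE Joint Genome Institute, U.S. Dept. of Energy, Office of Science, updated September 9, 2004. Available at …
NHGRI Seeks Next Generation of Sequencing Technologies. October 2004 news release available at …
Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Jay Shendure et al. in Science, Vol. 309, pages 1728–1732; September 9, 2005.
Genome Sequencing in Microfabricated High-Density Picolitre Reactors. Marcel Margulies et al. in Nature, Vol. 437, pages 376–380; September 15, 2005.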
