Genomics: Beyond the book of life

URL: http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v408/n6815/full/408894a0_fs.html

Date accessed: 06 February 2001

Nature 408, 894 - 896 (2000) © Macmillan Publishers Ltd.

nature 21/28th December 2000

PETER ALDHOUS

It was the year of genomes. Every week seemed to bring another landmark — be it human, animal, plant or pathogen. But there was more to 2000 than strings of 'A's, 'C's, 'G's and 'T's. Nature explores some of the highs, lows and emerging trends behind the year's scientific headlines.

Where were you on 26 June 2000, while history was being made? On that day, at press conferences throughout the world, scientific leaders ceremonially opened the 'book of life', announcing the completion of a working draft of the human genome.

BOB BOSTON, WASHINGTON UNIVERSITY/LIAISON AGENCY

Colour coded: the relentless accumulation of human genome data (left) brought forth rhetoric from world leaders and an uneasy truce between Celera head Craig Venter and the publicly funded effort.

In Washington, leaders of the publicly funded Human Genome Project (HGP) made peace at the White House with their bitter rival, Craig Venter's Celera Genomics of Rockville, Maryland — which simultaneously declared complete its own 'first assembly' of the human genome. President Bill Clinton, the consummate politician, provided the necessary rhetoric: "With this profound new knowledge, humankind is on the verge of gaining immense new power."

In reality, the chosen date was arbitrary. But by neatly bisecting 2000, the joint HGP/Celera announcement provided an appropriate point to mark the switch from the genome sequencing era to the age of functional genomics. Although much DNA remains to be sequenced, the emphasis will now be on deciphering what the accumulated sequence information means. And that is where the scientific fun really starts.

"The stage is set for a full scale exploration of the ways in which this disarmingly simple one-dimensional instruction book is converted into the four dimensions of space and time that characterize living organisms," says Francis Collins, director of the National Human Genome Research Institute in Bethesda, Maryland.

Despite its big budgets and high public profile, genomics has so far been seen by many biologists as a subdiscipline of genetics. But with the expansion of functional genomics, it will soon be influencing every area of biology. And given concerns about the implications of gene patenting, and privacy of genetic information, genomics also promises to keep social scientists, ethicists and lawyers extremely busy.

In the first half of this year, relations between the HGP and Celera became seriously strained, as the rivals failed to agree terms for collaboration. The impasse was the issue of access to the sequence data. The HGP refused to accept restrictions demanded by Celera, which argued that it had to protect its commercial interests. But in the end, political pressure on the two sides to stop the squabbling — which was in danger of undermining public recognition of both groups' achievements — proved sufficient to broker the joint announcement, if not a meaningful collaboration.

The quality of the HGP's draft and Celera's assembly still remains unclear. Peer-reviewed papers providing a detailed synthesis of each group's work are expected to appear early in the New Year. Filling in the remaining gaps in the sequence, and repeated sequencing by the HGP to correct any errors in the code, could take until 2003. But one chromosome, number 21, was unveiled in finished form in May (ref. 1), the second to be completed after chromosome 22 (ref. 2).

This year's model
While we wait for publication of the human genome, there have been plenty of other sequencing milestones this year. Laboratory 'model' organisms led the way, with the fruitfly Drosophila melanogaster (ref. 3) and the workhorse of plant science, Arabidopsis thaliana (ref. 4), being the published highlights. Complete sequences for pathogenic microorganisms also came thick and fast, including the cholera bacterium Vibrio cholerae (ref. 5) and Pseudomonas aeruginosa (ref. 6), a common cause of opportunistic infections. And among the pathogens came a paper proving that genomics is not the preserve of developed nations. In July, a Brazilian consortium completed the genome of Xylella fastidiosa (ref. 7), which causes variegated chlorosis, a disease of citrus crops.

As each organism's sequence is completed, the focus shifts to characterizing all of its genes, and determining their functions — or 'annotating' the genome. For D. melanogaster, this task was kicked off in late 1999 with a two-week jamboree held at Celera's Rockville headquarters, at which the company's scientists worked alongside academic fruitfly biologists. Annotating the human genome may require innovative 'collaboratory' strategies, with scientists sharing data over the Internet.

The first job is to determine the total number of human genes, now generally thought to lie somewhere between 30,000 and 70,000. Geneticists and bioinformaticists are running a sweepstake on the outcome — see http://www.ensembl.org/Genesweep for the current distribution of bets. But going beyond gene numbers to investigate gene function is where things really start to get difficult.

Comparisons between species can help. Hints at the function of unknown genes may come from similarities to sequences in well-studied genes from model organisms. More generally, cross-species comparisons can identify conserved sequences involved in processes of fundamental biological significance, and help understand the genome's overall structure.
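The idea of borrowing functional hints from a similar, well-studied sequence can be illustrated with a minimal sketch. It is not the method used by any of the groups mentioned here: it assumes a crude similarity score (the fraction of short subsequences, or k-mers, that two sequences share), and the sequences, gene names and function names are all made up for illustration. Real annotation work relies on proper alignment tools.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping k-letter substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(query, annotated, k=4):
    """Return (name, score) for the annotated sequence most similar to query."""
    scored = ((name, similarity(query, seq, k)) for name, seq in annotated.items())
    return max(scored, key=lambda pair: pair[1])


# Toy, invented sequences; the names are placeholders, not real genes.
annotated = {
    "model_gene_A": "ATGGCGTACGTTAGCCTGA",
    "model_gene_B": "ATGTTTCCCGGGAAATTTA",
}
unknown = "ATGGCGTACGATAGCCTGA"  # differs from model_gene_A at one position

name, score = best_match(unknown, annotated)
print(f"closest annotated sequence: {name} (similarity {score:.2f})")
```

A high score against a gene of known function is only a hint, which is why the experimental approaches described below remain necessary.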

This is why fruitfly biologists are gearing up to sequence a second Drosophila species, probably D. pseudoobscura, and researchers annotating the genome of the nematode worm Caenorhabditis elegans would like to sequence C. briggsae. For the human genome, a complete mouse genome sequence will be similarly valuable, and Celera and publicly funded researchers are working separately towards this new goal.

Although much progress can be made using computational tools that compare gene sequences, or predict protein structure from DNA sequence information, the mouse sequence will also offer another advantage to those annotating the human genome. It is possible to experiment with mice and other model organisms in a way that is impossible with people, disabling genes systematically to determine their function.

Indeed, functional genomics will require a diversity of techniques to disrupt normal gene activity. Although it probably cannot be generally applied in mammals, one of the most promising is a phenomenon called RNA interference. This is the gene-silencing effect that occurs when cells are exposed to double-stranded RNA matching the sequence of a given gene. The technique has already been applied systematically in C. elegans: two papers published in November (refs 8, 9) targeted two of the worm's six chromosomes, identifying many genes involved in development and cell division.

Some researchers are using small molecules to disable particular genes and so determine their precise functions. This field of 'chemical genetics' scored a notable success in September. By applying some subtle genetic manipulation to render genes susceptible to a particular chemical 'switch', a team of chemists and geneticists showed that the approach can be used to selectively disable any protein kinase (ref. 10). This large family of enzymes has previously proved particularly resistant to detailed functional analysis.

Other research groups are using chemical genetics in a less directed way, adding a range of small molecules to cells until they find one that causes a particular effect, and then working out which gene the molecule has disrupted (ref. 11). This approach is conceptually similar to 'forward genetic' screens that randomly mutate the genome of model organisms to create a wide range of biologically interesting mutants. Two such screens for mutant mice, unveiled in August (refs 12, 13), are expected to become important resources for those interpreting the mouse genome.

Code breakers
But ultimately, no amount of experimental studies on model organisms will uncover all of the secrets hidden in the human genetic code. And on the human side, one of the top priorities is to identify the subtle genetic variations that make people susceptible to big killers such as cancer and heart disease. To this end, several large population studies examining genetic variation, lifestyle factors and disease are now getting underway throughout the world.

This task has been helped by the efforts of a pioneering collaboration between academic groups, multinational companies and Britain's charitable Wellcome Trust. The Single Nucleotide Polymorphism Consortium is preparing a map of many hundreds of thousands of genetic markers that will be an invaluable resource for anyone trying to pin down the location of disease genes (see http://snp.cshl.org).

Other researchers believe that the key to turning the human genome into tomorrow's drugs will be the industrialization of structural biology. The leaders of this new field of structural genomics (ref. 14) are now busy creating 'protein structure factories'. Here, DNA sequences will be engineered into cells to culture large quantities of protein, which will be purified and subjected to structural analysis. The goal is to automate the entire process, churning out protein structures on an unprecedented scale.

But even these huge efforts are still based on analysing genes and proteins one by one. In real biological systems, genes and proteins work in concert — which is why perhaps the biggest growth area within functional genomics in 2000 was the use of DNA microarrays, or 'gene chips', to investigate wider patterns of gene expression.

A typical microarray consists of hundreds, or even thousands, of DNA sequences from the coding regions of individual genes immobilized on a surface. Messenger RNA from a given sample will bind to the corresponding sequence on the chip. As a result, DNA microarrays allow biologists to see at a glance which genes are active within a given cell, or tissue. Among this year's highlights were two studies of the yeast Saccharomyces cerevisiae: one examining patterns of gene expression associated with particular signal transduction pathways (ref. 15), the second documenting changes in gene expression caused by mutations and exposure to a range of chemicals (ref. 16). Microarrays were also used this year to profile gene expression in human cancers (refs 17, 18).
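How a chip readout becomes a list of genes whose activity has changed can be shown with a minimal sketch. It rests on assumptions that are not taken from the studies cited here: each spot yields one intensity per gene, a treated sample is compared with a reference sample, and a twofold change (a log2 ratio of at least 1) counts as a difference. The gene names and numbers are invented.

```python
import math


def log2_ratios(sample, reference):
    """Per-gene log2(sample / reference) for genes with positive signal in both."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }


def changed_genes(ratios, threshold=1.0):
    """Split genes into those up or down by at least `threshold` log2 units."""
    up = sorted(g for g, r in ratios.items() if r >= threshold)
    down = sorted(g for g, r in ratios.items() if r <= -threshold)
    return up, down


# Invented spot intensities (arbitrary units) for three hypothetical genes.
treated = {"gene1": 800.0, "gene2": 100.0, "gene3": 210.0}
untreated = {"gene1": 200.0, "gene2": 400.0, "gene3": 200.0}

ratios = log2_ratios(treated, untreated)
up, down = changed_genes(ratios)
print("more active:", up)
print("less active:", down)
```

In practice the raw intensities would first need background correction and normalization, and the threshold would be chosen with replicate measurements in mind.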

Microarray technology, this time involving immobilized proteins (ref. 19), could prove crucial in what is the biggest functional genomic challenge: proteomics. This field aims to understand the function of every protein produced by an organism, an enormous task given that processes such as RNA editing may allow the tens of thousands of genes within the human genome to produce several million distinct proteins. Proteomics received a boost in February, with a pioneering study of protein–protein interactions in S. cerevisiae (ref. 20). But in the long run, whether or not the discipline achieves its ambitious goals may depend on the development of advanced new technologies for protein analysis.

Companies and academic groups worldwide are now jostling for position on this wild functional genomic frontier. And true to form, Venter vows that Celera will be among the leaders. If that provides the same competitive spur to this field as it did in the race to sequence the human genome, prepare for another fast-paced year.

 

------------------

References
1. The Chromosome 21 Mapping and Sequencing Consortium Nature 405, 311-319 (2000).
2. Dunham, I. et al. Nature 402, 489-495 (1999).
3. Adams, M. D. et al. Science 287, 2185-2195 (2000).
4. The Arabidopsis Genome Initiative Nature 408, 796-815 (2000).
5. Heidelberg, J. F. et al. Nature 406, 477-483 (2000).
6. Stover, C. K. et al. Nature 406, 959-964 (2000).
7. Simpson, A. et al. Nature 406, 151-157 (2000).
8. Fraser, A. G. et al. Nature 408, 325-330 (2000).
9. Gönczy, P. et al. Nature 408, 331-336 (2000).
10. Bishop, A. C. et al. Nature 407, 395-401 (2000).
11. Mayer, T. U. et al. Science 286, 971-974 (1999).
12. Nolan, P. M. et al. Nature Genet. 25, 440-443 (2000).
13. Hrabé de Angelis, M. et al. Nature Genet. 25, 444-447 (2000).
14. Nature Struct. Biol. 7, Suppl., 927-994 (2000).
15. Roberts, C. J. et al. Science 287, 873-880 (2000).
16. Hughes, T. R. et al. Cell 102, 109-126 (2000).
17. Alizadeh, A. A. et al. Nature 403, 503-511 (2000).
18. Perou, C. M. et al. Nature 406, 747-752 (2000).
19. MacBeath, G. & Schreiber, S. L. Science 289, 1760-1763 (2000).
20. Uetz, P. et al. Nature 403, 623-627 (2000).



Categories: 32. Genome Project, 54. Proteomics