The Next Chapter in the Book of Life: Structural Genomics

URL: http://www.nytimes.com/library/national/science/070400sci-genome-protein.html

Date accessed: 15 February 2001

By ANDREW POLLACK
 


Michael Poche for The New York Times

Dr. Tim Harris, top, president of Structural GenomiX Inc., and Dr. Nathaniel E. David, bottom, director of business development at Syrrx, are involved in a project to identify the shapes of proteins, information that could aid in the design of drugs and provide clues to the functions of thousands of genes.

SAN DIEGO -- Now that scientists have effectively determined the complete sequence of human DNA, research teams are gearing up for a follow-on project that many say will be every bit as ambitious and difficult -- but also full of promise for medical research.

 

The new endeavor focuses on proteins, which are the substances made by the body in response to instructions provided by the genes. It is actually the proteins that form the body and carry out its functions, so they are in some sense of more direct relevance to medicine than the genes themselves.

The Human Genome Project has led to the discovery of genes coding for the production of tens of thousands of proteins. But in most cases, the functions of these genes and their proteins remain unknown. One clue, however, could be provided by the shape of a protein, since proteins interact with other molecules based on their three-dimensional configuration, like keys fitting into locks.

So the new effort, known as structural genomics, aims to determine the three-dimensional structures of thousands upon thousands of proteins, much as the genome project determined gene sequences en masse.

 

"It's basically the next step after the Human Genome Project," said Dr. Helen M. Berman, a professor of chemistry at Rutgers University and director of the Protein Data Bank, a federally financed database of protein structures. "Instead of a list of letters, we'll understand biology in a three-dimensional way."

Besides helping to determine the function of a protein, knowing its shape could also make it easier to design drugs that bind to the protein, turning it on or off. AIDS drugs known as protease inhibitors were designed using this so-called structure-based approach. But the technique has not been widely used until now, in part because the structures of many interesting proteins are not known.

Much like the quest for the genome itself, structural genomics is being pursued by a big publicly financed project that will make its data public and by private companies that will sell their data, raising the prospect of the kind of public-private rivalry that marked the genome effort.

Proponents of structural genomics say structure determination is now where the genome search was about 15 years ago, when scientists began to envision sequencing the entire genome, without knowing the functions of the genes in advance, rather than working on one gene at a time. In the past it took years to determine the three-dimensional structure of a protein by examining protein crystals using X-rays. But new techniques -- particularly powerful X-ray generators known as synchrotrons -- have reduced the time to weeks.

"We're building tools that will allow us to solve structures at a historically unprecedented rate," said Dr. Nathaniel E. David, director of business development at Syrrx, a new structural genomics company based here in San Diego. It has hired former General Motors engineers to design robots to automate the process of preparing and examining protein crystals.

Still, no one believes it will be cheap or easy. "This is big science," said Dr. Tim Harris, president of Structural GenomiX Inc., a year-old company also based here. Dr. Harris, who likens his company to Celera Genomics, the private company that raced the public consortium in the genome effort, expects to spend $100 million to $500 million to determine 5,000 protein shapes in five years. "This is not for the faint-hearted," he said.

In April, the National Institutes of Health and Britain's Wellcome Trust sponsored the first international structural genomics meeting to discuss plans for the public project.

The two organizations are also the sponsors of the Human Genome Project.

In the next few months the N.I.H. will award grants for four to six pilot projects in its Protein Structure Initiative. Total financing for the five-year pilot phase could reach $100 million, said Dr. Marvin Cassman, director of the National Institute of General Medical Sciences, the N.I.H. division coordinating the project. If the pilots work, financing could grow much beyond that, though a final figure is not known, he said. Projects are also getting under way in Germany, Japan and other countries.

The striking parallels between structural genomics and genomics raise questions about whether there will be the competition and petty sniping that accompanied the efforts by Celera and the Human Genome Project until their recent truce. "I think everybody's aware of it and everybody's thinking about it," said Dr. Raymond C. Stevens, a professor of molecular biology at the Scripps Research Institute.

For now, both academic scientists and corporate ones say this will not be the case. Celera and the genome project are pursuing the same target, the complete human genetic code. But no structural genomics company or laboratory will be able to do more than a small subset of all proteins. And they will choose their targets differently.

The private companies will pick proteins that are medically useful. The public project aims to determine the shapes of a wide variety of proteins to understand protein structures better.

"We're choosing our targets on the basis of trying to cover as much of the protein landscape as we can," Dr. Cassman said. "That's not the same thing as choosing your targets based on medical interest."

Still, some applicants for N.I.H. grants want to focus on medically useful proteins. And the potential for conflict is there because the public project calls for the structures to be placed in the Protein Data Bank, freely available to all researchers. But the private companies say they will offer most of their data only to subscribers and will patent the structures and functions they discover.

 

"We want to lock up as much of that proteome as we can," said Dr. John Chiplin, president of GeneFormatics Inc., a start-up company based here. The term "proteome" refers to all proteins produced by a species, much as the genome is the entire set of genes.

The patenting of protein structures could be as disputed as the patenting of genes. The federal Patent and Trademark Office has begun granting patents on the three-dimensional coordinates of a protein in a computer-readable form. But new Patent Office guidelines make it unlikely that patents will be granted on protein structures unless the protein's function is known.

Complicating the public-private issue is that many of the scientists seeking the N.I.H. grants are also the founders of companies. Dr. Stevens at Scripps is a co-founder of Syrrx, which is being spun out of a Novartis research center here. Structural GenomiX was founded by Dr. Wayne A. Hendrickson and Dr. Barry Honig, both of Columbia University. Another company, Structure Function Genomics in Princeton, N.J., is being started by Gaetano T. Montelione and Stephen Anderson, both professors at Rutgers.




Right now, there is more conflict within the academic community than between the public and private sectors.

Some scientists oppose the public project, saying that blindly determining structures, rather than working on proteins of known interest, is neither useful nor intellectually challenging.

"It's really hack money," said Dr. Michael G. Rossmann, professor of biological sciences at Purdue University. "The concepts are to do things very quickly without thinking, and I don't think that creates good science."

Proponents of structural genomics point out that similar objections were raised at first to the Human Genome Project. But now, they say, most scientists agree that having the entire gene sequence is extremely valuable.

Structural genomics is just one endeavor spawned by completion of the genome project. Another related one is proteomics, which involves determining which proteins are in different cells and how the proteins work together.

The genome project is expected to lead to the discovery of tens of thousands of genes, providing the input for structural genomics and also the need for it. The challenge now is to figure out what the genes do. Looking at the shape of the protein produced by a gene is one way to do that. "Form and function are tightly coupled," said Sung-Hou Kim, a professor of chemistry at the University of California at Berkeley.

For instance, scientists knew that a protein dubbed Tubby was connected with obesity but did not know how it worked. So Dr. Lawrence Shapiro and colleagues at the Mount Sinai School of Medicine determined the structure of the Tubby protein. Its shape and positively charged face led them to deduce that the protein binds to DNA, turning a gene on. With that clue, scientists can now begin searching for the gene regulated by Tubby.

Still, critics challenge how valuable structural genomics will be. The shape can provide clues only about the type of molecule a protein is (for instance, that it binds to DNA), not about what metabolic pathway or disease it is associated with.

Dr. Thomas A. Steitz, a professor of molecular biophysics and biochemistry at Yale University, said that in many cases scientists cannot tell anything about function from the shape. He also said that many proteins are interlaced with others in the body, so the shape they have when isolated is not the same as the shape they have in the body.

Moreover, structural genomics will tackle only a tiny subset of all protein structures, making it far less comprehensive than the Human Genome Project.

That is partly because there are more proteins than genes: a single gene can make more than one protein variant, and proteins can change shape as they act. Some experts think there are several hundred thousand human proteins even though there are only about 100,000 genes. Including plants, animals and micro-organisms, there are millions of proteins.

And it is almost impossible to determine the shapes of some proteins because they cannot be made into the crystals needed for X-ray crystallography. Proteins embedded in cell membranes -- estimated to be about a third of all proteins -- are almost impossible to crystallize. And many of these are the very proteins to which drugs bind.

Also, even with the more efficient techniques, it can still take weeks and cost tens of thousands of dollars to determine a single structure. Structural GenomiX's slogan is "5,000 proteins in 5 years." The N.I.H. effort aims at about 10,000 to 20,000 proteins over 10 years.

Some question how valuable such limited samples would be. "With a population of six million sequences, 5,000 structures in five years doesn't cut the mustard," said Dr. Chiplin of GeneFormatics.

Instead of X-ray analysis, GeneFormatics plans to use computers to predict the shape and the function of proteins, at the rate of thousands of proteins a day. But computer analysis has its own shortcomings.

Proteins are made of building blocks known as amino acids that are arranged like beads on a string. But the string then folds into a complex shape. A three-letter genetic sequence codes for each amino acid, so it is straightforward to predict the amino acid sequence from the gene. But no one has yet been able to predict how this string of amino acids will fold.
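The step described above as straightforward, reading a gene three letters at a time to get the amino acid chain, can be sketched in a few lines of Python. This is an illustration only, not any laboratory's software: it assumes the standard genetic code, a coding DNA strand that is already in the correct reading frame, and one-letter amino acid abbreviations. The names CODON_TABLE and translate are invented for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon. One-letter amino acid codes are used.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate a coding DNA string into a one-letter amino acid string.

    Assumes the sequence starts in frame. Translation stops at the first
    stop codon; a trailing partial codon is ignored.
    """
    seq = coding_dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # prints "MAK"
```

The lookup yields only the order of the beads on the string. Nothing in it says how the string folds, which is the hard part the article goes on to describe.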

The International Business Machines Corporation is building a computer 500 times faster than any available today to predict how a protein will fold by calculating the atomic forces acting on the amino acid chain. Even with that computer, it is expected to take an entire year of number crunching to simulate the folding of a single protein, a job the body does almost instantaneously.

 

However, if a protein shares more than 30 percent of its amino acid sequence with a protein with a known shape, computers can estimate the unknown protein's shape reasonably well. Indeed, the N.I.H. project is based on the assumption that there are only several thousand different classes of protein folds. Determining the structure of several examples of each class would provide the data necessary for computers to model virtually any other protein.
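The 30 percent rule of thumb can be made concrete with a short Python sketch. It rests on several assumptions that are not from the article: the two amino acid sequences have already been aligned to equal length, gaps are written as "-", and identity is counted only over positions where neither sequence has a gap (other definitions are in use). The function names are hypothetical, and real modeling software does far more than this single check.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues between two pre-aligned sequences.

    Assumption: both strings have equal length and use "-" for gaps.
    Columns containing a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def likely_modelable(aligned_a, aligned_b, threshold=30.0):
    """Apply the article's rule of thumb: more than 30 percent shared."""
    return percent_identity(aligned_a, aligned_b) > threshold


unknown = "MKTAYIAK"
known = "MKSAYLAR"
print(percent_identity(unknown, known))  # prints 62.5
print(likely_modelable(unknown, known))  # prints True
```

The two short sequences here are made up for the example. In practice the alignment itself is a separate computation, and a score just above the threshold gives only a rough model.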

For now, though, structures estimated by computers are not likely to be precise enough to use in drug design. So there will still be a need for physical structure determination.

So both the public and private projects are going forward and hoping to avoid the conflict that has marked genomics. "One would hope there might be some lessons learned," said Dr. Berman of Rutgers. "One would hope."

Category: 32. Genome Project and Genomics