Single nucleotide polymorphisms: From the evolutionary past. . .
URL: http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v409/n6822/full/409821a0_fs.html
Date accessed: 15 March 2001
15 February 2001
Nature 409, 821–822 (2001) © Macmillan Publishers Ltd.
MARK STONEKING
Single nucleotide polymorphisms are the bread-and-butter of DNA sequence
variation. They provide a rich source of information about the evolutionary
history of human populations.
Studies of genetic variation in human populations began inauspiciously1.
The first such study, of ABO blood-group frequencies, was carried out by
two Polish immunologists, Ludwik and Hanka Hirszfeld, at the end of the First
World War. This work was notable for its broad coverage of the world's
populations, large sample sizes and scrupulous attention to anthropological
details. Yet the Hirszfelds still had difficulty publishing in The
Lancet, the premier medical journal of the time. The editor could not see
the relevance of their work, and so this seminal study of human genetic
variation first appeared in an obscure anthropological journal2.
The relevance became abundantly clear when Felix Bernstein subsequently used the
Hirszfelds' data to demonstrate that the ABO blood-group frequencies were better
explained by a single gene with three variants (alleles) than, as prevailing
wisdom then held, by two genes each with two alleles3.

Happily, times have changed: diversity is now all the rage4,5, and editors have
become more appreciative of the importance of human genetic variation. The
latest evidence of that is the paper on page 928 of this issue6, which reports
the identification and mapping of 1.4 million single nucleotide polymorphisms
(SNPs, pronounced 'snips') in the human genome. The paper is the result of the
labours of a large collaboration, The International SNP Map Working Group.

So, what are SNPs? Quite simply, they are the bread-and-butter of DNA sequence
variation (or polymorphism, to those in the business). A DNA sequence is
a linear string of four kinds of nucleotide; compare two sequences, position by
position, and wherever you come across different nucleotides at the same
position, that's a SNP (see Fig. 1 on page 823). So SNPs reflect past mutations
that were mostly (but not exclusively) unique events, and two individuals
sharing a variant allele are thereby marked with a common evolutionary heritage.
In other words, our genes have ancestors, and analysing shared patterns of SNP
variation can identify them.

However, the real importance of SNPs is that there are so many of them. One
estimate7 is that comparing two human DNA sequences yields a SNP every
1,000–2,000 nucleotides. That may not sound like much until you realize that
there are 3.2 billion nucleotides in the human genome, which translates into
1.6 million to 3.2 million SNPs. And that's just from comparing two sequences;
the total number of SNPs in humans is obviously much larger. Most human
variation that is influenced by genes can be traced to SNPs, especially in such
medically (and commercially) important traits as how likely you are to become
afflicted with a particular disease, or how you might respond to a particular
pharmaceutical treatment, as discussed by Chakravarti8 on the following page.
And even when a SNP is not directly responsible, the sheer number of SNPs means
they can also be used to locate genes that influence such traits.

The deluge of SNPs reported by the SNP working group6
also promises great things for those of us who analyse patterns of molecular
genetic variation to reconstruct the evolutionary history of human populations.
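The position-by-position definition of a SNP given earlier can be sketched in a few lines of Python, together with the arithmetic that turns one SNP every 1,000–2,000 nucleotides into millions of SNPs between two genomes. This is a hypothetical illustration, not the working group's method. It assumes two sequences that are already aligned to the same length with no insertions or deletions. The fragments are invented, and the helper names `find_snps` and `expected_snp_count` are made up for this example. Real SNP discovery must also separate true variants from sequencing errors.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (position, base_a, base_b) for every site
    where two aligned sequences carry different nucleotides (0-based)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare the two sequences position by position; each mismatch is a SNP.
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


def expected_snp_count(genome_length: int, bases_per_snp: int) -> int:
    """Hypothetical helper: expected SNPs between two genomes when one SNP
    occurs every `bases_per_snp` nucleotides."""
    return genome_length // bases_per_snp


if __name__ == "__main__":
    # Two made-up, already-aligned fragments (illustrative only).
    a = "ACGTTGCAAGGCTTAC"
    b = "ACGTCGCAAGGCTTGC"
    print(find_snps(a, b))  # [(4, 'T', 'C'), (14, 'A', 'G')]

    genome = 3_200_000_000  # 3.2 billion nucleotides
    for spacing in (2_000, 1_000):
        print(spacing, expected_snp_count(genome, spacing))
    # 2000 1600000
    # 1000 3200000
```

A simple comparison loop suffices here because alignment is assumed to have been done already. Aligning sequences that differ by insertions or deletions is a separate, harder problem.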
Our genes contain the signature of an expansion from Africa within the past
150,000 years or so9. But there is still debate as to whether the modern
humans from Africa completely replaced archaic non-African populations with no
interbreeding, or whether we perhaps carry the vestiges of Neanderthal or other
archaic non-African genes.

Demonstrating a recent African origin for every single one of our 3.2 billion
nucleotides goes beyond the bounds of reason or necessity, but there is still
much to be learned. For a start, most of our insights into molecular
anthropology arise from DNA in mitochondria and (more recently) polymorphisms of
the Y chromosome. This is because these DNA sequences are haploid (that is,
represented just once in each cell, in contrast to the other chromosomes, which
are represented twice), and they are inherited from just one parent, so they do
not undergo the usual sequence shuffling (recombination) during egg and sperm
production. This makes them easier to analyse and extremely informative. But
both suffer from the drawback that, in the absence of recombination, they behave
as single genes, and the history of any single gene can differ from that of a
population or species because of natural selection or chance events involving
that gene.

Accurate inferences concerning population history demand the analysis of
several genes, with the most promising approach involving haplotypes10,
which consist of several closely spaced (linked) polymorphisms. The advantage of
haplotypes over simply analysing polymorphisms at random is that there is
valuable information in the associations between linked polymorphisms: the
whole is greater than the sum of the parts. So the 1.4 million SNPs are a
welcome resource that will greatly help in identifying haplotypes for tracing
human evolutionary history, especially those that might reveal archaic
non-African ancestry.

However, answering all of our questions about human evolutionary history will
not be as simple as mining the SNP database and determining haplotypes in a
representative sample of worldwide populations. There are four main reasons for
this.

First, to be really useful, the SNPs in the database should really be SNPs,
and not errors or artefacts, and they should be polymorphic in other samples,
not just the sample of individuals used to find the SNPs. An important aspect of
the SNP working group's data is that 1,585 SNPs were chosen for further
verification, and about 95% of them turned out to be true SNPs, which is good
news indeed. Moreover, 1,276 SNPs were tested on additional population samples,
and at least 82% were polymorphic, which is reassuring.

Second, one might ask why only about 0.1% of the 1.4 million SNPs were verified
and tested. The answer is that our ability to determine allele frequencies
efficiently and inexpensively for large numbers of SNPs lags behind our ability
to simply identify them. This situation is reminiscent of the beginnings of the
Human Genome Project, when developing technology was a primary concern and it
was not at all clear how the 3.2 billion nucleotides were going to be
determined. But human ingenuity won out then, and given the number of bright and
capable minds now wrestling with the SNP-typing problem, one or more solutions
should soon be at hand (especially with the motivation of lucrative commercial
applications).

Third, a problem known as ascertainment bias can complicate the
interpretation of results based on SNPs. For example, SNPs that were found to be
polymorphic in European populations will overestimate genetic diversity in
European as opposed to non-European populations. Moreover, the probability of
finding a SNP, and the frequency of polymorphism at a SNP, depend on how many
times a particular DNA segment was sequenced, and from how many individuals. The
SNP working group report some intriguing preliminary findings regarding how SNP
diversity is apportioned among chromosomes. But further work is required to see
whether these are truly biological differences or instead reflect
ascertainment biases. Ascertainment bias is not an insurmountable problem:
statistical geneticists love this sort of challenge and are already coming up
with creative solutions11. Even so, SNP-finders must keep careful track of how
their SNPs were ascertained.

Fourth, the emphasis in the SNP database is on SNPs where both of the alleles
occur at high frequency, because these will be most useful for
disease-association studies. In general, the higher the frequency of a SNP
allele, the older the mutation that produced it, so high-frequency SNPs largely
predate human population diversification. But many questions in human evolution
involve specific migrations (such as the colonization of Polynesia or the
Americas) for which population-specific alleles are most informative; indeed,
this is one of the attractions of mitochondrial-DNA and Y-chromosome analyses
for such questions, because population-specific alleles can be readily found. It
is unlikely that Polynesian-specific SNPs are present in the database, so more
work will be required to find such informative, population-specific SNPs.

Still, one can imagine that in the not-too-distant future the details of
human population history will have been fleshed out, at least to the extent
possible by analysing genetic variation in extant populations. What then? One
area that is receiving increasing attention is the detection of the effects of
natural selection in human populations12. Using
SNPs to find chromosomal regions with abnormally low levels of variation is a
particularly promising way of detecting the genomic signature of selection for
favourable mutations13.

Another area of increasing interest is identifying the molecular genetic
basis of 'normal' phenotypic variation4: that is, variation of the
old-fashioned, morphological kind, which is a traditional
concern of anthropology. Molecular anthropology has for the most part
concentrated on the molecules and what their diversity tells us about human
evolution. With the advent of the human genome sequence and the SNP database,
the ultimate in molecular tools, we are ironically now poised to focus on
phenotypes and what their diversity tells us about human evolution, thereby
bringing the anthropology back into molecular anthropology.
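The third and fourth points above concern how SNP discovery depends on how many chromosomes are sequenced. A simple probability model can illustrate this. It is a sketch under stated assumptions, not a method from the working group or from reference 11. The model assumes a biallelic site and chromosomes sampled independently at random from a single population. It also assumes no sequencing error, so a site is detected only if both alleles turn up in the discovery panel. The function name `detection_probability` and the panel sizes are hypothetical.

```python
def detection_probability(freq: float, n_chromosomes: int) -> float:
    """Hypothetical helper: probability that a biallelic site whose allele has
    population frequency `freq` is seen as polymorphic in a random sample of
    `n_chromosomes` chromosomes (assumes no sequencing error)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    if n_chromosomes < 1:
        raise ValueError("need at least one chromosome")
    # The site is missed if every sampled chromosome carries the same allele,
    # so subtract the two monomorphic outcomes from 1.
    return 1.0 - freq ** n_chromosomes - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    # Hypothetical allele frequencies and discovery-panel sizes.
    for n in (2, 10, 48):
        row = [detection_probability(p, n) for p in (0.01, 0.05, 0.2, 0.5)]
        print(n, ["%.3f" % x for x in row])
```

Under these assumptions, comparing just two chromosomes finds an allele at 1% frequency only about 2% of the time (1 − 0.01² − 0.99² = 0.0198), whereas it finds an allele at 50% frequency half the time. SNP sets discovered in small panels are therefore enriched for common alleles. A panel drawn from one population favours alleles that are common in that population. Rare, population-specific variants tend to be missed.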
1. Mourant, A. E. Blood Relations p. 13 (Oxford Univ. Press, 1983).
2. Hirszfeld, L. & Hirszfeld, H. Anthropologie 29, 505-537 (1919).
3. Crow, J. F. Genetics 133, 4-7 (1993).
4. Weiss, K. M. Genome Res. 8, 691-697 (1998).
5. Collins, F. S., Brooks, L. D. & Chakravarti, A. Genome Res. 8, 1229-1231 (1998).
6. The International SNP Map Working Group. Nature 409, 928-933 (2001).
7. Li, W. H. & Sadler, L. A. Genetics 129, 513-523 (1991).
8. Chakravarti, A. Nature 409, 822-823 (2001).
9. Stoneking, M. Evol. Anthropol. 2, 60-73 (1993).
10. Tishkoff, S. A. et al. Science 271, 1380-1387 (1996).
11. Kuhner, M. K., Beerli, P., Yamato, J. & Felsenstein, J. Genetics 156, 439-447 (2000).
12. Przeworski, M., Hudson, R. R. & Di Rienzo, A. Trends Genet. 16, 296-302 (2000).
13. Nurminsky, D., De Aguiar, D., Bustamante, C. D. & Hartl, D. L. Science 291, 128-130 (2001).