Chapter One: INTRODUCTION TO CONCEPTS, DATA, AND METHODS
1.2. Genetic definitions
1.3. Techniques for detection of polymorphic markers
1.4. The evolution of gene frequencies
1.5. Classical attempts to distinguish human "races"
1.6. Scientific failure of the concept of human races
1.7. Identifying population units
1.8. Linguistic classification
1.9. Nature and sources of the data
1.10. Methods of analysis
1.11. Genetic distances
1.12. Phylogenetic tree analysis
1.13. Analysis of principal components (PCs) and derived methods
1.14. Geographic maps of gene frequencies
1.15. Synthetic maps
1.16. Isolation by distance
1.17. Admixtures, their estimation, and their effect on tree structure
For some time, geneticists had been aware of a certain amount of genetic variation among the individuals forming a species, but the remarkable extent of this variation was not appreciated until about 25 years ago. Conspicuous human traits like hair and eye color clearly vary from one individual to another in many populations; these differences are easily perceived by the layman, as are variations in height, weight, body build, and facial traits, which are also genetically determined to some extent. Their hereditary transmission, however, is complex, and these traits contribute little to our understanding of the extent of variation. The first example of clear-cut genetic variation, that of ABO blood groups, was described at the beginning of the twentieth century (Landsteiner 1901). Dissimilarities between individuals regarding ABO blood-group variation are due to small chemical differences between molecules found at the surface of red blood cells.
These studies were soon extended to other blood-group systems, and a body of data began to accumulate showing that different human populations have different proportions of blood groups. However, the first glimpse of the staggering magnitude of genetic variation came later--beginning in the 1950s and coming to full development in the 1960s--when individual differences for proteins could be systematically studied. A protein is a large molecule made of a linear sequence of components called amino acids; different proteins vary considerably in their amino-acid composition and serve very different functions. The relationship between structure and function has been demonstrated for many proteins. The same protein may show small, strictly inherited differences between individuals. The first example was observed in the protein hemoglobin, in which the replacement of a specific amino acid by another was shown to determine a hereditary disease known as sickle-cell anemia. This first case of "molecular pathology" was detected by subjecting the protein to an electric field with a procedure called electrophoresis (Pauling et al. 1949; Ingram 1957). The amino-acid replacement involved in sickle-cell anemia causes a change in the electric charge of the hemoglobin molecule, which allows the separation of normal and sickle-cell hemoglobins. Electrophoretic analysis has since been further developed and has helped detect a great deal of variation in proteins. It is now known that the majority of the tens of thousands of different proteins found in an organism exist in more than one form, so that some individuals may have one form of the protein, whereas others may have another form.
Protein variation is still the tip of the iceberg. Only when the analysis could be carried out at the level of the hereditary material itself, deoxyribonucleic acid (DNA), could the full extent of individual genetic variation begin to emerge. DNA analysis of this kind became widely available only in the 1980s, and although comparisons of segments of DNA in different individuals are still rare, they are becoming more common. The comparisons made so far are, however, adequate to convince us that there is much more variation at the DNA level than was suspected when only proteins and blood groups could be analyzed.
Techniques of DNA analysis are still being developed rapidly, and the future will undoubtedly see more and more attention being paid to individual variation at the DNA level. Meanwhile, an enormous wealth of information has accumulated and keeps accumulating on individual variation studied with immunological techniques (as the blood groups are) or with electrophoresis of proteins.
If we know that there exist different genetic types of a specific protein or other strictly inherited character, we can count individuals carrying one type or the other and establish the proportions of that type in the population being examined. These proportions vary from one population to another because they change over time in each population in a relatively unpredictable manner. The change in proportions of these types over time is the evolutionary process itself. It proceeds slowly but incessantly over generations. The analysis of populations living today in different places gives us a cross section in time of this continuing process, which is inevitably diverse in the various parts of the inhabited Earth.
Our primary interest is in understanding this evolutionary process. The first task is to describe the existing variation, using a variety of techniques that lend themselves to this work and allow us to test the relevant evolutionary models. We restrict our interest to aboriginal populations, which we define as those already living in the area of study in A.D. 1492. After this time, geographic discoveries stimulated the expansion and migrations of the economically more advanced populations all over the planet. Some movement took place before A.D. 1492, but at a smaller scale. Ordinarily, populations that migrated after that date have mixed only partially with earlier residents and are easily recognizable on the basis of physical appearance and historical and social knowledge. They, and some populations that are highly isolated and/or have had a complex history--such as Samaritans, Jews, Gypsies, and several others--need special study and are not considered in this book. Samaritans, as well as many Jewish populations, have been the object of analysis by Batsheva Bonné-Tamir (1980; Bonné-Tamir et al. 1992). Several general articles and books have been dedicated to Jews (e.g., Mourant 1978; Carmelli and Cavalli-Sforza 1979; Karlin et al. 1979; Morton et al. 1982; Livshits et al. 1991).
One way of studying living populations is a geographic representation of the data. For this purpose we first consider each gene (a segment of DNA endowed with a specific function) by itself, and for each gene we separately analyze the different forms that we can recognize, the alleles of that gene. The proportion of a given allele in different populations is the raw material of this approach. It is well established that the proportion of an allele varies considerably from place to place, but usually there is little difference between neighboring populations so that the greatest variation is observed at large distances. It is thus possible to prepare geographic maps representing these proportions for a particular allele (also called allele frequencies, or simply, gene frequencies) when a sufficient number of populations have been tested. The standard procedure is to draw isogenic curves or lines connecting points of equal gene frequency.
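As a minimal sketch of how the allele proportions described above can be obtained, the following Python fragment counts alleles directly from diploid genotypes (gene counting). The function name, the population labels, and the genotype data are hypothetical and serve only as illustration; the approach assumes a codominant system in which both alleles of each individual can be recognized.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the proportion of each allele in a sample of diploid genotypes.

    Each genotype is a pair of alleles; every individual contributes two
    allele copies to the count.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical samples from two populations (illustrative data only).
samples = {
    "population_A": [("M", "M"), ("M", "N"), ("N", "N"), ("M", "N")],
    "population_B": [("M", "M"), ("M", "M"), ("M", "N"), ("M", "M")],
}

# One allele frequency per population: the raw material for a map.
frequencies = {name: allele_frequencies(g) for name, g in samples.items()}
for name, freqs in frequencies.items():
    print(name, freqs)
```

Frequencies computed in this way for many sampled localities would be the values from which curves of equal gene frequency are drawn.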
Geographic maps of an allele are useful for understanding facts specific to that allele, including its evolutionary history and the effects of evolutionary factors like mutation and natural selection. The geographic distribution of a particular allele may give information on the place of origin of the genetic change (mutation) that generated it. Correlations of the distributions of gene frequencies with environmental parameters at the geographic level have been instrumental in the discovery of specific genetic adaptations. The sickle-cell anemia gene was the first example, because its geographic distribution showed a correlation with that of malaria (Haldane 1949). The hypothesis that this gene may confer resistance to malaria was later confirmed by more direct tests.
For a long time anthropologists tried to reconstruct evolutionary relationships and history on the basis of a single character or gene. A favorite for over 100 years was the cephalic index (skull breadth expressed as a percentage of skull length), introduced shortly before the middle of the nineteenth century. However, with a single trait, two populations of different origin could well turn out to be more or less identical. Anthropometric traits of this kind also have another very serious drawback: there is no guarantee that the character is completely under the control of biological inheritance, and the variations observed could be due to a short-term response to environmental changes. This was shown by Boas (1940) at the beginning of the twentieth century, but this lesson was, and still is, usually forgotten. The main advantage offered by such traits, namely the availability of data from fossil bones, was therefore minimized because of the uncertain nature of the observed differences.
After the first blood-group system (ABO) was discovered, ABO gene frequencies soon became a favorite for classifying populations. The information thus obtained, however, is also inadequate, even if it escapes to a large extent the limitation of possible short-term changes under direct environmental effect. Every gene frequency varies over time in ways that can be considered, at least superficially, nearly random. Therefore, it is not surprising that populations having clearly different evolutionary histories may show similar gene frequencies. This drawback can be avoided if one cumulates the information from more than one gene. As one increases the number of genes considered simultaneously, the probability that a similar confusion takes place becomes more and more remote. In 1963 it was shown that even with as few as 20 alleles from five genes one could successfully attempt a reconstruction of human evolution (Cavalli-Sforza and Edwards 1964). Later experience proved that a larger number is desirable or even necessary.
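The argument that chance resemblance becomes less likely as genes are added can be illustrated with a toy calculation. The sketch below is not a method from this book: it assumes, purely for illustration, that the frequency of an allele in each of two unrelated populations is uniformly distributed between 0 and 1 and independent from gene to gene. The function name and the threshold of 0.1 are likewise hypothetical.

```python
def chance_similarity(threshold, n_genes):
    """Probability that two unrelated populations differ by less than
    `threshold` in allele frequency at every one of `n_genes` genes.

    Toy assumption: each frequency is uniform on [0, 1] and independent,
    so for one gene P(|p - q| < d) = 1 - (1 - d)**2 = 2*d - d**2.
    """
    single_gene = 2 * threshold - threshold ** 2
    return single_gene ** n_genes

# The probability of a misleading resemblance falls off geometrically
# with the number of genes considered.
for k in (1, 5, 20):
    print(k, chance_similarity(0.1, k))
```

Under these assumptions the chance of confusion with a single gene is close to one in five, whereas with five genes it is already below one in a thousand.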
Several methods allow us to combine the information from many genes into appropriate statistical indices. They are usually called multivariate to distinguish them from those using single traits or genes (univariate).
Multivariate analysis is especially useful for understanding evolutionary forces that tend to operate in a parallel fashion on all genes: migration and random genetic drift (the random fluctuation of gene frequencies in time, to be further explained later). These and other methods are applied to the existing data with the aim of extracting information of genetic and evolutionary interest.
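Random genetic drift can be pictured with a small simulation. The sketch below uses the standard Wright-Fisher sampling scheme as an assumed model (the text above does not specify one): in each generation, the 2N gene copies of a population of N diploid individuals are drawn at random according to the allele frequency of the previous generation. The function name, the population size, and the starting frequency are illustrative choices.

```python
import random

def wright_fisher(p0, pop_size, generations, seed=0):
    """Simulate the frequency of one allele under drift alone.

    Each generation, 2 * pop_size gene copies are sampled independently,
    each carrying the allele with probability equal to its current frequency.
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    n_copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

# Three hypothetical populations starting from the same frequency drift apart.
for seed in (1, 2, 3):
    path = wright_fisher(p0=0.5, pop_size=50, generations=100, seed=seed)
    print(seed, round(path[-1], 3))
```

Because the same sampling process acts on every gene in a population, its effects accumulate in parallel across genes, which is what multivariate methods exploit.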
Reconstructing human evolution, including its fissions and major migrations, and understanding the roles of mutation, drift, and natural selection are often difficult and challenging tasks. There is clearly little hope of an experimental approach to our species, in which the evolutionary process could be repeated and interfered with in known ways. This, as well as the present almost total lack of fossil data on genetic variation (from populations living at earlier times), generates a strong desire for external evidence that can support the conclusions of genetic analysis. Fortunately, information from other sources can supply some clarification. The credibility of our conclusions can be greatly strengthened if these conclusions can be confirmed in the light of an interdisciplinary approach. Results from genetic data should be compared with relevant knowledge from other fields, in particular, paleoanthropology, prehistory, history, the geographic and ecological setting, and the cultural evidence that comes indirectly from linguistic studies. We have considered such feedback an essential part of our analysis, and we have designed our book in order to satisfy this requirement. The remainder of this chapter is dedicated to an introduction to specific concepts, data, models, and methods.
File created: 8/7/2007
Princeton University Press