Next: The Problem
Up: Chapter 1: Qualitative Overview
Previous: Chapter 1: Qualitative Overview
Contrary to the pessimistic claims in the ecological inference literature
(since Robinson, 1950), aggregate data are sometimes useful even
without inferences about individuals. Studies of incumbency
advantage, the political effects of redistricting plans, forecasts of
macro-economic conditions, and comparisons of infant mortality rates
across nations are just a few of the cases where both questions and
data coincide at the aggregate level.
Nevertheless, even studies
such as these that ask questions about aggregates can usually be
improved with valid inferences about the individuals who make up the
aggregates. And more importantly, numerous other questions exist for
which only valid ecological inferences will do.
Fundamental questions in most empirical subfields of political science require
ecological inferences. Researchers in many other fields of academic inquiry,
as well as the real world of public policy, also routinely try to make
inferences about the attributes of individual behavior from aggregate data. If
a valid method of making such inferences were available, scholars could provide
accurate answers to these questions with ecological data, and policymakers
could base their decisions on reliable scientific techniques. Many of the
ecological inferences pursued in these other fields are also of interest to
political scientists, which reemphasizes the close historical connection
between the ecological inference problem and political science research. The
following list represents a small sample of ecological inferences that have
been attempted in a variety of fields.
- In American public policy, ecological inferences are required to
implement key features of federal law. For example, the U.S. Voting Rights
Act of 1965 (and its extensions in 1970, 1975, and 1982) prohibited voting
discrimination on the basis of race, color, or language. If discrimination is
found, the courts or the U.S. Justice Department can order a state or local
jurisdiction to redistrict its political boundaries, or to impose or prevent
various other changes in electoral laws. Under present law, legally significant
discrimination only exists when plaintiffs (or the Justice Department) can
first demonstrate that members of a minority group (usually African American or
Hispanic) vote both cohesively and differently from other voters.
Sometimes they must also prove that majority voters consistently prevent
minorities from electing a candidate of their choice. Since survey data are
rarely available in these cases, and because they are not often trustworthy in
racially polarized contests, an application of the Voting Rights Act requires a
valid ecological inference from electoral data and U.S. Census data.
Voting Rights Act assessments of minority and majority voting begin with
electoral returns from precincts, the smallest geographic unit for which
electoral data are available. In addition to the numbers of votes received by
each candidate in a precinct, census data also give the fraction of voters in
the same precinct who are African American (or other minority) or white.
With these two
sets of aggregate data, plaintiffs must make an ecological inference about how
each racial group casts its ballots. That is, since the secret ballot prevents
analysts from following voters into the voting booth and peering over their
shoulders as they cast their ballots, the voting behavior of each racial group
must be inferred using only aggregate electoral and census data. Because of
the inadequacy of current methods, in some situations the wrong policies are
being implemented: the wrong districts are being redrawn, and the wrong
electoral laws are being changed. (Given the great importance and practicality
of this problem, I will use it as a running example.)
- In one election to the German Reichstag in September 1930, Adolf Hitler's
previously obscure and electorally insignificant National Socialist German
Workers' Party became the Weimar Republic's second largest political party.
The National Socialists continued their stunning electoral successes in
subsequent state, local, and presidential elections, and ultimately reached
37.3% of the vote in the last election prior to their taking power. As so
many have asked, how could this have happened? Who voted for the Nazis (and
the other extreme groups)? Was the Nazi constituency dominated by the
downwardly mobile lower middle class or was support much more widespread?
Which religious groups and worker categories supported the National Socialists?
Which sectors of which political parties lost votes to the Nazis? The data
available to answer these questions directly include aggregate data from some
of the 1,200 Kreise (districts) for which both electoral data and various
census data are available. Because survey data are not available, accurate
answers to these critical questions will only be possible with a valid method
of ecological inference (see Hamilton, 1982; Childers, 1983; and Falter, 1991).
- Epidemiologists and public policy makers need to know whether and to what
extent residential levels of radioactive radon are a risk factor for lung cancer
(Stidley and Samet, 1993; Greenland and Robins, 1994a). Radon leaks through
basement floors and may pose a significant health risk. Legislators in many
states are considering bills that would require homeowners to test for radon
and, if high levels are found, to install one of several mechanical means of
reducing future exposure.
Policymakers' decisions about such legislation obviously depend in part on the
demonstrated health effects of radon. Unfortunately, collecting random samples
of individual-level data would be impractical, as it would require measures of
radon exposure over many years for each subject. Moreover, because only a
small fraction of people with or without radon exposure get lung cancer, and
because other variables like smoking are powerful covariates, reliably
estimating the differences in lung cancer rates for those with different levels
of radon exposure in an individual-level study would require measurements for
tens of thousands of individuals. Such a study would be prohibitively expensive,
and it would be ethically unacceptable unless researchers altered the radon
levels of exposed individuals, which would probably also ruin the study. Researchers have
tried case-control studies, which avoid the necessity of large samples but risk
sample selection bias, and extreme-case analyses of coal miners, where the
effects are larger but their high levels of radon exposure make the results
difficult to extrapolate back to residential settings. The most extensive data
that remain include information such as county-level counts of lung cancer
deaths from the federal Centers for Disease Control, and samples of radon
concentration from each county. Ecological inferences are therefore the only
hope of ascertaining the dose-response effect of radon exposure from these
data. Unfortunately, without a better method of making ecological inferences,
the evidence from these data will likely remain inconclusive (Lubin, 1994).
- In the academic field of marketing (and its real-world counterpart),
researchers try to ascertain who has bought specific products, and where
advertising is most likely to be effective in influencing consumers to buy
more. In many situations, researchers do not have data on the demographic and
socio-economic characteristics of individuals who buy particular products, data
that would effectively answer many of the research questions directly.
Instead, they have extensive indirect data on the average characteristics of
people in a geographic area, such as at the level of the zip code (or sometimes
9-digit zip code) in the United States. Researchers generally also have information from
the company about how much of a product was sold in each of these areas. The
question is, given the number of new products sold in each geographic area and,
for example, the fraction of households in each area that have children, are in
the upper quartile of income, are in single-parent families, or have other
characteristics, how does demand for the product vary by these characteristics
within each community? Only with a valid ecological inference in each
geographic area can researchers learn the answers they seek. With this
information, scholars will be able to study how product demand depends on these
family and individual characteristics, and companies will be able to decide how
to target advertising to consumers likely to be interested in their products.
- Since voter surveys are neither always possible nor necessarily reliable,
candidates for political office study aggregate election returns in order to
decide what policies to favor, and also to tailor campaign appeals.
Understanding how the support for policies varies among demographic and
political groups is critical to the connections between elected officials and
their constituents, and for the smooth operation of representative democracy.
- Historians are also interested in the political preferences of
demographic groups, usually for time periods before modern survey
research had been invented. For example, only valid ecological
inferences will enable these scholars to ascertain the extent to which
working-class voters supported the Socialist party in depression-era America.
- An important sociological question is the relationship between
unemployment and crime, especially as affected by race and as mediated by
divorce and single parenthood. Unfortunately, the best available data are
usually aggregated at the level of cities or counties (Blau and Blau, 1982;
Messner, 1982; Byrne and Sampson, 1986). Official U.S. government data on
race-specific crime rates (in the form of the Uniform Crime Report) are usually
insufficient, and individual-level survey data are in very short supply and,
because they are based on self-reports, are often of dubious quality (Sampson,
1987). Only better data or a valid method of ecological inference will enable
scholars to determine the critical linkages between unemployment, family
disruption, race, and crime.
- The ecological inference problem, and other related aggregation problems,
are central to the discipline of economics, as explained by Theil in his
classic study (1954: 1): "A serious gap exists between the greater part of
rigorous economic theory and the pragmatic way in which economic systems are
empirically analyzed. Axiomatically founded theories refer mostly to
individuals, for instance the consumer or the entrepreneur. Empirical
descriptions of economic actions in large communities, on the other hand, are
nearly always extremely global: they are confined to the behavior of groups of
individuals. The necessity of such a procedure can scarcely be questioned.
But the introduction of relations pretending to describe the
reactions of groups of individuals instead of single individuals raises
questions of fundamental importance, which are not very well understood."
Economists have made much progress in clarifying the links between
microeconomic and macroeconomic behavior in the more than forty years since
these words were written (see Stoker, 1993). They also have some good survey
data, and much more impressive formal theories, but a method of ecological
inference would enable economists to evaluate some of their sophisticated
individual-level theoretical models more directly. This would be especially
important in a field where there is much reason to value individual responses
to surveys less than revealed preference measures that are best gathered at the
aggregate level. Economists are also interested in developing models of
aggregate economic indicators that are built from and consistent with
individual-level economic theories and data, even when the individual level is
not of direct interest (see Section ).
- A controversial issue in education policy is the effects of school choice
voucher programs, where states or municipalities provide vouchers to students
who cannot afford to attend private schools. Private schools are then composed
of students from wealthy families and from those who pay with state vouchers.
One of the many substantive and methodological issues in this field is
determining the differential performance of students who take advantage of the
voucher system to attend private schools, compared to those who would be there
even without the program. Thus, data exist on aggregate school-level variables
such as the dropout rate or the percent who attend college, as well as on the
proportion of each private school's students who paid with a voucher. Because
of privacy concerns, researchers must make ecological inferences in order to
learn about the fraction of voucher students who attend college, or the
fraction of non-voucher students who drop out.
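
The examples in this list share one arithmetic structure, which a small
sketch can make concrete using the voting rights case. The precinct sizes and
vote totals below are hypothetical numbers chosen for illustration, and the
function name feasible_splits is an assumption of this sketch rather than
anything defined in the text. The code does not perform an ecological
inference; it only enumerates every way the observed precinct totals could
have arisen from the unobserved votes of each group.

```python
def feasible_splits(minority_voters, majority_voters, candidate_votes):
    """List every (minority_votes, majority_votes) pair for one candidate
    that is consistent with the observed precinct totals."""
    splits = []
    for m in range(minority_voters + 1):
        w = candidate_votes - m  # votes that must then come from the majority group
        if 0 <= w <= majority_voters:
            splits.append((m, w))
    return splits


# Hypothetical precinct: 400 minority voters, 600 majority voters,
# and 480 votes recorded for the candidate of interest.
splits = feasible_splits(400, 600, 480)

# Fraction of minority voters supporting the candidate under each split.
rates = [m / 400 for m, _ in splits]

print(len(splits), min(rates), max(rates))
```

In this hypothetical precinct, a highly polarized outcome (360 minority and
120 majority votes) and an unpolarized one (192 and 288) both reproduce the
same aggregate totals, so the totals from a single precinct cannot by
themselves distinguish between them.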
The point of this list is to provide a general sense of the diversity
of questions that have been addressed by (necessarily) inadequate
methods of ecological inference. No tiny sample of ecological
inferences such as this could do justice to the vast array of
important scholarly and practical questions about individual
attributes for which only aggregate data are available.