The data for the first application come from the state of Louisiana,
which records by precinct the number of blacks who vote and the number
of whites who vote (among those registered). These data make it
possible to evaluate the ecological inference model described in this
book as follows. For each of Louisiana's 3,262 precincts, the
procedure uses only aggregate data: the fraction of those registered
who are black and the fraction of registered people turning out to
vote in the 1990 elections (as well as the number registered). These
aggregate, precinct-level data are then used to estimate the
fraction of blacks who vote in each precinct. Finally, I validate the
model by comparing these estimates to the true fractions of
blacks who turn out to vote. (That is, the true fractions of black
and white turnout are not used in the estimation
procedure.)
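Even before any statistical model is applied, the two aggregate quantities named above restrict what each precinct's unobserved turnout rates can be. In a precinct, let X be the fraction of registrants who are black and T the fraction of registrants who voted. The unknown black and white turnout rates (written here as beta_b and beta_w, notation assumed for this sketch) must then satisfy the accounting identity T = beta_b*X + beta_w*(1 - X), with both rates between 0 and 1. The sketch below computes only these deterministic intervals, which we take to be what this section later calls the "method of bounds." It is not the book's full estimation procedure. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Precinct:
    """Aggregate data for one precinct (field names are illustrative)."""
    x: float  # fraction of registrants who are black (X)
    t: float  # fraction of registrants who turned out to vote (T)


def black_turnout_bounds(p: Precinct) -> Tuple[float, float]:
    """Interval for beta_b implied by T = beta_b*X + beta_w*(1 - X)
    when both turnout rates are constrained to [0, 1]."""
    if not 0.0 < p.x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    # beta_w <= 1 forces beta_b >= (T - (1 - X)) / X
    lower = max(0.0, (p.t - (1.0 - p.x)) / p.x)
    # beta_w >= 0 forces beta_b <= T / X
    upper = min(1.0, p.t / p.x)
    return lower, upper


def white_turnout_bounds(p: Precinct) -> Tuple[float, float]:
    """Interval for beta_w, derived the same way from the identity."""
    if not 0.0 <= p.x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    # beta_b <= 1 forces beta_w >= (T - X) / (1 - X)
    lower = max(0.0, (p.t - p.x) / (1.0 - p.x))
    # beta_b >= 0 forces beta_w <= T / (1 - X)
    upper = min(1.0, p.t / (1.0 - p.x))
    return lower, upper


if __name__ == "__main__":
    # Hypothetical precincts chosen for illustration, not data from the book.
    for prec in (Precinct(x=0.9, t=0.8), Precinct(x=0.1, t=0.5)):
        print(prec, black_turnout_bounds(prec), white_turnout_bounds(prec))
```

In a heavily black precinct (X = 0.9, T = 0.8), the bounds on black turnout are narrow, roughly 0.78 to 0.89. In a precinct with few black registrants (X = 0.1, T = 0.5), they span the whole interval [0, 1]. This variation in how informative each precinct is helps explain why ecological inference needs more than bounds alone.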
One brief summary of the results of this analysis appears in a figure
that plots the estimated fraction of blacks turning out to vote in 1990
(horizontally) by the true fraction of blacks voting in that year
(vertically). Each precinct is represented in the figure by a circle
with area proportional to the number of blacks in the precinct. If the
model estimates were exactly correct in every precinct, each circle
would be centered exactly on the diagonal line along which the
estimated and true fractions are equal. In fact, almost all of the
3,262 precincts fall on or near this diagonal line, demonstrating the
success of this method of making inferences about individual behavior
using only aggregate data. The few precincts that are farther from the
line have tiny numbers of African Americans, so the vast majority of
individual voters are correctly estimated.
The results are compelling. If the figure were merely a plot of the
observed values of a variable by the fitted values of the same variable
used during the estimation procedure, any empirical researcher should be
pleased: the fit is extremely good. If instead the figure were based on
the harder problem of making out-of-sample predictions, where past
realizations were used to calibrate the prediction, the result would be
even more impressive. But the result here is more dramatic still, since
the estimates in the figure were computed from only aggregate data. The
true fraction of blacks turning out to vote (the vertical dimension in
the figure) was not part of the estimation procedure. Moreover, no past
realizations of the truth being estimated were used.
Part IV provides many more model evaluations, of many different types.
These evaluations include data sets for which existing methods do
reasonably well at estimating the statewide average; in these cases the
method offered here also gives reasonable statewide results and, in
addition, much more information in the form of correct confidence
intervals and accurate results for each precinct in the state.
Part IV also gives examples of data sets where existing methods are
hopelessly biased but the method offered here gives highly accurate
estimates. For example, the best existing method indicates that 20%
fewer males in South Carolina fall below the poverty level than there
are males in that state (see Table 11.2 on page 226). In contrast, the
method offered here gives accurate answers for this statewide aggregate
(see Figure 11.2 on page 229) as well as for the fraction of males in
poverty in each of the 3,187 precinct-sized geographic units (see
Figure 11.3 on page 230).
The book also includes situations in which almost all information was
aggregated away and standard methods give even more ridiculous results.
In those cases, the method described here gives reasonable results with
wider confidence intervals, accurately reflecting the degree of
uncertainty in the ecological inference (see Chapter 12). The method
usually gives accurate estimates even when all the conditions for
"aggregation bias" are met, when the process of aggregation eliminates
most of the variation in one of the aggregate variables, and when
extrapolations far from the range of observed data are necessary. In
all these difficult examples, the method offered here gives accurate
answers with correct confidence intervals. The method will not always
work: since information is lost during aggregation, no method of
ecological inference could work in all data sets. However, the
procedures introduced here come with diagnostics that researchers can
use to evaluate the risks and, in most cases, avoid the problems.
Finally, I give a brief report of an analysis of 1990 turnout by race
in New Jersey's 567 minor civil divisions (mostly cities and towns).
These data cannot be used to verify ecological inferences, since the
true individual-level answers are not known, but they can be used to
demonstrate how much more information the method offered here provides
to users. The most popular existing method (Goodman's regression) gives
only two relevant numbers: the statewide fractions of blacks who vote
and of whites who vote. The latter estimate, incidentally, is five
standard deviations above its maximum possible value given by the
method of bounds. In contrast, the solution to the ecological
inference problem offered here gives reliable estimates of these two
numbers for the statewide average as well as for each of the 567
cities and towns.
To emphasize the rich information this method unearths, a
second figure
maps the estimated degree of voter turnout among
non-minorities. In this map, minor civil divisions in New Jersey are
given darker shades when the estimated degree of non-minority voter
turnout is higher. A few landmarks are labeled to give readers their
bearings. The vast increase in information the method provides is
represented by the interesting geographic variation in this map (and
in an additional complete map of minority turnout). For example, the
map shows that non-minority turnout is substantially higher in the
city of Newark than in the neighboring city of Elizabeth. Is this
because of a racial threat
posed by Newark's larger minority population? Is the white
mobilization in the wealthy towns of Bergen County near Englewood
Cliffs a result of the state government's attempt to integrate schools
by regionalizing its school districts? By providing reliable
individual-level, geographically based information, the solution to the
ecological inference problem can be used to raise numerous questions
such as these. The method also provides opportunities for answering
such questions by using its estimates as dependent variables in
second-stage analyses (using, in this case, explanatory variables such
as the fraction of the population that is minority, or state attempts
at integration).
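The second-stage idea can be sketched as an ordinary weighted regression. Suppose town-level estimates of non-minority turnout have already been produced by some first-stage ecological inference procedure. The sketch regresses those estimates on an explanatory variable such as the fraction minority. The function name, the variable names, and the choice to weight towns by number of registrants are all hypothetical. This naive version treats the first-stage estimates as if they were observed without error. A careful second-stage analysis would also need to account for the uncertainty in those estimates.

```python
from typing import Sequence, Tuple


def weighted_least_squares(
    y: Sequence[float], x: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Fit y = a + b*x by weighted least squares and return (a, b)."""
    n = len(y)
    if n < 2 or len(x) != n or len(w) != n:
        raise ValueError("need at least two observations of equal length")
    total_w = sum(w)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted means of the explanatory and dependent variables.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted sums of squares and cross-products around those means.
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if s_xx == 0:
        raise ValueError("explanatory variable has no variation")
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Synthetic inputs built from a known line, purely to exercise the code;
    # these are not estimates from New Jersey or any other real data.
    frac_minority = [0.05, 0.20, 0.40, 0.60]
    est_nonminority_turnout = [0.5 + 0.2 * f for f in frac_minority]
    n_registered = [1200, 800, 3000, 500]
    a, b = weighted_least_squares(est_nonminority_turnout, frac_minority, n_registered)
    print(f"intercept={a:.3f}, slope={b:.3f}")
```

Because the synthetic dependent variable lies exactly on a line, the fit recovers that line's intercept (0.5) and slope (0.2) up to floating-point rounding. With real first-stage estimates, the slope would summarize how estimated non-minority turnout varies with the fraction minority across towns.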