
Outline

This book is divided into five main parts. Part I contains two introductions. Chapter 1 is a qualitative introduction to the entire book and includes a summary of results, an overview of some of the uses to which the method can be put, and a brief outline of the statistical model; because it includes no technical details about the statistical method developed in the subsequent fifteen chapters, it should be accessible even to those without a background in statistics. Chapter 2 gives a formal statement of the ecological inference problem along with the mathematical notation used throughout the remainder of the book.

Part II is divided into aggregation problems (Chapter 3) and problems unrelated to aggregation (Chapter 4). In the first of these chapters, I prove that all of the diverse problems attributed to aggregation bias in the literature are mathematically equivalent, so that only one aggregation problem remains to be solved. The second of these chapters describes a series of basic statistical problems that, although unrelated to aggregation and mostly ignored in the literature, still affect ecological inferences. Any model intended to provide valid ecological inferences must resolve all non-aggregation problems as well.

Part III describes my proposed solution to the ecological inference problem. Chapter 5 reformulates the data by generalizing the method of bounds, both algebraically and with easy-to-use graphical methods, and provides narrower, more informative bounds for the aggregate-level quantities of interest than have been used in the literature. Chapter 6 introduces a statistical approach to modeling the remaining uncertainty within the observation-level deterministic bounds. Chapter 7 develops procedures for estimating the model; Chapter 8 shows how to compute quantities of interest at the aggregate level and for each individual observation. Chapter 9 explains how to verify model assumptions with only aggregate data, shows what can go wrong, and provides diagnostic tests, extensions of the basic model, and a fully nonparametric version to cope with any problems that may remain. Part III also explains how the ecological inference problem and the proposed solution are mathematically equivalent to aspects of the "tomography" problem, which involves reconstructing cross-sectional images of body parts using X-rays and CT scans rather than surgery, or images of the earth's interior from seismic waves produced by earthquakes or nuclear explosions rather than by digging.
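For readers who want a concrete picture of what "deterministic bounds" means here, the following is a minimal sketch of the classical method of bounds for a single 2x2 observation. It is not the generalized version developed in Chapter 5. The function name and the symbols `x` (the fraction of a precinct's voting-age population that is black), `t` (the fraction of the population that turns out), `beta_b`, and `beta_w` (the unknown turnout rates of blacks and whites) are assumptions made for this illustration. They are linked only by the accounting identity t = beta_b * x + beta_w * (1 - x).

```python
def method_of_bounds(x, t):
    """Classical deterministic bounds for one 2x2 observation.

    x: fraction of the population in group b (strictly between 0 and 1)
    t: fraction of the population with the outcome (between 0 and 1)
    Returns ((lower_b, upper_b), (lower_w, upper_w)).
    All names here are illustrative assumptions.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")

    # Solve the identity for each rate while the other is pushed to
    # its extremes (0 or 1), then clip to the feasible range [0, 1].
    beta_b = (max(0.0, (t - (1.0 - x)) / x), min(1.0, t / x))
    beta_w = (max(0.0, (t - x) / (1.0 - x)), min(1.0, t / (1.0 - x)))
    return beta_b, beta_w


if __name__ == "__main__":
    # A precinct that is 25% black with 87.5% turnout.
    print(method_of_bounds(0.25, 0.875))
```

In a precinct like the one in the usage line, the aggregate data alone already rule out a black turnout rate below one half, with no statistical assumptions involved. The statistical model is needed only for the uncertainty that remains inside such intervals.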

Part IV validates the model by comparing myriad observation-level estimates from the model using aggregate data with their corresponding, known individual-level truths. These comparisons include a typical example of ecological inference, a study of registration by race in the 1960s Southern United States with all the intermediate results described (Chapter 10); an analysis of poverty status by sex in South Carolina which demonstrates that the model is highly robust to aggregation bias and restricted aggregate variances (Chapter 11); a study of black registration in Kentucky that shows how the model gives reasonable answers even in the face of ecological data with almost all relevant information aggregated away (Chapter 12); and two classic applications of ecological inference, the transitions of voters between elections and turn-of-the-century county data on literacy by race (Chapter 13). The method works in practice: it gives accurate answers and correct assessments of uncertainty even when existing methods lead to incorrect inferences or impossible results (such as an estimated percentage of African Americans voting that falls outside the range from 0 to 100).

Finally, Part V generalizes the basic model in several ways and then concludes. Chapter 14 analyzes three related non-ecological aggregation problems: solving the "modifiable areal unit problem" (a related problem in geography); combining survey and aggregate data to improve ecological inferences (as often studied in the discipline of statistics); and using aggregate-level data for inferences about relationships among continuous individual-level variables (a standard aggregation problem in econometrics). Chapter 15 generalizes the basic model to larger and multidimensional tables.

Chapter 16 concludes with a checklist of items to consider in applying the methods developed here. Technical appendices and a Glossary of Symbols follow.
