An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.
Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data. Filled with real-world examples, the book shows how its thinking tools apply to problems in a wide variety of subjects, including elections, civil conflict, crime, terrorism, financial crises, health care, sports, music, and space travel.
Above all else, Thinking Clearly with Data demonstrates why, despite the many benefits of our data-driven age, data can never be a substitute for thinking.
- An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
- Introduces the basic toolkit of data analysis—including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, differences in differences, and regression discontinuity
- Uses real-world examples and data from a wide variety of subjects
- Includes practice questions and data exercises
Ethan Bueno de Mesquita is the Sydney Stein Professor and deputy dean at the Harris School of Public Policy at the University of Chicago. He is the author of Political Economy for Public Policy and the coauthor of Theory and Credibility: Integrating Theoretical and Empirical Social Science (both Princeton). Twitter: @ethanbdm
Anthony Fowler is a professor at the Harris School of Public Policy at the University of Chicago.
- Preface
- Organization
- Who Is This Book For?
- Acknowledgments
- CHAPTER 1 Thinking Clearly in a Data-Driven Age
- What You’ll Learn
- Introduction
- Cautionary Tales
- Abe’s hasty diagnosis
- Civil resistance
- Broken-windows policing
- Thinking and Data Are Complements, Not Substitutes
- Readings and References
- PART I ESTABLISHING A COMMON LANGUAGE
- CHAPTER 2 Correlation: What Is It and What Is It Good For?
- What You’ll Learn
- Introduction
- What Is a Correlation?
- What Is a Correlation Good For?
- Description
- Forecasting
- Causal inference
- Measuring Correlations
- Mean, variance, and standard deviation
- Covariance
- Correlation coefficient
- Slope of the regression line
- Populations and samples
- Straight Talk about Linearity
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 3 Causation: What Is It and What Is It Good For?
- What You’ll Learn
- Introduction
- What Is Causation?
- Potential Outcomes and Counterfactuals
- What Is Causation Good For?
- The Fundamental Problem of Causal Inference
- Conceptual Issues
- What is the cause?
- Causality and counterexamples
- Causality and the law
- Can causality run backward in time?
- Does causality require a physical connection?
- Causation need not imply correlation
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- PART II DOES A RELATIONSHIP EXIST?
- CHAPTER 4 Correlation Requires Variation
- What You’ll Learn
- Introduction
- Selecting on the Dependent Variable
- The 10,000-hour rule
- Corrupting the youth
- High school dropouts
- Suicide attacks
- The World Is Organized to Make Us Select on the Dependent Variable
- Doctors mostly see sick people
- Post-mortems
- The Challenger disaster
- The financial crisis of 2008
- Life advice
- Wrapping Up
- Key Term
- Exercises
- Readings and References
- CHAPTER 5 Regression for Describing and Forecasting
- What You’ll Learn
- Introduction
- Regression Basics
- Linear Regression, Non-Linear Data
- The Problem of Overfitting
- Forecasting presidential elections
- How Regression Is Presented
- A Brief Intellectual History of Regression
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 6 Samples, Uncertainty, and Statistical Inference
- What You’ll Learn
- Introduction
- Estimation
- Why Do Estimates Differ from Estimands?
- What Makes for a Good Estimator?
- Quantifying Precision
- Standard errors
- Small samples and extreme observations
- Confidence intervals
- Statistical Inference and Hypothesis Testing
- Hypothesis testing
- Statistical significance
- Statistical Inference about Relationships
- What If We Have Data for the Whole Population?
- Substantive versus Statistical Significance
- Social media and voting
- The Second Reform Act
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 7 Over-Comparing, Under-Reporting
- What You’ll Learn
- Introduction
- Can an octopus be a soccer expert?
- Publication Bias
- Are Most Scientific “Facts” False?
- ESP
- Get out the vote
- p-hacking forensics
- Potential Solutions
- Reduce the significance threshold
- Adjust p-values for multiple testing
- Don’t obsess over statistical significance
- Pre-registration
- Requiring pre-registration in drug trials
- Replication
- Test important and plausible hypotheses
- Beyond Science
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 8 Reversion to the Mean
- What You’ll Learn
- Introduction
- Francis Galton and Regression to Mediocrity
- Reversion to the Mean Is Not a Gravitational Force
- Seeking Help
- Reversion to the Mean, the Placebo Effect, and Cosmic Habituation
- The placebo effect
- Cosmic habituation explained
- Cosmic habituation and genetics
- Beliefs Don’t Revert to the Mean
- Wrapping Up
- Key Words
- Exercises
- Readings and References
- PART III IS THE RELATIONSHIP CAUSAL?
- CHAPTER 9 Why Correlation Doesn’t Imply Causation
- What You’ll Learn
- Introduction
- Thinking Clearly about Potential Outcomes
- Sources of Bias
- Confounders
- Reverse causality
- The 10,000-hour rule, revisited
- Diet soda
- How Different Are Confounders and Reverse Causality?
- Signing the Bias
- Mechanisms versus Confounders
- Thinking Clearly about Bias and Noise
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 10 Controlling for Confounders
- What You’ll Learn
- Introduction
- Party whipping in Congress
- A note on heterogeneous treatment effects
- The Anatomy of a Regression
- How Does Regression Control?
- Controlling and Causation
- Is social media bad for you?
- Reading a Regression Table
- Controlling for Confounders versus Mechanisms
- There Is No Magic
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 11 Randomized Experiments
- What You’ll Learn
- Introduction
- Randomization and Causal Inference
- Estimation and Inference in Experiments
- Standard errors
- Hypothesis testing
- Problems That Can Arise with Experiments
- Noncompliance and instrumental variables
- Chance imbalance
- Lack of statistical power
- Attrition
- Interference
- Natural Experiments
- Military service and future earnings
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 12 Regression Discontinuity Designs
- What You’ll Learn
- Introduction
- How to Implement an RD Design
- Are extremists or moderates more electable?
- Continuity at the Threshold
- Does continuity hold in election RD designs?
- Noncompliance and the Fuzzy RD
- Motivation and Success
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 13 Difference-in-Differences Designs
- What You’ll Learn
- Introduction
- Parallel Trends
- Two Units and Two Periods
- Unemployment and the minimum wage
- N Units and Two Periods
- Is watching TV bad for kids?
- N Units and N Periods
- Contraception and the gender-wage gap
- Useful Diagnostics
- Do newspaper endorsements affect voting decisions?
- Is obesity contagious?
- Difference-in-Differences as Gut Check
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- CHAPTER 14 Assessing Mechanisms
- What You’ll Learn
- Introduction
- Causal Mediation Analysis
- Intermediate Outcomes
- Cognitive behavioral therapy and at-risk youths in Liberia
- Independent Theoretical Predictions
- Do voters discriminate against women?
- Testing Mechanisms by Design
- Social pressure and voting
- Disentangling Mechanisms
- Commodity price shocks and violent conflict
- Wrapping Up
- Key Terms
- Exercises
- Readings and References
- PART IV FROM INFORMATION TO DECISIONS
- CHAPTER 15 Turn Statistics into Substance
- What You’ll Learn
- Introduction
- What’s the Right Scale?
- Miles-per-gallon versus gallons-per-mile
- Percent versus percentage point
- Visual Presentations of Data
- Policy preferences and the Southern realignment
- Some rules of thumb for data visualization
- From Statistics to Beliefs: Bayes’ Rule
- Bayes’ rule
- Information, beliefs, priors, and posteriors
- Abe’s celiac revisited
- Finding terrorists in an airport
- Bayes’ rule and quantitative analysis
- Expected Costs and Benefits
- Screening frequently or accurately
- Wrapping Up
- Key Words
- Exercises
- Readings and References
- CHAPTER 16 Measure Your Mission
- What You’ll Learn
- Introduction
- Measuring the Wrong Outcome or Treatment
- Partial measures
- Metal detectors in airports
- Intermediate outcomes
- Blood pressure and heart attacks
- Ill-defined missions
- Climate change and economic productivity
- Do You Have the Right Sample?
- External validity
- Malnutrition in India and Bangladesh
- Selected samples
- College admissions
- Why can’t major league pitchers hit?
- Strategic Adaptation and Changing Relationships
- The duty on lights and windows
- The shift in baseball
- The war on drugs
- Wrapping Up
- Key Words
- Exercises
- Readings and References
- CHAPTER 17 On the Limits of Quantification
- What You’ll Learn
- Introduction
- Decisions When Evidence Is Limited
- Cost-benefit analysis and environmental regulation
- Floss your teeth and wear a mask
- Floss your teeth
- Wear a mask
- Quantification and Values
- How quantitative tools sneak in values
- Algorithms and racial bias in health care
- How quantification shapes our values
- Think Clearly and Help Others Do So Too
- Exercises
- Readings and References
- Index
“I very much recommend this book, not only to all that teach statistics to (under)graduate students, but also those that use statistics for their own research, that would like to value the work of others, or engage in debates using actual or perceived facts.”—Gijs Dekkers, International Statistical Review
“A common phrase one hears in public life is that correlations and causality are the same but different. But how are they the same and how exactly do they differ? Thinking Clearly with Data threads a needle between two advanced subjects by clearly laying out a theory of both. This book is destined to become a classic and, if we are lucky, will be on every social scientist’s shelf.”—Scott Cunningham, Baylor University
“Witty, erudite, and chock-full of memorable and engaging examples, Thinking Clearly with Data brings core statistical ideas to life. The insights it offers are helpful not only to scholars in search of creative research strategies but also to readers who are simply trying to make sensible everyday decisions on topics from parenting to personal finance.”—Donald P. Green, Columbia University
“By making thinking the primary focus in teaching data analysis, Thinking Clearly with Data fills a big need.”—Dustin Tingley, Harvard University
“Whether you are a social scientist engaged in research, an attorney pleading a case, or a patient deciding on a medical treatment, you need to read Thinking Clearly with Data. This timely—and useful—book for making decisions in the data-rich twenty-first century is one that everyone who thinks about evidence should read.”—Lynn Vavreck, University of California, Los Angeles
“Thinking Clearly with Data gives readers the necessary tools to be critical consumers of claims that others make based on data, and even to start making credible claims based on data themselves.”—Andy Eggers, University of Chicago
“Rather than getting bogged down in the math and statistics underlying the methods, Thinking Clearly with Data walks students through the big ideas of what can be learned from data and flags common mistakes even well-trained data analysts make.”—Jonathan Davis, University of Oregon
“Thinking Clearly with Data is one of the most accessible and welcoming books I’ve seen on how to make sense of the world with data, thoughtfulness, and rigor. It’s a must-read for anyone looking to be smarter in our data-driven world.”—Andrea Jones-Rooy, New York University