From social media posts and text messages to digital government documents and archives, researchers are bombarded with a deluge of text reflecting the social world. This textual data gives unprecedented insights into fundamental questions in the social sciences, humanities, and industry. Meanwhile, new machine learning tools are rapidly transforming the way science and business are conducted. Text as Data shows how to combine new sources of data, machine learning tools, and social science research design to develop and evaluate new insights.
Text as Data is organized around the core tasks in research projects using text—representation, discovery, measurement, prediction, and causal inference. The authors offer a sequential, iterative, and inductive approach to research design. Each research task is presented complete with real-world applications, example methods, and a distinct style of task-focused research.
Bridging many divides—computer science and social science, the qualitative and the quantitative, and industry and academia—Text as Data is an ideal resource for anyone wanting to analyze large collections of text in an era when data is abundant and computation is cheap, but the enduring challenges of social science remain.
- Overview of how to use text as data
- Research design for a world of data deluge
- Examples from across the social sciences and industry
Justin Grimmer is professor of political science and a senior fellow at the Hoover Institution at Stanford University. Twitter @justingrimmer
Margaret E. Roberts is associate professor in political science and the Halıcıoğlu Data Science Institute at the University of California, San Diego. Twitter @mollyeroberts
Brandon M. Stewart is assistant professor of sociology and Arthur H. Scribner Bicentennial Preceptor at Princeton University. Twitter @b_m_stewart
"Among the metaverse of possible books on Text as Data that could have been published . . . I was pleased that my universe produced this one. I will assign this book as a critical part of my own course on content analysis for years to come, and it has already altered and improved the coherence of my own vocabulary and articulation for several critical choices underlying the process of turning text into data. . . . Highly recommend."—James Evans, Sociological Methods & Research
"This is the definitive guide for social scientists wishing to work with text-based data. Written by pioneers in the field, Text as Data provides a comprehensive overview of the state of the art. But the authors don’t stop there: they offer a fresh agenda for doing social science, showing how algorithms can augment our ability to develop theories of human behavior, rather than poorly attempting to replace us.”—Chris Bail, author of Breaking the Social Media Prism
“Text as Data is a long-awaited book by an all-star team of methodologists. The explosion of textual data provides unprecedented opportunities to learn about human behavior and society at a massive scale. Through this authoritative book, Grimmer, Roberts, and Stewart lay the foundation of text analysis for students and researchers.”—Kosuke Imai, author of Quantitative Social Science
“This book provides a clear and comprehensive introduction to the key computational techniques for analyzing text data. The technical material is contextualized within a broader research philosophy that will drive exciting new applications in computational social science, the digital humanities, and commercial data science. I highly recommend it.”—Jacob Eisenstein, author of Introduction to Natural Language Processing
“Beyond offering an engaging survey of text analysis methods, this book is a vital guide to social science research design. Diverse applications from detecting Chinese censorship to classifying jihadist texts bring text analysis to life for readers of all methodological backgrounds. My students praised Text as Data as one of the best textbooks they have encountered.”—Alexandra Siegel, University of Colorado, Boulder
"This book fills acute gaps in the theory components of text as data. Accessible to advanced undergraduates and graduate students with some background in social science terminology and methodology, this volume draws together aspects of text-as-data approaches that are often discussed and applied separately, and brings them into a coherent framework."—Sarah Bouchat, Northwestern University
"Written by leaders in the discipline, Text as Data is an excellent book. Comprehensive in its scope, this work is a perfect introduction for social science graduate students and faculty getting into the field."—Arthur Spirling, New York University
"There is a clear lack of relevant textbooks in the social science text-as-data area. Thorough and manageable, Text as Data presents a good conceptual overview and frames issues at the right level."—David Mimno, Cornell University