Learn how to conduct a robust text analysis project from start to finish--and then do it again.

Mining is the dominant metaphor in computational text analysis. When mining texts, the implied assumption is that analysts can find kernels of truth--they just have to sift thro
Mapping Texts: Computational Text Analysis for the Social Sciences
Scribed by Dustin S. Stoltz, Marshall A. Taylor
- Publisher: Oxford University Press
- Year: 2024
- Tongue: English
- Leaves: 326
- Category: Library
Table of Contents
Cover
Advance Praise for Mapping Texts
Mapping Texts: Computational Text Analysis for the Social Sciences
Copyright
Dedication
Contents
Preface
What You Will Learn
What We Left Out
Acknowledgments
Part I: Bounding Texts
1: Text in Context
What Is Language?
What Is Text?
2: Corpus Building
Texts Are Not People
Balance, Range, and Representativeness
Text Metadata
Authors and Audiences
Time and Location
Domains and Media
Text Data
Languages and Dialects
Genres and Topics
Registers and Styles
Redrawing Boundaries
Part II: Prerequisites
3: Computing Basics
Brass Tacks
Coding Environments
Data Objects, Types, and Structures
Dialects of R
Control Processes: Functions, Loops, and Apply
Installing and Loading Packages
Using Python in R
Data Visualization
Where to from Here
4: Math Basics
The Fundamentals
Comparing Vectors
Dot Product
Euclidean Distance and Cosine Similarity
Correlation
Regression
Comparing Distributions
Central Tendency
Dispersion
Types of Distributions
Our Dear Friend, the Matrix
Matrix Projection
Vector Spaces and Singular Value Decomposition
Graphs and Matrix Projection
A Little Math Goes a Long Way
Part III: Foundations
5: Acquiring Text
Public Text Datasets
Optical Character Recognition
Automated Audio Transcription
Application Programming Interfaces (APIs)
Automated Web Scraping
Legal and Ethical Side of Scraping
Terms of Service
Intellectual Property
Individual and Organizational Privacy
6: From Text to Numbers
Units of Analysis
Tokenizing
Chunking
Document Features
Sparsity
Dedicated DTM Functions
Token Distributions
Zipf's Law and Herdan-Heaps' Law
Weighting and Norming
Relative Term Frequency
Term Frequency/Inverse Document Frequency
Summary of Weightings
Term Features
Dimension Reduction
Part IV: Below the Document
7: Wrangling Words
Character Encoding
Markup Characters
Removing and Replacing Characters
"Misspelled" Words
Removing Words and Stoplists
Replacing Words
Stemming
Lemmatizing
Lemmatizing in French
Wrangling Workflow
8: Tagging Words
Dictionary Tagging
Named-Entity Recognition
How Named-entity Recognition Works
Named Entities in R
Part-of-Speech and Dependency Parsing
How POS Tagging Works
Part-of-speech Tagging in R
Dependency Parsing
Part-of-speech Tagging for French
Part V: The Document and Beyond
9: Core Deductive
Discrete Indicators
Weighted Indicators
Frequency-weighted
Term-weighted Dictionaries
Selecting and Building Dictionaries
Pre-built Dictionaries
Building Dictionaries With Supervised Learning
10: Core Inductive
Document Similarity
One-mode Projections
Euclidean Distances and Cosine Similarities
Document Clustering
Hierarchical Clustering
K-means Clustering
Topic Modeling
Topic Modeling With LSA
Topic Modeling With LDA
11: Extended Inductive
Inference and Topic Models
Predicting Topics With Covariates
Topic Prevalence, Conditional on Covariates
Topic Content, Conditional on Covariates
Word Embeddings: The First Generation
Word Embedding Basics
Weighting the TCM
Dimension Reduction With Singular Value Decomposition
Word Embeddings: The Next Generation
The Global Approach: GloVe
The Neural Network Approach: CBOW, SGNS, and fastText
Contextualized Embeddings: ELMo and BERT
Inductive Analysis With Word Embeddings
Semantic Change
Semantic Directions and Semantic Centroids
Word Mover's Distance
12: Extended Deductive
Supervision and Validation
Representing Objects as Features
Splitting Corpora
Classic Training With Supervision
Logistic Regression
Naive Bayes
Training With Neural Networks
Deductive Analysis With Pretrained Models
Neural Networks With Pretrained Embeddings
Concept Mover's Distance With Pretrained Embeddings
Retrofitting Pretrained Embeddings for Deductive Analysis
Inference With Text Networks
Building Text Networks
Centrality
Degree Centrality
Betweenness Centrality
Network Backbones
Univariate Network Inference
Multivariate Network Inference
13: Project Workflow and Iteration
The Paradox of the Complete Map
Containerize Our Projects
Memoing and Datasheets
Repeating, Replicating, and Simulating the Null
Knowledge Takes a Village
Appendix
References
Index
SIMILAR VOLUMES
Addressing how national immigration concerns play out at urban, rural, and suburban levels in the state of New York, this special issue of Social Text offers new insight into an area of study that has long been focused primarily on cities. As new Latino/a immigrants change the culture and social fab
Online communities generate massive volumes of natural language data and the social sciences continue to learn how to best make use of this new information and the technology available for analyzing it. Text Mining: A Guidebook for the Social Sciences brings together