Spatial Socio-econometric Modeling (SSEM): A Low-Code Toolkit for Spatial Data Science and Interactive Visualizations Using R
✍ Scribed by Manuel S. González Canché
- Publisher: Springer
- Year: 2023
- Language: English
- Pages: 532
- Series: Springer Texts in Social Sciences
- Category: Library
✦ Synopsis
With the primary goal of expanding access to spatial data science tools, this book offers dozens of minimal or low-code functions and tutorials designed to ease the implementation of fully reproducible Spatial Socio-Econometric Modeling (SSEM) analyses. Designed as a University of Pennsylvania Ph.D.-level course for sociologists, political scientists, urban planners, criminologists, and data scientists, this textbook equips social scientists with all the concepts, explanations, and functions required to strengthen their data storytelling. It specifically provides social scientists with a comprehensive set of open-access minimal code tools to:
• Identify and access place-based longitudinal and cross-sectional data sources and formats
• Conduct advanced data management, including crosswalks, joining, and matching
• Fully connect social network analyses with geospatial statistics
• Formulate research questions designed to account for place-based factors in model specification and assess their relevance compared to individual- or unit-level indicators
• Estimate distance measures across units that follow road network paths
• Create sophisticated and interactive HTML data visualizations cross-sectionally or longitudinally, to strengthen research storytelling capabilities
• Follow best practices for presenting spatial analyses, findings, and implications
• Master theories on neighborhood effects, equality of opportunity, and geography of (dis)advantage that undergird SSEM applications and methods
• Assess multicollinearity issues via machine learning that may affect coefficients' estimates and guide the identification of relevant predictors
• Strategize how to address feedback loops by using SSEM as an identification framework that can be merged with standard quasi-experimental techniques like propensity score models, instrumental variables, and difference in differences
• Expand the SSEM analyses to connections that emerge via social interactions, such as co-authorship and advice networks, or any form of relational data
The applied nature of the book, together with the cost-free, cross-platform R software, makes this textbook usable and applicable worldwide.
✦ Table of Contents
Preface
Unique Contribution to Social Sciences
SSEM Statistical Modeling Culture
Level at Which This Book is Aimed
References
Acknowledgements
Contents
Acronyms
List of Figures
List of Tables
Code Listings for Replication Exercises
Part I Conceptual and Theoretical Underpinnings
1 SPlaces
SPlaces: Spaces, Places, and Spatial Socioeconometric Modeling
Spaces
Places
SPlaces
Inequality in Mobility Prospects
Measuring Inequality and Growing Inequality
Neighborhood Effects and Concentration of (dis)Advantages
Splace-Based Modeling Challenges
Causality in Spatial Modeling
Individual and Place-based Multicollinearity
Closing Thoughts and Next Steps
Next Steps
Discussion Questions
References
2 Operationalizing SPlaces
Delimiting and Operationalizing Neighborhoods as Splaces
Representing Physical Spaces and Nesting Structures
Zooming in Across Administrative Boundaries
Shapefiles as Spaces
Elements of a Shapefile
Place-Based Indicators Contributing to Building Splaces
Neighborhood Operationalization and Disaggregation
Data Point Differentiation Across Neighborhood Levels
Illustration of Splaces and Data Point Gains
Tradeoffs of Data Point Differences
What Might be the Best Choice?
Bringing Concepts, Shapefiles, and Place-Based Indicators Together
ACS Published or Pre-tabulated Data
Identifying Proxies for Poverty
Identifying Proxies for Median Income
Identifying Proxies for Unemployment
Identifying Proxies for Housing Quality
Identifying Proxies for Family Structure
Closing Thoughts and Next Steps
Next Steps
Discussion Questions
References
3 Data Formats, Coordinate Reference Systems, and Differential Privacy Frameworks
Types of Geo-Referenced Data: Raster and Vector Data
Raster Data
Vector Data
Point Geometries
Line Geometries
Polygon Geometries
Geometries as Layers
Vector to Raster Transformations and Vice Versa
Vector to Raster Data Transformations
Moving From Raster to Vector Data
From Raster to Points
From Raster to Polygons
Coordinate Reference Systems
Elements of CRS
Implications of Distortions Resulting from Map Projections
Why is CRS Harmonization Important?
Commonly Used Coordinate Reference Systems
Differential Privacy Framework (DPF) and Changes to Census Micro Data
Are Differential Privacy and Synthetic Data the Same Privacy Protection Strategy?
What are Differential Privacy Algorithms?
Relevance of Differential Privacy For SSEM
Strategies to Protect Privacy
Methodological Implications of DPF for SSEM
Next Steps
Discussion Questions
References
Part II Data Science SSEM Identification Tools: Distances, Networks, and Neighbors
4 Access and Management of Spatial or Geocoded Data
R Tutorial
Installation
R Infrastructure
Code Rationale
Reading Data from an External Source
Creating Datasets from Within R
Merging/Joining Data
Installing Packages
Moving Forward
Reading Polygon Shapefiles
Reading Polygons at the Country Level
Reading Polygons at the County Level
Reading Polygons at the ZIP Code Tabulated Area (ZCTA) Level
Reading Polygons at the Census Tract Level
Reading Polygons at the Block Group Level
Reading Line Shapefiles
All Roads Shapefiles
Primary Roads Shapefiles
Primary and Secondary Roads Shapefiles
Appending Polygon Shapefiles
Reading Point Shapefiles
Point Geocoding or Georeferencing
Batch Geocoding Using Addresses in R
From data.frame to sf Objects
Point Batch Geocoding Using ZCTAs in R
Crosswalking
Lower to Higher Level Crosswalking
Place-Based Data Access at the Polygon Level
Applying for a Census API Key
Poverty Proxy
Median Income
Unemployed in Labor Force (Civilian)
Housing Quality: Plumbing Facilities
Family Structure: Women Led Households
Joining ACS Databases
IRS Data
Place-Based Data Access at the Point Level
Joining Points with Polygons Data
Closing Thoughts
Next Steps
Discussion Questions
Replication Exercises
References
5 Distances
Distances
Geolocated Data: Polygons, Points, or Both?
Why is Distance Estimation Relevant for SSEM?
Data Source and Data Requirements
Projections, Distortions, and Bias Concerns?
Network Analysis Tools and Data Transformations
Approaches to Distance Connections Identification
Multiple Sources of Points
Matrix to Edgelist Transformations
Transformations Using One Unit Type
Transformations Using Two Unit Types
Summary and Next Steps for Distance Calculations
As the Crow Flies Distance Calculations
From a Matrix to a List of Connections (Edgelist) with Distances
"As the Crow Flies" Distances Including Multiple Unit Types
Network Route Distance Calculations: As Humans Walk
As Humans Walk Distances Between Two Points
humanswalktwo Function Application
humanswalktwo(...) Function Components
Batch "As Humans Walk" Distances
"As Humans Walk" Batch Data Requirements
Network Transformations
humanswalkbatch Function Application
Batch Walking Distances Among Units of the Same Type
Batch Walking Distances Among Units of Different Types
Moving Beyond Single Counties and States
Navigation/Travel Time Distances
"Travel Distance" Data Format and Requirements
Types of Travel Time Estimates Supported
traveltimes Function Applications
Comparing Travel Times with Google Maps
Closing Thoughts
Next Steps
Discussion Questions
Replication Exercises
References
6 Geographical Networks as Identification Tools
Neighboring Structures and Networks
What is a Network and How is it Different from or Similar to Neighboring Structures?
From Distances (or Travel Times) to Networks and Neighboring Structures
Point-Based Network and Neighboring Structures Identification Rules
Radius-Based Approach
Kth Closest Neighbor(s) Approach
Inverse Distances
From Neighboring Structures to Weights
Code Application
Moving Forward and Beyond These Standard Identification Approaches
Crow Flies Versus Road Networks Distances
Crow Flies
Applying Road Networks Distances: "As Humans Walk"
Using our Own Network Distances (and/or Travel Times) to Identify Neighboring Structures
rad Function
klosest Function
radkthrow Function
radkthinv Function
Moving Forward
Identifying Neighboring Structures Among Different Types of Units
Identifying the Local Presence of Units of Different Type
What is Commuting Distance?
Visualization
Indirect Neighboring Structures
Transformation Application to Real Data
Moving Forward and Feedback Loops
Two-Mode Kth Closest Identification and Selection
Identification Application
Adding a Threshold to Nearby Limits and Place Heterogeneities
Feedback Loops and Self-selection
Polygons and Matrices of Influence
Rook's
Bishop's
Queen's
Application
Higher Order Neighbors
Closing Thoughts
Next Steps
Discussion Questions
Replication Exercises
References
Part III SSEM Hypothesis Testing of Cross-Sectional and Spatio-Temporal Data and Interactive Visualizations
7 SODA: Spatial Outcome Dependence or Autocorrelation
SODA: Spatial Outcome Dependence or Autocorrelation
Why Is SODA Statistically Concerning?
Assessing SODA Based on Polygon Data
Moran's I Regression Approach
Moran's I Code Application with Polygon Data
Is the First Order Neighboring Structure Enough?
Combining Higher Order Neighboring Structures
Decision Selection Process to Model Higher Order Neighbors
Assessing SODA Based on Point Data
Machine Learning Tools to Assess SODA Decadence
Moran's I Code Application with One-Mode Point Data
Data Source and Outcome of Interest
Neighboring Structures and Weight Matrix
Analytic Steps
Code Application
Moran's I Code Application with Two-Mode Point Data
Two- To One-Mode Transformations and Rationale
Analytic Steps
Causal Chains Through Spillovers in SSEM
Local Moran's I
Visualizing Local Moran's I
Quadrant Representation
Map Representation
Code Application Local Moran's I Polygon Data
Code Application Local Moran's I One-Mode Point Data
Code Application Local Moran's I Two-Mode Point Data
To Retain or Exclude Neighborless Units
Code Application to Exclude Neighborless Units
Social Outcome Dependence or Autocorrelation: SODA 2.0
Relationships in SODA 2.0
Application of SODA 2.0
Adjacency List to Edgelist Transformation
Author's Individual Publication Record
SODA 2.0 Interpretation
Moving Forward with SODA 2.0
Next Steps
Discussion Questions
Replication Exercises
References
8 SSEM Regression Based Analyses
Residual SODA and the Importance of Spatial Regression Modeling
SODA Mechanisms in Regression Residuals
Testing for RSODA
Simultaneous Autoregressive (SAR) Modeling
Mechanisms and Implications of RSODA
SAR Application to Polygon Data
Assessing Whether RSODA was Handled
Building a SAR Model While Addressing Place-based Multicollinearity
Application of Feature Selection Via Random Forests
Application Simultaneous Autoregressive Models
Revisiting the Notion of Splaces and Data Point Gains
SAR Application to Two-Mode Point Data
Data Preparation and Transformations
Outcome Indicators and Feature Selection Rationale
Two-mode to One-mode Transformations
Neighborless Units and Decision Making
Feature Selection with Point Data
Building SAR Model While Addressing Place-based Multicollinearity
Multilevel SAR Models
Multilevel Data
How Does SAR differ from Multilevel SAR?
Statistical Description of Multilevel SAR
Polygon Matrices of Influence: Higher Level Matrix M
Point Matrices of Influence: Lower-Level Matrix W
Δ Matrix to Account for Fixed Group (or Nesting) Effects
Bringing the Three Matrices Together
Multilevel SAR Function Application
Two-Mode Multilevel SAR
Multilevel SAR Results: Two Mode
Multilevel SAR Results: One Mode
SAR or Multilevel SAR
Testing for Spatial Heterogeneity Via Geographically Weighted Regression
How Does SAR differ from GW Approaches?
Distance and Travel Time Matrices and Kernel Functions
Do we Need GW?: GW Multiscale Summary Statistics
GW Summary Execution
Geographically Weighted Regression and Visualization
Bootstrap Strategy
GW Regression Execution
Mapping of GW Results
Spatio-Temporal SAR: A Difference in Differences Application
Spatio-Temporal Data or Panel Data with Spatial Information
Testing for RSODA in Panel SAR
SAR Panel Set Up
SAR Panel Data Source and Setting
Identification
Falsification Test Identification
SAR Panel Application
SAR Panel Function
Falsification Tests
SAR and Multilevel SAR with Social Data
Multilevel SAR Constraints for SODA 2.0
Social Multilevel SAR
socialmultilevelSAR(...) Application
Closing Thoughts and Next Steps
Discussion Questions
Replication Exercises
References
9 Visualization, Mining, and Density Analyses of Spatial and Spatio-Temporal Data
SSEM Visualizations
Polygon Data Visualization
polymap(...) Function Application
Exploratory Spatio-Temporal Data Mining and Visualization
spatiopanelvisual(...) Implementation
Point Data Visualization
pointmap(...) Function Application
Geospatial Point Density
Methodological Approach
What Questions May We Address with Geospatial Point Density?
Findings Research Question 1
Findings Research Question 2
Code Application for the Maps
Next Steps in Geospatial Point Density
Geographical Network Visualizations
Data Sources
Preparation Rationale
geographicalnetworks(...) Application
Two-Mode Networks Application
Two- to One-Mode Transformation
One-Mode Geographical Networks Procedures
Closing Thoughts
Discussion Questions
Replication Exercises
References
10 Final Words
References
Glossary
Index
📜 SIMILAR VOLUMES
This contributed volume applies spatial and space-time econometric methods to spatial interaction modeling. The first part of the book addresses general cutting-edge methodological questions in spatial econometric interaction modeling, which concern aspects such as coefficient interpretation, con
"The book fills a void in the literature and available software, providing a crucial link for students and professionals alike to engage in the analysis of spatial and spatio-temporal health data from a Bayesian perspective using R. The book emphasizes the use of MCMC via Nimble, BRugs, and CARBAyes
Spatial Analysis Using Big Data: Methods and Urban Applications helps readers understand the most powerful, state-of-the-art spatial econometric methods, focusing particularly on urban research problems. The methods represent a cluster of potentially transformational socio-economic modelin
The Spatial Fiscal Impact Analysis Method is an innovative approach to measure fiscal impact and project the future costs of a proposed development, recognizing that all revenues and expenditures are spatially related. The Spatial Method focuses on estimating existing fiscal impacts of detailed land