Data Quality and Record Linkage Techniques
Scribed by Thomas N. Herzog
- Publisher
- Springer
- Year
- 2007
- Tongue
- English
- Leaves
- 225
- Category
- Library
Synopsis
This book helps practitioners gain a working understanding of issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models, focusing on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples are included to show how these techniques work. The second part of the book presents real-world case studies in which one or more of these techniques are applied. These cover a wide variety of application areas, including mortgage guarantee insurance, medical, biomedical, highway safety, and social insurance as well as the construction of list frames and administrative lists. The authors also discuss software that has been developed to apply the techniques described here.
Readers will find in this book a mixture of practical advice, mathematical rigor, management insight and philosophy.
Table of Contents
Data Quality and Record Linkage Techniques
Preface
Contents
About the Authors
Introduction
Audience and Objective
Scope
Structure
Part 1 Data Quality: What It Is, Why It Is Important, and How to Achieve It
What Is Data Quality and Why Should We Care?
When Are Data of High Quality?
Why Care About Data Quality?
How Do You Obtain High-Quality Data?
Practical Tips
Where Are We Now?
Examples of Entities Using Data to their Advantage/Disadvantage
Data Quality as a Competitive Advantage
Data Quality Problems and their Consequences
How Many People Really Live to 100 and Beyond?
Disabled Airplane Pilots -- A Successful Application of Record Linkage
Completeness and Accuracy of a Billing Database: Why It Is Important to the Bottom Line
Where Are We Now?
Properties of Data Quality and Metrics for Measuring It
Desirable Properties of Databases/Lists
Examples of Merging Two or More Lists and the Issues that May Arise
Metrics Used when Merging Lists
Where Are We Now?
Basic Data Quality Tools
Data Elements
Requirements Document
A Dictionary of Tests
Deterministic Tests
Probabilistic Tests
Exploratory Data Analysis Techniques
Minimizing Processing Errors
Practical Tips
Where Are We Now?
Part 2 Specialized Tools for Database Improvement
Mathematical Preliminaries for Specialized Data Quality Techniques
Conditional Independence
Statistical Paradigms
Capture--Recapture Procedures and Applications
Automatic Editing and Imputation of Sample Survey Data
Introduction
Early Editing Efforts
Fellegi--Holt Model for Editing
Practical Tips
Imputation
Constructing a Unified Edit/Imputation Model
Implicit Edits -- A Key Construct of Editing Software
Editing Software
Is Automatic Editing Taking Up Too Much Time and Money?
Selective Editing
Tips on Automatic Editing and Imputation
Where Are We Now?
Record Linkage -- Methodology
Introduction
Why Did Analysts Begin Linking Records?
Deterministic Record Linkage
Probabilistic Record Linkage -- A Frequentist Perspective
Probabilistic Record Linkage -- A Bayesian Perspective
Where Are We Now?
Estimating the Parameters of the Fellegi--Sunter Record Linkage Model
Basic Estimation of Parameters Under Simple Agreement/Disagreement Patterns
Parameter Estimates Obtained via Frequency-Based Matching
Parameter Estimates Obtained Using Data from Current Files
Parameter Estimates Obtained via the EM Algorithm
Advantages and Disadvantages of Using the EM Algorithm
General Parameter Estimation Using the EM Algorithm
Where Are We Now?
Standardization and Parsing
Obtaining and Understanding Computer Files
Standardization of Terms
Parsing of Fields
Where Are We Now?
Phonetic Coding Systems for Names
Soundex System of Names
NYSIIS Phonetic Decoder
Where Are We Now?
Blocking
Independence of Blocking Strategies
Blocking Variables
Using Blocking Strategies to Identify Duplicate List Entries
Using Blocking Strategies to Match Records Between Two Sample Surveys
Estimating the Number of Matches Missed
Where Are We Now?
String Comparator Metrics for Typographical Error
Jaro String Comparator Metric for Typographical Error
Adjusting the Matching Weight for the Jaro String Comparator
Winkler String Comparator Metric for Typographical Error
Adjusting the Weights for the Winkler Comparator Metric
Where Are We Now?
Part 3 Record Linkage Case Studies
Duplicate FHA Single-Family Mortgage Records
A Case Study of Data Problems, Consequences, and Corrective Steps
Introduction
FHA Case Numbers on Single-Family Mortgages
Duplicate Mortgage Records
Mortgage Records with an Incorrect Termination Status
Estimating the Number of Duplicate Mortgage Records
Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas
Biomedical and Genetic Research Studies
Who Goes to a Chiropractor?
National Master Patient Index
Provider Access to Immunization Register Securely (PAiRS) System
Studies Required by the Intermodal Surface Transportation Efficiency Act of 1991
Crash Outcome Data Evaluation System
Constructing List Frames and Administrative Lists
National Address Register of Residences in Canada
USDA List Frame of Farms in the United States
List Frame Development for the US Census of Agriculture
Post-enumeration Studies of US Decennial Census
Social Security and Related Topics
Hidden Multiple Issuance of Social Security Numbers
How Social Security Stops Benefit Payments after Death
CPS--IRS--SSA Exact Match File
Record Linkage and Terrorism
Part 4 Other Topics
Confidentiality: Maximizing Access to Micro-data while Protecting Privacy
Importance of High Quality of Data in the Original File
Documenting Public-use Files
Checking Re-identifiability
Elementary Masking Methods and Statistical Agencies
Protecting Confidentiality of Medical Data
More-Advanced Masking Methods -- Synthetic Datasets
Where Are We Now?
Review of Record Linkage Software
Government
Commercial
Checklist for Evaluating Record Linkage Software
Summary Chapter
Bibliography
Index
SIMILAR VOLUMES
This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models. Here, we focus on the Fellegi-Holt edit-imputation model,
Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domai
Poor data quality can seriously hinder or damage the efficiency and effectiveness of organizations and businesses. The growing awareness of such repercussions has led to major public initiatives like the "Data Quality Act" in the USA and the "European 2003/98" directive of the European Parliam