Validity, Reliability, and Significance: Empirical Methods for NLP and Data Science
By Stefan Riezler and Michael Hagmann
- Publisher
- Morgan & Claypool
- Year
- 2021
- Language
- English
- Pages
- 165
- Series
- Synthesis Lectures on Human Language Technologies
Synopsis
Empirical methods are means of answering methodological questions of the empirical sciences by statistical techniques. The methodological questions addressed in this book include the problems of validity, reliability, and significance. In the case of machine learning, these correspond to the questions of whether a model predicts what it purports to predict, whether a model's performance is consistent across replications, and whether a performance difference between two models is due to chance, respectively. The goal of this book is to answer these questions by concrete statistical tests that can be applied to assess validity, reliability, and significance of data annotation and machine learning prediction in the fields of NLP and data science.
Our focus is on model-based empirical methods where data annotations and model predictions are treated as training data for interpretable probabilistic models from the well-understood families of generalized additive models (GAMs) and linear mixed effects models (LMEMs). Based on the interpretable parameters of the trained GAMs or LMEMs, the book presents model-based statistical tests such as a validity test that allows detecting circular features that circumvent learning. Furthermore, the book discusses a reliability coefficient using variance decomposition based on random effect parameters of LMEMs. Last, a significance test based on the likelihood ratio of nested LMEMs trained on the performance scores of two machine learning models is shown to naturally allow the inclusion of variations in meta-parameter settings into hypothesis testing, and further facilitates a refined system comparison conditional on properties of input data.
This book can be used as an introduction to empirical methods for machine learning in general, with a special focus on applications in NLP and data science. The book is self-contained, with an appendix on the mathematical background on GAMs and LMEMs, and with an accompanying webpage including R code to replicate experiments presented in the book.
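The sampling-based significance tests covered in the book's Significance chapter can be illustrated with a minimal sketch. The function below is a hypothetical helper written for this summary, not code from the book's accompanying webpage; it implements a paired approximate-randomization (sign-flipping permutation) test on the per-item evaluation scores of two systems measured on the same test set:

```python
import numpy as np

def paired_permutation_test(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided approximate-randomization test for the difference in
    mean per-item scores of two systems evaluated on the same items."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    observed = abs(diffs.mean())
    # Under the null hypothesis, the sign of each per-item score
    # difference is exchangeable, so we flip signs at random.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    perm_means = np.abs((signs * diffs).mean(axis=1))
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (1 + np.sum(perm_means >= observed)) / (n_perm + 1)
```

For example, `paired_permutation_test(bleu_a, bleu_b)` on two systems' per-sentence BLEU scores returns a small p-value when the mean difference is unlikely under random sign flips. The model-based likelihood-ratio tests the book advocates go further than this sketch: by fitting nested LMEMs to the scores, they can fold meta-parameter variation into the test and condition the comparison on properties of the input data.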
Table of Contents
Preface
Acknowledgments
Introduction
Empirical Methods in Machine Learning
Scope and Outline of this Book
Intended Readership
Validity
Validity Problems in NLP and Data Science
Bias Features
Illegitimate Features
Circular Features
Theories of Measurement and Validity
The Concept of Validity in Psychometrics
The Theory of Scales of Measurement
Theories of Measurement in Philosophy of Science
Prediction as Measurement
Feature Representations
Measurement Data
Descriptive and Model-Based Validity Tests
Dataset Bias Test
Transformation Invariance Test
A Model-Based Test for Circularity
Notes on Practical Usage
Reliability
Untangling Terminology: Reliability, Agreement, and Others
Performance Evaluation as Measurement
Descriptive and Model-Based Reliability Tests
Agreement Coefficients for Data Annotation
Bootstrap Confidence Intervals for Model Evaluation
Model-Based Reliability Testing
Notes on Practical Usage
Significance
Parametric Significance Tests
Sampling-Based Significance Tests
Bootstrap Resampling
Permutation Tests
Model-Based Significance Testing
The Generalized Likelihood Ratio Test
Likelihood Ratio Tests using LMEMs
Notes on Practical Usage
Mathematical Background
Generalized Additive Models
General Form of Model
Example
Parameter Estimation
Linear Mixed Effects Models
General Form of Model
Example
Parameter Optimization
The Distribution of the Likelihood Ratio Statistic
Score Function and Fisher Information
Taylor Expansion and Asymptotic Distribution
Bibliography
Authors' Biographies
SIMILAR VOLUMES
A single source of information for researchers and professionals, Traffic Simulation and Data: Validation Methods and Applications offers a complete overview of traffic data collection, state estimation, calibration and validation for traffic modelling and simulation. It derives from the Multitud…
"This book provides a comprehensive overview of calibration and validation techniques for traffic simulation models. It details the data required as an input for the calibration and validation processes and shows how to increase its applicability using data enhancement techniques. It presents an ext…"
Amstat News asked three review editors to rate their top five favorite books in the September 2003 issue. Statistical Methods for Reliability Data was among those chosen. Bringing statistical methods for reliability testing in line with the computer age, this volume presents s…