Principles and Theory for Data Mining and Machine Learning
By Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang
- Publisher: Springer
- Year: 2009
- Language: English
- Pages: 793
- Series: Springer Series in Statistics
- Edition: 1
- Category: Library
Synopsis
Extensive treatment of the most up-to-date topics
Provides the theory and concepts behind popular and emerging methods
Range of topics drawn from Statistics, Computer Science, and Electrical Engineering
Table of Contents
Preface
Variability, Information, and Prediction
The Curse of Dimensionality
The Two Extremes
Perspectives on the Curse
Sparsity
Exploding Numbers of Models
Multicollinearity and Concurvity
The Effect of Noise
Coping with the Curse
Selecting Design Points
Local Dimension
Parsimony
Two Techniques
The Bootstrap
Cross-Validation
Optimization and Search
Univariate Search
Multivariate Search
General Searches
Constraint Satisfaction and Combinatorial Search
Notes
Hammersley Points
Edgeworth Expansions for the Mean
Bootstrap Asymptotics for the Studentized Mean
Exercises
Local Smoothers
Early Smoothers
Transition to Classical Smoothers
Global Versus Local Approximations
LOESS
Kernel Smoothers
Statistical Function Approximation
The Concept of Kernel Methods and the Discrete Case
Kernels and Stochastic Designs: Density Estimation
Stochastic Designs: Asymptotics for Kernel Smoothers
Convergence Theorems and Rates for Kernel Smoothers
Kernel and Bandwidth Selection
Linear Smoothers
Nearest Neighbors
Applications of Kernel Regression
A Simulated Example
Ethanol Data
Exercises
Spline Smoothing
Interpolating Splines
Natural Cubic Splines
Smoothing Splines for Regression
Model Selection for Spline Smoothing
Spline Smoothing Meets Kernel Smoothing
Asymptotic Bias, Variance, and MISE for Spline Smoothers
Ethanol Data Example -- Continued
Splines Redux: Hilbert Space Formulation
Reproducing Kernels
Constructing an RKHS
Direct Sum Construction for Splines
Explicit Forms
Nonparametrics in Data Mining and Machine Learning
Simulated Comparisons
What Happens with Dependent Noise Models?
Higher Dimensions and the Curse of Dimensionality
Notes
Sobolev Spaces: Definition
Exercises
New Wave Nonparametrics
Additive Models
The Backfitting Algorithm
Concurvity and Inference
Nonparametric Optimality
Generalized Additive Models
Projection Pursuit Regression
Neural Networks
Backpropagation and Inference
Barron's Result and the Curse
Approximation Properties
Barron's Theorem: Formal Statement
Recursive Partitioning Regression
Growing Trees
Pruning and Selection
Regression
Bayesian Additive Regression Trees: BART
MARS
Sliced Inverse Regression
ACE and AVAS
Notes
Proof of Barron's Theorem
Exercises
Supervised Learning: Partition Methods
Multiclass Learning
Discriminant Analysis
Distance-Based Discriminant Analysis
Bayes Rules
Probability-Based Discriminant Analysis
Tree-Based Classifiers
Splitting Rules
Logic Trees
Random Forests
Support Vector Machines
Margins and Distances
Binary Classification and Risk
Prediction Bounds for Function Classes
Constructing SVM Classifiers
SVM Classification for Nonlinearly Separable Populations
SVMs in the General Nonlinear Case
Some Kernels Used in SVM Classification
Kernel Choice, SVMs and Model Selection
Support Vector Regression
Multiclass Support Vector Machines
Neural Networks
Notes
Hoeffding's Inequality
VC Dimension
Exercises
Alternative Nonparametrics
Ensemble Methods
Bayes Model Averaging
Bagging
Stacking
Boosting
Other Averaging Methods
Oracle Inequalities
Bayes Nonparametrics
Dirichlet Process Priors
Polya Tree Priors
Gaussian Process Priors
The Relevance Vector Machine
RVM Regression: Formal Description
RVM Classification
Hidden Markov Models -- Sequential Classification
Notes
Proof of Yang's Oracle Inequality
Proof of Lecue's Oracle Inequality
Exercises
Computational Comparisons
Computational Results: Classification
Comparison on Fisher's Iris Data
Comparison on Ripley's Data
Computational Results: Regression
Vapnik's sinc Function
Friedman's Function
Conclusions
Systematic Simulation Study
No Free Lunch
Exercises
Unsupervised Learning: Clustering
Centroid-Based Clustering
K-Means Clustering
Variants
Hierarchical Clustering
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Theory for Hierarchical Clustering
Partitional Clustering
Model-Based Clustering
Graph-Theoretic Clustering
Spectral Clustering
Bayesian Clustering
Probabilistic Clustering
Hypothesis Testing
Computed Examples
Ripley's Data
Iris Data
Cluster Validation
Notes
Derivatives of Functions of a Matrix
Kruskal's Algorithm: Proof
Prim's Algorithm: Proof
Exercises
Learning in High Dimensions
Principal Components
Main Theorem
Key Properties
Extensions
Factor Analysis
Finding Λ and ψ
Finding K
Estimating Factor Scores
Projection Pursuit
Independent Components Analysis
Main Definitions
Key Results
Computational Approach
Nonlinear PCs and ICA
Nonlinear PCs
Nonlinear ICA
Geometric Summarization
Measuring Distances to an Algebraic Shape
Principal Curves and Surfaces
Supervised Dimension Reduction: Partial Least Squares
Simple PLS
PLS Procedures
Properties of PLS
Supervised Dimension Reduction: Sufficient Dimensions in Regression
Visualization I: Basic Plots
Elementary Visualization
Projections
Time Dependence
Visualization II: Transformations
Chernoff Faces
Multidimensional Scaling
Self-Organizing Maps
Exercises
Variable Selection
Concepts from Linear Regression
Subset Selection
Variable Ranking
Overview
Traditional Criteria
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
Choices of Information Criteria
Cross Validation
Shrinkage Methods
Shrinkage Methods for Linear Models
Grouping in Variable Selection
Least Angle Regression
Shrinkage Methods for Model Classes
Cautionary Notes
Bayes Variable Selection
Prior Specification
Posterior Calculation and Exploration
Evaluating Evidence
Connections Between Bayesian and Frequentist Methods
Computational Comparisons
The n > p Case
When p > n
Notes
Code for Generating Data in Section 10.5
Exercises
Multiple Testing
Analyzing the Hypothesis Testing Problem
A Paradigmatic Setting
Counts for Multiple Tests
Measures of Error in Multiple Testing
Aspects of Error Control
Controlling the Familywise Error Rate
One-Step Adjustments
Stepwise p-Value Adjustments
PCER and PFER
Null Domination
Two Procedures
Controlling the Type I Error Rate
Adjusted p-Values for PFER/PCER
Controlling the False Discovery Rate
FDR and other Measures of Error
The Benjamini-Hochberg Procedure
A BH Theorem for a Dependent Setting
Variations on BH
Controlling the Positive False Discovery Rate
Bayesian Interpretations
Aspects of Implementation
Bayesian Multiple Testing
Fully Bayes: Hierarchical
Fully Bayes: Decision theory
Notes
Proof of the Benjamini-Hochberg Theorem
Proof of the Benjamini-Yekutieli Theorem
References
Index
SIMILAR VOLUMES
Official instructor's manual for "Principles and Theory for Data Mining and Machine Learning" (2010), obtained directly from Springer.com. The book is the holy book of the mathematical underpinnings of machine learning; you might struggle at the beginning, but it certainly pays off. Enjo
Introduction -- Learning and intelligence -- Machine learning basics -- Knowledge representation -- Learning as search -- Attribute quality matters -- Data preprocessing -- Constructive induction -- Symbolic learning -- Statistical learning -- Artificial neural networks -- Cluster analysis -- Learn