Statistical Prediction and Machine Learning
By John Tuhao Chen, Clement Lee, Lincy Y. Chen
- Publisher: Chapman and Hall/CRC
- Year: 2024
- Language: English
- Pages: 315
- Edition: 1
- Category: Library
Synopsis
Written by an experienced statistics educator and two data scientists, this book unifies conventional statistical thinking and the contemporary machine learning framework under a single overarching umbrella of data science. The book is designed to bridge the knowledge gap between conventional statistics and machine learning, providing an accessible path for readers with a basic statistics background to develop a mastery of machine learning. It starts with illustrative examples in Chapter 1 and fundamentals of refined optimization in Chapter 2, followed by common supervised learning methods such as regression, classification, support vector machines, tree algorithms, and range regressions. After the supervised learning methods, it includes a chapter on unsupervised learning and a chapter on statistical learning with data obtained sequentially or simultaneously from multiple sources.
One of the distinct features of this book is its comprehensive coverage of topics in statistical learning and medical applications. It summarizes the authors' teaching, research, and consulting experience in data analytics. The illustrative examples and accompanying materials heavily emphasize understanding of data analysis, producing accurate interpretations, and uncovering the hidden assumptions behind various methods.
Key Features:
- Unifies conventional model-based framework and contemporary data-driven methods into a single overarching umbrella over data science.
- Includes real-life medical applications in hypertension, stroke, diabetes, thrombolysis, and aspirin efficacy.
- Integrates statistical theory with machine learning algorithms.
- Includes potential methodological developments in data science.
Table of Contents
Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Preface
List of Figures
List of Tables
1. Two Cultures in Data Science
1.1. Model-based culture
1.2. Data-driven culture
1.3. Intrinsics between the two culture camps
1.3.1. Small sample inference necessitates model assumptions
1.3.2. Prediction accuracy demands large sample
1.3.3. Which camp to go?
1.4. Learning outcome evaluation
1.4.1. Error rates in model-based culture camp
1.4.2. Cost functions in neural networks
1.5. Learning process optimization
1.5.1. Model-based camp
1.5.2. Data-Driven camp
2. Fundamental Instruments
2.1. Data identification
2.1.1. Data types
2.1.2. Pooling data, Simpson's paradox, and solution
2.2. Basic concepts of trees
2.3. Sensitivity, specificity, and ROC curves
2.4. Cross-Validation
2.4.1. LOOCV and Jackknife
2.4.2. LOOCV for linear regressions
2.4.3. K-fold cross-validation and SAS examples
2.5. Bootstrapping
2.5.1. Non-parametric bootstrapping
2.5.2. Parametric bootstrapping
3. Sensitivity and Specificity Trade-off
3.1. Dilemma on false positive and false negative errors
3.2. Most sensitive diagnostic variable
3.3. Two-ended diagnostic measures
3.4. UMEDP with confounding factors
3.5. Efficient and invariant diagnostic predictors
3.5.1. Invariant principle in data transformation
3.5.2. Invariance and efficiency
4. Bias and Variation Trade-off
4.1. Reducible and Irreducible Errors in Prediction
4.2. Minimum variance unbiased estimators
4.3. Minimum risk estimators for transformed data
5. Linear Prediction
5.1. Pitfalls in linear regressions
5.2. Model training and prediction
5.2.1. Building models with training data
5.2.2. Evaluating trained models without normality
5.2.3. Model significance with normal data
5.2.4. Confidence prediction with trained models
5.3. Multiple linear regression
5.3.1. Confounding effects
5.3.2. Information loss and model selection
5.4. Categorical predictors
5.5. Outliers and leverage statistics
6. Nonlinear Prediction
6.1. Restricted optimization and shrinkage
6.1.1. Ridge regression
6.1.2. LASSO regression
6.2. Model Selection and Regularization
6.3. High Dimensional Data
6.3.1. Curse of Dimensionality
6.3.2. Dimension Reduction by Transformation
6.4. Polynomial spline regression
7. Minimum Risk Classification
7.1. Zero-one Loss Classification
7.1.1. Bayesian Discriminant Functions
7.1.2. Logistic regression classification
7.2. General Loss Functions
7.3. Local and Universal Optimizations
7.4. Optimal ROC Classifiers
8. Support Vectors and Duality Theorem
8.1. Maximal Margin Classifier
8.1.1. Hyperplane
8.1.2. Definition of maximal margin classifier
8.2. Support Vector Classifiers
8.3. Support Vector Machine
8.4. Duality Theorem with Perturbation
9. Decision Trees and Range Regressions
9.1. Regression Trees and UMVUE
9.2. Classification Tree
9.2.1. Misclassification Rate
9.2.2. Gini Index
9.2.3. Entropy
9.2.4. UMVUE for homogeneity in classification trees
9.3. Extending regression trees to range regression
10. Unsupervised Learning and Optimization
10.1. K-means Clustering
10.1.1. Clustering with Squared Euclidean Distance
10.1.2. Non-Euclidean Clustering
10.2. Principal Component Analysis
10.2.1. Population Principal Components
10.2.2. Sample principal components
11. Simultaneous Learning and Multiplicity
11.1. Sequential Data
11.1.1. Wald's sequential likelihood ratio test (SPRT)
11.1.2. Two-stage Estimation for Sequential Data
11.2. Simultaneous Learning in Dose-response Studies
11.2.1. Confidence Procedures for Aspirin Efficacy
11.2.2. Confidence Bands on Thrombolysis Effects
11.3. Weighted Simultaneous Confidence Regions
11.3.1. Weighted Hypotheses and Breast Cancer Study
11.3.2. Confidence Sets for Weighted Parameter Arrays
Bibliography
Index