𝔖 Scriptorium
✦   LIBER   ✦


Elements of Data Science, Machine Learning, and Artificial Intelligence Using R

✍ Scribed by Frank Emmert-Streib, Salissou Moutari, Matthias Dehmer


Publisher: Springer
Year: 2023
Tongue: English
Leaves: 582
Edition: 1
Category: Library


✦ Synopsis


This textbook provides students with the tools they need to analyze complex data using methods from data science, machine learning, and artificial intelligence. It covers all three main components of data science: computer science; mathematics and statistics; and domain knowledge. Methods are presented side by side with their implementations in the programming language R, widely regarded as a standard for data analysis, so each concept can be applied in practice immediately; working through these paired presentations also teaches computational thinking in a natural way. The book includes exercises, case studies, Q&A sections, and worked examples.

✦ Table of Contents


Preface
Contents
1 Introduction to Learning from Data
1.1 What Is Data Science?
1.2 Converting Data into Knowledge
1.2.1 Big Aims: Big Questions
1.2.2 Generating Insights by Visualization
1.3 Structure of the Book
1.3.1 Part I
1.3.2 Part II
1.3.3 Part III
1.4 Our Motivation for Writing This Book
1.5 How to Use This Book
1.6 Summary
Part I General Topics
2 General Prediction Models
2.1 Introduction
2.2 Categorization of Methods
2.2.1 Properties of the Data
2.2.2 Properties of the Optimization Algorithm
2.2.3 Properties of the Model
2.2.4 Summary
2.3 Overview of Prediction Models
2.4 Causal Model versus Predictive Model
2.5 Explainable AI
2.6 Fundamental Statistical Characteristics of Prediction Models
2.6.1 Example
2.7 Summary
2.8 Exercises
3 General Error Measures
3.1 Introduction
3.2 Motivation
3.3 Fundamental Error Measures
3.4 Error Measures
3.4.1 True-Positive Rate and True-Negative Rate
3.4.2 Positive Predictive Value and Negative Predictive Value
3.4.3 Accuracy
3.4.4 F-Score
3.4.5 False Discovery Rate and False Omission Rate
3.4.6 False-Negative Rate and False-Positive Rate
3.4.7 Matthews Correlation Coefficient
3.4.8 Cohen's Kappa
3.4.9 Normalized Mutual Information
3.4.10 Area Under the Receiver Operating Characteristic Curve
3.5 Evaluation of Outcome
3.5.1 Evaluation of an Individual Method
3.5.2 Comparing Multiple Binary Decision-Making Methods
3.6 Summary
3.7 Exercises
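
A hedged aside on Chap. 3: every measure in Sect. 3.4 derives from the four cells of the confusion matrix. The base-R sketch below (our own illustration, not code from the book; the toy labels are invented) computes a few of them:

  # Toy binary labels: 1 = positive, 0 = negative (illustrative data)
  truth <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
  pred  <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
  tp <- sum(pred == 1 & truth == 1)   # true positives
  tn <- sum(pred == 0 & truth == 0)   # true negatives
  fp <- sum(pred == 1 & truth == 0)   # false positives
  fn <- sum(pred == 0 & truth == 1)   # false negatives
  tpr <- tp / (tp + fn)               # true-positive rate (Sect. 3.4.1)
  tnr <- tn / (tn + fp)               # true-negative rate (Sect. 3.4.1)
  ppv <- tp / (tp + fp)               # positive predictive value (Sect. 3.4.2)
  acc <- (tp + tn) / length(truth)    # accuracy (Sect. 3.4.3)
  f1  <- 2 * ppv * tpr / (ppv + tpr)  # F-score (Sect. 3.4.4)
  c(TPR = tpr, TNR = tnr, PPV = ppv, ACC = acc, F1 = f1)
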
4 Resampling Methods
4.1 Introduction
4.2 Resampling Methods for Error Estimation
4.2.1 Holdout Set
4.2.2 Leave-One-Out CV
4.2.3 K-Fold Cross-Validation
4.3 Extended Resampling Methods for Error Estimation
4.3.1 Repeated Holdout Set
4.3.2 Repeated K-Fold CV
4.3.3 Stratified K-Fold CV
4.4 Bootstrap
4.4.1 Resampling With versus Resampling Without Replacement
4.5 Subsampling
4.6 Different Types of Prediction Data Sets
4.7 Sampling from a Distribution
4.8 Standard Error
4.9 Summary
4.10 Exercises
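
To make Chap. 4's resampling schemes concrete, here is a minimal base-R sketch of k-fold cross-validation (Sect. 4.2.3) for a linear model; the simulated data and fold assignment are our own illustrative choices:

  set.seed(1)
  n    <- 100
  x    <- rnorm(n)
  y    <- 2 * x + rnorm(n)                   # toy regression data
  k    <- 5
  fold <- sample(rep(1:k, length.out = n))   # random fold labels 1..k
  cv_mse <- sapply(1:k, function(i) {
    fit <- lm(y ~ x, subset = fold != i)     # train on the other k-1 folds
    e   <- y[fold == i] - predict(fit, data.frame(x = x[fold == i]))
    mean(e^2)                                # test error on the held-out fold
  })
  mean(cv_mse)                               # cross-validated MSE estimate

Repeating the loop over fresh fold assignments gives the repeated k-fold CV of Sect. 4.3.2.
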
5 Data
5.1 Introduction
5.2 Data Types
5.2.1 Genomic Data
5.2.2 Network Data
5.2.3 Text Data
5.2.4 Time-to-Event Data
5.2.5 Business Data
5.3 Summary
Part II Core Methods
6 Statistical Inference
6.1 Exploratory Data Analysis and Descriptive Statistics
6.1.1 Data Structure
6.1.2 Data Preprocessing
6.1.3 Summary Statistics and Presentation of Information
6.1.4 Measures of Location
6.1.4.1 Sample Mean
6.1.4.2 Trimmed Sample Mean
6.1.4.3 Sample Median
6.1.4.4 Quartile
6.1.4.5 Percentile
6.1.4.6 Mode
6.1.4.7 Proportion
6.1.5 Measures of Scale
6.1.5.1 Sample Variance
6.1.5.2 Range
6.1.5.3 Interquartile Range
6.1.6 Measures of Shape
6.1.6.1 Skewness
6.1.6.2 Kurtosis
6.1.7 Data Transformation
6.1.8 Example: Summary of Data and EDA
6.2 Sample Estimators
6.2.1 Point Estimation
6.2.2 Unbiased Estimators
6.2.3 Biased Estimators
6.2.4 Sufficiency
6.3 Bayesian Inference
6.3.1 Conjugate Priors
6.3.2 Continuous Parameter Estimation
6.3.2.1 Example: Continuous Bayesian Inference Using R
6.3.3 Discrete Parameter Estimation
6.3.4 Bayesian Credible Intervals
6.3.5 Prediction
6.3.6 Model Selection
6.4 Maximum Likelihood Estimation
6.4.1 Asymptotic Confidence Intervals for MLE
6.4.2 Bootstrap Confidence Intervals for MLE
6.4.3 Meaning of Confidence Intervals
6.5 Expectation-Maximization Algorithm
6.5.1 Example: EM Algorithm
6.6 Summary
6.7 Exercises
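
One hedged illustration of Chap. 6: for the conjugate beta-binomial case (Sect. 6.3.1), Bayesian updating and a credible interval (Sect. 6.3.4) reduce to a few lines of base R. The prior and data below are our own toy choices:

  a <- 1; b <- 1                           # uniform Beta(1, 1) prior
  successes <- 7; failures <- 3            # observed toy data
  a_post <- a + successes                  # conjugacy: the posterior is Beta again
  b_post <- b + failures
  a_post / (a_post + b_post)               # posterior mean
  qbeta(c(0.025, 0.975), a_post, b_post)   # 95% equal-tailed credible interval
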
7 Clustering
7.1 Introduction
7.2 What Is Clustering?
7.3 Comparison of Data Points
7.3.1 Distance Measures
7.3.2 Similarity Measures
7.4 Basic Principle of Clustering Algorithms
7.5 Non-hierarchical Clustering Methods
7.5.1 K-Means Clustering
7.5.2 K-Medoids Clustering
7.5.3 Partitioning Around Medoids (PAM)
7.6 Hierarchical Clustering
7.6.1 Dendrograms
7.6.2 Two Types of Dissimilarity Measures
7.6.3 Linkage Functions for Agglomerative Clustering
7.6.4 Example
7.7 Defining Feature Vectors for General Objects
7.8 Cluster Validation
7.8.1 External Criteria
7.8.2 Assessing the Numerical Values of Indices
7.8.3 Internal Criteria
7.9 Summary
7.10 Exercises
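
Both clustering families of Chap. 7 ship with base R. The sketch below (toy data and parameters of our own choosing) runs k-means (Sect. 7.5.1) and agglomerative hierarchical clustering (Sect. 7.6) on the same points:

  set.seed(1)
  X <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),  # two toy Gaussian clusters
             matrix(rnorm(50, mean = 3), ncol = 2))
  km <- kmeans(X, centers = 2, nstart = 10)          # k-means
  km$cluster                                         # hard cluster labels
  d  <- dist(X, method = "euclidean")                # distance matrix (Sect. 7.3.1)
  hc <- hclust(d, method = "complete")               # complete linkage (Sect. 7.6.3)
  cutree(hc, k = 2)                                  # cut the dendrogram into 2 clusters
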
8 Dimension Reduction
8.1 Introduction
8.2 Feature Extraction
8.2.1 An Overview of PCA
8.2.2 Geometrical Interpretation of PCA
8.2.3 PCA Procedure
8.2.4 Underlying Mathematical Problems in PCA
8.2.5 PCA Using Singular Value Decomposition
8.2.6 Assessing PCA Results
8.2.7 Illustration of PCA Using R
8.2.8 Kernel PCA
8.2.9 Discussion
8.2.10 Non-negative Matrix Factorization
8.2.10.1 NNMF Using the Frobenius Norm as Objective Function
8.2.10.2 NNMF Using the Generalized Kullback-Leibler Divergence as Objective Function
8.2.10.3 Example of NNMF Using R
8.3 Feature Selection
8.3.1 Filter Methods Using Mutual Information
8.4 Summary
8.5 Exercises
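
The PCA procedure of Chap. 8 corresponds to base R's prcomp, which works through the singular value decomposition (Sect. 8.2.5). A minimal sketch, with the built-in iris measurements standing in for real data:

  X  <- as.matrix(iris[, 1:4])                   # four numeric features
  pc <- prcomp(X, center = TRUE, scale. = TRUE)  # PCA on standardized data
  summary(pc)                                    # variance explained per component
  head(pc$x[, 1:2])                              # scores on the first two PCs
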
9 Classification
9.1 Introduction
9.2 What Is Classification?
9.3 Common Aspects of Classification Methods
9.3.1 Basic Idea of a Classifier
9.3.2 Training and Test Data
9.3.3 Error Measures
9.3.3.1 Error Measures for Multi-class Classification
9.4 Naive Bayes Classifier
9.4.1 Educational Example
9.4.2 Example
9.5 Linear Discriminant Analysis
9.5.1 Extensions
9.6 Logistic Regression
9.7 k-Nearest Neighbor Classifier
9.8 Support Vector Machine
9.8.1 Linearly Separable Data
9.8.2 Nonlinearly Separable Data
9.8.3 Nonlinear Support Vector Machines
9.8.4 Examples
9.9 Decision Tree
9.9.1 What Is a Decision Tree?
9.9.1.1 Three Principal Steps to Get a Decision Tree
9.9.2 Step 1: Growing a Decision Tree
9.9.3 Step 2: Assessing the Size of a Decision Tree
9.9.3.1 Intuitive Approach
9.9.3.2 Formal Approach
9.9.4 Step 3: Pruning a Decision Tree
9.9.4.1 Alternative Way to Construct Optimal Decision Trees: Stopping Rules
9.9.5 Predictions
9.10 Summary
9.11 Exercises
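
Among the classifiers of Chap. 9, logistic regression (Sect. 9.6) needs no extra packages. A hedged base-R sketch with a train/test split (Sect. 9.3.2); the simulated data are our own:

  set.seed(1)
  n  <- 200
  x  <- rnorm(n)
  y  <- rbinom(n, 1, plogis(2 * x))     # binary labels from a logistic model
  tr <- sample(n, 150)                  # training-set indices
  fit  <- glm(y ~ x, family = binomial, subset = tr)
  prob <- predict(fit, data.frame(x = x[-tr]), type = "response")
  pred <- as.integer(prob > 0.5)        # classify by thresholding at 0.5
  mean(pred == y[-tr])                  # test-set accuracy (Sect. 9.3.3)
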
10 Hypothesis Testing
10.1 Introduction
10.2 What Is Hypothesis Testing?
10.3 Key Components of Hypothesis Testing
10.3.1 Step 1: Select Test Statistic
10.3.2 Step 2: Null Hypothesis H0 and Alternative Hypothesis H1
10.3.3 Step 3: Sampling Distribution
10.3.3.1 Examples
10.3.4 Step 4: Significance Level Ξ±
10.3.5 Step 5: Evaluate the Test Statistic from Data
10.3.6 Step 6: Determine the p-Value
10.3.7 Step 7: Make a Decision about the Null Hypothesis
10.4 Type 2 Error and Power
10.4.1 Connections between Power and Errors
10.5 Confidence Intervals
10.5.1 Confidence Intervals for a Population Mean with Known Variance
10.5.2 Confidence Intervals for a Population Mean with Unknown Variance
10.5.3 Bootstrap Confidence Intervals
10.6 Important Hypothesis Tests
10.6.1 Student's t-Test
10.6.1.1 One-Sample t-Test
10.6.1.2 Two-Sample t-Test
10.6.1.3 Extensions
10.6.2 Correlation Tests
10.6.3 Hypergeometric Test
10.6.3.1 Null Hypothesis and Sampling Distribution
10.6.3.2 Examples
10.6.4 Finding the Correct Hypothesis Test
10.7 Permutation Tests
10.8 Understanding versus Applying Hypothesis Tests
10.9 Historical Notes and Misinterpretations
10.10 Summary
10.11 Exercises
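
The t-tests of Sect. 10.6.1 are one-liners in base R. A hedged sketch on simulated group data (our own toy example) that exposes the quantities from Steps 5-7 of Sect. 10.3:

  set.seed(1)
  g1 <- rnorm(30, mean = 0)                # toy sample, group 1
  g2 <- rnorm(30, mean = 0.8)              # toy sample, group 2
  tt <- t.test(g1, g2, var.equal = FALSE)  # Welch two-sample t-test
  tt$statistic                             # evaluated test statistic (Step 5)
  tt$p.value                               # p-value (Step 6)
  tt$conf.int                              # 95% confidence interval (Sect. 10.5)
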
11 Linear Regression Models
11.1 Introduction
11.1.1 What Is Linear Regression?
11.1.2 Motivating Example
11.2 Simple Linear Regression
11.2.1 Ordinary Least Squares Estimation of Coefficients
11.2.2 Variability of the Coefficients
11.2.3 Testing the Necessity of Coefficients
11.2.4 Assessing the Quality of a Fit
11.3 Preprocessing
11.4 Multiple Linear Regression
11.4.1 Testing the Necessity of Coefficients
11.4.2 Assessing the Quality of a Fit
11.5 Diagnosing Linear Models
11.5.1 Error Assumptions
11.5.2 Linearity Assumption of the Model
11.5.3 Leverage Points
11.5.4 Outliers
11.5.5 Collinearity
11.5.6 Discussion
11.6 Advanced Topics
11.6.1 Interactions
11.6.2 Nonlinearities
11.6.3 Categorical Predictors
11.6.4 Generalized Linear Models
11.6.4.1 How to Determine Which Family to Use When Fitting a GLM
11.6.4.2 Advantages of GLMs over Traditional OLS Regression
11.6.4.3 Example: Poisson Regression
11.6.4.4 Example: Logistic Regression
11.7 Summary
11.8 Exercises
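
Chapter 11's workflow maps directly onto base R's lm; the sketch below uses the built-in mtcars data as our illustrative stand-in:

  fit <- lm(mpg ~ wt + hp, data = mtcars)  # multiple linear regression (Sect. 11.4)
  summary(fit)                     # coefficient t-tests (Sect. 11.4.1), R^2 (Sect. 11.4.2)
  confint(fit)                     # confidence intervals for the coefficients
  par(mfrow = c(2, 2)); plot(fit)  # residual and leverage diagnostics (Sect. 11.5)
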
12 Model Selection
12.1 Introduction
12.2 Difference Between Model Selection and Model Assessment
12.3 General Approach to Model Selection
12.4 Model Selection for Multiple Linear Regression Models
12.4.1 R2 and Adjusted R2
12.4.2 Mallows' Cp Statistic
12.4.3 Akaike's Information Criterion (AIC) and Schwarz's BIC
12.4.4 Best Subset Selection
12.4.5 Stepwise Selection
12.4.5.1 Forward Stepwise Selection
12.4.5.2 Backward Stepwise Selection
12.5 Model Selection for Generalized Linear Models
12.5.1 Negative Binomial Regression Model
12.5.2 Zero-Inflated Poisson Model
12.5.3 Quasi-Poisson Model
12.5.4 Comparison of GLMs
12.6 Model Selection for Bayesian Models
12.7 Nonparametric Model Selection for General Models with Resampling
12.8 Summary
12.9 Exercises
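
For Sect. 12.4, base R already provides the criteria and the stepwise search; a hedged sketch comparing two nested models on mtcars (our illustrative choice):

  full <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
  red  <- lm(mpg ~ wt + hp, data = mtcars)
  AIC(full, red)                      # Akaike's information criterion (Sect. 12.4.3)
  BIC(full, red)                      # Schwarz's BIC (Sect. 12.4.3)
  step(full, direction = "backward")  # backward stepwise selection (Sect. 12.4.5.2)
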
Part III Advanced Topics
13 Regularization
13.1 Introduction
13.2 Preliminaries
13.2.1 Preprocessing and Norms
13.2.2 Data
13.2.3 R Packages for Regularization
13.3 Ridge Regression
13.3.1 Example
13.4 Non-negative Garrote Regression
13.5 LASSO
13.5.1 Example
13.5.2 Explanation of Variable Selection
13.5.3 Discussion
13.5.4 Limitations
13.6 Bridge Regression
13.7 Dantzig Selector
13.8 Adaptive LASSO
13.8.1 Example
13.9 Elastic Net
13.9.1 Example
13.9.2 Discussion
13.10 Group LASSO
13.10.1 Example
13.10.2 Remarks
13.11 Discussion
13.12 Summary
13.13 Exercises
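
Most of Chap. 13's penalized estimators live in the glmnet package (R packages for regularization are listed in Sect. 13.2.3): in glmnet's parameterization, alpha = 0 gives ridge, alpha = 1 the LASSO, and values in between the elastic net. A hedged sketch on simulated sparse data of our own:

  library(glmnet)
  set.seed(1)
  X <- matrix(rnorm(100 * 20), nrow = 100)  # 20 candidate predictors
  y <- X[, 1] - 2 * X[, 2] + rnorm(100)     # only two are truly active
  cv <- cv.glmnet(X, y, alpha = 1)          # LASSO, lambda chosen by CV (Chap. 4)
  coef(cv, s = "lambda.min")                # sparse fit: most coefficients are zero
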
14 Deep Learning
14.1 Introduction
14.2 Architectures of Classical Neural Networks
14.2.1 Mathematical Model of an Artificial Neuron
14.2.2 Feedforward Neural Networks
14.2.3 Recurrent Neural Networks
14.2.3.1 Hopfield Networks
14.2.3.2 Boltzmann Machine
14.2.4 Overview of General Network Architectures
14.3 Deep Feedforward Neural Networks
14.3.1 Example: Deep Feedforward Neural Networks
14.4 Convolutional Neural Networks
14.4.1 Basic Components of a CNN
14.4.1.1 Convolutional Layer
14.4.1.2 Pooling Layer
14.4.1.3 Fully Connected Layer
14.4.2 Important Variants of CNN
14.4.3 Example: CNN
14.5 Deep Belief Networks
14.5.1 Pre-training Phase: Unsupervised
14.5.2 Fine-Tuning Phase: Supervised
14.6 Autoencoder
14.6.1 Example: Denoising and Variational Autoencoder
14.7 Long Short-Term Memory Networks
14.7.1 LSTM Network Structure with Forget Gate
14.7.2 Peephole LSTM
14.7.3 Applications
14.7.4 Example: LSTM
14.8 Discussion
14.8.1 General Characteristics of Deep Learning
14.8.2 Explainable AI
14.8.3 Big Data versus Small Data
14.8.4 Advanced Models
14.9 Summary
14.10 Exercises
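
As a hedged taste of Chap. 14, the sketch below defines a small deep feedforward network (Sect. 14.3) in the keras R interface; it assumes the keras package and a TensorFlow backend are installed, and every layer size is an illustrative choice of ours:

  library(keras)  # assumes keras and a TensorFlow backend are installed
  model <- keras_model_sequential() %>%
    layer_dense(units = 32, activation = "relu", input_shape = 4) %>%  # hidden layer
    layer_dense(units = 3, activation = "softmax")                     # 3-class output
  model %>% compile(optimizer = "adam",
                    loss = "sparse_categorical_crossentropy",
                    metrics = "accuracy")
  summary(model)  # layer shapes and parameter counts
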
15 Multiple Testing Corrections
15.1 Introduction
15.2 Preliminaries
15.2.1 Formal Setting
15.2.2 Simulations Using R
15.2.3 Focus on Pairwise Correlations
15.2.4 Focus on a Network Correlation Structure
15.2.5 Application of Multiple Testing Procedures
15.3 Motivation of the Problem
15.3.1 Theoretical Considerations
15.3.2 Experimental Example
15.4 Types of Multiple Testing Procedures
15.4.1 Single-Step versus Stepwise Approaches
15.4.2 Adaptive versus Nonadaptive Approaches
15.4.3 Marginal versus Joint Multiple Testing Procedures
15.5 Controlling the FWER
15.5.1 Šidák Correction
15.5.2 Bonferroni Correction
15.5.3 Holm Correction
15.5.4 Hochberg Correction
15.5.5 Hommel Correction
15.5.5.1 Examples
15.5.6 Westfall-Young Procedure
15.6 Controlling the FDR
15.6.1 Benjamini-Hochberg Procedure
15.6.1.1 Example
15.6.2 Adaptive Benjamini-Hochberg Procedure
15.6.3 Benjamini-Yekutieli Procedure
15.6.3.1 Example
15.6.4 Benjamini-Krieger-Yekutieli Procedure
15.6.5 Blanchard-Roquain Procedure
15.6.5.1 BR-1S Procedure
15.6.5.2 BR-2S Procedure
15.7 Computational Complexity
15.8 Comparison
15.9 Summary
15.10 Exercises
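
Several of Chap. 15's corrections are wrapped by base R's p.adjust; the simulated p-values below are our own toy mixture of null and non-null cases:

  set.seed(1)
  p <- c(runif(90), rbeta(10, 1, 50))     # 90 null p-values + 10 small ones
  sum(p.adjust(p, "bonferroni") < 0.05)   # Bonferroni, FWER (Sect. 15.5.2)
  sum(p.adjust(p, "holm") < 0.05)         # Holm step-down (Sect. 15.5.3)
  sum(p.adjust(p, "hochberg") < 0.05)     # Hochberg step-up (Sect. 15.5.4)
  sum(p.adjust(p, "BH") < 0.05)           # Benjamini-Hochberg, FDR (Sect. 15.6.1)
  sum(p.adjust(p, "BY") < 0.05)           # Benjamini-Yekutieli (Sect. 15.6.3)
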
16 Survival Analysis
16.1 Introduction
16.2 Motivation
16.2.1 Effect of Chemotherapy: Breast Cancer Patients
16.2.2 Effect of Medication: Agitation
16.3 Censoring
16.4 General Characteristics of a Survival Function
16.5 Nonparametric Estimator for the Survival Function
16.5.1 Kaplan-Meier Estimator for the Survival Function
16.5.2 Nelson-Aalen Estimator for the Survival Function
16.6 Comparison of Two Survival Curves
16.6.1 Log-Rank Test
16.7 Hazard Function
16.7.1 Weibull Model
16.7.2 Exponential Model
16.7.3 Log-Logistic Model
16.7.4 Log-Normal Model
16.7.5 Interpretation of Hazard Functions
16.8 Cox Proportional Hazard Model
16.8.1 Why Is the Model Called a Proportional Hazard Model?
16.8.2 Interpretation of General Hazard Ratios
16.8.3 Adjusted Survival Curves
16.8.4 Testing the Proportional Hazard Assumption
16.8.4.1 Graphical Evaluation
16.8.4.2 Goodness-of-Fit Test
16.8.5 Parameter Estimation of the CPHM via Maximum Likelihood
16.8.5.1 Case Without Ties
16.8.5.2 Case with Ties
16.9 Stratified Cox Model
16.9.1 Testing No-Interaction Assumption
16.9.2 Case of Many Covariates Violating the PH Assumption
16.10 Survival Analysis Using R
16.10.1 Comparison of Survival Curves
16.10.2 Analyzing a Cox Proportional Hazard Model
16.10.3 Testing the PH Assumption
16.10.4 Hazard Ratios
16.11 Further Reading
16.12 Summary
16.13 Exercises
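
Sect. 16.10 works with the survival package; a hedged sketch on that package's built-in lung data (our illustrative choice) walks the same steps:

  library(survival)
  km <- survfit(Surv(time, status) ~ sex, data = lung)  # Kaplan-Meier (Sect. 16.5.1)
  plot(km)                                              # survival curves by sex
  survdiff(Surv(time, status) ~ sex, data = lung)       # log-rank test (Sect. 16.6.1)
  cph <- coxph(Surv(time, status) ~ sex + age, data = lung)  # Cox PH model (Sect. 16.8)
  summary(cph)   # hazard ratios (Sect. 16.10.4)
  cox.zph(cph)   # check of the PH assumption (Sect. 16.10.3)
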
17 Foundations of Learning from Data
17.1 Introduction
17.2 Computational and Statistical Learning Theory
17.2.1 Probabilistic Learnability
17.2.2 Probably Approximately Correct (PAC) Learning
17.2.2.1 Example: Rectangle Learning
17.2.2.2 General Bound for a Finite Hypothesis Space H: The Inconsistent Case
17.2.3 Vapnik-Chervonenkis (VC) Theory
17.2.3.1 Example: One-Dimensional Intervals
17.2.3.2 Example: Axis-Aligned Rectangles
17.3 Importance of Bias for Learning
17.4 Learning as Optimization Problem
17.4.1 Empirical Risk Minimization
17.4.2 Structural Risk Minimization
17.5 Fundamental Theorem of Statistical Learning
17.6 Discussion
17.7 Modern Machine Learning Paradigms
17.7.1 Semi-supervised Learning
17.7.1.1 Methodological Approaches
17.7.2 One-Class Classification
17.7.2.1 Methodological Approaches
17.7.3 Positive-Unlabeled Learning
17.7.3.1 Methodological Approaches
17.7.4 Few/One-Shot Learning
17.7.4.1 Methodological Approaches
17.7.5 Transfer Learning
17.7.5.1 Methodological Approaches
17.7.6 Multi-Task Learning
17.7.6.1 Methodological Approaches
17.7.7 Multi-Label Learning
17.7.7.1 Methodological Approaches
17.8 Summary
17.9 Exercises
18 Generalization Error and Model Assessment
18.1 Introduction
18.2 Overall View of Model Diagnosis
18.3 Expected Generalization Error
18.4 Bias-Variance Trade-Off
18.5 Error-Complexity Curves
18.5.1 Example: Linear Polynomial Regression Model
18.5.2 Example: Error-Complexity Curves
18.5.3 Interpretation of Error-Complexity Curves
18.6 Learning Curves
18.6.1 Example: Learning Curves for Linear Polynomial Regression Models
18.6.2 Interpretation of Learning Curves
18.7 Discussion
18.8 Summary
18.9 Outlook
18.10 Exercises
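
The error-complexity curves of Sect. 18.5 can be traced with a short base-R loop over polynomial degrees; the data-generating model below is our own illustrative choice:

  set.seed(1)
  n  <- 100
  x  <- runif(n, -2, 2)
  y  <- sin(x) + rnorm(n, sd = 0.3)   # nonlinear truth plus noise
  tr <- sample(n, 70)                 # train/test split
  test_mse <- sapply(1:10, function(d) {   # complexity = polynomial degree d
    fit <- lm(y ~ poly(x, d), subset = tr)
    e   <- y[-tr] - predict(fit, data.frame(x = x[-tr]))
    mean(e^2)                         # test error for degree d
  })
  which.min(test_mse)                 # degree balancing bias and variance (Sect. 18.4)
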
References
Index


📜 SIMILAR VOLUMES


Elements of Data Science, Machine Learning, and Artificial Intelligence Using R
✍ Frank Emmert-Streib; Salissou Moutari; Matthias Dehmer 📂 Library 📅 2023 🏛 Springer International Publishing 🌐 English

The textbook provides students with tools they need to analyze complex data using methods from data science, machine learning and artificial intelligence. The authors include the presentation of methods along with applications using the programming language R, which is the gold standard for analyzing data…

Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence
✍ Yoni Nazarathy, Hayden Klok 📂 Library 📅 2021 🏛 Springer Cham 🌐 English

This monograph uses the Julia language to guide the reader through an exploration of the fundamental concepts of probability and statistics, all with a view of mastering machine learning, data science, and artificial intelligence. The text does not require any prior statistical knowledge…

The Era of Artificial Intelligence, Machine Learning and Data Science in the Pharmaceutical Industry
✍ Stephanie K. Ashenden (editor) 📂 Library 📅 2021 🏛 Academic Press 🌐 English

The Era of Artificial Intelligence, Machine Learning and Data Science in the Pharmaceutical Industry examines the drug discovery process, assessing how new technologies have improved effectiveness. Artificial intelligence and machine learning are considered the future for a wide range of…