Online Machine Learning : A Practical Guide with Examples in Python
Scribed by Eva Bartz; Thomas Bartz-Beielstein
- Publisher: Springer International Publishing
- Year: 2024
- Language: English
- Pages: 163
- Category: Library
Table of Contents
Foreword
Preface
Contents
Contributors
1 Introduction: From Batch to Online Machine Learning
1.1 Streaming Data
1.2 Disadvantages of Batch Learning
1.2.1 Memory Requirements
1.2.2 Drift
1.2.3 New, Unknown Data
1.2.4 Accessibility and Availability of the Data
1.2.5 Other Problems
1.3 Incremental Learning, Online Learning, and Stream Learning
1.4 Transitioning Batch to Online Machine Learning
References
2 Supervised Learning: Classification and Regression
2.1 Classification
2.1.1 Baseline Algorithms
2.1.2 The Naive-Bayes Classifier
2.1.3 Tree-Based Methods
2.1.4 Other Classification Methods
2.2 Regression
2.2.1 Online Linear Regression
2.2.2 Hoeffding Tree Regressor
2.3 Ensemble Methods for OML
2.4 Clustering
2.5 Overview: OML Methods
References
3 Drift Detection and Handling
3.1 Architectures for Drift Detection Methods
3.1.1 Adaptive Estimators
3.1.2 Change Detectors
3.1.3 Ensemble-Based Approaches
3.2 Basic Considerations for Windowing Techniques
3.3 Popular Drift Detection Methods
3.3.1 Statistical Tests for Drift and Change Detection
3.3.2 Control Charts
3.3.3 Adaptive Windowing (ADWIN)
3.3.4 Implicit Drift Detection Algorithms
3.4 OML Algorithms with Drift Detection: Hoeffding-Window Trees
3.4.1 Concept-Adapting Very Fast Decision Trees (CVFDT)
3.4.2 Hoeffding Adaptive Trees (HAT)
3.4.3 Overview: Hoeffding-Window Trees
3.4.4 Overview: HT in River
3.5 Drift Scaling in Online Machine Learning
3.5.1 Statistical Measures in a Sequential Manner
3.5.2 Adapted Scaling Techniques
References
4 Initial Selection and Subsequent Updating of OML Models
4.1 Initial Model Selection
4.2 Updating and Changing the Model
4.2.1 Adding New Features
4.2.2 Manual Model Changes in Response to Drift
4.2.3 Ensuring Model Quality After a Model Update
4.3 Catastrophic Forgetting
4.3.1 Strategies for Dealing with Catastrophic Forgetting
References
5 Evaluation and Performance Measurement
5.1 Data Selection Methods
5.1.1 Holdout Selection
5.1.2 Progressive Validation: Interleaved Test-Then-Train
5.1.3 Machine Learning in Batch Mode with a Prediction Horizon
5.1.4 Landmark Batch Machine Learning with a Prediction Horizon
5.1.5 Window-Batch Method with Prediction Horizon
5.1.6 Online-Machine Learning with a Prediction Horizon
5.1.7 Online Machine Learning
5.2 Determining the Training and Test Data Set in the Package spotRiver
5.2.1 Methods for BML and OML
5.2.2 Methods for OML River
5.3 Algorithm (Model) Performance
5.4 Data Stream and Drift Generators
5.4.1 Data Stream Generators in Sklearn
5.4.2 SEA-Drift Generator
5.4.3 Friedman-Drift Generator
5.5 Summary
References
6 Special Requirements for Online Machine Learning Methods
6.1 Missing Data, Imputation
6.2 Categorical Attributes
6.3 Outlier and Anomaly Detection
6.3.1 Additional Anomaly Detection Methods for Time-Series Data
6.3.2 One-Class SVM for Anomaly Detection
6.3.3 Algorithms for Anomaly Detection in river
6.4 Imbalanced Data
6.5 Large Number of Features (Attributes)
6.6 FAIR, Interpretability, and Explainability
References
7 Practical Applications of Online Machine Learning
7.1 Applications and Application Perspectives in Official Statistics
7.1.1 Potentials and Challenges
7.1.2 Compatibility with Quality Criteria
7.1.3 Embedding in the Statistics Production Process
7.1.4 (Online) Machine Learning Applications in Statistical Institutions
7.1.5 Other Applications with Reference to Official Statistics
7.1.6 Summary: OML in Official Statistics
7.2 Industrial Application of OML in the Context of Hot Rolling
7.2.1 Hot Rolling
7.2.2 Machine Learning in Hot Rolling
7.2.3 Drift in Hot Rolling
7.2.4 Application of OML in Hot Rolling
7.2.5 Summary: OML in Hot Rolling
7.3 Summary: Aspects of OML Implementation in Practice
7.3.1 Recommendations for the Implementation Process
7.3.2 Expenditure for Implementation and Maintenance
7.3.3 Application and Diffusion in Practice
7.3.4 Overall Conclusions
References
8 Open-Source Software for Online Machine Learning
8.1 Overview and Description of Software Packages for Online Machine Learning
8.1.1 MOA
8.1.2 RMOA
8.1.3 Stream
8.1.4 River
8.2 Scope of the Software Packages
8.3 Programming Languages: A Brief Comparison
References
9 An Experimental Comparison of Batch and Online Machine Learning Algorithms
9.1 Study: Bike Sharing
9.1.1 Overview: Models
9.1.2 Linear Regression
9.1.3 Gradient Boosting
9.1.4 Hoeffding Regression Trees
9.1.5 Final Comparison of the Bike-Sharing Experiments
9.1.6 Summary: Bike-Sharing Experiments
9.2 Study: Very Large Data Sets With Drift
9.2.1 The Friedman-Drift Data Set
9.2.2 Algorithms
9.2.3 Results
9.3 Study: Drift Scaling in Online Machine Learning
9.4 Summary
References
10 Hyperparameter Tuning
10.1 Hyperparameter Tuning: An Introduction
10.2 The Hyperparameter-Tuning-Software SPOT
10.3 Study: Hyperparameter Tuning of the HATR Algorithm on the Friedman-Drift Data
10.3.1 Loading the Data
10.3.2 Specification of the Preprocessing Model
10.3.3 Selection of the Algorithm to be Tuned and the Default Hyperparameters
10.3.4 Modification of the Default Values for the Hyperparameters
10.3.5 Selection of the Target Function (Loss Function)
10.3.6 Calling the Hyperparameter Tuner SPOT
10.3.7 Visualization with TensorBoard
10.3.8 hatr Tuning Results
10.3.9 Explainability and Understanding
10.4 Summary
References
11 Summary and Outlook
11.1 Necessity for OML Methods
11.2 Recommendations for Using OML in Practice
References
Appendix A Definitions and Explanations
A.1 Gradient Descent
A.2 Bayes' Theorem
A.3 Hoeffding Bound
A.4 Kappa Statistics
Appendix B Supplementary Materials
B.1 Notebooks
B.2 Software
Appendix Glossary
Index
SIMILAR VOLUMES
Ready to add Machine Learning to your skill stack? As the second title in the Machine Learning From Scratch series, this book teaches you how to code machine learning models in Python. By wo
Apply machine learning to streaming data with the help of practical examples, and deal with challenges that surround streaming. Key Features: Work on streaming use cases that are not taught in most data science courses. Gain experience with state-of-the-art tools for streaming data. Mitigate various