Feature Engineering and Selection: A Practical Approach for Predictive Models
Scribed by Max Kuhn, Kjell Johnson
- Publisher: CRC Press
- Year: 2021
- Tongue: English
- Leaves: 314
- Series: Chapman & Hall/CRC Data Science
- Edition: 1
- Category: Library
Synopsis
The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques along with R programs for reproducing the results.
Table of Contents
- Introduction
  - A Simple Example
  - Important Concepts
  - A More Complex Example
  - Feature Selection
  - An Outline of the Book
  - Computing
- Illustrative Example: Predicting Risk of Ischemic Stroke
  - Splitting
  - Preprocessing
  - Exploration
  - Predictive Modeling Across Sets
  - Other Considerations
  - Computing
- A Review of the Predictive Modeling Process
  - Illustrative Example: OkCupid Profile Data
  - Measuring Performance
  - Data Splitting
  - Resampling
  - Tuning Parameters and Overfitting
  - Model Optimization and Tuning
  - Comparing Models Using the Training Set
  - Feature Engineering Without Overfitting
  - Summary
  - Computing
- Exploratory Visualizations
  - Introduction to the Chicago Train Ridership Data
  - Visualizations for Numeric Data: Exploring Train Ridership Data
  - Visualizations for Categorical Data: Exploring the OkCupid Data
  - Post Modeling Exploratory Visualizations
  - Summary
  - Computing
- Encoding Categorical Predictors
  - Creating Dummy Variables for Unordered Categories
  - Encoding Predictors with Many Categories
  - Approaches for Novel Categories
  - Supervised Encoding Methods
  - Encodings for Ordered Data
  - Creating Features from Text Data
  - Factors versus Dummy Variables in Tree-Based Models
  - Summary
  - Computing
- Engineering Numeric Predictors
  - 1:1 Transformations
  - 1:Many Transformations
  - Many:Many Transformations
  - Summary
  - Computing
- Detecting Interaction Effects
  - Guiding Principles in the Search for Interactions
  - Practical Considerations
  - The Brute-Force Approach to Identifying Predictive Interactions
  - Approaches when Complete Enumeration is Practically Impossible
  - Other Potentially Useful Tools
  - Summary
  - Computing
- Handling Missing Data
  - Understanding the Nature and Severity of Missing Information
  - Models that are Resistant to Missing Values
  - Deletion of Data
  - Encoding Missingness
  - Imputation methods
  - Special Cases
  - Summary
  - Computing
- Working with Profile Data
  - Illustrative Data: Pharmaceutical Manufacturing Monitoring
  - What are the Experimental Unit and the Unit of Prediction?
  - Reducing Background
  - Reducing Other Noise
  - Exploiting Correlation
  - Impacts of Data Processing on Modeling
  - Summary
  - Computing
- Feature Selection Overview
  - Goals of Feature Selection
  - Classes of Feature Selection Methodologies
  - Effect of Irrelevant Features
  - Overfitting to Predictors and External Validation
  - A Case Study
  - Next Steps
  - Computing
- Greedy Search Methods
  - Illustrative Data: Predicting Parkinson's Disease
  - Simple Filters
  - Recursive Feature Elimination
  - Stepwise Selection
  - Summary
  - Computing
- Global Search Methods
  - Naive Bayes Models
  - Simulated Annealing
  - Genetic Algorithms
  - Test Set Results
  - Summary
  - Computing
Subjects
Predictive Models; Data Visualization; Feature Engineering; Categorical Variables; R; Greedy Algorithms; Search Algorithms; Feature Selection
SIMILAR VOLUMES
I found the root cause of many challenges faced by my students who recently transitioned into data science and machine learning. I have tried to address these issues in my book and would like to dedicate this book to all my students for all the love and respect I have received.
A unique and comprehensive text on the philosophy of model-based data analysis and strategy for the analysis of empirical data. The book introduces information theoretic approaches and focuses critical attention on a priori modeling and the selection of a good approximating model that best rep