Python Feature Engineering Cookbook: Over 70 Recipes for Creating, Engineering, and Transforming Features to Build Machine Learning Models

โœ Scribed by Soledad Galli


Publisher: Packt Publishing Ltd
Year: 2020
Language: English
Pages: 364
Category: Library

✦ Synopsis


Extract accurate information from data to train and improve machine learning models using NumPy, SciPy, pandas, and scikit-learn libraries

Key Features:
- Discover solutions for feature generation, feature extraction, and feature selection
- Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets
- Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy, and NumPy libraries

Book Description

Feature engineering is invaluable for developing and enriching machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, and to simplify and improve the quality of your code.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book covers Python recipes that will help you automate feature engineering to simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox, power, and log transforms, across machine learning, reinforcement learning, and natural language processing (NLP) domains.

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.
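As a brief taste of the transforms the synopsis mentions, the sketch below applies the log, Box-Cox, and Yeo-Johnson transforms with NumPy and scikit-learn. This is an illustration, not the book's own code: the income column and toy data are invented, and scikit-learn's PowerTransformer is used as one possible implementation (the book may equally use SciPy's boxcox).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Toy data: a right-skewed, strictly positive variable (hypothetical "income")
rng = np.random.default_rng(0)
df = pd.DataFrame({"income": rng.lognormal(mean=10, sigma=1, size=1000)})

# Log transform: only valid for strictly positive values
df["income_log"] = np.log(df["income"])

# Box-Cox: positive values only; the lambda parameter is estimated by
# maximum likelihood, and the output is standardized by default
boxcox = PowerTransformer(method="box-cox")
df["income_boxcox"] = boxcox.fit_transform(df[["income"]]).ravel()

# Yeo-Johnson: a generalization that also accepts zero and negative values
yeojohnson = PowerTransformer(method="yeo-johnson")
df["income_yj"] = yeojohnson.fit_transform(df[["income"]]).ravel()

# The transformed columns are far less skewed than the raw variable
print(df[["income", "income_log", "income_boxcox", "income_yj"]].skew())
```

All three transforms aim at the same goal covered in Chapter 4: pulling a skewed distribution toward something closer to Gaussian, which many models and statistical tests assume.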

✦ Table of Contents


Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Chapter 1: Foreseeing Variable Problems When Building ML Models
Technical requirements
Identifying numerical and categorical variables
Getting ready
How to do it...
How it works...
There's more...
See also
Quantifying missing data
Getting ready
How to do it...
How it works...
Determining cardinality in categorical variables
Getting ready
How to do it...
How it works...
There's more...
Pinpointing rare categories in categorical variables
Getting ready
How to do it...
How it works...
Identifying a linear relationship
How to do it...
How it works...
There's more...
See also
Identifying a normal distribution
How to do it...
How it works...
There's more...
See also
Distinguishing variable distribution
Getting ready
How to do it...
How it works...
See also
Highlighting outliers
Getting ready
How to do it...
How it works...
Comparing feature magnitude
Getting ready
How to do it...
How it works...
Chapter 2: Imputing Missing Data
Technical requirements
Removing observations with missing data
How to do it...
How it works...
See also
Performing mean or median imputation
How to do it...
How it works...
There's more...
See also
Implementing mode or frequent category imputation
How to do it...
How it works...
See also
Replacing missing values with an arbitrary number
How to do it...
How it works...
There's more...
See also
Capturing missing values in a bespoke category
How to do it...
How it works...
See also
Replacing missing values with a value at the end of the distribution
How to do it...
How it works...
See also
Implementing random sample imputation
How to do it...
How it works...
See also
Adding a missing value indicator variable
Getting ready
How to do it...
How it works...
There's more...
See also
Performing multivariate imputation by chained equations
Getting ready
How to do it...
How it works...
There's more...
Assembling an imputation pipeline with scikit-learn
How to do it...
How it works...
See also
Assembling an imputation pipeline with Feature-engine
How to do it...
How it works...
See also
Chapter 3: Encoding Categorical Variables
Technical requirements
Creating binary variables through one-hot encoding
Getting ready
How to do it...
How it works...
There's more...
See also
Performing one-hot encoding of frequent categories
Getting ready
How to do it...
How it works...
There's more...
Replacing categories with ordinal numbers
How to do it...
How it works...
There's more...
See also
Replacing categories with counts or frequency of observations
How to do it...
How it works...
There's more...
Encoding with integers in an ordered manner
How to do it...
How it works...
See also
Encoding with the mean of the target
How to do it...
How it works...
See also
Encoding with the Weight of Evidence
How to do it...
How it works...
See also
Grouping rare or infrequent categories
How to do it...
How it works...
See also
Performing binary encoding
Getting ready
How to do it...
How it works...
See also
Performing feature hashing
Getting ready
How to do it...
How it works...
See also
Chapter 4: Transforming Numerical Variables
Technical requirements
Transforming variables with the logarithm
How to do it...
How it works...
See also
Transforming variables with the reciprocal function
How to do it...
How it works...
See also
Using square and cube root to transform variables
How to do it...
How it works...
There's more...
Using power transformations on numerical variables
How to do it...
How it works...
There's more...
See also
Performing Box-Cox transformation on numerical variables
How to do it...
How it works...
See also
Performing Yeo-Johnson transformation on numerical variables
How to do it...
How it works...
See also
Chapter 5: Performing Variable Discretization
Technical requirements
Dividing the variable into intervals of equal width
How to do it...
How it works...
See also
Sorting the variable values in intervals of equal frequency
How to do it...
How it works...
Performing discretization followed by categorical encoding
How to do it...
How it works...
See also
Allocating the variable values in arbitrary intervals
How to do it...
How it works...
Performing discretization with k-means clustering
How to do it...
How it works...
Using decision trees for discretization
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 6: Working with Outliers
Technical requirements
Trimming outliers from the dataset
How to do it...
How it works...
There's more...
Performing winsorization
How to do it...
How it works...
There's more...
See also
Capping the variable at arbitrary maximum and minimum values
How to do it...
How it works...
There's more...
See also
Performing zero-coding – capping the variable at zero
How to do it...
How it works...
There's more...
See also
Chapter 7: Deriving Features from Dates and Time Variables
Technical requirements
Extracting date and time parts from a datetime variable
How to do it...
How it works...
See also
Deriving representations of the year and month
How to do it...
How it works...
See also
Creating representations of day and week
How to do it...
How it works...
See also
Extracting time parts from a time variable
How to do it...
How it works...
Capturing the elapsed time between datetime variables
How to do it...
How it works...
See also
Working with time in different time zones
How to do it...
How it works...
See also
Chapter 8: Performing Feature Scaling
Technical requirements
Standardizing the features
How to do it...
How it works...
See also
Performing mean normalization
How to do it...
How it works...
There's more...
See also
Scaling to the maximum and minimum values
How to do it...
How it works...
See also
Implementing maximum absolute scaling
How to do it...
How it works...
There's more...
See also
Scaling with the median and quantiles
How to do it...
How it works...
See also
Scaling to vector unit length
How to do it...
How it works...
See also
Chapter 9: Applying Mathematical Computations to Features
Technical requirements
Combining multiple features with statistical operations
Getting ready
How to do it...
How it works...
There's more...
See also
Combining pairs of features with mathematical functions
Getting ready
How to do it...
How it works...
There's more...
See also
Performing polynomial expansion
Getting ready
How to do it...
How it works...
There's more...
See also
Deriving new features with decision trees
Getting ready
How to do it...
How it works...
There's more...
Carrying out PCA
Getting ready
How to do it...
How it works...
See also
Chapter 10: Creating Features with Transactional and Time Series Data
Technical requirements
Aggregating transactions with mathematical operations
Getting ready
How to do it...
How it works...
There's more...
See also
Aggregating transactions in a time window
Getting ready
How to do it...
How it works...
There's more...
See also
Determining the number of local maxima and minima
Getting ready
How to do it...
How it works...
There's more...
See also
Deriving time elapsed between time-stamped events
How to do it...
How it works...
There's more...
See also
Creating features from transactions with Featuretools
How to do it...
How it works...
There's more...
See also
Chapter 11: Extracting Features from Text Variables
Technical requirements
Counting characters, words, and vocabulary
Getting ready
How to do it...
How it works...
There's more...
See also
Estimating text complexity by counting sentences
Getting ready
How to do it...
How it works...
There's more...
Creating features with bag-of-words and n-grams
Getting ready
How to do it...
How it works...
See also
Implementing term frequency-inverse document frequency
Getting ready
How to do it...
How it works...
See also
Cleaning and stemming text variables
Getting ready
How to do it...
How it works...
Other Books You May Enjoy
Index

✦ Subjects


python, machine learning, feature engineering


📜 SIMILAR VOLUMES


Python. 70 recipes for creating engineer
โœ David Markus ๐Ÿ“‚ Library ๐Ÿ“… 0 ๐Ÿ› ProVersus ๐ŸŒ English

Python Feature Engineering Cookbook covers well-demonstrated recipes focused on solutions that will assist machine learning teams in identifying and extracting features to develop highly optimized and enriched machine learning models. This book includes recipes to extract and transform features from

Feature Engineering for Machine Learning
โœ Alice Zheng;Amanda Casari ๐Ÿ“‚ Library ๐Ÿ“… 2018 ๐Ÿ› O'Reilly Media, Inc. ๐ŸŒ English

Feature engineering is a crucial step in the machine-learning pipeline, yet this topic is rarely examined on its own. With this practical book, you'll learn techniques for extracting and transforming features--the numeric representations of raw data--into formats for machine-learning models. Each ch

Feature Engineering for Machine Learning
โœ Dong, Guozhu; Liu, Huan ๐Ÿ“‚ Library ๐Ÿ“… 2018 ๐Ÿ› Taylor and Francis ๐ŸŒ English

"Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of results of those algorithms largely depends on the qualit

Feature engineering for machine learning
โœ Casari, Amanda;Zheng, Alice ๐Ÿ“‚ Library ๐Ÿ“… 2018 ๐Ÿ› O'Reilly Media, Inc. ๐ŸŒ English

Intro; Copyright; Table of Contents; Preface; Introduction; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Special Thanks from Alice; Special Thanks from Amanda; Chapter 1. The Machine Learning Pipeline; Data; Tasks; Models; Features; Model E