Applied Predictive Modeling || Data Pre-processing

✍ Scribed by Kuhn, Max; Johnson, Kjell

Book ID: 120344017
Publisher: Springer New York
Year: 2013
Weight: 971 KB
Category: Article
ISBN: 1461468493
DOI: 10.1007/978-1-4614-6849-3_3

✦ Synopsis

Data pre-processing techniques generally refer to the addition, deletion, or transformation of training set data. Although this text is primarily concerned with modeling techniques, data preparation can make or break a model's predictive ability. Different models have different sensitivities to the type of predictors in the model; how the predictors enter the model is also important. Transformations of the data to reduce the impact of data skewness or outliers can lead to significant improvements in performance. Feature extraction, discussed in Sect. 3.3, is one empirical technique for creating surrogate variables that are combinations of multiple predictors. Additionally, simpler strategies such as removing predictors based on their lack of information content can also be effective.
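As a rough illustration of the point about skewness, the sketch below compares the sample skewness of a small right-skewed predictor before and after a log transformation. The data values and the `skewness` helper are made up for this example and do not come from the chapter; the log transform is only one of several possible skewness-reducing transformations and requires strictly positive values.

```python
import math


def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed predictor: most values are small, a few are large.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0, 120.0]

# The log transform compresses the long right tail (values must be > 0).
logged = [math.log(x) for x in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

On this toy data the transformed predictor is noticeably closer to symmetric. Whether that improves a given model depends on the model's sensitivity to skewed predictors.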

The need for data pre-processing is determined by the type of model being used. Some procedures, such as tree-based models, are notably insensitive to the characteristics of the predictor data. Others, like linear regression, are not. In this chapter, a wide array of possible methodologies is discussed. For modeling techniques described in subsequent chapters, we will also discuss which, if any, pre-processing techniques can be useful.

This chapter outlines approaches to unsupervised data processing: the outcome variable is not considered by the pre-processing techniques. In other chapters, supervised methods, where the outcome is utilized to pre-process the data, are also discussed. For example, partial least squares (PLS) models are essentially supervised versions of principal component analysis (PCA). We also describe strategies for removing predictors without considering how those variables might be related to the outcome. Chapter 19 discusses techniques for finding subsets of predictors that optimize the ability of the model to predict the response.
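To make "unsupervised" concrete, here is a minimal sketch of extracting the first principal component from a predictor matrix. The function takes only the predictors, so the outcome cannot influence the result. The function name, the toy data, and the use of power iteration to find the leading eigenvector of the covariance matrix are all assumptions made for this illustration, not the chapter's own procedure; a real analysis would normally use an established PCA routine.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of observations, each a list of predictor values.
    No outcome variable is passed in, so the step is unsupervised.
    """
    n = len(rows)
    p = len(rows[0])

    # Center each predictor at zero.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix of the predictors.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Scores: projection of each centered observation onto the direction.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical data: two strongly correlated predictors.
predictors = [[2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8], [6.0, 12.1]]
direction, scores = first_principal_component(predictors)

print("direction:", [round(x, 3) for x in direction])
print("scores:", [round(s, 3) for s in scores])
```

The resulting scores form a single surrogate variable that combines both predictors. A supervised method such as PLS would additionally take the outcome as input when choosing the direction.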

How the predictors are encoded, called feature engineering, can have a significant impact on model performance. For example, using combinations of predictors can sometimes be more effective than using the individual values: the ratio of two predictors may be more effective than using two independent

📜 SIMILAR VOLUMES

Applied Predictive Modeling || Data Pre-processing

✍ Kuhn, Max; Johnson, Kjell 📂 Article 📅 2013 🏛 Springer New York ⚖ 971 KB

Applied Predictive Modeling || Introduction

✍ Kuhn, Max; Johnson, Kjell 📂 Article 📅 2013 🏛 Springer New York ⚖ 243 KB

Data Matching || Data Pre-Processing

✍ Christen, Peter 📂 Article 📅 2012 🏛 Springer Berlin Heidelberg 🌐 German ⚖ 451 KB

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains inc

Applied Predictive Modeling || A Short Tour of the Predictive Modeling Process

✍ Kuhn, Max; Johnson, Kjell 📂 Article 📅 2013 🏛 Springer New York ⚖ 394 KB

Applied Predictive Modeling || Measuring Predictor Importance

✍ Kuhn, Max; Johnson, Kjell 📂 Article 📅 2013 🏛 Springer New York ⚖ 785 KB

Applied Predictive Modeling || Nonlinear Regression Models

✍ Kuhn, Max; Johnson, Kjell 📂 Article 📅 2013 🏛 Springer New York ⚖ 618 KB