Discovering Knowledge in Data: An Introduction to Data Mining

✍ Scribed by Larose, Daniel T

Publisher: Wiley-Interscience
Year: 2004
Tongue: English
Leaves: 240
Category: Library

✦ Synopsis

Learn data mining by doing data mining
Data mining can be revolutionary, but only when it's done right. The powerful black-box data mining software now available can produce disastrously misleading results unless applied by a skilled and knowledgeable analyst. Discovering Knowledge in Data: An Introduction to Data Mining provides both the practical experience and the theoretical insight needed to reveal valuable information hidden in large data sets.
Employing a "white box" methodology and real-world case studies, this step-by-step guide walks readers through the various algorithms and statistical structures that underlie the software and presents examples of their operation on actual large data sets. Principal topics include:
* Data preprocessing and classification
* Exploratory analysis
* Decision trees
* Neural and Kohonen networks
* Hierarchical and k-means clustering
* Association rules
* Model evaluation techniques
Complete with scores of screenshots and diagrams to encourage graphical learning, Discovering Knowledge in Data: An Introduction to Data Mining gives students in Business, Computer Science, and Statistics as well as professionals in the field the power to turn any data warehouse into actionable knowledge.

An Instructor's Manual presenting detailed solutions to all the problems in the book is available online.

✦ Table of Contents

DISCOVERING KNOWLEDGE IN DATA
CONTENTS
PREFACE
1 INTRODUCTION TO DATA MINING
What Is Data Mining?
Need for Human Direction of Data Mining
Cross-Industry Standard Process: CRISP–DM
Case Study 1: Analyzing Automobile Warranty Claims: Example of the CRISP–DM Industry Standard Process in Action
Fallacies of Data Mining
Description
Estimation
Prediction
Classification
Clustering
Association
Case Study 2: Predicting Abnormal Stock Market Returns Using Neural Networks
Case Study 3: Mining Association Rules from Legal Databases
Case Study 4: Predicting Corporate Bankruptcies Using Decision Trees
Case Study 5: Profiling the Tourism Market Using k-Means Clustering Analysis
References
Exercises
Why Do We Need to Preprocess the Data?
Data Cleaning
Handling Missing Data
Identifying Misclassifications
Graphical Methods for Identifying Outliers
Data Transformation
Min–Max Normalization
Z-Score Standardization
Numerical Methods for Identifying Outliers
Exercises
Hypothesis Testing versus Exploratory Data Analysis
Getting to Know the Data Set
Dealing with Correlated Variables
Exploring Categorical Variables
Using EDA to Uncover Anomalous Fields
Exploring Numerical Variables
Exploring Multivariate Relationships
Selecting Interesting Subsets of the Data for Further Investigation
Binning
Summary
Exercises
Data Mining Tasks in Discovering Knowledge in Data
Statistical Approaches to Estimation and Prediction
Univariate Methods: Measures of Center and Spread
Statistical Inference
Confidence Interval Estimation
Bivariate Methods: Simple Linear Regression
Dangers of Extrapolation
Prediction Intervals for a Randomly Chosen Value of y Given x
Multiple Regression
Verifying Model Assumptions
Exercises
Supervised versus Unsupervised Methods
Methodology for Supervised Modeling
Bias–Variance Trade-Off
Classification Task
k-Nearest Neighbor Algorithm
Distance Function
Simple Unweighted Voting
Weighted Voting
Quantifying Attribute Relevance: Stretching the Axes
k-Nearest Neighbor Algorithm for Estimation and Prediction
Choosing k
Exercises
6 DECISION TREES
Classification and Regression Trees
C4.5 Algorithm
Decision Rules
Comparison of the C5.0 and CART Algorithms Applied to Real Data
Exercises
7 NEURAL NETWORKS
Input and Output Encoding
Simple Example of a Neural Network
Sigmoid Activation Function
Gradient Descent Method
Back-Propagation Rules
Example of Back-Propagation
Learning Rate
Momentum Term
Sensitivity Analysis
Application of Neural Network Modeling
Exercises
Clustering Task
Hierarchical Clustering Methods
Single-Linkage Clustering
Complete-Linkage Clustering
Example of k-Means Clustering at Work
Application of k-Means Clustering Using SAS Enterprise Miner
References
Exercises
Self-Organizing Maps
Kohonen Networks
Example of a Kohonen Network Study
Application of Clustering Using Kohonen Networks
Interpreting the Clusters
Cluster Profiles
Using Cluster Membership as Input to Downstream Data Mining Models
Exercises
Affinity Analysis and Market Basket Analysis
Data Representation for Market Basket Analysis
Support, Confidence, Frequent Itemsets, and the A Priori Property
How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets
How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules
Extension from Flag Data to General Categorical Data
J-Measure
Application of Generalized Rule Induction
When Not to Use Association Rules
Do Association Rules Represent Supervised or Unsupervised Learning?
Local Patterns versus Global Models
Exercises
11 MODEL EVALUATION TECHNIQUES
Model Evaluation Techniques for the Estimation and Prediction Tasks
Error Rate, False Positives, and False Negatives
Misclassification Cost Adjustment to Reflect Real-World Concerns
Decision Cost/Benefit Analysis
Lift Charts and Gains Charts
Interweaving Model Evaluation with Model Building
Confluence of Results: Applying a Suite of Models
Exercises
EPILOGUE: “WE’VE ONLY JUST BEGUN”
INDEX

📜 SIMILAR VOLUMES

📁 Discovering Knowledge in Data An Introduction to Data Mining

✍ Larose D 📂 Library 📅 2005 🏛 Wiley 🌐 English

📁 Discovering Knowledge in Data: An Introduction to Data Mining

✍ Daniel T. Larose 📂 Library 📅 2005 🏛 Wiley-Interscience 🌐 English

📁 Discovering Knowledge in Data: An Introduction to Data Mining

✍ Daniel T. Larose 📂 Library 📅 2004 🏛 Wiley-Interscience 🌐 English

There is a lot to like about this book, but it has some unfortunate flaws. Note that it is part of a Data Mining trilogy. The other two books are: Data Mining Methods and Models and Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. My initial reaction was more negative

📁 Discovering Knowledge in Data: An Introduction to Data Mining

✍ Daniel T. Larose 📂 Library 📅 2004 🏛 Wiley-Interscience 🌐 English

📁 Discovering Knowledge in Data: An Introduction to Data Mining

✍ Daniel T. Larose 📂 Library 📅 2004 🏛 Wiley-Interscience 🌐 English

📁 Discovering Knowledge in Data - An Introduction to Data Mining

✍ Larose D.T. 📂 Library 📅 2005 🏛 Wiley 🌐 English