Modern Data Mining with Python
✍ Authors: Dushyant Singh Sengar; Vikash Chandra
- Publisher: BPB Publications
- Year: 2024
- Language: English
- Pages: 438
- Category: Library
✦ Synopsis
"Modern Data Mining with Python" is a guidebook for responsibly implementing data mining techniques that involve collecting, storing, and analyzing large amounts of structured and unstructured data to extract useful insights and patterns.
Enter the world of data mining and machine learning. Use insights from various data sources, from social media to credit card transactions. Master statistical tools and explore data trends and patterns. Understand decision trees and artificial neural networks (ANNs). Manage high-dimensional data with dimensionality reduction. Explore binary classification with logistic regression. Spot concealed patterns with unsupervised learning. Analyze text with recurrent neural networks (RNNs) and images with convolutional neural networks (CNNs). Ensure model compliance with regulatory standards.
After reading this book, readers will be equipped with the skills and knowledge necessary to use Python for data mining and analysis in an industry...
✦ Table of Contents
Cover
Title Page
Copyright Page
Dedication Page
Foreword
About the Authors
About the Reviewers
Acknowledgement
Preface
Table of Contents
1. Understanding Data Mining in a Nutshell
Introduction
Structure
Objectives
What defines modern data mining
The lifecycle: Data to insights consumption
Understanding pattern recognition
Significance of the human learning process
The human learning process and mental models
Data: The key ingredient for meaningful patterns and relationships
How machines leverage data to build models
Machine learning process
Two dominant strategies: Classification and regression
Biases and learning shortfalls
Measuring learning accuracy and balancing trade-offs
Can data size and sample impact learning
How do humans benefit from data and learning
Modern-day data mining challenges and possible remediation
Conclusion
Points to remember
2. Basic Statistics and Exploratory Data Analysis
Introduction
Structure
Objectives
Setting up Python 3.x
Data mining and statistics
Statistics: Foundation, key terms, needs, and types
Descriptive statistics
Graphical and non-graphical exploratory data analysis
Non-graphical and graphical representation of univariate data
Non-graphical representation of multivariate data
Graphical representation of multivariate data
Probability theory
Probability distribution
Inferential statistics
Hypothesis testing with commonly used statistical tests
Introduction to Time Series Data
Exploratory data analysis: HMDA case study
Conclusion
Points to remember
3. Digging into Linear Regression
Introduction
Structure
Objectives
Linear regression
Background
Under the hood
Challenges and assumptions including multi-collinearity
Detailed EDA
Dataset description
Missing value treatment
Outlier analysis
Correlation
Checking on the assumptions of linear regression
Feature selection
Regression execution and results
Regression result interpretation
Optimization algorithm
Gradient descent
Regularization
Lasso regression
Ridge regression
Elastic-Net regression
MLflow introduction: Need and implementation
MLflow experiment tracking
Case study
Conclusion
Points to remember
4. Exploring Logistic Regression
Introduction
Structure
Objectives
Logistic regression
Background
Under the hood
Data
Estimating probabilities
Loss function
Challenges and assumptions
Logistic regression result and interpretation
Model interpretability and explainability
Performance metrics
Model generalization
K-fold cross-validation
Ensemble learning
Model lifecycle processes
Model development process
Case study: Loan repayment likelihood prediction
Conclusion
Points to remember
5. Decision Trees with Bagging and Boosting
Introduction
Structure
Objectives
Decision trees
Background
Under the hood
Data
Model
Loss function
Challenges and assumptions
Decision tree result and interpretation
Ensembling: Bagging, boosting, and stacking
Random forest
Gradient boosting
Ensembling using the stacking method
Conclusion
Points to remember
6. Support Vector Machines and K-Nearest Neighbors
Introduction
Structure
Objectives
Classification algorithms with a twist
Background
Under the hood
Data
Model
Loss function: Achieving optimal algorithmic results
Challenges and assumptions
Case study: Predicting customer propensity to subscribe to a term deposit
Conclusion
Points to remember
7. Putting Dimensionality Reduction into Action
Introduction
Structure
Objectives
Dimensionality reduction
Background
Under the dimensionality reduction hood
Data
Model: Reducing dimensions and variance
Principal component analysis
Linear discriminant analysis
t-distributed Stochastic Neighbor Embedding
Loss: Measuring Variance Reduction
Challenges and assumptions
Case study: Predicting loan repayment propensity using logistic regression, PCA, and LDA
PCA parameters and interpretation
LDA parameters and interpretation
Logistic regression
Conclusion
Further reading
Points to remember
8. Beginning with Unsupervised Models
Introduction
Structure
Objectives
Unsupervised learning
Background
Unsupervised learning techniques
Data
Model: Building meaningful clusters and profiling them
K-means clustering
Density-based spatial clustering of applications with noise
Hierarchical clustering
Loss: Efficiently achieving the optimal number of clusters
Challenges and assumptions
Case study: Bank customer portfolio segmentation
Advanced unsupervised learning: A primer
Conclusion
Points to remember
9. Structured Data Classification using Artificial Neural Networks
Introduction
Structure
Objectives
Artificial neural network
Background
Under the hood of neural networks
Data
Model
Loss function: Achieving optimal results
Back-propagation and regularization
Challenges and assumptions
Case study: Explainable and Interpretable ANN Model
Interpretable and explainable AI using SHAP and PiML
Conclusion
Points to remember
10. Language Modeling with Recurrent Neural Networks
Introduction
Structure
Objectives
Language modeling
Background
Under the hood of language modeling
Data: From spoken languages to modeling datasets
Model: The language with context
Recurrent neural network
Long short-term memory
Loss: Quest for the best model
Challenges and assumptions related to text data and model
Case study: Customer complaint classification explained with LIME
Rise of transformers: A primer on BERT and GPT
Conclusion
Further reading
Points to remember
11. Image Processing with Convolutional Neural Networks
Introduction
Structure
Objectives
Deep learning for computer vision tasks
Background
Under the hood of CNN models
Data
Model
Loss: How to achieve optimal results
Challenges and assumptions
The race for the best model and transfer learning: A primer
Case study: PDF document parser
Conclusion
Further reading
Points to remember
12. Understanding Model Risk Management for Data Mining Models
Introduction
Structure
Objectives
Data mining challenges and risks
Why do model risks occur
Introduction to Model Risk Management
Key regulatory frameworks
Pillars of Model Risk Management
Introduction to Model Operations
ModelOps: Product first vs. model first mindset
How ModelOps facilitates MRM
Case study: Regulatory requirement fulfillment using MRM and ModelOps
Conclusion
Points to remember
13. Adopting ModelOps to Manage Model Risk
Introduction
Structure
Objectives
Model risk management for fair banking
Background
Case study: Fair lending model lifecycle implementation - concept to inference
Fair lending model lifecycle
Data
Model Operations tools primer
Architecting the model lifecycle using ModelOps
Fair Lending Risk Assessment: The application
Challenges and assumptions
Future of AI and its practitioners
Conclusion
Further reading
Points to remember
Index