Data Science Projects with Python: A case study approach to successful data science projects using Python, pandas, and scikit-learn

✍ Scribed by Stephen Klosterman

Year: 2019
Tongue: English
Leaves: 374
Category: Library

✦ Table of Contents

Cover
Front Matter
Copyright
Table of Contents
Preface
Chapter 1: Data Exploration and Cleaning
Introduction
Python and the Anaconda Package Management System
Indexing and the Slice Operator
Exercise 1: Examining Anaconda and Getting Familiar with Python
Different Types of Data Science Problems
Loading the Case Study Data with Jupyter and pandas
Exercise 2: Loading the Case Study Data in a Jupyter Notebook
Getting Familiar with Data and Performing Data Cleaning
The Business Problem
Data Exploration Steps
Exercise 3: Verifying Basic Data Integrity
Boolean Masks
Exercise 4: Continuing Verification of Data Integrity
Exercise 5: Exploring and Cleaning the Data
Data Quality Assurance and Exploration
Exercise 6: Exploring the Credit Limit and Demographic Features
Deep Dive: Categorical Features
Exercise 7: Implementing OHE for a Categorical Feature
Exploring the Financial History Features in the Dataset
Activity 1: Exploring Remaining Financial Features in the Dataset
Summary
Chapter 2: Introduction to Scikit-Learn and Model Evaluation
Introduction
Exploring the Response Variable and Concluding the Initial Exploration
Introduction to Scikit-Learn
Generating Synthetic Data
Data for a Linear Regression
Exercise 8: Linear Regression in Scikit-Learn
Model Performance Metrics for Binary Classification
Splitting the Data: Training and Testing Sets
Classification Accuracy
True Positive Rate, False Positive Rate, and Confusion Matrix
Exercise 9: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python
Discovering Predicted Probabilities: How Does Logistic Regression Make Predictions?
Exercise 10: Obtaining Predicted Probabilities from a Trained Logistic Regression Model
The Receiver Operating Characteristic (ROC) Curve
Precision
Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve
Summary
Chapter 3: Details of Logistic Regression and Feature Exploration
Introduction
Examining the Relationships between Features and the Response
Pearson Correlation
F-test
Exercise 11: F-test and Univariate Feature Selection
Finer Points of the F-test: Equivalence to t-test for Two Classes and Cautions
Hypotheses and Next Steps
Exercise 12: Visualizing the Relationship between Features and Response
Univariate Feature Selection: What It Does and Doesn't Do
Understanding Logistic Regression with Function Syntax in Python and the Sigmoid Function
Exercise 13: Plotting the Sigmoid Function
Scope of Functions
Why is Logistic Regression Considered a Linear Model?
Exercise 14: Examining the Appropriateness of Features for Logistic Regression
From Logistic Regression Coefficients to Predictions Using the Sigmoid
Exercise 15: Linear Decision Boundary of Logistic Regression
Activity 3: Fitting a Logistic Regression Model and Directly Using the Coefficients
Summary
Chapter 4: The Bias-Variance Trade-off
Introduction
Estimating the Coefficients and Intercepts of Logistic Regression
Gradient Descent to Find Optimal Parameter Values
Exercise 16: Using Gradient Descent to Minimize a Cost Function
Assumptions of Logistic Regression
The Motivation for Regularization: The Bias-Variance Trade-off
Exercise 17: Generating and Modeling Synthetic Classification Data
Lasso (L1) and Ridge (L2) Regularization
Cross Validation: Choosing the Regularization Parameter and Other Hyperparameters
Exercise 18: Reducing Overfitting on the Synthetic Data Classification Problem
Options for Logistic Regression in Scikit-Learn
Scaling Data, Pipelines, and Interaction Features in Scikit-Learn
Activity 4: Cross-Validation and Feature Engineering with the Case Study Data
Summary
Chapter 5: Decision Trees and Random Forests
Introduction
Decision Trees
The Terminology of Decision Trees and Connections to Machine Learning
Exercise 19: A Decision Tree in scikit-learn
Training Decision Trees: Node Impurity
Features Used for the First Splits: Connections to Univariate Feature Selection and Interactions
Training Decision Trees: A Greedy Algorithm
Training Decision Trees: Different Stopping Criteria
Using Decision Trees: Advantages and Predicted Probabilities
A More Convenient Approach to Cross-Validation
Exercise 20: Finding Optimal Hyperparameters for a Decision Tree
Random Forests: Ensembles of Decision Trees
Random Forest: Predictions and Interpretability
Exercise 21: Fitting a Random Forest
Checkerboard Graph
Activity 5: Cross-Validation Grid Search with Random Forest
Summary
Chapter 6: Imputation of Missing Data, Financial Analysis, and Delivery to Client
Introduction
Review of Modeling Results
Dealing with Missing Data: Imputation Strategies
Preparing Samples with Missing Data
Exercise 22: Cleaning the Dataset
Exercise 23: Mode and Random Imputation of PAY_1
A Predictive Model for PAY_1
Exercise 24: Building a Multiclass Classification Model for Imputation
Using the Imputation Model and Comparing it to Other Methods
Confirming Model Performance on the Unseen Test Set
Financial Analysis
Financial Conversation with the Client
Exercise 25: Characterizing Costs and Savings
Activity 6: Deriving Financial Insights
Final Thoughts on Delivering the Predictive Model to the Client
Summary
Appendix
Index

📜 SIMILAR VOLUMES

📁 Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning

✍ Stephen Klosterman 📂 Library 📅 2021 🏛 Packt Publishing 🌐 English

Gain hands-on experience in Python programming with industry-standard machine learning tools using pandas, scikit-learn, and XGBoost Key Features • Think critically about data by exploring and cleaning it • Choose an appropriate machine learning model and train it on your data • Communicate da

📁 Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning, 2nd Edition

✍ Stephen Klosterman 📂 Library 📅 2021 🏛 Packt Publishing 🌐 English

Gain hands-on experience in Python programming with industry-standard machine learning tools using pandas, scikit-learn, and XGBoost Key Features • Think critically about data by exploring and cleaning it • Choose an appropriate machine lear

📁 Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning, 2nd Edition

✍ Stephen Klosterman 📂 Library 🏛 Packt Publishing 🌐 English

Gain hands-on experience of Python programming with industry-standard machine learning techniques using pandas, scikit-learn, and XGBoost Key Features • Think critically about data and use it to form and test a hypothesis

📁 Data Science Projects with Python

✍ Stephen Klosterman 📂 Library 📅 2021 🏛 Packt Publishing Pvt. Ltd. 🌐 English

📁 Data Analysis From Scratch With Python: Beginner Guide using Python, Pandas, NumPy, Scikit-Learn, IPython, TensorFlow and Matplotlib

✍ Peters Morgan 📂 Library 📅 2018 🏛 AI Sciences LLC 🌐 English

Are you thinking of becoming a data analyst using Python? If you are looking for a complete guide to data analysis using Python language and its library that will help you to become

📁 Python for Data Analysis: Data Wrangling with Pandas, Numpy, and Ipython

✍ Wes McKinney 📂 Library 📅 2017 🏛 O'Reilly Media 🌐 English

Looking for complete instructions on manipulating, processing, cleaning, and crunching structured data in Python? The second edition of this hands-on guide, updated for Python 3.5 and Pandas 1.0, is packed with practical case studies that show you how to effectively solve a broad set of data analys