Data Science with Python: Combine Python with machine learning principles to discover hidden patterns in raw data
Scribed by Chopra, Rohan; England, Aaron; Alaudeen, Mohamed Noordeen
- Publisher: Packt Publishing
- Year: 2019
- Tongue: English
- Leaves: 448
- Category: Library
Synopsis
Leverage the power of the Python data science libraries and advanced machine learning techniques to analyse large unstructured datasets and predict the occurrence of a particular future event.
Key Features
- Explore the depths of data science, from data collection through to visu...
Book Description
Data Science with Python begins by introducing you to data science and teaches you to install the packages you need to create a data science coding environment. You will learn the three major types of machine learning: unsupervised learning, supervised learning, and reinforcement learning. You will also explore basic classification and regression techniques, such as support vector machines, decision trees, and logistic regression.
As you make your way through the chapters, you will study the basic functions, data structures, and...
Table of Contents
Preface
About the Book
About the Authors
Learning Objectives
Audience
Approach
Minimum Hardware Requirements
Software Requirements
Installation and Setup
Using Kaggle for Faster Experimentation
Conventions
Installing the Code Bundle
Chapter 1
Introduction to Data Science and Data Pre-Processing
Introduction
Python Libraries
Roadmap for Building Machine Learning Models
Data Representation
Independent and Target Variables
Exercise 1: Loading a Sample Dataset and Creating the Feature Matrix and Target Matrix
Data Cleaning
Exercise 2: Removing Missing Data
Exercise 3: Imputing Missing Data
Exercise 4: Finding and Removing Outliers in Data
Data Integration
Exercise 5: Integrating Data
Data Transformation
Handling Categorical Data
Exercise 6: Simple Replacement of Categorical Data with a Number
Exercise 7: Converting Categorical Data to Numerical Data Using Label Encoding
Exercise 8: Converting Categorical Data to Numerical Data Using One-Hot Encoding
Data in Different Scales
Exercise 9: Implementing Scaling Using the Standard Scaler Method
Exercise 10: Implementing Scaling Using the MinMax Scaler Method
Data Discretization
Exercise 11: Discretization of Continuous Data
Train and Test Data
Exercise 12: Splitting Data into Train and Test Sets
Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Performance Metrics
Summary
Chapter 2
Data Visualization
Introduction
Functional Approach
Exercise 13: Functional Approach - Line Plot
Exercise 14: Functional Approach - Add a Second Line to the Line Plot
Activity 2: Line Plot
Exercise 15: Creating a Bar Plot
Activity 3: Bar Plot
Exercise 16: Functional Approach - Histogram
Exercise 17: Functional Approach - Box-and-Whisker Plot
Exercise 18: Scatterplot
Object-Oriented Approach Using Subplots
Exercise 19: Single Line Plot Using Subplots
Exercise 20: Multiple Line Plots Using Subplots
Activity 4: Multiple Plot Types Using Subplots
Summary
Chapter 3
Introduction to Machine Learning via Scikit-Learn
Introduction
Introduction to Linear and Logistic Regression
Simple Linear Regression
Exercise 21: Preparing Data for a Linear Regression Model
Exercise 22: Fitting a Simple Linear Regression Model and Determining the Intercept and Coefficient
Exercise 23: Generating Predictions and Evaluating the Performance of a Simple Linear Regression Model
Multiple Linear Regression
Exercise 24: Fitting a Multiple Linear Regression Model and Determining the Intercept and Coefficients
Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
Logistic Regression
Exercise 25: Fitting a Logistic Regression Model and Determining the Intercept and Coefficients
Exercise 26: Generating Predictions and Evaluating the Performance of a Logistic Regression Model
Exercise 27: Tuning the Hyperparameters of a Multiple Logistic Regression Model
Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
Max Margin Classification Using SVMs
Exercise 28: Preparing Data for the Support Vector Classifier (SVC) Model
Exercise 29: Tuning the SVC Model Using Grid Search
Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
Decision Trees
Activity 8: Preparing Data for a Decision Tree Classifier
Exercise 30: Tuning a Decision Tree Classifier Using Grid Search
Exercise 31: Programmatically Extracting Tuned Hyperparameters from a Decision Tree Classifier Grid Search Model
Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
Random Forests
Exercise 32: Preparing Data for a Random Forest Regressor
Activity 10: Tuning a Random Forest Regressor
Exercise 33: Programmatically Extracting Tuned Hyperparameters and Determining Feature Importance from a Random Forest Regressor Grid Search Model
Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
Summary
Chapter 4
Dimensionality Reduction and Unsupervised Learning
Introduction
Hierarchical Cluster Analysis (HCA)
Exercise 34: Building an HCA Model
Exercise 35: Plotting an HCA Model and Assigning Predictions
K-means Clustering
Exercise 36: Fitting a k-means Model and Assigning Predictions
Activity 12: Ensemble k-means Clustering and Calculating Predictions
Exercise 37: Calculating Mean Inertia by n_clusters
Exercise 38: Plotting Mean Inertia by n_clusters
Principal Component Analysis (PCA)
Exercise 39: Fitting a PCA Model
Exercise 40: Choosing n_components Using a Threshold of Explained Variance
Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
Exercise 41: Visual Comparison of Inertia by n_clusters
Supervised Data Compression using Linear Discriminant Analysis (LDA)
Exercise 42: Fitting an LDA Model
Exercise 43: Using LDA-Transformed Components in a Classification Model
Summary
Chapter 5
Mastering Structured Data
Introduction
Boosting Algorithms
Gradient Boosting Machine (GBM)
XGBoost (Extreme Gradient Boosting)
Exercise 44: Using the XGBoost Library to Perform Classification
XGBoost Library
Controlling Model Overfitting
Handling Imbalanced Datasets
Activity 14: Training and Predicting the Income of a Person
External Memory Usage
Cross-validation
Exercise 45: Using Cross-validation to Find the Best Hyperparameters
Saving and Loading a Model
Exercise 46: Creating a Python Script that Predicts Based on Real-time Input
Activity 15: Predicting the Loss of Customers
Neural Networks
What Is a Neural Network?
Optimization Algorithms
Hyperparameters
Keras
Exercise 47: Installing the Keras Library for Python and Using It to Perform Classification
Keras Library
Exercise 48: Predicting Avocado Price Using Neural Networks
Categorical Variables
One-hot Encoding
Entity Embedding
Exercise 49: Predicting Avocado Price Using Entity Embedding
Activity 16: Predicting a Customer's Purchase Amount
Summary
Chapter 6
Decoding Images
Introduction
Images
Exercise 50: Classify MNIST Using a Fully Connected Neural Network
Convolutional Neural Networks
Convolutional Layer
Pooling Layer
Adam Optimizer
Cross-entropy Loss
Exercise 51: Classify MNIST Using a CNN
Regularization
Dropout Layer
L1 and L2 Regularization
Batch Normalization
Exercise 52: Improving Image Classification with Regularization Using CIFAR-10 Images
Image Data Preprocessing
Normalization
Converting to Grayscale
Getting All Images to the Same Size
Other Useful Image Operations
Activity 17: Predict if an Image Is of a Cat or a Dog
Data Augmentation
Generators
Exercise 53: Classify CIFAR-10 Images with Image Augmentation
Activity 18: Identifying and Augmenting an Image
Summary
Chapter 7
Processing Human Language
Introduction
Text Data Processing
Regular Expressions
Exercise 54: Using RegEx for String Cleaning
Basic Feature Extraction
Text Preprocessing
Exercise 55: Preprocessing the IMDB Movie Review Dataset
Text Processing
Exercise 56: Creating Word Embeddings Using Gensim
Activity 19: Predicting Sentiments of Movie Reviews
Recurrent Neural Networks (RNNs)
LSTMs
Exercise 57: Performing Sentiment Analysis Using LSTM
Activity 20: Predicting Sentiments from Tweets
Summary
Chapter 8
Tips and Tricks of the Trade
Introduction
Transfer Learning
Transfer Learning for Image Data
Exercise 58: Using InceptionV3 to Compare and Classify Images
Activity 21: Classifying Images Using InceptionV3
Useful Tools and Tips
Train, Development, and Test Datasets
Working with Unprocessed Datasets
pandas Profiling
TensorBoard
AutoML
Exercise 59: Get a Well-Performing Network Using Auto-Keras
Model Visualization Using Keras
Activity 22: Using Transfer Learning to Predict Images
Summary
Appendix
Chapter 1: Introduction to Data Science and Data Preprocessing
Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
Chapter 2: Data Visualization
Activity 2: Line Plot
Activity 3: Bar Plot
Activity 4: Multiple Plot Types Using Subplots
Chapter 3: Introduction to Machine Learning via Scikit-Learn
Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
Activity 8: Preparing Data for a Decision Tree Classifier
Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
Activity 10: Tuning a Random Forest Regressor
Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
Chapter 4: Dimensionality Reduction and Unsupervised Learning
Activity 12: Ensemble k-means Clustering and Calculating Predictions
Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
Chapter 5: Mastering Structured Data
Activity 14: Training and Predicting the Income of a Person
Activity 15: Predicting the Loss of Customers
Activity 16: Predicting a Customer's Purchase Amount
Chapter 6: Decoding Images
Activity 17: Predict if an Image Is of a Cat or a Dog
Activity 18: Identifying and Augmenting an Image
Chapter 7: Processing Human Language
Activity 19: Predicting Sentiments of Movie Reviews
Activity 20: Predicting Sentiments from Tweets
Chapter 8: Tips and Tricks of the Trade
Activity 21: Classifying Images Using InceptionV3
Activity 22: Using Transfer Learning to Predict Images
Similar Volumes
Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data. Key Features: Learn how to select the most suitable Python library to solve your problem; compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them; delve...
Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library...