
Data Science with Python: Combine Python with machine learning principles to discover hidden patterns in raw data

Scribed by Chopra, Rohan; England, Aaron; Alaudeen, Mohamed Noordeen


Publisher: Packt Publishing
Year: 2019
Tongue: English
Leaves: 448
Category: Library


Synopsis


Leverage the power of the Python data science libraries and advanced machine learning techniques to analyse large unstructured datasets and predict the occurrence of a particular future event.

Key Features

  • Explore the depths of data science, from data collection through to visualization
  • Learn pandas, scikit-learn, and Matplotlib in detail
  • Study various data science algorithms using real-world datasets

    Book Description

    Data Science with Python begins by introducing you to data science and teaches you to install the packages you need to create a data science coding environment. You will learn three major techniques in machine learning: unsupervised learning, supervised learning, and reinforcement learning. You will also explore basic classification and regression techniques, such as support vector machines, decision trees, and logistic regression.

    As you make your way through chapters, you will study the basic functions, data structures, and...

    Table of Contents


    Preface
    About the Book
    About the Authors
    Learning Objectives
    Audience
    Approach
    Minimum Hardware Requirements
    Software Requirements
    Installation and Setup
    Using Kaggle for Faster Experimentation
    Conventions
    Installing the Code Bundle
    Chapter 1
    Introduction to Data Science and Data Pre-Processing
    Introduction
    Python Libraries
    Roadmap for Building Machine Learning Models
    Data Representation
    Independent and Target Variables
    Exercise 1: Loading a Sample Dataset and Creating the Feature Matrix and Target Matrix
    Data Cleaning
    Exercise 2: Removing Missing Data
    Exercise 3: Imputing Missing Data
    Exercise 4: Finding and Removing Outliers in Data
    Data Integration
    Exercise 5: Integrating Data
    Data Transformation
    Handling Categorical Data
    Exercise 6: Simple Replacement of Categorical Data with a Number
    Exercise 7: Converting Categorical Data to Numerical Data Using Label Encoding
    Exercise 8: Converting Categorical Data to Numerical Data Using One-Hot Encoding
    Data in Different Scales
    Exercise 9: Implementing Scaling Using the Standard Scaler Method
    Exercise 10: Implementing Scaling Using the MinMax Scaler Method
    Data Discretization
    Exercise 11: Discretization of Continuous Data
    Train and Test Data
    Exercise 12: Splitting Data into Train and Test Sets
    Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
    Supervised Learning
    Unsupervised Learning
    Reinforcement Learning
    Performance Metrics
    Summary
    Chapter 2
    Data Visualization
    Introduction
    Functional Approach
    Exercise 13: Functional Approach – Line Plot
    Exercise 14: Functional Approach – Add a Second Line to the Line Plot
    Activity 2: Line Plot
    Exercise 15: Creating a Bar Plot
    Activity 3: Bar Plot
    Exercise 16: Functional Approach – Histogram
    Exercise 17: Functional Approach – Box-and-Whisker plot
    Exercise 18: Scatterplot
    Object-Oriented Approach Using Subplots
    Exercise 19: Single Line Plot using Subplots
    Exercise 20: Multiple Line Plots Using Subplots
    Activity 4: Multiple Plot Types Using Subplots
    Summary
    Chapter 3
    Introduction to Machine Learning via Scikit-Learn
    Introduction
    Introduction to Linear and Logistic Regression
    Simple Linear Regression
    Exercise 21: Preparing Data for a Linear Regression Model
    Exercise 22: Fitting a Simple Linear Regression Model and Determining the Intercept and Coefficient
    Exercise 23: Generating Predictions and Evaluating the Performance of a Simple Linear Regression Model
    Multiple Linear Regression
    Exercise 24: Fitting a Multiple Linear Regression Model and Determining the Intercept and Coefficients
    Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
    Logistic Regression
    Exercise 25: Fitting a Logistic Regression Model and Determining the Intercept and Coefficients
    Exercise 26: Generating Predictions and Evaluating the Performance of a Logistic Regression Model
    Exercise 27: Tuning the Hyperparameters of a Multiple Logistic Regression Model
    Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
    Max Margin Classification Using SVMs
    Exercise 28: Preparing Data for the Support Vector Classifier (SVC) Model
    Exercise 29: Tuning the SVC Model Using Grid Search
    Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
    Decision Trees
    Activity 8: Preparing Data for a Decision Tree Classifier
    Exercise 30: Tuning a Decision Tree Classifier Using Grid Search
    Exercise 31: Programmatically Extracting Tuned Hyperparameters from a Decision Tree Classifier Grid Search Model
    Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
    Random Forests
    Exercise 32: Preparing Data for a Random Forest Regressor
    Activity 10: Tuning a Random Forest Regressor
    Exercise 33: Programmatically Extracting Tuned Hyperparameters and Determining Feature Importance from a Random Forest Regressor Grid Search Model
    Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
    Summary
    Chapter 4
    Dimensionality Reduction and Unsupervised Learning
    Introduction
    Hierarchical Cluster Analysis (HCA)
    Exercise 34: Building an HCA Model
    Exercise 35: Plotting an HCA Model and Assigning Predictions
    K-means Clustering
    Exercise 36: Fitting k-means Model and Assigning Predictions
    Activity 12: Ensemble k-means Clustering and Calculating Predictions
    Exercise 37: Calculating Mean Inertia by n_clusters
    Exercise 38: Plotting Mean Inertia by n_clusters
    Principal Component Analysis (PCA)
    Exercise 39: Fitting a PCA Model
    Exercise 40: Choosing n_components using Threshold of Explained Variance
    Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
    Exercise 41: Visual Comparison of Inertia by n_clusters
    Supervised Data Compression using Linear Discriminant Analysis (LDA)
    Exercise 42: Fitting LDA Model
    Exercise 43: Using LDA Transformed Components in Classification Model
    Summary
    Chapter 5
    Mastering Structured Data
    Introduction
    Boosting Algorithms
    Gradient Boosting Machine (GBM)
    XGBoost (Extreme Gradient Boosting)
    Exercise 44: Using the XGBoost library to Perform Classification
    XGBoost Library
    Controlling Model Overfitting
    Handling Imbalanced Datasets
    Activity 14: Training and Predicting the Income of a Person
    External Memory Usage
    Cross-validation
    Exercise 45: Using Cross-validation to Find the Best Hyperparameters
    Saving and Loading a Model
    Exercise 46: Creating a Python Script that Predicts Based on Real-time Input
    Activity 15: Predicting the Loss of Customers
    Neural Networks
    What Is a Neural Network?
    Optimization Algorithms
    Hyperparameters
    Keras
    Exercise 47: Installing the Keras library for Python and Using it to Perform Classification
    Keras Library
    Exercise 48: Predicting Avocado Price Using Neural Networks
    Categorical Variables
    One-hot Encoding
    Entity Embedding
    Exercise 49: Predicting Avocado Price Using Entity Embedding
    Activity 16: Predicting a Customer's Purchase Amount
    Summary
    Chapter 6
    Decoding Images
    Introduction
    Images
    Exercise 50: Classify MNIST Using a Fully Connected Neural Network
    Convolutional Neural Networks
    Convolutional Layer
    Pooling Layer
    Adam Optimizer
    Cross-entropy Loss
    Exercise 51: Classify MNIST Using a CNN
    Regularization
    Dropout Layer
    L1 and L2 Regularization
    Batch Normalization
    Exercise 52: Improving Image Classification Using Regularization Using CIFAR-10 images
    Image Data Preprocessing
    Normalization
    Converting to Grayscale
    Getting All Images to the Same Size
    Other Useful Image Operations
    Activity 17: Predict if an Image Is of a Cat or a Dog
    Data Augmentation
    Generators
    Exercise 53: Classify CIFAR-10 Images with Image Augmentation
    Activity 18: Identifying and Augmenting an Image
    Summary
    Chapter 7
    Processing Human Language
    Introduction
    Text Data Processing
    Regular Expressions
    Exercise 54: Using RegEx for String Cleaning
    Basic Feature Extraction
    Text Preprocessing
    Exercise 55: Preprocessing the IMDB Movie Review Dataset
    Text Processing
    Exercise 56: Creating Word Embeddings Using Gensim
    Activity 19: Predicting Sentiments of Movie Reviews
    Recurrent Neural Networks (RNNs)
    LSTMs
    Exercise 57: Performing Sentiment Analysis Using LSTM
    Activity 20: Predicting Sentiments from Tweets
    Summary
    Chapter 8
    Tips and Tricks of the Trade
    Introduction
    Transfer Learning
    Transfer Learning for Image Data
    Exercise 58: Using InceptionV3 to Compare and Classify Images
    Activity 21: Classifying Images using InceptionV3
    Useful Tools and Tips
    Train, Development, and Test Datasets
    Working with Unprocessed Datasets
    pandas Profiling
    TensorBoard
    AutoML
    Exercise 59: Get a Well-Performing Network Using Auto-Keras
    Model Visualization Using Keras
    Activity 22: Using Transfer Learning to Predict Images
    Summary
    Appendix
    Chapter 1: Introduction to Data Science and Data Preprocessing
    Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
    Chapter 2: Data Visualization
    Activity 2: Line Plot
    Activity 3: Bar Plot
    Activity 4: Multiple Plot Types Using Subplots
    Chapter 3: Introduction to Machine Learning via Scikit-Learn
    Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
    Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
    Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
    Activity 8: Preparing Data for a Decision Tree Classifier
    Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
    Activity 10: Tuning a Random Forest Regressor
    Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
    Chapter 4: Dimensionality Reduction and Unsupervised Learning
    Activity 12: Ensemble k-means Clustering and Calculating Predictions
    Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
    Chapter 5: Mastering Structured Data
    Activity 14: Training and Predicting the Income of a Person
    Activity 15: Predicting the Loss of Customers
    Activity 16: Predicting a Customer's Purchase Amount
    Chapter 6: Decoding Images
    Activity 17: Predict if an Image Is of a Cat or a Dog
    Activity 18: Identifying and Augmenting an Image
    Chapter 7: Processing Human Language
    Activity 19: Predicting Sentiments of Movie Reviews
    Activity 20: Predicting Sentiments from Tweets
    Chapter 8: Tips and Tricks of the Trade
    Activity 21: Classifying Images using InceptionV3
    Activity 22: Using Transfer Learning to Predict Images


    SIMILAR VOLUMES


    Data Science with Python: Combine Python
    Chopra, Rohan; England, Aaron; Alaudeen, Mohamed Noordeen | Library | 2019 | Packt Publishing | English

    Leverage the power of the Python data science libraries and advanced machine learning techniques to analyse large unstructured datasets and predict the occurrence of a particular future event. Key Features: Explore the depths of data science, from data collection through to visu

    Applied Unsupervised Learning with Pytho
    Safari, an O'Reilly Media Company.; Johnston, Benjamin; Jones, Aaron; Kruger, Ch | Library | 2019 | Packt Publishing | English

    Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data. Key Features: Learn how to select the most suitable Python library to solve your problem. Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them. Delve

    Scaling Python with Dask: From Data Scie
    Holden Karau, Mika Kimmins | Library | O'Reilly Media | English

    Modern systems contain multicore CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source l

    Scaling Python with Dask: From Data Scie
    Holden Karau, Mika Kimmins | Library | 2023 | O'Reilly Media | English

    Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source
