Exploratory Data Analysis with Python Cookbook: Over 50 recipes to analyze, visualize, and extract insights from structured and unstructured data

✍ Scribed by Ayodele Oluleye


Publisher
Packt Publishing
Tongue
English
Leaves
382
Category
Library

✦ Synopsis


Extract valuable insights from data by leveraging various analysis and visualization techniques with this comprehensive guide

Purchase of the print or Kindle book includes a free PDF eBook

Key Features

  • Gain practical experience in conducting EDA on a single variable of interest in Python
  • Learn the different techniques for analyzing and exploring tabular, time series, and textual data in Python
  • Get well versed in data visualization using leading Python libraries like Matplotlib and seaborn

Book Description

In today's data-centric world, the ability to extract meaningful insights from vast amounts of data has become a valuable skill across industries. Exploratory Data Analysis (EDA) lies at the heart of this process, enabling us to comprehend, visualize, and derive valuable insights from various forms of data.

This book is a comprehensive guide to Exploratory Data Analysis using the Python programming language. It provides practical steps needed to effectively explore, analyze, and visualize structured and unstructured data. It offers hands-on guidance and code for concepts such as generating summary statistics, analyzing single and multiple variables, visualizing data, analyzing text data, handling outliers, handling missing values and automating the EDA process. It is suited for data scientists, data analysts, researchers or curious learners looking to gain essential knowledge and practical steps for analyzing vast amounts of data to uncover insights.

Python is an open-source, general-purpose programming language that is widely used for data science and data analysis because of its simplicity and versatility. It offers several libraries that can be used to clean, analyze, and visualize data. In this book, we will explore popular Python libraries such as pandas, Matplotlib, and seaborn and provide workable code for analyzing data in Python using these libraries.
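
To give a flavor of the kind of analysis described above, here is a minimal sketch of generating summary statistics with pandas. It is not taken from the book: the DataFrame, its column names (`age`, `income`), and its values are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only (not from the book).
df = pd.DataFrame({
    "age": [23, 35, 31, 35, 52, 41],
    "income": [32000.0, 54000.0, None, 61000.0, 87000.0, 58000.0],
})

# Measures of central tendency for a single variable.
mean_age = df["age"].mean()
median_age = df["age"].median()
mode_age = df["age"].mode().iloc[0]  # mode() returns a Series; take the first value

# Measures of spread: range, quartiles, and interquartile range (IQR).
age_range = df["age"].max() - df["age"].min()
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1

# Number of missing values in each column.
missing = df.isna().sum()

print(f"mean={mean_age:.2f}, median={median_age}, mode={mode_age}")
print(f"range={age_range}, IQR={iqr}")
print(missing)
```

Calling `df.describe()` on the same DataFrame would report most of these statistics in one table, which is often a convenient first step before looking at variables one by one.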

By the end of this book, you will have gained comprehensive knowledge about EDA and mastered the powerful set of EDA techniques and tools required for analyzing both structured and unstructured data to derive valuable insights.

What you will learn

  • Perform EDA with leading Python data visualization libraries
  • Execute univariate, bivariate, and multivariate analyses on tabular data
  • Uncover patterns and relationships within time series data
  • Identify hidden patterns within textual data
  • Discover different techniques to prepare data for analysis
  • Overcome the challenge of outliers and missing values during data analysis
  • Leverage automated EDA for fast and efficient analysis

Who this book is for

Whether you are a data analyst, data scientist, researcher or a curious learner looking to analyze structured and unstructured data, this book will appeal to you. It aims to empower you with essential knowledge and practical skills for analyzing and visualizing data to uncover insights.

It covers several EDA concepts and provides hands-on instructions on how these can be applied using various Python libraries. Familiarity with basic statistical concepts and foundational knowledge of Python programming will help you understand the content better and maximize your learning experience.

Table of Contents

  1. Generating Summary Statistics
  2. Preparing Data for EDA
  3. Visualizing Data in Python
  4. Performing Univariate Analysis in Python
  5. Performing Bivariate Analysis in Python
  6. Performing Multivariate Analysis in Python
  7. Analyzing Time Series Data
  8. Analyzing Text Data
  9. Dealing with Outliers and Missing Values
  10. Performing Automated EDA in Python

✦ Table of Contents


Cover
Title Page
Copyright and Credits
Dedication
Contributors
Table of Contents
Preface
Chapter 1: Generating Summary Statistics
Technical requirements
Analyzing the mean of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Checking the median of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Identifying the mode of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Checking the variance of a dataset
Getting ready
How to do it…
How it works...
There’s more…
Identifying the standard deviation of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Generating the range of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Identifying the percentiles of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Checking the quartiles of a dataset
Getting ready
How to do it…
How it works...
There’s more...
Analyzing the interquartile range (IQR) of a dataset
Getting ready
How to do it…
How it works...
Chapter 2: Preparing Data for EDA
Technical requirements
Grouping data
Getting ready
How to do it…
How it works...
There’s more...
See also
Appending data
Getting ready
How to do it…
How it works...
There’s more...
Concatenating data
Getting ready
How to do it…
How it works...
There’s more...
See also
Merging data
Getting ready
How to do it…
How it works...
There’s more...
See also
Sorting data
Getting ready
How to do it…
How it works...
There’s more...
Categorizing data
Getting ready
How to do it…
How it works...
There’s more...
Removing duplicate data
Getting ready
How to do it…
How it works...
There’s more...
Dropping data rows and columns
Getting ready
How to do it…
How it works...
There’s more...
Replacing data
Getting ready
How to do it…
How it works...
There’s more...
See also
Changing a data format
Getting ready
How to do it…
How it works...
There’s more...
See also
Dealing with missing values
Getting ready
How to do it…
How it works...
There’s more...
See also
Chapter 3: Visualizing Data in Python
Technical requirements
Preparing for visualization
Getting ready
How to do it…
How it works...
There’s more...
Visualizing data in Matplotlib
Getting ready
How to do it…
How it works...
There’s more...
See also
Visualizing data in Seaborn
Getting ready
How to do it…
How it works...
There’s more...
See also
Visualizing data in GGPLOT
Getting ready
How to do it…
How it works...
There’s more...
See also
Visualizing data in Bokeh
Getting ready
How to do it…
How it works...
There's more...
See also
Chapter 4: Performing Univariate Analysis in Python
Technical requirements
Performing univariate analysis using a histogram
Getting ready
How to do it…
How it works...
Performing univariate analysis using a boxplot
Getting ready
How to do it…
How it works...
There’s more...
Performing univariate analysis using a violin plot
Getting ready
How to do it…
How it works...
Performing univariate analysis using a summary table
Getting ready
How to do it…
How it works...
There’s more...
Performing univariate analysis using a bar chart
Getting ready
How to do it…
How it works...
Performing univariate analysis using a pie chart
Getting ready
How to do it…
How it works...
Chapter 5: Performing Bivariate Analysis in Python
Technical requirements
Analyzing two variables using a scatter plot
Getting ready
How to do it…
How it works...
There’s more...
See also...
Creating a crosstab/two-way table on bivariate data
Getting ready
How to do it…
How it works...
Analyzing two variables using a pivot table
Getting ready
How to do it…
How it works...
There is more...
Generating pairplots on two variables
Getting ready
How to do it…
How it works...
Analyzing two variables using a bar chart
Getting ready
How to do it…
How it works...
There is more...
Generating box plots for two variables
Getting ready
How to do it…
How it works...
Creating histograms on two variables
Getting ready
How to do it…
How it works...
Analyzing two variables using a correlation analysis
Getting ready
How to do it…
How it works...
Chapter 6: Performing Multivariate Analysis in Python
Technical requirements
Implementing Cluster Analysis on multiple variables using Kmeans
Getting ready
How to do it…
How it works...
There is more...
See also...
Choosing the optimal number of clusters in Kmeans
Getting ready
How to do it…
How it works...
There is more...
See also...
Profiling Kmeans clusters
Getting ready
How to do it…
How it works...
There’s more...
Implementing principal component analysis on multiple variables
Getting ready
How to do it…
How it works...
There is more...
See also...
Choosing the number of principal components
Getting ready
How to do it…
How it works...
Analyzing principal components
Getting ready
How to do it…
How it works...
There’s more...
See also...
Implementing factor analysis on multiple variables
Getting ready
How to do it…
How it works...
There is more...
Determining the number of factors
Getting ready
How to do it…
How it works...
Analyzing the factors
Getting ready
How to do it…
How it works...
Chapter 7: Analyzing Time Series Data in Python
Technical requirements
Using line and boxplots to visualize time series data
Getting ready
How to do it…
How it works...
Spotting patterns in time series
Getting ready
How to do it…
How it works...
Performing time series data decomposition
Getting ready
How to do it…
How it works...
Performing smoothing – moving average
Getting ready
How to do it…
How it works…
See also...
Performing smoothing – exponential smoothing
Getting ready
How to do it…
How it works...
See also...
Performing stationarity checks on time series data
Getting ready
How to do it…
How it works...
See also…
Differencing time series data
Getting ready
How to do it…
How it works...
Getting ready
How to do it…
How it works...
See also...
Chapter 8: Analysing Text Data in Python
Technical requirements
Preparing text data
Getting ready
How to do it…
How it works...
There’s more…
See also…
Dealing with stop words
Getting ready
How to do it…
How it works...
There’s more…
Analyzing part of speech
Getting ready
How to do it…
How it works...
Performing stemming and lemmatization
Getting ready
How to do it…
How it works...
Analyzing ngrams
Getting ready
How to do it…
How it works...
Creating word clouds
Getting ready
How to do it…
How it works...
Checking term frequency
Getting ready
How to do it…
How it works...
There’s more…
See also
Checking sentiments
Getting ready
How to do it…
How it works...
There’s more…
See also
Performing Topic Modeling
Getting ready
How to do it…
How it works...
Choosing an optimal number of topics
Getting ready
How to do it…
How it works...
Chapter 9: Dealing with Outliers and Missing Values
Technical requirements
Identifying outliers
Getting ready
How to do it…
How it works...
Spotting univariate outliers
Getting ready
How to do it…
How it works...
Finding bivariate outliers
Getting ready
How to do it…
How it works...
Identifying multivariate outliers
Getting ready
How to do it…
How it works...
See also
Flooring and capping outliers
Getting ready
How to do it…
How it works...
Removing outliers
Getting ready
How to do it…
How it works...
Replacing outliers
Getting ready
How to do it…
How it works...
Identifying missing values
Getting ready
How to do it…
How it works...
Dropping missing values
Getting ready
How to do it…
How it works...
Replacing missing values
Getting ready
How to do it…
How it works...
Imputing missing values using machine learning models
Getting ready
How to do it…
How it works...
Chapter 10: Performing Automated Exploratory Data Analysis in Python
Technical requirements
Doing Automated EDA using pandas profiling
Getting ready
How to do it…
How it works...
See also…
Performing Automated EDA using dtale
Getting ready
How to do it…
How it works...
See also
Doing Automated EDA using AutoViz
Getting ready
How to do it…
How it works...
See also
Performing Automated EDA using Sweetviz
Getting ready
How to do it…
How it works...
See also
Implementing Automated EDA using custom functions
Getting ready
How to do it…
How it works...
There’s more…
Index
About Packt
Other Books You May Enjoy


📜 SIMILAR VOLUMES


Exploratory Data Analysis with Python Co
✍ Ayodele Oluleye 📂 Library 🏛 Packt Publishing 🌐 English

Extract valuable insights from data by leveraging various analysis and visualization techniques with this comprehensive guide Purchase of the print or Kindle book includes a free PDF eBook Key Features Gain practical expe

Exploratory Data Analysis with Python Co
✍ Ayodele Oluleye 📂 Library 📅 2023 🏛 Packt Publishing Pvt Ltd 🌐 English

Extract valuable insights from data by leveraging various analysis and visualization techniques with this comprehensive guide Purchase of the print or Kindle book includes a free PDF eBook Key Features Gain practical experience in conducting EDA on a single variable of interest in Python Lea

Time Series Analysis with Python Cookboo
✍ Tarek A. Atwan 📂 Library 📅 2022 🏛 Packt Publishing 🌐 English

Perform time series analysis and forecasting confidently with this Python code bank and reference manual Key Features Explore forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms

Time Series Analysis with Python Cookboo
✍ Tarek A Atwan 📂 Library 📅 2022 🏛 Packt Publishing 🌐 English

Perform time series analysis and forecasting confidently with this Python code bank and reference manual Key Features: Explore forecasting and anomaly detection techniques using statistical, machine learning, and deep

Become a Python Data Analyst: Perform ex
✍ Alvaro Fuentes 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to efficiently use the Jupyter Notebook to operate and manipu