Python Data Cleaning Cookbook - Second Edition (Early Access)

✍ Scribed by Michael Walker


Publisher: Packt
Year: 2024
Tongue: English
Leaves: 242
Edition: 2
Category: Library


✦ Synopsis


Learn the intricacies of data description, issue identification, and practical problem-solving, armed with essential techniques and expert tips.

Key Features
Get to grips with various data cleaning techniques to reveal key insights.
Manipulate data of varying complexity to shape it into the form your business needs.
Clean, monitor, and validate large data volumes to diagnose problems using cutting-edge methodologies, including machine learning and AI.
Book Description
Jumping into data analysis without proper data cleaning will certainly lead to incorrect results. The Python Data Cleaning Cookbook will show you tools and techniques for cleaning and handling data with Python for better outcomes. You will begin by getting familiar with the shape of data by using practices that can be deployed routinely with most data sources.

Fully updated to the latest version of Python and all relevant tools, this book will teach you how to manipulate data to get it into a useful form. The current edition emphasizes advanced techniques, such as machine learning and AI-specific approaches to data cleaning, alongside the conventional ones. You will learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, and how to operate on data to address the issues you've identified. Next, you'll cover recipes for using supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and for generating visualizations for exploratory data analysis (EDA) that surface unexpected values. Finally, you'll build functions and classes that you can reuse without modification when you have new data.

By the end of this data cleaning book, you'll know how to clean data and diagnose problems within it.

What you will learn
Find out how to read and analyze data from a variety of sources
Produce summaries of the attributes of datasets, columns, and rows
Filter data and select columns of interest that satisfy given criteria
Address messy data issues, including working with dates and missing values
Improve your productivity in Python pandas by using method chaining
Use visualizations to gain additional insights and identify potential data issues
Enhance your ability to learn what is going on in your data
Build user-defined functions and classes to automate data cleaning
Who this book is for
This book is for anyone looking for ways to handle messy, duplicate, and poor data using different Python tools and techniques. The book takes a recipe-based approach to help you to learn how to clean and manage data with practical examples.
Working knowledge of Python programming is all you need to get the most out of the book.

✦ Table of Contents


Python Data Cleaning Cookbook, Second Edition: Detect and remove dirty data and extract key insights with pandas, machine learning and ChatGPT, Spark, and more
1 Anticipating Data Cleaning Issues when Importing Tabular Data into Pandas
Join our book community on Discord
Importing CSV files
Getting ready
How to do it…
How it works...
There’s more...
See also
Importing Excel files
Getting ready
How to do it…
How it works…
There’s more…
See also
Importing data from SQL databases
Getting ready
How to do it...
How it works…
There’s more…
See also
Importing SPSS, Stata, and SAS data
Getting ready
How to do it...
How it works...
There’s more…
See also
Importing R data
Getting ready
How to do it…
How it works…
There’s more…
See also
Persisting tabular data
Getting ready
How to do it…
How it works...
There’s more...
2 Anticipating Data Cleaning Issues when Working with HTML, JSON, and Spark Data
Join our book community on Discord
Importing simple JSON data
Getting ready…
How to do it…
How it works…
There’s more…
Importing more complicated JSON data from an API
Getting ready…
How to do it...
How it works…
There’s more…
See also
Importing data from web pages
Getting ready...
How to do it…
How it works…
There’s more…
Working with Spark data
Getting ready...
How it works...
There’s more...
Persisting JSON data
Getting ready…
How to do it...
How it works…
There’s more…
3 Taking the Measure of Your Data
Join our book community on Discord
Getting a first look at your data
Getting ready…
How to do it...
How it works…
There’s more...
See also
Selecting and organizing columns
Getting ready…
How to do it…
How it works…
There’s more…
See also
Selecting rows
Getting ready...
How to do it...
How it works…
There’s more…
See also
Generating frequencies for categorical variables
Getting ready…
How to do it…
How it works…
There’s more…
Generating summary statistics for continuous variables
Getting ready…
How to do it…
How it works…
See also
Using generative AI to view our data
Getting ready…
How to do it…
How it works…
See also
4 Identifying Missing Values and Outliers in Subsets of Data
Join our book community on Discord
Finding missing values
Getting ready
How to do it…
How it works...
See also
Identifying outliers with one variable
Getting ready
How to do it...
How it works…
There’s more…
See also
Identifying outliers and unexpected values in bivariate relationships
Getting ready
How to do it...
How it works…
There’s more…
See also
Using subsetting to examine logical inconsistencies in variable relationships
Getting ready
How to do it…
How it works…
See also
Using linear regression to identify data points with significant influence
Getting ready
How to do it…
How it works...
There’s more…
Using K-nearest neighbor to find outliers
Getting ready
How to do it…
How it works...
There’s more...
See also
Using Isolation Forest to find anomalies
Getting ready
How to do it...
How it works…
There’s more…
See also
5 Using Visualizations for the Identification of Unexpected Values
Join our book community on Discord
Using histograms to examine the distribution of continuous variables
Getting ready
How to do it…
How it works…
There’s more...
Using boxplots to identify outliers for continuous variables
Getting ready
How to do it…
How it works...
There’s more...
See also
Using grouped boxplots to uncover unexpected values in a particular group
Getting ready
How to do it...
How it works...
There’s more…
See also
Examining both distribution shape and outliers with violin plots
Getting ready
How to do it…
How it works…
There’s more…
See also
Using scatter plots to view bivariate relationships
Getting ready
How to do it...
How it works…
There’s more...
See also
Using line plots to examine trends in continuous variables
Getting ready
How to do it…
How it works...
There’s more…
See also
Generating a heat map based on a correlation matrix
Getting ready
How to do it…
How it works…
There’s more…
See also


📜 SIMILAR VOLUMES


Python Data Visualization Cookbook Secon
✍ Igor Milovanovic; Dimitry Foures; Giuseppe Vettigli 📂 Library 📅 2015 🌐 English

Over 70 recipes to get you started with popular Python libraries based on the principal concepts of data visualization. About This Book: • Learn how to set up an optimal Python environment for data visualization • Understand how to import, clean and organize your data • Determine different approaches to d

Python GUI Programming Cookbook - Second
✍ Burkhard A. Meier 📂 Library 📅 2017 🏛 Packt Publishing 🌐 English

Python is a multi-domain, interpreted programming language. It is a widely used general-purpose, high-level programming language. It is often used as a scripting language because of its forgiving syntax and compatibility with a wide variety of different eco-systems. Its flexible syntax enables devel

Python GUI Programming Cookbook - Second
✍ Burkhard A. Meier 📂 Library 📅 2017 🌐 English

Over 80 object-oriented recipes to help you create amazing GUIs in Python. About This Book: * Based on the latest version of Python, 3.6 * Carefully organized instructions to solve problems efficiently * Solutions that can be applied to solve real-world problems. Who This Book Is For: This book is for interme

Python Data Science Essentials - Second
✍ Alberto Boschetti, Luca Massaron 📂 Library 📅 2016 🏛 Packt Publishing - ebooks Account 🌐 English

Key Features: • Quickly get familiar with data science using Python 3.5 • Save time (and effort) with all the essential tools explained • Create effective data science projects and avoid common pitfalls with the help of examples and hints dictated by experience