𝔖 Scriptorium
✦   LIBER   ✦

Data Wrangling Using Pandas, SQL, and Java

✍ Scribed by Oswald Campesato


Publisher: Mercury Learning and Information
Year: 2022
Tongue: English
Leaves: 275
Category: Library

✦ Synopsis


This book is intended primarily for those who plan to become data scientists, as well as anyone who needs to perform data cleaning tasks. It covers a variety of NumPy and Pandas features and shows how to create databases and tables in MySQL. Chapter 7 covers many data wrangling tasks using Python scripts and awk-based shell scripts. Companion files with code are available for download from the publisher.

Features:

  • Provides the reader with basic Python 3, Java, and Pandas programming concepts, and an introduction to awk
  • Includes a chapter on RDBMS and SQL
  • Companion files with code

✦ Table of Contents


Cover
Title Page
Copyright
Dedication
Contents
Preface
Chapter 1: Introduction to Python
Tools for Python
easy_install and pip
virtualenv
IPython
Python Installation
Setting the PATH Environment Variable (Windows Only)
Launching Python on Your Machine
The Python Interactive Interpreter
Python Identifiers
Lines, Indentation, and Multi-Lines
Quotation and Comments
Saving Your Code in a Module
Some Standard Modules
The help() and dir() Functions
Compile Time and Runtime Code Checking
Simple Data Types
Working with Numbers
Working with Other Bases
The chr() Function
The round() Function in Python
Formatting Numbers in Python
Working with Fractions
Unicode and UTF-8
Working with Unicode
Working with Strings
Comparing Strings
Formatting Strings in Python
Uninitialized Variables and the Value None
Slicing and Splicing Strings
Testing for Digits and Alphabetic Characters
Search and Replace a String in Other Strings
Remove Leading and Trailing Characters
Printing Text Without NewLine Characters
Text Alignment
Working with Dates
Converting Strings to Dates
Exception Handling
Handling User Input
Command-Line Arguments
Summary
Chapter 2: Working with Data
Dealing with Data: What Can Go Wrong?
What is Data Drift?
What are Datasets?
Data Preprocessing
Data Types
Preparing Datasets
Discrete Data vs. Continuous Data
“Binning” Continuous Data
Scaling Numeric Data via Normalization
Scaling Numeric Data via Standardization
Scaling Numeric Data via Robust Standardization
What to Look for in Categorical Data
Mapping Categorical Data to Numeric Values
Working with Dates
Working with Currency
Working with Outliers and Anomalies
Outlier Detection/Removal
Finding Outliers with NumPy
Finding Outliers with Pandas
Calculating Z-Scores to Find Outliers
Finding Outliers with SkLearn (Optional)
Working with Missing Data
Imputing Values: When is Zero a Valid Value?
Dealing with Imbalanced Datasets
What is SMOTE?
SMOTE Extensions
The Bias-Variance Tradeoff
Types of Bias in Data
Analyzing Classifiers (Optional)
What is LIME?
What is ANOVA?
Summary
Chapter 3: Introduction to Pandas
What is Pandas?
Pandas Data Frames
Data Frames and Data Cleaning Tasks
A Pandas Data Frame Example
Describing a Pandas Data Frame
Pandas Boolean Data Frames
Transposing a Pandas Data Frame
Pandas Data Frames and Random Numbers
Converting Categorical Data to Numeric Data
Merging and Splitting Columns in Pandas
Combining Pandas Data Frames
Data Manipulation with Pandas Data Frames
Pandas Data Frames and CSV Files
Useful Options for the Pandas read_csv() Function
Reading Selected Rows from CSV Files
Pandas Data Frames and Excel Spreadsheets
Useful Options for Reading Excel Spreadsheets
Select, Add, and Delete Columns in Data Frames
Handling Outliers in Pandas
Pandas Data Frames and Simple Statistics
Finding Duplicate Rows in Pandas
Finding Missing Values in Pandas
Missing Values in an Iris-Based Dataset
Sorting Data Frames in Pandas
Working with groupby() in Pandas
Aggregate Operations with the titanic.csv Dataset
Working with apply() and mapapply() in Pandas
Useful One-line Commands in Pandas
Working with JSON-based Data
Python Dictionary and JSON
Python, Pandas, and JSON
Summary
Chapter 4: RDBMS and SQL
What is an RDBMS?
What Relationships Do Tables Have in an RDBMS?
Features of an RDBMS
What is ACID?
When Do We Need an RDBMS?
The Importance of Normalization
A Four-Table RDBMS
Detailed Table Descriptions
The customers Table
The purchase_orders Table
The line_items Table
The item_desc Table
What is SQL?
DCL, DDL, DQL, DML, and TCL
SQL Privileges
Properties of SQL Statements
The CREATE Keyword
What is MySQL?
What about MariaDB?
Installing MySQL
Data Types in MySQL
The CHAR and VARCHAR Data Types
String-based Data Types
FLOAT and DOUBLE Data Types
BLOB and TEXT Data Types
MySQL Database Operations
Creating a Database
Display a List of Databases
Display a List of Database Users
Dropping a Database
Exporting a Database
Renaming a Database
The INFORMATION_SCHEMA Table
The PROCESSLIST Table
SQL Formatting Tools
Summary
Chapter 5: Java, JSON, and XML
Working with Java and MySQL
Performing the Set-up Steps
Creating a MySQL Database in Java
Creating a MySQL Table in Java
Inserting Data into a MySQL Table in Java
Deleting Data and Dropping MySQL Tables in Java
Selecting Data from a MySQL Table in Java
Updating Data in a MySQL Table in Java
Working with JSON, MySQL, and Java
Select JSON-based Data from a MySQL Table in Java
Working with XML, MySQL, and Java
What is XML?
What is an XML Schema?
When are XML Schemas Useful?
Create a MySQL Table for XML Data in Java
Read an XML Document in Java
Read an XML Document as a String in Java
Insert XML-based Data into a MySQL Table in Java
Select XML-based Data from a MySQL Table in Java
Parse XML-based String Data from a MySQL Table in Java
Working with XML Schemas
Summary
Chapter 6: Data Cleaning Tasks
What is Data Cleaning?
Data Cleaning for Personal Titles
Data Cleaning in SQL
Replace NULL with 0
Replace NULL Values with Average Value
Replace Multiple Values with a Single Value
Handle Mismatched Attribute Values
Convert Strings to Date Values
Data Cleaning from the Command Line (Optional)
Working with the sed Utility
Working with Variable Column Counts
Truncating Rows in CSV Files
Generating Rows with Fixed Columns with the awk Utility
Converting Phone Numbers
Converting Numeric Date Formats
Converting Alphabetic Date Formats
Working with Date and Time Date Formats
Working with Codes, Countries, and Cities
Data Cleaning on a Kaggle Dataset
Summary
Chapter 7: Data Wrangling
What is Data Wrangling?
Data Transformation: What Does This Mean?
CSV Files with Multi-Row Records
Pandas Solution (1)
Pandas Solution (2)
CSV Solution
CSV Files, Multi-row Records, and the awk Command
Quoted Fields Split on Two Lines (Optional)
Overview of the Events Project
Why This Project?
Project Tasks
Generate Country Codes
Prepare a List of Cities in Countries
Generating City Codes from Country Codes: awk
Generating City Codes from Country Codes: Python
Generating SQL Statements for the city_codes Table
Generating a CSV File for Band Members (Java)
Generating a CSV File for Band Members (Python)
Generating a Calendar of Events (COE)
Project Automation Script
Project Follow-up Comments
Summary
Appendix A: Working with awk
The awk Command
Built-in Variables That Control awk
How Does the awk Command Work?
Aligning Text with the printf() Statement
Conditional Logic and Control Statements
The while Statement
A for Loop in awk
A for Loop with a break Statement
The next and continue Statements
Deleting Alternate Lines in Datasets
Merging Lines in Datasets
Printing File Contents as a Single Line
Joining Groups of Lines in a Text File
Joining Alternate Lines in a Text File
Matching with Meta Characters and Character Sets
Printing Lines Using Conditional Logic
Splitting Filenames with awk
Working with Postfix Arithmetic Operators
Numeric Functions in awk
One-line awk Commands
Useful Short awk Scripts
Printing the Words in a Text String in awk
Count Occurrences of a String in Specific Rows
Printing a String in a Fixed Number of Columns
Printing a Dataset in a Fixed Number of Columns
Aligning Columns in Datasets
Aligning Columns and Multiple Rows in Datasets
Removing a Column from a Text File
Subsets of Column-aligned Rows in Datasets
Counting Word Frequency in Datasets
Displaying Only “Pure” Words in a Dataset
Working with Multi-line Records in awk
A Simple Use Case
Another Use Case
Summary
Index


📜 SIMILAR VOLUMES


Data Wrangling with SQL: A hands-on guid
✍ Raghav Kandarpa, Shivangi Saxena 📂 Library 📅 2023 🏛 Packt Publishing Pvt Ltd 🌐 English

Become a data wrangling expert and make well-informed decisions by effectively utilizing and analyzing raw unstructured data in a systematic manner. Key Features: Implement query optimization during data wrangling using the SQL language with practical use cases. Master data cleaning, handle the da

Hands-On Data Analysis with Pandas: Effi
✍ Stefanie Molin 📂 Library 📅 2019 🏛 Packt Publishing 🌐 English

Get to grips with pandas, a versatile and high-performance Python library for data manipulation, analysis, and discovery. Key Features: Perform efficient data analysis and manipulation tasks using pandas. Apply pandas to different real-world domains using step-by
