A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics

✍ Scribed by Gayathri Rajagopalan


Publisher
Apress
Year
2020
Tongue
English
Leaves
409
Edition
1
Category
Library

✦ Synopsis


Explore the fundamentals of data analysis and statistics with case studies using Python. This book will show you how to confidently write code in Python and use various Python libraries and functions for analyzing any dataset. The code is presented in Jupyter notebooks that can be further adapted and extended.


This book is divided into three parts – programming with Python, data analysis and visualization, and statistics. You'll start with an introduction to Python – the syntax, functions, conditional statements, data types, and different types of containers. You'll then review more advanced concepts like regular expressions, handling of files, and solving mathematical problems with Python.

The second part of the book will cover Python libraries used for data analysis. There will be an introductory chapter covering basic concepts and terminology, and one chapter each on NumPy (the scientific computation library), Pandas (the data wrangling library), and visualization libraries like Matplotlib and Seaborn. Case studies will be included as examples to help readers understand some real-world applications of data analysis.

The final chapters of the book focus on statistics, elucidating important principles that are relevant to data science. Topics include probability, Bayes' theorem, permutations and combinations, and hypothesis testing (ANOVA, chi-squared test, z-test, and t-test), along with how the SciPy library simplifies the tedious calculations involved in statistics.

What You'll Learn
  • Further your programming and analytical skills with Python
  • Solve mathematical problems in calculus, set theory, and algebra with Python
  • Work with various libraries in Python to structure, analyze, and visualize data
  • Tackle real-life case studies using Python
  • Review essential statistical concepts and use the SciPy library to solve problems in statistics

Who This Book Is For

Professionals working in the field of data science who are interested in enhancing their skills in Python, data analysis, and statistics.


✦ Table of Contents


About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Getting Familiar with Python
Technical requirements
Getting started with Jupyter notebooks
Shortcuts and other features in Jupyter
Tab Completion
Magic commands used in Jupyter
Python Basics
Comments, print, and input
Comments
Printing
Input
Variables and Constants
Operators
Assignment operators
Data types
Working with Strings
Conditional statements
Loops
While loop
for loop
Functions
Syntax errors and exceptions
Working with files
Reading from a file
Writing to a file
Modules in Python
Python Enhancement Proposal (PEP) 8 – standards for writing code
Summary
Review Exercises
Chapter 2: Exploring Containers, Classes, and Objects
Containers
Lists
Creating new lists from existing lists
Accessing the index of items in a list
Concatenating of lists
Tuples
Methods used with a tuple
Applications of tuples
Dictionaries
Sets
Object-oriented programming
Object-oriented programming principles
Summary
Review Exercises
Chapter 3: Regular Expressions and Math with Python
Regular expressions
Steps for solving problems with regular expressions
Python functions for regular expressions
Metacharacters
Using Sympy for math problems
Factorization of an algebraic expression
Solving algebraic equations (for one variable)
Solving simultaneous equations (for two variables)
Solving expressions entered by the user
Solving simultaneous equations graphically
Creating and manipulating sets
Union and intersection of sets
Finding the probability of an event
Solving questions in calculus
Limit of a function
Derivative of a function
Integral of a function
Summary
Review Exercises
Chapter 4: Descriptive Data Analysis Basics
Descriptive data analysis - Steps
Structure of data
Classifying data into different levels
Visualizing various levels of data
Plotting mixed data
Summary
Review Exercises
Chapter 5: Working with NumPy Arrays
Getting familiar with arrays and NumPy functions
Creating an array
Reshaping an array
Combining arrays
Testing for conditions
Broadcasting, vectorization, and arithmetic operations
Obtaining the properties of an array
Slicing or selecting a subset of data
Obtaining descriptive statistics/aggregate measures
Matrices
Summary
Review Exercises
Chapter 6: Prepping Your Data with Pandas
Pandas at a glance
Technical requirements
Building blocks of Pandas
Examining the properties of a Series
DataFrames
Creating DataFrames by importing data from other formats
From a CSV file:
From an Excel file:
From a JSON file:
From an HTML file:
Accessing attributes in a DataFrame
Accessing the values in the DataFrame
Modifying DataFrame objects
Renaming columns
Replacing values or observations in a DataFrame
Adding a new column to a DataFrame
Inserting rows in a DataFrame
Deleting columns from a DataFrame
Deleting a row from a DataFrame
Indexing
Type of an index object
Creating a custom index and using columns as indexes
Indexes and speed of data retrieval
Searching without using an index
Search using an index
Immutability of an index
Alignment of indexes
Set operations on indexes
Union operation
Difference operation
Symmetric difference operation
Data types in Pandas
Obtaining information about data types
Get the count of each data type
Select particular data types
Calculating the memory usage and changing data types of columns
Indexers and selection of subsets of data
Understanding loc and iloc indexers
Selecting consecutive rows
Selecting consecutive columns
Selecting a single row
Selecting rows using their index labels
Selecting columns using their name
Using negative index values for selection
Selecting nonconsecutive rows and columns
Other (less commonly used) indexers for data access
ix indexer
The indexing operator - [ ]
at and iat indexers
Boolean indexing for selecting subsets of data
Using the query method to retrieve data
Further reading
Operators in Pandas
Representing dates and times in Pandas
Converting strings into Pandas Timestamp objects
Extracting the components of a Timestamp object
Further reading
Grouping and aggregation
Examining the properties of the groupby object
Data type of groupby object
Obtaining the names of the groups
Returning records with the same position in each group using the nth method
Get all the data for a particular group using the get_group method
Filtering groups
Transform method and groupby
Apply method and groupby
How to combine objects in Pandas
Append method for adding rows
Understanding the various types of joins
Concat function (adding rows or columns from other objects)
Join method – index to index
Merge method – SQL type join based on common columns
Restructuring data and dealing with anomalies
Dealing with missing data
Dropping the missing data
Imputation
Data duplication
Tidy data and techniques for restructuring data
Conversion from wide to long format (tidy data)
Stack method (wide-to-long format conversion)
Melt method (wide-to-long format conversion)
Pivot method (long-to-wide conversion)
Summary
Review Exercises
Chapter 7: Data Visualization with Python Libraries
Technical requirements
External files
Commonly used plots
Matplotlib
Approach for plotting using Matplotlib
Plotting using Pandas
Scatter plot
Histogram
Pie charts
Seaborn library
Box plots
Adding arguments to any Seaborn plotting function
Kernel density estimate
Violin plot
Count plots
Heatmap
Facet grid
Regplot
lmplot
Strip plot
Swarm plot
Catplot
Pair plot
Joint plot
Summary
Review Exercises
Chapter 8: Data Analysis Case Studies
Technical requirements
Methodology
Case study 8-1: Highest grossing movies in France – analyzing unstructured data
Case study 8-2: Use of data analysis for air quality management
Case study 8-3: Worldwide COVID-19 cases – an analysis
Summary
Review Exercises
Chapter 9: Statistics and Probability with Python
Permutations and combinations
Probability
Rules of probability
Conditional probability
Bayes theorem
Application of Bayes theorem in medical diagnostics
Another application of Bayes theorem: Email spam classification
SciPy library
Probability distributions
Binomial distribution
The shape of a binomial distribution
Poisson distribution
The shape of a Poisson distribution
Continuous probability distributions
Normal distribution
Standard normal distribution
Solved examples: Standard normal distribution
Measures of central tendency
Measures of dispersion
Measures of shape
Sampling
Probability sampling
Non-probability sampling
Central limit theorem
Estimates and confidence intervals
Types of errors in sampling
Hypothesis testing
Basic concepts in hypothesis testing
Key terminology used in hypothesis testing
Steps involved in hypothesis testing
One-sample z-test
Two-sample z-test
Hypothesis tests with proportions
Two-sample z-test for the population proportions
T-distribution
One sample t-test
Two-sample t-test
Two-sample t-test for paired samples
Solved examples: Conducting t-tests using Scipy functions
ANOVA
Chi-square test of association
Summary
Review Exercises
Bibliography
Index


📜 SIMILAR VOLUMES


A Python Data Analyst’s Toolkit: Learn P
✍ Gayathri Rajagopalan 📂 Library 📅 2021 🏛 Apress 🌐 English

Explore the fundamentals of data analysis and statistics with case studies using Python. This book will show you how to confidently write code in Python and use various Python libraries and functions for analyzing any dataset. The code is presented in Jupyter notebooks that can further be adapted

Python for Data Analysis: Data Wrangling
✍ Wes McKinney 📂 Library 📅 2017 🏛 O'Reilly Media 🌐 English

Looking for complete instructions on manipulating, processing, cleaning, and crunching structured data in Python? The second edition of this hands-on guide--updated for Python 3.5 and Pandas 1.0--is packed with practical case studies that show you how to effectively solve a broad set of data analys

Python for Data Analysis: Data Wrangling
✍ Wes McKinney 📂 Library 📅 2017 🏛 O’Reilly Media 🌐 English

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll lea
