𝔖 Scriptorium
✦   LIBER   ✦

Mastering Pandas: Master the features and capabilities of pandas, a data analysis toolkit for Python

✍ Scribed by Femi Anthony


Publisher: Packt
Year: 2015
Tongue: English
Leaves: 364
Category: Library

✦ Table of Contents


Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Introduction to pandas and Data Analysis
Motivation for data analysis
We live in a big data world
4 V's of big data
Volume of big data
Velocity of big data
Variety of big data
Veracity of big data
So much data, so little time for analysis
The move towards real-time analytics
How Python and pandas fit into the data analytics mix
What is pandas
Benefits of using pandas
Summary
Chapter 2: Installation of pandas and Supporting Software
Selecting a version of Python to use
Python installation
Linux
Installing Python from compressed tarball
Windows
Core Python installation
Third-party Python software install
Mac OS/X
Installation using a package manager
Installation of Python and pandas from a third-party vendor
Continuum Analytics Anaconda
Installing Anaconda
Linux
Mac OS/X
Windows
Final step for all platforms
Other numeric or analytics-focused Python distributions
Downloading and installing pandas
Linux
Ubuntu/Debian
Red Hat
Ubuntu/Debian
Fedora
OpenSuse
Mac
Source installation
Binary installation
Windows
Binary Installation
Source installation
IPython
IPython Notebook
IPython installation
Linux
Windows
Mac OS/X
Install via Anaconda (for Linux/Mac OS/X)
Wakari by Continuum Analytics
Virtualenv
Virtualenv installation and usage
Summary
Chapter 3: The pandas Data Structures
NumPy ndarrays
NumPy array creation
NumPy arrays via numpy.array
NumPy array via numpy.arange
NumPy array via numpy.linspace
NumPy array via various other functions
NumPy datatypes
NumPy indexing and slicing
Array slicing
Array masking
Complex indexing
Copies and views
Operations
Basic operations
Reduction operations
Statistical operators
Logical operators
Broadcasting
Array shape manipulation
Flattening a multi-dimensional array
Reshaping
Resizing
Adding a dimension
Array sorting
Data structures in pandas
Series
Series creation
Operations on Series
DataFrame
DataFrame Creation
Operations
Panel
Using 3D NumPy array with axis labels
Using a Python dictionary of DataFrame objects
Using the DataFrame.to_panel method
Other operations
Summary
Chapter 4: Operations in pandas, Part I – Indexing and Selecting
Basic indexing
Accessing attributes using dot operator
Range slicing
Label, integer, and mixed indexing
Label-oriented indexing
Selection using a Boolean array
Integer-oriented indexing
The .iat and .at operators
Mixed indexing with the .ix operator
Multi-indexing
Swapping and re-ordering levels
Cross-sections
Boolean indexing
The isin and any/all methods
Using the where() method
Operations on indexes
Summary
Chapter 5: Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data
Grouping of data
The groupby operation
Using groupby with a MultiIndex
Using the aggregate method
Applying multiple functions
The transform() method
Filtering
Merging and joining
The concat function
Using append
Appending a single row to a DataFrame
SQL-like merging/joining of DataFrame objects
The join function
Pivots and reshaping data
Stacking and unstacking
The stack() function
Other methods to reshape DataFrames
Using the melt function
Summary
Chapter 6: Missing Data, Time Series, and Plotting Using Matplotlib
Handling missing data
Handling missing values
Handling time series
Reading in time series data
DateOffset and TimeDelta objects
Time series-related instance methods
Shifting/lagging
Frequency conversion
Resampling of data
Aliases for Time Series frequencies
Time series concepts and datatypes
Period and PeriodIndex
Conversion between Time Series datatypes
A summary of Time Series-related objects
Plotting using matplotlib
Summary
Chapter 7: A Tour of Statistics – The Classical Approach
Descriptive statistics versus inferential statistics
Measures of central tendency and variability
Measures of central tendency
The mean
The median
The mode
Computing measures of central tendency of a dataset in Python
Measures of variability, dispersion, or spread
Range
Quartile
Deviation and variance
Hypothesis testing – the null and alternative hypotheses
The null and alternative hypotheses
The alpha and p-values
Type I and Type II errors
Statistical hypothesis tests
Background
The z-test
The t-test
A t-test example
Confidence intervals
An illustrative example
Correlation and linear regression
Correlation
Linear regression
An illustrative example
Summary
Chapter 8: A Brief Tour of Bayesian Statistics
Introduction to Bayesian statistics
Mathematical framework for Bayesian statistics
Bayes theory and odds
Applications of Bayesian statistics
Probability distributions
Fitting a distribution
Discrete probability distributions
Discrete uniform distribution
Continuous probability distributions
Bayesian statistics versus Frequentist statistics
What is probability?
How the model is defined
Confidence (Frequentist) versus Credible (Bayesian) intervals
Conducting Bayesian statistical analysis
Monte Carlo estimation of the likelihood function and PyMC
Bayesian analysis example – Switchpoint detection
References
Summary
Chapter 9: The pandas Library Architecture
Introduction to pandas' file hierarchy
Description of pandas' modules and files
pandas/core
pandas/io
pandas/tools
pandas/sparse
pandas/stats
pandas/util
pandas/rpy
pandas/tests
pandas/compat
pandas/computation
pandas/tseries
pandas/sandbox
Improving performance using Python extensions
Summary
Chapter 10: R and pandas Compared
R data types
R lists
R DataFrames
Slicing and selection
R-matrix and Numpy array compared
R lists and pandas series compared
Specifying column name in R
Specifying column name in pandas
R DataFrames versus pandas DataFrames
Multi-column selection in R
Multi-column selection in pandas
Arithmetic operations on columns
Aggregation and GroupBy
Aggregation in R
The pandas' GroupBy operator
Comparing matching operators in R and pandas
R %in% operator
The pandas isin() function
Logical subsetting
Logical subsetting in R
Logical subsetting in pandas
Split-apply-combine
Implementation in R
Implementation in pandas
Reshaping using Melt
The R melt() function
The pandas melt() function
Factors/categorical data
An R example using cut()
The pandas solution
Summary
Chapter 11: Brief Tour of Machine Learning
Role of pandas in machine learning
Installation of scikit-learn
Installing via Anaconda
Installing on Unix (Linux/Mac OSX)
Installing on Windows
Introduction to machine learning
Supervised versus unsupervised learning
Illustration using document classification
Supervised learning
Unsupervised learning
How machine learning systems learn
Application of machine learning – Kaggle Titanic competition
The Titanic: Machine Learning from Disaster problem
The problem of overfitting
Data analysis and preprocessing using pandas
Examining the data
Handling missing values
A naïve approach to Titanic problem
The scikit-learn ML/classifier interface
Supervised learning algorithms
Constructing a model using Patsy for scikit-learn
General boilerplate code explanation
Logistic regression
Support vector machine
Decision trees
Random forest
Unsupervised learning algorithms
Dimensionality reduction
K-means clustering
Summary
Index


📜 SIMILAR VOLUMES


Mastering pandas: Master the features an
✍ Femi Anthony 📂 Library 📅 2015 🏛 Packt Publishing 🌐 English

Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. pandas brings these features of Python into the data analysis realm, by providing expressiveness, simp

Mastering pandas : master the features a
✍ Anthony, Femi 📂 Library 📅 2015 🏛 Packt Publishing 🌐 English

Overview: Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. Master the features and capabilities of pandas, a data analysis toolkit for Python

Mastering pandas for Finance: Master pan
✍ Michael Heydt 📂 Library 📅 2015 🏛 Packt Publishing 🌐 English

This book will teach you to use Python and the Python Data Analysis Library (pandas) to solve real-world financial problems. Starting with a focus on pandas data structures, you will learn to load and manipulate time-series financial data and then calculate common financial measures, leading into m

Python for Data Analysis: A Beginners Gu
✍ Brady Ellison 📂 Library 📅 2021 🏛 WhiteFlowerPublsihing 🌐 English

Ready to learn data science through the Python language? Python for Data Analysis is a step-by-step guide for beginners and dabblers alike. This book is designed to offer working knowledge of Python and data science and some of the tools required to apply that knowledge. It's possible that you hav