𝔖 Scriptorium
✦   LIBER   ✦

A Python Data Analyst’s Toolkit: Learn Python And Python-based Libraries With Applications In Data Analysis And Statistics

✍ Scribed by Gayathri Rajagopalan


Publisher: Apress
Year: 2021
Tongue: English
Leaves: 409
Edition: 1st Edition
Category: Library

✦ Synopsis


Explore the fundamentals of data analysis and statistics with case studies using Python. This book will show you how to confidently write code in Python and use various Python libraries and functions for analyzing any dataset. The code is presented in Jupyter notebooks that can be further adapted and extended.

This book is divided into three parts: programming with Python, data analysis and visualization, and statistics. You'll start with an introduction to Python: the syntax, functions, conditional statements, data types, and different types of containers. You'll then review more advanced concepts like regular expressions, handling of files, and solving mathematical problems with Python.

The second part of the book covers Python libraries used for data analysis. There is an introductory chapter covering basic concepts and terminology, and one chapter each on NumPy (the scientific computation library), Pandas (the data wrangling library), and visualization libraries like Matplotlib and Seaborn. Case studies are included as examples to help readers understand some real-world applications of data analysis.

The final chapters of the book focus on statistics, elucidating important principles in statistics that are relevant to data science. These topics include probability, Bayes' theorem, permutations and combinations, and hypothesis testing (ANOVA, chi-squared test, z-test, and t-test), and how the SciPy library simplifies the tedious calculations involved in statistics.

What You'll Learn:
• Further your programming and analytical skills with Python
• Solve mathematical problems in calculus, set theory, and algebra with Python
• Work with various libraries in Python to structure, analyze, and visualize data
• Tackle real-life case studies using Python
• Review essential statistical concepts and use the SciPy library to solve problems in statistics
Who This Book Is For: Professionals working in data science who are interested in enhancing their skills in Python, data analysis, and statistics.

✦ Table of Contents


Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Technical requirements
Getting started with Jupyter notebooks
Shortcuts and other features in Jupyter
Magic commands used in Jupyter
Comments
Printing
Variables and Constants
Operators
Assignment operators
Data types
Working with Strings
Conditional statements
While loop
for loop
Functions
Syntax errors and exceptions
Working with files
Reading from a file
Writing to a file
Modules in Python
Python Enhancement Proposal (PEP) 8 – standards for writing code
Summary
Review Exercises
Lists
Creating new lists from existing lists
Concatenating of lists
Methods used with a tuple
Applications of tuples
Dictionaries
Sets
Object-oriented programming
Object-oriented programming principles
Summary
Review Exercises
Steps for solving problems with regular expressions
Python functions for regular expressions
Metacharacters
Factorization of an algebraic expression
Solving simultaneous equations (for two variables)
Solving expressions entered by the user
Solving simultaneous equations graphically
Union and intersection of sets
Finding the probability of an event
Derivative of a function
Integral of a function
Summary
Review Exercises
Descriptive data analysis - Steps
Classifying data into different levels
Visualizing various levels of data
Plotting mixed data
Review Exercises
Getting familiar with arrays and NumPy functions
Creating an array
Reshaping an array
Combining arrays
Testing for conditions
Broadcasting, vectorization, and arithmetic operations
Obtaining the properties of an array
Slicing or selecting a subset of data
Obtaining descriptive statistics/aggregate measures
Summary
Review Exercises
Pandas at a glance
Building blocks of Pandas
Examining the properties of a Series
DataFrames
From a CSV file:
From an HTML file:
Accessing attributes in a DataFrame
Renaming columns
Adding a new column to a DataFrame
Inserting rows in a DataFrame
Deleting columns from a DataFrame
Deleting a row from a DataFrame
Indexing
Type of an index object
Creating a custom index and using columns as indexes
Indexes and speed of data retrieval
Immutability of an index
Alignment of indexes
Union operation
Data types in Pandas
Obtaining information about data types
Select particular data types
Calculating the memory usage and changing data types of columns
Indexers and selection of subsets of data
Selecting consecutive rows
Selecting consecutive columns
Selecting rows using their index labels
Using negative index values for selection
Selecting nonconsecutive rows and columns
ix indexer
The indexing operator - [ ]
at and iat indexers
Using the query method to retrieve data
Operators in Pandas
Representing dates and times in Pandas
Converting strings into Pandas Timestamp objects
Extracting the components of a Timestamp object
Grouping and aggregation
Examining the properties of the groupby object
Returning records with the same position in each group using the nth method
Filtering groups
Transform method and groupby
How to combine objects in Pandas
Append method for adding rows
Concat function (adding rows or columns from other objects)
Join method – index to index
Merge method – SQL type join based on common columns
Restructuring data and dealing with anomalies
Dealing with missing data
Imputation
Data duplication
Tidy data and techniques for restructuring data
Conversion from wide to long format (tidy data)
Stack method (wide-to-long format conversion)
Melt method (wide-to-long format conversion)
Pivot method (long-to-wide conversion)
Summary
Review Exercises
Technical requirements
External files
Commonly used plots
Matplotlib
Approach for plotting using Matplotlib
Plotting using Pandas
Scatter plot
Histogram
Pie charts
Seaborn library
Box plots
Kernel density estimate
Violin plot
Count plots
Heatmap
Facet grid
Regplot
lmplot
Strip plot
Swarm plot
Catplot
Pair plot
Joint plot
Summary
Review Exercises
Technical requirements
Methodology
Case study 8-1: Highest grossing movies in France – analyzing unstructured data
Case study 8-2: Use of data analysis for air quality management
Case study 8-3: Worldwide COVID-19 cases – an analysis
Summary
Review Exercises
Permutations and combinations
Probability
Rules of probability
Bayes theorem
Application of Bayes theorem in medical diagnostics
Another application of Bayes theorem: Email spam classification
SciPy library
Binomial distribution
The shape of a binomial distribution
Poisson distribution
The shape of a Poisson distribution
Normal distribution
Standard normal distribution
Solved examples: Standard normal distribution
Measures of central tendency
Measures of dispersion
Measures of shape
Probability sampling
Non-probability sampling
Central limit theorem
Estimates and confidence intervals
Types of errors in sampling
Basic concepts in hypothesis testing
Key terminology used in hypothesis testing
Steps involved in hypothesis testing
One-sample z-test
Two-sample z-test
Hypothesis tests with proportions
Two-sample z-test for the population proportions
T-distribution
Two-sample t-test
Solved examples: Conducting t-tests using SciPy functions
ANOVA
Chi-square test of association
Summary
Review Exercises
Bibliography
Index

✦ Subjects


Python


📜 SIMILAR VOLUMES


A Python Data Analyst’s Toolkit: Learn P
✍ Gayathri Rajagopalan 📂 Library 📅 2020 🏛 Apress 🌐 English

Explore the fundamentals of data analysis and statistics with case studies using Python. This book will show you how to confidently write code in Python and use various Python libraries and functions for analyzing any dataset. The code is presented in Jupyter notebooks that can further be adapted

Python for Data Analysis: Data Wrangling
✍ Wes McKinney 📂 Library 📅 2017 🏛 O'Reilly Media 🌐 English

Looking for complete instructions on manipulating, processing, cleaning, and crunching structured data in Python? The second edition of this hands-on guide--updated for Python 3.5 and Pandas 1.0--is packed with practical case studies that show you how to effectively solve a broad set of data analys

Python for Data Analysis: Data Wrangling
✍ Wes McKinney 📂 Library 📅 2017 🏛 O’Reilly Media 🌐 English

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll lea
