Data Science Fundamentals Pocket Primer
Scribed by Oswald Campesato
- Publisher
- Mercury Learning and Information
- Year
- 2021
- Tongue
- English
- Leaves
- 451
- Category
- Library
No coin nor oath required. For personal study only.
✦ Synopsis
As part of the best-selling Pocket Primer series, this book is designed to introduce the reader to the basic concepts of data science using Python 3 and other computer applications. It is intended to be a fast-paced introduction to some basic features of data analytics and also covers statistics, data visualization, linear algebra, and regular expressions. The book includes numerous code samples using Python, NumPy, R, SQL, NoSQL, and Pandas. Companion files with source code and color figures are available.
FEATURES:
- Includes a concise introduction to Python 3 and linear algebra
- Provides a thorough introduction to data visualization and regular expressions
- Covers NumPy, Pandas, R, and SQL
- Introduces probability and statistical concepts
- Features numerous code samples throughout
- Companion files with source code and figures
✦ Table of Contents
Cover
Title Page
Copyright
Contents
Preface
Chapter 1 Working with Data
What are Datasets?
Data Preprocessing
Data Types
Preparing Datasets
Discrete Data Versus Continuous Data
“Binning” Continuous Data
Scaling Numeric Data via Normalization
Scaling Numeric Data via Standardization
What to Look for in Categorical Data
Mapping Categorical Data to Numeric Values
Working with Dates
Working with Currency
Missing Data, Anomalies, and Outliers
Missing Data
Anomalies and Outliers
Outlier Detection
What is Data Drift?
What is Imbalanced Classification?
What is SMOTE?
SMOTE Extensions
Analyzing Classifiers (Optional)
What is LIME?
What is ANOVA?
The Bias-Variance Trade-Off
Types of Bias in Data
Summary
Chapter 2 Intro to Probability and Statistics
What is a Probability?
Calculating the Expected Value
Random Variables
Discrete versus Continuous Random Variables
Well-Known Probability Distributions
Fundamental Concepts in Statistics
The Mean
The Median
The Mode
The Variance and Standard Deviation
Population, Sample, and Population Variance
Chebyshev’s Inequality
What is a P-Value?
The Moments of a Function (Optional)
What is Skewness?
What is Kurtosis?
Data and Statistics
The Central Limit Theorem
Correlation versus Causation
Statistical Inferences
Statistical Terms – RSS, TSS, R^2, and F1 Score
What is an F1 Score?
Gini Impurity, Entropy, and Perplexity
What is the Gini Impurity?
What is Entropy?
Calculating Gini Impurity and Entropy Values
Multidimensional Gini Index
What is Perplexity?
Cross-Entropy and KL Divergence
What is Cross-Entropy?
What is KL Divergence?
What’s their Purpose?
Covariance and Correlation Matrices
The Covariance Matrix
Covariance Matrix: An Example
The Correlation Matrix
Eigenvalues and Eigenvectors
Calculating Eigenvectors: A Simple Example
Gauss Jordan Elimination (Optional)
PCA (Principal Component Analysis)
The New Matrix of Eigenvectors
Well-Known Distance Metrics
Pearson Correlation Coefficient
Jaccard Index (or Similarity)
Local Sensitivity Hashing (Optional)
Types of Distance Metrics
What is Bayesian Inference?
Bayes’ Theorem
Some Bayesian Terminology
What is MAP?
Why Use Bayes’ Theorem?
Summary
Chapter 3 Linear Algebra Concepts
What is Linear Algebra?
What are Vectors?
The Norm of a Vector
The Inner Product of Two Vectors
The Cosine Similarity of Two Vectors
Bases and Spanning Sets
Three Dimensional Vectors and Beyond
What are Matrices?
Add and Multiply Matrices
The Determinant of a Square Matrix
Well-Known Matrices
Properties of Orthogonal Matrices
Operations Involving Vectors and Matrices
Gauss Jordan Elimination (Optional)
Covariance and Correlation Matrices
The Covariance Matrix
Covariance Matrix: An Example
The Correlation Matrix
Eigenvalues and Eigenvectors
Calculating Eigenvectors: A Simple Example
What is PCA (Principal Component Analysis)?
The Main Steps in PCA
The New Matrix of Eigenvectors
Dimensionality Reduction
Dimensionality Reduction Techniques
The Curse of Dimensionality
SVD (Singular Value Decomposition)
LLE (Locally Linear Embedding)
UMAP
t-SNE
PHATE
Linear Versus Non-Linear Reduction Techniques
Complex Numbers (Optional)
Complex Numbers on the Unit Circle
Complex Conjugate Root Theorem
Hermitian Matrices
Summary
Chapter 4 Introduction to Python
Tools for Python
easy_install and pip
virtualenv
Python Installation
Setting the PATH Environment Variable (Windows Only)
Launching Python on Your Machine
The Python Interactive Interpreter
Python Identifiers
Lines, Indentations, and Multi-Lines
Quotation and Comments in Python
Saving Your Code in a Module
Some Standard Modules in Python
The help() and dir() Functions
Compile Time and Runtime Code Checking
Simple Data Types in Python
Working with Numbers
Working with Other Bases
The chr() Function
The round() Function in Python
Formatting Numbers in Python
Unicode and UTF-8
Working with Unicode
Working with Strings
Comparing Strings
Formatting Strings in Python
Uninitialized Variables and the Value None in Python
Slicing and Splicing Strings
Testing for Digits and Alphabetic Characters
Search and Replace a String in Other Strings
Remove Leading and Trailing Characters
Printing Text without NewLine Characters
Text Alignment
Working with Dates
Converting Strings to Dates
Exception Handling in Python
Handling User Input
Command-Line Arguments
Precedence of Operators in Python
Python Reserved Words
Working with Loops in Python
Python For Loops
A For Loop with try/except in Python
Numeric Exponents in Python
Nested Loops
The split() Function with For Loops
Using the split() Function to Compare Words
Using the split() Function to Print Justified Text
Using the split() Function to Print Fixed Width Text
Using the split() Function to Compare Text Strings
Using the split() Function to Display Characters in a String
The join() Function
Python While Loops
Conditional Logic in Python
The break/continue/pass Statements
Comparison and Boolean Operators
The in/not in/is/is not Comparison Operators
The and, or, and not Boolean Operators
Local and Global Variables
Scope of Variables
Pass by Reference Versus Value
Arguments and Parameters
Using a While Loop to Find the Divisors of a Number
Using a While Loop to Find Prime Numbers
User-Defined Functions in Python
Specifying Default Values in a Function
Returning Multiple Values from a Function
Functions with a Variable Number of Arguments
Lambda Expressions
Recursion
Calculating Factorial Values
Calculating Fibonacci Numbers
Working with Lists
Lists and Basic Operations
Reversing and Sorting a List
Lists and Arithmetic Operations
Lists and Filter-related Operations
Sorting Lists of Numbers and Strings
Expressions in Lists
Concatenating a List of Words
The Python range() Function
Counting Digits, Uppercase, and Lowercase Letters
Arrays and the append() Function
Working with Lists and the split() Function
Counting Words in a List
Iterating Through Pairs of Lists
Other List-Related Functions
Working with Vectors
Working with Matrices
Queues
Tuples (Immutable Lists)
Sets
Dictionaries
Creating a Dictionary
Displaying the Contents of a Dictionary
Checking for Keys in a Dictionary
Deleting Keys from a Dictionary
Iterating Through a Dictionary
Interpolating Data from a Dictionary
Dictionary Functions and Methods
Dictionary Formatting
Ordered Dictionaries
Sorting Dictionaries
Python Multi Dictionaries
Other Sequence Types in Python
Mutable and Immutable Types in Python
The type() Function
Summary
Chapter 5 Introduction to NumPy
What is NumPy?
Useful NumPy Features
What are NumPy Arrays?
Working with Loops
Appending Elements to Arrays (1)
Appending Elements to Arrays (2)
Multiplying Lists and Arrays
Doubling the Elements in a List
Lists and Exponents
Arrays and Exponents
Math Operations and Arrays
Working with “-1” Sub-ranges with Vectors
Working with “-1” Sub-ranges with Arrays
Other Useful NumPy Methods
Arrays and Vector Operations
NumPy and Dot Products (1)
NumPy and Dot Products (2)
NumPy and the Length of Vectors
NumPy and Other Operations
NumPy and the reshape() Method
Calculating the Mean and Standard Deviation
Code Sample with Mean and Standard Deviation
Trimmed Mean and Weighted Mean
Working with Lines in the Plane (Optional)
Plotting Randomized Points with NumPy and Matplotlib
Plotting a Quadratic with NumPy and Matplotlib
What is Linear Regression?
What is Multivariate Analysis?
What about Non-Linear Datasets?
The MSE (Mean Squared Error) Formula
Other Error Types
Non-Linear Least Squares
Calculating the MSE Manually
Find the Best-Fitting Line in NumPy
Calculating MSE by Successive Approximation (1)
Calculating MSE by Successive Approximation (2)
Google Colaboratory
Uploading CSV Files in Google Colaboratory
Summary
Chapter 6 Introduction to Pandas
What is Pandas?
Pandas Options and Settings
Pandas Data Frames
Data Frames and Data Cleaning Tasks
Alternatives to Pandas
A Pandas Data Frame with a NumPy Example
Describing a Pandas Data Frame
Pandas Boolean Data Frames
Transposing a Pandas Data Frame
Pandas Data Frames and Random Numbers
Reading CSV Files in Pandas
The loc() and iloc() Methods in Pandas
Converting Categorical Data to Numeric Data
Matching and Splitting Strings in Pandas
Converting Strings to Dates in Pandas
Merging and Splitting Columns in Pandas
Combining Pandas Data Frames
Data Manipulation with Pandas Data Frames (1)
Data Manipulation with Pandas Data Frames (2)
Data Manipulation with Pandas Data Frames (3)
Pandas Data Frames and CSV Files
Managing Columns in Data Frames
Switching Columns
Appending Columns
Deleting Columns
Inserting Columns
Scaling Numeric Columns
Managing Rows in Pandas
Selecting a Range of Rows in Pandas
Finding Duplicate Rows in Pandas
Inserting New Rows in Pandas
Handling Missing Data in Pandas
Multiple Types of Missing Values
Test for Numeric Values in a Column
Replacing NaN Values in Pandas
Sorting Data Frames in Pandas
Working with groupby() in Pandas
Working with apply() and mapapply() in Pandas
Handling Outliers in Pandas
Pandas Data Frames and Scatterplots
Pandas Data Frames and Simple Statistics
Aggregate Operations in Pandas Data Frames
Aggregate Operations with the titanic.csv Dataset
Save Data Frames as CSV Files and Zip Files
Pandas Data Frames and Excel Spreadsheets
Working with JSON-based Data
Python Dictionary and JSON
Python, Pandas, and JSON
Useful One-line Commands in Pandas
What is Method Chaining?
Pandas and Method Chaining
Pandas Profiling
Summary
Chapter 7 Introduction to R
What is R?
Features of R
Installing R and RStudio
Variable Names, Operators, and Data Types in R
Assigning Values to Variables in R
Operators in R
Data Types in R
Working with Strings in R
Uppercase and Lowercase Strings
String-Related Tasks
Working with Vectors in R
Finding NULL Values in a Vector in R
Updating NA Values in a Vector in R
Sorting a Vector of Elements in R
Working with the Alphabet Variable in R
Working with Lists in R
Working with Matrices in R (1)
Working with Matrices in R (2)
Working with Matrices in R (3)
Working with Matrices in R (4)
Working with Matrices in R (5)
Updating Matrix Elements
Logical Constraints and Matrices
Working with Matrices in R (6)
Combining Vectors, Matrices, and Lists in R
Working with Dates in R
The seq Function in R
Basic Conditional Logic
Compound Conditional Logic
Working with User Input
A Try/Catch Block in R
Linear Regression in R
Working with Simple Loops in R
Working with Nested Loops in R
Working with While Loops in R
Working with Conditional Logic in R
Add a Sequence of Numbers in R
Check if a Number is Prime in R
Check if Numbers in an Array are Prime in R
Check for Leap Years in R
Well-formed Triangle Values in R
What are Factors in R?
What are Data Frames in R?
Working with Data Frames in R (1)
Working with Data Frames in R (2)
Working with Data Frames in R (3)
Sort a Data Frame by Column
Reading Excel Files in R
Reading SQLITE Tables in R
Reading Text Files in R
Saving and Restoring Objects in R
Data Visualization in R
Working with Bar Charts in R (1)
Working with Bar Charts in R (2)
Working with Line Graphs in R
Working with Functions in R
Math-related Functions in R
Some Operators and Set Functions in R
The “Apply Family” of Built-in Functions
The dplyr Package in R
The Pipe Operator %>%
Working with CSV Files in R
Working with XML in R
Reading an XML Document into an R Data Frame
Working with JSON in R
Reading a JSON File into an R Data Frame
Statistical Functions in R
Summary Functions in R
Defining a Custom Function in R
Recursion in R
Calculating Factorial Values in R (Non-recursive)
Calculating Factorial Values in R (recursive)
Calculating Fibonacci Numbers in R (Non-recursive)
Calculating Fibonacci Numbers in R (Recursive)
Convert a Decimal Integer to a Binary Integer in R
Calculating the GCD of Two Integers in R
Calculating the LCM of Two Integers in R
Summary
Chapter 8 Regular Expressions
What are Regular Expressions?
Metacharacters in Python
Character Sets in Python
Working with “^” and “\”
Character Classes in Python
Matching Character Classes with the re Module
Using the re.match() Method
Options for the re.match() Method
Matching Character Classes with the re.search() Method
Matching Character Classes with the findAll() Method
Finding Capitalized Words in a String
Additional Matching Function for Regular Expressions
Grouping with Character Classes in Regular Expressions
Using Character Classes in Regular Expressions
Matching Strings with Multiple Consecutive Digits
Reversing Words in Strings
Modifying Text Strings with the re Module
Splitting Text Strings with the re.split() Method
Splitting Text Strings Using Digits and Delimiters
Substituting Text Strings with the re.sub() Method
Matching the Beginning and the End of Text Strings
Compilation Flags
Compound Regular Expressions
Counting Character Types in a String
Regular Expressions and Grouping
Simple String Matches
Pandas and Regular Expressions
Summary
Exercises
Chapter 9 SQL and NoSQL
What is an RDBMS?
A Four-Table RDBMS
The customers Table
The purchase_orders Table
The line_items Table
The item_desc Table
What is SQL?
What is DCL?
What is DDL?
Delete Vs. Drop Vs. Truncate
What is DQL?
What is DML?
What is TCL?
Data Types in MySQL
Working with MySQL
Logging into MySQL
Creating a MySQL Database
Creating and Dropping Tables
Manually Creating Tables for mytools.com
Creating Tables via a SQL Script for mytools.com (1)
Creating Tables via a SQL Script for mytools.com (2)
Creating Tables from the Command Line
Dropping Tables via a SQL Script for mytools.com
Populating Tables with Seed Data
Populating Tables from Text Files
Simple SELECT Statements
Select Statements with a WHERE Clause
Select Statements with GROUP BY Clause
Select Statements with a HAVING Clause
Working with Indexes in SQL
What are Keys in an RDBMS?
Aggregate and Boolean Operations in SQL
Joining Tables in SQL
Defining Views in MySQL
Entity Relationships
One-to-Many Entity Relationships
Many-to-Many Entity Relationships
Self-Referential Entity Relationships
Working with Subqueries in SQL
Other Tasks in SQL
Reading MySQL Data from Pandas
Export SQL Data to Excel
What is Normalization?
What are Schemas?
Other RDBMS Topics
Working with NoSQL
Create MongoDB Cellphones Collection
Sample Queries in MongoDB
Summary
Chapter 10 Data Visualization
What is Data Visualization?
Types of Data Visualization
What is Matplotlib?
Horizontal Lines in Matplotlib
Slanted Lines in Matplotlib
Parallel Slanted Lines in Matplotlib
A Grid of Points in Matplotlib
A Dotted Grid in Matplotlib
Lines in a Grid in Matplotlib
A Colored Grid in Matplotlib
A Colored Square in an Unlabeled Grid in Matplotlib
Randomized Data Points in Matplotlib
A Histogram in Matplotlib
A Set of Line Segments in Matplotlib
Plotting Multiple Lines in Matplotlib
Trigonometric Functions in Matplotlib
Display IQ Scores in Matplotlib
Plot a Best-Fitting Line in Matplotlib
Introduction to Sklearn (scikit-learn)
The Digits Dataset in Sklearn
The Iris Dataset in Sklearn (1)
Sklearn, Pandas, and the Iris Dataset
The Iris Dataset in Sklearn (2)
The Faces Dataset in Sklearn (Optional)
Working with Seaborn
Features of Seaborn
Seaborn Built-in Datasets
The Iris Dataset in Seaborn
The Titanic Dataset in Seaborn
Extracting Data from the Titanic Dataset in Seaborn (1)
Extracting Data from the Titanic Dataset in Seaborn (2)
Visualizing a Pandas Dataset in Seaborn
Data Visualization in Pandas
What is Bokeh?
Summary
Index
SIMILAR VOLUMES
As part of the best-selling Pocket Primer series, this book is an effort to give programmers sufficient knowledge of data cleaning to be able to work on their own projects. It is designed as a practical introduction to using flexible, powerful (and free) Unix / Linux shell commands to perform
As part of the best-selling Pocket Primer series, this book is designed to present the fundamentals of data structures using Python. Data structures provide a means to manage huge amounts of information such as large databases and the ability to use search and sort algorithms effectively. It is inte
As part of the best-selling Pocket Primer series, this book is designed to introduce the reader to the basic concepts of managing data using a variety of computer languages and applications. It is intended to be a fast-paced introduction to some basic features of data