𝔖 Scriptorium
✩   LIBER   ✩


Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist

✍ Scribed by T. Mailund


Year: 2022
Tongue: English
Leaves: 527
Edition: 2
Category: Library


✩ Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction
What Is Data Science?
Prerequisites for Reading This Book
Plan for the Book
Data Analysis and Visualization
Software Development
Getting R and RStudio
Projects
Chapter 1: Introduction to R Programming
Basic Interaction with R
Using R As a Calculator
Simple Expressions
Assignments
Indexing Vectors
Vectorized Expressions
Comments
Functions
Getting Documentation for Functions
Writing Your Own Functions
Summarizing and Vector Functions
A Quick Look at Control Flow
Factors
Data Frames
Using R Packages
Dealing with Missing Values
Data Pipelines
Writing Pipelines of Function Calls
Writing Functions That Work with Pipelines
The Magical “.” Argument
Other Pipeline Operations
Coding and Naming Conventions
Exercises
Mean of Positive Values
Root Mean Square Error
Chapter 2: Reproducible Analysis
Literate Programming and Integration of Workflow and Documentation
Creating an R Markdown/knitr Document in RStudio
The YAML Language
The Markdown Language
Formatting Text
Cross-Referencing
Bibliographies
Controlling the Output (Templates/Stylesheets)
Running R Code in Markdown Documents
Using Chunks When Analyzing Data (Without Compiling Documents)
Caching Results
Displaying Data
Exercises
Create an R Markdown Document
Different Output
Caching
Chapter 3: Data Manipulation
Data Already in R
Quickly Reviewing Data
Reading Data
Examples of Reading and Formatting Data Sets
Breast Cancer Data Set
Boston Housing Data Set
The readr Package
Manipulating Data with dplyr
Some Useful dplyr Functions
Breast Cancer Data Manipulation
Tidying Data with tidyr
Exercises
Importing Data
Using dplyr
Using tidyr
Chapter 4: Visualizing Data
Basic Graphics
The Grammar of Graphics and the ggplot2 Package
Using qplot()
Using Geometries
Facets
Scaling
Themes and Other Graphics Transformations
Figures with Multiple Plots
Exercises
Chapter 5: Working with Large Data Sets
Subsample Your Data Before You Analyze the Full Data Set
Running Out of Memory During an Analysis
Too Large to Plot
Too Slow to Analyze
Too Large to Load
Exercises
Subsampling
Hex and 2D Density Plots
Chapter 6: Supervised Learning
Machine Learning
Supervised Learning
Regression vs. Classification
Inference vs. Prediction
Specifying Models
Linear Regression
Logistic Regression (Classification, Really)
Model Matrices and Formula
Validating Models
Evaluating Regression Models
Evaluating Classification Models
Confusion Matrix
Accuracy
Sensitivity and Specificity
Other Measures
More Than Two Classes
Sampling Approaches
Random Permutations of Your Data
Cross-Validation
Selecting Random Training and Testing Data
Examples of Supervised Learning Packages
Decision Trees
Random Forests
Neural Networks
Support Vector Machines
Naive Bayes
Exercises
Fitting Polynomials
Evaluating Different Classification Measures
Breast Cancer Classification
Leave-One-Out Cross-Validation (Slightly More Difficult)
Decision Trees
Random Forests
Neural Networks
Support Vector Machines
Compare Classification Algorithms
Chapter 7: Unsupervised Learning
Dimensionality Reduction
Principal Component Analysis
Multidimensional Scaling
Clustering
k-means Clustering
Hierarchical Clustering
Association Rules
Exercises
Dealing with Missing Data in the HouseVotes84 Data
k-means
Chapter 8: Project 1: Hitting the Bottle
Importing Data
Exploring the Data
Distribution of Quality Scores
Is This Wine Red or White?
Fitting Models
Exercises
Exploring Other Formulas
Exploring Different Models
Analyzing Your Own Data Set
Chapter 9: Deeper into R Programming
Expressions
Arithmetic Expressions
Boolean Expressions
Basic Data Types
Numeric
Integer
Complex
Logical
Character
Data Structures
Vectors
Matrix
Lists
Indexing
Named Values
Factors
Formulas
Control Structures
Selection Statements
Loops
Functions
Named Arguments
Default Parameters
Return Values
Lazy Evaluation
Scoping
Function Names Are Different from Variable Names
Recursive Functions
Exercises
Fibonacci Numbers
Outer Product
Linear Time Merge
Binary Search
More Sorting
Selecting the k Smallest Element
Chapter 10: Working with Vectors and Lists
Working with Vectors and Vectorizing Functions
ifelse
Vectorizing Functions
The apply Family
apply
Nothing Good, It Would Seem
lapply
sapply and vapply
Advanced Functions
Special Names
Infix Operators
Replacement Functions
How Mutable Is Data Anyway?
Exercises
between
rmq
Chapter 11: Functional Programming
Anonymous Functions
Higher-Order Functions
Functions Taking Functions As Arguments
Functions Returning Functions (and Closures)
Filter, Map, and Reduce
Functional Programming with purrr
Functions As Both Input and Output
Ellipsis Parameters
Exercises
apply_if
power
Row and Column Sums
Factorial Again
Function Composition
Implement This Operator
Chapter 12: Object-Oriented Programming
Immutable Objects and Polymorphic Functions
Data Structures
Example: Bayesian Linear Model Fitting
Classes
Polymorphic Functions
Defining Your Own Polymorphic Functions
Class Hierarchies
Specialization As Interface
Specialization in Implementations
Exercises
Shapes
Polynomials
Chapter 13: Building an R Package
Creating an R Package
Package Names
The Structure of an R Package
.Rbuildignore
Description
Title
Version
Description
Author and Maintainer
License
Type, Date, LazyData
URL and BugReports
Dependencies
Using an Imported Package
Using a Suggested Package
NAMESPACE
R/ and man/
Checking the Package
Roxygen
Documenting Functions
Import and Export
Package Scope vs. Global Scope
Internal Functions
File Load Order
Adding Data to Your Package
NULL
Building an R Package
Exercises
Chapter 14: Testing and Package Checking
Unit Testing
Automating Testing
Using testthat
Writing Good Tests
Using Random Numbers in Tests
Testing Random Results
Checking a Package for Consistency
Exercise
Chapter 15: Version Control
Version Control and Repositories
Using Git in RStudio
Installing Git
Making Changes to Files, Staging Files, and Committing Changes
Adding Git to an Existing Project
Bare Repositories and Cloning Repositories
Pushing Local Changes and Fetching and Pulling Remote Changes
Handling Conflicts
Working with Branches
Typical Workflows Involve Lots of Branches
Pushing Branches to the Global Repository
GitHub
Moving an Existing Repository to GitHub
Installing Packages from GitHub
Collaborating on GitHub
Pull Requests
Forking Repositories Instead of Cloning
Exercises
Chapter 16: Profiling and Optimizing
Profiling
A Graph-Flow Algorithm
Speeding Up Your Code
Parallel Execution
Switching to C++
Exercises
Chapter 17: Project 2: Bayesian Linear Regression
Bayesian Linear Regression
Exercises: Priors and Posteriors
Sample from a Multivariate Normal Distribution
Computing the Posterior Distribution
Predicting Target Variables for New Predictor Values
Formulas and Their Model Matrix
Working with Model Matrices in R
Exercises
Building Model Matrices
Fitting General Models
Model Matrices Without Response Variables
Exercises
Model Matrices for New Data
Predicting New Targets
Interface to a blm Class
Constructor
Updating Distributions: An Example Interface
Designing Your blm Class
Model Methods
coefficients
confint
deviance
fitted
plot
predict
print
residuals
summary
Building an R Package for blm
Deciding on the Package Interface
Organization of Source Files
Document Your Package Interface Well
Adding README and NEWS Files to Your Package
README
NEWS
Testing
GitHub
Conclusions
Data Science
Machine Learning
Data Analysis
R Programming
The End
Index


📜 SIMILAR VOLUMES


Beginning Data Science in R 4: Data Anal
✍ Thomas Mailund 📂 Library 📅 2022 🏛 Apress 🌐 English

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new s

Beginning Data Science in R: Data Analys
✍ Thomas Mailund 📂 Library 📅 2017 🏛 Apress 🌐 English

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. This book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new software packages for R.

Beginning Data Science in R: Data Analys
✍ Mailund, Thomas 📂 Library 📅 2017 🏛 Apress 🌐 English

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. This book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new software packages for R. Data

Beginning Data Science in R: Data Analys
✍ Thomas Mailund (auth.) 📂 Library 📅 2017 🏛 Apress 🌐 English

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. This book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new software packages for R. Begi