Machine Learning with R: Learn techniques for building and improving machine learning models, from data preparation to model tuning, evaluation, and working with big data, 4th Edition

✍ Scribed by Brett Lantz


Publisher: Packt Publishing
Year: 2023
Tongue: English
Leaves: 763
Edition: 4
Category: Library


✦ Synopsis


Learn how to solve real-world data problems using machine learning and R

Purchase of the print or Kindle book includes a free eBook in PDF format.

Key Features

  • The 10th Anniversary Edition of the bestselling R machine learning book, updated with 50% new content for R 4.0.0 and beyond
  • Harness the power of R to build flexible, effective, and transparent machine learning models
  • Learn quickly with this clear, hands-on guide by machine learning expert Brett Lantz

Book Description

Machine learning, at its core, is concerned with transforming data into actionable knowledge. R offers a powerful set of machine learning methods to quickly and easily gain insight from your data.

Machine Learning with R, Fourth Edition, provides a hands-on, accessible, and readable guide to applying machine learning to real-world problems. Whether you are an experienced R user or new to the language, Brett Lantz teaches you everything you need to know about data pre-processing, uncovering key insights, making new predictions, and visualizing your findings. This 10th Anniversary Edition features several new chapters that reflect the progress of machine learning over the last few years, helping you build your data science skills and tackle more challenging problems: being successful with machine learning, advanced data preparation, building better learners, and making use of big data.

You'll also find this classic R data science book updated to R 4.0.0 with newer and better libraries, advice on ethical and bias issues in machine learning, and an introduction to deep learning. Whether you're looking to take your first steps with R for machine learning or making sure your skills and knowledge are up to date, this is an unmissable read that will help you find powerful new insights in your data.

What you will learn

  • Learn the end-to-end process of machine learning from raw data to implementation
  • Classify important outcomes using nearest neighbor and Bayesian methods
  • Predict future events using decision trees, rules, and support vector machines
  • Forecast numeric data and estimate financial values using regression methods
  • Model complex processes with artificial neural networks
  • Prepare, transform, and clean data using the tidyverse
  • Evaluate your models and improve their performance
  • Connect R to SQL databases and emerging big data technologies such as Spark, Hadoop, H2O, and TensorFlow
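The end-to-end workflow the bullets describe can be sketched in a few lines of R. This is a minimal illustration under our own assumptions (the built-in iris dataset and the class package's knn() function), not an example taken from the book itself:

```r
# Minimal k-NN workflow: split, normalize, train/predict, evaluate.
library(class)

set.seed(123)
idx <- sample(nrow(iris), 0.8 * nrow(iris))        # training/test split
train_x <- scale(iris[idx, 1:4])                   # z-score standardize features
test_x  <- scale(iris[-idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

pred <- knn(train = train_x, test = test_x,
            cl = iris$Species[idx], k = 5)         # classify held-out flowers
accuracy <- mean(pred == iris$Species[-idx])       # evaluate performance
accuracy
```

Note that the test data is standardized using the training set's center and scale, so no information from the held-out rows leaks into preprocessing.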

Who this book is for

This book is designed to help data scientists, actuaries, data analysts, financial analysts, social scientists, business and machine learning students, and any other practitioners who want a clear, accessible guide to machine learning with R. No R experience is required, although prior exposure to statistics and programming is helpful.

Table of Contents

  1. Introducing Machine Learning
  2. Managing and Understanding Data
  3. Lazy Learning – Classification Using Nearest Neighbors
  4. Probabilistic Learning – Classification Using Naive Bayes
  5. Divide and Conquer – Classification Using Decision Trees and Rules
  6. Forecasting Numeric Data – Regression Methods
  7. Black-Box Methods – Neural Networks and Support Vector Machines
  8. Finding Patterns – Market Basket Analysis Using Association Rules
  9. Finding Groups of Data – Clustering with k-means
  10. Evaluating Model Performance
  11. Being Successful with Machine Learning
  12. Advanced Data Preparation
  13. Challenging Data – Too Much, Too Little, Too Complex
  14. Building Better Learners
  15. Making Use of Big Data

✦ Detailed Table of Contents


Cover
Copyright
Contributors
Table of Contents
Preface
Chapter 1: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning successes
The limits of machine learning
Machine learning ethics
How machines learn
Data storage
Abstraction
Generalization
Evaluation
Machine learning in practice
Types of input data
Types of machine learning algorithms
Matching input data to algorithms
Machine learning with R
Installing R packages
Loading and unloading R packages
Installing RStudio
Why R and why R now?
Summary
Chapter 2: Managing and Understanding Data
R data structures
Vectors
Factors
Lists
Data frames
Matrices and arrays
Managing data with R
Saving, loading, and removing R data structures
Importing and saving datasets from CSV files
Importing common dataset formats using RStudio
Exploring and understanding data
Exploring the structure of data
Exploring numeric features
Measuring the central tendency – mean and median
Measuring spread – quartiles and the five-number summary
Visualizing numeric features – boxplots
Visualizing numeric features – histograms
Understanding numeric data – uniform and normal distributions
Measuring spread – variance and standard deviation
Exploring categorical features
Measuring the central tendency – the mode
Exploring relationships between features
Visualizing relationships – scatterplots
Examining relationships – two-way cross-tabulations
Summary
Chapter 3: Lazy Learning – Classification Using Nearest Neighbors
Understanding nearest neighbor classification
The k-NN algorithm
Measuring similarity with distance
Choosing an appropriate k
Preparing data for use with k-NN
Why is the k-NN algorithm lazy?
Example – diagnosing breast cancer with the k-NN algorithm
Step 1 – collecting data
Step 2 – exploring and preparing the data
Transformation – normalizing numeric data
Data preparation – creating training and test datasets
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Transformation – z-score standardization
Testing alternative values of k
Summary
Chapter 4: Probabilistic Learning – Classification Using Naive Bayes
Understanding Naive Bayes
Basic concepts of Bayesian methods
Understanding probability
Understanding joint probability
Computing conditional probability with Bayes’ theorem
The Naive Bayes algorithm
Classification with Naive Bayes
The Laplace estimator
Using numeric features with Naive Bayes
Example – filtering mobile phone spam with the Naive Bayes algorithm
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – cleaning and standardizing text data
Data preparation – splitting text documents into words
Data preparation – creating training and test datasets
Visualizing text data – word clouds
Data preparation – creating indicator features for frequent words
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
Chapter 5: Divide and Conquer – Classification Using Decision Trees and Rules
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Example – identifying risky bank loans using C5.0 decision trees
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – creating random training and test datasets
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Boosting the accuracy of decision trees
Making some mistakes cost more than others
Understanding classification rules
Separate and conquer
The 1R algorithm
The RIPPER algorithm
Rules from decision trees
What makes trees and rules greedy?
Example – identifying poisonous mushrooms with rule learners
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
Chapter 6: Forecasting Numeric Data – Regression Methods
Understanding regression
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple linear regression
Generalized linear models and logistic regression
Example – predicting auto insurance claims costs using linear regression
Step 1 – collecting data
Step 2 – exploring and preparing the data
Exploring relationships between features – the correlation matrix
Visualizing relationships between features – the scatterplot matrix
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Model specification – adding nonlinear relationships
Model specification – adding interaction effects
Putting it all together – an improved regression model
Making predictions with a regression model
Going further – predicting insurance policyholder churn with logistic regression
Understanding regression trees and model trees
Adding regression to trees
Example – estimating the quality of wines with regression trees and model trees
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Visualizing decision trees
Step 4 – evaluating model performance
Measuring performance with the mean absolute error
Step 5 – improving model performance
Summary
Chapter 7: Black-Box Methods – Neural Networks and Support Vector Machines
Understanding neural networks
From biological to artificial neurons
Activation functions
Network topology
The number of layers
The direction of information travel
The number of nodes in each layer
Training neural networks with backpropagation
Example – modeling the strength of concrete with ANNs
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Understanding support vector machines
Classification with hyperplanes
The case of linearly separable data
The case of nonlinearly separable data
Using kernels for nonlinear spaces
Example – performing OCR with SVMs
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Changing the SVM kernel function
Identifying the best SVM cost parameter
Summary
Chapter 8: Finding Patterns – Market Basket Analysis Using Association Rules
Understanding association rules
The Apriori algorithm for association rule learning
Measuring rule interest – support and confidence
Building a set of rules with the Apriori principle
Example – identifying frequently purchased groceries with association rules
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – creating a sparse matrix for transaction data
Visualizing item support – item frequency plots
Visualizing the transaction data – plotting the sparse matrix
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Sorting the set of association rules
Taking subsets of association rules
Saving association rules to a file or data frame
Using the Eclat algorithm for greater efficiency
Summary
Chapter 9: Finding Groups of Data – Clustering with k-means
Understanding clustering
Clustering as a machine learning task
Clusters of clustering algorithms
The k-means clustering algorithm
Using distance to assign and update clusters
Choosing the appropriate number of clusters
Finding teen market segments using k-means clustering
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – dummy coding missing values
Data preparation – imputing the missing values
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
Chapter 10: Evaluating Model Performance
Measuring performance for classification
Understanding a classifier’s predictions
A closer look at confusion matrices
Using confusion matrices to measure performance
Beyond accuracy – other measures of performance
The kappa statistic
The Matthews correlation coefficient
Sensitivity and specificity
Precision and recall
The F-measure
Visualizing performance tradeoffs with ROC curves
Comparing ROC curves
The area under the ROC curve
Creating ROC curves and computing AUC in R
Estimating future performance
The holdout method
Cross-validation
Bootstrap sampling
Summary
Chapter 11: Being Successful with Machine Learning
What makes a successful machine learning practitioner?
What makes a successful machine learning model?
Avoiding obvious predictions
Conducting fair evaluations
Considering real-world impacts
Building trust in the model
Putting the β€œscience” in data science
Using R Notebooks and R Markdown
Performing advanced data exploration
Constructing a data exploration roadmap
Encountering outliers: a real-world pitfall
Example – using ggplot2 for visual data exploration
Summary
Chapter 12: Advanced Data Preparation
Performing feature engineering
The role of human and machine
The impact of big data and deep learning
Feature engineering in practice
Hint 1: Brainstorm new features
Hint 2: Find insights hidden in text
Hint 3: Transform numeric ranges
Hint 4: Observe neighbors’ behavior
Hint 5: Utilize related rows
Hint 6: Decompose time series
Hint 7: Append external data
Exploring R’s tidyverse
Making tidy table structures with tibbles
Reading rectangular files faster with readr and readxl
Preparing and piping data with dplyr
Transforming text with stringr
Cleaning dates with lubridate
Summary
Chapter 13: Challenging Data – Too Much, Too Little, Too Complex
The challenge of high-dimension data
Applying feature selection
Filter methods
Wrapper methods and embedded methods
Example – Using stepwise regression for feature selection
Example – Using Boruta for feature selection
Performing feature extraction
Understanding principal component analysis
Example – Using PCA to reduce highly dimensional social media data
Making use of sparse data
Identifying sparse data
Example – Remapping sparse categorical data
Example – Binning sparse numeric data
Handling missing data
Understanding types of missing data
Performing missing value imputation
Simple imputation with missing value indicators
Missing value patterns
The problem of imbalanced data
Simple strategies for rebalancing data
Generating a synthetic balanced dataset with SMOTE
Example – Applying the SMOTE algorithm in R
Considering whether balanced is always better
Summary
Chapter 14: Building Better Learners
Tuning stock models for better performance
Determining the scope of hyperparameter tuning
Example – using caret for automated tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with ensembles
Understanding ensemble learning
Popular ensemble-based algorithms
Bagging
Boosting
Random forests
Gradient boosting
Extreme gradient boosting with XGBoost
Why are tree-based ensembles so popular?
Stacking models for meta-learning
Understanding model stacking and blending
Practical methods for blending and stacking in R
Summary
Chapter 15: Making Use of Big Data
Practical applications of deep learning
Beginning with deep learning
Choosing appropriate tasks for deep learning
The TensorFlow and Keras deep learning frameworks
Understanding convolutional neural networks
Transfer learning and fine tuning
Example – classifying images using a pre-trained CNN in R
Unsupervised learning and big data
Representing highly dimensional concepts as embeddings
Understanding word embeddings
Example – using word2vec for understanding text in R
Visualizing highly dimensional data
The limitations of using PCA for big data visualization
Understanding the t-SNE algorithm
Example – visualizing data’s natural clusters with t-SNE
Adapting R to handle large datasets
Querying data in SQL databases
The tidy approach to managing database connections
Using a database backend for dplyr with dbplyr
Doing work faster with parallel processing
Measuring R’s execution time
Enabling parallel processing in R
Taking advantage of parallel with foreach and doParallel
Training and evaluating models in parallel with caret
Utilizing specialized hardware and algorithms
Parallel computing with MapReduce concepts via Apache Spark
Learning via distributed and scalable algorithms with H2O
GPU computing
Summary
Other Books You May Enjoy
Index


📜 SIMILAR VOLUMES


Machine learning with R : discover how t
✍ Lantz, Brett 📂 Library 📅 2015 🏛 Packt Publishing - ebooks Account 🌐 English

Key Features: harness the power of R for statistical computing and data science; explore, forecast, and classify data with R; use R to apply common machine learning algorithms to real-world scenarios. Book Description: Machine learning, at its core, is…

Machine Learning Models and Algorithms f
✍ Shan Suthaharan (auth.) 📂 Library 📅 2016 🏛 Springer US 🌐 English

This book presents machine learning models and algorithms to address big data classification problems. Existing machine learning techniques like the decision tree (a hierarchical approach), random forest (an ensemble hierarchical approach), and deep learning (a layered approach) are highly sui…

Machine Learning Mastery With R: How to
✍ Jason Brownlee 📂 Library 📅 2016 🏛 Independently Published 🌐 English

R has been the gold standard in applied machine learning for a long time. Surveys show that it is the most popular platform used by professional data scientists. It is also preferred by the best data scientists in the world. In this mega Ebook written in the friendly Machine Learning Mastery styl…

Statistical Data Modeling and Machine Le
✍ Snezhana Gocheva-Ilieva (editor), Atanas Ivanov (editor), Hristina Kulina (edito… 📂 Library 📅 2023 🏛 MDPI AG 🌐 English

The present book contains all of the articles in the second edition of the Special Issue titled "Statistical Data Modeling and Machine Learning with Applications II". This Special Issue belongs to the "Mathematics and Computer Science" Section and aims to publish research on the theory and…

Statistical and Machine-Learning Data Mi
✍ Bruce Ratner 📂 Library 📅 2011 🏛 CRC Press 🌐 English

The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition…