𝔖 Scriptorium
✦   LIBER   ✦

Practical Machine Learning

✍ Scribed by Sunila Gollapudi


Publisher: Packt Publishing
Year: 2016
Tongue: English
Leaves: 464
Category: Library

✦ Synopsis


About This Book

  • Fully-coded working examples using a wide range of machine learning libraries and tools, including Python, R, Julia, and Spark
  • Comprehensive practical solutions taking you into the future of machine learning
  • Go a step further and integrate your machine learning projects with Hadoop

Who This Book Is For

This book has been created for data scientists who want to see machine learning in action and explore its real-world applications. Knowledge of programming (Python and R) and mathematics is advisable if you want to get started immediately.

What You Will Learn

  • Implement a wide range of algorithms and techniques for tackling complex data
  • Get to grips with some of the most powerful languages in data science, including R, Python, and Julia
  • Harness the capabilities of Spark and Mahout used in conjunction with Hadoop to manage and process data successfully
  • Apply the appropriate machine learning technique to address a real-world problem
  • Get acquainted with deep learning and find out how neural networks are being used at the cutting edge of machine learning
  • Explore the future of machine learning and dive deeper into polyglot persistence, semantic data, and more

In Detail

This book explores an extensive range of machine learning techniques, uncovering hidden tips and tricks for several types of data using practical real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles.

We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with real-world applications of machine learning.

The book also explores cutting-edge advances in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory (and the mystery) out of even the most advanced machine learning methodologies.

✦ Table of Contents


Cover
Copyright
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Chapter 1: Introduction to Machine learning
Machine learning
Definition
Core Concepts and Terminology
What is learning?
Data
Labeled and unlabeled data
Tasks
Algorithms
Models
Data and inconsistencies in Machine learning
Under-fitting
Over-fitting
Data instability
Unpredictable data formats
Practical Machine learning examples
Types of learning problems
Classification
Clustering
Forecasting, prediction or regression
Simulation
Optimization
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Deep learning
Performance measures
Is the solution good?
Mean squared error (MSE)
Mean absolute error (MAE)
Normalized MSE and MAE (NMSE and NMAE)
Solving the errors: bias and variance
Some complementing fields of Machine learning
Data mining
Artificial intelligence (AI)
Statistical learning
Data science
Machine learning process lifecycle and solution architecture
Machine learning algorithms
Decision tree based algorithms
Bayesian method based algorithms
Kernel method based algorithms
Clustering methods
Artificial neural networks (ANN)
Dimensionality reduction
Ensemble methods
Instance based learning algorithms
Regression analysis based algorithms
Association rule based learning algorithms
Machine learning tools and frameworks
Summary
Chapter 2: Machine learning and Large-scale datasets
Big data and the context of large-scale Machine learning
Functional versus Structural – A methodological mismatch
Commoditizing information
Theoretical limitations of RDBMS
Scaling-up versus Scaling-out storage
Distributed and parallel computing strategies
Machine learning: Scalability and Performance
Too many data points or instances
Too many attributes or features
Shrinking response time windows – need for real-time responses
Highly complex algorithm
Feed forward, iterative prediction cycles
Model selection process
Potential issues in large-scale Machine learning
Algorithms and Concurrency
Developing concurrent algorithms
Technology and implementation options for scaling-up Machine learning
MapReduce programming paradigm
High Performance Computing (HPC) with Message Passing Interface (MPI)
Language Integrated Queries (LINQ) framework
Manipulating datasets with LINQ
Graphics Processing Unit (GPU)
Field Programmable Gate Array (FPGA)
Multicore or multiprocessor systems
Summary
Chapter 3: An Introduction to Hadoop's Architecture and Ecosystem
Introduction to Apache Hadoop
Evolution of Hadoop (the platform of choice)
Hadoop and its core elements
Machine learning solution architecture for big data (employing Hadoop)
The Data Source layer
The Ingestion layer
The Hadoop Storage layer
The Hadoop (Physical) Infrastructure layer – supporting appliance
Hadoop platform / Processing layer
The Analytics layer
The Consumption layer
Explaining and exploring data with Visualizations
Security and Monitoring layer
Hadoop core components framework
Writing to and reading from HDFS
Handling failures
HDFS command line
RESTful HDFS
MapReduce
MapReduce architecture
What makes MapReduce cater to the needs of large datasets?
MapReduce execution flow and components
Developing MapReduce components
Hadoop 2.x
Hadoop ecosystem components
Hadoop installation and setup
Installing JDK 1.7
Creating a system user for Hadoop (dedicated)
Disable IPv6
Steps for installing Hadoop 2.6.0
Starting Hadoop
Hadoop distributions and vendors
Summary
Chapter 4: Machine Learning Tools, Libraries, and Frameworks
Machine learning tools – A landscape
Apache Mahout
How does Mahout work?
Installing and setting up Apache Mahout
Setting up Maven
Setting up Apache Mahout using Eclipse IDE
Setting up Apache Mahout without Eclipse
Mahout Packages
Implementing vectors in Mahout
R
Installing and setting up R
Integrating R with Apache Hadoop
Approach 1 – Using R and Streaming APIs in Hadoop
Approach 2 – Using the Rhipe package of R
Approach 3 – Using RHadoop
Summary of R/Hadoop integration approaches
Implementing in R (using examples)
Julia
Installing and setting up Julia
Downloading and using the command line version of Julia
Using Juno IDE for running Julia
Using Julia via the browser
Running the Julia code from the command line
Implementing in Julia (with examples)
Using variables and assignments
Numeric primitives
Data structures
Working with Strings and String manipulations
Packages
Interoperability
Graphics and plotting
Benefits of adopting Julia
Integrating Julia and Hadoop
Python
Toolkit options in Python
Implementation of Python (using examples)
Installing Python and setting up scikit-learn
Apache Spark
Scala
Programming with Resilient Distributed Datasets (RDD)
Spring XD
Summary
Chapter 5: Decision Tree based learning
Decision trees
Terminology
Purpose and uses
Constructing a Decision tree
Handling missing values
Considerations for constructing Decision trees
Decision trees in a graphical representation
Inducing Decision trees – Decision tree algorithms
Greedy Decision trees
Benefits of Decision trees
Specialized trees
Oblique trees
Random forests
Evolutionary trees
Hellinger trees
Implementing Decision trees
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 6: Instance and Kernel Methods Based Learning
Instance-based learning (IBL)
Nearest Neighbors
Value of k in KNN
Distance measures in KNN
Case-based reasoning (CBR)
Locally weighted regression (LWR)
Implementing KNN
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Kernel methods-based learning
Kernel functions
Support Vector Machines (SVM)
Inseparable Data
Implementing SVM
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 7: Association Rules based learning
Association rules based learning
Association rule – a definition
Apriori algorithm
Rule generation strategy
FP-growth algorithm
Apriori versus FP-growth
Implementing Apriori and FP-growth
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 8: Clustering based learning
Clustering-based learning
Types of clustering
Hierarchical clustering
Partitional clustering
The k-means clustering algorithm
Convergence or stopping criteria for the k-means clustering
K-means clustering on disk
Advantages of the k-means approach
Disadvantages of the k-means algorithm
Distance measures
Complexity measures
Implementing k-means clustering
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 9: Bayesian learning
Bayesian learning
Statistician's thinking
Important terms and definitions
Probability
Types of probability
Distribution
Bernoulli distribution
Binomial distribution
Bayes' theorem
Naïve Bayes classifier
Multinomial Naïve Bayes classifier
The Bernoulli Naïve Bayes classifier
Implementing Naïve Bayes algorithm
Using Mahout
Using R
Using Spark
Using scikit-learn
Using Julia
Summary
Chapter 10: Regression based learning
Regression analysis
Revisiting statistics
Properties of expectation, variance, and covariance
ANOVA and F Statistics
Confounding
Effect modification
Regression methods
Simple regression or simple linear regression
Multiple regression
Polynomial (non-linear) regression
Generalized Linear Models (GLM)
Logistic regression (logit link)
Odds ratio in logistic regression
Poisson regression
Implementing linear and logistic regression
Using Mahout
Using R
Using Spark
Using scikit-learn
Using Julia
Summary
Chapter 11: Deep learning
Background
The human brain
Neural networks
Neuron
Synapses
Artificial neurons or perceptrons
Neural Network size
Neural network types
Backpropagation algorithm
Softmax regression technique
Deep learning taxonomy
Convolutional neural networks (CNN/ConvNets)
Convolutional layer (CONV)
Pooling layer (POOL)
Fully connected layer (FC)
Recurrent Neural Networks (RNNs)
Restricted Boltzmann Machines (RBMs)
Deep Boltzmann Machines (DBMs)
Autoencoders
Implementing ANNs and Deep learning methods
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 12: Reinforcement Learning
Reinforcement Learning (RL)
The context of Reinforcement Learning
Examples of Reinforcement Learning
Evaluative Feedback
The Reinforcement Learning problem – the world grid example
Markov Decision Process (MDP)
Basic RL model – agent-environment interface
Delayed rewards
The policy
Reinforcement Learning – key features
Reinforcement learning solution methods
Dynamic Programming (DP)
Generalized Policy Iteration (GPI)
Monte Carlo methods
Temporal difference (TD) learning
Sarsa – on-Policy TD
Q-Learning – off-Policy TD
Actor-critic methods (on-policy)
R Learning (Off-policy)
Summary
Chapter 13: Ensemble learning
Ensemble learning methods
The wisdom of the crowd
Key use cases
Recommendation systems
Anomaly detection
Transfer learning
Stream mining or classification
Ensemble methods
Supervised ensemble methods
Unsupervised ensemble methods
Implementing ensemble methods
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
Chapter 14: New generation data architectures for Machine learning
Evolution of data architectures
Emerging perspectives & drivers for new age data architectures
Modern data architectures for Machine learning
Semantic data architecture
The business data lake
Semantic Web technologies
Vendors
Multi-model database architecture / polyglot persistence
Vendors
Lambda Architecture (LA)
Vendors
Summary
Index


📜 SIMILAR VOLUMES


Practical Machine Learning
✍ Ted Dunning and Ellen Friedman 📂 Library 📅 2014 🏛 O'Reilly Media 🌐 English

Anomaly detection is the detective work of machine learning: finding the unusual, catching the fraud, discovering strange activity in large and complex datasets. But, unlike Sherlock Holmes, you may not know what the puzzle is, much less what suspects you're looking for. This O'Reilly

Practical Machine Learning
✍ Ted Dunning and Ellen Friedman 📂 Library 📅 2014 🏛 O'Reilly Media 🌐 English

Building a simple but powerful recommendation system is much easier than you think. Approachable for all levels of expertise, this report explains innovations that make machine learning practical for business production settings, and demonstrates how even a small-scale development team

Practical Machine Learning
✍ Ted Dunning; Ellen Friedman 📂 Library 📅 2014 🏛 O'Reilly Media 🌐 English

Building a simple but powerful recommendation system is much easier than you think. Approachable for all levels of expertise, this report explains innovations that make machine learning practical for business production settings, and demonstrates how even a small-scale development team

Practical Machine Learning
✍ Gollapudi Sunila. 📂 Library 🌐 English

Packt Publishing - ebooks Account, 2016. 614 p. ISBN-10: 178439968X. ISBN-13: 978-1784399689. This book has been created for data scientists who want to see machine learning in action and explore its real-world applications. Knowledge of programming (Python and R) and m

Practical Machine Learning
✍ Gollapudi Sunila. 📂 Library 🌐 English

Packt Publishing - ebooks Account, 2016. 653 p. ISBN-10: 178439968X. ISBN-13: 978-1784399689. This book has been created for data scientists who want to see machine learning in action and explore its real-world applications. Knowledge of programming (Python and R) and m

Practical machine learning cookbook
✍ Atul Tripathi 📂 Library 📅 2017 🏛 Packt Publishing 🌐 English

Machine learning has become the new black. The challenge in today's world is the explosion of data from existing legacy data and incoming new structured and unstructured data. The complexity of discovering, understanding, performing analysis, and predicting outcomes on the data using machine learnin