Mastering Scala machine learning

✍ Scribed by Kozlov, Alexander

Publisher: Packt Publishing
Year: 2016
Tongue: English
Leaves: 586
Category: Library

✦ Synopsis

Advance your skills in efficient data analysis and data processing using the powerful tools of Scala, Spark, and Hadoop.

About This Book
This is a primer on functional-programming-style techniques to help you efficiently process and analyze all of your data
Get acquainted with the best and newest tools available, such as Scala, Spark, Parquet, and MLlib, for machine learning
Learn the best practices to incorporate new Big Data machine learning in your data-driven enterprise to gain future scalability and maintainability

Who This Book Is For
Mastering Scala Machine Learning is intended for enthusiasts who want to plunge into the new pool of emerging techniques for machine learning. Some familiarity with standard statistical techniques is required.

What You Will Learn
Sharpen your functional programming skills in Scala using the REPL
Apply standard and advanced machine learning techniques using Scala
Get acquainted with Big Data technologies and grasp why we need a functional approach to Big Data
Discover new data structures, algorithms, approaches, and habits that will allow you to work effectively with large amounts of data
Understand the principles of supervised and unsupervised learning in machine learning
Work with unstructured data and serialize it using Kryo, Protobuf, Avro, and AvroParquet
Construct reliable and robust data pipelines and manage data in a data-driven enterprise
Implement scalable model monitoring and alerts with Scala

In Detail
Since the advent of object-oriented programming, new technologies related to Big Data are constantly popping up on the market. One such technology is Scala, which many consider a successor to Java in the area of Big Data, much as Java was to C/C++ in the area of distributed programming. This book aims to take your knowledge to the next level and help you apply that knowledge to build advanced applications such as social media mining, intelligent news portals, and more.

After a quick refresher on functional programming concepts using the REPL, you will see some practical examples of setting up the development environment and tinkering with data. We will then explore working with Spark and MLlib using k-means and decision trees. Most of the data that we produce today is unstructured and raw, and you will learn to tackle this type of data with advanced topics such as regression, classification, integration, and working with graph algorithms. Finally, you will discover how to use Scala to perform complex concept analysis, to monitor model performance, and to build a model repository. By the end of this book, you will have gained expertise in performing Scala machine learning and will be able to build complex machine learning projects using Scala.

✦ Table of Contents

Cover
Copyright
Credits
About the Author
Acknowledgement
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Exploratory Data Analysis
Getting started with Scala
Distinct values of a categorical field
Summarization of a numeric field
Grepping across multiple fields
Basic, stratified, and consistent sampling
Working with Scala and Spark Notebooks
Basic correlations
Summary
Chapter 2: Data Pipelines and Modeling
Influence diagrams
Sequential trials and dealing with risk
Exploration and exploitation
Unknown unknowns
Basic components of a data-driven system
Data ingest
Data transformation layer
Data analytics and machine learning
UI component
Actions engine
Correlation engine
Monitoring
Optimization and interactivity
Feedback loops
Summary
Chapter 3: Working with Spark and MLlib
Setting up Spark
Understanding Spark architecture
Task scheduling
Spark components
MQTT, ZeroMQ, Flume, and Kafka
HDFS, Cassandra, S3, and Tachyon
Mesos, YARN, and Standalone
Applications
Word count
Streaming word count
Spark SQL and DataFrame
ML libraries
SparkR
Graph algorithms - GraphX and GraphFrames
Spark performance tuning
Running Hadoop HDFS
Summary
Chapter 4: Supervised and Unsupervised Learning
Records and supervised learning
Iris dataset
Labeled point
SVMWithSGD
Logistic regression
Decision tree
Bagging and boosting - ensemble learning methods
Unsupervised learning
Problem dimensionality
Summary
Chapter 5: Regression and Classification
What regression stands for?
Continuous space and metrics
Linear regression
Logistic regression
Regularization
Multivariate regression
Heteroscedasticity
Regression trees
Classification metrics
Multiclass problems
Perceptron
Generalization error and overfitting
Summary
Chapter 6: Working with Unstructured Data
Nested data
Other serialization formats
Hive and Impala
Sessionization
Working with traits
Working with pattern matching
Other uses of unstructured data
Probabilistic structures
Projections
Summary
Chapter 7: Working with Graph Algorithms
A quick introduction to graphs
SBT
Graph for Scala
Adding nodes and edges
Graph constraints
JSON
GraphX
Who is getting e-mails?
Connected components
Triangle counting
Strongly connected components
PageRank
SVD++
Summary
Chapter 8: Integrating Scala with R and Python
Integrating with R
Setting up R and SparkR
Linux
Mac OS
Windows
Running SparkR via scripts
Running Spark via R's command line
DataFrames
Linear models
Generalized linear model
Reading JSON files in SparkR
Writing Parquet files in SparkR
Invoking Scala from R
Using Rserve
Integrating with Python
Setting up Python
PySpark
Calling Python from Java/Scala
Using sys.process._
Spark pipe
Jython and JSR 223
Summary
Chapter 9: NLP in Scala
Text analysis pipeline
Simple text analysis
MLlib algorithms in Spark
TF-IDF
LDA
Segmentation, annotation, and chunking
POS tagging.

✦ Subjects

Scala (Computer program language); Data Processing; Databases; Programming Languages

📜 SIMILAR VOLUMES

📁 Mastering Azure Machine Learning: Execute Large-Scale End-to-end Machine Learning with Azure

✍ Christoph Korner; Marcel Alsdorf 📂 Library 📅 2022 🏛 Packt Publishing, Limited 🌐 English

Supercharge and automate your deployments to Azure Machine Learning clusters and Azure Kubernetes Service using Azure Machine Learning services. Key Features: Implement end-to-end machine learning pipelines on Azure; Train deep learning models using Azure comput

📁 Scala for Machine Learning

✍ Patrick R. Nicolas 📂 Library 📅 2017 🏛 Packt Publishing 🌐 English

Not a single day passes that we do not hear about big data in the news media, technical conferences, and even coffee shops. The ever-increasing amount of data collected in process monitoring, research, or simple human behavior becomes valuable only if you extract knowledge from it. Machine lea

📁 Scala Machine Learning Projects

✍ Md. Rezaul Karim 📂 Library 📅 2018 🏛 Packt 🌐 English

📁 Scala for Machine Learning

✍ Patrick R. Nicolas 📂 Library 📅 2014 🏛 Packt Publishing 🌐 English

Leverage Scala and Machine Learning to construct and study systems that can learn from data. About This Book: Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulation, and source code; Leverage
