Data Analytics with Spark Using Python
Scribed by Jeffrey Aven
- Publisher: Addison-Wesley
- Year: 2018
- Tongue: English
- Series: Addison-Wesley Data & Analytics Series
- Category: Library
Synopsis
Solve Data Analytics Problems with Spark, PySpark, and Related Open Source Tools
Spark is at the heart of today's Big Data revolution, helping data professionals supercharge efficiency and performance in a wide range of data processing and analytics tasks. In this guide, Big Data expert Jeffrey Aven covers all you need to know to leverage Spark, together with its extensions, subprojects, and wider ecosystem.
Aven combines a language-agnostic introduction to foundational Spark concepts with extensive programming examples utilizing the popular and intuitive PySpark development environment. This guide's focus on Python makes it widely accessible to large audiences of data professionals, analysts, and developers, even those with little Hadoop or Spark experience.
Aven's broad coverage ranges from basic to advanced Spark programming, and Spark SQL to machine learning. You'll learn how to efficiently manage all forms of data with Spark: streaming, structured, semi-structured, and unstructured. Throughout, concise topic overviews quickly get you up to speed, and extensive hands-on exercises prepare you to solve real problems.
Coverage includes:
- Understand Spark's evolving role in the Big Data and Hadoop ecosystems
- Create Spark clusters using various deployment modes
- Control and optimize the operation of Spark clusters and applications
- Master Spark Core RDD API programming techniques
- Extend, accelerate, and optimize Spark routines with advanced API platform constructs, including shared variables, RDD storage, and partitioning
- Efficiently integrate Spark with both SQL and nonrelational data stores
- Perform stream processing and messaging with Spark Streaming and Apache Kafka
- Implement predictive modeling with SparkR and Spark MLlib
Subjects
Computers; Databases; Data Mining; Web; General; Mathematics
SIMILAR VOLUMES
Big Data Analytics with Spark is a step-by-step guide for learning Spark, which is an open-source fast and general-purpose cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, inter
This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX,
The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world dataset
Python is a popular programming language for data analytics, and it is also well-suited for IoT Data Analytics. By leveraging Python's versatility and its rich ecosystem of libraries and tools, Data Analytics for IoT can unlock valuable insights, enable predictive capabilities, and optimize decision