Big Data Analytics with Spark is a step-by-step guide for learning Spark, which is an open-source fast and general-purpose cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, inter
Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
Scribed by Lai, Rudy; Potaczek, Bartlomiej
- Publisher: Packt Publishing
- Year: 2019
- Language: English
- Pages: 182
- Category: Library
Synopsis
Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs
Book Description
Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for testing, immunizing, and parallelizing Spark jobs.
You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and...
Table of Contents
- Installing Pyspark and Setting up Your Development Environment
- Getting Your Big Data into the Spark Environment Using RDDs
- Big Data Cleaning and Wrangling with Spark Notebooks
- Aggregating and Summarizing Data into Useful Reports
- Powerful Exploratory Data Analysis with MLlib
- Putting Structure on Your Big Data with SparkSQL
- Transformations and Actions
- Immutable Design
- Avoiding Shuffle and Reducing Operational Expenses
- Saving Data in the Correct Format
- Working with the Spark Key/Value API
- Testing Apache Spark Jobs
- Leveraging the Spark GraphX API
SIMILAR VOLUMES
This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX,
Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities and methods and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that are applicable to
Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities and methods and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that are applicable to any industry and
Get command of your organizational Big Data using the power of data science and analytics. Key Features: A perfect companion to boost your Big Data storing, processing, and analyzing skills to help you take informed business decisions. Work with the best tools such as Apache Hadoop, R, Python, and Spark for