Learning and Operating Presto: Fast, Reliable SQL for Data Analytics and Lakehouses

✍ Scribed by Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su


Publisher: O'Reilly Media
Year: 2023
Tongue: English
Leaves: 191
Category: Library

✦ Synopsis


The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to begin Presto operations at their organizations and derive insights on datasets wherever they reside.

Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production.

With this book, you will:

  • Learn how to install and configure Presto
  • Use Presto with business intelligence tools
  • Understand how to connect Presto to a variety of data...

✦ Table of Contents


    Preface
        Why We Wrote This Book
        Who This Book Is For
        Conventions Used in This Book
        Using Code Examples
        O’Reilly Online Learning
        How to Contact Us
        Acknowledgments
        Angelica Lo Duca
        Tim Meehan
        Vivek Bharathan
        Ying Su
    1. Introduction to Presto
        Data Warehouses and Data Lakes
        The Role of Presto in a Data Lake
        Presto Origins and Design Considerations
        High Performance
        High Scalability
        Compliance with the ANSI SQL Standard
        Federation of Data Sources
        Running in the Cloud
        Presto Architecture and Core Components
        Alternatives to Presto
        Apache Impala
        Apache Hive
        Spark SQL
        Trino
        Presto Use Cases
        Reporting and Dashboarding
        Ad Hoc Querying
        ETL Using SQL
        Data Lakehouse
        Real-Time Analytics with Real-Time Databases
        Introducing Our Case Study
        Conclusion
    2. Getting Started with Presto
        Presto Manual Installation
        Running Presto on Docker
        Installing Docker
        Presto Docker Image
        Dockerfile
        The etc/ directory
        node.properties
        jvm.config
        config.properties
        log.properties
        catalog/.properties
        Building and Running Presto on Docker
        The Presto Sandbox
        Deploying Presto on Kubernetes
        Introducing Kubernetes
        Configuring Presto on Kubernetes
        presto-coordinator.yaml
        presto-workers.yaml
        presto-config-map.yaml
        presto-secrets.yaml
        Adding a New Catalog
        Running the Deployment on Kubernetes
        Querying Your Presto Instance
        Listing Catalogs
        Listing Schemas
        Listing Tables
        Querying a Table
        Conclusion
    3. Connectors
        Service Provider Interface
        Connector Architecture
        Popular Connectors
        Thrift
        Writing a Custom Connector
        Prerequisites
        Plugin and Module
        ExamplePlugin
        ExampleConnectorFactory
        ExampleModule
        ExampleConnector
        ExampleHandleResolver
        Configuration
        ExampleConfig
        SessionProperties
        TableProperties
        Metadata
        Data model
        Handles
        ExampleMetadata
        ExampleClient
        Input/Output
        ExampleSplitManager
        ExampleSplit
        ExampleRecordSetProvider and ExampleRecordSet
        ExampleRecordCursor
        Deploying Your Connector
        Apache Pinot
        Setting Up and Configuring Presto
        Setting up Pinot
        Configuring Pinot
        Configuring Presto with Pinot
        Presto-Pinot Querying in Action
        Conclusion
    4. Client Connectivity
        Setting Up the Environment
        Presto Client
        Docker Image
        Kubernetes Node
        Connectivity to Presto
        REST API
        Python
        R
        JDBC
        Node.js
        ODBC
        Other Presto Client Libraries
        Building a Client Dashboard in Python
        Setting Up the Client
        Building the Dashboard
        Connecting to and querying Presto
        Preparing the results of the query
        Building the first graph
        Building the second graph
        Conclusion
    5. Open Data Lakehouse Analytics
        The Emergence of the Lakehouse
        Data Lakehouse Architecture
        Data Lake
        File Store
        File Format
        Table Format
        Query Engine
        Metadata Management
        Data Governance
        Data Access Control
        Building a Data Lakehouse
        Configuring MinIO
        Populating MinIO
        Configuring HMS
        Configuring Spark
        Registering Hudi Tables with HMS
        Connecting and Querying Presto
        Conclusion
    6. Presto Administration
        Introducing Presto Administration
        Configuration
        Properties
        How to configure a cluster
        Sessions
        Using sessions
        JVM
        Memory
        Out-of-memory errors
        Garbage collection
        Monitoring
        Console
        Using the console for monitoring
        Using the console for debugging
        Using the console for going over the interactive plan
        REST API
        Metrics
        JMX connector
        REST API
        JMX exporters
        Management
        Resource Groups
        Configuring resource groups
        Resource groups properties
        Example
        Verifiers
        Setting up the system
        Configuring the MySQL database
        Configuring the Presto verifier
        Running a test
        Session Properties Managers
        Configuring a session property manager
        Namespace Functions
        Setting up the system
        Configuring a function
        Running a test
        Conclusion
    7. Understanding Security in Presto
        Introducing Presto Security
        Building Secure Communication in Presto
        Encryption
        Keystore Management
        Configuring HTTPS/TLS
        Running a Presto client
        Running the Presto console
        Authentication
        File-Based Authentication
        Running a Presto client
        Running the Presto console
        LDAP
        Kerberos
        Prerequisites
        Configuring the Presto coordinator and workers
        Configuring the Presto client
        Creating a Custom Authenticator
        Authorization
        Authorizing Access to the Presto REST API
        Configuring System Access Control
        Authorization Through Apache Ranger
        Building a custom audit function
        Conclusion
    8. Performance Tuning
        Introducing Performance Tuning
        Reasons for Performance Tuning
        The Performance Tuning Life Cycle
        Query Execution Model
        Approaches for Performance Tuning in Presto
        Resource Allocation
        Storage
        Query Optimization
        Aria Scan
        Table Scanning
        Repartitioning
        Implementing Performance Tuning
        Building and Importing the Sample CSV Table in MinIO
        Converting the CSV Table in ORC
        Defining the Tuning Parameters
        Running Tests
        Default parameters
        Reducing CPU usage
        Query optimization
        Aria scan
        Conclusion
    9. Operating Presto at Scale
        Introducing Scalability
        Reasons to Scale Presto
        Common Issues
        Design Considerations
        Availability
        Manageability
        Performance
        Protection
        Configuration
        How to Scale Presto
        Multiple Coordinators
        Presto on Spark
        Spilling
        Using a Cloud Service
        Conclusion
    Index

