𝔖 Scriptorium
✦   LIBER   ✦

Scalable Data Architecture with Java: Build efficient enterprise-grade data architecting solutions using Java

✍ Scribed by Sinchan Banerjee


Publisher: Packt Publishing
Year: 2022
Tongue: English
Leaves: 382
Category: Library

✦ Synopsis


Orchestrate data architecting solutions using Java and related technologies to evaluate, recommend and present the most suitable solution to leadership and clients

Key Features

  • Learn how to adapt to the ever-evolving data architecture technology landscape
  • Understand how to choose the best suited technology, platform, and architecture to realize effective business value
  • Implement effective data security and governance principles

Book Description

Java architectural patterns and tools help architects to build reliable, scalable, and secure data engineering solutions that collect, manipulate, and publish data.

This book will help you make the most of the data architecture solutions available, with clear and actionable advice from an expert.

You'll start with an overview of data architecture, exploring the responsibilities of a Java data architect and learning about various data formats, data storage, databases, and data application platforms, as well as how to choose among them. Next, you'll understand how to architect batch and real-time data processing pipelines. You'll also get to grips with the various Java data processing patterns, before progressing to data security and governance. The later chapters will show you how to publish Data as a Service and how you can architect it. Finally, you'll focus on how to evaluate and recommend an architecture by developing performance benchmarks, estimations, and various decision metrics.

By the end of this book, you'll be able to successfully orchestrate data architecture solutions using Java and related technologies as well as to evaluate and present the most suitable solution to your clients.

What you will learn

  • Analyze and use the best data architecture patterns for problems
  • Understand when and how to choose Java tools for a data architecture
  • Build batch and real-time data engineering solutions using Java
  • Discover how to apply security and governance to a solution
  • Measure performance, publish benchmarks, and optimize solutions
  • Evaluate, choose, and present the best architectural alternatives
  • Understand how to publish Data as a Service using GraphQL and a REST API

Who this book is for

Data architects, aspiring data architects, Java developers and anyone who wants to develop or optimize scalable data architecture solutions using Java will find this book useful. A basic understanding of data architecture and Java programming is required to get the best from this book.

Table of Contents

  1. Basics of Modern Data Architecture
  2. Data Storage and Databases
  3. Identifying the Right Data Platform
  4. ETL Data Load - A Batch-Based Solution to Ingest Data in a Data Warehouse
  5. Architecting a Batch Processing Pipeline
  6. Architecting a Real-Time Processing Pipeline
  7. Core Architectural Design Patterns
  8. Enabling Data Security and Governance
  9. Exposing MongoDB Data as a Service
  10. Federated and Scalable DaaS with GraphQL
  11. Measuring Performance and Benchmarking Your Applications
  12. Evaluating, Recommending, and Presenting Your Solutions

✦ Table of Contents


Cover
Title Page
Copyright and Credits
Contributors
About the reviewers
Table of Contents
Preface
Section 1 – Foundation of Data Systems
Chapter 1: Basics of Modern Data Architecture
Exploring the landscape of data engineering
What is data engineering?
Dimensions of data
Types of data engineering problems
Responsibilities and challenges of a Java data architect
Data architect versus data engineer
Challenges of a data architect
Techniques to mitigate those challenges
Summary
Chapter 2: Data Storage and Databases
Understanding data types, formats, and encodings
Data types
Data formats
Understanding file, block, and object storage
File storage
Block storage
Object storage
The data lake, data warehouse, and data mart
Data lake
Data warehouse
Data marts
Databases and their types
Relational database
NoSQL database
Data model design considerations
Summary
Chapter 3: Identifying the Right Data Platform
Technical requirements
Virtualization and containerization platforms
Benefits of virtualization
Containerization
Benefits of containerization
Kubernetes
Hadoop platforms
Hadoop architecture
Cloud platforms
Benefits of cloud computing
Choosing the correct platform
When to choose virtualization versus containerization
When to use big data
Choosing between on-premise versus cloud-based solutions
Choosing between various cloud vendors
Summary
Section 2 – Building Data Processing Pipelines
Chapter 4: ETL Data Load – A Batch-Based Solution to Ingesting Data in a Data Warehouse
Technical requirements
Understanding the problem and source data
Problem statement
Understanding the source data
Building an effective data model
Relational data warehouse schemas
Evaluation of the schema design
Designing the solution
Implementing and unit testing the solution
Summary
Chapter 5: Architecting a Batch Processing Pipeline
Technical requirements
Developing the architecture and choosing the right tools
Problem statement
Analyzing the problem
Architecting the solution
Factors that affect your choice of storage
Determining storage based on cost
The cost factor in the processing layer
Implementing the solution
Profiling the source data
Writing the Spark application
Deploying and running the Spark application
Developing and testing a Lambda trigger
Performance tuning a Spark job
Querying the ODL using AWS Athena
Summary
Chapter 6: Architecting a Real-Time Processing Pipeline
Technical requirements
Understanding and analyzing the streaming problem
Problem statement
Analyzing the problem
Architecting the solution
Implementing and verifying the design
Setting up Apache Kafka on your local machine
Developing the Kafka streaming application
Unit testing a Kafka Streams application
Configuring and running the application
Creating a MongoDB Atlas cloud instance and database
Configuring Kafka Connect to store the results in MongoDB
Verifying the solution
Summary
Chapter 7: Core Architectural Design Patterns
Core batch processing patterns
The staged Collect-Process-Store pattern
Common file format processing pattern
The Extract-Load-Transform pattern
The compaction pattern
The staged report generation pattern
Core stream processing patterns
The outbox pattern
The saga pattern
The choreography pattern
The Command Query Responsibility Segregation (CQRS) pattern
The strangler fig pattern
The log stream analytics pattern
Hybrid data processing patterns
The Lambda architecture
The Kappa architecture
Serverless patterns for data ingestion
Summary
Chapter 8: Enabling Data Security and Governance
Technical requirements
Introducing data governance – what and why
When to consider data governance
The DGI data governance framework
Practical data governance using DataHub and NiFi
Creating the NiFi pipeline
Setting up DataHub
Governance activities
Understanding the need for data security
Solution and tools available for data security
Summary
Section 3 – Enabling Data as a Service
Chapter 9: Exposing MongoDB Data as a Service
Technical requirements
Introducing DaaS – what and why
Benefits of using DaaS
Creating a DaaS to expose data using Spring Boot
Problem statement
Analyzing and designing a solution
Implementing the Spring Boot REST application
Deploying the application in an ECS cluster
API management
Enabling API management over the DaaS API using AWS API Gateway
Summary
Chapter 10: Federated and Scalable DaaS with GraphQL
Technical requirements
Introducing GraphQL – what, when, and why
Operation types
Why use GraphQL?
When to use GraphQL
Core architectural patterns of GraphQL
A practical use case – exposing federated data models using GraphQL
Summary
Section 4 – Choosing Suitable Data Architecture
Chapter 11: Measuring Performance and Benchmarking Your Applications
Performance engineering and planning
Performance engineering versus performance testing
Tools for performance engineering
Publishing performance benchmarks
Optimizing performance
Java Virtual Machine and garbage collection optimizations
Big data performance tuning
Optimizing streaming applications
Database tuning
Summary
Chapter 12: Evaluating, Recommending, and Presenting Your Solutions
Creating cost and resource estimations
Storage and compute capacity planning
Effort and timeline estimation
Creating an architectural decision matrix
Data-driven architectural decisions to mitigate risk
Presenting the solution and recommendations
Summary
Index
Other Books You May Enjoy


📜 SIMILAR VOLUMES


Building Java Enterprise Applications. A
✍ Brett McLaughlin 📂 Library 📅 2002 🏛 O'Reilly Media 🌐 English

Volume 1 of this advanced 3-volume guide explores the infrastructure issues so important to good application design. It isn't just a book about Entity Beans and JNDI. It takes you step by step through building the back end, designing the data store so that it gives you convenient access to the data

Java Connector Architecture: Building En
✍ Atul Apte 📂 Library 📅 2002 🏛 Sams 🌐 English

I paid $0.51 for this book. It was too much. This is the only book that I have ever just thrown away out of pure frustration because of how terrible it is. The book is about 345 pages long. Not until page 131 does a discussion of the JCA even begin. You then get about 30 pages of text that is very

Java Connector Architecture: Building En
✍ Atul Apte 📂 Library 📅 2002 🏛 Sams 🌐 English

Java Connector Architecture (JCA) presents the JCA and identifies the scope in which a JCA-based adapter operates. The book quickly moves to the design methodologies employed in adapter using the JCA. The book then progresses to information about testing and deploying adapters in a product

Building Java Enterprise Applications, V
✍ Brett McLaughlin 📂 Library 📅 2002 🏛 O'Reilly Media 🌐 English

After reading the synopsis I was ready to delve into some planning and best practices reading material. While the book focuses on a core example, it does not provide enough information on WHY certain decisions were made and does not provide enough of a look into alternatives. The book would be much