Mastering Apache Cassandra, Second Edition [2nd Ed] True PDF
Scribed by Nishant Neeraj
- Publisher: Packt Publishing
- Year: 2015
- Tongue: English
- Leaves: 346
- Edition: 2
- Category: Library
No coin nor oath required. For personal study only.
Synopsis
Build, manage, and configure a high-performing, reliable NoSQL database for your application with Cassandra
About This Book
- Develop applications for modelling data with Cassandra 2
- Manage large amounts of structured, semi-structured, and unstructured data with Cassandra
- Explore a wide range of Cassandra components and how they interact to create a robust, distributed system
Who This Book Is For
The book is aimed at intermediate developers with an understanding of core database concepts who want to become a master at implementing Cassandra for their application.
What You Will Learn
- Write programs using Cassandra's features more efficiently
- Get the most out of a given infrastructure, improve performance, and tweak the JVM
- Use CQL3 in your application, which makes working with Cassandra simpler
- Configure Cassandra and fine-tune its parameters depending on your needs
- Set up a cluster and learn how to scale it
- Monitor a Cassandra cluster in different ways
- Use Hadoop and other big data processing tools with Cassandra
In Detail
With ever-increasing rates of data creation comes the demand to store data as fast and reliably as possible, a demand met by modern databases such as Cassandra. Apache Cassandra is the perfect choice for building fault-tolerant and scalable databases. Through this practical guide, you will program pragmatically and come to fully understand the power of Cassandra. Starting with a brief recap of the basics to get everyone up and running, you will move on to deploy and monitor a production setup, dive under the hood, and optimize and integrate it with other software.
You will explore the integration and interaction of Cassandra components, as well as great new features such as CQL3, vnodes, lightweight transactions, and triggers. Finally, by learning Hadoop and Pig, you will be able to analyze your big data.
Table of Contents
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Quick Start
Introduction to Cassandra
Distributed database
High availability
Replication
Multiple data centers
A brief introduction to a data model
Installing Cassandra locally
Cassandra in action
Modeling data
Writing code
Setting up
Inserting records
Retrieving data
Writing your application
Getting the connection
Executing queries
Object mapping
Summary
Chapter 2: Cassandra Architecture
Problems in the RDBMS world
Enter NoSQL
The CAP theorem
Consistency
Availability
Partition-tolerance
The significance of the CAP theorem
Cassandra
Understanding the architecture of Cassandra
Ring representation
Virtual nodes
How Cassandra works
Write in action
Read in action
The components of Cassandra
The messaging service
Gossip
Failure detection
Partitioner
Replication
LSM tree
Commit log
MemTable
SSTable
Compaction
Tombstones
Hinted handoff
Read repair and anti-entropy
Summary
Chapter 3: Effective CQL
The Cassandra data model
The counter column (cell)
The expiring cell
The column family
Keyspaces
Data types
The primary index
CQL3
Creating a keyspace
Altering a keyspace
Creating a table
Altering a table
Dropping a table
Creating an index
Dropping an index
Creating a data type
Altering a custom type
Dropping a custom type
Creating triggers
Dropping a trigger
Creating a user
Altering a user
Dropping a user
The granting permission
Revoking permission using REVOKE
Inserting data
Lightweight transactions
Updating a row
Deleting a row
Executing the BATCH statement
Other CQL commands
CQL shell commands
Summary
Chapter 4: Deploying a Cluster
Evaluating requirements
Hard disk capacity
RAM
CPU
Is node a server?
Network
System configurations
Optimizing user limits
Swapping memory
Clock synchronization
Disk readahead
The required software
Installing Oracle Java 7
RHEL and CentOS systems
Debian and Ubuntu systems
Installing the Java Native Access library
Installing Cassandra
Installing from a tarball
Installing from ASF repository for Debian or Ubuntu
Anatomy of the installation
Cassandra binaries
Configuration files
Configuring a Cassandra cluster
The cluster name
The seed node
Listen, broadcast, and RPC addresses
num_tokens versus initial_token
Num tokens
Initial token
Partitioners
The Random partitioner
The Byte-ordered partitioner
The Murmur3 partitioner
Snitches
SimpleSnitch
PropertyFileSnitch
GossipingPropertyFileSnitch
RackInferringSnitch
EC2Snitch
EC2MultiRegionSnitch
Replica placement strategies
SimpleStrategy
NetworkTopologyStrategy
Launching a cluster with a script
Creating a keyspace
Authorization and authentication
Summary
Chapter 5: Performance Tuning
Stress testing
Database schema
Data distribution
Write pattern
Read queries
Performance tuning
Write performance
Read performance
Choosing the right compaction strategy
Size-tiered compaction strategy
Leveled compaction
Row cache
Key cache
Cache settings
Enabling compression
Tuning bloom filter
More tuning via cassandra.yaml
commitlog_sync
column_index_size_in_kb
commitlog_total_space_in_mb
Tweaking JVM
Java heap
Garbage collection
Other JVM options
Scaling horizontally and vertically
Network
Summary
Chapter 6: Managing a Cluster – Scaling, Node Repair, and Backup
Scaling
Adding nodes to a cluster
Adding new nodes in vnode-enabled clusters
Adding a new node to a cluster without vnodes
Removing nodes from a cluster
Removing a live node
Removing a dead node
Replacing a node
Backup and restoration
Using the Cassandra bulk loader to restore the data
Load balancing
DataStax OpsCenter – managing large clusters
Summary
Chapter 7: Monitoring
Cassandra's JMX interface
Accessing MBeans using JConsole
Cassandra's nodetool utility
Monitoring with nodetool
cfstats
netstats
status
ring and describering
tpstats
compactionstats
info
Managing administration with nodetool
drain
decommission
removenode
move
repair
upgradesstables
snapshot
DataStax OpsCenter
OpsCenter features
Installing OpsCenter and an agent
Prerequisites
Running a Cassandra cluster
Installing OpsCenter from tarball
Setting up an OpsCenter agent
Monitoring and administrating with OpsCenter
Other features of OpsCenter
Nagios – monitoring and notification
Installing Nagios
Prerequisites
Preparation
Installation
Nagios plugins
Cassandra log
Enabling Java options for GC logging
Troubleshooting
High CPU usage
High memory usage
Hotspots
OpenJDK's erratic behavior
Disk performance
Slow snapshots
Getting help from the mailing list
Summary
Chapter 8: Integration with Hadoop
Using Hadoop
Hadoop and Cassandra
Introduction to Hadoop
HDFS
Data management
Hadoop MapReduce
Reliability of data and processes in Hadoop
Setting up local Hadoop
Testing the installation
Cassandra with Hadoop MapReduce
Preparing Cassandra for Hadoop
ColumnFamilyInputFormat
ColumnFamilyOutputFormat
CqlOutputFormat and CqlInputFormat
ConfigHelper
Wide row support
Bulk loading
Secondary index support
Cassandra and Hadoop in action
Executing, debugging, monitoring, and looking at results
Hadoop in a Cassandra cluster
Cassandra filesystem
Integration with Pig
Installing Pig
Integrating Pig and Cassandra
Integration with other analytical tools
Summary
Index
SIMILAR VOLUMES
Apache Cassandra is the perfect choice for building fault tolerant and scalable databases. Implementing Cassandra will enable you to take advantage of its features which include replication of data across multiple datacenters with lower latency rates. This book details these features that will guide
The book is aimed at intermediate developers with an understanding of core database concepts who want to become a master at implementing Cassandra for their application.
Unleash the Power of Distributed Database for Scalable and High-Performance Applications Are you ready to explore the world of distributed databases and unlock the potential of Apache Cassandra? "Mastering Apache Cassandra" is your comprehensive guide to understanding and harnessing the capabilit
With this hands-on guide, you'll learn how Apache Cassandra handles hundreds of terabytes of data while remaining highly available across multiple data centers, capabilities that have attracted Facebook, Twitter, and other data-intensive companies. Updated for Cassandra 3.0, this second edition provi
Contains descriptions of more than 2,200 materials available to the adhesives industry. The book includes Supplier Addresses and a Trade Name Index. Projected 1995 adhesives sales are $12 billion, with steady growth and expansion into new areas.