Advanced Data Management: For SQL, NoSQL, Cloud and Distributed Databases
Scribed by Lena Wiese
- Publisher: De Gruyter
- Year: 2015
- Tongue: English
- Leaves: 374
- Category: Library
Synopsis
Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have aggravated the need for sophisticated and flexible data storage and processing solutions.
This book provides comprehensive coverage of the principles of data management developed in the last decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing and querying data according to these models.
Starting off with the topic of database design, it further discusses weaknesses of the relational data model, and then proceeds to convey the basics of graph data, tree-structured XML data, key-value pairs and nested, semi-structured JSON data, columnar and record-oriented data as well as object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication and consistency strategies for data management in distributed databases as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures.
While primarily geared towards students of Master-level courses in Computer Science and related areas, this book may also be of benefit to practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open source technologies currently on the market.
- Complete coverage of the theoretical background of modern data management
- Analysis of alternative data models and distributed storage mechanisms
- Overview of non-SQL query languages
Table of Contents
Contents
Preface
Overview
List of Figures
List of Tables
Part I: Introduction
1 Background
1.1 Database Properties
1.2 Database Components
1.3 Database Design
1.3.1 Entity-Relationship Model
1.3.2 Unified Modeling Language
1.4 Bibliographic Notes
2 Relational Database Management Systems
2.1 Relational Data Model
2.1.1 Database and Relation Schemas
2.1.2 Mapping ER Models to Schemas
2.2 Normalization
2.3 Referential Integrity
2.4 Relational Query Languages
2.5 Concurrency Management
2.5.1 Transactions
2.5.2 Concurrency Control
2.6 Bibliographic Notes
Part II: NOSQL And Non-Relational Databases
3 New Requirements, "Not only SQL" and the Cloud
3.1 Weaknesses of the Relational Data Model
3.1.1 Inadequate Representation of Data
3.1.2 Semantic Overloading
3.1.3 Weak Support for Recursion
3.1.4 Homogeneity
3.2 Weaknesses of RDBMSs
3.3 New Data Management Challenges
3.4 Bibliographic Notes
4 Graph Databases
4.1 Graphs and Graph Structures
4.1.1 A Glimpse on Graph Theory
4.1.2 Graph Traversal and Graph Problems
4.2 Graph Data Structures
4.2.1 Edge List
4.2.2 Adjacency Matrix
4.2.3 Incidence Matrix
4.2.4 Adjacency List
4.2.5 Incidence List
4.3 The Property Graph Model
4.4 Storing Property Graphs in Relational Tables
4.5 Advanced Graph Models
4.6 Implementations and Systems
4.6.1 Apache TinkerPop
4.6.2 Neo4J
4.6.3 HyperGraphDB
4.7 Bibliographic Notes
5 XML Databases
5.1 XML Background
5.1.1 XML Documents
5.1.2 Document Type Definition (DTD)
5.1.3 XML Schema Definition (XSD)
5.1.4 XML Parsers
5.1.5 Tree Model of XML Documents
5.1.6 Numbering Schemes
5.2 XML Query Languages
5.2.1 XPath
5.2.2 XQuery
5.2.3 XSLT
5.3 Storing XML in Relational Databases
5.3.1 SQL/XML
5.3.2 Schema-Based Mapping
5.3.3 Schemaless Mapping
5.4 Native XML Storage
5.4.1 XML Indexes
5.4.2 Storage Management
5.4.3 XML Concurrency Control
5.5 Implementations and Systems
5.5.1 eXistDB
5.5.2 BaseX
5.6 Bibliographic Notes
6 Key-value Stores and Document Databases
6.1 Key-Value Storage
6.1.1 Map-Reduce
6.2 Document Databases
6.2.1 JavaScript Object Notation
6.2.2 JSON Schema
6.2.3 Representational State Transfer
6.3 Implementations and Systems
6.3.1 Apache Hadoop MapReduce
6.3.2 Apache Pig
6.3.3 Apache Hive
6.3.4 Apache Sqoop
6.3.5 Riak
6.3.6 Redis
6.3.7 MongoDB
6.3.8 CouchDB
6.3.9 Couchbase
6.4 Bibliographic Notes
7 Column Stores
7.1 Column-Wise Storage
7.1.1 Column Compression
7.1.2 Null Suppression
7.2 Column striping
7.3 Implementations and Systems
7.3.1 MonetDB
7.3.2 Apache Parquet
7.4 Bibliographic Notes
8 Extensible Record Stores
8.1 Logical Data Model
8.2 Physical storage
8.2.1 Memtables and immutable sorted data files
8.2.2 File format
8.2.3 Redo logging
8.2.4 Compaction
8.2.5 Bloom filters
8.3 Implementations and Systems
8.3.1 Apache Cassandra
8.3.2 Apache HBase
8.3.3 Hypertable
8.3.4 Apache Accumulo
8.4 Bibliographic Notes
9 Object Databases
9.1 Object Orientation
9.1.1 Object Identifiers
9.1.2 Normalization for Objects
9.1.3 Referential Integrity for Objects
9.1.4 Object-Oriented Standards and Persistence Patterns
9.2 Object-Relational Mapping
9.2.1 Mapping Collection Attributes to Relations
9.2.2 Mapping Reference Attributes to Relations
9.2.3 Mapping Class Hierarchies to Relations
9.2.4 Two-Level Storage
9.3 Object Mapping APIs
9.3.1 Java Persistence API (JPA)
9.3.2 Apache Java Data Objects (JDO)
9.4 Object-Relational Databases
9.5 Object Databases
9.5.1 Object Persistence
9.5.2 Single-Level Storage
9.5.3 Reference Management
9.5.4 Pointer Swizzling
9.6 Implementations and Systems
9.6.1 DataNucleus
9.6.2 ZooDB
9.7 Bibliographic Notes
Part III: Distributed Data Management
10 Distributed Database Systems
10.1 Scaling horizontally
10.2 Distribution Transparency
10.3 Failures in Distributed Systems
10.4 Epidemic Protocols and Gossip Communication
10.4.1 Hash Trees
10.4.2 Death Certificates
10.5 Bibliographic Notes
11 Data Fragmentation
11.1 Properties and Types of Fragmentation
11.2 Fragmentation Approaches
11.2.1 Fragmentation for Relational Tables
11.2.2 XML Fragmentation
11.2.3 Graph Partitioning
11.2.4 Sharding for Key-Based Stores
11.2.5 Object Fragmentation
11.3 Data Allocation
11.3.1 Cost-based allocation
11.3.2 Consistent Hashing
11.4 Bibliographic Notes
12 Replication And Synchronization
12.1 Replication Models
12.1.1 Master-Slave Replication
12.1.2 Multi-Master Replication
12.1.3 Replication Factor and the Data Replication Problem
12.1.4 Hinted Handoff and Read Repair
12.2 Distributed Concurrency Control
12.2.1 Two-Phase Commit
12.2.2 Paxos Algorithm
12.2.3 Multiversion Concurrency Control
12.3 Ordering of Events and Vector Clocks
12.3.1 Scalar Clocks
12.3.2 Concurrency and Clock Properties
12.3.3 Vector Clocks
12.3.4 Version Vectors
12.3.5 Optimizations of Vector Clocks
12.4 Bibliographic Notes
13 Consistency
13.1 Strong Consistency
13.1.1 Write and Read Quorums
13.1.2 Snapshot Isolation
13.2 Weak Consistency
13.2.1 Data-Centric Consistency Models
13.2.2 Client-Centric Consistency Models
13.3 Consistency Trade-offs
13.4 Bibliographic Notes
Part IV: Conclusion
14 Further Database Technologies
14.1 Linked Data and RDF Data Management
14.2 Data Stream Management
14.3 Array Databases
14.4 Geographic Information Systems
14.5 In-Memory Databases
14.6 NewSQL Databases
14.7 Bibliographic Notes
15 Concluding Remarks
15.1 Database Reengineering
15.2 Database Requirements
15.3 Polyglot Database Architectures
15.3.1 Polyglot Persistence
15.3.2 Lambda Architecture
15.3.3 Multi-Model Databases
15.4 Implementations and Systems
15.4.1 Apache Drill
15.4.2 Apache Druid
15.4.3 OrientDB
15.4.4 ArangoDB
15.5 Bibliographic Notes
Bibliography
Index
SIMILAR VOLUMES
Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have aggravated the need for sophisticated and flexible data storage and processing solutions. This book provides a comprehensive coverage of the
This textbook offers a comprehensive introduction to relational (SQL) and non-relational (NoSQL) databases. The authors thoroughly review the current state of database tools and techniques and examine upcoming innovations. In the first five chapters, the authors analyze in detail the management,
This book offers a comprehensive introduction to relational (SQL) and non-relational (NoSQL) databases. The authors thoroughly review the current state of database tools and techniques, and examine coming innovations. The book opens with a broad look at data management, including an overview of i