𝔖 Scriptorium
✦   LIBER   ✦

πŸ“

Principles of distributed database systems

✍ Scribed by Ozsu M.T., Valduriez P


Publisher
Springer
Year
2020
Tongue
English
Leaves
681
Edition
4
Category
Library

⬇  Acquire This Volume

No coin nor oath required. For personal study only.

✦ Table of Contents


Preface......Page 6
Contents......Page 10
1.1 What Is a Distributed Database System?......Page 17
1.2 History of Distributed DBMS......Page 19
1.3 Data Delivery Alternatives......Page 21
1.4.1 Transparent Management of Distributed and Replicated Data......Page 23
1.4.2 Reliability Through Distributed Transactions......Page 26
1.4.3 Improved Performance......Page 27
1.5.1 Distributed Database Design......Page 29
1.5.4 Distributed Concurrency Control......Page 30
1.5.6 Replication......Page 31
1.5.10 Big Data Processing and NoSQL......Page 32
1.6.1 Architectural Models for Distributed DBMSs......Page 33
1.6.1.1 Autonomy......Page 34
1.6.1.3 Heterogeneity......Page 35
1.6.2 Client/Server Systems......Page 36
1.6.3 Peer-to-Peer Systems......Page 38
1.6.4 Multidatabase Systems......Page 41
1.6.5 Cloud Computing......Page 43
1.7 Bibliographic Notes......Page 47
2 Distributed and Parallel Database Design......Page 48
2.1 Data Fragmentation......Page 50
2.1.1.1 Auxiliary Information Requirements......Page 52
2.1.1.2 Primary Horizontal Fragmentation......Page 55
2.1.1.3 Derived Horizontal Fragmentation......Page 62
2.1.1.4 Checking for Correctness......Page 66
2.1.2 Vertical Fragmentation......Page 67
2.1.2.1 Auxiliary Information Requirements......Page 68
2.1.2.2 Clustering Algorithm......Page 71
2.1.2.3 Splitting Algorithm......Page 77
2.1.3 Hybrid Fragmentation......Page 80
2.2 Allocation......Page 81
2.2.1 Auxiliary Information......Page 83
2.2.2.1 Total Cost......Page 84
2.2.2.2 Constraints......Page 86
2.3 Combined Approaches......Page 87
2.3.1 Workload-Agnostic Partitioning Techniques......Page 88
2.3.2 Workload-Aware Partitioning Techniques......Page 89
2.4 Adaptive Approaches......Page 93
2.4.2 Detecting Affected Items......Page 94
2.4.3 Incremental Reconfiguration......Page 95
2.5 Data Directory......Page 97
2.6 Conclusion......Page 98
2.7 Bibliographic Notes......Page 99
Exercises......Page 101
3 Distributed Data Control......Page 105
3.1.1 Views in Centralized DBMSs......Page 106
3.1.2 Views in Distributed DBMSs......Page 109
3.1.3 Maintenance of Materialized Views......Page 110
3.2 Access Control......Page 116
3.2.1 Discretionary Access Control......Page 117
3.2.2 Mandatory Access Control......Page 120
3.2.3 Distributed Access Control......Page 122
3.3 Semantic Integrity Control......Page 124
3.3.1.1 Specification of Integrity Constraints......Page 125
3.3.1.2 Integrity Enforcement......Page 128
3.3.2 Distributed Semantic Integrity Control......Page 130
3.3.2.1 Definition of Distributed Integrity Constraints......Page 131
3.3.2.2 Enforcement of Distributed Integrity Constraints......Page 133
3.3.2.3 Summary of Distributed Integrity Control......Page 136
3.5 Bibliographic Notes......Page 137
Exercises......Page 139
4 Distributed Query Processing......Page 142
4.1.1 Query Processing Problem......Page 143
4.1.2.1 Search Space......Page 146
4.1.2.2 Cost Model......Page 147
4.1.2.3 Search Strategy......Page 148
4.1.3 Layers Of Query Processing......Page 149
4.1.3.1 Query Decomposition......Page 150
4.1.3.2 Data Localization......Page 151
4.1.3.4 Distributed Execution......Page 152
4.2 Data Localization......Page 153
4.2.1.1 Reduction with Selection......Page 154
4.2.2 Reduction with Join......Page 155
4.2.3 Reduction for Vertical Fragmentation......Page 156
4.2.4 Reduction for Derived Fragmentation......Page 158
4.2.5 Reduction for Hybrid Fragmentation......Page 161
4.3.1 Join Trees......Page 162
4.3.2 Join Ordering......Page 164
4.3.3 Semijoin-Based Algorithms......Page 166
4.3.4 Join Versus Semijoin......Page 169
4.4.1 Cost Functions......Page 170
4.4.2 Database Statistics......Page 172
4.5.1 Dynamic Approach......Page 174
4.5.2 Static Approach......Page 178
4.5.3 Hybrid Approach......Page 182
4.6 Adaptive Query Processing......Page 186
4.6.1 Adaptive Query Processing Process......Page 187
4.6.1.2 Adaptive Reactions......Page 188
4.6.2 Eddy Approach......Page 189
4.7 Conclusion......Page 190
4.8 Bibliographic Notes......Page 191
Exercises......Page 193
5 Distributed Transaction Processing......Page 196
5.1 Background and Terminology......Page 197
5.2 Distributed Concurrency Control......Page 201
5.2.1 Locking-Based Algorithms......Page 202
5.2.1.1 Centralized 2PL......Page 203
5.2.1.2 Distributed 2PL......Page 206
5.2.1.3 Distributed Deadlock Management......Page 207
5.2.2 Timestamp-Based Algorithms......Page 210
5.2.2.1 Basic TO Algorithm......Page 211
5.2.2.2 Conservative TO Algorithm......Page 214
5.2.3 Multiversion Concurrency Control......Page 216
5.2.4 Optimistic Algorithms......Page 218
5.3 Distributed Concurrency Control Using Snapshot Isolation......Page 219
5.4 Distributed DBMS Reliability......Page 222
5.4.1 Two-Phase Commit Protocol......Page 224
5.4.2 Variations of 2PC......Page 230
5.4.2.1 Presumed Abort 2PC Protocol......Page 231
5.4.2.2 Presumed Commit 2PC Protocol......Page 232
5.4.3.1 Termination and Recovery Protocols for 2PC......Page 233
5.4.3.2 Three-Phase Commit Protocol......Page 239
5.4.4 Network Partitioning......Page 240
5.4.4.2 Voting-Based Protocols......Page 243
5.4.5 Paxos Consensus Protocol......Page 244
5.4.6 Architectural Considerations......Page 247
5.5 Modern Approaches to Scaling Out Transaction Management......Page 249
5.5.2 LeanXcale......Page 250
5.6 Conclusion......Page 252
5.7 Bibliographic Notes......Page 254
Exercises......Page 257
6 Data Replication......Page 260
6.1.1 Mutual Consistency......Page 262
6.1.2 Mutual Consistency Versus Transaction Consistency......Page 264
6.2 Update Management Strategies......Page 265
6.2.1 Eager Update Propagation......Page 266
6.2.3 Centralized Techniques......Page 267
6.3 Replication Protocols......Page 268
6.3.1.1 Single Master with Limited Replication Transparency......Page 269
6.3.1.2 Single Master with Full Replication Transparency......Page 271
6.3.1.3 Primary Copy with Full Replication Transparency......Page 273
6.3.3 Lazy Centralized Protocols......Page 275
6.3.3.1 Single Master with Limited Transparency......Page 276
6.3.3.2 Single Master or Primary Copy with Full Replication Transparency......Page 278
6.3.4 Lazy Distributed Protocols......Page 281
6.4 Group Communication......Page 282
6.5 Replication and Failures......Page 285
6.5.2 Failures and Eager Replication......Page 286
6.6 Conclusion......Page 289
6.7 Bibliographic Notes......Page 290
Exercises......Page 292
7 Database Integrationβ€”Multidatabase Systems......Page 294
7.1 Database Integration......Page 295
7.1.1 Bottom-Up Design Methodology......Page 296
7.1.2 Schema Matching......Page 300
7.1.2.1 Schema Heterogeneity......Page 302
7.1.2.2 Linguistic Matching Approaches......Page 304
7.1.2.3 Constraint-Based Matching Approaches......Page 306
7.1.2.4 Learning-Based Matching......Page 307
7.1.2.5 Combined Matching Approaches......Page 308
7.1.3 Schema Integration......Page 309
7.1.4.1 Mapping Creation......Page 311
7.1.4.2 Mapping Maintenance......Page 317
7.1.5 Data Cleaning......Page 319
7.2 Multidatabase Query Processing......Page 320
7.2.1 Issues in Multidatabase Query Processing......Page 321
7.2.2 Multidatabase Query Processing Architecture......Page 322
7.2.3 Query Rewriting Using Views......Page 324
7.2.3.2 Rewriting in GAV......Page 325
7.2.3.3 Rewriting in LAV......Page 327
7.2.4.1 Heterogeneous Cost Modeling......Page 330
7.2.4.2 Heterogeneous Query Optimization......Page 337
7.2.5 Query Translation and Execution......Page 342
7.3 Conclusion......Page 345
7.4 Bibliographic Notes......Page 347
Exercises......Page 350
8 Parallel Database Systems......Page 361
8.1 Objectives......Page 362
8.2 Parallel Architectures......Page 364
8.2.1 General Architecture......Page 365
8.2.2.1 Uniform Memory Access (UMA)......Page 367
8.2.2.2 Non-Uniform Memory Access (NUMA)......Page 368
8.2.3 Shared-Disk......Page 369
8.2.4 Shared-Nothing......Page 370
8.3 Data Placement......Page 371
8.4.1 Parallel Algorithms for Data Processing......Page 374
8.4.1.1 Parallel Sort Algorithms......Page 375
8.4.1.2 Parallel Join Algorithms......Page 376
8.4.2.1 Search Space......Page 381
8.4.2.2 Cost Model......Page 384
8.4.2.3 Search Strategy......Page 385
Initialization......Page 386
Skew......Page 387
8.5.2 Intraoperator Load Balancing......Page 388
8.5.4 Intraquery Load Balancing......Page 390
8.6 Fault-Tolerance......Page 395
8.7 Database Clusters......Page 396
8.7.1 Database Cluster Architecture......Page 397
8.7.3 Load Balancing......Page 398
8.7.4 Query Processing......Page 399
8.9 Bibliographic Notes......Page 402
Exercises......Page 404
9 Peer-to-Peer Data Management......Page 407
9.1 Infrastructure......Page 410
9.1.1 Unstructured P2P Networks......Page 411
9.1.2 Structured P2P Networks......Page 414
9.1.3 Superpeer P2P Networks......Page 418
9.2.1 Pairwise Schema Mapping......Page 420
9.2.2 Mapping Based on Machine Learning Techniques......Page 421
9.2.3 Common Agreement Mapping......Page 422
9.3 Querying Over P2P Systems......Page 423
9.3.1.1 Basic Techniques......Page 424
9.3.1.2 Top-k Queries in Unstructured Systems......Page 431
9.3.1.3 Top-k Queries in DHTs......Page 433
9.3.1.4 Top-k Queries in Superpeer Systems......Page 435
9.3.2 Join Queries......Page 436
9.3.3 Range Queries......Page 437
9.4 Replica Consistency......Page 440
9.4.1 Basic Support in DHTs......Page 441
9.4.2 Data Currency in DHTs......Page 443
9.4.3.1 OceanStore......Page 444
9.4.3.3 APPA......Page 446
9.5 Blockchain......Page 448
9.5.1 Blockchain Definition......Page 449
9.5.2 Blockchain Infrastructure......Page 450
9.5.2.2 Grouping Transactions into Blocks......Page 451
9.5.2.3 Block Validation by Consensus......Page 453
9.5.3 Blockchain 2.0......Page 454
9.5.4 Issues......Page 455
9.6 Conclusion......Page 456
9.7 Bibliographic Notes......Page 457
Exercises......Page 459
10 Big Data Processing......Page 461
10.1 Distributed Storage Systems......Page 463
10.1.1 Google File System......Page 465
10.1.2 Combining Object Storage and File Storage......Page 466
10.2 Big Data Processing Frameworks......Page 467
10.2.1 MapReduce Data Processing......Page 468
10.2.1.1 MapReduce Architecture......Page 470
10.2.1.2 High-Level Languages for MapReduce......Page 472
10.2.1.3 MapReduce Implementation of Database Operators......Page 473
10.2.2 Data Processing Using Spark......Page 478
10.3 Stream Data Management......Page 482
10.3.1 Stream Models, Languages, and Operators......Page 484
10.3.1.1 Data Models......Page 485
10.3.1.3 Streaming Operators and Their Implementation......Page 486
10.3.2 Query Processing over Data Streams......Page 488
10.3.2.1 Windowed Query Execution......Page 489
10.3.2.2 Load Management......Page 490
10.3.2.3 Out-of-Order Processing......Page 491
10.3.2.4 Multiquery Optimization......Page 492
10.3.2.5 Parallel Data Stream Processing......Page 493
10.3.3 DSS Fault-Tolerance......Page 495
10.4 Graph Analytics Platforms......Page 498
10.4.1 Graph Partitioning......Page 501
10.4.2 MapReduce and Graph Analytics......Page 506
10.4.3 Special-Purpose Graph Analytics Systems......Page 507
10.4.4 Vertex-Centric Block Synchronous......Page 510
10.4.5 Vertex-Centric Asynchronous......Page 513
10.4.6 Vertex-Centric Gather-Apply-Scatter......Page 515
10.4.7 Partition-Centric Block Synchronous Processing......Page 516
10.4.9 Partition-Centric Gather-Apply-Scatter......Page 518
10.4.12 Edge-Centric Gather-Apply-Scatter......Page 519
10.5.1 Data Lake Versus Data Warehouse......Page 520
10.5.2 Architecture......Page 522
10.5.3 Challenges......Page 523
10.7 Bibliographic Notes......Page 524
Exercises......Page 528
11 NoSQL, NewSQL, and Polystores......Page 531
11.1 Motivations for NoSQL......Page 532
11.2 Key-Value Stores......Page 533
11.2.1 DynamoDB......Page 534
11.2.2 Other Key-Value Stores......Page 536
11.3.1 MongoDB......Page 537
11.3.2 Other Document Stores......Page 540
11.4.1 Bigtable......Page 541
11.5 Graph DBMSs......Page 543
11.5.1 Neo4j......Page 544
11.6 Hybrid Data Stores......Page 547
11.6.1 Multimodel NoSQL Stores......Page 548
11.6.2.1 F1......Page 549
11.6.2.2 LeanXcale......Page 550
11.7.1 Loosely Coupled Polystores......Page 552
11.7.1.1 BigIntegrator......Page 553
11.7.1.2 Forward......Page 555
11.7.2 Tightly Coupled Polystores......Page 556
11.7.2.1 Polybase......Page 558
11.7.2.2 HadoopDB......Page 559
11.7.2.3 Estocada......Page 560
11.7.3 Hybrid Systems......Page 561
11.7.3.1 Spark SQL......Page 562
11.7.3.2 CloudMdsQL......Page 563
11.7.4 Concluding Remarks......Page 565
11.8 Conclusion......Page 566
11.9 Bibliographic Notes......Page 567
Exercises......Page 568
12 Web Data Management......Page 570
12.1 Web Graph Management......Page 571
12.2 Web Search......Page 573
12.2.1 Web Crawling......Page 574
12.2.2.2 Text Index......Page 577
12.2.3 Ranking and Link Analysis......Page 578
12.2.4 Evaluation of Keyword Search......Page 579
12.3 Web Querying......Page 580
12.3.1 Semistructured Data Approach......Page 581
12.3.2 Web Query Language Approach......Page 585
12.4 Question Answering Systems......Page 591
12.5 Searching and Querying the Hidden Web......Page 595
12.5.1.1 Querying the Search Interface......Page 596
12.5.2 Metasearching......Page 597
12.5.2.2 Database Categorization......Page 598
12.6 Web Data Integration......Page 599
12.6.1 Web Tables/Fusion Tables......Page 600
12.6.2 Semantic Web and Linked Open Data......Page 601
12.6.2.1 XML......Page 602
12.6.2.2 RDF......Page 606
12.6.2.3 Navigating and Querying the LOD......Page 618
12.6.3 Data Quality Issues in Web Data Integration......Page 619
12.6.3.1 Cleaning Structured Web Data......Page 620
12.6.3.2 Web Data Fusion......Page 621
12.6.3.3 Web Source Quality......Page 622
12.7 Bibliographic Notes......Page 626
Exercises......Page 629
A Overview of Relational DBMS......Page 630
B Centralized Query Processing......Page 631
C Transaction Processing Fundamentals......Page 632
D Review of Computer Networks......Page 633
References......Page 634
Index......Page 670


πŸ“œ SIMILAR VOLUMES


Principles of Distributed Database Syste
✍ M. Tamer Ozsu πŸ“‚ Library πŸ“… 1990 πŸ› Prentice Hall 🌐 English

<P><B></B> Provides a comprehensive treatment of technical problems of distributed database systems from a holistic viewpoint. <B>KEYTOPICS:</B> Explores the development of distributed database management systemsβ€”focusing on concepts and technical issues. </P>

Principles Of Distributed Database Syste
✍ M. Tamer Γ–zsu, Patrick Valduriez πŸ“‚ Library πŸ“… 2020 πŸ› Springer 🌐 English

The fourth edition of this classic textbook provides major updates. This edition has completely new chapters on Big Data Platforms (distributed storage systems, MapReduce, Spark, data stream processing, graph analytics) and on NoSQL, NewSQL and polystore systems. It also includes an updated web data

Principles of Distributed Database Syste
✍ M. Tamer Γ–zsu, Patrick Valduriez (auth.) πŸ“‚ Library πŸ“… 2011 πŸ› Springer-Verlag New York 🌐 English

<p>This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud comp

Principles of Distributed Database Syste
✍ M. Tamer Γ–zsu, Patrick Valduriez (auth.) πŸ“‚ Library πŸ“… 2011 πŸ› Springer-Verlag New York 🌐 English

<p>This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud comp

Principles of database systems
✍ Jeffrey D Ullman πŸ“‚ Library πŸ“… 1980 πŸ› Computer Science Press 🌐 English

The comprehensive treatment of the theory and general principles relevant to database systems, as well as the many examples, exercises, and bibliographic notes included in each chapter, should make this book useful for database courses and for practising computer scientists.