Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault
Scribed by W.H. Inmon, Dan Linstedt
- Publisher: Morgan Kaufmann
- Year: 2014
- Language: English
- Pages: 342
- Edition: 1
- Category: Library
Synopsis
Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data, and everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within existing systems such as data warehouses. Looking at that larger picture gives the data scientist the context needed to see how the pieces of the puzzle should fit together. Most references on Big Data examine only one small part of a much larger whole. Until the data gathered can be placed into an existing framework or architecture, it can't be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.
Drawing upon years of practical experience, and using numerous examples and an easy-to-understand framework, W.H. Inmon and Daniel Linstedt define the importance of data architecture and show how it can be used effectively to harness Big Data within existing systems. You'll be able to:
- Turn textual information into a form that can be analyzed by standard tools
- Make the connection between analytics and Big Data
- Understand how Big Data fits within an existing systems environment
- Conduct analytics on repetitive and non-repetitive data

The book also:
- Discusses an often-overlooked source of value in Big Data, non-repetitive data, and why using it offers significant business value
- Shows how to turn textual information into a form that can be analyzed by standard tools
- Explains how Big Data fits within an existing systems environment
- Presents new opportunities afforded by the advent of Big Data
- Demystifies the murky waters of repetitive and non-repetitive data in Big Data
Table of Contents
Front matter, Page iii
Copyright, Page iv
Dedication, Page v
Preface, Pages xvii-xix
About the Authors, Page xxi
1.1 - Corporate Data, Pages 1-7
1.2 - The Data Infrastructure, Pages 9-14
1.3 - The "Great Divide", Pages 15-20
1.4 - Demographics of Corporate Data, Pages 21-25
1.5 - Corporate Data Analysis, Pages 27-31
1.6 - The Life Cycle of Data: Understanding Data Over Time, Pages 33-37
1.7 - A Brief History of Data, Pages 39-44
2.1 - A Brief History of Big Data, Pages 45-48
2.2 - What is Big Data?, Pages 49-55
2.3 - Parallel Processing, Pages 57-62
2.4 - Unstructured Data, Pages 63-70
2.5 - Contextualizing Repetitive Unstructured Data, Pages 71-72
2.6 - Textual Disambiguation, Pages 73-81
2.7 - Taxonomies, Pages 83-90
3.1 - A Brief History of Data Warehouse, Pages 91-100
3.2 - Integrated Corporate Data, Pages 101-109
3.3 - Historical Data, Pages 111-113
3.4 - Data Marts, Pages 115-119
3.5 - The Operational Data Store, Pages 121-126
3.6 - What a Data Warehouse is Not, Pages 127-132
4.1 - Introduction to Data Vault, Pages 133-137
4.2 - Introduction to Data Vault Modeling, Pages 139-147
4.3 - Introduction to Data Vault Architecture, Pages 149-153
4.4 - Introduction to Data Vault Methodology, Pages 155-162
4.5 - Introduction to Data Vault Implementation, Pages 163-168
5.1 - The Operational Environment: A Short History, Pages 169-175
5.2 - The Standard Work Unit, Pages 177-180
5.3 - Data Modeling for the Structured Environment, Pages 181-188
5.4 - Metadata, Pages 189-194
5.5 - Data Governance of Structured Data, Pages 195-198
6.1 - A Brief History of Data Architecture, Pages 199-209
6.2 - Big Data/Existing Systems Interface, Pages 211-218
6.3 - The Data Warehouse/Operational Environment Interface, Pages 219-224
6.4 - Data Architecture: A High-Level Perspective, Pages 225-229
7.1 - Repetitive Analytics β Some Basics, Pages 231-248
7.2 - Analyzing Repetitive Data, Pages 249-257
7.3 - Repetitive Analysis, Pages 259-266
8.1 - Nonrepetitive Data, Pages 267-285
8.2 - Mapping, Pages 287-289
8.3 - Analytics from Nonrepetitive Data, Pages 291-304
9.1 - Operational Analytics, Pages 305-312
10.1 - Operational Analytics, Pages 313-321
11.1 - Personal Analytics, Pages 323-328
12.1 - A Composite Data Architecture, Pages 329-333
Glossary, Pages 335-344
Index, Pages 345-355
Similar Volumes
Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of each architecture to help
What constitutes a data practice and how do contemporary digital media technologies reconfigure our understanding of practices in general? Autonomously acting media, distributed digital infrastructures, and sensor-based media environments challenge the conditions of accounting for data practices