Making use of data is no longer a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it seems at least in principle possible to solve any problem. However, successful data science projects result from the intelligent application
GUIDE TO INTELLIGENT DATA SCIENCE: how to intelligently make use of.
- Publisher
- SPRINGER NATURE
- Year
- 2020
- Language
- English
- Pages
- 427
- Edition
- 2
- Category
- Library
Table of Contents
Guide to Intelligent Data Science
Preface
Contents
Symbols
1 Introduction
1.1 Motivation
1.1.1 Data and Knowledge
1.1.2 Tycho Brahe and Johannes Kepler
1.1.3 Intelligent Data Science
1.2 The Data Science Process
1.3 Methods, Tasks, and Tools
1.4 How to Read This Book
References
2 Practical Data Science: An Example
2.1 The Setup
2.2 Data Understanding and Pattern Finding
2.3 Explanation Finding
2.4 Predicting the Future
2.5 Concluding Remarks
3 Project Understanding
3.1 Determine the Project Objective
3.2 Assess the Situation
3.3 Determine Analysis Goals
3.4 Further Reading
References
4 Data Understanding
4.1 Attribute Understanding
4.2 Data Quality
4.3 Data Visualization
4.4 Correlation Analysis
4.5 Outlier Detection
4.5.1 Outlier Detection for Single Attributes
4.5.2 Outlier Detection for Multidimensional Data
4.6 Missing Values
4.7 A Checklist for Data Understanding
4.8 Data Understanding in Practice
4.8.1 Visualizing the Iris Data
References
5 Principles of Modeling
5.1 Model Classes
5.2 Fitting Criteria and Score Functions
5.3 Algorithms for Model Fitting
5.3.1 Closed-Form Solutions
5.3.2 Gradient Method
5.4 Types of Errors
5.5 Model Validation
5.5.1 Training and Test Data
5.5.2 Cross-Validation
5.5.3 Bootstrapping
5.6 Model Errors and Validation in Practice
5.6.1 Scoring Models for Classification
5.7 Further Reading
References
6 Data Preparation
6.1 Select Data
6.1.1 Feature Selection
6.2 Clean Data
6.2.1 Improve Data Quality
6.2.2 Missing Values
6.3 Construct Data
6.3.1 Provide Operability
6.4 Complex Data Types
6.5 Data Integration
6.5.1 Vertical Data Integration
6.5.2 Horizontal Data Integration
6.6 Data Preparation in Practice
6.6.1 Removing Empty or Almost Empty Attributes and Records in a Data Set
6.7 Further Reading
References
7 Finding Patterns
7.1 Hierarchical Clustering
7.2 Notion of (Dis-)Similarity
7.3 Prototype- and Model-Based Clustering
7.3.1 Overview
7.4 Density-Based Clustering
7.4.1 Overview
7.5 Self-organizing Maps
7.5.1 Overview
7.6 Frequent Pattern Mining and Association Rules
7.6.1 Overview
7.6.2 Construction
7.7 Deviation Analysis
7.7.1 Overview
7.7.2 Construction
7.8 Finding Patterns in Practice
7.8.1 Hierarchical Clustering
7.9 Further Reading
References
8 Finding Explanations
8.1 Decision Trees
8.1.1 Overview
8.2 Bayes Classifiers
8.2.1 Overview
8.2.2 Construction
8.3 Regression
8.3.1 Overview
8.4 Rule learning
8.4.1 Propositional Rules
8.4.1.1 Extracting Rules from Decision Trees
8.4.1.2 Extracting Propositional Rules
8.5 Finding Explanations in Practice
8.5.1 Decision Trees
8.6 Further Reading
References
9 Finding Predictors
9.1 Nearest-Neighbor Predictors
9.1.1 Overview
9.2 Artificial Neural Networks
9.2.1 Overview
9.3 Deep Learning
9.3.1 Recurrent Neural Networks and Long-Short Term Memory Units
9.4 Support Vector Machines
9.5 Ensemble Methods
9.5.1 Overview
9.5.2 Construction
9.5.3 Variations and Issues
9.5.3.1 Tree Ensembles and Random Forests (Bagging)
9.6 Finding Predictors in Practice
9.6.1 k Nearest Neighbor (kNN)
9.7 Further Reading
References
10 Deployment and Model Management
10.1 Model Deployment
10.1.1 Interactive Applications
10.1.2 Model Scoring as a Service
10.1.3 Model Representation Standards
10.1.4 Frequent Causes for Deployment Failures
10.2 Model Management
10.2.1 Model Updating and Retraining
10.3 Model Deployment and Management in Practice
10.3.1 Deployment to a Dashboard
References
A Statistics
A.1 Terms and Notation
A.2 Descriptive Statistics
A.2.1 Tabular Representations
A.3 Probability Theory
A.3.1 Probability
A.3.1.1 Intuitive Notions of Probability
A.3.1.2 The Formal Definition of Probability
A.3.2 Basic Methods and Theorems
A.3.2.1 Combinatorial Methods
A.3.2.2 Geometric Probabilities
A.3.2.3 Conditional Probability and Independent Events
A.3.2.4 Total Probability and Bayes' Rule
A.3.2.5 Bernoulli's Law of Large Numbers
A.3.3 Random Variables
A.3.3.1 Real-Valued Random Variables
A.3.3.2 Discrete Random Variables
A.3.3.3 Continuous Random Variables
A.3.3.4 Random Vectors
A.4 Inferential Statistics
A.4.1 Random Samples
A.4.2 Parameter Estimation
A.4.2.1 Point Estimation
A.4.2.2 Point Estimation Examples
A.4.2.3 Maximum Likelihood Estimation
A.4.2.4 Maximum Likelihood Estimation Example
A.4.2.5 Maximum A Posteriori Estimation
A.4.2.6 Maximum A Posteriori Estimation Example
A.4.2.7 Interval Estimation
A.4.2.8 Interval Estimation Examples
A.4.3 Hypothesis Testing
A.4.3.1 Error Types and Significance Level
A.4.3.2 Parameter Test
A.4.3.3 Parameter Test Example
A.4.3.4 Power of a Hypothesis Test
A.4.3.5 Goodness-of-Fit Test
A.4.3.6 Goodness-of-Fit Test Example
A.4.3.7 (In)Dependence Test
B KNIME
B.1 Installation and Overview
B.2 Building Workflows
B.3 Example Workflow
References
Index
SIMILAR VOLUMES
Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher bandwidth data connections. This makes it easy to believe that we can now - at least in principle - solve any problem we are faced with so long as we o
Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher bandwidth data connections. This makes it easy to believe that we can now - at least in principle - solve any problem we are faced with so long as we only ha
What do you really know about your competitors, and potential competitors? What are the real threats your business faces in the next two years? What do your competitors know about you, how did they find out about it and how can you stop them finding out more? In many ways the challenges and risks fa