Mathematical Foundations of Big Data Analytics

✍ Scribed by Vladimir Shikhman, David Müller


Publisher
Springer Gabler
Year
2021
Language
English
Pages
274
Category
Library


✦ Synopsis


In this textbook, basic mathematical models used in Big Data Analytics are presented, and application-oriented references to relevant practical issues are made. The necessary mathematical tools are examined and applied to current problems of data analysis, such as brand loyalty, portfolio selection, credit investigation, quality control, product clustering, asset pricing etc. – mainly in an economic context. In addition, we discuss interdisciplinary applications to biology, linguistics, sociology, electrical engineering, computer science and artificial intelligence. For the models, we make use of a wide range of mathematics – from the basic disciplines of numerical linear algebra, statistics and optimization to the more specialized game, graph and even complexity theories. By doing so, we cover all relevant techniques commonly used in Big Data Analytics.

Each chapter starts with a concrete practical problem whose primary aim is to motivate the study of a particular Big Data Analytics technique. Mathematical results then follow, including important definitions, auxiliary statements and the conclusions arising from them. Case studies help to deepen the acquired knowledge by applying it in an interdisciplinary context. Exercises serve to improve understanding of the underlying theory. Complete solutions to the exercises can be consulted by the interested reader at the end of the textbook; for those which have to be solved numerically, we provide descriptions of algorithms in Python code as supplementary material.

This textbook has been recommended and developed for university courses in Germany, Austria and Switzerland.
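The synopsis notes that algorithms for the numerically solved exercises are described in Python. As a flavor of the book's opening topic (Chapter 1, PageRank via the regularized web-surfing model), here is a minimal power-iteration sketch. It is illustrative only, not the book's supplementary code; the function name `pagerank` and the damping factor `alpha = 0.85` are our own conventional choices, not taken from the text.

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the PageRank vector of a link matrix.

    A[i, j] = 1 if page j links to page i, else 0.
    alpha is the damping factor of the regularized surfing model.
    """
    n = A.shape[0]
    # Column-stochastic transition matrix; a dangling page (no outlinks)
    # is treated as linking to every page uniformly.
    col_sums = A.sum(axis=0)
    P = np.where(col_sums > 0, A / np.where(col_sums == 0, 1, col_sums), 1.0 / n)
    x = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        # Damped update: follow a link with prob. alpha, jump uniformly otherwise.
        x_new = alpha * P @ x + (1 - alpha) / n
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    return x_new

# Three pages linked in a cycle 0 -> 1 -> 2 -> 0; by symmetry the
# ranking is uniform.
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
r = pagerank(A)
```

The damping term is what the book's table of contents calls "Regularization": it makes the transition matrix strictly positive, so the Perron-Frobenius theorem guarantees a unique stationary ranking and the iteration converges.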

✦ Table of Contents


Preface
Subject
Concept
Audience
Acknowledgment
Contents
1 Ranking
1.1 Motivation: Google Problem
1.2 Results
1.2.1 Perron-Frobenius Theorem
1.2.1.1 Eigenvector Problem
1.2.1.2 Feasibility System
1.2.1.3 Linear Duality
1.2.1.4 Existence
1.2.2 PageRank
1.2.2.1 Web Surfing
1.2.2.2 Oscillation
1.2.2.3 Regularization
1.2.2.4 Convergence Analysis of (Iα)
1.2.2.5 Google Ranking
1.2.2.6 Global Authority
1.3 Case Study: Brand Loyalty
1.4 Exercises
2 Online Learning
2.1 Motivation: Portfolio Selection
2.2 Results
2.2.1 Online Mirror Descent
2.2.1.1 Norm
2.2.1.2 Convexity
2.2.1.3 Prox-Function
2.2.1.4 Bregman Divergence
2.2.1.5 Online Learning Technique
2.2.1.6 Convergence Analysis of (OMD)
2.2.2 Entropic Setup
2.2.2.1 Manhattan and Maximum Norm
2.2.2.2 Loss Functions
2.2.2.3 Negative Entropy
2.2.2.4 Kullback-Leibler Divergence
2.2.2.5 Online Portfolio Selection
2.3 Case Study: Expert Advice
2.4 Exercises
3 Recommendation Systems
3.1 Motivation: Netflix Prize
3.2 Results
3.2.1 Neighborhood-Based Approach
3.2.1.1 Similarity Measures
3.2.1.2 k-Nearest Neighbors
3.2.2 Model-Based Approach
3.2.2.1 Singular Value Decomposition
3.2.2.2 Left- and Right-Singular Vectors
3.2.2.3 Reduction
3.2.2.4 Rank and Positive Singular Values
3.2.2.5 Low-Rank Approximation
3.2.2.6 Frobenius Norm
3.2.2.7 Eckart-Young-Mirsky Theorem
3.2.2.8 Matrix Factorization
3.2.2.9 Gradient Descent
3.3 Case Study: Latent Semantic Analysis
3.4 Exercises
4 Classification
4.1 Motivation: Credit Investigation
4.2 Results
4.2.1 Fisher's Discriminant Rule
4.2.1.1 Sample Mean
4.2.1.2 Sample Variance
4.2.1.3 Fisher's Discriminant
4.2.1.4 Linear Classifier
4.2.1.5 Maximum Likelihood Classifier
4.2.2 Support-Vector Machine
4.2.2.1 Separating Hyperplane
4.2.2.2 Maximum Margin
4.2.2.3 Dual Problem
4.2.2.4 Supporting Vectors
4.2.2.5 Regularization
4.2.2.6 Kernel Trick
4.2.2.7 Quadratic Kernel
4.3 Case Study: Quality Control
4.4 Exercises
5 Clustering
5.1 Motivation: DNA Sequencing
5.2 Results
5.2.1 k-Means
5.2.1.1 Total Dissimilarity
5.2.1.2 Naïve k-Means
5.2.1.3 Euclidean Setup
5.2.2 Spectral Clustering
5.2.2.1 Community Detection
5.2.2.2 Diffusion of Information
5.2.2.3 Spectral Decomposition
5.2.2.4 Diffusion Map
5.2.2.5 Dimensionality Reduction
5.3 Case Study: Topic Extraction
5.4 Exercises
6 Linear Regression
6.1 Motivation: Econometric Analysis
6.2 Results
6.2.1 Ordinary Least Squares
6.2.1.1 Maximum Likelihood Estimation
6.2.1.2 Normal Equation
6.2.1.3 Pseudoinverse
6.2.1.4 OLS Estimator
6.2.1.5 Gauss-Markov Theorem
6.2.1.6 Multicollinearity
6.2.1.7 Stability
6.2.2 Ridge Regression
6.2.2.1 Maximum A Posteriori Estimation
6.2.2.2 Ridge Estimator
6.2.2.3 Condition Number
6.2.2.4 Bias-Variance Tradeoff
6.3 Case Study: Capital Asset Pricing
6.4 Exercises
7 Sparse Recovery
7.1 Motivation: Variable Selection
7.2 Results
7.2.1 Lasso Regression
7.2.1.1 Spark
7.2.1.2 Basis Pursuit
7.2.1.3 Null Space Property
7.2.1.4 Maximum A Posteriori Estimation
7.2.2 Iterative Shrinkage-Thresholding Algorithm
7.2.2.1 Quadratic Overestimation
7.2.2.2 Soft-Thresholding
7.2.2.3 Proximal Gradient Descent
7.3 Case Study: Compressed Sensing
7.4 Exercises
8 Neural Networks
8.1 Motivation: Nerve Cells
8.2 Results
8.2.1 Logistic Regression
8.2.1.1 Logistic Model
8.2.1.2 Maximum Likelihood Estimation
8.2.1.3 Average Cross-Entropy
8.2.1.4 Stochastic Gradient Descent
8.2.1.5 Convergence Analysis of (SGD)
8.2.2 Perceptron
8.2.2.1 Rosenblatt Learning
8.2.2.2 Convergence Analysis of (RL)
8.2.2.3 XOR Problem
8.2.2.4 Multilayer Perceptron
8.3 Case Study: Spam Filtering
8.4 Exercises
9 Decision Trees
9.1 Motivation: Titanic Survival
9.2 Results
9.2.1 NP-Completeness
9.2.1.1 Decision Tree Problem
9.2.1.2 P versus NP
9.2.1.3 Exact Cover Problem
9.2.1.4 Polynomial-Time Reduction
9.2.1.5 Minimal External Path Length
9.2.1.6 Optimal Decision Tree
9.2.2 Top-Down and Bottom-Up Heuristics
9.2.2.1 Binary Classification
9.2.2.2 Generalization Error
9.2.2.3 Splitting
9.2.2.4 Iterative Dichotomizer
9.2.2.5 Pruning
9.3 Case Study: Chess Engine
9.4 Exercises
10 Solutions
10.1 Ranking
10.2 Online Learning
10.3 Recommendation Systems
10.4 Classification
10.5 Clustering
10.6 Linear Regression
10.7 Sparse Recovery
10.8 Neural Networks
10.9 Decision Trees
Bibliography
Index


📜 SIMILAR VOLUMES


Mathematical Foundations of Big Data Analytics
✍ Vladimir Shikhman 📂 Library 📅 2021 🏛 Springer Gabler 🌐 English

In this textbook, basic mathematical models used in Big Data Analytics are presented and application-oriented references to relevant practical issues are made. Necessary mathematical tools are examined and applied to current problems of data analysis, such as brand loyalty, portfolio selection…

Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics
✍ Fionn Murtagh 📂 Library 📅 2017 🏛 Chapman and Hall/CRC 🌐 English

"Data Science Foundations is most welcome and, indeed, a piece of literature that the field is very much in need of… quite different from most data analytics texts which largely ignore foundational concepts and simply present a cookbook of methods… a very useful text and I would certainly us…