Mathematical Foundations for Data Analysis
- Language: English
- Pages: 299
- Category: Library
Table of Contents
Preface
Acknowledgements
Contents
1 Probability Review
1.1 Sample Spaces
1.2 Conditional Probability and Independence
1.3 Density Functions
1.4 Expected Value
1.5 Variance
1.6 Joint, Marginal, and Conditional Distributions
1.7 Bayes' Rule
1.7.1 Model Given Data
1.8 Bayesian Inference
Exercises
2 Convergence and Sampling
2.1 Sampling and Estimation
2.2 Probably Approximately Correct (PAC)
2.3 Concentration of Measure
2.3.1 Markov Inequality
2.3.2 Chebyshev Inequality
2.3.3 Chernoff-Hoeffding Inequality
2.3.4 Union Bound and Examples
2.4 Importance Sampling
2.4.1 Sampling Without Replacement with Priority Sampling
Exercises
3 Linear Algebra Review
3.1 Vectors and Matrices
3.2 Addition and Multiplication
3.3 Norms
3.4 Linear Independence
3.5 Rank
3.6 Square Matrices and Properties
3.7 Orthogonality
Exercises
4 Distances and Nearest Neighbors
4.1 Metrics
4.2 Lp Distances and their Relatives
4.2.1 Lp Distances
4.2.2 Mahalanobis Distance
4.2.3 Cosine and Angular Distance
4.2.4 KL Divergence
4.3 Distances for Sets and Strings
4.3.1 Jaccard Distance
4.3.2 Edit Distance
4.4 Modeling Text with Distances
4.4.1 Bag-of-Words Vectors
4.4.2 k-Grams
4.5 Similarities
4.5.1 Set Similarities
4.5.2 Normed Similarities
4.5.3 Normed Similarities between Sets
4.6 Locality Sensitive Hashing
4.6.1 Properties of Locality Sensitive Hashing
4.6.2 Prototypical Tasks for LSH
4.6.3 Banding to Amplify LSH
4.6.4 LSH for Angular Distance
4.6.5 LSH for Euclidean Distance
4.6.6 Min Hashing as LSH for Jaccard Distance
Exercises
5 Linear Regression
5.1 Simple Linear Regression
5.2 Linear Regression with Multiple Explanatory Variables
5.3 Polynomial Regression
5.4 Cross-Validation
5.4.1 Other ways to Evaluate Linear Regression Models
5.5 Regularized Regression
5.5.1 Tikhonov Regularization for Ridge Regression
5.5.2 Lasso
5.5.3 Dual Constrained Formulation
5.5.4 Matching Pursuit
Exercises
6 Gradient Descent
6.1 Functions
6.2 Gradients
6.3 Gradient Descent
6.3.1 Learning Rate
6.4 Fitting a Model to Data
6.4.1 Least Mean Squares Updates for Regression
6.4.2 Decomposable Functions
Exercises
7 Dimensionality Reduction
7.1 Data Matrices
7.1.1 Projections
7.1.2 Sum of Squared Errors Goal
7.2 Singular Value Decomposition
7.2.1 Best Rank-k Approximation of a Matrix
7.3 Eigenvalues and Eigenvectors
7.4 The Power Method
7.5 Principal Component Analysis
7.6 Multidimensional Scaling
7.6.1 Why does Classical MDS work?
7.7 Linear Discriminant Analysis
7.8 Distance Metric Learning
7.9 Matrix Completion
7.10 Random Projections
Exercises
8 Clustering
8.1 Voronoi Diagrams
8.1.1 Delaunay Triangulation
8.1.2 Connection to Assignment-Based Clustering
8.2 Gonzalez's Algorithm for k-Center Clustering
8.3 Lloyd's Algorithm for k-Means Clustering
8.3.1 Lloyd's Algorithm
8.3.2 k-Means++
8.3.3 k-Medoid Clustering
8.3.4 Soft Clustering
8.4 Mixture of Gaussians
8.4.1 Expectation-Maximization
8.5 Hierarchical Clustering
8.6 Density-Based Clustering and Outliers
8.6.1 Outliers
8.7 Mean Shift Clustering
Exercises
9 Classification
9.1 Linear Classifiers
9.1.1 Loss Functions
9.1.2 Cross-Validation and Regularization
9.2 Perceptron Algorithm
9.3 Support Vector Machines and Kernels
9.3.1 The Dual: Mistake Counter
9.3.2 Feature Expansion
9.3.3 Support Vector Machines
9.4 Learnability and VC dimension
9.5 kNN Classifiers
9.6 Decision Trees
9.7 Neural Networks
9.7.1 Training with Back-propagation
10 Graph Structured Data
10.1 Markov Chains
10.1.1 Ergodic Markov Chains
10.1.2 Metropolis Algorithm
10.2 PageRank
10.3 Spectral Clustering on Graphs
10.3.1 Laplacians and their EigenStructures
10.4 Communities in Graphs
10.4.1 Preferential Attachment
10.4.2 Betweenness
10.4.3 Modularity
Exercises
11 Big Data and Sketching
11.1 The Streaming Model
11.1.1 Mean and Variance
11.1.2 Reservoir Sampling
11.2 Frequent Items
11.2.1 Warm-Up: Majority
11.2.2 Misra-Gries Algorithm
11.2.3 Count-Min Sketch
11.2.4 Count Sketch
11.3 Matrix Sketching
11.3.1 Covariance Matrix Summation
11.3.2 Frequent Directions
11.3.3 Row Sampling
11.3.4 Random Projections and Count Sketch Hashing
Exercises
Index
SIMILAR VOLUMES
In recent years, the fields of crime analysis and environmental criminology have grown in prominence for their advances in understanding crime. This book offers a theoretical and methodological introduction to crime analysis, covering the main techniques used in the analysis of crime and
How to reveal, characterize, and exploit the structure in data? Meeting this central challenge of modern data science requires the development of new mathematical approaches to data analysis, going beyond traditional statistical methods. Fruitful mathematical methods can originate in geometry, top