Mathematical Foundations of Data Science
Scribed by Tomas Hrycej, Bernhard Bermeitinger, Matthias Cetto, Siegfried Handschuh
- Publisher: Springer
- Year: 2023
- Tongue: English
- Leaves: 218
- Series: Texts in Computer Science
- Edition: 1
- Category: Library
✦ Synopsis
This textbook aims to point out the most important principles of data analysis from the mathematical point of view. Specifically, it selects the following questions for exploration: Which principles are necessary to understand the implications of an application, and which are necessary to understand the conditions for the success of the methods used? Theory is presented only to the degree necessary to apply it properly, striving for a balance between excessive complexity and oversimplification. Its primary focus is on principles crucial for application success.
Topics and features:
- Focuses on approaches supported by mathematical arguments, rather than on computing experience alone
- Investigates conditions under which numerical algorithms used in data science operate, and what performance can be expected from them
- Considers key data science problems: problem formulation including optimality measure; learning and generalization in relationships to training set size and number of free parameters; and convergence of numerical algorithms
- Examines original mathematical disciplines (statistics, numerical mathematics, system theory) as they are specifically relevant to a given problem
- Addresses the trade-off between model size and volume of data available for its identification and its consequences for model parametrization
- Investigates the mathematical principles involved in natural language processing and computer vision
- Keeps subject coverage intentionally compact, focusing on key issues of each topic to encourage full comprehension of the entire book
Although this core textbook is aimed directly at students of computer science and/or data science, it will also appeal to researchers in the field who want to gain a proper understanding of the mathematical foundations “beyond” the sole computing experience.
✦ Table of Contents
Preface
For Whom Is This Book Written?
What Makes This Book Different?
Comprehension Checks
Acknowledgments
Contents
Acronyms
1 Data Science and Its Tasks
Mathematical Foundations
2 Application-Specific Mappings and Measuring the Fit to Data
2.1 Continuous Mappings
2.1.1 Nonlinear Continuous Mappings
2.1.2 Mappings of Probability Distributions
2.2 Classification
2.2.1 Special Case: Two Linearly Separable Classes
2.2.2 Minimum Misclassification Rate for Two Classes
2.2.3 Probabilistic Classification
2.2.4 Generalization to Multiple Classes
2.3 Dynamical Systems
2.4 Spatial Systems
2.5 Mappings Received by “Unsupervised Learning”
2.5.1 Representations with Reduced Dimensionality
2.5.2 Optimal Encoding
2.5.3 Clusters as Unsupervised Classes
2.6 Chapter Summary
2.7 Comprehension Check
3 Data Processing by Neural Networks
3.1 Feedforward and Feedback Networks
3.2 Data Processing by Feedforward Networks
3.3 Data Processing by Feedback Networks
3.4 Feedforward Networks with External Feedback
3.5 Interpretation of Network Weights
3.6 Connectivity of Layered Networks
3.7 Shallow Networks Versus Deep Networks
3.8 Chapter Summary
3.9 Comprehension Check
4 Learning and Generalization
4.1 Algebraic Conditions for Fitting Error Minimization
4.2 Linear and Nonlinear Mappings
4.3 Overdetermined Case with Noise
4.4 Noise and Generalization
4.5 Generalization in the Underdetermined Case
4.6 Statistical Conditions for Generalization
4.7 Idea of Regularization and Its Limits
4.7.1 Special Case: Ridge Regression
4.8 Cross-Validation
4.9 Parameter Reduction Versus Regularization
4.10 Chapter Summary
4.11 Comprehension Check
5 Numerical Algorithms for Data Science
5.1 Classes of Minimization Problems
5.1.1 Quadratic Optimization
5.1.2 Convex Optimization
5.1.3 Non-convex Local Optimization
5.1.4 Global Optimization
5.2 Gradient Computation in Neural Networks
5.3 Algorithms for Convex Optimization
5.4 Non-convex Problems with a Single Attractor
5.4.1 Methods with Adaptive Step Size
5.4.2 Stochastic Gradient Methods
5.5 Addressing the Problem of Multiple Minima
5.5.1 Momentum Term
5.5.2 Simulated Annealing
5.6 Section Summary
5.7 Comprehension Check
Applications
6 Specific Problems of Natural Language Processing
6.1 Word Embeddings
6.2 Semantic Similarity
6.3 Recurrent Versus Sequence Processing Approaches
6.4 Recurrent Neural Networks
6.5 Attention Mechanism
6.6 Autocoding and Its Modification
6.7 Transformer Encoder
6.7.1 Self-attention
6.7.2 Position-Wise Feedforward Networks
6.7.3 Residual Connection and Layer Normalization
6.8 Section Summary
6.9 Comprehension Check
7 Specific Problems of Computer Vision
7.1 Sequence of Convolutional Operators
7.1.1 Convolutional Layer
7.1.2 Pooling Layers
7.1.3 Implementations of Convolutional Neural Networks
7.2 Handling Invariances
7.3 Application of Transformer Architecture to Computer Vision
7.3.1 Attention Mechanism for Computer Vision
7.3.2 Division into Patches
7.4 Section Summary
7.5 Comprehension Check
Index
✦ Subjects
Data Science; Big Data; Statistical Learning; Machine Learning; Deep Learning; Artificial Neural Networks; Data Processing; Pattern Recognition; Learning and Generalization; Numerical Algorithms; Natural Language Processing; Computer Vision