Statistical Models and Methods for Data Science (Studies in Classification, Data Analysis, and Knowledge Organization)
Edited by Leonardo Grilli, Monia Lupparelli, Carla Rampichini, Emilia Rocco, and Maurizio Vichi
- Publisher: Springer
- Year: 2023
- Language: English
- Pages: 186
- Edition: 1st ed. 2023
- Category: Library
Synopsis
This book focuses on methods and models in classification and data analysis and presents real-world applications at the interface with data science. Numerous topics are covered, ranging from statistical inference and modelling to clustering and factorial methods, and from directional data analysis to time series analysis and small area estimation. The applications deal with new developments in a variety of fields, including medicine, finance, engineering, marketing, and cyber risk.
The contents comprise selected and peer-reviewed contributions presented at the 13th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society, CLADAG 2021, held (online) in Florence, Italy, on September 9–11, 2021. CLADAG promotes advanced methodological research in multivariate statistics with a special focus on data analysis and classification, and supports the exchange and dissemination of ideas, methodological concepts, numerical methods, algorithms, and computational and applied results at the interface between classification and data science.
Table of Contents
Preface
Contents
Clustering Financial Time Series by Dependency
1 Introduction
2 Conditional Heteroscedastic Models
3 Procedure for Clustering Time Series by Dependency
4 Simulation Study
5 Real Data Example
6 Conclusions
References
The Homogeneity Index as a Measure of Interrater Agreement for Ratings on a Nominal Scale
1 Introduction
1.1 Aims of This Contribution
1.2 Measures of Interrater Agreement for Nominal Scales
2 Target-specific Measures of Interrater Agreement for Nominal Scales
3 Application
4 Conclusion
References
Hierarchical Clustering of Income Data Based on Share Densities
1 Introduction
2 Lorenz Curve and Share Density: A Parametric Approach
3 Hierarchical Algorithm Based on JS Dissimilarity
4 An Application
5 Conclusion
References
Optimal Coding of High-Cardinality Categorical Data in Machine Learning
1 Introduction
2 Quantify Categorical Features: A Review of Existing Methods
2.1 Methods that Do Not Consider the Target or the Other Variables
2.2 Encoders Requiring only the Target
2.3 One-Hot Encoding (OHE)
3 Single and Multiple Quantifications by OHE
4 Category Embedding by Neural Networks
5 Non-linear Encoding in the Unsupervised Case
6 Conclusions
References
Bayesian Multivariate Analysis of Mixed Data
1 Introduction
2 Bayesian Model Development
2.1 Moment Representation
2.2 Canonical Representation
3 Real Data Application
4 Conclusion and Next Steps
References
Marginals Matrix Under a Generalized Mallows Model Based on the Power Divergence
1 Introduction
2 Modeling Rank Data
2.1 Distances on Permutations
2.2 Generalized Mallows Model Based on the Power Divergence
2.3 Marginals Model
2.4 Model Comparisons
3 Marginals Matrix Under GMM
3.1 Marginals Matrix Structure Under Hoeffding Distances
3.2 Special Cases
4 Illustrative Example
5 Concluding Remarks
References
Time Series Clustering Based on Forecast Distributions: An Empirical Analysis on Production Indices for Construction
1 Introduction
2 The Clustering Procedure
3 An Application to the European Construction Sector
4 Concluding Remarks
References
Partial Reconstruction of Measures from Halfspace Depth
1 The Depth Characterization/Reconstruction Problem
2 Preliminaries: Flag Halfspaces and Central Regions
2.1 Minimizing Halfspaces and Flag Halfspaces
2.2 Halfspace Depth Central Regions
3 Main Result
4 Examples
5 Conclusion
6 Proof of Theorem 1
References
Posterior Predictive Assessment of IRT Models via the Hellinger Distance: A Simulation Study
1 Introduction
2 IRT Models
3 PPMC and Discrepancy Measures for IRT Models
4 Simulation Study
5 Empirical Application
6 Concluding Remarks
References
Shapley-Lorenz Values for Credit Risk Management
1 Introduction
2 Methodology
2.1 Binary Classification
2.2 The Shapley-Lorenz Decomposition for Credit Risk Data
3 Algorithm
4 Application
4.1 Data
4.2 Results
5 Concluding Remarks
References
A Study of Lack-of-Fit Diagnostics for Models Fit to Cross-Classified Binary Variables
1 Introduction
2 Marginal Proportions
2.1 First- and Second-Order Marginals
2.2 Higher Order Marginals
3 Lack-of-Fit Statistics
3.1 The $\mathrm{GFfit}^{\perp}_{(ij)}$ Statistic
3.2 Adjusted Residuals
3.3 The $\bar{\chi}^2_{ij}$ Statistic
4 Simulation Studies
4.1 Type I Error Study
4.2 Estimated Mean and Variance of the Statistics
4.3 Power Study for Eight Variables
5 Application
6 Conclusions
References
Robust Response Transformations for Generalized Additive Models via Additivity and Variance Stabilization
1 Introduction
2 Generalized Additive Models and the Structure of AVAS
2.1 Introduction
2.2 Backfitting
2.3 The AVAS Algorithm
2.4 The Numerical Variance Stabilizing Transformation
3 Robustness and Outlier Detection
3.1 Robust Regression
3.2 Robust Outlier Detection
4 Improvements and Options
4.1 Initial Calculations
4.2 Outer Loop
5 Simulations
6 The Generalized Star Plot
7 Prediction of the Weight of Fish
References
A Random-Coefficients Analysis with a Multivariate Random-Coefficients Linear Model
1 Introduction
2 The Model and the Analysis of the Random Coefficients
3 Application Study
4 Conclusions
5 Appendix
References
Parsimonious Mixtures of Matrix-Variate Shifted Exponential Normal Distributions
1 Introduction
2 Methodology
2.1 Parsimonious Mixtures of Matrix-Variate Shifted Exponential Normal Distributions
2.2 Maximum Likelihood Estimation
3 Real Data Example
4 Conclusions
References
Author Index