Classification and Data Science in the Digital Age (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Scribed by Paula Brito (editor), José G. Dias (editor), Berthold Lausen (editor), Angela Montanari (editor), Rebecca Nugent (editor)

Publisher: Springer
Year: 2023
Tongue: English
Leaves: 393
Category: Library

✦ Synopsis

The contributions gathered in this open access book focus on modern methods for data science and classification and present a series of real-world applications. Numerous research topics are covered, ranging from statistical inference and modeling to clustering and dimension reduction, from functional data analysis to time series analysis, and network analysis. The applications reflect new analyses in a variety of fields, including medicine, marketing, genetics, engineering, and education.

The book comprises selected and peer-reviewed papers presented at the 17th Conference of the International Federation of Classification Societies (IFCS 2022), held in Porto, Portugal, July 19–23, 2022. The IFCS federates the classification societies, and its biennial conference brings together researchers and stakeholders in the areas of Data Science, Classification, and Machine Learning. The conference provides a forum for presenting high-quality theoretical and applied work and for promoting and fostering interdisciplinary research and international cooperation. The intended audience is researchers and practitioners who seek the latest developments and applications in the field of data science and classification.

✦ Table of Contents

Preface
Acknowledgements
Partners & Sponsors
Contents
A Topological Clustering of Individuals
1 Introduction
2 Topological Context
2.1 Reference Adjacency Matrices
2.2 Topological Analysis - Selective Review
3 Illustrative Example
4 Conclusion
References
Model Based Clustering of Functional Data with Mild Outliers
1 Introduction
2 The Model
3 Model Inference
4 Applications
5 Conclusion
References
A Trivariate Geometric Classification of Decision Boundaries for Mixtures of Regressions
1 Introduction
2 Mixtures of Regressions
3 Decision Boundaries: Generality
3.1 The Case with G = 2
4 Geometrical Classification of Decision Boundaries with G = 2 and D = 3
5 Beyond Gaussian Assumptions: t-distribution in d = 2
6 Conclusions
References
Generalized Spatio-temporal Regression with PDE Penalization
1 Introduction
2 Application to Criminality Data
References
A New Regression Model for the Analysis of Microbiome Data
1 Introduction
2 Statistical Models for Microbiome Data
2.1 Count Distributions
2.2 Regression Models
3 A Gut Microbiome Application
References
Stability of Mixed-type Cluster Partitions for Determination of the Number of Clusters
1 Introduction
2 Stability of Cluster Partitions
3 Simulation Study
3.1 Data Generation and Execution of Simulation Study
3.2 Analysis of the Results
4 Conclusion
References
A Review on Official Survey Item Classification for Mixed-Mode Effects Adjustment
1 Introduction
2 Methods
3 Results
4 Content Analysis
5 Conclusions
References
Clustering and Blockmodeling Temporal Networks – Two Indirect Approaches
1 Temporal Networks
2 Traditional (Generalized) Blockmodeling Scheme
3 BM of Temporal Networks
3.1 Adapted Symbolic Clustering Methods
3.2 Clustering of Temporal Network and CRC
3.3 Block Model
4 Example: September 11th Reuters Terror News
5 Conclusions
References
Latent Block Regression Model
1 Introduction
2 From Clusterwise Regression to Co-clusterwise Regression
2.1 Latent Block Model (LBM)
2.2 Latent Block Regression Model (LBRM)
3 Variational EM Algorithm
4 Experimental Results
5 Conclusion
References
Using Clustering and Machine Learning Methods to Provide Intelligent Grocery Shopping Recommendations
1 Introduction
2 Materials and Methods
2.1 Data Considered
2.2 Data Normalization
2.3 Further Data Preprocessing Steps
2.4 Data Clustering
3 Application of Supervised Machine Learning Methods
4 Conclusion
References
COVID-19 Pandemic: a Methodological Model for the Analysis of Government’s Preventing Measures and Health Data Records
1 Introduction
2 Methodology
2.1 Data
2.2 Data Analysis
3 Results
4 Discussion
References
pcTVI: Parallel MDP Solver Using a Decomposition into Independent Chains
1 Introduction
2 Related Work
3 Problem Definition
4 Parallel-chained TVI
5 Empirical Evaluation
6 Conclusion
References
Three-way Spectral Clustering
1 Introduction
2 Spectral Clustering
3 A Graphical Approach for Parameter Selection
4 Three-way Spectral Clustering
5 A Real Data Application
6 Conclusion
References
Improving Classification of Documents by Semi-supervised Clustering in a Semantic Space
1 Introduction
2 Related Work
2.1 Representation by Matrix Factorization Methods
2.2 Neural Network Word Embeddings
2.3 Methods for Simultaneous Clustering and Factor Extraction
3 Reduced k-Means with Penalization
4 Experiment
4.1 Design of Experiment
4.2 Results
5 Conclusions and Further Work
References
Trends in Data Stream Mining
1 Introduction
2 Fraud Detection: a Case Study
3 Learning to Learn Hyperparameters
4 Conclusions
References
Old and New Constraints in Model Based Clustering
1 Introduction
2 The New Constraints
3 An Illustration Example of the New Constraints
References
Clustering Student Mobility Data in 3-way Networks
1 Introduction
2 Simplification of 3-way Networks
2.1 Main Findings
3 Concluding Remarks
References
Clustering Brain Connectomes Through a Density-peak Approach
1 Introduction
2 Related Work
3 Methods
3.1 Original DP
3.2 DP-KDE
3.3 Graph Clustering
4 Empirical Analysis
5 Concluding Remarks
References
Similarity Forest for Time Series Classification
1 Introduction
2 Classification Methods Used in Comparison
2.1 General Method of Random Forest Construction
2.2 Classical Random Forest
2.3 Similarity Forest
2.4 Random Forest vs Similarity Forest
3 Experimental Setup
4 Results
5 Conclusions
References
Detection of the Biliary Atresia Using Deep Convolutional Neural Networks Based on Statistical Learning Weights via Optimal Similarity and Resampling Methods
1 Introduction
2 Background
3 Proposed Method
3.1 Description of Related Procedures of the Convolution
3.2 Setting Conditions Assumed in This Study
3.3 General Approach to Update Parameters in CNNs
3.4 Setting the Initial Weight Matrix in the Affine Layer
4 Analysis Results on Real-world Data
5 Conclusion and Limitations
References
Some Issues in Robust Clustering
1 Introduction
2 Outliers vs Clusters
3 Robustness and the Number of Clusters
4 More on User Tuning
5 Stability Measurement
6 Conclusion
References
Robustness Aspects of Optimized Centroids
1 Introduction
2 Centroid-based Classification (Object Localization)
2.1 Centroid-Based Object Localization: Asymmetric Modification of the Candidate Area
3 Experiments
3.1 Data
3.2 Methods
3.3 Results
4 Conclusions
References
Data Clustering and Representation Learning Based on Networked Data
1 Introduction
2 Proposed Method
2.1 Content and Structure Information
2.2 Model, Optimization and Algorithm
3 Numerical Experiments
4 Conclusion
References
Towards a Bi-stochastic Matrix Approximation of k-means and Some Variants
1 Introduction
2 Variants of k-Means
3 Bi-stochastic Matrix Approximation of k-Means Variants
3.1 Low-rank Matrix Factorization (MF)
3.2 BMA Formulation
3.3 The Equivalence Between BMA and k-Means
4 BMA Clustering Algorithm
5 Experiments Analysis
6 Conclusion
References
Clustering Adolescent Female Physical Activity Levels with an Infinite Mixture Model on Random Effects
1 Introduction
2 Bayesian Mixture Models for Heterogeneity of Random Effects
2.1 Bayesian Mixed Models With Clustering
3 Trial of Activity in Adolescent Girls (TAAG) and Model Results
4 Discussion
References
Unsupervised Classification of Categorical Time Series Through Innovative Distances
1 Introduction
2 Two Novel Feature-based Approaches for Categorical Time Series Clustering
2.1 Descriptive Features for Categorical Processes
2.2 Two Innovative Dissimilarities Between CTS
3 Partitioning Around Medoids Clustering of CTS
3.1 Experimental Design
3.2 Alternative Metrics and Assessment Criteria
3.3 Results and Discussion
References
Fuzzy Clustering by Hyperbolic Smoothing
1 Introduction
2 Fuzzy Clustering
3 Algorithm for Hyperbolic Smoothing Fuzzy Clustering
4 Comparative Results
5 Concluding Remarks
References
Stochastic Collapsed Variational Inference for Structured Gaussian Process Regression Networks
1 Introduction
2 Model
3 Inference
4 Experiments
5 Conclusions
References
An Online Minorization-Maximization Algorithm
1 Introduction
2 The Online MM Algorithm
3 Example Application
4 Final Remarks
References
Detecting Differences in Italian Regional Health Services During Two Covid-19 Waves
1 Introduction
2 Time Series Clustering
3 Data and Descriptive Statistics
4 Grouping Regions by Clustering and Discussion
5 Concluding Remarks
References
Political and Religion Attitudes in Greece: Behavioral Discourses
1 Introduction
2 Methodology
3 Results
4 Discussion
References
Supervised Classification via Neural Networks for Replicated Point Patterns
1 Introduction
2 Point Processes and Point Patterns
3 Neural Networks with General Input Space
4 Simulation Example
References
Parsimonious Mixtures of Seemingly Unrelated Contaminated Normal Regression Models
1 Introduction
2 Parsimonious SU Contaminated Normal Regression Mixtures
3 Analysis of U.S. Canned Tuna Sales
4 Conclusions
References
Penalized Model-based Functional Clustering: a Regularization Approach via Shrinkage Methods
1 Introduction
2 Shrinkage Method for Model-based Clustering for Functional Data
2.1 Model Definition
2.2 Model Estimation via E-M Algorithm
2.3 Model Selection via Silhouette Profile
3 Experimental Results
3.1 Simulation
3.2 Performance on Real Data Sets
4 Discussion
References
Emotion Classification Based on Single Electrode Brain Data: Applications for Assistive Technology
1 Introduction
2 Experimental Methodology
3 Evaluation and Discussion of Results
3.1 Core Emotions Classification
3.2 One vs One – Dual Emotion Classification
3.3 Stimulus vs No Stimulus Classification
4 Conclusions
References
The Death Process in Italy Before and During the Covid-19 Pandemic: a Functional Compositional Approach
1 Introduction and Data Presentation
2 Some Results
References
Clustering Validation in the Context of Hierarchical Cluster Analysis: an Empirical Study
1 Introduction
2 Data and Methods
2.1 Adaptation of the P (I2, )
2.2 Goodman and Kruskal Index (γ)
2.3 U Statistics (Mann and Whitney)
2.4 Silhouette Plots
3 Results and Discussion
4 Final Remarks
References
An MML Embedded Approach for Estimating the Number of Clusters
1 Introduction
2 Clustering with Finite Mixture Models
2.1 Definitions and Concepts
2.2 Discrete Finite Mixture Models
3 Model Selection for Categorical Data
4 The MML Based EM Algorithm
5 Data Analysis and Results
6 Discussion and Perspectives
References
Typology of Motivation Factors for Employees in the Banking Sector: An Empirical Study Using Multivariate Data Analysis Methods
1 Introduction
2 Materials and Methods
3 Main Results and Discussion
4 Conclusion
References
A Proposal for Formalization and Definition of Anomalies in Dynamical Systems
1 Introduction
2 Definition of Anomalies for Dynamical Systems
2.1 Definitions of Anomalies and Outliers
2.2 Definition by Philosophy of Science
3 Proposed Framework for a Formalization of Anomalies
4 Conclusion
References
New Metrics for Classifying Phylogenetic Trees Using Q-means and the Symmetric Difference Metric
1 Introduction
2 Methods
2.1 Silhouette Index Adapted for Tree Clustering
2.2 Gap Statistic Adapted for Tree Clustering
3 Results - A Biological Example
4 Discussion
References
On Parsimonious Modelling via Matrix-variate t Mixtures
1 Introduction
2 Methodology
2.1 Parsimonious Mixtures of Matrix-variate t Distributions
2.2 An AECM Algorithm for Parameter Estimation
3 Real Data Application
4 Conclusions
References
Evolution of Media Coverage on Climate Change and Environmental Awareness: an Analysis of Tweets from UK and US Newspapers
1 Introduction
2 Dataset and Methods
3 Results
3.1 Analysis of Tweet Trends and Breakpoints
3.2 Topic Modeling
4 Discussion
References
Index

📜 SIMILAR VOLUMES

📁 Data Analysis and Decision Support (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Daniel Baier (editor), Reinhold Decker (editor), Lars Schmidt-Thieme (editor) 📂 Library 📅 2005 🏛 Springer 🌐 English

The volume presents recent advances in data analysis and decision support and gives an up-to-date overview of the interface between mathematics, operations research, statistics, computer science, and management science. Areas that receive considerable attention in the book are discrimination and cluster

📁 Data Analysis and Decision Support (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Daniel Baier, Reinhold Decker (editor), Lars Schmidt-Thieme (editor) 📂 Library 📅 2005 🏛 Springer 🌐 English

It is a great privilege and pleasure to write a foreword for a book honoring Wolfgang Gaul on the occasion of his sixtieth birthday. Wolfgang Gaul is currently Professor of Business Administration and Management Science and the Head of the Institute of Decision Theory and Management Science,

📁 Statistical Models and Methods for Data Science (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Leonardo Grilli (editor), Monia Lupparelli (editor), Carla Rampichini (editor), 📂 Library 📅 2023 🏛 Springer 🌐 English

This book focuses on methods and models in classification and data analysis and presents real-world applications at the interface with data science. Numerous topics are covered, ranging from statistical inference and modelling to clustering and factorial methods, and from directional data a

📁 New Developments in Classification and Data Analysis: Proceedings of the Meeting of the Classification and Data Analysis Group (CLADAG) (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Maurizio Vichi, Paola Monari, Stefania Mignani, Angela Montanari (Editors) 📂 Library 📅 2005 🌐 English

The volume presents new developments in data analysis and classification. Particular attention is devoted to clustering, discrimination, data analysis and statistics, as well as applications in biology, finance and social sciences. The reader will find theory and algorithms on recent technical and m

📁 Data Analysis and Rationality in a Complex World (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Theodore Chadjipadelis (editor), Berthold Lausen (editor), Angelos Markos (editor) 📂 Library 📅 2021 🏛 Springer 🌐 English

This volume presents the latest advances in statistics and data science, including theoretical, methodological and computational developments and practical applications related to classification and clustering, data gathering, exploratory and multivariate data analysis, statistical m

📁 Statistical Learning and Modeling in Data Analysis: Methods and Applications (Studies in Classification, Data Analysis, and Knowledge Organization)

✍ Simona Balzano (editor), Giovanni C. Porzio (editor), Renato Salvatore (editor), 📂 Library 📅 2021 🏛 Springer 🌐 English

The contributions gathered in this book focus on modern methods for statistical learning and modeling in data analysis and present a series of engaging real-world applications. The book covers numerous research topics, ranging from statistical inference and modeling to clustering and factorial me