This book offers an original and broad exploration of the fundamental methods in Clustering and Combinatorial Data Analysis, presenting new formulations and ideas within this very active field. With extensive introductions, formal and mathematical developments and real case studies, this book provid
Foundations and methods in combinatorial and statistical data analysis and clustering
✍ Scribed by Lerman, Israël César
- Publisher: Springer
- Year: 2016
- Language: English
- Pages: 664
- Series: Advanced information and knowledge processing
- Category: Library
✦ Table of Contents
Preface......Page 7
Collaborators......Page 13
Acknowledgements......Page 14
Contents......Page 16
1.1.1 Definition and General Properties......Page 24
1.1.2 Countings......Page 32
1.2.1 Generalities......Page 41
1.2.2 Representations......Page 43
1.3 Type of a Partition and Cardinality of the Associated Equivalence Binary Relation......Page 44
1.4.1 Definition and Properties of Ultrametric Spaces......Page 53
1.4.2 Partition Lattice Chains of a Finite Set and the Associated Ultrametric Spaces......Page 56
1.4.3 Partition Lattice Chains and the Associated Ultrametric Preordonances......Page 60
1.4.4 Partition Hierarchies and Dendrograms......Page 62
1.4.5 From a Symmetrical Binary Hierarchy to a Directed Binary Hierarchy......Page 68
1.5 Polyhedral Representation of the Partition Set of a Finite Set......Page 75
References......Page 81
2.1 Preamble......Page 83
2.2.1 Data Structure and Clustering Criterion......Page 84
2.2.2 Transfer Algorithm and Central Partition......Page 91
2.2.3 Objects with the Same Representation......Page 94
2.2.4 Statistical Asymptotic Analysis......Page 96
2.2.5 Remarks on the Application of the Central Partition Method and Developments......Page 100
2.3.1 Data Structure and Clustering Criterion......Page 102
2.3.2 The K-Means Algorithm......Page 106
2.3.3 Dynamic Cluster Algorithm......Page 108
2.3.4 Following the Definition of the Algorithm......Page 113
References......Page 120
3.1 Objects, Categories and Attributes......Page 122
3.2 Representation of the Attributes of Type I......Page 124
3.2.1 The Boolean Attribute......Page 125
3.2.2 The Numerical Attribute......Page 126
3.2.3 Defining a Categorical Attribute from a Numerical One......Page 128
3.3 Representation of the Attributes of Type II......Page 130
3.3.1 The Nominal Categorical Attribute......Page 131
3.3.2 The Ordinal Categorical Attribute......Page 134
3.3.3 The Ranking Attribute......Page 137
3.3.4 The Categorical Attribute Valuated by a Numerical Similarity......Page 139
3.3.5 The Valuated Binary Relation Attribute......Page 141
3.4.1 The Preordonance Categorical Attribute......Page 142
3.4.2 The Taxonomic Categorical Attribute......Page 145
3.4.3 The Taxonomic Preordonance Attribute......Page 150
3.4.4 Coding the Different Attributes in Terms of Preordonance or Similarity Categorical Attributes......Page 153
3.5.1 Introduction......Page 158
3.5.3 Nominal or Ordinal Categorical Attributes......Page 159
3.5.4 Ordinal (preordonance) or Numerical Similarity Categorical Attributes......Page 163
3.5.5 The Data Table: A Tarski System $\mathcal{T}$ or a Statistical System $\mathcal{S}$......Page 164
References......Page 167
4.1 Introduction......Page 170
4.2.1 Similarity Index in the Case of Boolean Data......Page 173
4.2.2 Preordonance Associated with a Similarity Index in the Case of Boolean Data......Page 186
4.3.1 Introduction......Page 199
4.3.2 Comparing Nominal Categorical Attributes......Page 201
4.3.3 Comparing Ordinal Categorical Attributes......Page 204
4.3.4 Comparing Preordonance Categorical Attributes......Page 212
References......Page 217
5.1 Introduction......Page 219
5.2.1 The Boolean Case......Page 221
5.2.2 Comparing Numerical Attributes in the LLA approach......Page 241
5.3.1 Introduction......Page 253
5.3.2 Case of a Description by Boolean Attributes......Page 254
5.3.3 Comparing Distributions of Numerical, Ordinal Categorical and Nominal Categorical Attributes......Page 262
References......Page 267
6.1 Introduction......Page 270
6.2.1 Introduction; Alternatives in Normalizing Association Coefficients......Page 271
6.2.2 Comparing Two Ranking Attributes......Page 275
6.2.3 Comparing Two Nominal Categorical Attributes......Page 280
6.2.4 Comparing Two Ordinal Categorical Attributes......Page 295
6.2.5 Comparing Two Valuated Binary Relation Attributes......Page 305
6.2.6 From the Total Association to the Partial One......Page 328
References......Page 340
7.1 Preamble......Page 343
7.2.1 The Outline of the LLA Method for Comparing Objects or Categories......Page 346
7.2.2 Similarity Index Between Objects Described by Numerical or Boolean Attributes......Page 349
7.2.3 Similarity Index Between Objects Described by Nominal or Ordinal Categorical Attributes......Page 352
7.2.4 Similarity Index Between Objects Described by Preordonance or Valuated Categorical Attributes......Page 356
7.2.5 Similarity Index Between Objects Described by Taxonomic Attributes. A Solution for the Classification Consensus Problem......Page 359
7.2.6 Similarity Index Between Objects Described by a Mixed Attribute Types: Heterogenous Description......Page 362
7.2.7 The Goodall Similarity Index......Page 363
7.2.8 Similarity Index Between Rows of a Juxtaposition of Contingency Tables......Page 367
7.2.9 Other Similarity Indices on the Row Set $\mathbb{I}$ of a Contingency Table......Page 371
References......Page 373
8.1 Introduction; Monothetic Class and Polythetic Class......Page 375
8.1.1 The Intuitive Approaches of Beckner and Adanson; from Beckner to Adanson......Page 378
8.2.1 Introduction......Page 381
8.2.2 Case of Attributes of Type I: Numerical and Boolean......Page 382
8.2.3 Discrimination a Partition by a Categorical Attribute......Page 384
8.3 "Responsibility" Degree of an Object in an Attribute Cluster Formation......Page 387
8.3.1 $\mathcal{A}$ is Composed of Attributes of Type I......Page 388
8.3.2 The Attribute Set $\mathcal{A}$ is Composed of Categorical or Ranking Attributes......Page 393
8.4.1 Case of a Single Contingency Table......Page 395
8.4.2 Case of an Horizontal Juxtaposition of Contingency Tables......Page 399
8.5.1 Introduction......Page 400
8.5.2 Comparing Clustering "Importance" and Projective "Importance" of a Descriptive Attribute......Page 404
8.6.1 General Introduction......Page 409
8.6.2 Crossing Net Classifications; Introduction to Other Crossings......Page 412
8.6.3 Crossing a Net and a Fuzzy Dichotomous Classifications......Page 418
8.6.4 Crossing Two Fuzzy Dichotomous Classifications......Page 422
8.6.5 Crossing Two Typologies......Page 426
8.6.6 Extension to Crossing Fuzzy Relational Categorical Attributes......Page 429
8.7.1 Introduction......Page 437
8.7.2 Discrepancy Between the Preordonance Structure and that Ultrametric, on a Data Set......Page 438
8.7.3 Classifiability Distribution Under a Random Hypothesis of Non-ultrametricity......Page 444
8.7.4 The Murtagh Contribution......Page 449
References......Page 450
9.1 Introduction......Page 452
9.2.1 General Presentation......Page 455
9.2.2 An Example......Page 457
9.3 Quality of a Partition Based on the Pairwise Similarities......Page 460
9.3.1 Criteria Based on a Data Preordonance......Page 461
9.3.2 Approximating a Symmetrical Binary Relation by an Equivalence Relation: The Zahn Problem......Page 468
9.3.3 Comparing Two Basic Criteria......Page 473
9.3.4 Distribution of the Intersection Criterion on the Partition Set with a Fixed Type......Page 485
9.3.5 Extensions of the Previous Criterion......Page 491
9.3.6 "Significant Levels" and "Significant Nodes" of a Classification Tree......Page 500
9.4.1 Introduction......Page 506
9.4.2 Generalization of the Set Theoretic and Metrical Criteria......Page 507
9.4.3 Distribution of the Cardinality of the Graph Intersection Criterion......Page 510
9.4.4 Pure Ordinal Criteria: The Lateral Order and the Lexicographic Order Criteria......Page 519
9.4.5 Lexicographic Ranking and Inversion Number Criteria......Page 521
References......Page 528
10.1 Introduction......Page 530
10.2.1 Definition of an Ultrametric Preordonance Associated with a Preordonance Data......Page 536
10.2.2 Algorithm for Determining $\omega_u$ Defined by the H Function......Page 538
10.2.3 Property of Optimality......Page 540
10.2.4 Case Where $\omega$ Is a Total Ordonance......Page 541
10.3.1 Preamble......Page 544
10.3.2 "Single Linkage", "Complete Linkage" and "Average Linkage" Criteria......Page 545
10.3.3 "Inertia Variation (or Ward) Criterion"......Page 547
10.3.4 From "Lexicographic" Ordinal Algorithm to "Single Linkage" or "Maximal Link" Algorithm......Page 551
10.4.1 Family of Criteria of the Maximal Likelihood Linkage......Page 552
10.4.2 Minimal Likelihood Linkage and Average Likelihood Linkage in the LLA Analysis......Page 562
10.5.1 Introduction......Page 566
10.5.2 Chi Square Criterion: A Transposition of the Ward Criterion......Page 567
10.5.3 Mutual Information Criterion......Page 569
10.6.1 Introduction......Page 572
10.6.2 Complexity Considerations of the Basic AAHC Algorithm......Page 575
10.6.3 Reactualization Formulas in the Cases of Binary and Multiple Aggregations......Page 577
10.6.4 Reducibility, Monotonic Criterion, Reducible Neighborhoods and Reciprocal Nearest Neighborhoods......Page 583
10.6.5 Ascendant Agglomerative Hierarchical Clustering (AAHC) Under a Contiguity Constraint......Page 589
10.6.6 Ascendant Agglomerative Parallel Hierarchical Clustering......Page 593
References......Page 597
11.1 Introduction: the CHAVL Software (to.Classification Hiérarchique par …......Page 600
11.2 Real Data: Outline Presentation of Some Processings......Page 603
11.3.1 Preamble: Technical Data Sheet......Page 607
11.3.2 General Objective and Data Description......Page 608
11.3.3 Profiles Extracted from the Classification Tree on $\mathcal{A}$......Page 610
11.3.5 Standardized Association Coefficient with Respect to the Hypergeometric Model......Page 613
11.3.6 Return to Individuals......Page 614
11.4.1 Preamble: Technical Data Sheet......Page 617
11.4.2 Introduction......Page 618
11.4.3 Construction of the Dayhoff Matrix......Page 620
11.4.4 The Henikoffs Matrix: Comparison with the Dayhoff Matrix......Page 629
11.4.5 The LLA Matrices......Page 633
11.4.6 LLA Similarity Index on a Set of Proteic Aligned Sequences......Page 636
11.4.7 Some Results......Page 642
11.5.1 Structuring the Sets of Values of Categorical Attributes......Page 646
11.5.2 From Total Associations Between Categorical Attributes to Partial Ones......Page 649
References......Page 654
12.1 Contribution to Challenges in Cluster Analysis......Page 656
12.2 Around Two Books Concerning Relational Aspects......Page 658
12.3.1 Principal Component Analysis......Page 660
12.3.2 Multidimensional Scaling......Page 661
12.3.4 Semi-supervised Hierarchical Classification......Page 662
References......Page 663
✦ Subjects
AC
📜 SIMILAR VOLUMES
This book offers an original and broad exploration of the fundamental methods in Clustering and Combinatorial Data Analysis, presenting new formulations and ideas within this very active field. With extensive introductions, formal and mathematical developments and real case studies, thi
This monograph offers an original broad and very diverse exploration of the seriation domain in data analysis, together with building a specific relation to clustering. Relative to a data table crossing a set of objects and a set of descriptive attributes, the search for orders
This book provides clear explanatory text, illustrative mathematics and algorithms, demonstrations of the iterative process, pseudocode, and well-developed examples for applications of the branch-and-bound paradigm to important problems in combinatorial data analysis. Su
Classical probability theory and mathematical statistics appear sometimes too rigid for real life problems, especially while dealing with vague data or imprecise requirements. These problems have motivated many researchers to "soften" the classical theory. Some "softening" approaches utilize conc
The contributions gathered in this book focus on modern methods for statistical learning and modeling in data analysis and present a series of engaging real-world applications. The book covers numerous research topics, ranging from statistical inference and modeling to clustering and factorial me