Cluster and classification techniques for the biosciences
By A. H. Fielding
- Publisher: Cambridge University Press (CUP)
- Year: 2007
- Language: English
- Pages: 260
- Category: Library
Synopsis
Recent advances in experimental methods have resulted in the generation of enormous volumes of data across the life sciences. Hence, clustering and classification techniques that were once predominantly the domain of ecologists are now being used more widely. This book provides an overview of these important data analysis methods, from long-established statistical methods to more recent machine learning techniques. It aims to provide a framework that will enable the reader to recognise the assumptions and constraints that are implicit in all such techniques. Important generic issues are discussed first, and then the major families of algorithms are described. Throughout, the focus is on explanation and understanding, and readers are directed to other resources that provide additional mathematical rigour when it is required. Examples taken from across the whole of biology, including bioinformatics, are provided throughout the book to illustrate the key concepts and each technique's potential.
Table of Contents
Cover
Half-title
Title
Copyright
Dedication
Contents
Preface
1.1 Background
1.3 Classification
1.5.1 Structure in tables
1.5.2 Graphical identification of structure
1.6 Glossary
Supervised learning
Machine learning
1.6.5 Maximum likelihood estimation
1.6.8 Occam's razor
1.7.2 Software
2.1 Background
2.2 Dimensionality
2.3 Goodness of fit testing
2.4.1 Background
Outline
Matrix methods (a very brief review)
Example analysis 1
Example analysis 2
2.5.3 Factor analysis
2.6.1 Background
2.6.3 Sammon mapping
2.7.1 Correspondence analysis
2.7.2 Canonical correspondence analysis
2.8.1 Mantel tests
2.8.2 Procrustes rotation
2.10 Example EDA analysis
3.1 Background
3.2.1 Distance measures
Non-Euclidean metrics
3.2.2 Importance of data types
Distances for interval variables
Count data
3.2.3 Other distance measures
3.3.1 k-means
3.3.2 k-medians and PAM
3.3.3 Mixture models
3.4 Agglomerative hierarchical methods
Average linkage clustering
Complete linkage clustering
3.4.2 The dendrogram
3.5 How many groups are there?
3.5.1 Scree plots
3.5.2 Other methods of estimating optimum number of clusters
3.6 Divisive hierarchical methods
3.7 Two-way clustering and gene shaving
3.8 Recommended reading
3.9.1 Hierarchical clustering of bacterial strains
3.9.2 Hierarchical clustering of the human genus
Bacteria data
Cancer data
4.1 Background
4.2 Black-box classifiers
4.3 Nature of a classifier
4.4 No-free-lunch
4.5 Bias and variance
4.6.1 Background
Signal-to-noise ratio
Sequential selection methods
Gene expression data
4.6.3 Ranking the importance of predictors
4.7.1 Background
4.7.2 Boosting and bagging
4.8 Why do classifiers fail?
4.9 Generalisation
4.10 Types of classifier
5.1 Background
5.2 Naïve Bayes
5.3.1 Introduction
5.3.2 Example analyses
Discriminant analysis of two artificial data sets
Discriminant analysis of golden eagle data (multi-class analysis)
5.3.3 Modified algorithms
5.4.1 Introduction
Artificial data
Mixed data type analysis
Analysis
Residuals and influence statistics
5.5 Discriminant analysis or logistic regression?
5.6.2 Loess and spline smoothing functions
5.6.3 Example analysis
5.7 Summary
6.2.1 Background
Identifying splits
Tree complexity
Missing values
6.2.2 Example analysis
6.2.3 Random forests
ID3, C4.5 and C5
CHAID
OC1
6.3 Support vector machines
6.4.1 Introduction
6.4.2 Back-propagation networks
6.4.3 Modelling general and generalised linear models with neural networks
6.4.4 Interpreting weights
6.4.5 Radial basis function networks
6.4.6 Example analysis
Outline
6.5.1 Introduction
6.5.2 Genetic algorithms as classifiers
6.6.1 Case-based reasoning
6.6.2 Nearest neighbour
6.7 Where next?
7.1 Background
7.3 Binary accuracy measures
7.4.1 Re-substitution
7.4.3 Cross-validation
7.4.4 Bootstrapping methods
7.5 Decision thresholds
7.6 Example
7.7.1 Background
7.7.2 ROC curves
7.7.3 Comparing classifiers using AUC values
7.8.1 Costs are universal
7.8.3 Using misclassification costs
7.9 Comparing classifiers
7.10 Recommended reading
A.2.1 Descriptive statistics and relationships between variables
B.2.1 Descriptive statistics and relationships between variables
C.1 Outline
D.2.1 Descriptive statistics and relationships between variables
E.2.1 Descriptive statistics and relationships between variables
F.2.1 Descriptive statistics
G.2.1 Descriptive statistics and relationships between variables
References
Index
Similar Volumes
Recent advances in experimental methods have resulted in the generation of enormous volumes of data across the life sciences. Hence clustering and classification techniques that were once predominantly the domain of ecologists are now being used more widely. This book provides an overview of these…
This book provides an introduction to operational research methods and their application in the agrifood and environmental sectors. It explains the need for multicriteria decision analysis and teaches users how to use recent advances in multicriteria and clustering classification techniques in…
Knowledge Discovery today is a significant study and research area. In finding answers to many research questions in this area, the ultimate hope is that knowledge can be extracted from various forms of data around us. This book covers recent advances in unsupervised and supervised data…