Data Mining and Statistics for Decision Making || Appendix A: Elements of Statistics

✍ Scribed by Tufféry, Stéphane


Publisher: John Wiley & Sons, Ltd
Year: 2011
Weight: 502 KB
Category: Article
ISBN: 0470688297


✦ Synopsis


. probabilistic and statistical methods

. used in the laboratory

Data analysis:

. a few thousand individuals

. several tens of variables

. construction of 'individuals × variables' tables

. importance of computing and visual representation

Data mining (1990s onwards):

. several millions or tens of millions of individuals

. several hundreds or thousands of variables

. numerous non-numeric variables, such as textual variables (or variables containing images)

. weak assumptions regarding the statistical distributions involved

. data collected before the study, and often for other purposes

. constantly changing population (difficulty of sampling)

. presence of 'outliers' (abnormal individuals, at least in terms of the distributions studied)

. imperfect data, with errors of input and coding, and missing values

. fast computing, possibly in real time, is essential

. the aim is not always to find the mathematical optimum, but sometimes the model that is easiest for non-statisticians to understand

Robert Tibshirani's lasso method of linear regression

DBSCAN clustering algorithm proposed by M.

