Data Mining and Statistics for Decision Making (Tufféry), Appendix A: Elements of Statistics
- Author
- Tufféry, Stéphane
- Publisher
- John Wiley & Sons, Ltd
- Year
- 2011
- File size
- 502 KB
- Category
- Article
- ISBN
- 0470688297
Synopsis
. probabilistic and statistical methods
. used in the laboratory
Data analysis:
. a few thousand individuals
. several tens of variables
. construction of 'individuals × variables' tables
. importance of computing and visual representation
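The 'individuals × variables' layout listed above can be sketched in plain Python as follows. This is a minimal illustration under assumed data: the variable names (`age`, `income`, `n_purchases`), the individual labels, the values and the helper `column` are all hypothetical. Each row holds one individual, each column one variable, and a per-variable mean is computed column by column.

```python
from statistics import mean

# Hypothetical toy table: one row per individual, one column per variable.
variables = ["age", "income", "n_purchases"]
individuals = {
    "ind_1": [30, 2100.0, 5],
    "ind_2": [45, 3300.0, 2],
    "ind_3": [24, 1800.0, 8],
}


def column(table, var_names, name):
    """Return the values of one variable across all individuals (one column)."""
    j = var_names.index(name)
    return [row[j] for row in table.values()]


# Per-variable means: a typical first summary of such a table.
means = {v: mean(column(individuals, variables, v)) for v in variables}
print(means)
```

Storing rows keyed by individual keeps the two dimensions of the table explicit; a statistics or data-frame library would normally be used for real data sets of the sizes described here.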
Data mining (1990s onwards):
. several millions or tens of millions of individuals
. several hundreds or thousands of variables
. numerous non-numeric variables, such as textual variables (or variables containing images)
. weak assumptions regarding the statistical distributions involved
. data collected before the study, and often for other purposes
. constantly changing population (difficulty of sampling)
. presence of 'outliers' (abnormal individuals, at least in terms of the distributions studied)
. imperfect data, with errors of input and coding, and missing values
. fast computing, possibly in real time, is essential
. the aim is not always to find the mathematical optimum, but sometimes the model that is easiest for non-statisticians to understand

Robert Tibshirani's lasso method of linear regression

DBSCAN clustering algorithm proposed by M.
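The lasso named above fits a linear regression with an L1 penalty on the coefficients, which can set some of them exactly to zero. The sketch below is an illustration under stated assumptions, not necessarily the method presented in the book. It uses cyclic coordinate descent with soft-thresholding (one common fitting algorithm), omits the intercept, and minimizes (1/(2n))·‖y − Xβ‖² + λ‖β‖₁. The function names `soft_threshold` and `lasso_cd` and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (assumption: no intercept).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1,
    where X is a list of n rows, each with p values.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # rho = (1/n) * sum_i x_ij * (y_i minus the fit of all other columns).
            rho = 0.0
            for i in range(n):
                fit_others = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            rho /= n
            # Closed-form minimizer of the objective in coordinate j alone.
            b[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
    return b


# Hypothetical toy data with two orthogonal columns.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [3.0, -3.0, 0.5, -0.5]
print(lasso_cd(X, y, lam=0.0))  # [3.0, 0.5]: ordinary least squares
print(lasso_cd(X, y, lam=0.5))  # [2.0, 0.0]: second coefficient set to zero
```

With λ = 0 the sketch reproduces the least-squares coefficients. With λ = 0.5 it shrinks the first coefficient and removes the second entirely, which illustrates why the lasso also acts as a variable-selection method.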