𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Weighted logistic regression for large-scale imbalanced and rare events data

✍ Scribed by Maalouf, Maher; Siddiqi, Mohammad


Book ID
122249728
Publisher
Elsevier Science
Year
2014
Tongue
English
Weight
790 KB
Volume
59
Category
Article
ISSN
0950-7051


✦ Synopsis


Recent advances in computing and technology, along with the availability of large amounts of raw data, have led to the development of many computational techniques and algorithms. In binary classification in particular, data containing rare events or disproportionate class distributions pose a great challenge to industry and to the machine learning community. Logistic Regression (LR) is a powerful classifier. The combination of LR and the truncated-regularized iteratively re-weighted least squares (TR-IRLS) algorithm has provided a powerful classification method for large data sets. This study examines imbalanced data with binary response variables containing many more non-events (zeros) than events (ones). It has been established in the literature that these variables are difficult to predict and explain. This research combines rare events corrections to LR with truncated Newton methods. The proposed method, Rare Event Weighted Logistic Regression (RE-WLR), is capable of processing large imbalanced data sets at roughly the same speed as TR-IRLS, but with higher accuracy.
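
To make the idea of weighting a logistic regression for rare events concrete, the following is a minimal sketch and not the RE-WLR method itself, whose details the synopsis does not give. It assumes a class-weighting scheme of the form w1 = tau / ybar for events and w0 = (1 - tau) / (1 - ybar) for non-events, where tau is an assumed population event rate and ybar is the sample event rate. It also assumes a single predictor, a ridge penalty on both coefficients, and an exact solve of the 2x2 Newton system in place of the truncated (conjugate-gradient) solve that TR-IRLS uses. The names `fit_weighted_lr`, `tau`, and `ridge` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_lr(xs, ys, tau, ridge=1e-3, iters=50, tol=1e-10):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by weighted, ridge-penalized Newton steps.

    Assumed weights (illustrative, not taken from the article):
    events get tau / ybar, non-events get (1 - tau) / (1 - ybar).
    """
    n = len(ys)
    ybar = sum(ys) / n                 # sample fraction of events
    w1 = tau / ybar                    # weight for events (y == 1)
    w0 = (1.0 - tau) / (1.0 - ybar)    # weight for non-events (y == 0)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the penalized weighted log-likelihood.
        g0, g1 = -ridge * b0, -ridge * b1
        # Negative Hessian (symmetric 2x2), starting from the ridge term.
        h00, h01, h11 = ridge, 0.0, ridge
        for x, y in zip(xs, ys):
            w = w1 if y == 1 else w0
            p = sigmoid(b0 + b1 * x)
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            v = w * p * (1.0 - p)      # IRLS working weight
            h00 += v
            h01 += v * x
            h11 += v * x * x
        # Solve the 2x2 Newton system exactly (TR-IRLS would truncate a CG solve).
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy imbalanced sample: 2 events out of 10 observations (made-up values).
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
b0, b1 = fit_weighted_lr(xs, ys, tau=0.05)
print(b0, b1)
```

Setting `tau` equal to the sample event rate makes both weights 1, which reduces the sketch to ordinary ridge-penalized logistic regression.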


📜 SIMILAR VOLUMES


WEIGHTED LIKELIHOOD, PSEUDO-LIKELIHOOD A
✍ NORMAN E. BRESLOW; RICHARD HOLUBKOV 📂 Article 📅 1997 🏛 John Wiley and Sons 🌐 English ⚖ 316 KB 👁 1 views

General approaches to the fitting of binary response models to data collected in two-stage and other stratified sampling designs include weighted likelihood, pseudo-likelihood and full maximum likelihood. In previous work the authors developed the large sample theory and methodology for fitting of l

Advanced Statistical Methods for the Ana
✍ Di Ciaccio, Agostino; Coli, Mauro; Angulo Ibanez, Jose Miguel 📂 Article 📅 2011 🏛 Springer Berlin Heidelberg 🌐 German ⚖ 382 KB

The theme of the meeting was “Statistical Methods for the Analysis of Large Data-Sets”. In recent years there has been increasing interest in this subject; in fact a huge quantity of information is often available but standard statistical techniques are usually not well suited to managing this kind