
Clustering for semi-supervised spam filtering

Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS '11), Perth, Australia, 2011.09.01-2011.09.02. ACM Press.

โœ Scribed by Whissell, John S.; Clarke, Charles L. A.


Book ID: 111981130
Publisher: ACM Press
Year: 2011
Tongue: English
Weight: 701 KB
Volume: 0
Category: Article
ISBN: 1450307884


Synopsis


We present a novel investigation of email clustering, demonstrating that clustering can be a powerful tool for email spam filtering. We first extend the well-known notion that ham and spam emails can be divided into clusters, showing the striking result that almost any reasonable clustering algorithm will naturally partition an email dataset into clusters that are each almost entirely ham or almost entirely spam. We then consider the specific semi-supervised spam filtering scenario in which a large amount of training data is available, but true labels can be obtained for only a few of those messages. We present two spam filtering approaches for this scenario, both of which start with a clustering of the training email. Our first approach uses the true labels of the cluster medoids to train a spam filter. Our second approach functions similarly to the first, except that the true label of each cluster's medoid is used as the label of every email within the cluster, giving a much larger set of labels for training while still requiring only a few true labels. We evaluate our approaches using the TREC2005 and CEAS2008 spam email datasets. For a large range of different numbers of true labels, we show that both of our approaches significantly outperform training on the same number of randomly selected email messages. The results of our second approach are also better than those of a previously published state-of-the-art semi-supervised small-sample spam filtering approach.
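
The two labelling strategies in the synopsis can be illustrated with a minimal sketch. It is not the authors' implementation: the word-set Jaccard distance, the simple alternating k-medoids loop, and the names `jaccard_distance`, `k_medoids`, `label_by_medoids`, and `oracle` are all assumptions made for illustration, and no spam filter is actually trained here. The sketch only shows how asking for one true label per medoid yields either a small training set (first approach) or a label for every clustered message (second approach).

```python
from typing import Callable, List, Sequence, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(docs: Sequence[Set[str]], k: int,
              max_iter: int = 20) -> Tuple[List[int], List[int]]:
    """Toy k-medoids (assumed algorithm). Returns (medoid indices, cluster id per doc)."""
    n = len(docs)
    dist = [[jaccard_distance(docs[i], docs[j]) for j in range(n)] for i in range(n)]

    def nearest(meds: List[int]) -> List[int]:
        # Assign every document to the cluster of its closest medoid.
        return [min(range(len(meds)), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    # Deterministic farthest-first initialisation, starting from document 0.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        assign = nearest(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, nearest(medoids)


def label_by_medoids(texts: Sequence[str], k: int, oracle: Callable[[int], str]):
    """Request k true labels (one per medoid) and build both training sets."""
    docs = [set(t.lower().split()) for t in texts]
    medoids, assign = k_medoids(docs, k)
    medoid_labels = [oracle(m) for m in medoids]  # the only true labels requested
    # First approach: train on the labelled medoids only.
    medoid_only = [(texts[m], lab) for m, lab in zip(medoids, medoid_labels)]
    # Second approach: every message inherits the label of its cluster's medoid.
    propagated = [(texts[i], medoid_labels[assign[i]]) for i in range(len(texts))]
    return medoid_only, propagated


if __name__ == "__main__":
    # Hypothetical toy messages, not drawn from TREC2005 or CEAS2008.
    demo = ["cheap pills buy now", "buy cheap pills online now",
            "meeting agenda for monday", "monday meeting agenda attached"]
    truth = ["spam", "spam", "ham", "ham"]
    small, large = label_by_medoids(demo, k=2, oracle=lambda i: truth[i])
    print(len(small), "medoid labels;", len(large), "propagated labels")
```

Either returned list could then be passed to any supervised spam filter. The propagated set is only as accurate as the clusters are pure, which is why the purity result reported in the synopsis matters for the second approach.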


SIMILAR VOLUMES


[ACM Press the 8th Annual Collaboration,
โœ Whissell, John S.; Clarke, Charles L. A. ๐Ÿ“‚ Article ๐Ÿ“… 2011 ๐Ÿ› ACM Press ๐ŸŒ English โš– 701 KB

The 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, Sep 01, 2011-Sep 02, 2011, Perth, Australia. You can view more information about this proceeding and all of ACM's other published conference proceedings from the ACM Digital Library: http://www.acm.org/dl.

[ACM Press the 8th Annual Collaboration,
โœ Ramachandran, Anirudh; Dasgupta, Anirban; Feamster, Nick; Weinberger, Kilian ๐Ÿ“‚ Article ๐Ÿ“… 2011 ๐Ÿ› ACM Press ๐ŸŒ English โš– 662 KB


[ACM Press the 8th Annual Collaboration,
โœ Jaberi, Shaghayegh; Rahmani, Amir Masoud; Zadeh, Ahmad Khadem ๐Ÿ“‚ Article ๐Ÿ“… 2011 ๐Ÿ› ACM Press ๐ŸŒ English โš– 646 KB
