✦ LIBER ✦

Mining Optimized Association Rules for Numeric Attributes

✍ Scribed by Takeshi Fukuda; Yasuhiko Morimoto; Shinichi Morishita; Takeshi Tokuyama

Publisher: Elsevier Science
Year: 1999
Tongue: English
Weight: 283 KB
Volume: 58
Category: Article
ISSN: 0022-0000
DOI: 10.1006/jcss.1998.1595

✦ Synopsis

Given a huge database, we address the problem of finding association rules for numeric attributes, such as (Balance ∈ I) ⇒ (CardLoan = yes), which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient and also that the probability p of the condition CardLoan = yes being met (called the confidence ratio) be much higher than the average probability of the condition over all the data.

Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at least a given threshold number.
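
To make the two optimization targets concrete, here is a minimal sketch that finds both ranges by brute force over pre-counted buckets. It is not the linear-time algorithm the synopsis refers to; it checks every pair of endpoints in quadratic time. The function name `optimized_ranges` and the per-bucket inputs `support` and `hits` are assumptions introduced for illustration only.

```python
from itertools import accumulate


def optimized_ranges(support, hits, min_conf, min_support):
    """Brute-force search over all contiguous bucket ranges [s, t].

    support[i]: number of tuples whose numeric value falls in bucket i.
    hits[i]:    number of those tuples that also meet the condition
                (for example CardLoan = yes).
    Returns (best_support, best_confidence), each a (value, s, t) tuple
    or None when no range satisfies the corresponding threshold.
    """
    n = len(support)
    # Prefix sums let us evaluate any range in constant time.
    ps = [0, *accumulate(support)]
    ph = [0, *accumulate(hits)]

    best_support = None     # max support subject to confidence >= min_conf
    best_confidence = None  # max confidence subject to support >= min_support

    for s in range(n):
        for t in range(s, n):
            sup = ps[t + 1] - ps[s]
            if sup == 0:
                continue  # confidence is undefined for an empty range
            conf = (ph[t + 1] - ph[s]) / sup
            if conf >= min_conf and (best_support is None or sup > best_support[0]):
                best_support = (sup, s, t)
            if sup >= min_support and (best_confidence is None or conf > best_confidence[0]):
                best_confidence = (conf, s, t)
    return best_support, best_confidence


if __name__ == "__main__":
    # Four buckets of ten customers each; hits are card-loan users per bucket.
    print(optimized_ranges([10, 10, 10, 10], [0, 8, 6, 2], 0.5, 20))
```

With thresholds of 0.5 for confidence and 20 for support, the two answers differ: the widest range that still meets the confidence threshold covers buckets 1 to 3, while the most confident range with enough support covers buckets 1 to 2.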

Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketing as the preprocessing method and thus obtain an efficient rule-finding system.
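
The bucketing step can be pictured roughly as follows. This is a sketch under the assumption that "randomized bucketing" means drawing a random sample, sorting only the sample, and using evenly spaced sample ranks as bucket boundaries, so that the full data set is scanned once without ever being sorted. The synopsis does not spell out these details, and the names `bucket_boundaries` and `count_buckets` are hypothetical.

```python
import bisect
import random


def bucket_boundaries(values, num_buckets, sample_size, rng):
    """Pick num_buckets - 1 cut points from a sorted random sample.

    Only the sample is sorted, so the cost of sorting does not grow
    with the size of the full data set.
    """
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    return [sample[(k * len(sample)) // num_buckets] for k in range(1, num_buckets)]


def count_buckets(records, cuts):
    """Single pass over (value, condition_met) pairs.

    Returns per-bucket support counts and per-bucket counts of records
    that meet the condition.
    """
    support = [0] * (len(cuts) + 1)
    hits = [0] * (len(cuts) + 1)
    for value, met in records:
        i = bisect.bisect_right(cuts, value)  # index of the bucket holding value
        support[i] += 1
        hits[i] += int(met)
    return support, hits


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic balances; the condition is more likely for mid-range balances.
    balances = [rng.uniform(0, 10_000) for _ in range(5_000)]
    records = [(b, 3_000 <= b <= 6_000 and rng.random() < 0.6) for b in balances]
    cuts = bucket_boundaries(balances, num_buckets=10, sample_size=500, rng=rng)
    print(count_buckets(records, cuts))
```

The resulting `support` and `hits` arrays are the kind of per-bucket counts that a range-optimization step would then consume.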

Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute optimized rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.

📜 SIMILAR VOLUMES

Novel measurement for mining effective association rules

✍ Jin-Mao Wei; Wei-Guo Yi; Ming-Yang Wang 📂 Article 📅 2006 🏛 Elsevier Science 🌐 English ⚖ 138 KB

CBAR: an efficient method for mining association rules

✍ Yuh-Jiuan Tsay; Jiunn-Yann Chiang 📂 Article 📅 2005 🏛 Elsevier Science 🌐 English ⚖ 176 KB

Fuzzy data mining for interesting generalized association rules

✍ Tzung-Pei Hong; Kuei-Ying Lin; Shyue-Liang Wang 📂 Article 📅 2003 🏛 Elsevier Science 🌐 English ⚖ 262 KB

Due to the increasing use of very large databases and data warehouses, mining useful information and helpful knowledge from transactions is evolving into an important research area. Most conventional data-mining algorithms identify the relationships among transactions using binary values and find rul

A heuristic for mining association rules in polynomial time

✍ E Yilmaz; E Triantaphyllou; J Chen; T.W Liao 📂 Article 📅 2003 🏛 Elsevier Science 🌐 English ⚖ 985 KB

Abstract: Mining association rules from databases has attracted great interest because of its potentially very practical applications. Given a database, the problem of interest is how to mine association rules (which could describe patterns of consumers' behaviors) in an efficient and effective way.

Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection

✍ Jianxiong Luo; Susan M. Bridges 📂 Article 📅 2000 🏛 John Wiley and Sons 🌐 English ⚖ 234 KB

Selection and optimization of cut-points for numeric attribute values

✍ L. Shang; S.Y. Yu; X.Y. Jia; Y.S. Ji 📂 Article 📅 2009 🏛 Elsevier Science 🌐 English ⚖ 664 KB

Data discretization is the process of setting several cut-points which can represent attribute values using different symbols or integer values for continuous numeric attribute values. A hybrid method based on neural network and genetic algorithm is proposed to select and optimize the cut-points for