High Performance Multidimensional Analysis and Data Mining

✍ Scribed by Goil S., Choudhary A.

Year: 1998
Language: English
Pages: 19
Category: Library

✦ Synopsis

Summary information derived from large databases is used to answer queries in On-Line Analytical Processing (OLAP) systems and to build decision support systems over them. The Data Cube is used to calculate and store summary information over a variety of dimensions; when the number of dimensions is large, it is computed only partially. Queries posed on such systems are quite complex and require different views of the data. These may be answered either from a materialized cube in the data cube or calculated on the fly. Further, data mining for associations can be performed on the data cube. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. They are also amenable to parallelism, which is necessary to deal with large (and still growing) data sets. Multidimensional databases store data in a multidimensional structure on which analytical operations are performed. A challenge for these systems is handling large data sets with a large number of dimensions. These techniques are also applicable to scientific and statistical databases (SSDB), which employ large multidimensional databases and dimensional operations over them.

In this paper we present (1) a parallel infrastructure for OLAP multidimensional databases integrated with association rule mining; (2) the Bit-Encoded Sparse Structure (BESS) for sparse data storage in chunks; (3) scheduling optimizations for parallel computation of complete and partial data cubes; and (4) an implementation of a large-scale multidimensional database engine suitable for dimensional analysis used in OLAP and SSDB for (a) a large number of dimensions (20-30) and (b) large data sets (tens of gigabytes).

Our implementation on the IBM SP-2 can handle large data sets and a large number of dimensions by using disk I/O. Results are presented showing its performance and scalability.
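To make the two central ideas of the synopsis concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "bit-encoded" storage means packing each sparse cell's dimension indices into one integer key, and it computes a complete data cube naively by aggregating over every subset of dimensions. The function names (`bit_widths`, `encode`, `decode`, `cube`) and the sample data are hypothetical; the paper's actual BESS layout, chunking, and parallel scheduling are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def bit_widths(dim_sizes):
    """Bits needed to store an index in 0..size-1 for each dimension."""
    return [max(1, (size - 1).bit_length()) for size in dim_sizes]


def encode(indices, widths):
    """Pack one cell's dimension indices into a single integer key."""
    key = 0
    for idx, width in zip(indices, widths):
        key = (key << width) | idx
    return key


def decode(key, widths):
    """Unpack an integer key back into a tuple of dimension indices."""
    out = []
    for width in reversed(widths):
        out.append(key & ((1 << width) - 1))
        key >>= width
    return tuple(reversed(out))


def cube(cells, widths):
    """Sum the measure over every subset of dimensions (2**n views).

    cells maps an encoded key to a measure value. The result maps a tuple
    of retained dimension numbers to {retained indices: summed measure}.
    """
    n = len(widths)
    views = {}
    for r in range(n + 1):
        for dims in combinations(range(n), r):
            agg = defaultdict(float)
            for key, value in cells.items():
                idx = decode(key, widths)
                agg[tuple(idx[d] for d in dims)] += value
            views[dims] = dict(agg)
    return views


# Hypothetical 3-dimensional sparse data set: only non-empty cells are stored.
dim_sizes = (3, 4, 2)
widths = bit_widths(dim_sizes)
cells = {
    encode((0, 1, 0), widths): 10.0,
    encode((0, 1, 1), widths): 5.0,
    encode((2, 3, 1), widths): 7.0,
}
views = cube(cells, widths)
```

Here `views[()]` holds the grand total and `views[(0,)]` the totals per value of the first dimension. The number of views doubles with each added dimension, which illustrates why the synopsis says the cube is computed only partially when the number of dimensions is large.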

📜 SIMILAR VOLUMES

High Performance Data Mining

📁 High Performance Data Mining

✍ Guo, Grossman 📂 Library 🏛 Kluwer 🌐 English

High Performance Data Mining

📁 High Performance Data Mining

✍ Guo, Grossman. (eds.) 📂 Library 📅 2000 🏛 Kluwer 🌐 English

Contains four refereed papers covering important classes of data mining algorithms: classification, clustering, association rule discovery, and learning Bayesian networks. Srivastava et al. present a detailed analysis of the parallelization strategy of tree induction algorithms. Xu et al. present a pa

Data Mining and Machine Learning in Buil

📁 Data Mining and Machine Learning in Building Energy Analysis: Towards High Performance Computing

✍ Frédéric Magoules, Hai-Xiang Zhao 📂 Library 📅 2016 🏛 Wiley-ISTE 🌐 English

Focusing on up-to-date artificial intelligence models to solve building energy problems, Artificial Intelligence for Building Energy Analysis reviews recently developed models for solving these issues, including detailed and simplified engineering methods, statistical methods, and artificial

High Performance Data Mining: Scaling Al

📁 High Performance Data Mining: Scaling Algorithms, Applications and Systems

✍ Yike Guo, Robert Grossman (auth.), Yike Guo, Robert Grossman (eds.) 📂 Library 📅 2002 🏛 Springer US 🌐 English

High Performance Data Mining: Scaling Algorithms, Applications and Systems brings together in one place important contributions and up-to-date research results in this fast moving area. High Performance Data Mining: Scaling Algorithms, Applications and Systems

Scalable High Performance Computing for

📁 Scalable High Performance Computing for Knowledge Discovery and Data Mining

✍ Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, Wei Li, Paul Stol 📂 Library 📅 1998 🏛 Springer US 🌐 English

Scalable High Performance Computing for Knowledge Discovery and Data Mining brings together in one place important contributions and up-to-date research results in this fast moving area. Scalable High Performance Computing for Knowledge Discovery and Data Mining

High Performance Data Mining in Time Ser

📁 High Performance Data Mining in Time Series: Techniques and Case Studies

✍ Zhu Y. 📂 Library 📅 2004 🌐 English

The first part of this dissertation describes the framework for high performance time series data mining based on important primitives. Data reduction transforms such as the Discrete Fourier Transform, the Discrete Wavelet Transform, Singular Value Decomposition and Random Projection can reduce the