The main memory access latency can significantly slow down the overall performance of a computer system due to the fact that average cycle time of the main memory is typically a factor of 5 10 times higher than that of a processor. To cope with this problem, in addition to the use of caches, the mai
Multiple Templates Access of Trees in Parallel Memory Systems
β Scribed by Vincenzo Auletta; Amelia De Vivo; Vittorio Scarano
- Publisher
- Elsevier Science
- Year
- 1998
- Tongue
- English
- Weight
- 179 KB
- Volume
- 49
- Category
- Article
- ISSN
- 0743-7315
No coin nor oath required. For personal study only.
β¦ Synopsis
We study the problem of mapping the N nodes of a data structure on M memory modules so that they can be accessed in parallel by templates, i.e., distinct sets of nodes. In literature several algorithms are available for arrays (accessed by rows, columns, diagonals, and subarrays) and trees (accessed by subtrees, rootto-leaf paths, levels, etc.). Although some mapping algorithms for arrays allow conflict-free access to several templates at once (for example rows and columns), no mapping algorithm is known for efficiently accessing subtree, path and level templates in complete binary trees. In our paper, we first prove that any mapping algorithm that is conflict-free for tree/level template has (M/ log M) conflicts when access is done according to path template and vice versa. Therefore, no mapping algorithm can be found that is conflict-free on both path and tree (or path and level) templates. Our main result is an algorithm for mapping complete binary trees with N = 2 M -1 nodes on M memory modules in such a way that:
β’ the number of conflicts for accessing an M-node subtree, M adjacent nodes in the same level, or M consecutive nodes of a root-to-leaf path is O( β M/ log M), β’ the load (i.e., the ratio between the maximum and minimum number of data items mapped on each module) is 1 + o(1),
β’ the time complexity for retrieving the module where a given data item is stored is O(1), if a preprocessing phase of space and time complexity O(log N ) is executed, or O(log log N ), if no preprocessing is allowed.
The algorithm can be easily generalized to complete binary trees of any size.
π SIMILAR VOLUMES
We consider a network of workstations (NOW) organization consisting of bus-based multiprocessors interconnected by a high latency and high bandwidth interconnect, such as ATM, on which a shared-memory programming model using a multiplewriter distributed virtual shared-memory system is imposed. The l
Adults' memory performance on recognition (explicit memory) tests is sensitive to stimulus size, but their performance on priming (implicit memory) tests is not. This memory dissociation is taken as evidence for two, functionally distinct memory systems. Young infants, however, are thought to posses