Optimal Compile-Time Multiprocessor Scheduling Based on the 0–1 Linear Programming Algorithm with the Branch and Bound Technique
Authors: D. Antony Louis Piriyakumar; C. Siva Ram Murthy
- Publisher: Elsevier Science
- Year: 1996
- Language: English
- File size: 238 KB
- Volume: 35
- Category: Article
- ISSN: 0743-7315
✦ Synopsis
This paper is organized as follows. We describe the formulation of the problem and the required definitions in Section II. In Section III, the performance improvement over [6] due to processor isomorphism and look-ahead pruning in reducing the search space is explained, along with a note on the lower bound on the completion time. In Section IV, our modified 0-1 linear programming algorithm with the branch and bound technique (with the above improvements) to produce an optimal solution is presented. The performance of our algorithm is evaluated using DFT computation and LU decomposition as benchmarks in Section V. Finally, Section VI concludes the discussion on optimal multiprocessor scheduling.
II. PROBLEM DEFINITION
A parallel program (algorithm) is represented as a weighted directed acyclic (task) graph, G_t = (V_t, E_t), where V_t = {v_i : i = 1, 2, ..., n} is the set of vertices (tasks) with associated service demand s_i, and E_t = {⟨v_i, v_j⟩ : i, j = 1, 2, ..., n, i ≠ j} is the set of directed edges with associated intertask data communication between task i and task j, imposing the partial order that task j can be executed only after the execution of task i. A task i can send a message or data to task j only after the execution of task i is completed. Moreover, a task i can begin its execution only when all the required data from all the predecessor tasks k, where ⟨v_k, v_i⟩ ∈ E_t, are received. An example of a task graph is shown in Fig. 1a.
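To make the precedence semantics concrete, here is a minimal sketch (not from the paper) of a task graph with service demands s_i, computing the earliest completion time of each task under the stated constraint that a task starts only after all its predecessors finish. The example graph, the `service` values, and the simplifying assumptions (zero communication cost, unlimited identical processors) are illustrative choices, not the paper's benchmark.

```python
# Hypothetical task graph G_t = (V_t, E_t): vertex -> service demand s_i.
service = {1: 2, 2: 3, 3: 1, 4: 2}
# Directed edges <v_i, v_j>: task j may start only after task i completes.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

def earliest_completion(service, edges):
    """Earliest completion time of each task, ignoring communication
    costs and assuming unlimited identical processors (a simplification
    of the paper's model, for illustration only)."""
    preds = {v: [] for v in service}
    for i, j in edges:
        preds[j].append(i)
    finish = {}

    def f(v):
        # A task starts once every predecessor has finished (partial order).
        if v not in finish:
            start = max((f(p) for p in preds[v]), default=0)
            finish[v] = start + service[v]
        return finish[v]

    for v in service:
        f(v)
    return finish

print(earliest_completion(service, edges))
# Task 4 finishes at max(5, 3) + 2 = 7.
```

The result for each sink task is the critical-path length through the DAG, which is also the classical lower bound on schedule completion time when processor counts and communication delays are ignored.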
The multiprocessor system, onto which the tasks are scheduled, is assumed to be homogeneous (all the processors have the same service rate, memory capacity, link capacities, etc.) or heterogeneous (the processors and the links may differ in service rates and link capacities, respectively). It has no global shared memory. A processor is assumed to perform both computation and interprocessor communication at the same time. The communication links are assumed to be full duplex. The multiprocessor system is represented as a weighted undirected graph (as shown in Fig. 1b), G_p = (V_p, E_p), where V_p = {v_q : q = 1, 2, ..., m} is the set of processors with associated service rates μ_q and E_p = {(p, q) : p, q = 1, 2, ..., m, p ≠ q} is the set of