[ACM Press the 6th ACM conference - Isch
Villa, Oreste; Krishnamoorthy, Sriram; Nieplocha, Jarek; Brown, David M.
Article
2009
ACM Press
618 KB
Checkpoint-restart is one of the most widely used software approaches to achieving fault tolerance in high-end clusters. While standard techniques typically focus on user-level solutions, the advent of virtualization software has enabled efficient and transparent system-level approaches. In this paper, we pr