
Performance analysis of a scaleable design for replicating file collections in wide-area networks

✍ Scribed by Bert J Dempsey


Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 131 KB
Volume: 23
Category: Article
ISSN: 1084-8045


✦ Synopsis


Mirroring file collections in the global Internet is widely practiced; a recent study estimates that 10% of all WWW hosts carry mirrored content. Conventional mirroring tools, however, are not well suited to the large-scale, multiple-site replication services envisioned by projects such as the Internet2 Distributed Storage Infrastructure (I2-DSI) project. This paper presents a scaleable design for the automated synchronization of large collections of files replicated across multiple hosts, as in I2-DSI, and outlines how the design has been realized using rsync+, a modification of the popular open-source mirroring tool rsync. A performance study based on an instrumented mirror using rsync+ empirically characterizes server-side processing costs under realistic, large-scale workloads, and supplementary measurements of network throughput across Internet2 links illustrate the network performance achievable in a high-speed wide-area network. These experimental results confirm the validity of the scalability arguments for the design, uncover key system parameters for rsync+ that must be tuned for efficient operation, and indicate the limitations of TCP-only transport solutions as the number of mirror sites grows.