Performance analysis of a scaleable design for replicating file collections in wide-area networks
✍ Scribed by Bert J Dempsey
- Publisher
- Elsevier Science
- Year
- 2000
- Language
- English
- File size
- 131 KB
- Volume
- 23
- Category
- Article
- ISSN
- 1084-8045
✦ Synopsis
Mirroring file collections in the global Internet is widely practiced; a recent study estimates that 10% of all WWW hosts carry mirrored content. Conventional mirroring tools, however, are not well suited to the large-scale, multiple-site replication services envisioned by projects such as the Internet2 Distributed Storage Infrastructure (I2-DSI) project. This paper presents a scaleable design for the automated synchronization of large collections of files replicated across multiple hosts, as in I2-DSI, and outlines how the design has been realized using rsync+, a modification to the popular open-source mirroring tool rsync. A performance study based on an instrumented mirror using rsync+ empirically characterizes server-side processing costs under realistic, large-scale workloads, and supplementary measurements of network throughput across Internet2 links illustrate the achievable network performance in a high-speed wide-area network. These experimental results confirm the validity of the scalability arguments for the design, uncover key system parameters for rsync+ that must be tuned for efficient operation, and indicate the limitations of TCP-only transport solutions as the number of mirror sites grows.