Scaling Filename Queries in a Large-Scale Distributed File System

Citation:

Ledlie, Jonathan, Laura Serban, and Dafina Toncheva. 2001. “Scaling Filename Queries in a Large-Scale Distributed File System,” no. 03-02. Cambridge, Massachusetts. Copy at http://www.tinyurl.com/y4643v6j

Date Published:

January 2001

Abstract:

We have examined the tradeoffs in applying regular and compressed Bloom filters to the name query problem in distributed file systems, and we have developed and tested a novel mechanism for scaling queries as the network grows large. Filters greatly reduced query messages in Fan’s “Summary Cache” for web cache hierarchies [fan00summary], a similar, albeit smaller, searching problem. We have implemented a testbed that models a distributed file system and run experiments on various configurations of the system to determine whether Bloom filters provide similar improvements. In a realistic system, where the chance that a randomly queried node holds the file being searched for is low, we show that filters always provide lower bandwidth/search and shorter time/search, as long as the rate of change of the files stored at the nodes is not extremely high relative to the number of searches. In other words, we confirm the intuition that keeping some state about the contents of the rest of the system aids searching, as long as acquiring this state is not overly costly and it does not expire too quickly.
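The filtering idea can be illustrated with a minimal Python sketch. It assumes that each node summarizes its stored file names in a plain Bloom filter and that a searching node holds a copy of every peer's filter. The class `BloomFilter`, the helper `candidate_nodes`, the SHA-256 double-hashing scheme, and the 8 bits per file in the demo are illustrative choices, not the report's implementation. Filter compression is not shown.

```python
import hashlib
import math


class BloomFilter:
    """A plain (uncompressed) Bloom filter over file names."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, name: str) -> None:
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name: str) -> bool:
        # False means "definitely not stored"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(name))

    @classmethod
    def for_files(cls, names, bits_per_file: int) -> "BloomFilter":
        # For m bits and n names, k = (m/n) ln 2 hash functions minimizes false positives.
        n = max(1, len(names))
        k = max(1, round(bits_per_file * math.log(2)))
        bf = cls(num_bits=bits_per_file * n, num_hashes=k)
        for name in names:
            bf.add(name)
        return bf


def candidate_nodes(filters: dict, filename: str) -> list:
    """Probe every peer's filter locally; only positives receive a query message."""
    return [node for node, bf in filters.items() if bf.might_contain(filename)]


if __name__ == "__main__":
    stores = {
        "node-a": ["paper.pdf", "notes.txt"],
        "node-b": ["song.mp3"],
        "node-c": ["paper.pdf", "data.csv"],
    }
    filters = {node: BloomFilter.for_files(files, bits_per_file=8) for node, files in stores.items()}
    print(candidate_nodes(filters, "paper.pdf"))
```

The printed list always contains node-a and node-c, because a Bloom filter has no false negatives. node-b appears only if its filter yields a false positive. This plain filter cannot delete names, so in this sketch a node would rebuild and resend its filter whenever its files change. That refresh traffic is the cost the abstract weighs against the per-search savings.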

The grouping topology we have developed divides n nodes into log(n) groups, each of which has a representative node that aggregates a composite filter for the group. Nodes outside a group probe this low-precision group filter to rule out the group's entire membership at once. They send a search to be proxied by a member of the group only if the probe of the group filter returns a positive. Proxied searches are then carried out within the group, where more precise filters (more bits per file) are kept and exchanged among the n/log(n) nodes of the group. Experimental results show that this novel grouping topology improves both bandwidth/search and time/search.
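The grouping topology can be sketched the same way. This self-contained Python illustration rests on several assumptions that the abstract does not state: the logarithm is base 2, nodes are assigned to groups round-robin, the group's representative acts as the proxy, and group filters use 4 bits per file while member filters use 16. The names `build_filter`, `make_groups`, and `grouped_search` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over file names."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m, self.k = num_bits, num_hashes
        self.bits = [False] * num_bits

    def _positions(self, name: str) -> list:
        d = hashlib.sha256(name.encode("utf-8")).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, name: str) -> None:
        for p in self._positions(name):
            self.bits[p] = True

    def might_contain(self, name: str) -> bool:
        return all(self.bits[p] for p in self._positions(name))


def build_filter(names: list, bits_per_file: int) -> BloomFilter:
    # More bits per file gives a lower false-positive rate, i.e. higher precision.
    bf = BloomFilter(max(8, bits_per_file * len(names)), max(1, round(bits_per_file * math.log(2))))
    for name in names:
        bf.add(name)
    return bf


def make_groups(stores: dict, low_bits: int = 4, high_bits: int = 16) -> list:
    """Split n nodes into about log2(n) groups of about n/log2(n) nodes each."""
    num_groups = max(1, round(math.log2(len(stores))))
    members = [{} for _ in range(num_groups)]
    for i, (node, files) in enumerate(stores.items()):
        members[i % num_groups][node] = files  # hypothetical round-robin assignment
    groups = []
    for group in members:
        all_files = [f for files in group.values() for f in files]
        groups.append({
            "representative": next(iter(group)),
            # Low-precision composite filter, probed by nodes outside the group.
            "group_filter": build_filter(all_files, low_bits),
            # High-precision per-member filters, exchanged only inside the group.
            "member_filters": {node_id: build_filter(fs, high_bits) for node_id, fs in group.items()},
        })
    return groups


def grouped_search(groups: list, searcher: str, filename: str) -> tuple:
    """Return (nodes to query, representatives asked to proxy the search)."""
    targets, proxies = [], []
    for g in groups:
        if searcher not in g["member_filters"]:
            # One probe of the coarse filter rules out the whole group.
            if not g["group_filter"].might_contain(filename):
                continue
            proxies.append(g["representative"])
        # Precise member filters pick the nodes to query: the proxy probes them
        # for a remote group, the searcher probes them for its own group.
        for node, bf in g["member_filters"].items():
            if node != searcher and bf.might_contain(filename):
                targets.append(node)
    return targets, proxies
```

The payoff is in how much state each node keeps. For n = 1024 with base-2 logarithms, a flat scheme would have each node store 1,023 peer filters. This layout instead forms 10 groups of about 102 nodes each, so a node stores roughly 101 or 102 precise member filters plus 9 coarse group filters, about 110 in all.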

  • [fan00summary] Fan, Li, Pei Cao, Jussara Almeida, and Andrei Z. Broder. 2000. “Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol.” IEEE/ACM Transactions on Networking.
