SWORD on PlanetLab: Scalable Wide-Area Resource Discovery

How it works

Note: The current implementation of SWORD runs in centralized mode. The remaining discussion refers to the distributed architecture that is no longer running.

Distributed SWORD consists of two pieces: a distributed query engine that runs atop the Bamboo DHT, and an optimizer that runs on the node where the query entered the system (i.e., the node that the client program contacted).

The distributed query engine receives periodic measurement updates from nodes (currently one update is sent from each node every two minutes). The full set of measurements in an update are stored on a set of nodes, one node for each "key" attribute in the measurement. When the distributed query engine receives a query from a user, it selects one of the single-node key attributes in the query as the "range search attribute" and performs a range search using the range for that attribute specified in the user's query. The range search causes the query to visit the set of nodes that may be storing measurement updates from nodes with their value of the search attribute within the range specified in the query. When the query visits a node, that node returns to the querying node the set of full measurement reports that meet all single-node criteria in the query. Nodes form a linked list in ascending order of range value for which they are responsible, using their (four) Bamboo DHT successor pointers. The query thus travels along these pointers to search the desired range. Because we configure Bamboo to use four successor pointers per node, the query visits four nodes at a time.

For a double range search attribute, the range search is performed over the range abs_min to abs_max described earlier. For a string range search attribute, the range search is performed over the set of strings for that attribute in the query. For a network coordinates range search attribute, the range search is performed over a linearized range corresponding to the z-ordering of the coordinate range of the sphere defined in the query.

The optimizer takes the set of "candidate nodes" returned from the distributed query engine and computes the lowest-penalty mapping of nodes to groups based on the single-node and inter-node penalty functions specified in the query.