perfsonar-dev - nmwg: r305 - trunk/nmwg/doc/dLS
Subject: perfsonar development work
- Subject: nmwg: r305 - trunk/nmwg/doc/dLS
- Date: Thu, 13 Dec 2007 15:32:13 -0500
Author: swany
Date: 2007-12-13 15:32:11 -0500 (Thu, 13 Dec 2007)
New Revision: 305
Modified:
trunk/nmwg/doc/dLS/dLS.html
trunk/nmwg/doc/dLS/dLS.pdf
trunk/nmwg/doc/dLS/dLS.xml
Log:
Newest changes. Sections 2.3 and 2.4 have been swapped based on Jason's
suggestion.
Modified: trunk/nmwg/doc/dLS/dLS.html
===================================================================
--- trunk/nmwg/doc/dLS/dLS.html 2007-12-13 19:25:26 UTC (rev 304)
+++ trunk/nmwg/doc/dLS/dLS.html 2007-12-13 20:32:11 UTC (rev 305)
@@ -299,31 +299,7 @@
----------------> | Client/Service |
Query for Services |_________________|
- </pre><p>Services interacting with an LS</p><p
class="figure">Figure 1</p><p id="rfc.section.1.p.4">This document describes
the support necessary to extend the basic Lookup Service to a distributed
mode of operation. Distributing LS functionality is necessary to scale the
system up in terms of the amount of information that can be stored and
searched. There are a few key facets of this mode of operation:</p><p
id="rfc.section.1.p.5"> </p><ul><li>Summarization - to reduce the amount of
information sent over the network or to anonymize sensitive data, some form
of data reduction must take place.</li><li>Scope - to enable a hierarchy of
systems, some form of scoping must exist that defines local and remote
communication groups.</li><li>Search - information location is key and the
way in which distributed location and search is handled is the crux of this
service.</li></ul><p id="rfc.section.1.p.6">Additionally we present solutions
to issues necessary to allow effective operation of this service, including
bootstrapping (i.e. how a service finds
other parts of the system) and domain-specific concerns.</p><hr
class="noprint"><h1 id="rfc.section.2" class="np"><a
href="#rfc.section.2">2.</a> <a id="system" href="#system">System
Specific Operation</a></h1><h2 id="rfc.section.2.1"><a
href="#rfc.section.2.1">2.1</a> <a id="overview"
href="#overview">Overview</a></h2><p id="rfc.section.2.1.p.1">The first step
of information flow is when a pS service registers with an LS. The service
may know the name of an LS via static configuration (the most common case for
legacy deployments), or other forms of bootstrapping such as multicast may
occur. A service registers a "service metadata" record about itself and full
metadata (i.e. containing all information such as subject, eventType(s), and
any parameters, see <a href="#service-metadata" title="Service metadata
example">Section 4.1</a>) about stored data it has knowledge of. Such a
record is called Lookup Information (see <a href="#lookup-info" title="Lookup
Information">Section 4.2</a>).</p><p id="rfc.section.2.1.p.2">The
idea is to move the metadata from a service-local XML data store to a
specialized LS with additional searching capabilities. While a service
instance may support limited searching, this is not necessary, as services should
be focused on storing or gathering data and leave the lookup functionality to
the LS. Possible exceptions when a client may need to contact a service
directly are when metadata rapidly changes, like the most recent data's
timestamp and full details of data stored in long-term archival MAs.</p><p
id="rfc.section.2.1.p.3">The architecture of the dLS protocol assumes the
existence of logical rings of LS instances. The architecture should allow for
multiple levels of these rings representing multiple splits in a hierarchy,
although the basic example that will be an ongoing theme in this document
will revolve around only 2 levels. The authors realize it is impossible to
predict how the hierarchy of this service may split over time; therefore we
avoid using language
that directly categorizes a ring into a specific role. In general the two
rings that define scope are 'lower' and 'upper'.</p><p
id="rfc.section.2.1.p.4">To better define this classification consider an
example at a high level: inter-domain communication. It is natural to assume
that a single domain will deploy an LS instance to manage deployed services.
The true goal of perfSONAR is to ease the detection of end to end performance
issues particularly across domain boundaries, therefore communication between
domain LS instances is paramount. We assume for this example that the 'top'
most level is that of the domain; further fragmentation by other factors such
as the 'top-level domain' or geographical considerations are probable, just
not of interest in this work. A single domain may have multiple LS
deployments; a representative 'leader' from this set will represent the
'upper' (intra-domain) scope and communicate with similar LS instances of
other domains in this case. The actual registered services of the LS
represent the 'lower' (local, or in
many cases inter-domain) scope.</p><p id="rfc.section.2.1.p.5">The scoping
designations are important to the next stage: data reduction. We observe that
the abundance of information available via the original metadata description
is rather obtuse when it comes to answering a simple (and common) query such
as 'give me bandwidth data for host x'. Although information such as capacity
or interface name is valuable internal to a domain, it does not serve much
purpose to NOC staff simply asking to see utilization of a link. We propose a
'summarization' strategy based on 'distance' from the source that will
distill the complete metadata into smaller and smaller sets as the
information is passed through the scope hierarchy.</p><p
id="rfc.section.2.1.p.6">Finally, using the scoping and summarizing steps we
come to the final, and arguably most important, phase: search. Search must
rely on two phases to work efficiently in the dLS framework, namely discovery
and query. The first step is
locating 'where' information can be found. This involves asking semi-direct
questions regarding the well-defined network topology in order to locate the
'vicinity' of data. The query phase will then ask more esoteric questions
once it locates the proper LS instances to ask. The discovery phase is made
possible through the process of summarization, while the query phase remains
similar to the current LS functionality.</p><h2 id="rfc.section.2.2"><a
href="#rfc.section.2.2">2.2</a> <a id="summary"
href="#summary">Summarization</a></h2><p id="rfc.section.2.2.p.1">The LS that
a service contacts to register becomes the "Home LS" (HLS, see <a
href="#glossary" title="Glossary">Section 6.1</a>) of that particular
service. It is the responsibility of the HLS to make summary data about all
of the pS services it knows of available to the larger enterprise and to
draw relevant queries to itself.</p><p
id="rfc.section.2.2.p.2">Summarization is important to the overall success of
this service, as summaries prevent other LS instances from being
overloaded by information. They must be general enough to allow for easy
creation and exchange but also must retain enough information to provide a
rich query interface able to locate the distributed information. That means
service metadata information must be reduced (summarized) as it propagates
through the LS cloud.</p><p id="rfc.section.2.2.p.3">We start by making an
observation that summarization is best based on scope (see also <a
href="#scope" title="Scope Forming">Section 2.3</a> for forming scope).
Simply put, this means that we should attempt to summarize "more" the
"farther" away from the source that we get. This creates a smaller data set
that travels the farthest away while keeping the larger and more informative
data sets closer to the source. We present the strategies as such:</p><p
id="rfc.section.2.2.p.4"> </p><ul><li>Summarization for the "lower scope" -
we must examine services that we are directly responsible for and distill
this information into a
compact and manageable form.</li><li>Summarization for the "upper scope" of
an LS - Aggregation of summaries from peers plus additional summary steps
will yield a concise (yet still compact) data set. Potentially we will need
to consider summaries from lower scopes and aggregate these into the
information set.</li></ul><p id="rfc.section.2.2.p.5">We will limit our
discussions to the previously discussed inter-domain example, thus involving
only two scope rings. Building upon the basic ideas presented here, extension
to multiple scopes will be presented in future work.</p><h3
id="rfc.section.2.2.1"><a href="#rfc.section.2.2.1">2.2.1</a> <a
id="lower_scope_summarization" href="#lower_scope_summarization">Lower Scope
Summarization</a></h3><p id="rfc.section.2.2.1.p.1">The lower scope
summarization, described here as information exchange between HLS instances
internal to a domain, consists of simply
extracting detailed information from the metadata descriptions provided by
registered services. For now we define this to be simply removing
additional "parameter" elements from the metadata. Special consideration must
be given to the "supportedEventType" parameter by simply converting this to
actual eventType elements. This will ensure interoperability with legacy
services.</p><p id="rfc.section.2.2.1.p.2">Future iterations may choose to
drop additional pieces of information deemed unnecessary or private such as
parts of topological descriptions. This sort of modification is encouraged as
long as the data remains "symmetrical" and conforms to the schematic
definitions for a given metadata description. It should be noted that such
modifications will affect the searching procedure and could isolate the
source services.</p><p id="rfc.section.2.2.1.p.3">The mechanics for
performing this level of summarization can use any number of technologies.
Either Extensible Stylesheet Language Transformation (XSLT) documents or the
XQuery language (see <a href="#glossary" title="Glossary">Section 6.1</a>) may be used to
prepare the initial data for exchange in this first level. Since the exchange
of this local information will occur frequently, a simple operation that is
scheduled or on demand should be employed by the individual implementations
to ensure the regular LS functions are not impeded.</p><p
id="rfc.section.2.2.1.p.4">In order to make information available to the LS
cloud, the HLS will advertise this summary information to other LS instances
to propagate the appropriate information. Information exchange will be
handled using a "taking turns" protocol such as token ring. The holder of the
token will then perform the information exchange to other known instances
(see <a href="#glossary" title="Glossary">Section 6.1</a>).</p><div
id="hls-cloud"></div><div id="rfc.figure.2"></div><p>Token passing between
HLS cloud</p><pre>
- _____ _____
-| | | |
-| LS1 | <----------------- | LS2 |
-|_____| Communicate via |_____|
- | token exchange ^
- | |
- | _____ |
- | | | |
- |-------> | LS3 | ---------|
- |_____|
- </pre><p>HLS instances communicating via a token message</p><p
class="figure">Figure 2</p><div id="hls-cloud1"></div><div
id="rfc.figure.3"></div><p>Broadcast summary</p><pre>
- _____ _____
-| | | |
-| LS1 | < > | LS2 |
-|_____| \ / |_____|
- \ /
- \ / Broadcast Summary info to
- \ _____ / all peers when holding the token
- | |
- | LS3 |
- |____/T\ Token
- \_/
-
- </pre><p>The holder of the token (LS3) will inform everyone of
its summary information.</p><p class="figure">Figure 3</p><p
id="rfc.section.2.2.1.p.7">Once exchanged, the details regarding storage in
the XML database backend (see <a href="#glossary"
title="Glossary">Section 6.1</a>) are also left to individual
implementations. It is understood that this information, in the possession of
non HLS instances, is provided as a convenience and should be treated in the
same way that directly registered information is (i.e. purged on expiration).
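The expiration and authority rules just described can be sketched as a toy record store. This is only an illustrative sketch: the class name, the TTL default, and the record shape are assumptions, not part of the specification.

```python
import time

class LSStore:
    """Toy record store: summaries learned from peers expire exactly
    like directly registered Lookup Information."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.records = {}  # key -> (data, registered_at, authoritative)

    def register(self, key, data, authoritative):
        # authoritative=True for services registered directly with this LS,
        # False for summary information received from peer LS instances.
        self.records[key] = (data, time.time(), authoritative)

    def purge_expired(self, now=None):
        # Both kinds of records are purged on expiration in the same way.
        now = time.time() if now is None else now
        self.records = {k: v for k, v in self.records.items()
                        if now - v[1] < self.ttl}

    def query(self, key):
        # A response must indicate whether this LS is authoritative.
        rec = self.records.get(key)
        if rec is None:
            return None
        data, _, authoritative = rec
        return {"data": data, "authoritative": authoritative}
```
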
When responding to queries for this information, the LS must indicate whether
or not it is authoritative.</p><h3 id="rfc.section.2.2.2"><a
href="#rfc.section.2.2.2">2.2.2</a> <a id="upper_scope_summarization"
href="#upper_scope_summarization">Upper Scope Summarization</a></h3><p
id="rfc.section.2.2.2.p.1">A designated member of the aforementioned HLS
organization will be required to interact with other similar LSs (possibly
representing other domains) in order to form an upper scope. The
mechanics of how we learn who is the designated leader are discussed in <a
href="#tokens" title="Token Passing">Section 2.3.2</a>. The leader from
each of the first layers of this hierarchy (and the designated backup) will
be responsible for examining each member's summary information and building a
summarization/aggregation that describes the contents of the various LS
instances. This summary will serve as input to the upper scope.</p><p
id="rfc.section.2.2.2.p.2">The most natural summarization is based on the
topology of the network (like in network routing). Thus, topology-based
summarization will reduce available service instances in the same way that IP
addresses are summarized into network numbers. They will indicate the
eventTypes that a service has and the ones that it can generate.
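The address-to-network reduction described above can be illustrated with Python's standard `ipaddress` module; the helper name and the sample prefixes are assumptions for illustration, not the prototype's code.

```python
import ipaddress

def summarize(prefixes):
    """Merge adjacent or overlapping prefixes into the minimal covering
    set, the same reduction as summarizing IP addresses into network
    numbers."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses() merges sibling and contiguous networks.
    return [str(n) for n in ipaddress.collapse_addresses(nets)]
```

For example, two adjacent /31s collapse into one /30, and two halves of a /24 collapse back into the /24.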
Summarization will be performed using a specialized summary algorithm. Topology
information such as IP addresses
will be summarized using algorithms based on a Radix Tree (see <a
href="#IP-summary" title="IP addresses summarization
algorithm">Section 2.2.2.1</a>).</p><p id="rfc.section.2.2.2.p.3">Other
information can be summarized in a less programmatic fashion through the use
of either Extensible Stylesheet Language Transformation (XSLT) documents or
the XQuery language as discussed in the previous section. These mechanisms
will take into account the XML elements that represent the network topology
currently used in metadata subjects as well as additional items such as
eventTypes.</p><p id="rfc.section.2.2.2.p.4">The output of this process
becomes a "service summary" that represents a breadth of the original input.
See <a href="#LSControl-Summary-lower" title="LS Summary Message
(Lower)">Section 4.6</a> or <a href="#LSControl-Summary-upper" title="LS
Summary Message (Upper)">Section 4.7</a> for a mock-up of the summary
output. Additional transformations, while aggressive, will strive to preserve
as much information as possible to remain
useful during the search procedures.</p><h4 id="rfc.section.2.2.2.1"><a
href="#rfc.section.2.2.2.1">2.2.2.1</a> <a id="IP-summary"
href="#IP-summary">IP addresses summarization algorithm</a></h4><p
id="rfc.section.2.2.2.1.p.1">To summarize a set of IP addresses we can build
upon a common structure for IP storage and lookup, the <a
href="http://en.wikipedia.org/wiki/Radix_tree">Radix (or Patricia) Tree</a>.
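As a rough sketch of the structure, a minimal binary trie over address bits (not path-compressed, unlike a true Patricia tree) is enough to show the "maximum dominator" idea discussed in this section; all names here are illustrative assumptions.

```python
import ipaddress

class BitTrie:
    """Minimal binary (radix-like) trie over IPv4 address bits."""

    def __init__(self):
        self.root = {}

    def insert(self, addr):
        # Walk bit by bit, creating child nodes as needed.
        bits = format(int(ipaddress.ip_address(addr)), "032b")
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})

    def maximum_dominator(self):
        """Longest prefix shared by every inserted address: descend while
        there is exactly one child (the analogue of 'the first node below
        the root' in a path-compressed radix tree)."""
        node, prefix = self.root, ""
        while len(node) == 1:
            b, node = next(iter(node.items()))
            prefix += b
        value = int(prefix.ljust(32, "0"), 2)
        return ipaddress.ip_network((value, len(prefix)))
```
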
Radix trees are often used to store and look up IP addresses. Our goal is to
index IP addresses using their natural hierarchy, so that we can summarize
groups of addresses by describing the ranges of values into which
they fall.</p><p id="rfc.section.2.2.2.1.p.2">A detailed explanation of the
nuances of the Radix Tree is well beyond the scope of this document, but a
brief overview is presented for completeness. The structure itself is best
described as a binary tree where nodes contain whole values and the edges are
the primary source of navigation. The edges of the tree can be based on a
single character or perhaps on even longer strings (one of the features that
leads to efficiency). The algorithm should
support the following basic operations:</p><p id="rfc.section.2.2.2.1.p.3">
</p><ul><li>Lookup: Indicate that something is or is not contained in the
tree.</li><li>Insert: Like in most inserts we attempt to place something into
the structure. We first start by doing a Lookup to see if it exists already;
the point where we stop is where we will insert the object. We are careful to
utilize the longest matching prefix of any other nearby edge to our
advantage. This last part is normally referred to as "splitting" and ensures
that each node has no more than two children.</li><li>Delete: Delete an
object from the tree. This operation will be complicated by "collapsing"
parents that have a single child and merging the edges.</li></ul><p
id="rfc.section.2.2.2.1.p.4">Once constructed, it is possible to consult the
structure in creating IP network summaries. The current prototype
implementation of summarization creates a Radix tree of IPs during an update
phase. Then it can perform two
types of summarization. First, the "maximum dominator" of the Radix tree is
the maximum summarization for all IP addresses in the Radix tree. Using the
optimization mentioned above regarding strings longer than 1 character or
bit, the solution to the maximum dominator problem is trivial -- it is simply
the first node below the root. The second type of summarization is to
determine "K-dominators". Essentially, for a given target K, we produce the
most appropriate summarizing nodes. While this problem is NP-complete, we can
construct an approximation heuristic that simply considers the length of the
strings in the internal (or structural) nodes of the tree. We leave for
future work the problem of "Min cost dominators", in which the best K and the
best K dominators are selected.</p><h2 id="rfc.section.2.3"><a
href="#rfc.section.2.3">2.3</a> <a id="scope" href="#scope">Scope
Forming</a></h2><p id="rfc.section.2.3.p.1">The next question is how to form
the hierarchy of LS instances and subsequently organize the 'scopes'. The
simplest answer
is that the highest scope be formed based on the domain name of the
participating systems as mentioned in the previous examples. That would allow
e.g. internet2.edu, geant2.net, and pionier.gov.pl to potentially operate
more than one LS instance inside their own domains (for performance and
scalability). As LS instances come online they will invoke bootstrapping
procedures to find and join a lower scoped group first.</p><p
id="rfc.section.2.3.p.2">The scopes should be named based on URIs. This will
allow a domain-level scope to take the form <a
href="http://internet2.edu">http://internet2.edu</a>, with subdomain scopes
named <a href="http://internet2.edu/foo">http://internet2.edu/foo</a>, etc.
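Under this URI convention, sub-scope containment within a domain reduces to a path-prefix check. The following is a hedged sketch under that assumption; the document does not itself define a containment algorithm, and the function name is illustrative.

```python
from urllib.parse import urlparse

def scope_contains(parent, child):
    """True if `child` names the same scope as `parent` or a sub-scope of
    it, assuming sub-scopes extend the parent's URI path."""
    p, c = urlparse(parent), urlparse(child)
    if p.netloc != c.netloc:
        # Different domains relate only through the top-level scope.
        return False
    pseg = [s for s in p.path.split("/") if s]
    cseg = [s for s in c.path.split("/") if s]
    return cseg[:len(pseg)] == pseg
```
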
The top-level scope can be called <a
href="http://perfsonar.net">http://perfsonar.net</a> with potential for
geographic divisions later if necessary for performance (such as <a
href="http://eu.perfsonar.net">http://eu.perfsonar.net</a>).</p><p id="rfc.section.2.3.p.3">The major
algorithms used to form and maintain the ring structure of the dLS, no matter
which scope we are talking about, are as follows:</p><p
id="rfc.section.2.3.p.4"> </p><ul><li>Join Procedure</li><li>Token
Passing</li><li>Summarization Notification</li></ul><p
id="rfc.section.2.3.p.5">Each of these procedures is important to keeping
members of the distributed "service" functioning correctly. The algorithms
will be presented in the frame of HLS instances communicating in a lower
scope, but will be used in the same manner for inter-domain communication as
an upper scope as well.</p><h3 id="rfc.section.2.3.1"><a
href="#rfc.section.2.3.1">2.3.1</a> <a id="join" href="#join">Join
Procedure</a></h3><p id="rfc.section.2.3.1.p.1">When an LS instance comes
online it will have some bootstrapping knowledge of potential peers (both
inter and intra domain). This information is contained in the LSRing file
(see <a href="#LSRing" title="LS Ring File Structure">Section 4.3</a>). The
inter-domain knowledge is used first to establish a connection to an already
in
progress ring, or perhaps to start a ring that may not exist yet.</p><p
id="rfc.section.2.3.1.p.2">A candidate LS will continuously search its LSRing
information and send an LSControl message to its known LS instances with a
"join" eventType (see <a href="#LSControl-Join" title="LS Joining Message for
Joining a Ring">Section 4.4</a>) until a successful response is seen.
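The retry-until-accepted behavior might look like the following sketch. Here `send_control` stands in for the real LSControl message exchange, and the dictionary shape of the LSRing state is an illustrative assumption; the join eventType URI is the one shown in the figure description later in this section.

```python
import time

JOIN_EVENT = "http://perfsonar.net/services/LS/join"

def join_ring(ls_ring, send_control, retry_interval=30):
    """Keep trying known peers with a 'join' LSControl message until one
    accepts, then merge the LSControlResponse into our LSRing view."""
    while True:
        for peer in list(ls_ring["known"]):
            resp = send_control(peer, eventType=JOIN_EVENT)
            if resp is not None and not resp.get("error"):
                # Mark the responder as active and learn any members
                # we did not know about yet.
                ls_ring["active"].add(peer)
                ls_ring["known"].update(resp.get("members", []))
                return peer
        # No peer accepted; wait before sweeping the list again.
        time.sleep(retry_interval)
```
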
The LS candidate will then search through the successful LSControlResponse to
this message and update its LSRing with the returned information. This can
mean updating the "active" parameter as well as adding new LS instances. This
parameter is indicative of the "live-ness" (i.e. whether we were successful in
contacting it recently). The contacted LS will also update the local copy of
LSRing to add the new member to its "available" list.</p><p
id="rfc.section.2.3.1.p.3">For security purposes, it is necessary for the
members of the LSRing to
know that a new member has joined without that member authenticating
pairwise with each other member of the ring. To accomplish this, the
initially contacted LS will request that the current ring leader initiate a
token rotation to allow all members to update their LSRing list. </p><p
id="rfc.section.2.3.1.p.4">After updating, the newly joined LS will broadcast
another LSControl message with a "summary" eventType (see <a
href="#LSControl-Summary-lower" title="LS Summary Message
(Lower)">Section 4.6</a>, or if we are dealing with the upper level see
<a href="#LSControl-Summary-upper" title="LS Summary Message
(Upper)">Section 4.7</a>) to all of the "active" LS instances from its
LSRing. Again the responses will be parsed to get any useful updated
information. At the end of this process the joining LS will possess an LSRing
file reflecting the state of the dLS cloud. Each of the recipient LS
instances which hasn't heard anything from this joining LS previously will do
the same, including adding this new member to their own lists (as
they didn't know of its existence yet).</p><p
id="rfc.section.2.3.1.p.5">After this initial warm-up the LS will observe the
rules of token etiquette and remain silent until it is contacted with a
token, or it has not seen one in a very long time (see <a href="#tokens"
title="Token Passing">Section 2.3.2</a>).</p><h4
id="rfc.section.2.3.1.1"><a href="#rfc.section.2.3.1.1">2.3.1.1</a> <a
id="join_algorithm" href="#join_algorithm">Join Algorithm</a></h4><p
id="rfc.section.2.3.1.1.p.1">The algorithm for joining the ring works as
follows.</p><p id="rfc.section.2.3.1.1.p.2"> </p><div
id="join-example-rej"></div><div id="rfc.figure.4"></div><p>Illustration of
LS Join Algorithm (rejected)</p><pre>
+ </pre><p>Services interacting with an LS</p><p
class="figure">Figure 1</p><p id="rfc.section.1.p.4">This document describes
the support necessary to extend the basic Lookup Service to a distributed
mode of operation. Distributing LS functionality is necessary to scale the
system up in terms of the amount of information that can be stored and
searched. There are a few key facets of this mode of operation:</p><p
id="rfc.section.1.p.5"> </p><ul><li>Summarization - to reduce the amount of
information sent over the network or to anonymize sensitive data, some form
of data reduction must take place.</li><li>Scope - to enable a hierarchy of
systems, some form of scoping must exist that defines local and remote
communication groups.</li><li>Search - information location is key and the
way in which distributed location and search is handled is the crux of this
service.</li></ul><p id="rfc.section.1.p.6">Additionally we present solutions
to issues necessary to allow effective operation of this service, including
bootstrapping (i.e. how a service finds
other parts of the system) and domain-specific concerns.</p><hr
class="noprint"><h1 id="rfc.section.2" class="np"><a
href="#rfc.section.2">2.</a> <a id="system" href="#system">System
Specific Operation</a></h1><h2 id="rfc.section.2.1"><a
href="#rfc.section.2.1">2.1</a> <a id="overview"
href="#overview">Overview</a></h2><p id="rfc.section.2.1.p.1">The first step
of information flow is when a pS service registers with an LS. The service
may know the name of an LS via static configuration (the most common case for
legacy deployments), or other forms of bootstrapping such as multicast may
occur. A service registers a "service metadata" record about itself and full
metadata (i.e. containing all information such as subject, eventType(s), and
any parameters, see <a href="#service-metadata" title="Service metadata
example">Section 4.1</a>) about stored data it has knowledge of. Such a
record is called Lookup Information (see <a href="#lookup-info" title="Lookup
Information">Section 4.2</a>).</p><p id="rfc.section.2.1.p.2">The
idea is to move the metadata from a service-local XML data store to a
specialized LS with additional searching capabilities. While a service
instance may support limited searching, this is not necessary, as services should
be focused on storing or gathering data and leave the lookup functionality to
the LS. Possible exceptions when a client may need to contact a service
directly are when metadata rapidly changes, like the most recent data's
timestamp and full details of data stored in long-term archival MAs.</p><p
id="rfc.section.2.1.p.3">The architecture of the dLS protocol assumes the
existence of logical groups of LS instances. The architecture should allow
for multiple levels of these rings representing multiple splits in a
hierarchy, although the basic example that will be an ongoing theme in this
document will revolve around only 2 levels. The authors realize it is
impossible to predict how the hierarchy of this service may split over time;
therefore we avoid using language
that directly categorizes a ring into a specific role. In general the two
rings that define scope are 'lower' and 'upper'.</p><p
id="rfc.section.2.1.p.4">To better define this classification consider an
example at a high level: inter-domain communication. It is natural to assume
that a single domain will deploy an LS instance to manage deployed services.
The true goal of perfSONAR is to ease the detection of end to end performance
issues particularly across domain boundaries, therefore communication between
domain LS instances is paramount. We assume for this example that the 'top'
most level is that of the domain; further fragmentation by other factors such
as the 'top-level domain' or geographical considerations are probable, just
not of interest in this work. A single domain may have multiple LS
deployments; a representative 'leader' from this set will represent the
'upper' (intra-domain) scope and communicate with similar LS instances of
other domains in this case. The actual registered services of the LS
represent the 'lower' (local, or in
many cases inter-domain) scope.</p><p id="rfc.section.2.1.p.5">The scoping
designations are important to the next stage: data reduction. We observe that
the abundance of information available via the original metadata description
is rather obtuse when it comes to answering a simple (and common) query such
as 'give me bandwidth data for host x'. Although information such as capacity
or interface name is valuable internal to a domain, it does not serve much
purpose to NOC staff simply asking to see utilization of a link. We propose a
'summarization' strategy based on 'distance' from the source that will
distill the complete metadata into smaller and smaller sets as the
information is passed through the scope hierarchy.</p><p
id="rfc.section.2.1.p.6">Finally, using the scoping and summarizing steps we
come to the final, and arguably most important, phase: search. Search must
rely on two phases to work efficiently in the dLS framework, namely discovery
and query. The first step is
locating 'where' information can be found. This involves asking semi-direct
questions regarding the well-defined network topology in order to locate the
'vicinity' of data. The query phase will then ask more esoteric questions
once it locates the proper LS instances to ask. The discovery phase is made
possible through the process of summarization, while the query phase remains
similar to the current LS functionality.</p><h2 id="rfc.section.2.2"><a
href="#rfc.section.2.2">2.2</a> <a id="scope" href="#scope">Scope
Formation</a></h2><p id="rfc.section.2.2.p.1">The next question is how to
form the hierarchy of LS instances and subsequently organize the 'scopes'.
The simplest answer is that the highest scope be formed based on the domain
name of the participating systems as mentioned in the previous examples. That
would allow e.g. internet2.edu, geant2.net, and pionier.gov.pl to potentially
operate more
than one LS instance inside their own domains (for performance and
scalability). As LS instances come online they will invoke bootstrapping
procedures to find and join a lower scoped group first.</p><p
id="rfc.section.2.2.p.2">The scopes should be named based on URIs. This will
allow a domain-level scope to take the form <a
href="http://internet2.edu">http://internet2.edu</a>, with subdomain scopes
named <a href="http://internet2.edu/foo">http://internet2.edu/foo</a>, etc.
The top-level scope can be called <a
href="http://perfsonar.net">http://perfsonar.net</a> with potential for
geographic divisions later if necessary for performance (such as <a
href="http://eu.perfsonar.net">http://eu.perfsonar.net</a>).</p><p
id="rfc.section.2.2.p.3">The major algorithms used to form and maintain the
ring structure of the dLS, no matter which scope we are talking about, are as
follows:</p><p id="rfc.section.2.2.p.4"> </p><ul><li>Join
Procedure</li><li>Token Passing</li><li>Summarization
Notification</li></ul><p id="rfc.section.2.2.p.5">Each of these procedures
is important to keeping members of the distributed "service"
functioning correctly. The algorithms will be presented in the frame of HLS
instances communicating in a lower scope, but will be used in the same manner
for inter-domain communication as an upper scope as well.</p><h3
id="rfc.section.2.2.1"><a href="#rfc.section.2.2.1">2.2.1</a> <a
id="join" href="#join">Join Procedure</a></h3><p
id="rfc.section.2.2.1.p.1">When an LS instance comes online it will have some
bootstrapping knowledge of potential peers (both inter and intra domain).
This information is contained in the LSRing file (see <a href="#LSRing" title="LS
Ring File Structure">Section 4.3</a>). The inter-domain knowledge is
used first to establish a connection to an already in progress ring, or
perhaps to start a ring that may not exist yet.</p><p
id="rfc.section.2.2.1.p.2">A candidate LS will continuously search its LSRing
information and send an LSControl message to its known LS instances with a "
join" eventType (see <a href="#LSControl-Join" title="LS Joining Message for
Joining a Ring">Section 4.4</a>) until a successful response is
seen. The LS candidate will then search through the successful
LSControlResponse to this message and update its LSRing with the returned
information. This can mean updating the "active" parameter as well as adding
new LS instances. This parameter is indicative of the "live-ness" (i.e. were
we successful in contacting it recently). The contacted LS will also update
the local copy of LSRing to add the new member to its "available" list.</p><p
id="rfc.section.2.2.1.p.3">For security purposes, it is necessary for the
members of the LSRing to know that a new member has joined without that
member authenticating pairwise with each other member of the ring. To
accomplish this, the initially contacted LS will request that the current
ring leader initiate a token rotation to allow all members to update their
LSRing list.</p><p id="rfc.section.2.2.1.p.4">After updating, the newly
joined LS will broadcast another LSControl message with a "summary" eventType (see <a
href="#LSControl-Summary-lower" title="LS Summary Message
(Lower)">Section 4.6</a>, or if we are dealing with the upper level see
<a href="#LSControl-Summary-upper" title="LS Summary Message
(Upper)">Section 4.7</a>) to all of the "active" LS instances from its
LSRing. Again the responses will be parsed to get any useful updated
information. At the end of this process the joining LS will possess an LSRing
file reflecting the state of the dLS cloud. Each recipient LS
instance that has not previously heard from this joining LS will do the
same, including adding the new member to its own list (as it did not know
of its existence yet).</p><p id="rfc.section.2.2.1.p.5">After this
initial warm-up the LS will observe the rules of token etiquette and remain
silent until it is contacted with a token, or it has not seen one in a very
long time (see <a href="#tokens" title="Token Messages for Control
and Election">Section 2.2.2</a>).</p><h4
id="rfc.section.2.2.1.1"><a href="#rfc.section.2.2.1.1">2.2.1.1</a> <a id="join_algorithm"
href="#join_algorithm">Join Algorithm</a></h4><p
id="rfc.section.2.2.1.1.p.1">The algorithm for joining the ring works as
follows.</p><p id="rfc.section.2.2.1.1.p.2"> </p><div
id="join-example-rej"></div><div id="rfc.figure.2"></div><p>Illustration of
LS Join Algorithm (rejected)</p><pre>
LS1 LS2
candidate member
| |
@@ -332,7 +308,7 @@
| error resp [#]
2[#]<---------------[#]
| |
- </pre><p>Join request rejected</p><p class="figure">Figure 4</p><p
id="rfc.section.2.3.1.1.p.3"> </p><dl class="empty"><dd>1. LS1 (candidate to
the ring) sends join (http://perfsonar.net/services/LS/join eventType)
request to LS2 (member of the ring). LS2 receives join message from LS1 and
decides whether to accept it or not. Application of security policy may occur
here.</dd><dd>2. LS2 rejects join request from LS1 and responses with proper
error code</dd></dl><p id="rfc.section.2.3.1.1.p.4"> </p><div
id="join-example-acc"></div><div id="rfc.figure.5"></div><p>Illustration of
LS Join Algorithm (rejected)</p><pre>
+ </pre><p>Join request rejected</p><p class="figure">Figure 2</p><p
id="rfc.section.2.2.1.1.p.3"> </p><dl class="empty"><dd>1. LS1 (candidate to
the ring) sends join (http://perfsonar.net/services/LS/join eventType)
request to LS2 (member of the ring). LS2 receives join message from LS1 and
decides whether to accept it or not. Application of security policy may occur
here.</dd><dd>2. LS2 rejects the join request from LS1 and responds with the
proper error code.</dd></dl><p id="rfc.section.2.2.1.1.p.4"> </p><div
id="join-example-acc"></div><div id="rfc.figure.3"></div><p>Illustration of
LS Join Algorithm (accepted)</p><pre>
|==========LS Ring=========|
LS1 LS2 LS3
@@ -363,7 +339,7 @@
[#]-----------------+----------------->|
[#]--->...summary | |
| | |
- </pre><p>Join request accepted</p><p class="figure">Figure 5</p><p
id="rfc.section.2.3.1.1.p.5"> </p><dl class="empty"><dd>1. LS1 (candidate to
the ring) sends join (http://perfsonar.net/services/LS/join eventType)
request to LS2 (member of the ring). LS2 receives join message from LS1 and
decides whether to accept it or not. Application of security policy may occur
here.</dd><dd>2. LS2 accepts join request from LS1 and responses with success
code and LSRing content. LS2 will be waiting for send-summary request
(http://perfsonar.net/services/LS/send-summary eventType)</dd><dd>3. LS2
sends send-update-token (http://perfsonar.net/services/LS/send-update-token
eventType) to LS3 (the leader of the ring). Send-update-token contain the URL
of LS1. LS3 updates its LSRing with URL of LS1.</dd><dd>4. LS3 immediately
sends update-token (http://perfsonar.net/services/LS/update-token) to next
peer from LSRing. Update-token contains updated LSRing.</dd><dd>5. LS2
receives update-
token, updates its LSRing and immediately sends update-token to the next
peer</dd><dd>6. After full cycle of update-token LS3 receives own
update-token. Now all ring members have knowledge about newly joined LS1 and
can accept summary from LS1.</dd><dd>7. LS3 responses for request mentioned
in step 3. LS2 receives an acknowledgement (result code) of update-token
operation.</dd><dd>8. If update-token was accomplished succesfuly, LS2 sends
send-summary request (http://perfsonar.net/services/LS/send-summary
eventType) to LS1</dd><dd>9. LS1 sends summary to all peers in the LSRing.
Now all members of the LSRing have the summary information from
LS1.</dd></dl><p id="rfc.section.2.3.1.1.p.6">The algorithm could be
simplified by moving response from step 2 to step 8. However then, LS1 may be
waiting for quite a long time without any reponse and communication time may
pass. Such a simplification should be taken under consideration after
testing.</p><h3 id="rfc.section.2.3.2"><a href
="#rfc.section.2.3.2">2.3.2</a> <a id="tokens" href="#t!
okens">T
oken Passing</a></h3><p id="rfc.section.2.3.2.p.1">The "token" is an
LSControlMessage (see <a href="#LSControl-Token" title="LS Token
Message">Section 4.5</a>) meant to be passed around an LSRing to the
various members in some order. There are various criteria that can be used in
deciding how to order the ring so that everyone can predict where the token
is, when they might expect to get it, and whom they should get it from/ pass
it to next. It is important that we choose a sound method that is simple to
calculate, and should use as much "knowledge" of the ring as possible without
burdening the LS instances too much with complex calculations.</p><p
id="rfc.section.2.3.2.p.2">The essential idea in the token passing mechanism
for leader election is that some identifier is chosen for each node and that
the node with the highest (or lowest) identifier win the election and becomes
the leader. The basic mechanism of leader election is that participants form
a logical ring and
initiate an election. An election can be initiated when a new machine joins,
at system start time, or when a host feels that the leader may have failed
based on failure to receive a periodic token. When an election is initiated,
the initiating host sends an election message to its counter-clockwise
neighbor and changes its state to “ELECTING”. It places its
identifier inside the message. The ultimate goal is for the host with the
highest identifier to be chosen. When a host receives an election message, it
compares its identifier with that in the message. It forwards the higher of
the identifiers. When a node receives a message with its own identifier, it
knows that it has been selected and the election terminates.</p><p
id="rfc.section.2.3.2.p.3">The next question is how to choose the identifier
for a given node. There still needs to be some discussion here. The first
proposal was to use the IP address of the node as the lower-order 32-bits of
a 64-bit number and to
allow the higher-order bits to be set as a "priority" field.
This would effectively allow a system administrator to make sure that her most
powerful or well-connected nodes became the leader when they were available.
In the absence of a priority, the nodes essentially are randomly
ordered.</p><p id="rfc.section.2.3.2.p.4">The key is that as long as the
identifier is chosen consistently within a given scope, the choice of
identifier doesn't affect the operation of the protocol.</p><p
id="rfc.section.2.3.2.p.5">The token can be viewed as "permission to talk"
and permits the holding LS to send its summary information to all other
available LS instances (see <a href="#LSControl-Summary-lower" title="LS
Summary Message (Lower)">Section 4.6</a> and <a
href="#LSControl-Summary-upper" title="LS Summary Message
(Upper)">Section 4.7</a>). The responses will be parsed to get any
useful updated information about current dLS cloud state.</p><h4
id="rfc.section.2.3.2.1"><a href="#rfc.section.2.3.2.1">2.3.2.1</a> <a
id="token_passing_alg
orithm" href="#token_passing_algorithm">Token Passing Algorithm</a></h4><div
id="token-passing-example"></div><div id="rfc.figure.6"></div><p>Illustration
of Token Passing Algorithm</p><pre>
+ </pre><p>Join request accepted</p><p class="figure">Figure 3</p><p
id="rfc.section.2.2.1.1.p.5"> </p><dl class="empty"><dd>1. LS1 (candidate to
the ring) sends join (http://perfsonar.net/services/LS/join eventType)
request to LS2 (member of the ring). LS2 receives join message from LS1 and
decides whether to accept it or not. Application of security policy may occur
here.</dd><dd>2. LS2 accepts the join request from LS1 and responds with a
success code and the LSRing content. LS2 then waits for a send-summary request
(http://perfsonar.net/services/LS/send-summary eventType)</dd><dd>3. LS2
sends send-update-token (http://perfsonar.net/services/LS/send-update-token
eventType) to LS3 (the leader of the ring). The send-update-token contains the
URL of LS1. LS3 updates its LSRing with the URL of LS1.</dd><dd>4. LS3 immediately
sends update-token (http://perfsonar.net/services/LS/update-token) to next
peer from the LSRing. The update-token contains the updated LSRing.</dd><dd>5. LS2
receives the update-token, updates its LSRing, and immediately sends the
update-token to the next peer.</dd><dd>6. After a full cycle of the
update-token, LS3 receives its own update-token. Now all ring members know
about the newly joined LS1 and can accept a summary from LS1.</dd><dd>7. LS3
responds to the request sent in step 3. LS2 receives an acknowledgement
(result code) of the update-token operation.</dd><dd>8. If the update-token
cycle completed successfully, LS2 sends a send-summary request
(http://perfsonar.net/services/LS/send-summary
eventType) to LS1.</dd><dd>9. LS1 sends its summary to all peers in the LSRing.
Now all members of the LSRing have the summary information from
LS1.</dd></dl><p id="rfc.section.2.2.1.1.p.6">The algorithm could be
simplified by moving the response from step 2 to step 8. However, LS1 might
then wait a long time without any response, and the communication could time
out. Such a simplification should be considered only after
testing.</p><h3 id="rfc.section.2.2.2"><a href
="#rfc.section.2.2.2">2.2.2</a> <a id="tokens" href="#t!
okens">T
oken Messages for Control and Election</a></h3><p
id="rfc.section.2.2.2.p.1">When scopes are created they form themselves into
logical rings around which tokens can be passed. This token passing
mechanism is used for two purposes: registration control and leader
election.</p><p id="rfc.section.2.2.2.p.2">The "token" is an LSControlMessage
(see <a href="#LSControl-Token" title="LS Token
Message">Section 4.5</a>) meant to be passed around an LSRing to the
various members in some order. There are various criteria that can be used in
deciding how to order the ring so that everyone can predict where the token
is, when they might expect to get it, and whom they should get it from and
pass it to next. It is important to choose a sound ordering that is simple to
compute and uses as much knowledge of the ring as possible without
burdening the LS instances with complex calculations.</p><h4
id="rfc.section.2.2.2.1"><a href="#rfc.section.2.2.2.1">2.2.2.1
</a> <a id="leader_election" href="#leader_election">Leader
Election</a></h4><p id="rfc.section.2.2.2.1.p.1">The essential idea in the
token passing mechanism for leader election is that some identifier is chosen
for each node and that the node with the highest (or lowest) identifier wins
the election and becomes the leader. The basic mechanism of leader election
is that participants form a logical ring and initiate an election. An
election can be initiated when a new machine joins, at system start time, or
when a host feels that the leader may have failed based on failure to receive
a periodic token. When an election is initiated, the initiating host sends an
election message to its counter-clockwise neighbor and changes its state to
“ELECTING”. It places its identifier inside the message. The
ultimate goal is for the host with the highest identifier to be chosen. When
a host receives an election message, it compares its identifier with that in
the message.
It forwards the higher of the identifiers. When a node receives a
message with its own identifier, it knows that it has been selected and the
election terminates.</p><p id="rfc.section.2.2.2.1.p.2">The next question is
how to choose the identifier for a given node. There still needs to be some
discussion here. The first proposal was to use the IP address of the node as
the lower-order 32-bits of a 64-bit number and to allow the higher-order bits
to be set as a "priority" field. This would effectively allow a system
administrator to make sure that her most powerful or well-connected nodes
became the leader when they were available. In the absence of a priority, the
nodes essentially are randomly ordered.</p><p
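The identifier packing and the ring election described above can be sketched as follows. This is a minimal simulation, assuming distinct identifiers and modelling the message circulation in-process rather than over the network; the function names are illustrative.

```python
import ipaddress

def node_id(ip, priority=0):
    """64-bit identifier: priority in the high-order 32 bits,
    IPv4 address in the low-order 32 bits (the first proposal above)."""
    return (priority << 32) | int(ipaddress.IPv4Address(ip))

def ring_election(ids):
    """Pass an election message around the ring; highest identifier wins.

    ids -- node identifiers in ring order; the message always travels to
    the next index (wrapping around). Each node forwards the larger of its
    own identifier and the one carried in the message; when a node sees its
    own identifier come back, it knows it has been elected.
    """
    n = len(ids)
    msg = ids[0]              # node 0 initiates the election
    i = 1 % n
    while True:
        if ids[i] == msg:
            return msg        # node i received its own id: it is the leader
        msg = max(msg, ids[i])  # forward the higher identifier
        i = (i + 1) % n
```

With priorities, an administrator's preferred node wins; without them, the ordering effectively falls back to the numeric IP addresses.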
id="rfc.section.2.2.2.1.p.3">The key is that as long as the identifier is
chosen consistently within a given scope, the choice of identifier doesn't
affect the operation of the protocol.</p><h4 id="rfc.section.2.2.2.2"><a
href="#rfc.section.2.2.2.2">2.2.2.2</a> </h4><p
id="rfc.section.2.2.2.2.p.1">The token can be viewed as "permiss
ion to talk" and permits the holding LS to send its summary information to
all other available LS instances (see <a href="#LSControl-Summary-lower"
title="LS Summary Message (Lower)">Section 4.6</a> and <a
href="#LSControl-Summary-upper" title="LS Summary Message
(Upper)">Section 4.7</a>). The responses will be parsed to get any
useful updated information about current dLS cloud state.</p><h5
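The behavior of a token holder (adopt the peer list carried in the token, broadcast a summary, hold briefly, then pass the token on, marking unresponsive successors) can be sketched as follows. All names and the `send` callable are illustrative stand-ins for the LSControl exchanges, and the three-failure removal rule is the one still under discussion in the text.

```python
import time

TOKEN_EVENT   = "http://perfsonar.net/services/LS/token/"
SUMMARY_EVENT = "http://perfsonar.net/services/LS/summary/"

def handle_token(self_url, token_peers, send, failures, hold_time=0):
    """Serve one token visit: update peers, broadcast summary, pass token on.

    token_peers -- peer list carried in the token (replaces the local list)
    send        -- callable(url, event_type, payload) -> bool success;
                   stands in for an LSControl request/response
    failures    -- dict url -> consecutive failure count
    Returns the (possibly pruned) local peer list.
    """
    peers = list(token_peers)                    # adopt the token's peer list
    for url in peers:
        if url != self_url:                      # summary to every other peer
            send(url, SUMMARY_EVENT, "summary")
    time.sleep(hold_time)                        # hold the token briefly
    at = peers.index(self_url)
    order = peers[at + 1:] + peers[:at]          # successors in ring order
    for nxt in order:                            # pass the token onward
        if send(nxt, TOKEN_EVENT, peers):
            failures[nxt] = 0
            break
        failures[nxt] = failures.get(nxt, 0) + 1  # mark unresponsive peer
        if failures[nxt] >= 3:
            peers.remove(nxt)                     # drop after three failures
    return peers
```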
id="rfc.section.2.2.2.2.1"><a
href="#rfc.section.2.2.2.2.1">2.2.2.2.1</a> <a
id="token_passing_algorithm" href="#token_passing_algorithm">Token Passing
Algorithm</a></h5><div id="token-passing-example"></div><div
id="rfc.figure.4"></div><p>Illustration of Token Passing Algorithm</p><pre>
LS1 LS2 LS3
| | |
@@ -379,14 +355,38 @@
6 [#]--------------->[#] |
| | |
- </pre><p>LS1, LS2 and LS3 are members of the ring. LS1 receives
a token from LS3.</p><p class="figure">Figure 6</p><p
id="rfc.section.2.3.2.1.p.2">The algorithm for token passing works as
follows:</p><p id="rfc.section.2.3.2.1.p.3"> </p><dl class="empty"><dd>1. LS1
receives the token i.e. LSControlRequest message with the
http://perfsonar.net/services/LS/token/ eventType from its predecessor
L3.</dd><dd>2. LS1 updates its 'lower' peer list based on token content. The
local peer list is replaced by the one received in token</dd><dd>3. LS1 sends
LSControlRequest message with the http://perfsonar.net/services/LS/summary/
eventType to all peers in the lease (excluding itself).</dd><dd>4. LS2,LS3
receiving this message checks its collection and updates it if necessary with
service info.</dd><dd>5. LS1 waits for some amount of time. (TO BE DEFINED -
who decides it?)</dd><dd>6. LS1 sends token to next LS (LS2) from the LSRing
lower scope. If it fails, mark the not-resp
onding peer as "not active" and try next one. (TO BE DISCUSSED whether "not
active" is just boolean or number of fails - after 3 failures the url will be
removed from LSRing)</dd></dl><p id="rfc.section.2.3.2.1.p.4">MG: open
issues:</p><p id="rfc.section.2.3.2.1.p.5">- how to determine and remove
duplicate tokens?</p><p id="rfc.section.2.3.2.1.p.6">- when to re-send token
(I guess when computed token rotation time passes)</p><p
id="rfc.section.2.3.2.1.p.7">- after leader election how the node can know
who is the leader and which tokens accept or reject (if there are tokens sent
by old leader and new leader)</p><h4 id="rfc.section.2.3.2.2"><a
href="#rfc.section.2.3.2.2">2.3.2.2</a> <a id="rotation-time-computing"
href="#rotation-time-computing">Token rotation time computing</a></h4><p
id="rfc.section.2.3.2.2.p.1">The token rotation time is the time of passing
and serving token by all nodes in the LS ring. This time should be computed
by the leader basing on some knowledge
about the time of serving token by all particular nodes. The time
may be based on times saving in token message by all nodes. Initially, this
can be very simple as in "2 minutes X the number of nodes in the ring." (To
be discussed)</p><p id="rfc.section.2.3.2.2.p.2">The key is that after the
timeout has exceeded, it can be inferred that the leader has failed and
another election should be initiated.</p><h3 id="rfc.section.2.3.3"><a
href="#rfc.section.2.3.3">2.3.3</a> <a id="summary-blast"
href="#summary-blast">Summarization Notification</a></h3><p
id="rfc.section.2.3.3.p.1">As discussed in the prior two sections there are
two acceptable instances to send your summary to the LSRing:</p><p
id="rfc.section.2.3.3.p.2"> </p><ol><li>When first joining</li><li>When
holding the token</li></ol><p id="rfc.section.2.3.3.p.3">In the first case we
are explicitly entering ourselves into the ring when we get our first message
from a peer. This ensures we show up in the token rotation instantly. The
second case is the routine exchange started when a t
oken arrives from a peer.</p><p id="rfc.section.2.3.3.p.4"> <a
href="#LSControl-Summary-lower" title="LS Summary Message
(Lower)">Section 4.6</a> and <a href="#LSControl-Summary-upper"
title="LS Summary Message (Upper)">Section 4.7</a> contain examples of
the message format for this exchange. It is left up to the implementation
when the summarization occurs (i.e. at message send time, or also as a
periodic event).</p><h3 id="rfc.section.2.3.4"><a
href="#rfc.section.2.3.4">2.3.4</a> <a id="Leader_Election"
href="#Leader_Election">Leader election</a></h3><p
id="rfc.section.2.3.4.p.1">The most important role of any group of HLS
instances is electing a leader to serve as a representative in upper level
communication. This logical ring should consist of one representative LS from
each of similar lower scope configuration.</p><p
id="rfc.section.2.3.4.p.2">The Leader and Vice-Leader LS instances should
exchange messages (see <a href="#LSControl-Leader" title="LS Leader
Message">Section 4.8</a>) periodically to ensure that in the
event of a failure the lower level will still have a link to the upper
level. A Vice-Leader will be monitoring the time between successive
communications from the Leader to be sure it has not failed. In the event
that it has, the "Join" procedure will start to the upper level to keep the
hierarchy complete.</p><div id="leader-election-example"></div><div
id="rfc.figure.7"></div><p>Illustration of Leader Election Algorithm</p><pre>
+ </pre><p>LS1, LS2 and LS3 are members of the ring. LS1 receives
a token from LS3.</p><p class="figure">Figure 4</p><p
id="rfc.section.2.2.2.2.1.p.2">The algorithm for token passing works as
follows:</p><p id="rfc.section.2.2.2.2.1.p.3"> </p><dl class="empty"><dd>1.
LS1 receives the token, i.e. an LSControlRequest message with the
http://perfsonar.net/services/LS/token/ eventType, from its predecessor
LS3.</dd><dd>2. LS1 updates its 'lower' peer list based on the token content.
The local peer list is replaced by the one received in the token.</dd><dd>3. LS1 sends
LSControlRequest message with the http://perfsonar.net/services/LS/summary/
eventType to all peers in the list (excluding itself).</dd><dd>4. LS2 and LS3,
on receiving this message, check their collections and update them if
necessary with the service info.</dd><dd>5. LS1 waits for some amount of time. (TO BE DEFINED -
who decides it?)</dd><dd>6. LS1 sends the token to the next LS (LS2) from
the LSRing lower scope. If that fails, it marks the non-responding peer as
"not active" and tries the next one. (TO BE DISCUSSED whether "not active"
is just a boolean or a failure count - after 3 failures the URL will be
removed from the LSRing)</dd></dl><p id="rfc.section.2.2.2.2.1.p.4">MG:
open issues:</p><p id="rfc.section.2.2.2.2.1.p.5">- how to determine and
remove duplicate tokens?</p><p id="rfc.section.2.2.2.2.1.p.6">- when to
re-send the token (presumably when the computed token rotation time
passes)</p><p
id="rfc.section.2.2.2.2.1.p.7">- after leader election, how can a node know
who the leader is and which tokens to accept or reject (if there are tokens
sent by the old leader and the new leader)?</p><h5 id="rfc.section.2.2.2.2.2"><a
href="#rfc.section.2.2.2.2.2">2.2.2.2.2</a> <a
id="rotation-time-computing" href="#rotation-time-computing">Token rotation
time computing</a></h5><p id="rfc.section.2.2.2.2.2.p.1">The token rotation
time is the time needed to pass and serve the token by all nodes in the LS
ring. This time should be computed by the leader based on some knowledge of
how long each node takes to serve the token. The time may be based on
timestamps saved in the token message by each node. Initially, this will be
very simple and will be computed as "2 minutes plus 5 seconds times the
number of nodes in the ring."</p><p
id="rfc.section.2.2.2.2.2.p.2">The key point is that once this timeout has
expired without the token being seen, it can be inferred that the leader has
failed and another election
should be initiated.</p><h3 id="rfc.section.2.2.3"><a
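The initial rule of thumb is simple enough to state as a one-line function; the constants come straight from the quoted rule, and the name is illustrative.

```python
def token_rotation_timeout(num_nodes, base_seconds=120, per_node_seconds=5):
    """Initial token rotation timeout: 2 minutes plus 5 seconds
    for every node in the ring."""
    return base_seconds + per_node_seconds * num_nodes
```

A node that has not seen the token within this window may infer that the leader has failed and initiate a new election.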
href="#rfc.section.2.2.3">2.2.3</a> <a id="summary_messages"
href="#summary_messages">Summarization Messages</a></h3><p
id="rfc.section.2.2.3.p.1"> <a href="#LSControl-Summary-lower" title="LS
Summary Message (Lower)">Section 4.6</a> and <a
href="#LSControl-Summary-upper" title="LS Summary Message
(Upper)">Section 4.7</a> contain examples of the message format for this
exchange. It is left up to the implementation when the summarization occurs
(i.e. at message send time, or also as a periodic event).</p><h3
id="rfc.section.2.2.4"><a href="#rfc.section.2.2.4
">2.2.4</a> <a id="Leader_Election" href="#Leader_Election">Leader
election</a></h3><p id="rfc.section.2.2.4.p.1">The most important role of any
group of HLS instances is electing a leader to serve as a representative in
upper level communication. This logical ring should consist of one
representative LS from each of similar lower scope configuration.</p><p
id="rfc.section.2.2.4.p.2">The Leader and Vice-Leader LS instances should
exchange messages (see <a href="#LSControl-Leader" title="LS Leader
Message">Section 4.8</a>) periodically to ensure that in the event of a
failure the lower level will still have a link to the upper level. A
Vice-Leader will be monitoring the time between successive communications
from the Leader to be sure it has not failed. In the event that it has, the
"Join" procedure will start to the upper level to keep the hierarchy
complete.</p><div id="leader-election-example"></div><div
id="rfc.figure.5"></div><p>Illustration of Leader Election
Algorithm</p><pre>
LS1 LS2 LS3
| | |
(...) (...) (...)
| | |
- </pre><p>LS1, LS2 and LS3 are members of the ring. Leader
election occurs when ... TBD</p><p class="figure">Figure 7</p><p
id="rfc.section.2.3.4.p.4">The algorithm for leader election works as
follow:</p><p id="rfc.section.2.3.4.p.5"> </p><dl class="empty"><dd>1.
...tbd</dd><dd>2. ...tbd</dd></dl><p id="rfc.section.2.3.4.p.6">MG: Open
issues to be described</p><p id="rfc.section.2.3.4.p.7">- when does leader
election occur?</p><p id="rfc.section.2.3.4.p.8">- who initiates leader
election?</p><p id="rfc.section.2.3.4.p.9">- is vice leader still necessary?
(at least in the first version; perhaps it could be implemented/discussed
later)</p><p id="rfc.section.2.3.4.p.10">- who initiates leader election if
leader doesn't work; when (after what time) does it occur?</p><p
id="rfc.section.2.3.4.p.11">- what if more than one node initiates leader
election?</p><h2 id="rfc.section.2.4"><a
href="#rfc.section.2.4">2.4</a> <a id="search"
href="#search">Search</a></h2><p
id="rfc.section.2.4.p.1">The search operation is obviously critical to the
Lookup Service's function. It is envisioned that searching could take one of
two major forms, iterative and recursive, analogous to those used in DNS.
This design will focus exclusively on iterative initially as the only method
in the first versions of the dLS. The key act when searching is to find what
eventTypes exist for a particular topology element or set of topology
elements.</p><p id="rfc.section.2.4.p.2">As outlined above, the full data
that services register to an LS is not expected to leave the scope of that
LS. The information is summarized before wider distribution. Therefore, a
client needs to find an LS in the scope of the HLS to make queries about the
complete metadata. Specifically, a client wishing to locate information might
specify a topology element in a query to locate the LS instance (or
instances) that contain the relevant details. This separation of full data
and summary data means
the overall act of searching is broken down into two distinct
phases - Discovery and Metadata Query.</p><h3 id="rfc.section.2.4.1"><a
href="#rfc.section.2.4.1">2.4.1</a> <a id="discovery"
href="#discovery">Discovery Phase</a></h3><p id="rfc.section.2.4.1.p.1">The
discovery phase is used to locate the set of Authoritative LS (or LSes) for a
given Subject/eventType tuple. This requires a query to be constructed over
the Discovery information set (which is not described yet, but which must
consist of the 3-tuple of Subject Summary, eventType and Authoritative LS.)
Either a specific API call and a pre-prepared query, or some automatic
mechanism, must map the desired query into a query of the Discovery info-set
(see <a href="#LSControl-Discovery" title="LS Discovery
Message">Section 4.9</a>).</p><h4 id="rfc.section.2.4.1.1"><a
href="#rfc.section.2.4.1.1">2.4.1.1</a> <a id="discovery-alg"
href="#discovery-alg">Discovery Algorithm</a></h4><p
id="rfc.section.2.4.1.1.p.1">The discovery algorithm is as
follows.</p><ol><li>A client l
ocates an LS of some sort (this may be known beforehand via a configuration
value, or from bootstrapping).</li><li>The client should start by making a
discovery query (possibly using an API call) to locate an LS that contains
the data it is interested in. The results of this query will be: <ol
style="list-style-type: lower-alpha"><li>Failure: Returned if there is no LS
at a higher scope than the current one, and nothing was found in the summary
info-set that matches the query.</li><li>Referral: This is returned when
there is no match other than a "global wildcard" <ol style="list-style-type:
lower-alpha"><li>If this LS is not participating in the highest (upper)
scope, then it returns the leader of its current scope (or a direct referral
to an instance of the next-higher scope.) This is effectively a wildcard
match saying "I don't know the answer, but I know who might." This is how the
Metadata registered to an LS in another scope (domain) is found.</li></ol>
</li><li>Success:
We define success to mean at least one matching LS has been
returned. The LS must return the following: <ol style="list-style-type:
lower-alpha"><li>If this LS is an HLS for the discovery query, it returns
itself.</li><li>This LS also returns any other HLS instances it has found
that match. An LS instance will have summary information from other domains
when it is participating in a higher-level scope (such as
upper).</li><li>Note: this is where recursive searches would be added into
the discovery phase. The trail up the scope hierarchy would be followed by
the LS itself instead of returning the leader LS. Ideally, this list would be
iterated on by the LS so that only leaf LS instances are returned</li></ol>
</li></ol> </li><li>The client will need to iterate through the list of
returned LS instances. If the LS returns itself, this LS can be used in the
following Metadata query phase. If the returned LS is different, a discovery
query should be made to it.</li></ol><h3 id="rfc.section.2.4.2"><a
href="#rfc.section.2.4.2">2.4.2</a> <a i
d="metadata-query" href="#metadata-query">Metadata Query Phase</a></h3><p
id="rfc.section.2.4.2.p.1">The Metadata Query Phase with an individual LS is
the same as the query mechanism that is in place with the current LS
implementations.</p><p id="rfc.section.2.4.2.p.2">Once we have found the HLS
(or Home LSes) that contain data in the range of our discovery query, we can
pose Metadata Queries to each of them. The results will be failure or
success.</p><hr class="noprint"><h1 id="rfc.section.3" class="np"><a
href="#rfc.section.3">3.</a> <a id="bootstrapping"
href="#bootstrapping">Bootstrapping</a></h1><p id="rfc.section.3.p.1">A
distributed information system such as the LS needs to address bootstrapping.
In this system, an LS instance needs to find other members of its scope (for
each scope in which it participates.) To accomplish this we will use a
similar solution to what DNS uses (root.hints).</p><p
id="rfc.section.3.p.2">We will maintain a service that maintains a l
ist of currently known LS instances. These known instances s!
hould pr
eferably be at the upper scope. All clients can cache this list. The service
will be accessed via a well-known hostname, and could be requested via UDP
messages. (We can also use TCP here for some sorts of anycast.)</p><p
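The root.hints-style bootstrap described here (consult a cached list first, otherwise ask the well-known hints service and cache its answer) can be sketched as follows; `fetch_hints` stands in for the UDP/TCP exchange with the well-known hostname, and the cache format is an assumption.

```python
import json
import os

def load_bootstrap_list(cache_path, fetch_hints):
    """Return the bootstrap list of known LS instances.

    fetch_hints -- callable() -> list of LS URLs; stands in for the
                   query to the well-known hints service.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)          # clients may cache the list
    hints = fetch_hints()                # ask the well-known service
    with open(cache_path, "w") as f:
        json.dump(hints, f)              # cache for subsequent starts
    return hints
```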
id="rfc.section.3.p.3">Initially this will be deployed on one server. We can
extend this to handle redundancy and load balancing in the future by using
multiple DNS records and implementing ANYCAST with routing tricks for this
well known hostname. (Additionally, we can distribute an initial file with a
list of well known LS instances that are supported by the primary perfSONAR
participants.)</p><p id="rfc.section.3.p.4">The above discovery algorithm is
used to find an LS within a given scope. Therefore, the only piece of
information an LS should need to be pre-configured with is the scope it
belongs to. And as stated above, that can be assumed to be
"global:organization-dns-name". Note: Need to define the specific syntax
above.</p><hr class="noprint"><h1 id=
"rfc.section.4" class="np"><a href="#rfc.section.4">4.</a> <a
id="structures-and-messages" href="#structures-and-messages">Structures and
Messages</a></h1><h2 id="rfc.section.4.1"><a
href="#rfc.section.4.1">4.1</a> <a id="service-metadata"
href="#service-metadata">Service metadata example</a></h2><p
id="rfc.section.4.1.p.1">Example of metadata describing information collected
and stored in Measurement Archive service</p><p id="rfc.section.4.1.p.2">
<pre>
+ </pre><p>LS1, LS2 and LS3 are members of the ring. Leader
election occurs when ... TBD</p><p class="figure">Figure 5</p><p
id="rfc.section.2.2.4.p.4">The algorithm for leader election works as
follows:</p><p id="rfc.section.2.2.4.p.5"> </p><dl class="empty"><dd>1.
...tbd</dd><dd>2. ...tbd</dd></dl><p id="rfc.section.2.2.4.p.6">MG: Open
issues to be described</p><p id="rfc.section.2.2.4.p.7">- when does leader
election occur?</p><p id="rfc.section.2.2.4.p.8">- who initiates leader
election?</p><p id="rfc.section.2.2.4.p.9">- is vice leader still necessary?
(at least in the first version; perhaps it could be implemented/discussed
later)</p><p id="rfc.section.2.2.4.p.10">- who initiates leader election if
leader doesn't work; when (after what time) does it occur?</p><p
id="rfc.section.2.2.4.p.11">- what if more than one node initiates leader
election?</p><h2 id="rfc.section.2.3"><a
href="#rfc.section.2.3">2.3</a> <a id="summary"
href="#summary">Summarization</a
></h2><p id="rfc.section.2.3.p.1">The LS that a service contacts to register
>becomes the "Home LS" (HLS, see <a href="#glossary"
>title="Glossary">Section 6.1</a>) of that particular service. It is
>the responsibility of the HLS to make summary data about the all of the pS
>services it knows of available to the larger enterprise and to draw
>relevant queries to itself.</p><p id="rfc.section.2.3.p.2">Summarization is
>important to the overall success of this service as summaries prevents
>other LS instances from being overloaded by information. They must be
>general enough to allow for easy creation and exchange but also must retain
>enough information to provide a rich query interface able to locate the
>distributed information. That means service metadata information must be
>reduced (summarized) as it propagates through the LS cloud.</p><p
>id="rfc.section.2.3.p.3">We start by making an observation that
>summarization is best based on scope (see also <a href="#scope" title="Scope
Formation">Section 2.2</a> for forming scope). Simply !
put, thi
s means that we should attempt to summarize "more" the "farther" away from
the source that we get. This creates a smaller data set that travels the
farthest away while keeping the larger and more informative data sets closer
to the source. We present the strategies as such:</p><p
id="rfc.section.2.3.p.4"> </p><ul><li>Summarization for the "lower scope" -
we must examine services that we are directly responsible for and distill
this information into a compact and manageable form.</li><li>Summarization
for the "upper scope" of an LS - Aggregation of summaries from peers plus
additional summary steps will yield a concise (yet still compact) data set.
Potentially we will need to consider summaries from lower scopes and
aggregate these into the information set.</li></ul><p
id="rfc.section.2.3.p.5">We will limit our discussion to the previously
discussed inter-domain example, thus involving only two scope rings. Building
upon the basic ideas presented here, extension to multiple scopes
will be presented in future work.</p><h3 id="rfc.section.2.3.1"><a
href="#rfc.section.2.3.1">2.3.1</a> <a id="lower_scope_summarization"
href="#lower_scope_summarization">Lower Scope Summarization</a></h3><p
id="rfc.section.2.3.1.p.1">The lower scope summarization, described here as
information exchange between HLS instances internal to a domain, consists of
simply extracting detailed information from the metadata descriptions
provided by registered services. For now we define this to be simply removing
additional "parameter" elements from the metadata. Special consideration must
be given to the "supportedEventType" parameter by simply converting this to
actual eventType elements. This will ensure interoperability with legacy
services.</p><p id="rfc.section.2.3.1.p.2">Future iterations may choose to
drop additional pieces of information deemed unnecessary or private such as
parts of topological descriptions. This sort of modification is encouraged as
long as the data remains "symmetrical" and conforms to the schematic
definitions for a given metadata description. It should be noted that such
modifications will affect the searching procedure and could isolate the
source services.</p><p id="rfc.section.2.3.1.p.3">The mechanics for
performing this level of summarization can use any number of technologies.
Either Extensible Stylesheet Language Transformation (XSLT) documents or the
XQuery language (see <a href="#glossary"
title="Glossary">Section 6.1</a>) may be used to prepare the initial
data for exchange in this first level. Since the exchange of this local
information will occur frequently, a simple operation that is scheduled or on
demand should be employed by the individual implementations to ensure the
regular LS functions are not impeded.</p><p id="rfc.section.2.3.1.p.4">In
order to make information available to the LS cloud, the HLS will advertise
this summary information to other LS instances to propagate the appropriate
information. Information exchange will be handled using a "taking
turns" protocol such as token ring. The holder of the token will then perform
the information exchange to other known instances (see <a href="#glossary"
title="Glossary">Section 6.1</a>).</p><div id="hls-cloud"></div><div
id="rfc.figure.6"></div><p>Token passing between HLS cloud</p><pre>
+ _____ _____
+| | | |
+| LS1 | <----------------- | LS2 |
+|_____| Communicate via |_____|
+ | token exchange ^
+ | |
+ | _____ |
+ | | | |
+ |-------> | LS3 | ---------|
+ |_____|
+ </pre><p>HLS instances communicating via a token message</p><p
class="figure">Figure 6</p><div id="hls-cloud1"></div><div
id="rfc.figure.7"></div><p>Broadcast summary</p><pre>
+ _____ _____
+| | | |
+| LS1 | < > | LS2 |
+|_____| \ / |_____|
+ \ /
+ \ / Broadcast Summary info to
+ \ _____ / all peers when holding the token
+ | |
+ | LS3 |
+ |____/T\ Token
+ \_/
+
+ </pre><p>The holder of the token (LS3) will inform everyone of
its summary information.</p><p class="figure">Figure 7</p><p
id="rfc.section.2.3.1.p.7">Once exchanged, the details regarding storage in
the XML database backend (see <a href="#glossary"
title="Glossary">Section 6.1</a>) are also left to individual
implementations. It is understood that this information, in the possession of
non-HLS instances, is provided as a convenience and should be treated in the
same way that directly registered information is (i.e. purged on expiration).
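The token-driven broadcast of Figures 6 and 7 can be sketched as a tiny in-memory simulation. All class and method names below are illustrative only; the actual dLS exchanges LSControlMessages over the network rather than Python objects.

```python
# Sketch of the "taking turns" exchange: the LS holding the token
# broadcasts its summary to every peer, then the token moves on.
# Names here are hypothetical, not part of the dLS protocol itself.

class LSNode:
    def __init__(self, name, summary):
        self.name = name
        self.summary = summary
        self.received = {}          # peer name -> last summary seen

    def on_token(self, ring):
        # Holder of the token informs everyone of its summary information.
        for peer in ring:
            if peer is not self:
                peer.received[self.name] = self.summary

def rotate_once(ring):
    # One full token rotation: each member holds the token exactly once.
    for node in ring:
        node.on_token(ring)

ring = [LSNode("LS1", "sum1"), LSNode("LS2", "sum2"), LSNode("LS3", "sum3")]
rotate_once(ring)
# After one rotation every LS has seen both peers' summaries.
```

After a single rotation each member holds the current summary of every other member, which is the steady state the figures above describe.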
When responding to queries for this information, the LS must indicate whether
or not it is authoritative.</p><h3 id="rfc.section.2.3.2"><a
href="#rfc.section.2.3.2">2.3.2</a> <a id="upper_scope_summarization"
href="#upper_scope_summarization">Upper Scope Summarization</a></h3><p
id="rfc.section.2.3.2.p.1">A designated member of a given scope (often
corresponding to an organization) will be required to interact with
other similar LSs (generally representing other domains) in order to form a
higher-level, or "upper", scope. The mechanics of how we learn who is the
designated leader are discussed in <a href="#tokens" title="Token Messages
for Control and Election">Section 2.2.2</a>. The leader from each of the
first layers of this hierarchy (and the designated backup) will be
responsible for examining each member's summary information and building a
summarization/aggregation that represents the contents of the various LS
instances. This summary will serve as input to the upper scope.</p><p
id="rfc.section.2.3.2.p.2">The most natural summarization is based on the
topology of the network (like in network routing). Thus, topology-based
summarization will reduce available service instances in the same way that IP
addresses are summarized into network numbers. They will indicate the
eventTypes that a service has and the ones that it can generate.
Summarization will be performed using a specialized summary algorithm.
Topology information such as IP addresses
will be summarized using algorithms based on the Radix Tree (see <a
href="#IP-summary" title="IP Address Summarization
Algorithm">Section 2.3.2.1</a>).</p><p id="rfc.section.2.3.2.p.3">Other
information can be summarized in an easier manner through the use of either
Extensible Stylesheet Language Transformation (XSLT) documents or the XQuery
language as discussed in the previous section. These mechanisms will take
into account the XML elements that represent the network topology currently
used in metadata subjects as well as additional items such as
eventTypes.</p><p id="rfc.section.2.3.2.p.4">The output of this process
becomes a "service summary" that represents the breadth of the original
input. This consists minimally of IP networks and addresses and the
eventTypes which are stored and can be generated. See <a
href="#LSControl-Summary-lower" title="LS Summary Message
(Lower)">Section 4.6</a> or <a href="#LSControl-Summary-upper" title="LS
Summary Message (Upper)">Section
4.7</a> for a mock-up of the summary output. Additional
transformations, while aggressive, will strive to preserve as much
information as possible to remain useful during the search procedures.</p><h4
id="rfc.section.2.3.2.1"><a href="#rfc.section.2.3.2.1">2.3.2.1</a> <a
id="IP-summary" href="#IP-summary">IP Address Summarization
Algorithm</a></h4><p id="rfc.section.2.3.2.1.p.1">To summarize a set of IP
addresses we can build upon a common structure for IP storage and lookup, the
<a href="http://en.wikipedia.org/wiki/Radix_tree">Radix (or Patricia)
Tree</a>. Radix trees are often used to store and look up IP addresses. Our
goal is to index IP addresses using their natural hierarchy, where we can
summarize groups of addresses by describing the ranges of values into which
they fall.</p><p id="rfc.section.2.3.2.1.p.2">A detailed explanation of the
nuances of the Radix Tree is well beyond the scope of this document, but a
brief overview is presented for completeness. The structure itself is best
described as a binary tree where
nodes contain whole values and the edges are the primary source of
navigation. The edges of the tree can be based on a single character or
perhaps on even longer strings (one of the features that leads to
efficiency). The algorithm should support the following basic
operations:</p><p id="rfc.section.2.3.2.1.p.3"> </p><ul><li>Lookup: Indicate
that something is or is not contained in the tree.</li><li>Insert: Like in
most inserts we attempt to place something into the structure. We first start
by doing a Lookup to see if it exists already; the point where we stop is
where we will insert the object. We are careful to utilize the longest
matching prefix of any other nearby edge to our advantage. This last part is
normally referred to as "splitting" and ensures that each node has no more
than two children.</li><li>Delete: Delete an object from the tree. This
operation will be complicated by "collapsing" parents that have a single
child and merging the edges.</li></ul><p
id="rfc.section.2.3.2.1.p.4">Once constructed, it is possible to consult the
structure in creating IP network summaries. The current prototype
implementation of summarization creates a Radix tree of IPs during an update
phase. Then it can perform two types of summarization. First, the "maximum
dominator" of the Radix tree is the maximum summarization for all IP addresses
in the Radix tree. Using the optimization mentioned above regarding strings
longer than 1 character or bit, the solution to the maximum dominator problem
is trivial -- it is simply the first node below the root. The second type of
summarization is to determine "K-dominators". Essentially, for a given
target K, we produce the most appropriate summarizing nodes. While this
problem is NP-complete, we can construct an approximation heuristic that
simply considers the length of the strings in the internal (or structural)
nodes of the tree. We leave for future work the problem of "min cost
dominators", in which both the best K
and the best K dominators are selected.</p><p
id="rfc.section.2.3.2.1.p.5">Essentially, the output of this algorithm is a set of IP subnets
and addresses expressed in Classless Interdomain Routing (CIDR) style, i.e.
W.X.Y.Z/mask bits. There may be times when this address aggregation is
manually specified and this is a completely viable interim solution.</p><h2
id="rfc.section.2.4"><a href="#rfc.section.2.4">2.4</a> <a id="search"
href="#search">Search</a></h2><p id="rfc.section.2.4.p.1">The search
operation is obviously critical to the Lookup Service's function. It is
envisioned that searching could take one of two major forms, iterative and
recursive, analogous to those used in DNS. This design will initially focus
on the iterative method as the only one supported in the first versions of the dLS.
The key act when searching is to find what eventTypes exist for a particular
topology element or set of topology elements.</p><p
id="rfc.section.2.4.p.2">As outlined above, the full data that services
register to an LS is not expected to leave the
scope of that LS. The information is summarized before wider distribution.
Therefore, a client needs to find an LS in the scope of the HLS to make
queries about the complete metadata. Specifically, a client wishing to locate
information might specify a topology element in a query to locate the LS
instance (or instances) that contain the relevant details. This separation of
full data and summary data means the overall act of searching is broken down
into two distinct phases - Discovery and Metadata Query.</p><h3
id="rfc.section.2.4.1"><a href="#rfc.section.2.4.1">2.4.1</a> <a
id="discovery" href="#discovery">Discovery Phase</a></h3><p
id="rfc.section.2.4.1.p.1">The discovery phase is used to locate the set of
Authoritative LS (or LSes) for a given Subject/eventType tuple. This requires
a query to be constructed over the Discovery information set (which is not
described yet, but which must consist of the 3-tuple of Subject Summary,
eventType and Authoritative LS.) Either
a specific API call and a pre-prepared query, or some automatic
mechanism, must map the desired query into a query of the Discovery info-set (see
<a href="#LSControl-Discovery" title="LS Discovery
Message">Section 4.9</a>).</p><h4 id="rfc.section.2.4.1.1"><a
href="#rfc.section.2.4.1.1">2.4.1.1</a> <a id="discovery-alg"
href="#discovery-alg">Discovery Algorithm</a></h4><p
id="rfc.section.2.4.1.1.p.1">The discovery algorithm is as
follows.</p><ol><li>A client locates an LS of some sort (this may be known
beforehand via a configuration value, or from bootstrapping).</li><li>The
client should start by making a discovery query (possibly using an API call)
to locate an LS that contains the data it is interested in. The results of
this query will be: <ol style="list-style-type: lower-alpha"><li>Failure:
Returned if there is no LS at a higher scope than the current one, and
nothing was found in the summary info-set that matches the
query.</li><li>Referral: This is returned when there is no match other than a
"global wildcard" <ol
style="list-style-type: lower-alpha"><li>If this LS is not participating in the
highest (upper) scope, then it returns the leader of its current scope (or a
direct referral to an instance of the next-higher scope.) This is effectively
a wildcard match saying "I don't know the answer, but I know who might." This
is how the Metadata registered to an LS in another scope (domain) is
found.</li></ol> </li><li>Success: We define success to mean at least one
matching LS has been returned. The LS must return the following: <ol
style="list-style-type: lower-alpha"><li>If this LS is an HLS for the
discovery query, it returns itself.</li><li>This LS also returns any other
HLS instances it has found that match. An LS instance will have summary
information from other domains when it is participating in a higher-level
scope (such as upper).</li><li>Note: this is where recursive searches would
be added into the discovery phase. The trail up the scope hierarchy would be
followed by the LS itself instead
of returning the leader LS. Ideally, this list would be
iterated on by the LS so that only leaf LS instances are returned.</li></ol>
</li></ol> </li><li>The client will need to iterate through the list of
returned LS instances. If the LS returns itself, this LS can be used in the
following Metadata query phase. If the returned LS is different, a discovery
query should be made to it.</li></ol><h3 id="rfc.section.2.4.2"><a
href="#rfc.section.2.4.2">2.4.2</a> <a id="metadata-query"
href="#metadata-query">Metadata Query Phase</a></h3><p
id="rfc.section.2.4.2.p.1">The Metadata Query Phase with an individual LS is
the same as the query mechanism that is in place with the current LS
implementations.</p><p id="rfc.section.2.4.2.p.2">Once we have found the HLS
(or Home LSes) that contain data in the range of our discovery query, we can
pose Metadata Queries to each of them. The results will be failure or
success.</p><hr class="noprint"><h1 id="rfc.section.3" class="np"><a
href="#rfc.section.3">3.</a> <a id="bootstrapping"
href="#bootstrapping">Bootstrapping</a></h1><p id="rfc.section.3.p.1">A distributed
information system such as the LS needs to address bootstrapping. In this
system, an LS instance needs to find other members of its scope (for each
scope in which it participates.) To accomplish this we will use a similar
solution to what DNS uses (root.hints).</p><p id="rfc.section.3.p.2">We will
operate a service that maintains a list of currently known LS instances.
These known instances should preferably be at the upper scope. All clients
can cache this list. The service will be accessed via a well-known hostname,
and could be requested via UDP messages. (We can also use TCP here for some
sorts of anycast.)</p><p id="rfc.section.3.p.3">Initially this will be
deployed on one server. We can extend this to handle redundancy and load
balancing in the future by using multiple DNS records and implementing
ANYCAST with routing tricks for this well known hostname. (Additionally, we
can distribute an initial file
with a list of well known LS instances that are supported
by the primary perfSONAR
participants.)</p><p id="rfc.section.3.p.4">The above
discovery algorithm is used to find an LS within a given scope. Therefore,
the only piece of information an LS should need to be pre-configured with is
the scope it belongs to. And as stated above, that can be assumed to be
"global:organization-dns-name". Note: Need to define the specific syntax
above.</p><hr class="noprint"><h1 id="rfc.section.4" class="np"><a
href="#rfc.section.4">4.</a> <a id="structures-and-messages"
href="#structures-and-messages">Structures and Messages</a></h1><h2
id="rfc.section.4.1"><a href="#rfc.section.4.1">4.1</a> <a
id="service-metadata" href="#service-metadata">Service metadata
example</a></h2><p id="rfc.section.4.1.p.1">Example of metadata describing
information collected and stored in Measurement Archive service</p><p
id="rfc.section.4.1.p.2"> <pre>
<nmwg:metadata xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
id="m_ale-netutil-1">
<netutil:subject
xmlns:netutil="http://ggf.org/ns/nmwg/characteristic/utilization/2.0/"
id="s_ale-netutil-1">
@@ -748,8 +748,8 @@
<nmwg:metadata>
<summary:subject
xmlns:summary="http://ggf.org/ns/nmwg/summary/2.0/">
<nmtl3:network>
- <nmtl3:subnet>128.4.10.0</nmtl3:subnet>
- <nmtl3:netmask>255.255.255.0</nmtl3:netmask>
+ <nmtl3:ipAddress>128.4.10.0/16</nmtl3:ipAddress>
+ <!-- Optional ASN -->
<nmtl3:asn>666</nmtl3:asn>
</nmtl3:network>
</summary:subject>
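The hunk above replaces a separate subnet/netmask element pair with a single CIDR-style ipAddress element (the new /16 is broader than the old 255.255.255.0 netmask, consistent with a summarizer widening a prefix). As a sketch of that conversion, and of collapsing adjacent prefixes as a summarizer might, Python's standard ipaddress module behaves as follows; the specific addresses are taken from the example.

```python
import ipaddress

# Convert the old subnet/netmask pair from the hunk into CIDR form.
net = ipaddress.ip_network("128.4.10.0/255.255.255.0")
assert str(net) == "128.4.10.0/24"   # CIDR equivalent of 255.255.255.0

# A summarizer may also collapse adjacent prefixes into one wider block.
nets = [ipaddress.ip_network("128.4.10.0/24"),
        ipaddress.ip_network("128.4.11.0/24")]
merged = list(ipaddress.collapse_addresses(nets))
assert [str(n) for n in merged] == ["128.4.10.0/23"]
```

Whether a given deployment emits the exact /24 or a wider block such as the /16 in the hunk is a policy choice of the summarization step, not of the notation.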
Modified: trunk/nmwg/doc/dLS/dLS.pdf
===================================================================
(Binary files differ)
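The "maximum dominator" of Section 2.3.2.1 corresponds, in a path-compressed (Patricia) trie, to the first node below the root, i.e. the longest common bit-prefix of all stored addresses. A minimal IPv4 sketch of that idea follows; the function name is hypothetical and this is not the prototype's implementation.

```python
import ipaddress

def maximum_dominator(addrs):
    # The maximum dominator is the longest common bit-prefix of all
    # addresses, expressed as a CIDR block (IPv4 only in this sketch).
    bits = [format(int(ipaddress.ip_address(a)), "032b") for a in addrs]
    prefix = 0
    while prefix < 32 and len({b[prefix] for b in bits}) == 1:
        prefix += 1
    # Zero-fill the host bits to obtain the network address.
    network = int(bits[0][:prefix].ljust(32, "0"), 2)
    return ipaddress.ip_network((network, prefix))

print(maximum_dominator(["128.4.10.1", "128.4.11.7", "128.4.8.9"]))
# prints 128.4.8.0/22
```

The K-dominator variant described in the text would instead stop the descent at up to K branching points, trading prefix width for the number of summary blocks emitted.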
Modified: trunk/nmwg/doc/dLS/dLS.xml
===================================================================
--- trunk/nmwg/doc/dLS/dLS.xml 2007-12-13 19:25:26 UTC (rev 304)
+++ trunk/nmwg/doc/dLS/dLS.xml 2007-12-13 20:32:11 UTC (rev 305)
@@ -180,258 +180,9 @@
functionality.
</t>
-
+
</section>
-
- <!-- Summarization Section -->
- <section anchor="summary" title="Summarization">
- <t>
- The LS that a service contacts to register becomes the "Home LS"
- (HLS, see <xref target="glossary" />) of that particular service.
- It is the responsibility of the HLS to make summary data about the
- all of the pS services it knows of available to the larger
- enterprise and to draw relevant queries to itself.
- </t>
- <t>
- Summarization is important to the overall
- success of this service as summaries prevents other LS instances
- from being overloaded by information. They must be general
enough to allow
- for easy creation and exchange but also must retain enough
- information to provide a rich query interface able to locate the
- distributed information. That means service metadata information
- must be reduced (summarized) as it propagates through the LS
cloud.
- </t>
-
- <t>
- We start by making an
- observation that summarization is best based on scope (see also
- <xref target="scope" /> for forming scope). Simply put, this
- means that we should attempt to summarize "more" the "farther"
away from the
- source that we get. This creates a smaller data set that travels
- the farthest away while keeping the larger and more informative
data
- sets closer to the source. We present the strategies as such:
- </t>
- <t>
- <list style="symbols">
- <t>Summarization for the "lower scope" - we must examine
- services that we are directly responsible for and distill
this
- information into a compact and manageable form.
- </t>
- <t>Summarization for the "upper scope" of an LS - Aggregation
of
- summaries from peers plus additional summary steps will
yield a
- concise (yet still compact) data set. Potentially we
- will need to consider summaries from lower scopes and
aggregate
- these into the information set.
- </t>
- </list>
- </t>
-
- <t>
- We will limit our discussions to the previously discussed
inter-domain
- example, thus involving only two scope rings. Building upon the
- basic ideas presented here, extension to multiple scopes will be
- be presented in future work.
-
- </t>
-
- <section anchor="lower_scope_summarization" title="Lower Scope
Summarization">
- <t>
- The lower scope summarization, described here as information
- exchange between HLS instances internal to a domain, consists
of
- simply extracting detailed information from the metadata
descriptions
- provided by registered services. For now we define this to be
- simply removing additional "parameter" elements from the
metadata.
- Special consideration must be given to the "supportedEventType"
- parameter by simply converting this to actual eventType
elements.
- This will ensure interoperability with legacy services.
- </t>
- <t>
- Future iterations may choose to drop additional pieces of
- information deemed unnecessary or private such as parts of
- topological descriptions. This sort of modification is
encouraged
- as long as the data remains "symmetrical" and conforms to the
- schematic definitions for a given metadata description. It
should
- be noted that such modifications will affect the searching
- procedure and could isolate the source services.
- </t>
- <t>
- The mechanics for performing this level of summarization can
use
- any number of technologies. Either Extensible Stylesheet
- Language Transformation (XSLT) documents or the XQuery language
- (see <xref target="glossary" />) may be used to prepare the
- initial data for exchange in this first level. Since the
- exchange of this local information will occur frequently, a
- simple operation that is scheduled or on demand should be
- employed by the individual implementations to ensure the
- regular LS functions are not impeded.
- </t>
- <t>In order to make information available to the LS cloud, the HLS
- will advertise this summary information to other LS instances
to
- propagate the appropriate information. Information exchange
- will be handled using a "taking turns" protocol such as token
- ring. The holder of the token will then perform the
information
- exchange to other known instances (see <xref target="glossary"
/>).
- </t>
- <figure anchor="hls-cloud">
- <preamble>Token passing between HLS cloud</preamble>
- <artwork>
- _____ _____
-| | | |
-| LS1 | <----------------- | LS2 |
-|_____| Communicate via |_____|
- | token exchange ^
- | |
- | _____ |
- | | | |
- |-------> | LS3 | ---------|
- |_____|
- </artwork>
- <postamble>HLS instances communicating via a token
message</postamble>
- </figure>
-
- <figure anchor="hls-cloud1">
- <preamble>Broadcast summary</preamble>
- <artwork>
- _____ _____
-| | | |
-| LS1 | < > | LS2 |
-|_____| \ / |_____|
- \ /
- \ / Broadcast Summary info to
- \ _____ / all peers when holding the token
- | |
- | LS3 |
- |____/T\ Token
- \_/
-
- </artwork>
- <postamble>The holder of the token (LS3) will inform everyone of its
- summary information.</postamble>
- </figure>
-
- <t>Once exchanged, the details regarding storage in the XML database
- backend (see <xref target="glossary" />) are also left to
- individual implementations. It is understood that this
- information, in the possession of non HLS instances, is
provided
- as a convenience and should be treated in the same way that
- directly registered information is (i.e. purged on
expiration).
- When responding to queries for this information, the LS must
- indicate whether or not it is authoritative.
- </t>
- </section>
-
- <section anchor="upper_scope_summarization" title="Upper Scope
Summarization">
- <t>
- A designated member of a given scope (often corresponding to an
organization) will be
- required to interact with other similar LSs (generally
representing
- other domains) in order to form a higher-level, or "upper",
scope. The mechanics of
- how we learn who is the designated leader are discussed in
- <xref target="tokens" />. The leader from each of the first
layers
- of this hierarchy (and the designated backup) will be
responsible for
- examining each member's summary information and building a
- summarization/aggregation that represents the contents of the
various
- LS instances. This summary will serve as input to the upper
scope.
- </t>
- <t>
- The most natural summarization is based on the topology of the
- network (like in network routing). Thus, topology-based
- summarization will reduce available service instances in the same
- way that IP addresses are summarized into network numbers. They will
- indicate the eventTypes that a service has and ones that can it can
generate.
- Summarization will be performed
- using specialized summary algorithm. Topology information such
- as IP addresses will be summarized using algorithms based on
- Radix Tree (see <xref target="IP-summary" />).
- </t>
- <t>
- Other information can be summarized in an easier manner
- through the use of either Extensible Stylesheet Language
- Transformation (XSLT) documents or the XQuery language as
discussed
- in the previous section. These mechanisms will take into
account
- the XML elements that represent the network topology currently
used
- in metadata subjects as well as additional items such as
- eventTypes.
- </t>
- <t>
- The output of this process becomes a "service summary" that
- represents the breadth of the original input. This consists
minimally of IP networks
- and addresses and the eventTypes which are stored and can be
generated. See
- <xref target="LSControl-Summary-lower" /> or
- <xref target="LSControl-Summary-upper" /> for a mock-up of the
- summary output. Additional transformations, while aggressive,
will
- strive to preserve as much information as possible to remain
- useful during the search procedures.
- </t>
-
-
- <section anchor="IP-summary" title="IP Address Summarization Algorithm">
- <t>
- To summarize a set of IP addresses we can build upon a common
- structure for IP storage and lookup,
- the <eref
target="http://en.wikipedia.org/wiki/Radix_tree">Radix (or Patricia)
Tree</eref>.
- Radix trees are used often used to store and look up IP address
- Our goal is to index IP addresses using their natural
hierarchy,
- where we can to summarize groups of addresses by describing
the ranges
- of values into which they fall.
- </t>
- <t>
- A detailed explanation of the nuances of the Radix Tree is well
- beyond the scope of this document, but a brief overview is
- presented for completeness. The structure itself is best
- described as a binary tree where nodes contain whole values
and
- the edges are the primary source of navigation. The edges of
the
- tree can be based on a single character or perhaps on even
longer
- strings (one of the features that leads to efficiency). The
algorithm
- should support the following basic
- operations:
- </t>
- <t>
- <list style="symbols">
- <t>Lookup: Indicate that something is or is not contained in
- the tree.</t>
- <t>Insert: Like in most inserts we attempt to place something
- into the structure. We first start by doing a Lookup to
- see if it exists already; the point where we stop is
where
- we will insert the object. We are careful to utilize
- the longest matching prefix of any other nearby edge to
our
- advantage. This last part is normally referred to as
- "splitting" and ensures that each node has no more than
- two children.</t>
- <t>Delete: Delete an object from the tree. This operation will
- be complicated by "collapsing" parents that have a single
- child and merging the edges.</t>
- </list>
- </t>
-
- <t>
- Once constructed, it is possible to consult the structure in
- creating IP network summaries. The current prototype
implementation of
- summarization creates a Radix tree of IPs during an update
phase.
- Then it can perform 2 types of summarization. First, the
"maximum
- dominator" of the Radix tree is the maximum summarzation for all IP
addresses in the Radix
- tree. Using the optimization mentioned above regarding strings
longer than 1 character or
- bit, the solution to the maximum dominator problem is trivial -- it
is simply the first
- node below the root. The second type of summarization is to
determine "K-dominators".
- Essesntially, for a given target K, we produce the most appropriate
summarizing nodes.
- While this problem is NP-complete, we can construct an approximation
heuristic that
- simply considers the length of the strings in the internal (or
structural) nodes of
- the tree. We leave for future work the problem of "Min cost
dominators", in which the
- best K and the best K dominators are selected.
- </t>
-
- <t>
- Essentially, the output of the this algorithm is a set of IP subnets
and address
- expressed in Classless Interdomain Routing (CIDR) style, i.e.
W.X.Y.Z/mask bits.
- There may be times when this address aggregation is manually
specified and this is a
- completely viable interim solution.
- </t>
-
- </section>
- </section>
- </section>
-
-
-<!-- Scope Section -->
+
<section anchor="scope" title="Scope Formation">
<t>The next question is how to form the hierarchy of LS instances and
subsequently organize the 'scopes'. The simplest answer is that the
@@ -501,11 +252,6 @@
accomplish this, the initially contacted LS will request that the
current ring leader initiate
a token rotation to allow all members to update their LSRing list.
- <!-- FIXME: this should be expanded. This is just another LSControl
message, right?
- MG: Yes, I think so. I introduced some new event Types in he
algorithm below.
- They're send-update-token, update-token, and send-summary. I
think that's
- now more or less we agreed on the last call and what was
described by Martin
- -->
</t>
<t>
@@ -616,7 +362,12 @@
</section>
- <section anchor="tokens" title="Token Passing">
+ <section anchor="tokens" title="Token Messages for Control and Election">
+
+ <t>When scopes are created they form themselves into logical rings
around which tokens can be
+ passed. This token-passing mechanism is used for two purposes, for
registration control and
+ for leader election.</t>
+
<t>
The "token" is an LSControlMessage (see <xref
target="LSControl-Token" />)
meant to be passed around an LSRing to the various members in
some
@@ -629,6 +380,7 @@
complex calculations.
</t>
+ <section anchor="leader_election" title="Leader Election">
<t>
The essential idea in the token passing mechanism for leader election
is
that some identifier is chosen for each node and that the node with the
@@ -665,8 +417,11 @@
The key is that as long as the identifier is chosen consistently
within a
given scope, the choice of identifier doesn't affect the operation of
the
protocol.
- </t>
+ </t>
+
+ </section>
+ <section title="">
<t>
The token can be viewed as "permission to talk" and permits the
holding LS to send its summary information to all other
available LS
@@ -697,6 +452,8 @@
</t>
-->
+
+
<section anchor="token_passing_algorithm" title="Token Passing
Algorithm">
<!--
<section anchor="tokenpass_algorithm" title="Toking Passing
Algorithm">
@@ -761,8 +518,8 @@
nodes in the LS ring. This time should be computed by the leader
based on
some knowledge about the time each particular node takes to serve the
token.
The time may be based on the times saved in the token message by all nodes.
- Initially, this can be very simple as in "2 minutes X the number of
- nodes in the ring." (To be discussed)
+ Initially, this will be very simple and will be computed as "2
minutes plus 5 seconds times
+ the number of nodes in the ring."
</t>
<t>
The key is that after the timeout has exceeded, it can be inferred
that
@@ -772,26 +529,11 @@
</section>
- <section anchor="summary-blast" title="Summarization Notification">
- <t>
- As discussed in the prior two sections there are two acceptable
- instances to send your summary to the LSRing:
- </t>
-
- <t>
- <list style="numbers">
- <t>When first joining</t>
- <t>When holding the token</t>
- </list>
- </t>
+ </section>
- <t>
- In the first case we are explicitly entering ourselves into the
- ring when we get our first message from a peer. This ensures we
- show up in the token rotation instantly. The second case is
- the routine exchange started when a token arrives from a peer.
- </t>
+ <section anchor="summary_messages" title="Summarization Messages">
+
<t>
<xref target="LSControl-Summary-lower" /> and
<xref target="LSControl-Summary-upper" /> contain examples of
the
@@ -860,6 +602,252 @@
</section>
+ <!-- Summarization Section -->
+ <section anchor="summary" title="Summarization">
+ <t>
+ The LS that a service contacts to register becomes the "Home LS"
+ (HLS, see <xref target="glossary" />) of that particular service.
+ It is the responsibility of the HLS to make summary data about
+ all of the pS services it knows of available to the larger
+ enterprise and to draw relevant queries to itself.
+ </t>
+ <t>
+ Summarization is important to the overall
+ success of this service as summaries prevent other LS instances
+ from being overloaded by information. They must be general
enough to allow
+ for easy creation and exchange but also must retain enough
+ information to provide a rich query interface able to locate the
+ distributed information. That means service metadata information
+ must be reduced (summarized) as it propagates through the LS
cloud.
+ </t>
+
+ <t>
+ We start by making an
+ observation that summarization is best based on scope (see also
+ <xref target="scope" /> for forming scope). Simply put, this
+ means that we should attempt to summarize "more" the "farther"
away from the
+ source that we get. This creates a smaller data set that travels
+ the farthest away while keeping the larger and more informative
data
+ sets closer to the source. We present the strategies as such:
+ </t>
+ <t>
+ <list style="symbols">
+ <t>Summarization for the "lower scope" - we must examine
+ services that we are directly responsible for and distill
this
+ information into a compact and manageable form.
+ </t>
+ <t>Summarization for the "upper scope" of an LS - Aggregation
of
+ summaries from peers plus additional summary steps will
yield a
+ concise (yet still compact) data set. Potentially we
+ will need to consider summaries from lower scopes and
aggregate
+ these into the information set.
+ </t>
+ </list>
+ </t>
+
+ <t>
+ We will limit our discussions to the previously discussed
inter-domain
+ example, thus involving only two scope rings. Building upon the
+ basic ideas presented here, extension to multiple scopes will be
+ presented in future work.
+
+ </t>
+
+ <section anchor="lower_scope_summarization" title="Lower Scope
Summarization">
+ <t>
+ The lower scope summarization, described here as information
+ exchange between HLS instances internal to a domain, consists
of
+ simply extracting detailed information from the metadata
descriptions
+ provided by registered services. For now we define this to be
+ simply removing additional "parameter" elements from the
metadata.
+ Special consideration must be given to the "supportedEventType"
+ parameter by simply converting this to actual eventType
elements.
+ This will ensure interoperability with legacy services.
+ </t>
+ <t>
+ Future iterations may choose to drop additional pieces of
+ information deemed unnecessary or private such as parts of
+ topological descriptions. This sort of modification is
encouraged
+ as long as the data remains "symmetrical" and conforms to the
+ schematic definitions for a given metadata description. It
should
+ be noted that such modifications will affect the searching
+ procedure and could isolate the source services.
+ </t>
+ <t>
+ The mechanics for performing this level of summarization can
use
+ any number of technologies. Either Extensible Stylesheet
+ Language Transformation (XSLT) documents or the XQuery language
+ (see <xref target="glossary" />) may be used to prepare the
+ initial data for exchange in this first level. Since the
+ exchange of this local information will occur frequently, a
+ simple operation that is scheduled or on demand should be
+ employed by the individual implementations to ensure the
+ regular LS functions are not impeded.
+ </t>
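The stripping step described above can be done with an XSLT stylesheet, an XQuery, or a few lines of code. The following Python sketch shows the intent; the element and attribute names follow the text, while the flat, namespace-free XML is a simplifying assumption, not the actual NMWG schema.

```python
# Lower-scope summarization sketch: drop "parameter" elements from a
# service metadata block, first promoting any "supportedEventType"
# parameter to a real eventType element for legacy interoperability.
import xml.etree.ElementTree as ET

def summarize_metadata(xml_text: str) -> str:
    metadata = ET.fromstring(xml_text)
    for params in list(metadata.findall("parameters")):
        for param in params.findall("parameter"):
            if param.get("name") == "supportedEventType":
                # Promote the parameter value to a first-class element
                ev = ET.SubElement(metadata, "eventType")
                ev.text = param.text
        metadata.remove(params)  # discard the remaining parameter detail
    return ET.tostring(metadata, encoding="unicode")

example = (
    '<metadata id="m1">'
    '<subject>host.example.net</subject>'
    '<parameters>'
    '<parameter name="supportedEventType">'
    'http://ggf.org/ns/nmwg/tools/ping/2.0</parameter>'
    '<parameter name="packetSize">1000</parameter>'
    '</parameters>'
    '</metadata>'
)
# Prints the metadata with parameters removed and an eventType added.
print(summarize_metadata(example))
```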
+ <t>In order to make information available to the LS cloud, the HLS
+ will advertise this summary information to other LS instances
to
+ propagate the appropriate information. Information exchange
+ will be handled using a "taking turns" protocol such as token
+ ring. The holder of the token will then perform the
information
+ exchange to other known instances (see <xref target="glossary"
/>).
+ </t>
+ <figure anchor="hls-cloud">
+ <preamble>Token passing between HLS cloud</preamble>
+ <artwork>
+ _____ _____
+| | | |
+| LS1 | <----------------- | LS2 |
+|_____| Communicate via |_____|
+ | token exchange ^
+ | |
+ | _____ |
+ | | | |
+ |-------> | LS3 | ---------|
+ |_____|
+ </artwork>
+ <postamble>HLS instances communicating via a token
message</postamble>
+ </figure>
+
+ <figure anchor="hls-cloud1">
+ <preamble>Broadcast summary</preamble>
+ <artwork>
+ _____ _____
+| | | |
+| LS1 | < > | LS2 |
+|_____| \ / |_____|
+ \ /
+ \ / Broadcast Summary info to
+ \ _____ / all peers when holding the token
+ | |
+ | LS3 |
+ |____/T\ Token
+ \_/
+
+ </artwork>
+ <postamble>The holder of the token (LS3) will inform everyone of its
+ summary information.</postamble>
+ </figure>
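The exchange pictured in the two figures can be simulated in a few lines: the token holder broadcasts its summary to every peer, then the token moves to the next LS in the ring. Names and summary values here are illustrative only.

```python
# Toy simulation of the "taking turns" summary exchange.
class LS:
    def __init__(self, name, summary):
        self.name = name
        self.summary = summary
        self.known = {}  # summaries received from peers

    def broadcast(self, peers):
        # The token holder informs every peer of its summary.
        for peer in peers:
            peer.known[self.name] = self.summary

ring = [LS("LS1", "10.1.0.0/16"), LS("LS2", "10.2.0.0/16"), LS("LS3", "10.3.0.0/16")]
for holder in ring:  # one full token rotation around the ring
    holder.broadcast([ls for ls in ring if ls is not holder])

print(sorted(ring[0].known))  # ['LS2', 'LS3']
```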
+
+ <t>Once exchanged, the details regarding storage in the XML database
+ backend (see <xref target="glossary" />) are also left to
+ individual implementations. It is understood that this
+ information, in the possession of non HLS instances, is
provided
+ as a convenience and should be treated in the same way that
+ directly registered information is (i.e. purged on
expiration).
+ When responding to queries for this information, the LS must
+ indicate whether or not it is authoritative.
+ </t>
+ </section>
+
+ <section anchor="upper_scope_summarization" title="Upper Scope
Summarization">
+ <t>
+ A designated member of a given scope (often corresponding to an
organization) will be
+ required to interact with other similar LSs (generally
representing
+ other domains) in order to form a higher-level, or "upper",
scope. The mechanics of
+ how we learn who is the designated leader are discussed in
+ <xref target="tokens" />. The leader from each of the first
layers
+ of this hierarchy (and the designated backup) will be
responsible for
+ examining each member's summary information and building a
+ summarization/aggregation that represents the contents of the
various
+ LS instances. This summary will serve as input to the upper
scope.
+ </t>
+ <t>
+ The most natural summarization is based on the topology of the
+ network (like in network routing). Thus, topology-based
+ summarization will reduce available service instances in the same
+ way that IP addresses are summarized into network numbers. They will
+ indicate the eventTypes that a service has and the ones that it can
generate.
+ Summarization will be performed
using a specialized summary algorithm. Topology information such
+ as IP addresses will be summarized using algorithms based on
+ Radix Tree (see <xref target="IP-summary" />).
+ </t>
+ <t>
+ Other information can be summarized in an easier manner
+ through the use of either Extensible Stylesheet Language
+ Transformation (XSLT) documents or the XQuery language as
discussed
+ in the previous section. These mechanisms will take into
account
+ the XML elements that represent the network topology currently
used
+ in metadata subjects as well as additional items such as
+ eventTypes.
+ </t>
+ <t>
+ The output of this process becomes a "service summary" that
+ represents the breadth of the original input. This consists
minimally of IP networks
+ and addresses and the eventTypes which are stored and can be
generated. See
+ <xref target="LSControl-Summary-lower" /> or
+ <xref target="LSControl-Summary-upper" /> for a mock-up of the
+ summary output. Additional transformations, while aggressive,
will
+ strive to preserve as much information as possible to remain
+ useful during the search procedures.
+ </t>
+
+
+ <section anchor="IP-summary" title="IP Address Summarization Algorithm">
+ <t>
+ To summarize a set of IP addresses we can build upon a common
+ structure for IP storage and lookup,
+ the <eref
target="http://en.wikipedia.org/wiki/Radix_tree">Radix (or Patricia)
Tree</eref>.
+ Radix trees are often used to store and look up IP addresses.
+ Our goal is to index IP addresses using their natural
hierarchy,
+ where we can summarize groups of addresses by describing
the ranges
+ of values into which they fall.
+ </t>
+ <t>
+ A detailed explanation of the nuances of the Radix Tree is well
+ beyond the scope of this document, but a brief overview is
+ presented for completeness. The structure itself is best
+ described as a binary tree where nodes contain whole values
and
+ the edges are the primary source of navigation. The edges of
the
+ tree can be based on a single character or perhaps on even
longer
+ strings (one of the features that leads to efficiency). The
algorithm
+ should support the following basic
+ operations:
+ </t>
+ <t>
+ <list style="symbols">
+ <t>Lookup: Indicate that something is or is not contained in
+ the tree.</t>
<t>Insert: As in most inserts, we attempt to place something
+ into the structure. We first start by doing a Lookup to
+ see if it exists already; the point where we stop is
where
+ we will insert the object. We are careful to utilize
+ the longest matching prefix of any other nearby edge to
our
+ advantage. This last part is normally referred to as
+ "splitting" and ensures that each node has no more than
+ two children.</t>
+ <t>Delete: Delete an object from the tree. This operation will
+ be complicated by "collapsing" parents that have a single
+ child and merging the edges.</t>
+ </list>
+ </t>
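A bit-per-edge binary trie, a simplified special case of the radix tree (without the multi-bit edge compression described above), is enough to sketch the Lookup and Insert operations:

```python
# Binary trie over address bits: one bit per edge, "$" marks a stored
# address. A real radix tree would compress chains of single-child
# nodes into multi-bit edges.
def ip_bits(ip: str) -> str:
    return "".join(f"{int(octet):08b}" for octet in ip.split("."))

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, ip: str):
        node = self.root
        for b in ip_bits(ip):
            node = node.setdefault(b, {})
        node["$"] = ip  # mark that an address ends here

    def lookup(self, ip: str) -> bool:
        node = self.root
        for b in ip_bits(ip):
            if b not in node:
                return False
            node = node[b]
        return "$" in node

t = Trie()
t.insert("192.168.1.1")
print(t.lookup("192.168.1.1"), t.lookup("192.168.1.2"))  # True False
```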
+
+ <t>
+ Once constructed, it is possible to consult the structure in
+ creating IP network summaries. The current prototype
implementation of
+ summarization creates a Radix tree of IPs during an update
phase.
+ Then it can perform two types of summarization. First, the
"maximum
+ dominator" of the Radix tree is the maximum summarization for all IP
addresses in the Radix
+ tree. Using the optimization mentioned above regarding strings
longer than 1 character or
+ bit, the solution to the maximum dominator problem is trivial -- it
is simply the first
+ node below the root. The second type of summarization is to
determine "K-dominators".
Essentially, for a given target K, we produce the most appropriate
summarizing nodes.
+ While this problem is NP-complete, we can construct an approximation
heuristic that
+ simply considers the length of the strings in the internal (or
structural) nodes of
+ the tree. We leave for future work the problem of "Min cost
dominators", in which the
+ best K and the best K dominators are selected.
+ </t>
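The maximum dominator corresponds to the longest common prefix of the stored addresses, i.e. the single CIDR block covering the whole set. A sketch using the standard ipaddress module in place of an explicit radix tree:

```python
# Find the smallest CIDR block covering all given addresses -- the
# analog of the "maximum dominator" (first node below the root).
import ipaddress

def maximum_dominator(ips):
    nets = [ipaddress.ip_network(ip) for ip in ips]
    prefix = min(n.prefixlen for n in nets)
    while prefix >= 0:
        candidate = nets[0].supernet(new_prefix=prefix)
        if all(n.subnet_of(candidate) for n in nets):
            return candidate  # longest prefix covering every address
        prefix -= 1

print(maximum_dominator(["10.1.1.4", "10.1.1.9", "10.1.2.7"]))  # 10.1.0.0/22
```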
+
+ <t>
+ Essentially, the output of this algorithm is a set of IP subnets
and addresses
+ expressed in Classless Interdomain Routing (CIDR) style, i.e.
W.X.Y.Z/mask bits.
+ There may be times when this address aggregation is manually
specified and this is a
+ completely viable interim solution.
+ </t>
+
+ </section>
+ </section>
+ </section>
<!-- Search Section -->