
Re: [pS-dev] question about LookupService: Registration/KeepAlive


  • From: "Vedrin Jeliazkov" <>
  • To: Maciej Glowiak <>, "ulisses" <>
  • Cc: <>, "Szymon Trocha" <>, "Roman Lapacz" <>
  • Subject: Re: [pS-dev] question about LookupService: Registration/KeepAlive
  • Date: Thu, 07 Sep 2006 01:58:55 +0300

Hi Maciej, Ulisses, All,

Sorry for the rather long reply - I hope you'll find it helpful.

Maciej Glowiak
<>
wrote:

> ulisses wrote:
> > Hi Maciej!
> >
> > Thanks for your answer, it has been very complete. But what I was
> > thinking about is that the client that registers the data, I believe,
> > is not informed about the timeout after which the data is cleaned up;
> > that is, the response does not include any information on when the data
> > will expire.

Ulisses, I guess that by "client that registers the data" you actually mean a
service which registers its capabilities with the LS, right? I'm assuming this
in my further comments below.

> Hi!
>
> Yes, that's right. We haven't thought about it yet.
>
> [ I CC-ed the e-mail to Vedrin. ]
>
> Vedrin, what do you think: what would be the best way (from client point
> of view) for determining such LS parameters as TTL:

Maciej, if your question refers only to visualization clients (which would
access the LS in a read-only mode), then I see two options:

1) the client doesn't receive/process resource TTLs at all; instead, it
assumes that if the LS has returned a pointer to a certain resource, then it's
reasonable to consider the resource as being available (alive);

2) the LS provides a TTL attribute for each returned resource (or set of
resources) and the client is free to make its own decisions based on this info
(e.g. cache the resource pointer while the returned TTL has not expired,
consider some resource as unstable if its remaining TTL is too short, ignore
the TTL attribute, etc...).

The first option is much simpler and more straightforward to implement. So far
we have assumed that this would be the way to go, at least for the time being.
The second option follows the model of DNS, which has proved to be very
scalable, both from a management and a performance point of view. However, it
would also introduce more complexity, not only in the visualization client,
but in the LS and in all services registering with the LS as well.
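To illustrate what option 2) could look like on the client side, here is a
minimal Java sketch of a cache honouring LS-supplied TTLs (all class and
method names are invented for the example - this is not part of any existing
perfSONAR API):

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch of option 2: the client caches resource pointers
    // together with an absolute expiry time derived from the LS-supplied TTL.
    public class ResourceCache {

        private static class Entry {
            final String pointer;   // e.g. a service access point returned by the LS
            final long expiresAtMs; // absolute expiry, computed from the TTL

            Entry(String pointer, long expiresAtMs) {
                this.pointer = pointer;
                this.expiresAtMs = expiresAtMs;
            }
        }

        private final Map<String, Entry> cache = new HashMap<>();

        // Store a resource pointer for ttlSeconds, as advertised by the LS.
        public void put(String key, String pointer, long ttlSeconds) {
            long expiry = System.currentTimeMillis() + ttlSeconds * 1000;
            cache.put(key, new Entry(pointer, expiry));
        }

        // Return the cached pointer, or null if it has expired (in which
        // case the client should query the LS again).
        public String get(String key) {
            Entry e = cache.get(key);
            if (e == null || System.currentTimeMillis() > e.expiresAtMs) {
                cache.remove(key);
                return null;
            }
            return e.pointer;
        }
    }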

In general, the main advantage of a "DNS-like" TTL is that it would be defined
by the resource owner, e.g. some RRD MA or other service registering with the
LS. This would help avoid nasty synchronisation issues which might emerge with
our current way of processing LS registrations. As an example of such a
possible issue, consider an RRD MA which is configured to register with a
given LS every hour, and suppose that the same LS is configured to assign a
TTL of 15 minutes to all registered records and/or perform a clean-up every 30
minutes. In that case I presume that the RRD MA's registration with the LS
would flap unnecessarily (continuously appearing and disappearing in the LS'
database and advertisements). Of course, if you have control over both the RRD
MA's and the LS' configurations, you could avoid such issues by carefully
choosing the RRD MA's registration frequency, the LS' clean-up interval and/or
the LS' TTL. However, bearing in mind that perfSONAR is supposed to be a
(very) distributed set of services, it's perhaps not safe to rely on the
assumption that all services will be correctly configured, or even that those
parameter values will be known in advance by someone configuring a new
service. Just as an example, at present I'm not aware of a way to find out the
clean-up intervals and/or TTLs of the LSes running at PIONIER and RNP, while
this info is crucial if I want to set the registration frequency of my RRD MA
with them correctly and avoid the flapping described above - I'm sure you get
the point.
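Just to make the numbers above concrete, here is a toy Java sketch of that
timeline (the class is invented purely for illustration):

    // Toy timeline of the flapping example: the RRD MA registers every
    // 60 minutes, but the LS expires records after a 15 minute TTL, so the
    // record is absent for 45 out of every 60 minutes.
    public class FlappingDemo {
        public static void main(String[] args) {
            final int registrationIntervalMin = 60; // RRD MA re-registers hourly
            final int ttlMin = 15;                  // LS-assigned TTL

            for (int t = 0; t < 180; t += 5) {      // three hours, 5 min steps
                boolean alive = (t % registrationIntervalMin) < ttlMin;
                System.out.printf("t=%3d min: record %s%n",
                                  t, alive ? "present" : "EXPIRED");
            }
        }
    }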

In addition, again following some parallels with DNS, we could imagine that a
given LS might have to hold records with a wide variety of different TTLs,
depending on the type of the respective resources. Moreover, it would be nice
if administrators could set low TTLs for experimental services and high TTLs
for long-lived data provided by production services.
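Purely as an illustration, such per-resource-type TTLs might end up looking
like the following (the resource types and values are invented, not a proposal
for concrete numbers):

    import java.util.Map;

    // Hypothetical per-resource-type default TTLs: short for experimental
    // services, long for stable production data. Values are illustrative only.
    public class TtlDefaults {
        static final Map<String, Long> TTL_SECONDS = Map.of(
                "experimental-service", 300L,   // 5 min: expected to come and go
                "service-access-point", 3600L,  // 1 hour
                "interface-metadata", 86400L);  // 24 h: long-lived production data
    }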

I think it might be reasonable to introduce DNS-like TTLs at some point in the
LS and in all services registering with it. From the LS' point of view this
would mean trusting the TTLs supplied by registering services. During clean-up
the LS would delete only records with expired TTLs. Also, it would perhaps be
a nice feature if the LS were able to calculate the required minimal clean-up
frequency as a function of the lowest registered TTL (let's say every
min(TTL)/2 seconds). Some protection against too aggressive clean-up could be
implemented both in the LS (limit the highest clean-up frequency allowed) and
in the registering services (limit the lowest TTL that can be configured).
From a registering service's perspective, we wouldn't need to define the
registration frequency any more; instead, the service could calculate it based
on the lowest resource TTL configured (let's say every min(TTL)/3 seconds).
And finally, from a visualization (or other read-only) client's perspective,
we could either:

1) blindly rely on the resources returned by the LS (without processing, or
even without receiving, info about TTLs at all);

2) receive and process resource TTL attributes (e.g. useful if we would like
to implement some resource caching mechanism on the client side).

I guess there will be clients choosing to implement either 1) or 2), depending
on their use cases. For example, applications like CNM, which retrieve the
same resources over and over again, might benefit from such a caching
mechanism. The LS itself would benefit as well, because it would have fewer
queries to process. On the other hand, clients which send only occasional
queries might choose to keep things simple, without implementing any caching
mechanism.
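Coming back to the LS side sketched above: in Java it could boil down to
something like this (names and constants are invented for the discussion -
this is not existing LS code):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the proposed LS behaviour: trust the TTLs supplied by
    // registering services, delete only expired records during clean-up,
    // and run the clean-up every min(TTL)/2 seconds with a cap on how
    // aggressive the clean-up may get.
    public class LsCleanup {

        static final long MAX_CLEANUP_EVERY_SECONDS = 90; // never clean up more often

        // registered resource -> absolute expiry (ms), derived from its TTL
        private final Map<String, Long> records = new ConcurrentHashMap<>();
        private long minTtlSeconds = Long.MAX_VALUE;

        public void register(String key, long ttlSeconds) {
            records.put(key, System.currentTimeMillis() + ttlSeconds * 1000);
            minTtlSeconds = Math.min(minTtlSeconds, ttlSeconds);
        }

        // Clean-up interval: min(TTL)/2, clamped so that a service
        // advertising a too-low TTL cannot force an overly aggressive clean-up.
        public long cleanupIntervalSeconds() {
            return Math.max(minTtlSeconds / 2, MAX_CLEANUP_EVERY_SECONDS);
        }

        // Delete only records whose TTL has expired.
        public void cleanup() {
            long now = System.currentTimeMillis();
            records.values().removeIf(expiry -> expiry < now);
        }
    }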

In summary, all of the above is a wish/suggestion rather than a requirement.
Maybe other people on this ML should comment as well, because this issue
affects all services expected to communicate with the LS. We should also
consider the amount of time developers would have to spend implementing this
and whether it is worthwhile.

> 1. To return them as parameters inside the result code (but it would
> require an action to the service because it would be in the registration
> response) and would be hard to control if we had more than one parameter
>
> 2. Management/Information interface for the Service. It'd need probably
> new message types and much more implementation in services (it should be
> generic feature for all services)
>
> I think the 2nd option would be the best solution but it'd need an agreement
> from the perfSONAR community.

Well, it's rather hard for me to comment on these additional options. It looks
like the 2nd option is more promising... Again, I think we could extract many
useful hints for the LS and TTL design from the way DNS operates. Of course,
our case is different because, AFAIK, multiple LSes would not be organized in
a hierarchy, as opposed to DNS (there might be some additional factors to
consider because of this - I don't know yet).

Let me just summarize the proposal discussed above, which I find quite
promising (a small sketch of the service side follows the list):

- TTLs are resource specific (e.g. interface metadata, circuit metadata,
service access point, etc);

- TTLs are set by the owners of the resources (they're expected to know best
what's the appropriate TTL for each resource);

- there's a defined minimum TTL allowed to be configured in services which
register with the LS (e.g. 180 sec);

- the registration frequency (Rf) with the LS is a function of the lowest
configured resource TTL (let's say Rf = min(TTL)/3);

- resource TTLs are propagated untouched through the LS(es) to clients (or
other LSes);

- both the LS and caching (visualization or other) clients perform clean-up
frequently enough (Cf), as a function of the lowest TTL received (let's say
Cf = min(TTL)/2);

- there's a defined maximum allowed Cf for the LS and caching clients (in our
case not more often than every 180/2 = 90 sec) - it's important to have this
additional protection against registering services which do not respect the
allowed minimum TTL for some reason;

- non-caching LS (visualization or other) clients don't receive (or simply
ignore) TTL attributes in resource records;

- a list of agreed and recommended TTLs for different types of resources is
published and advertised to perfSONAR service maintainers.
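And the registering-service side of the same rules, again as a rough Java
sketch under the assumptions listed above (invented names, not existing code):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Sketch of the registering-service side: enforce the minimum allowed
    // TTL (180 sec in the example above) and derive the registration
    // frequency from the lowest configured resource TTL as Rf = min(TTL)/3.
    public class LsRegistrationScheduler {

        static final long MIN_ALLOWED_TTL_SECONDS = 180;

        public static void schedule(long minConfiguredTtlSeconds,
                                    Runnable registerWithLs) {
            // Refuse TTLs below the agreed floor.
            long ttl = Math.max(minConfiguredTtlSeconds, MIN_ALLOWED_TTL_SECONDS);
            long periodSeconds = ttl / 3; // Rf = min(TTL)/3, e.g. 60 sec for 180 sec TTL

            ScheduledExecutorService scheduler =
                    Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(registerWithLs, 0, periodSeconds,
                                          TimeUnit.SECONDS);
        }

        public static void main(String[] args) {
            // Re-register every 60 sec for the 180 sec minimum TTL.
            schedule(180, () -> System.out.println("re-registering with LS..."));
        }
    }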

Kind regards,
Vedrin




