perfsonar-dev - Re: [pS-dev] Re: data/metadata relationships

Subject: perfsonar development work

  • From: "Jeff W. Boote" <>
  • To: Vedrin Jeliazkov <>
  • Cc: Maciej Glowiak <>,
  • Subject: Re: [pS-dev] Re: data/metadata relationships
  • Date: Wed, 23 Aug 2006 15:18:45 -0600

I would like to hear more input on this topic. I made a case for keeping metadata id's consistent and unique. Vedrin presented further use cases.

I would very much like to hear from DFN regarding how this might affect CNM.

Maintaining data/metadata relationships requires support in each service engine to ensure that the same metadata id is consistently used for the same content. How burdensome would this be for current services?

To be honest - from a client perspective, I think this is a fairly important point. Otherwise, it will be much more difficult for clients to collate different data sets and determine that they are actually referring to the same interface.

Roman - how difficult would it be for the RRD MA to keep metadata id's consistent? Could it be done by just assigning an id during configuration time? Loukik, what about the SNMP MP? Others?

I've thought about this some from the CLMP point of view. Since everything is basically dynamically generated, I think there would need to be some kind of fixed mapping from endpoint elements to the metadata id.
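A minimal sketch of what such a fixed mapping could look like, assuming the id is derived by hashing the endpoint elements. The function name, the `meta-` prefix, the `src`/`dst` keys and the host names are all hypothetical and not taken from any perfSONAR service:

```python
import hashlib


def stable_metadata_id(endpoint: dict, prefix: str = "meta-") -> str:
    """Derive a repeatable metadata id from endpoint elements (hypothetical scheme)."""
    # Sort the keys so the id does not depend on the order of the elements.
    canonical = "\n".join(f"{key}={endpoint[key]}" for key in sorted(endpoint))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    # A truncated digest keeps the id short; a real service might keep all of it.
    return prefix + digest[:12]


# Hypothetical endpoint pairs for a dynamically generated measurement.
pair = {"src": "host-a.example.org", "dst": "host-b.example.org"}
same_pair = {"dst": "host-b.example.org", "src": "host-a.example.org"}
other_pair = {"src": "host-a.example.org", "dst": "host-c.example.org"}

print(stable_metadata_id(pair) == stable_metadata_id(same_pair))   # same content, same id
print(stable_metadata_id(pair) == stable_metadata_id(other_pair))  # different content
```

The id is only as unique as the endpoint elements themselves: two endpoints configured with identical elements would still end up with the same id.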

I realize there are some real questions that need to be answered about how id's are handled across multiple services, and across a new 'load' of a configuration. But perhaps we could just start with how difficult it would be to keep them consistent within a given service instance.

Thanks,
jeff

Vedrin Jeliazkov wrote:
Hello Jeff, Maciej, All,

"Jeff W. Boote" <> wrote:

<snip>

The point is that I am concerned that this situation can cause confusion for clients. I asked everyone - most especially client developers - if having multiple references to the same information was confusing.


So far we haven't encountered problems with multiple references to the same
information. However, we do have a problem with missing explicit relations
between related data, and that's why I'm using the opportunity to
discuss it. Let me use an example with interface utilization to
illustrate this problem. In this case, for a given interface we receive two
unrelated XML constructs for the ingress and egress directions. There is no
explicit indication that they correspond to the same interface. Currently we
deduce this grouping by comparing interface attributes (like hostName,
ifName, ifDescription, etc.), but this is not fail-safe, in particular when
those attributes are missing or their combinations are not unique for a given
interface. Inferring the relation from file names doesn't work either, because
in some cases the related info is stored in different files (or database
fields). We've also considered sending aggregate requests for different
metrics of a given endpoint, but this solution would impose some undesirable
constraints on the messaging and is not generic enough. Of course, the problem
could be avoided in the case of utilization RRD MAs by making sure that the
configured combinations of interface attributes are unique, but we feel that
this might not be the best approach.

Here is an example SetupDataResponse, illustrating the problem:

2006-08-07 17:43:07,390 [Thread-2] DEBUG org.perfsonar.client.ma.MARequest2 -
<?xml version="1.0" encoding="UTF-8"?>
<nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
    id="localhost.-5e1feaef:10ce6c59f5e:-412d">
  <nmwg:metadata id="meta1">
    <netutil:subject
        xmlns:netutil="http://ggf.org/ns/nmwg/characteristic/utilization/2.0/"
        id="subj1">
      <nmwgt:interface xmlns:nmwgt="http://ggf.org/ns/nmwg/topology/2.0/">
        <nmwgt:hostName>PoP-SOF</nmwgt:hostName>
        <nmwgt:ifName>Fa0/0</nmwgt:ifName>
        <nmwgt:ifDescription>SEEREN-SOF==ISTF-SOF(1)</nmwgt:ifDescription>
        <nmwgt:ifAddress type="ipv4">194.141.252.2</nmwgt:ifAddress>
        <nmwgt:direction>out</nmwgt:direction>
        <nmwgt:capacity>100000000</nmwgt:capacity>
      </nmwgt:interface>
    </netutil:subject>
    <nmwg:eventType>utilization</nmwg:eventType>
  </nmwg:metadata>
  <nmwg:metadata id="meta2">
    <netutil:subject
        xmlns:netutil="http://ggf.org/ns/nmwg/characteristic/utilization/2.0/"
        id="subj2">
      <nmwgt:interface xmlns:nmwgt="http://ggf.org/ns/nmwg/topology/2.0/">
        <nmwgt:hostName>PoP-SOF</nmwgt:hostName>
        <nmwgt:ifName>Fa0/0</nmwgt:ifName>
        <nmwgt:ifDescription>SEEREN-SOF==ISTF-SOF(1)</nmwgt:ifDescription>
        <nmwgt:ifAddress type="ipv4">194.141.252.2</nmwgt:ifAddress>
        <nmwgt:direction>in</nmwgt:direction>
        <nmwgt:capacity>100000000</nmwgt:capacity>
      </nmwgt:interface>
    </netutil:subject>
    <nmwg:eventType>utilization</nmwg:eventType>
  </nmwg:metadata>
  <nmwg:data id="data2" metadataIdRef="meta2">
    <nmwg:key id="localhost.-5e1feaef:10ce6c59f5e:-4130">
      <nmwg:parameters id="param2">
        <nmwg:parameter name="dataSource">traffic_in</nmwg:parameter>
        <nmwg:parameter name="file">/var/db/rra/backbone_traffic_in_9.rrd</nmwg:parameter>
      </nmwg:parameters>
    </nmwg:key>
  </nmwg:data>
  <nmwg:data id="data1" metadataIdRef="meta1">
    <nmwg:key id="localhost.-5e1feaef:10ce6c59f5e:-413c">
      <nmwg:parameters id="param1">
        <nmwg:parameter name="dataSource">traffic_out</nmwg:parameter>
        <nmwg:parameter name="file">/var/db/rra/backbone_traffic_in_9.rrd</nmwg:parameter>
      </nmwg:parameters>
    </nmwg:key>
  </nmwg:data>
</nmwg:message>

Now imagine that you have another interface, configured with the same
attributes - the client would have no way to distinguish between the different
interfaces and their respective directions and would behave in some
unpredictable way or just return some mismatch error message, which cannot be
acted upon by end users. The same holds true for the cases when you might have
several different metrics for a given interface (or other endpoints). Our feeling
is that the messaging protocol should provide explicit information about the
relationship between a group of metrics and a given endpoint, rather than
expecting clients to deduce this information. In summary, we would prefer to
know in an explicit way that some traffic_in, traffic_out, errors_in,
errors_out, drops_in, drops_out, etc., are related to a particular interface.
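To make the ambiguity concrete, here is a rough sketch of the attribute-comparison grouping described above. The XML is a trimmed version of the SetupDataResponse (subject wrapper and most attributes omitted), and `meta3` is a hypothetical third metadata block standing in for a second interface configured with the same attributes. The function names are illustrative only:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "nmwg": "http://ggf.org/ns/nmwg/base/2.0/",
    "nmwgt": "http://ggf.org/ns/nmwg/topology/2.0/",
}

# Trimmed response; meta3 is hypothetical (another interface, same attributes).
RESPONSE = """\
<nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
              xmlns:nmwgt="http://ggf.org/ns/nmwg/topology/2.0/">
  <nmwg:metadata id="meta1"><nmwgt:interface>
    <nmwgt:hostName>PoP-SOF</nmwgt:hostName><nmwgt:ifName>Fa0/0</nmwgt:ifName>
    <nmwgt:direction>out</nmwgt:direction>
  </nmwgt:interface></nmwg:metadata>
  <nmwg:metadata id="meta2"><nmwgt:interface>
    <nmwgt:hostName>PoP-SOF</nmwgt:hostName><nmwgt:ifName>Fa0/0</nmwgt:ifName>
    <nmwgt:direction>in</nmwgt:direction>
  </nmwgt:interface></nmwg:metadata>
  <nmwg:metadata id="meta3"><nmwgt:interface>
    <nmwgt:hostName>PoP-SOF</nmwgt:hostName><nmwgt:ifName>Fa0/0</nmwgt:ifName>
    <nmwgt:direction>in</nmwgt:direction>
  </nmwgt:interface></nmwg:metadata>
</nmwg:message>
"""


def group_by_attributes(xml_text):
    """Group metadata ids by (hostName, ifName), as a client might do today."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for metadata in root.findall("nmwg:metadata", NS):
        iface = metadata.find(".//nmwgt:interface", NS)
        if iface is None:
            continue  # nothing to compare for this metadata block
        key = (
            iface.findtext("nmwgt:hostName", default="", namespaces=NS),
            iface.findtext("nmwgt:ifName", default="", namespaces=NS),
        )
        direction = iface.findtext("nmwgt:direction", default="", namespaces=NS)
        groups[key].append((metadata.get("id"), direction))
    return dict(groups)


def is_ambiguous(members):
    """A group is ambiguous when one direction appears more than once."""
    directions = [direction for _, direction in members]
    return len(directions) != len(set(directions))


groups = group_by_attributes(RESPONSE)
for key, members in groups.items():
    print(key, members, "ambiguous" if is_ambiguous(members) else "ok")
```

With the third block present, the client can detect that something is wrong but cannot tell which "in" belongs with the "out". That is the case an explicit relationship in the protocol would resolve.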

<snip>

My point was asking what happens if the response from the SE is:

a)
<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
</message>

b)
<message>
<metadata id="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="2" metadataIdRef="Z"/>
</message>

Specifically - imagine that the only thing in metadata Y,Z is the time selection. Don't you think the client would want to know that the two sets of data are in fact about the same 'interface'? Without having to look at each and every parameter in the metadata?


Yes, our feeling is that in some cases this would be beneficial and in others
- a must, especially if we want to avoid data presentation consistency
problems.
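As an illustration of what a client gains from consistent ids, the sketch below follows each data element's `metadataIdRef` chain back to its root metadata for the two responses (a) and (b) quoted above. The function name is hypothetical, and the result is only meaningful if the service really uses "X" for the same content in both responses:

```python
import xml.etree.ElementTree as ET

RESPONSE_A = """<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
</message>"""

RESPONSE_B = """<message>
<metadata id="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="2" metadataIdRef="Z"/>
</message>"""


def root_metadata_ids(xml_text):
    """Map each data id to the id of the metadata at the end of its chain."""
    root = ET.fromstring(xml_text)
    parent = {md.get("id"): md.get("metadataIdRef") for md in root.findall("metadata")}
    result = {}
    for data in root.findall("data"):
        current = data.get("metadataIdRef")
        seen = set()  # guards against reference loops
        while parent.get(current) is not None and current not in seen:
            seen.add(current)
            current = parent[current]
        result[data.get("id")] = current
    return result


roots_a = root_metadata_ids(RESPONSE_A)
roots_b = root_metadata_ids(RESPONSE_B)
# Both data sets lead back to "X", without comparing any metadata parameters.
print(roots_a, roots_b)
```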

<snip>

Alternatively, for the case I show above, if SE's are required to maintain consistent and unique id references, you could return:

<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
<data id="2" metadataIdRef="Z"/>
</message>

The message handler would be able to determine that the metadata "X" was returned by both calls to the service engine because it is keeping track of the metadata it will return in a hash table. The duplicate could be ignored and the message output at the end. (Coincidentally, the metadata are already held in a HashMap in the Message class - so this is pretty much done. I have not tested, but it should "just work".)


Well, it looks like the alternative suggested above would solve our problem.
Please note that our remarks are relevant not only to the messaging protocol,
but to the contents of the config file as well.

<snip>

Kind regards,
Vedrin



--- Begin Message ---
  • From: "Jeff W. Boote" <>
  • To: Maciej Glowiak <>
  • Cc:
  • Subject: Re: data/metadata relationships
  • Date: Wed, 02 Aug 2006 10:33:10 -0600
Maciej Glowiak wrote:
Jeff,

The solution with _number has been used for more than 2 months.

I can see that it has been used by you. That is not the point.

The point is that I am concerned that this situation can cause confusion for clients. I asked everyone - most especially client developers - if having multiple references to the same information was confusing.

The value of a community software project is community review. I did not say what you had done was bad. I specifically said I was concerned with one of the implications. If others are not concerned, I will drop the issue. But I want to make sure others (especially client developers, who are under-represented in our development group) understand the implications. Don't you want the services to be as usable by clients as possible?

1. As you remember, we didn't have a generic Message Handler, so everybody wrote their own MH. I had to write one for the Lookup Service and tried to make it quite generic. Then Roman used it for his MAs, but of course everybody may use their own MH for their own service.

I just asked if anyone else was concerned that the data/metadata relationships were not being preserved in this methodology. If no one else is concerned, I'll drop the issue. The fact that others are copying your work is all the more reason to discuss the implications, is it not?

2. Your case with one common metadata:

--------------------------------------------------
<message>

<metadata id="a"/>
<metadata id="b" metadataIdRef="a"/>
<metadata id="c" metadataIdRef="a"/>

<data metadataIdRef="b"/>
<data metadataIdRef="c"/>

</message>
--------------------------------------------------

must be split into two sub-requests if we want to run the Service Engine
separately for each data trigger. That works for both MAs now and for
the LS (except LSRegister, which has its own simple Message Handler).

3. Service output is not divided into pieces, so the request message from
pt. 2 will cause the Service Engine to run two times:

a) request to SE
--------------------------------------------------
<message>
<metadata id="a"/>
<metadata id="b" metadataIdRef="a"/>
<data metadataIdRef="b"/>
</message>
--------------------------------------------------

b) request to SE
--------------------------------------------------
<message>
<metadata id="a"/>
<metadata id="c" metadataIdRef="a"/>
<data metadataIdRef="c"/>
</message>
--------------------------------------------------

and the Service Engine will return a similar set of metadata and data
elements with the same identifiers, for instance:

a) response from SE
--------------------------------------------------
<message>
<metadata id="X"/>
<data metadataIdRef="X"/>
</message>
--------------------------------------------------

b) response from SE
--------------------------------------------------
<message>
<metadata id="X"/>
<data metadataIdRef="X"/>
</message>
--------------------------------------------------

My point was asking what happens if the response from the SE is:

a)
<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
</message>

b)
<message>
<metadata id="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="2" metadataIdRef="Z"/>
</message>

Specifically - imagine that the only thing in metadata Y,Z is the time selection. Don't you think the client would want to know that the two sets of data are in fact about the same 'interface'? Without having to look at each and every parameter in the metadata?

Now, how do you combine these two into one response message? By
changing identifiers...

So the response message will look like:

--------------------------------------------------
<message>
<metadata id="X_1"/>
<data metadataIdRef="X_1"/>

<metadata id="X_2"/>
<data metadataIdRef="X_2"/>
</message>
--------------------------------------------------
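A rough sketch of this renaming approach, applied to the two responses (a) and (b) in which Y and Z both refer to X. It is not the actual Message Handler code. The function name is hypothetical, and the assumption that the suffix is the 1-based position of the sub-response is mine:

```python
import xml.etree.ElementTree as ET

RESPONSE_A = """<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
</message>"""

RESPONSE_B = """<message>
<metadata id="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="2" metadataIdRef="Z"/>
</message>"""


def merge_with_suffixes(responses):
    """Combine sub-responses, making ids unique by appending _1, _2, ..."""
    merged = ET.Element("message")
    for number, text in enumerate(responses, start=1):
        suffix = f"_{number}"
        for element in ET.fromstring(text):
            if element.tag == "metadata":
                element.set("id", element.get("id") + suffix)
            ref = element.get("metadataIdRef")
            if ref is not None:
                element.set("metadataIdRef", ref + suffix)
            merged.append(element)
    return merged


merged = merge_with_suffixes([RESPONSE_A, RESPONSE_B])
print([md.get("id") for md in merged.findall("metadata")])
```

The combined message is internally consistent, but the shared "X" now appears as two unrelated ids (X_1 and X_2), so the client can no longer see that both data sets describe the same thing.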

Alternatively, for the case I show above, if SE's are required to maintain consistent and unique id references, you could return:

<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
<data id="2" metadataIdRef="Z"/>
</message>

The message handler would be able to determine that the metadata "X" was returned by both calls to the service engine because it is keeping track of the metadata it will return in a hash table. The duplicate could be ignored and the message output at the end. (Coincidentally, the metadata are already held in a HashMap in the Message class - so this is pretty much done. I have not tested, but it should "just work".)

And as I said, it works fine. Of course the Message Handler could be more
intelligent, but it would take a big effort to implement, and the final
result may still not be satisfactory.

The intelligence I suggest in the message handler is only a hash of metadata id -> metadata element. This would ensure no duplicates are put in the outgoing message. This is trivial (and done in the Message class already).
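A minimal sketch of that hash, written in Python rather than against the real Message class, so every name here is an assumption. It merges the two service engine responses (a) and (b) and keeps only the first metadata element seen for each id:

```python
import xml.etree.ElementTree as ET

RESPONSE_A = """<message>
<metadata id="X"/>
<metadata id="Y" metadataIdRef="X"/>
<data id="1" metadataIdRef="Y"/>
</message>"""

RESPONSE_B = """<message>
<metadata id="X"/>
<metadata id="Z" metadataIdRef="X"/>
<data id="2" metadataIdRef="Z"/>
</message>"""


def merge_responses(responses):
    """Merge sub-responses, emitting each metadata id only once."""
    metadata = {}  # metadata id -> metadata element (insertion-ordered)
    data = []
    for text in responses:
        root = ET.fromstring(text)
        for md in root.findall("metadata"):
            # Assumes the service engine keeps ids consistent, so an id that
            # was already seen carries the same content and can be skipped.
            metadata.setdefault(md.get("id"), md)
        data.extend(root.findall("data"))
    merged = ET.Element("message")
    merged.extend(list(metadata.values()))
    merged.extend(data)
    return merged


merged = merge_responses([RESPONSE_A, RESPONSE_B])
print([(el.tag, el.get("id")) for el in merged])
```

The sketch trusts the ids and never compares the content of the duplicates. That is the reason the consistency and uniqueness requirement has to be met by the service engine.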

The question I asked was how difficult it would be to maintain consistency and uniqueness of metadata id's in the service engine. That is where any additional complexity would be. But, for most SE's I would imagine this would not be too difficult. For the LS I'm sure XML DB's have auto generated ID's for individual elements in the database that could be used for the id. For the RRD MA this would just be an id assigned when the config file is loaded (can be automatic for the XML DB method I believe).

We have often discussed the id's and references and questioned how unique they needed to be. I'm just suggesting reasons to make them very unique (and stable).

jeff


--- End Message ---


