perfsonar-dev - Re: [pS-dev] Re: Self-test triggering/response messages pointer

Subject: perfsonar development work


From: "Luchesar V. ILIEV" <>
To:
Cc: Roman Lapacz <>, Szymon Trocha <>, Maciej Glowiak <>, Verena Venus <>, Stijn Melis <>, Cándido Rodríguez Montes <>, Nicolas Simar <>, Domenico Vicinanza <>, perfSONAR developers <>,
Subject: Re: [pS-dev] Re: Self-test triggering/response messages pointer
Date: Fri, 20 Jun 2008 04:12:45 +0300
Disposition-notification-to: "Luchesar V. ILIEV" <>
Openpgp: id=9A1FEEFF; url=https://cert.acad.bg/pgp-keys/
Organization: BG.ACAD/IPP-BAS

Jason Zurawski wrote:

Good point. However let's start with something easier, so how about a generic test as a beginning?

http://schemas.perfsonar.net/tools/admin/selftest/generic/full/1.0

See my comment to VV for an in depth reasoning. I am not opposed to 'all in one' as long as it is structured appropriately.

Indeed, you made a very good point, and that's why I changed the proposal accordingly -- now you can easily add new self-tests in lieu of the generic one.

This may be true, but it is far from clean or ideal. We use the structured messages for a reason: there is a natural mechanism to find what we are really interested in by walking the XML. Doing a regex across several datums to verify the results of the test offers a quick and correct solution to this problem, but we do have the tools at our disposal to avoid shoving all possible results into a single bloated text element.

I totally agree in general. But I'm more a UNIX administrator than a web service developer (the latter not at all, actually :D), and that's perhaps why I'm more at peace with those kinds of solutions.

But what's really important here: I wouldn't even think to argue with you if we were talking about the "real" service operation. But we are talking about monitoring, and that's a _completely_ different thing. The monitoring must be robust, and robust almost always means simple: the simpler, the better.

I'd guess that you're not comfortable with the currently proposed approach, because it really does sound like a bad thing for a web service to do, and I totally agree. But really, this is only a supporting task, and it need not adhere to the "web service good practices", let's call them that. I'd rather pull it out of the web services themselves entirely, but that's the way we've taken already.

Finally, to just give you an example, you know that in critical systems like aeroplanes, spaceships, etc., the fine-precision, high-tech, complex systems are often backed up by simple, crude, and straightforward ugly mechanical ones. Like that plain old magnetic compass in the cockpit of an ultra-modern supersonic fighter plane. It'll most likely work long after all complex MFDs and gyroscopes have surrendered, and that's what I'd like to see in a monitoring system too.

Let me answer the rest of your mail a bit later, as it's almost morning here, I'm afraid. :|

Thanks a lot,
Luchesar

But I concur that separate metadata and data could be a benefit, especially if we implement more specific tests in the future.

If the separation looked like this for a request:

<metadata id="m1">
<eventType>.../admin/selftest/ma/snmp/test1</eventType>
</metadata>

<metadata id="m2">
<eventType>.../admin/selftest/ma/snmp/test2</eventType>
</metadata>

<data id="d1" metadataIdRef="m1" />

<data id="d2" metadataIdRef="m2" />

The response would be easier to interpret:

<metadata id="r-m1" metadataIdRef="m1">
<eventType>.../success/...</eventType>
</metadata>

<metadata id="r-m2" metadataIdRef="m2">
<eventType>.../error/...</eventType>
</metadata>

<data id="d1" metadataIdRef="r-m1">
<datum>some results here</datum>
</data>

<data id="d2" metadataIdRef="r-m2">
<datum>some other results here</datum>
</data>
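
A minimal sketch of how a client might walk such a response, assuming the fragments sit under a single wrapper element (the `<message>` root, the `collect_results` name, and the absence of XML namespaces are assumptions made for illustration; the elided eventType values are kept exactly as written above). It pairs each data element with its response metadata and, through that, with the request metadata it answers:

```python
import xml.etree.ElementTree as ET

# Hypothetical response document. The <message> wrapper is an assumption;
# the eventType values are left elided, as in the thread.
RESPONSE = """
<message>
  <metadata id="r-m1" metadataIdRef="m1">
    <eventType>.../success/...</eventType>
  </metadata>
  <metadata id="r-m2" metadataIdRef="m2">
    <eventType>.../error/...</eventType>
  </metadata>
  <data id="d1" metadataIdRef="r-m1">
    <datum>some results here</datum>
  </data>
  <data id="d2" metadataIdRef="r-m2">
    <datum>some other results here</datum>
  </data>
</message>
"""


def collect_results(xml_text):
    """Return one record per <data> element, joined to its <metadata>."""
    root = ET.fromstring(xml_text)
    # Index the response metadata blocks by their id attribute.
    metadata = {m.get("id"): m for m in root.findall("metadata")}
    results = []
    for data in root.findall("data"):
        meta = metadata[data.get("metadataIdRef")]
        results.append({
            # Which request metadata this result answers.
            "request": meta.get("metadataIdRef"),
            "eventType": meta.findtext("eventType", default="").strip(),
            "datums": [(d.text or "").strip() for d in data.findall("datum")],
        })
    return results


results = collect_results(RESPONSE)
for r in results:
    print(r["request"], r["eventType"], r["datums"])
```

No regex over the datum text is needed to tell which test failed: the outcome is read from the eventType of the metadata each data element points to.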

Sure, that's a very good idea. But I haven't proposed it so far for a single, yet important, reason: we don't have an effective means to keep this information widely available and current. When the Lookup Services begin full operation, then perhaps the available self-tests could be something registered there. Then you could, really, easily send the type of requests you propose. So I'd suggest keeping this in mind, but starting with the simpler generic self-test. Simply put, we need efficient enough self-testing for the upcoming 3.1, but can certainly extend it later.

Ok, we can shelve the discussion of 'what ifs' and extensive personalized tests. Even with the generic test you propose we need to:

a) keep the format within the confines of the rest of the framework
b) allow for this future extension.

The semantics of the multiple datum elements for this one test are really the issue that concerns me most. If I am running this generic test I would want to know exactly what was being tested in this so-called bundle, for two reasons really:

1) the logs need to reflect the details (e.g. the service running the test knows exactly what was going on, but the client doesn't get the complete picture)
2) I may want to run the failed tests again on their own, outside of the bundle.

Structuring things to avoid just using some number of datum elements does not complicate your idea for the generic test, and does not increase the amount of work needed to craft and send back an appropriate response from each service.

This would lend itself to easier logging (especially in the syslog case: the name of the test, the result code, and the result message are all easily correlated) and extension to whatever types of tests are required.
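
As a rough illustration of that correlation, the sketch below formats one log line per test from the three pieces of information named above. The `format_selftest_line` helper and the `key=value` layout are hypothetical, not an agreed perfSONAR log format; the test name reuses the elided eventType from the request example.

```python
def format_selftest_line(test_name, result_code, message):
    """Build a single log line that keeps name, code, and message together.

    The key=value layout is an assumption chosen for easy grepping.
    """
    return f"selftest test={test_name} result={result_code} msg={message!r}"


line = format_selftest_line(
    ".../admin/selftest/ma/snmp/test1",  # elided name kept as in the thread
    "success",
    "some results here",
)
print(line)
```

A line built this way could be handed to whatever syslog writer the service already uses; in Python, for instance, the standard `logging.handlers.SysLogHandler` would do.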

Okay, how about the response eventTypes:

http://schemas.perfsonar.net/status/selftest/generic/full/1.0/
success
error
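
A small sketch of how a client could read the outcome from such an eventType, assuming the two values are formed by appending `success` or `error` to the base URI above (the `selftest_outcome` helper and the constant name are hypothetical):

```python
# Base URI taken from the proposal above; the outcome is the final path part.
SELFTEST_STATUS_BASE = (
    "http://schemas.perfsonar.net/status/selftest/generic/full/1.0/"
)


def selftest_outcome(event_type):
    """Return 'success' or 'error' for a matching eventType, else None."""
    if not event_type.startswith(SELFTEST_STATUS_BASE):
        return None
    suffix = event_type[len(SELFTEST_STATUS_BASE):].strip("/")
    return suffix if suffix in ("success", "error") else None


print(selftest_outcome(SELFTEST_STATUS_BASE + "success"))
print(selftest_outcome(SELFTEST_STATUS_BASE + "error"))
```

With the outcome at the end of the URI, a simple prefix check identifies which self-test produced the status before the outcome itself is examined.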

Indeed, why have success and error been closer to the root (/) than what generates them? To me the schema above is more logical, but I might be missing something. I'd appreciate your opinion. Also, for the generic selftest, it does make sense to return a single metadata and multiple datums. Once again, the ID of the message origin is not a problem, as it must be in the datum anyway. Each datum would essentially contain what is written to syslog, and you'll want the origin of the message there, anyway.

I don't think the proposed way to convey the information via the datums is completely wrong; I just think it is shoving a little too much, and too complex, information into an element that wasn't really designed to handle it. Hopefully I made my points clear in the examples from the previous message; if I haven't, please let me know and I can formalize it more.

Having said that, and going back to what you said about logging, the EchoResponses will not be parsed to write the logs. In addition to the EchoResponse, the service will write all messages via syslog. In fact, the web messages are additional to syslogging.

This is part of a larger discussion (that should be scheduled for next week) regarding 'completeness' in the initial request and the subsequent response. There is information loss when a service decides to cover up intermediate steps, and being a bit more verbose does not hurt (regardless of whether the information appears in syslog or some other form of report).

-jason

Attachment: signature.asc
Description: OpenPGP digital signature
