perfsonar-dev - Re: [pS-dev] perfSONAR protocol modification for high volume data transfers
Subject: perfsonar development work
- From: Roman Lapacz <>
- To: Jeff Boote <>
- Cc: , , Sibylle Schweizer-Jäckle <>
- Subject: Re: [pS-dev] perfSONAR protocol modification for high volume data transfers
- Date: Wed, 12 Mar 2008 11:13:46 +0100
Jeff Boote wrote:
Hi All,
Roman is absolutely correct. This is something we have discussed as a possibility nearly from the beginning. This type of solution is complicated by the fact that you need to communicate a much larger amount of information than just a URL for the data connection. (AA, binary data format, byte ordering, etc...)
However, I do wonder what specific data services you are seeing a problem with at this time? Can you make your problem more concrete?
There are several ways to mitigate the issues - and I'm now wondering whether the DataHandle solution* (as I believe we called it, IIRC - it could be push or pull) is really needed yet. For example, one possible solution to larger data flows that we have considered in the pS-PS code is base64 encoding the entire data array for high-volume data. That can be done within the context of the single control communication that currently exists and would hopefully save on the XML parsing. (It would require some additional metadata/parameters, but could be done within the same interaction model.)
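To make the base64 idea concrete, here is a minimal sketch in Python. It is not the pS-PS code; the function names and the byte_order/type_code parameters are assumptions made up for illustration. They stand in for the "additional metadata/parameters" the receiver would need in order to decode the array.

```python
import base64
import struct


def encode_array(values, byte_order="<", type_code="d"):
    """Pack numbers into one binary buffer and base64-encode it.

    byte_order and type_code use the struct module's notation
    ("<" little-endian, ">" big-endian, "d" 64-bit float). They are
    hypothetical stand-ins for parameters carried in the message.
    """
    raw = struct.pack(f"{byte_order}{len(values)}{type_code}", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, byte_order="<", type_code="d"):
    """Reverse encode_array; the caller must pass the same parameters."""
    raw = base64.b64decode(text)
    count = len(raw) // struct.calcsize(byte_order + type_code)
    return list(struct.unpack(f"{byte_order}{count}{type_code}", raw))


if __name__ == "__main__":
    payload = encode_array([1.0, 2.5, -3.0])
    print(payload)                # one text node instead of many datum elements
    print(decode_array(payload))
```

The whole array then travels as a single text value, so the XML parser sees one element rather than one element per measurement.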
* The DataHandle was in effect a URL for where the data could be retrieved (or could be sent, depending on the context: push vs pull) - and included additional information on the binary representation and AA needs. I actually wrote up a description of how this would work, but alas - it doesn't seem to be on the wiki anymore. Evidently it was lost during one of the previous re-organizations of the wiki.
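Since the original write-up is gone, the following is only a guess at what such a handle might carry, based on the description above: a URL, the direction (push or pull), the binary representation, and the AA requirement. Every field name and example value is hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataHandle:
    """Hypothetical descriptor returned over the control channel."""

    url: str            # where the data can be fetched from (or sent to)
    mode: str           # "pull" or "push"
    data_format: str    # e.g. "float64"; naming is an assumption
    byte_order: str     # "big" or "little"
    auth_scheme: str    # placeholder for the AA requirement

    def __post_init__(self):
        # Reject values the receiver could not act on.
        if self.mode not in ("push", "pull"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.byte_order not in ("big", "little"):
            raise ValueError(f"unknown byte order: {self.byte_order!r}")


if __name__ == "__main__":
    handle = DataHandle(
        url="https://ma.example.org/data/1234",  # made-up URL
        mode="pull",
        data_format="float64",
        byte_order="little",
        auth_scheme="none",
    )
    print(asdict(handle))
```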
[more comments in-line]
On Mar 11, 2008, at 8:42 AM, Roman Lapacz wrote:
This issue takes us back to the push interface for transferring high-volume data. We once had the idea to create two communication channels: a control one (NMWG format) and a data one (no XML structure, just data). The control channel would dynamically set up the data connection between a sender and a receiver.
Andreas Hanemann wrote:
The difficulties are more or less tied to the flexibility that is foreseen in the NMWG format as briefly summarized in the following. There is the concept that data and metadata contain pointers so that a message typically looks like
“Metadata2, metadata1 (containing reference to metadata2), data (containing reference to metadata1)”,
but can also be organized in a different arbitrary manner. A further possibility is the potential multiple use of references (for filter metadata) which may look as follows.
“Filter metadata, metadata1 (containing reference to filter metadata), metadata2 (containing reference to filter metadata), data1 (containing reference to metadata1), data2 (containing reference to metadata2)”
Processing would therefore require interleaved references back to previous metadata, so serial processing of the data is not possible and a complicated XML parse tree has to be built in memory. Even though the latter case has not been used so far (to our knowledge), we have to address the amount of data kept in memory during XML parsing.
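The referencing problem can be illustrated with a small Python sketch. The message is modelled as a plain list of dictionaries rather than real NMWG XML, and the keys "kind", "id" and "ref" are names invented for this example. The point is that, with arbitrary ordering allowed, every metadata element has to be indexed before any data element can be resolved.

```python
def resolve_chains(elements):
    """Return, for each data element, the chain of metadata ids it depends on.

    Because references may point anywhere in the message, all metadata
    are indexed first; a single forward pass is not enough.
    """
    metadata = {e["id"]: e for e in elements if e["kind"] == "metadata"}
    chains = {}
    for element in elements:
        if element["kind"] != "data":
            continue
        chain, seen = [], set()
        ref = element.get("ref")
        while ref is not None:
            if ref in seen:
                raise ValueError(f"reference cycle at {ref!r}")
            if ref not in metadata:
                raise KeyError(f"dangling reference to {ref!r}")
            seen.add(ref)
            chain.append(ref)
            ref = metadata[ref].get("ref")
        chains[element["id"]] = chain
    return chains


if __name__ == "__main__":
    # The second layout quoted above: one filter shared by two metadata.
    message = [
        {"kind": "metadata", "id": "filter"},
        {"kind": "metadata", "id": "metadata1", "ref": "filter"},
        {"kind": "metadata", "id": "metadata2", "ref": "filter"},
        {"kind": "data", "id": "data1", "ref": "metadata1"},
        {"kind": "data", "id": "data2", "ref": "metadata2"},
    ]
    print(resolve_chains(message))
```

Note that only the metadata index has to stay in memory for this step; the data elements themselves are visited one at a time.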
Your concern is keeping this metadata in memory during parsing? The metadata should not be very large... Can you show an example where this has really been a concern?
What should be done is to try to allow for serialized processing. There could be a metadata flag saying that the data in the message allow for serialized processing. This means that the possibilities for arbitrary ordering are not used in this case and also that some metadata are repeated (e.g. the filtering metadata mentioned above). The idea of the flag is that services which have no need for high-volume data exchange may ignore it and process data as before, so that they need no changes. Other services would be required to have a new library for parsing. We have to check for potential problems when some messages are suitable for stream processing and others are not (e.g. one RRD MA instance sends stream-processing-enabled messages and the others do not).
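A rough sketch of what serialized processing could look like on the receiving side, using Python's standard xml.etree.ElementTree.iterparse. The XML here is deliberately simplified (no namespaces; the element and attribute names are assumptions, not the actual NMWG schema), and the sketch assumes the flag guarantees that every metadata element precedes the data that refers to it.

```python
import io
import xml.etree.ElementTree as ET


def stream_datums(xml_stream):
    """Yield (metadata_id, value) pairs without building the whole tree.

    Assumes the message is flagged as serializable, i.e. each data
    element only refers to metadata that appeared earlier.
    """
    seen = set()      # ids of metadata already read
    current = None    # metadata id of the data element being read
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start":
            if elem.tag == "data":
                current = elem.get("metadataIdRef")
                if current not in seen:
                    raise ValueError(f"{current!r} referenced before it was defined")
            continue
        if elem.tag == "metadata":
            seen.add(elem.get("id"))
        elif elem.tag == "datum":
            yield current, float(elem.get("value"))
        elem.clear()  # drop the finished element's attributes and children


if __name__ == "__main__":
    xml = (b'<message><metadata id="m1"/>'
           b'<data metadataIdRef="m1"><datum value="1.5"/><datum value="2.5"/></data>'
           b'</message>')
    for pair in stream_datums(io.BytesIO(xml)):
        print(pair)
```

Only the set of metadata ids is kept for the whole message, which is the "cache metadata" cost asked about below; each datum is handed on as soon as its end tag has been read.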
So, this method would only save you from having to cache metadata, right? Or am I missing something?
A further alternative would be a RESTful architecture which could make use of a modified MetaDataKeyRequest. A client asks for a key for a certain parameter set. The key in the answer is then a URL which is the location where the client can fetch the data together with a metadata description. The further data transfer is then done with HTTP which allows for stream processing. The data would not be wrapped in XML anymore which would minimize the overhead. However, this solution would require larger modifications to perfSONAR.
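For the client side of such a scheme, a sketch of how the data behind the returned URL might be consumed as a stream. Nothing here is an existing perfSONAR interface: the record layout (one little-endian 64-bit float per record) and the function name are assumptions. The function accepts any binary file-like object, so in practice it could be handed the response object from urllib.request.urlopen(key_url); the demo uses io.BytesIO to stay self-contained.

```python
import io
import struct


def iter_records(stream, fmt="<d", chunk_size=65536):
    """Yield fixed-size binary records from a stream, chunk by chunk.

    fmt is a struct format string describing one record; it would have
    to come from the metadata description that accompanies the data.
    """
    record = struct.Struct(fmt)
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Only unpack whole records; keep any partial record for later.
        usable = len(pending) - len(pending) % record.size
        if usable:
            yield from record.iter_unpack(pending[:usable])
            pending = pending[usable:]
    if pending:
        raise ValueError("stream ended in the middle of a record")


if __name__ == "__main__":
    body = struct.pack("<3d", 1.0, 2.0, 3.0)
    # A deliberately awkward chunk size to show partial records being handled.
    for (value,) in iter_records(io.BytesIO(body), chunk_size=5):
        print(value)
```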
This is much more similar to the DataHandle idea that we discussed very early on and have put on the back-burner since it has not been needed yet.
Since this represents a very large change, I would want to see more information on why you think it is needed. If for no other reason, then to better be able to evaluate any potential
If we decided to introduce that push/pull (publisher/subscriber) interface, I would treat it as a new feature, and we could still use what we already have in the same way (the configuration of a service would say which communication solution to use and when). This could eventually provide a smooth transition from the older solution to the newer one.
Roman