perfsonar-dev - Re: [pS-dev] Lookup service playground tab
Subject: perfsonar development work
- From: Jason Zurawski <>
- To: Nina Jeliazkova <>
- Cc: Szymon Trocha <>, "" <>, GN3 JRA2 T3 <>,
- Subject: Re: [pS-dev] Lookup service playground tab
- Date: Tue, 11 Aug 2009 07:34:14 -0400
- Organization: Internet2
Nina Jeliazkova wrote:
Jason,
Thank you for detailed explanations. It's clear now I need to find a
different way of testing the client, regardless of interoperability issues.
I think your testing methodology is sound; I am more interested in starting to identify what the GUIs need and how the service developers can start delivering it. From this discussion we have learned that there are unexpected behaviors in the summary set, and in the hLS/gLS in general. Historically, the requirement was that information entering an hLS (and eventually the gLS) be "available" as fast as possible. This is why the summarization set that the discovery messages use is calculated as quickly as it is, and why it can sometimes appear unstable. Slowing this down may make it more stable, but makes the information less current.
There was never a full review of how the Information Services are designed and how well they allow for interaction. Now that a lot of time has passed since the initial deployment, and interest in using these services is increasing, we need to have this discussion very soon and identify things in the original design that could be altered to improve the experience.
We have your particular use case, and I believe we have recorded some other complaints from LHC community sites that have deployed the pS Performance Toolkit that would be valuable here as well. I will start compiling these into a larger list that can be discussed at the next meeting.
Regarding the service type, I've included it in the required query
parameters because the LS client API already supports it. If a service's
functionality is determined by event types alone, this might not be
necessary.
This is a hard question that we don't have an answer for. The eventTypes don't convey whether a service is a measurement point or a measurement archive. I have always been wary of using the service parameters for anything 'valuable'. Would the API treat all of these the same?
<serviceType>ma</serviceType>
<serviceType>MA</serviceType>
<serviceType>Ma</serviceType>
What about this:
<serviceType>measurement archive</serviceType>
The eventTypes are valuable because they are standard - if HADES uses the bwctl eventType, the data format is the same as when perfSONAR-BUOY uses it. Maybe we need a similar URI/URN structure for services as well. This is just an idea, but:
http://perfsonar.net/service/ma/hades
or we could use the other syntax making the rounds in OGF:
urn:perfsonar:service:type=ma:name=hades
Even if we don't use these, having a better way to parse/expect what the value of this field will be would make the IS and client interaction a little more fulfilling.
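To make the serviceType concern concrete, here is a minimal Python sketch of how a client could map the free-form spellings above onto one canonical URN of the style being floated. The alias table, the function name `canonical_service_urn`, and the exact URN layout are hypothetical illustrations, not an existing perfSONAR convention.

```python
import re

# Hypothetical alias table: normalised serviceType spellings -> short type code.
# This is NOT an agreed perfSONAR registry; it only illustrates the idea.
SERVICE_TYPE_ALIASES = {
    "ma": "ma",
    "measurement archive": "ma",
    "mp": "mp",
    "measurement point": "mp",
}


def canonical_service_urn(service_type: str, name: str) -> str:
    """Map a free-form serviceType plus a service name to a URN shaped like
    urn:perfsonar:service:type=<type>:name=<name> (illustrative layout)."""
    # Case-fold and collapse internal whitespace so "MA", "Ma" and
    # "Measurement  Archive" all normalise to the same key.
    key = re.sub(r"\s+", " ", service_type.strip().lower())
    try:
        code = SERVICE_TYPE_ALIASES[key]
    except KeyError:
        raise ValueError(f"unrecognised serviceType: {service_type!r}") from None
    return f"urn:perfsonar:service:type={code}:name={name.strip().lower()}"


if __name__ == "__main__":
    for raw in ("ma", "MA", "Ma", "measurement archive"):
        print(raw, "->", canonical_service_urn(raw, "hades"))
```

All four spellings map to the same URN, while an unknown spelling raises `ValueError` instead of being guessed at; a fixed vocabulary like this is what would let the IS and clients agree on what the field means.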
-jason
Regards,
Nina
Jason Zurawski wrote:
Nina Jeliazkova wrote:
Jason,
Today's experiment with the updated views Jason sent is as follows:
http://dc211.internet2.edu/cgi-bin/perfSONAR/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
According to the tree view, 192.16.166.26 is served by
http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-rrd-ma/services/MeasurementArchiveService
(resolves to 131.225.204.1), and I would expect this service to appear
in the results. Unfortunately, this is not the case; could anyone
please provide an explanation? I assume the tree view is up to date.
I have explained this on numerous occasions before, and nothing has
really changed: we are aware that the IP summarization is not as tight
as some would like it to be. There is ongoing work with a student at
the University of Delaware to improve the algorithm, and he has a
report on his results so far available here:
Indeed, we have discussed summarization ‘tightness’ on a number of
occasions already. However, I would like to emphasize that the
suggested workaround is not suitable in this specific case, because
the returned superset of services does not include the specific entry
I’m looking for (p-mdm.ps-lhcopn.fnal.gov or 131.225.204.1).
That is to say, if I apply the suggested workaround and start querying
every service from the returned superset list, I have no chance of
querying the ‘correct’ one (p-mdm.ps-lhcopn.fnal.gov or
131.225.204.1), because it is not in the list.
Did you read my response discussing possible reasons *why* things are
not working as you would expect them to? There is absolutely nothing
*anyone* can do if the summarization is not working at the lowest
levels. I look forward to hearing from the MDM hLS maintainer so we
can figure out whether or not there is a problem here that needs to be
addressed.
In addition, it appears that there’s some inconsistency between the
output of the directory/tree summary view, which includes a reference
to 192.16.166.26 attached to p-mdm.ps-lhcopn.fnal.gov/131.225.204.1,
and the results of the direct queries to this particular service that
you sent.
As already stated above, the same kind of inconsistency exists between
the output of the directory/tree summary view and the results of a
playground query for this particular IP address (192.16.166.26). I
would appreciate an explanation of these inconsistencies.
Not sure what you are implying here by "inconsistency": there are 2
interfaces on the first GUI for the *utilization eventType* (direction
"in" and direction "out"), and the graphs for these two interfaces
reflect the data for these two directions. The GUI *does not* display
other eventTypes (errors, discards), so it has no reason to display the
remaining interface definitions that are in the hLS view.
Am I answering your concern correctly? If not, can you try to explain
what "inconsistency" you are seeing?
The inconsistency I’m referring to is the fact that the
(192.16.166.26, p-mdm.ps-lhcopn.fnal.gov) tuple is present in the
directory and tree summary views but is not present in the gLS response.
I dug deeper, and I will try to explain what I think is happening. I
do not know exactly since I can't access the service directly to
investigate further. If you read the design of the service
(http://anonsvn.internet2.edu/svn/nmwg/trunk/nmwg/doc/dLS/gLS/phase_1_color.html)
note that the gLS tries to maintain two data stores:
- Registered hLSs - Each hLS should contact the gLS in a timely
manner and register a summary of *each* service it represents. For
example, if the hLS knows about 2 RRDMAs and a PingER MA, it will
register 3 summary sets with the gLS.
- Summarized hLSs - The gLS will go through the above list of hLSs
and summarize what it knows about each *service* into a single set. In the above example, the gLS will summarize the 3 service data sets into
one summary set representing the hLS as a whole (this includes all
eventTypes, domains, keywords, and IP addresses). The gLS then also makes
one *big* summary of everything it knows about (all eventTypes, IP
addresses, domains, keywords); this set is used to answer the
discovery queries you are sending.
Each of these data sets is controlled by the 'control structure' that
records when the information was last updated. The sets are cleaned
out when they 'expire', and this may happen independently for each
of them.
What I think is happening:
1) Looking in the gLS data set (here if you are interested:
http://dc211.internet2.edu/cgi-bin/pA/view.cgi?hls=http://ps4.es.net:9990/perfSONAR_PS/services/gLS),
search for the hLS in question (p-mdm.ps-lhcopn.fnal.gov).
2) I only see this show up in the second information set - the
summarized information. I do not see it in the first data set - the
actually registered hLS information.
What this means:
3) The service registered, probably not too long ago, and was
summarized into the information sets that the gLS knows about. This
is why we still see it in the summary data set.
4) It stopped registering at some point, and the gLS 'cleaned' it from
the first data set. It did not clean it from the second data set
(yet) because this information is configured to live for a longer
length of time (this was a design assumption; I am not saying it is
correct, but it is how the service behaves). Eventually, this record
would be cleared out as well.
5) Since the gLS builds the 'complete' summary set (the one that the
discovery query would be compared against) frequently, the FNAL hLS
would not show up as being in this set of information since it is
missing from the first data set. It would still have a summary
record, but this is not used in constructing the 'complete' set.
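The sequence above can be reproduced with a small toy model. This is a minimal Python sketch assuming only what is described here: one store of registered hLS data and one longer-lived store of per-hLS summaries, each expired independently, with discovery answered from the registered data only. The class name, the TTL values, the hLS label `fnal-hls`, and the `cascade` switch (modelling the "summary set should die immediately" alternative) are hypothetical, and registration and summarization are collapsed into one step for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGLS:
    """Toy model of the two gLS stores; not the real gLS implementation."""
    registered_ttl: float          # lifetime of registered hLS data (seconds)
    summary_ttl: float             # longer lifetime of per-hLS summaries (seconds)
    cascade: bool = False          # drop a summary as soon as its hLS expires
    registered: dict = field(default_factory=dict)  # hLS -> (elements, last update)
    summarized: dict = field(default_factory=dict)  # hLS -> (elements, last update)

    def register(self, hls, elements, now):
        # Registration and summarization collapsed into one step for brevity.
        entry = (frozenset(elements), now)
        self.registered[hls] = entry
        self.summarized[hls] = entry

    def clean(self, now):
        # Each store is expired independently, using its own lifetime.
        self.registered = {h: e for h, e in self.registered.items()
                           if now - e[1] <= self.registered_ttl}
        self.summarized = {h: e for h, e in self.summarized.items()
                           if now - e[1] <= self.summary_ttl
                           and (not self.cascade or h in self.registered)}

    def discovery(self, element):
        # The 'complete' set answering discovery queries uses registered data only.
        return {h for h, (elems, _) in self.registered.items() if element in elems}

    def summary_view(self, element):
        # What a GUI reading the longer-lived summary store would still show.
        return {h for h, (elems, _) in self.summarized.items() if element in elems}


if __name__ == "__main__":
    gls = ToyGLS(registered_ttl=3600, summary_ttl=4 * 3600)
    gls.register("fnal-hls", {"192.16.166.26"}, now=0)
    gls.clean(now=2 * 3600)  # the hLS never re-registered after t=0
    print(gls.discovery("192.16.166.26"))     # set()
    print(gls.summary_view("192.16.166.26"))  # {'fnal-hls'}
```

With `cascade=True` the same run leaves both views empty, which removes the false positive at the cost of hiding a service that is merely late in re-registering.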
Things that I think need to be answered to debug further:
- Why did the hLS stop registering, and what is its registration
interval? This is one more reason why the registration intervals (all
intervals, really) are important, and why adjusting the protocol around
the idea of the LSTTL is a very hard problem to solve.
- Are my design assumptions (above) correct about how the gLS should
behave regarding cleaning out information and making summary sets? I
say yes, but we may also want the summary set to die immediately after
an hLS does, so we don't introduce false positives like the ones Nina
is seeing.
- Is the lack of a summary set in the hLS view.cgi indicative of this
service not registering to the gLS? I think the two may be related,
but I do not know much about the MDM hLS architecture.
The PerfsonarUI playground currently sends queries without event types
specified, so _any_ information about the queried network element is
expected to be returned. I am not quite sure how your comment about
eventTypes is related to the issue.
I misunderstood what you were asking, I thought you were referring to
an inconsistency of what you were seeing in the two views.
Perhaps I am misinterpreting the directory and tree views; for
clarification, please find attached screenshots of both views. The tuple
(192.16.166.26, p-mdm.ps-lhcopn.fnal.gov) does show up there.
Note that these two views are not related to the gLS at all; they
are related to the hLS (view.cgi) and the RRDMA service
(serviceTest.cgi). The gLS is only used to find these wherever they
may be in the world. This is why I did not understand what you were
asking, and I apologize.
I understand many things can go wrong in a distributed system, but it
is still not clear to me:
- how this tuple got into the directory/tree summary views in the first
place (bearing in mind that it appears not to be possible to obtain
such summary information from the specific service, as you have already
shown);
See above. The web GUIs do not use the discovery data set - they use
the xquery interface to find the complete list of summarized services.
This is a more 'stable' view because it lives for a longer period of
time than the discovery set (which is updated very frequently).
- why this tuple is included in the directory/tree summary views but
is not included in response to the above mentioned query;
It appears that currently we don’t get a superset which could be further
narrowed down by sending specific queries to hLSes, but rather some set
which may or may not include the specific entry we’re looking for.
My answer above should cover this, but the discovery set is the 'most
up to date' information that the gLS knows about. I made many of the
design decisions around this set because I realized that GUIs would be
using this almost exclusively. If something has 'dropped out' of the
infrastructure, this set will reflect that change almost
instantaneously (gLSs should re-calculate this every 20 minutes in the
current configuration). This may explain your 'unpredictability'
because services will come and go and each gLS may not have a complete
picture.
We can improve this, of course, and suggestions on how would be welcome.
Regarding the requirements from a client to the LS infrastructure, I
believe the basic query is: given a network element (plus event types
and service type), retrieve the services that provide information
about it.
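As a rough illustration of that requirement, combined with the workaround suggested in this thread (querying every hLS the gLS returns), here is a minimal Python sketch of the client-side second phase. The `query_hls` callable stands in for a real hLS query, and the record fields (`accessPoint`, `eventTypes`, `serviceType`) are hypothetical. It can only narrow a candidate list; it cannot recover an hLS the gLS never returned.

```python
from typing import Callable, Iterable, List, Optional, Set

# Hypothetical shape of a service record returned by an hLS query.
ServiceRecord = dict


def find_services(element: str,
                  candidate_hlses: Iterable[str],
                  query_hls: Callable[[str, str], List[ServiceRecord]],
                  event_types: Optional[Set[str]] = None,
                  service_type: Optional[str] = None) -> List[str]:
    """Ask every candidate hLS about `element` and keep only the services
    that match the optional eventType and serviceType filters."""
    found: List[str] = []
    for hls in candidate_hlses:
        try:
            records = query_hls(hls, element)
        except OSError:
            continue  # unreachable hLS: skip it instead of failing the lookup
        for rec in records:
            if event_types and not (event_types & set(rec["eventTypes"])):
                continue  # none of the requested eventTypes is offered
            if service_type and rec["serviceType"] != service_type:
                continue
            if rec["accessPoint"] not in found:
                found.append(rec["accessPoint"])
    return found


if __name__ == "__main__":
    fake = {"hls-a": [{"accessPoint": "http://example.org/snmpMA",
                       "eventTypes": ["utilization"], "serviceType": "ma"}],
            "hls-b": []}
    print(find_services("192.16.166.26", ["hls-a", "hls-b"],
                        lambda hls, _el: fake[hls]))
```

Leaving both filters unset mirrors what the playground does today (any event type); passing them narrows the answer on the client side.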
Define what you mean by 'serviceType'; I am not sure I understand why
this is important.
-jason
Regards,
Nina
P.S. Right now I've been querying for 192.16.166.26
and got
40498 ms
Query
urn:ogf:network:node=192.16.166.26
Services
http://216.255.240.2:8085/perfSONAR_PS/services/pSB
http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
The entire log with queries and responses is attached.
For completeness, I've tried a query of the ESnet MA
http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA for
134.55.221.30 and 198.129.254.101 (any event type) and only got an
empty response.
I assume this view comes from querying the LS infrastructure:
http://dc211.internet2.edu/cgi-bin/perfSONAR/serviceTest.cgi?url=http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA&eventType=http://ggf.org/ns/nmwg/characteristic/utilization/2.0
-jason
Directory view
http://dc211.internet2.edu/cgi-bin/perfSONAR/serviceTest.cgi?url=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-rrd-ma/services/MeasurementArchiveService&eventType=http://ggf.org/ns/nmwg/characteristic/utilization/2.0
Tree view
http://dc211.internet2.edu/cgi-bin/perfSONAR/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
Best regards,
Nina
http://code.google.com/p/perfsonar-ps/wiki/IPSummarization
and
https://damsl.cis.udel.edu/svn/perl-iptrie/README.html
I have not had a chance to incorporate his work into the gLS releases
yet, so this is still untested in a production environment. We will
be working towards releasing this before the end of the year. Until
then our suggestion of how to work around this still stands: multiple
queries to the resulting hLSs will be required to determine whether
these instances hold what you are looking for.
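The difference between a tight and a loose IP summary can be seen with Python's standard `ipaddress` module. This is a minimal sketch, not the gLS algorithm or the Delaware work: `covering_prefix` deliberately collapses an hLS's IPv4 addresses into the single smallest prefix that covers them all, which is how a superset of candidate hLSs arises, while `ipaddress.collapse_addresses` gives an exact summary. The function names and the documentation-range addresses are illustrative. Note that a covering summary of this kind can over-match but never drops an address it was built from.

```python
import ipaddress
from typing import Iterable, List


def covering_prefix(addresses: Iterable[str]) -> ipaddress.IPv4Network:
    """Smallest single IPv4 prefix that covers every address (a deliberately
    loose summary, for illustration only)."""
    nets = [ipaddress.IPv4Network(a) for a in addresses]
    if not nets:
        raise ValueError("need at least one address")
    prefix = nets[0]
    # Widen the prefix one bit at a time until all addresses fall inside it.
    while not all(n.subnet_of(prefix) for n in nets):
        prefix = prefix.supernet()
    return prefix


def exact_summary(addresses: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Tight summary: merge only adjacent blocks, never add new addresses."""
    return list(ipaddress.collapse_addresses(
        ipaddress.IPv4Network(a) for a in addresses))


if __name__ == "__main__":
    hls_addresses = ["192.0.2.1", "192.0.2.200"]
    loose = covering_prefix(hls_addresses)
    print(loose)                                          # 192.0.2.0/24
    print(ipaddress.IPv4Address("192.0.2.77") in loose)   # True: a false positive
    print(exact_summary(hls_addresses))
```

A discovery query for 192.0.2.77 would match the loose summary even though the hLS knows nothing about it; that over-matching is why follow-up queries to each returned hLS are needed.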
As for your results below, I took a look at the hLS installed on
p-mdm.ps-lhcopn.fnal.gov; I would encourage everyone to see the
internal view here:
http://dc211.internet2.edu/cgi-bin/pA/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
This is an MDM hLS, so I am not aware of the specifics of how it
works
on the inside. According to the view of the hLS summary set
(represented as "Collection:
http://ogf.org/ns/nmwg/tools/org/perfsonar/service/lookup/discovery/xquery/2.0"
and "Store Type: LSStore-summary" - scroll all the way to the bottom)
there is nothing there. This may mean:
1) I am not querying this set correctly, I will let the MDM hLS
maintainer answer if I am doing this correctly. If there is a
problem
with my query, that would indicate a bug in my CGIs which I will
gladly fix. I am sending this query:
<nmwg:message type="LSQueryRequest"
id="LSQueryRequest"
xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
xmlns:xquery="http://ggf.org/ns/nmwg/tools/org/perfsonar/service/lookup/xquery/1.0/">
<nmwg:metadata id="meta1">
<xquery:subject id="sub1">
declare namespace nmwg="http://ggf.org/ns/nmwg/base/2.0/";
/nmwg:store[@type="LSStore-summary"]/nmwg:metadata
</xquery:subject>
<nmwg:eventType>http://ogf.org/ns/nmwg/tools/org/perfsonar/service/lookup/discovery/xquery/2.0</nmwg:eventType>
</nmwg:metadata>
<nmwg:data metadataIdRef="meta1" id="d1"/>
</nmwg:message>
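For anyone who wants to reproduce the query above from a script, here is a minimal Python sketch that builds the same `LSQueryRequest` message with the standard `xml.etree.ElementTree` module. The function name `build_ls_query` is hypothetical, and the SOAP envelope and HTTP POST needed to actually send it to an hLS are left out.

```python
import xml.etree.ElementTree as ET

NMWG = "http://ggf.org/ns/nmwg/base/2.0/"
XQUERY = "http://ggf.org/ns/nmwg/tools/org/perfsonar/service/lookup/xquery/1.0/"
DISCOVERY_XQUERY = ("http://ogf.org/ns/nmwg/tools/org/perfsonar/service/"
                    "lookup/discovery/xquery/2.0")
SUMMARY_XQUERY = (
    'declare namespace nmwg="http://ggf.org/ns/nmwg/base/2.0/";\n'
    '/nmwg:store[@type="LSStore-summary"]/nmwg:metadata'
)

# Keep the nmwg:/xquery: prefixes used in the hand-written message.
ET.register_namespace("nmwg", NMWG)
ET.register_namespace("xquery", XQUERY)


def build_ls_query(xquery_text: str, event_type: str = DISCOVERY_XQUERY) -> str:
    """Return an LSQueryRequest body (without SOAP envelope) as a string."""
    msg = ET.Element(f"{{{NMWG}}}message",
                     {"type": "LSQueryRequest", "id": "LSQueryRequest"})
    meta = ET.SubElement(msg, f"{{{NMWG}}}metadata", {"id": "meta1"})
    subject = ET.SubElement(meta, f"{{{XQUERY}}}subject", {"id": "sub1"})
    subject.text = xquery_text
    ET.SubElement(meta, f"{{{NMWG}}}eventType").text = event_type
    ET.SubElement(msg, f"{{{NMWG}}}data", {"metadataIdRef": "meta1", "id": "d1"})
    return ET.tostring(msg, encoding="unicode")


if __name__ == "__main__":
    print(build_ls_query(SUMMARY_XQUERY))
```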
I do get a response (but it is vague):
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Body>
    <nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
        id="LSQueryRequest_resp"
        messageIdRef="LSQueryRequest" type="LSQueryResponse">
      <nmwg:metadata id="LSQueryResponseMetadata">
        <nmwg:eventType>success.ls.query</nmwg:eventType>
      </nmwg:metadata>
      <nmwg:data id="LSQueryResponseData"
          metadataIdRef="LSQueryResponseMetadata">
        <psservice:datum
            xmlns:psservice="http://ggf.org/ns/nmwg/tools/org/perfsonar/service/1.0/"/>
      </nmwg:data>
    </nmwg:message>
  </soapenv:Body>
</soapenv:Envelope>
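The vagueness is that the service reports `success.ls.query` while the `psservice:datum` element is empty, so a caller cannot tell "no summary stored" from "nothing matched". Here is a minimal Python sketch of a client-side check for exactly that case, using the namespaces from the response above; the function name `inspect_ls_response` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

NS = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "nmwg": "http://ggf.org/ns/nmwg/base/2.0/",
    "psservice": "http://ggf.org/ns/nmwg/tools/org/perfsonar/service/1.0/",
}


def inspect_ls_response(xml_text: str) -> Tuple[Optional[str], bool]:
    """Return (eventType, has_results) for a SOAP-wrapped LSQueryResponse."""
    root = ET.fromstring(xml_text)
    msg = root.find("soapenv:Body/nmwg:message", NS)
    if msg is None:
        raise ValueError("no nmwg:message in SOAP body")
    event_type = msg.findtext("nmwg:metadata/nmwg:eventType", namespaces=NS)
    datum = msg.find("nmwg:data/psservice:datum", NS)
    # A self-closing <psservice:datum/> has neither children nor text.
    has_results = datum is not None and (
        len(datum) > 0 or bool((datum.text or "").strip()))
    return event_type, has_results
```

Fed the response shown above, this returns `("success.ls.query", False)`: the query succeeded but carried no data, which matches what the CGI view shows for this hLS.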
2) The set is empty and the service is not summarizing, which would
indicate a bug in the service.
3) The service just came up and has not had a chance to summarize yet
(unlikely; it appears to have been registered for a while, but I
can't prove this from where I sit).
-jason
First try:
40963 ms
Query
urn:ogf:network:node=192.16.166.26
Services
http://216.255.240.2:8085/perfSONAR_PS/services/pSB
http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA
http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA
http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA
http://frown.es.net:8085/perfSONAR_PS/services/pSB
http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
Second try:
41670 ms
Query
urn:ogf:network:node=192.16.166.26
Services
http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA
http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA
http://frown.es.net:8085/perfSONAR_PS/services/pSB
http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA
http://216.255.240.2:8085/perfSONAR_PS/services/pSB
Third try:
433 ms
Query
urn:ogf:network:node=192.16.166.26
Services
Fourth try:
3518 ms
Query
urn:ogf:network:node=192.16.166.26
Services
Fifth try:
3518 ms
Query
urn:ogf:network:node=192.16.166.26
Services
On subsequent attempts, no services were retrieved either.
Obviously, we need to think of a systematic testing approach.
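One small step toward a systematic approach is to compare repeated runs as sets rather than as ordered lists. This minimal Python sketch (the function name `diff_runs` is hypothetical) uses the service lists from the tries above: the first two runs return the same six services in a different order, while the third returns none.

```python
from typing import List, Sequence, Tuple


def diff_runs(runs: Sequence[Sequence[str]]) -> List[Tuple[int, List[str], List[str]]]:
    """Compare every run with the first one, ignoring order.
    Returns (run number, services lost, services gained) for each later run."""
    baseline = set(runs[0])
    report = []
    for number, run in enumerate(runs[1:], start=2):
        current = set(run)
        report.append((number, sorted(baseline - current), sorted(current - baseline)))
    return report


FIRST = [
    "http://216.255.240.2:8085/perfSONAR_PS/services/pSB",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA",
    "http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA",
    "http://frown.es.net:8085/perfSONAR_PS/services/pSB",
    "http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB",
]
SECOND = [
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA",
    "http://frown.es.net:8085/perfSONAR_PS/services/pSB",
    "http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB",
    "http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA",
    "http://216.255.240.2:8085/perfSONAR_PS/services/pSB",
]
THIRD: List[str] = []

if __name__ == "__main__":
    for number, lost, gained in diff_runs([FIRST, SECOND, THIRD]):
        print(f"try {number}: lost {len(lost)}, gained {len(gained)}")
```

Recording the timestamp of each run next to this diff would also make it possible to line changes up with the gLS recalculation interval mentioned earlier in the thread.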