perfsonar-dev - Re: [pS-dev] Lookup service playground tab

Subject: perfsonar development work

  • From: Nina Jeliazkova <>
  • To:
  • Cc: Szymon Trocha <>, "" <>, GN3 JRA2 T3 <>,
  • Subject: Re: [pS-dev] Lookup service playground tab
  • Date: Tue, 11 Aug 2009 13:28:20 +0300

Jason,

Thank you for the detailed explanations. It's clear now that I need to find a
different way of testing the client, regardless of interoperability issues.

Regarding the service type, I've included it in the required query
parameters, because the LS client API already supports it. If a service's
functionality is determined just by event types, this might not be
necessary.

Regards,
Nina

Jason Zurawski wrote:
> Nina Jeliazkova wrote:
>> Jason,
>>
>>
>>>>>> Today's experiment with the updated views Jason sent is as follows:
>>>>>>
>>>>>> http://dc211.internet2.edu/cgi-bin/perfSONAR/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> According to the tree view, 192.16.166.26 is served by
>>>>>> http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-rrd-ma/services/MeasurementArchiveService
>>>>>>
>>>>>>
>>>>>> (resolves to 131.225.204.1) and I would expect this service to appear
>>>>>> in the results. Unfortunately, this is not the case - could anyone
>>>>>> please provide an explanation? I assume the tree view is up to date.
>>>>> I have explained this on numerous occasions before, and nothing has
>>>>> really changed: we are aware that the IP summarization is not as tight
>>>>> as some would like it to be. There is ongoing work with a student
>>>>> at the University of Delaware to improve the algorithm and he has a
>>>>> report on his results so far available here:
>>>>>
>>>> Indeed, we have discussed summarization ‘tightness’ on a number of
>>>> occasions already. However, I would like to emphasize that the
>>>> suggested
>>>> workaround is not suitable in this specific case, because the returned
>>>> superset of services does not include the specific entry I’m
>>>> looking for
>>>> (p-mdm.ps-lhcopn.fnal.gov or 131.225.204.1).
>>>>
>>>> That is to say, if I apply the suggested workaround and start querying
>>>> every service from the returned superset list, I don’t have any chance to
>>>> query the ‘correct’ one (p-mdm.ps-lhcopn.fnal.gov or 131.225.204.1),
>>>> because it is not in the list.
>>>
>>> Did you read my response discussing possible reasons *why* things are
>>> not working as you would expect them to? There is absolutely nothing
>>> *anyone* can do if the summarization is not working at the lowest
>>> levels. I look forward to hearing from the MDM hLS maintainer so we
>>> can figure out if there is or is not a problem here that needs to be
>>> addressed.
>>>
>>>
>>>> In addition it appears that there’s some inconsistency between the
>>>> output of the directory/tree summary view, which includes a
>>>> reference to
>>>> 192.16.166.26, attached to p-mdm.ps-lhcopn.fnal.gov/131.225.204.1 and
>>>> the results of direct queries to this particular service that you
>>>> sent.
>>>> As already stated above, the same kind of inconsistency exists between
>>>> the output of the directory/tree summary view and the results of a
>>>> playground query for this particular IP address (192.16.166.26). I
>>>> would
>>>> appreciate an explanation of these inconsistencies.
>>>
>>> Not sure what you are implying here by "inconsistency" - there are 2
>>> interfaces on the first GUI for the *utilization eventType* (direction
>>> "in" and direction "out"), the graphs for these two interfaces reflect
>>> the data for these two directions. The GUI *does not* display other
>>> eventTypes (errors, discards) so it has no reason to display the
>>> remaining interface definitions that are in the hLS view.
>>>
>>> Am I answering your concern correctly? If not can you try to explain
>>> what "inconsistency" you are seeing?
>>
>>
>> The inconsistency I’m referring to consists in the fact that the
>> (192.16.166.26, p-mdm.ps-lhcopn.fnal.gov) tuple is present in the
>> directory and tree summary views but is not present in gLS response.
>
>
> I dug deeper, and I will try to explain what I think is happening. I
> do not know exactly since I can't access the service directly to
> investigate further. If you read the design of the service
> (http://anonsvn.internet2.edu/svn/nmwg/trunk/nmwg/doc/dLS/gLS/phase_1_color.html)
> note that the gLS tries to maintain two data storage locations:
>
> - Registered hLSs - Each hLS should contact the gLS in a timely
> manner and register a summary of *each* service it represents. For
> example if the hLS knows about 2 RRDMAs and a PingER MA it will
> register 3 summary sets to the gLS.
>
> - Summarized hLSs - The gLS will go through the above list of hLSs
> and summarize what it knows about each *service* into a single set.
> In the above example the gLS will summarize the 3 service data sets into
> one summary set to represent the hLS as a whole (this includes all
> eventTypes, domains, keywords, IPaddresses). The gLS also then makes
> one *big* summary of everything it knows about (all eventTypes, IP
> addresses, domains, keywords) - this set is used to answer the
> discovery queries you are sending.
>
> Each of these data sets is controlled by the 'control structure' that
> records when the information was last updated. Entries are cleaned
> out when they 'expire', and each data set may expire independently of
> the other.
>
> What I think is happening:
>
> 1) Looking in the gLS data set (here if you are interested:
> http://dc211.internet2.edu/cgi-bin/pA/view.cgi?hls=http://ps4.es.net:9990/perfSONAR_PS/services/gLS),
> search for the hLS in question (p-mdm.ps-lhcopn.fnal.gov).
>
> 2) I only see this show up in the second information set - the
> summarized information. I do not see it in the first data set - the
> actually registered hLS information.
>
> What this means:
>
> 3) The service registered, probably not too long ago, and was
> summarized into the information sets that the gLS knows about. This
> is why we still see it in the summary data set.
>
> 4) It stopped registering at some point, and the gLS 'cleaned' it from
> the first data set. It did not clean it from the second data set
> (yet) because this information is configured to live for a longer
> length of time (this was a design assumption - I am not saying it is
> correct, but it is the truth). Eventually, this record would be cleared
> out as well.
>
> 5) Since the gLS builds the 'complete' summary set (the one that the
> discovery query would be compared against) frequently, the FNAL hLS
> would not show up as being in this set of information since it is
> missing from the first data set. It would still have a summary
> record, but this is not used in constructing the 'complete' set.
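
The sequence in points 1) to 5) can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the gLS
implementation: the class, method names, and both lifetimes are
hypothetical (the thread says only that the summary record lives
longer than the registration record), and addresses are matched
exactly rather than through IP summarization.

```python
from dataclasses import dataclass, field

# Hypothetical lifetimes in seconds. The thread gives no values; it only
# states that the summary record outlives the registration record.
REGISTRATION_TTL = 30 * 60
SUMMARY_TTL = 4 * 60 * 60


@dataclass
class GlobalLookupModel:
    """Toy model of the two gLS data sets and their independent expiry."""
    registered: dict = field(default_factory=dict)  # hLS -> last registration time
    summarized: dict = field(default_factory=dict)  # hLS -> (time, addresses)

    def register(self, hls, addresses, now):
        # A registration refreshes both records for this hLS.
        self.registered[hls] = now
        self.summarized[hls] = (now, frozenset(addresses))

    def clean(self, now):
        # Each data set is cleaned on its own clock.
        self.registered = {h: t for h, t in self.registered.items()
                           if now - t < REGISTRATION_TTL}
        self.summarized = {h: (t, a) for h, (t, a) in self.summarized.items()
                           if now - t < SUMMARY_TTL}

    def summary_view(self, address):
        # What a view built on the summary records would list.
        return sorted(h for h, (_, a) in self.summarized.items()
                      if address in a)

    def discovery(self, address):
        # The 'complete' set is rebuilt only from hLSs still registered.
        return sorted(h for h, (_, a) in self.summarized.items()
                      if h in self.registered and address in a)


gls = GlobalLookupModel()
gls.register("hls-fnal", {"192.16.166.26"}, now=0)
gls.clean(now=60 * 60)  # one hour later, with no re-registration
print(gls.summary_view("192.16.166.26"))  # still lists hls-fnal
print(gls.discovery("192.16.166.26"))     # empty list
```

In this model an hLS that stops registering keeps showing up in the
summary view until the longer lifetime runs out, while discovery
queries stop returning it as soon as the registration record expires.
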
>
> Things that I think need to be answered to debug further:
>
> - Why did the hLS stop registering and what is its registration
> interval? This is one more reason why the registration intervals (all
> intervals really) are important and why adjusting the protocol around
> the idea of the LSTTL is a very hard problem to solve.
>
> - Are my design assumptions (above) correct about how the gLS should
> behave regarding cleaning out information and making summary sets? I
> say yes - but we may also want the summary set to die immediately after
> an hLS does so we don't introduce false positives as Nina is seeing.
>
> - Is the lack of a summary set in the hLS view.cgi indicative of this
> service not registering to the gLS? I think the two may be related,
> but I do not know too much about the MDM hLS architecture.
>
>
>> PerfsonarUI playground currently sends queries without event types
>> specified, so _any_ information about the queried network element is
>> expected to be returned. I am not quite sure how your comment
>> about eventtypes is related to the issue.
>
>
> I misunderstood what you were asking, I thought you were referring to
> an inconsistency of what you were seeing in the two views.
>
>
>> Perhaps I am misinterpreting the directory and the tree view, and for
>> clarification please find attached screenshots of both views. The tuple
>> (192.16.166.26, p-mdm.ps-lhcopn.fnal.gov) does show up there.
>
>
> Note that these two views are not related to the gLS at all - these
> are related to the hLS (view.cgi) and the RRDMA service
> (serviceTest.cgi). The gLS is only used to find these wherever they
> may be in the world. This is why I did not understand what you were
> asking and I apologize.
>
>
>> I understand many things can go wrong in a distributed system, but still
>> it is not clear to me:
>>
>> - how this tuple got in the directory/tree summary views in the first
>> place (having in mind that it appears that it is not possible to obtain
>> such summary information from the specific service as you have already
>> shown);
>
>
> See above. The web GUIs do not use the discovery data set - they use
> the xquery interface to find the complete list of summarized services.
> This is a more 'stable' view because it lives for a longer period of
> time than the discovery set (which is updated very frequently).
>
>
>> - why this tuple is included in the directory/tree summary views but
>> is not included in response to the above mentioned query;
>>
>> It appears that currently we don’t get a superset which could be further
>> narrowed down by sending specific queries to hLSes, but rather some set
>> which may or may not include the specific entry we’re looking for.
>
>
> My answer above should cover this, but the discovery set is the 'most
> up to date' information that the gLS knows about. I made many of the
> design decisions around this set because I realized that GUIs would be
> using this almost exclusively. If something has 'dropped out' of the
> infrastructure, this set will reflect that change almost
> instantaneously (gLSs should re-calculate this every 20 minutes in the
> current configuration). This may explain your 'unpredictability'
> because services will come and go and each gLS may not have a complete
> picture.
>
> We can improve this of course, and suggestions on how would be welcome.
>
>
>> Regarding the requirements from a client to the LS infrastructure, I
>> believe the basic query is: given a network element (+ event types +
>> service type), be able to retrieve which services provide information
>> about it.
>
>
> Define what you mean by 'serviceType', I am not sure I understand why
> this is important.
>
> -jason
>
>
>> Regards,
>> Nina
>>
>> P.S. Right now I've been querying for 192.16.166.26
>>
>> and got
>>
>> 40498 ms
>> Query
>> urn:ogf:network:node=192.16.166.26
>> Services
>> http://216.255.240.2:8085/perfSONAR_PS/services/pSB
>> http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
>>
>> The entire log with queries and responses is attached.
>>
>> For completeness, I've tried a query of ESNET MA
>> http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA for
>> 134.55.221.30 and 198.129.254.101 (any event type) and only got an
>> empty response.
>>
>> I assume this view comes from querying the LS infrastructure:
>> http://dc211.internet2.edu/cgi-bin/perfSONAR/serviceTest.cgi?url=http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA&eventType=http://ggf.org/ns/nmwg/characteristic/utilization/2.0
>>
>>
>>
>>
>>> -jason
>>>
>>>
>>>> Directory view
>>>> http://dc211.internet2.edu/cgi-bin/perfSONAR/serviceTest.cgi?url=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-rrd-ma/services/MeasurementArchiveService&eventType=http://ggf.org/ns/nmwg/characteristic/utilization/2.0
>>>>
>>>>
>>>>
>>>> Tree view
>>>> http://dc211.internet2.edu/cgi-bin/perfSONAR/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
>>>>
>>>>
>>>>
>>>> Best regards,
>>>> Nina
>>>>> http://code.google.com/p/perfsonar-ps/wiki/IPSummarization
>>>>>
>>>>> and
>>>>>
>>>>> https://damsl.cis.udel.edu/svn/perl-iptrie/README.html
>>>>>
>>>>> I have not had a chance to incorporate his work into the gLS releases
>>>>> yet, so this is still untested in a production environment. We will
>>>>> be working towards releasing this before the end of the year. Until
>>>>> then our suggestion of how to work around this still stands: multiple
>>>>> queries to the result hLSs will be required to determine if these
>>>>> instances do or do not hold what you are looking for.
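
The workaround described above (treat the gLS answer as a candidate
list, then ask each hLS directly) might be sketched as follows. The
function name and the `holds_element` callback are hypothetical; the
callback stands in for whatever follow-up query a client actually
sends to a single hLS.

```python
from typing import Callable, Iterable, List


def narrow_candidates(candidates: Iterable[str], element: str,
                      holds_element: Callable[[str, str], bool]) -> List[str]:
    """Return the candidate hLS URLs that confirm they hold `element`.

    `holds_element(hls_url, element)` is a caller-supplied, hypothetical
    function that sends one follow-up query to one hLS.
    """
    confirmed = []
    for hls_url in candidates:
        try:
            if holds_element(hls_url, element):
                confirmed.append(hls_url)
        except OSError:
            # An unreachable hLS cannot confirm anything, so skip it.
            continue
    return confirmed
```

Note that this only narrows the candidate list. If the hLS being
sought is missing from the candidates, no amount of follow-up
querying will find it.
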
>>>>>
>>>>> As for your results below I took a look at the hLS installed on
>>>>> p-mdm.ps-lhcopn.fnal.gov, I would encourage all to see the internal
>>>>> view here:
>>>>>
>>>>> http://dc211.internet2.edu/cgi-bin/pA/view.cgi?hls=http://p-mdm.ps-lhcopn.fnal.gov:8080/geant2-java-xml-ls/services/LookupService
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> This is an MDM hLS, so I am not aware of the specifics of how it
>>>>> works
>>>>> on the inside. According to the view of the hLS summary set
>>>>> (represented as "Collection:
>>>>> http://ogf.org/ns/nmwg/tools/org/perfsonar/service/lookup/discovery/xquery/2.0"
>>>>> and "Store Type: LSStore-summary" - scroll all the way to the bottom)
>>>>> there is nothing there. This may mean:
>>>>>
>>>>> 1) I am not querying this set correctly, I will let the MDM hLS
>>>>> maintainer answer if I am doing this correctly. If there is a
>>>>> problem
>>>>> with my query, that would indicate a bug in my CGIs which I will
>>>>> gladly fix. I am sending this query:
>>>>>
>>>>> <nmwg:message type="LSQueryRequest"
>>>>>               id="LSQueryRequest"
>>>>>               xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
>>>>>               xmlns:xquery="http://ggf.org/ns/nmwg/tools/org/perfsonar/service/lookup/xquery/1.0/">
>>>>>   <nmwg:metadata id="meta1">
>>>>>     <xquery:subject id="sub1">
>>>>>       declare namespace nmwg="http://ggf.org/ns/nmwg/base/2.0/";
>>>>>       /nmwg:store[@type="LSStore-summary"]/nmwg:metadata
>>>>>     </xquery:subject>
>>>>>     <nmwg:eventType>http://ogf.org/ns/nmwg/tools/org/perfsonar/service/lookup/discovery/xquery/2.0</nmwg:eventType>
>>>>>   </nmwg:metadata>
>>>>>   <nmwg:data metadataIdRef="meta1" id="d1"/>
>>>>> </nmwg:message>
>>>>>
>>>>> I do get a response (but it is vague):
>>>>>
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <soapenv:Envelope
>>>>>     xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
>>>>>     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>>>   <soapenv:Body>
>>>>>     <nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
>>>>>                   id="LSQueryRequest_resp"
>>>>>                   messageIdRef="LSQueryRequest" type="LSQueryResponse">
>>>>>       <nmwg:metadata id="LSQueryResponseMetadata">
>>>>>         <nmwg:eventType>success.ls.query</nmwg:eventType>
>>>>>       </nmwg:metadata>
>>>>>       <nmwg:data id="LSQueryResponseData"
>>>>>                  metadataIdRef="LSQueryResponseMetadata">
>>>>>         <psservice:datum
>>>>>             xmlns:psservice="http://ggf.org/ns/nmwg/tools/org/perfsonar/service/1.0/"/>
>>>>>       </nmwg:data>
>>>>>     </nmwg:message>
>>>>>   </soapenv:Body>
>>>>> </soapenv:Envelope>
>>>>>
>>>>> 2) The set is empty and the service is not summarizing, which would
>>>>> indicate a bug in the service
>>>>>
>>>>> 3) The service just came up and did not have a chance to summarize yet
>>>>> (unlikely, this appears to have been registered for a while but I
>>>>> can't prove this from where I sit)
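
All three possibilities start from the same observation: the response
reports success but carries an empty datum. A minimal sketch of
checking for that with Python's standard library follows. The function
name is hypothetical, and the sample is a trimmed copy of the response
quoted above with the unused namespace declarations left out.

```python
import xml.etree.ElementTree as ET

NMWG_NS = "http://ggf.org/ns/nmwg/base/2.0/"
PSSERVICE_NS = "http://ggf.org/ns/nmwg/tools/org/perfsonar/service/1.0/"

# Trimmed copy of the LSQueryResponse quoted in the message above.
RESPONSE = """\
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/"
                  id="LSQueryRequest_resp"
                  messageIdRef="LSQueryRequest" type="LSQueryResponse">
      <nmwg:metadata id="LSQueryResponseMetadata">
        <nmwg:eventType>success.ls.query</nmwg:eventType>
      </nmwg:metadata>
      <nmwg:data id="LSQueryResponseData"
                 metadataIdRef="LSQueryResponseMetadata">
        <psservice:datum
            xmlns:psservice="http://ggf.org/ns/nmwg/tools/org/perfsonar/service/1.0/"/>
      </nmwg:data>
    </nmwg:message>
  </soapenv:Body>
</soapenv:Envelope>
"""


def inspect_response(response_xml):
    """Return (event type, True if every datum has no text and no children)."""
    root = ET.fromstring(response_xml)
    event = root.find(f".//{{{NMWG_NS}}}eventType")
    event_type = event.text.strip() if event is not None and event.text else None
    datums = root.findall(f".//{{{PSSERVICE_NS}}}datum")
    # A response with no datum element at all also counts as empty here.
    empty = all(len(d) == 0 and not (d.text or "").strip() for d in datums)
    return event_type, empty


print(inspect_response(RESPONSE))
```

A success event type combined with an empty datum is what makes the
response ambiguous: the query was accepted, but nothing distinguishes
an empty summary set from a query that matched the wrong collection.
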
>>>>>
>>>>> -jason
>>>>>
>>>>>
>>>>>> First try:
>>>>>> 40963 ms
>>>>>> Query
>>>>>> urn:ogf:network:node=192.16.166.26
>>>>>> Services
>>>>>> http://216.255.240.2:8085/perfSONAR_PS/services/pSB
>>>>>>
>>>>>> http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA
>>>>>> http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA
>>>>>> http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA
>>>>>> http://frown.es.net:8085/perfSONAR_PS/services/pSB
>>>>>> http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
>>>>>>
>>>>>> Second try:
>>>>>> 41670 ms
>>>>>> Query
>>>>>> urn:ogf:network:node=192.16.166.26
>>>>>> Services
>>>>>>
>>>>>> http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA
>>>>>> http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA
>>>>>> http://frown.es.net:8085/perfSONAR_PS/services/pSB
>>>>>> http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB
>>>>>> http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA
>>>>>> http://216.255.240.2:8085/perfSONAR_PS/services/pSB
>>>>>>
>>>>>> Third try:
>>>>>> 433 ms
>>>>>> Query
>>>>>> urn:ogf:network:node=192.16.166.26
>>>>>> Services
>>>>>>
>>>>>> Fourth try:
>>>>>> 3518 ms
>>>>>> Query
>>>>>> urn:ogf:network:node=192.16.166.26
>>>>>> Services
>>>>>>
>>>>>>
>>>>>> Fifth try:
>>>>>> 3518 ms
>>>>>> Query
>>>>>> urn:ogf:network:node=192.16.166.26
>>>>>> Services
>>>>>>
>>>>>> On subsequent attempts no services were retrieved either.
>>>>>>
>>>>>> Obviously, we need to think of a systematic testing approach.
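
One possible starting point for such a systematic approach is to
compare repeated runs of the same query while ignoring the order of
the returned services. The sketch below is an assumption about what a
test harness could look like, not an existing perfSONAR tool; the
function name is hypothetical, and the data are the first three tries
listed above.

```python
def compare_runs(runs):
    """Summarize how consistent repeated query results are, ignoring order."""
    distinct = []
    for services in runs:
        as_set = frozenset(services)
        if as_set not in distinct:
            distinct.append(as_set)
    return {
        "runs": len(runs),
        "distinct_sets": len(distinct),
        "empty_runs": sum(1 for services in runs if not services),
        "stable": len(distinct) <= 1,
    }


FIRST_TRY = [
    "http://216.255.240.2:8085/perfSONAR_PS/services/pSB",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA",
    "http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA",
    "http://frown.es.net:8085/perfSONAR_PS/services/pSB",
    "http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB",
]
SECOND_TRY = [
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/IUsnmpMA",
    "http://rrdma.net.internet2.edu:8080/perfSONAR_PS/services/snmpMA",
    "http://frown.es.net:8085/perfSONAR_PS/services/pSB",
    "http://perfsonar-1.t2.ucsd.edu:8085/perfSONAR_PS/services/pSB",
    "http://ps3.es.net:8080/perfSONAR_PS/services/snmpMA",
    "http://216.255.240.2:8085/perfSONAR_PS/services/pSB",
]
THIRD_TRY = []

report = compare_runs([FIRST_TRY, SECOND_TRY, THIRD_TRY])
print(report)
```

For these three tries the first two runs return the same six services
in a different order, so only the empty third run counts as a
disagreement.
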
