
perfsonar-user - Re: [perfsonar-user] MA failures in the aftermath of a yum update

Subject: perfSONAR User Q&A and Other Discussion

  • From: "Uhl, George D. (GSFC-423.0)[ARTS]" <>
  • To: Andrew Lake <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] MA failures in the aftermath of a yum update
  • Date: Wed, 2 Apr 2014 14:54:28 +0000
  • Accept-language: en-US

Andy,

The XML files were being deposited in /var/lib/perfsonar/traceroute_ma. I
moved them to /var/lib/perfsonar/traceroute_ma/upload and restarted the
traceroute_master. I edited owmesh.conf and changed the parameter
TraceDataDir to /var/lib/perfsonar/traceroute_ma/upload. Now when I go to
the psTracerouteViewer page on the toolkit page I see that the endpoints
drop-down list is populated but when I select one of the traceroutes
listed there I'm getting an error:

Software Error
Times from XML response are out of order at
/opt/perfsonar_ps/toolkit/lib/psTracerouteUtils.pm line 280.

This is the same error I was getting last September. It was supposed to
have been patched in a later release. The statement in line 280 of
/opt/perfsonar_ps/toolkit/lib/psTracerouteUtils.pm is "if
($$hashref{'timeValue'} < $last_timestamp) {". When I print these two
values, $$hashref{'timeValue'} is displayed as 1396391562 and
$last_timestamp is 0 so this if statement should evaluate as false but the
die command in the next statement runs anyway.
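As a rough illustration of the check described above, here is a minimal Python sketch (not the toolkit's Perl; the function name and the second sample timestamp are hypothetical). With $last_timestamp at 0, the first record cannot trip the comparison, so one possibility, unconfirmed in this thread, is that the die fires on a later record whose timestamp is lower than its predecessor's.

```python
def first_out_of_order(time_values):
    """Return the index of the first timestamp smaller than the one before it,
    or None if the sequence is non-decreasing.

    Hypothetical helper: it mirrors the shape of the comparison quoted from
    psTracerouteUtils.pm line 280, not the module's actual code.
    """
    last_timestamp = 0
    for index, time_value in enumerate(time_values):
        if time_value < last_timestamp:
            return index
        last_timestamp = time_value
    return None


# A single record compared against the initial 0 always passes.
print(first_out_of_order([1396391562]))              # None
# A later record that is older than its predecessor is flagged.
# (1396390962 is an invented value for illustration.)
print(first_out_of_order([1396391562, 1396390962]))  # 1
```

Printing both values on every pass through the loop, rather than once, would show whether a later record is the one that triggers the error.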

I'm also concerned that if I update the mesh configuration file and create
a new json, I'll lose my edited correction for the TraceDataDir parameter
the next time I run the generate_configuration script on the test nodes.
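
A small check along these lines could be run after each regeneration to catch the parameter being reset. It is a Python sketch that assumes owmesh.conf stores parameters as whitespace-separated name/value pairs with "#" comments; read_param and data_dir_ok are hypothetical helpers, not part of perfSONAR.

```python
# Expected location, as given earlier in this thread.
EXPECTED = "/var/lib/perfsonar/traceroute_ma/upload"


def read_param(conf_text, name):
    """Return the last value assigned to `name`, or None if it is absent.

    Assumes one "NAME value" pair per line; blank lines and lines starting
    with "#" are skipped. This format is an assumption, not a specification.
    """
    value = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if parts[0] == name and len(parts) == 2:
            value = parts[1].strip()
    return value


def data_dir_ok(conf_text, expected=EXPECTED):
    """True if TraceDataDir is set to the expected directory
    (a trailing slash is ignored)."""
    value = read_param(conf_text, "TraceDataDir")
    return value is not None and value.rstrip("/") == expected.rstrip("/")
```

To use it, read the file (for example /opt/perfsonar_ps/traceroute_ma/etc/owmesh.conf) into a string and pass it to data_dir_ok; a False result means the edit was overwritten or the parameter is missing.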


Thanks,
-George

On 4/2/14 8:28 AM, "Andrew Lake"
<>
wrote:

>Hi George,
>
>Where are you seeing the XML files? I wonder if TraceDataDir in
>/opt/perfsonar_ps/traceroute_ma/etc/owmesh.conf is wrong. That is what
>determines where the files are stored. It should be
>/var/lib/perfsonar/traceroute_ma/upload/.
>
>Thanks,
>Andy
>
>
>On Apr 1, 2014, at 4:05 PM, "Uhl, George D. (GSFC-423.0)[ARTS]"
><>
> wrote:
>
>> Andy,
>>
>> I tried running traceroute_master with --verbose set but I only see the
>> init.start and init.end messages. I even deleted/rebuilt the
>> traceroute_ma database in mysql. I'm running tcpdump on the test node and
>> I periodically see the icmp unreachable messages from the scheduled
>> traceroute tests. traceroute_master is running with no error messages at
>> start up. The file permissions for both /var/lib/perfsonar/traceroute_ma/
>> and /var/lib/perfsonar/traceroute_ma/upload are perfsonar for both user
>> and group. All the *.xml files are perfsonar owned too. One thing
>> though, /var/lib/perfsonar/traceroute_ma/upload is always empty.
>>
>> Contents of traceroute-master.conf:
>> # cat traceroute-master.conf
>> batch_count 25
>> batch_size 10
>> collector_timeout 30
>>
>> collector_urls http://archive.eos.nasa.gov:8086/perfSONAR_PS/services/tracerouteCollector
>> data_dir /var/lib/perfsonar/traceroute_ma/upload/
>> register_interval 30
>> #
>>
>>
>> Thanks,
>> George
>>
>> On 4/1/14 9:55 AM, "Andrew Lake"
>> <>
>> wrote:
>>
>>> Hi George,
>>>
>>> Thanks for all the info again. It definitely sounds like your traceroutes
>>> are running since they work by hand and there are files under
>>> /var/lib/perfsonar/traceroute_ma. The error message in the logs looks
>>> suspicious but I am not quite sure what would cause it, especially since
>>> your traceroute_master.log doesn't have any errors. The traceroute_master
>>> is what reads the results and sends them off to the MA. A few more things
>>> to check:
>>>
>>> - What does the
>>> /opt/perfsonar_ps/traceroute_ma/etc/traceroute-master.conf file look
>>> like? Most importantly, the "collector_urls" option should point at your
>>> archive host.
>>> - If you run "ps auxw | grep traceroute_master" is there a
>>> traceroute_master process running?
>>> - If you restart traceroute_master with "/sbin/service traceroute_master
>>> restart" does it dump anything to the screen that looks like an error?
>>> - What are the file permissions on the stuff under
>>> /var/lib/perfsonar/traceroute_ma/ and
>>> /var/lib/perfsonar/traceroute_ma/upload? I believe they should all be
>>> owned by the perfsonar user and group.
>>>
>>> Thanks,
>>> Andy
>>>
>>>
>>>
>>> On Mar 31, 2014, at 3:47 PM, "Uhl, George D. (GSFC-423.0)[ARTS]"
>>> <>
>>> wrote:
>>>
>>>> Andy,
>>>>
>>>> Thanks. The maddash issue is now fixed. I'm still mulling over the
>>>> traceroute_ma failure. It's not a FW or iptables issue since test nodes
>>>> can traceroute to each other and report back to the MA on port 8086.
>>>> The /var/lib/perfsonar/traceroute_ma data directory on the test nodes
>>>> is populated with .xml data files with current date/time stamps.
>>>>
>>>> CLI traceroutes from one of the test nodes:
>>>>
>>>> [owamp2 ~]# traceroute owamp3.eos.nasa.gov
>>>> traceroute to owamp3.eos.nasa.gov (198.119.22.35), 30 hops max, 60 byte packets
>>>> 1 owamp3.eos.nasa.gov (198.119.22.35) 10.062 ms 9.992 ms 10.194 ms
>>>> [owamp2 ~]#
>>>>
>>>> The MA server is reachable on port 8086:
>>>>
>>>> [owamp2 ~]# telnet archive.eos.nasa.gov 8086
>>>> Trying 198.119.22.36...
>>>> Connected to archive.eos.nasa.gov.
>>>> Escape character is '^]'.
>>>> ^]
>>>>
>>>> telnet> quit
>>>> Connection closed.
>>>> [owamp2 ~]#
>>>>
>>>> -----------------
>>>>
>>>> Today's activity as logged in traceroute_master.log and
>>>> traceroute_scheduler.log from test node owamp2:
>>>>
>>>> --Contents of traceroute_master.log
>>>> 2014/03/26 11:08:24 (9237) INFO> traceroute_master.pl:145 main:: -
>>>> ts=2014-03-26T15:08:24.068157Z
>>>> event=org.perfSONAR.TracerouteMaster.init.start
>>>> guid=40752e49-89c3-49e8-a6d5-e50f67f03ffa
>>>> 2014/03/26 11:08:24 (9239) INFO> traceroute_master.pl:218 main:: -
>>>> ts=2014-03-26T15:08:24.073967Z
>>>> event=org.perfSONAR.TracerouteMaster.init.end
>>>> guid=40752e49-89c3-49e8-a6d5-e50f67f03ffa
>>>> 2014/03/26 12:45:54 (7009) INFO> traceroute_master.pl:145 main:: -
>>>> ts=2014-03-26T16:45:54.846809Z
>>>> event=org.perfSONAR.TracerouteMaster.init.start
>>>> guid=b6f39940-2fe7-4a57-ab65-990b87353217
>>>> 2014/03/26 12:45:54 (7010) INFO> traceroute_master.pl:218 main:: -
>>>> ts=2014-03-26T16:45:54.881106Z
>>>> event=org.perfSONAR.TracerouteMaster.init.end
>>>> guid=b6f39940-2fe7-4a57-ab65-990b87353217
>>>> 2014/03/28 12:44:04 (18181) INFO> traceroute_master.pl:145 main:: -
>>>> ts=2014-03-28T16:44:04.027489Z
>>>> event=org.perfSONAR.TracerouteMaster.init.start
>>>> guid=394aa976-6f46-4429-9743-552c0343f9fb
>>>> 2014/03/28 12:44:04 (18183) INFO> traceroute_master.pl:218 main:: -
>>>> ts=2014-03-28T16:44:04.033855Z
>>>> event=org.perfSONAR.TracerouteMaster.init.end
>>>> guid=394aa976-6f46-4429-9743-552c0343f9fb
>>>> 2014/03/30 10:57:22 (29045) INFO> traceroute_master.pl:145 main:: -
>>>> ts=2014-03-30T14:57:22.541500Z
>>>> event=org.perfSONAR.TracerouteMaster.init.start
>>>> guid=9565ec22-9a6f-405b-9e44-86b9df689e39
>>>> 2014/03/30 10:57:22 (29047) INFO> traceroute_master.pl:218 main:: -
>>>> ts=2014-03-30T14:57:22.547406Z
>>>> event=org.perfSONAR.TracerouteMaster.init.end
>>>> guid=9565ec22-9a6f-405b-9e44-86b9df689e39
>>>>
>>>> --Contents of traceroute_scheduler.log
>>>> 2014/03/31 13:45:42 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 13:55:47 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 13:55:53 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 13:55:54 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:05:59 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:06:04 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:06:10 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:16:15 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:16:20 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>> 2014/03/31 14:16:26 (29057) INFO> TracerouteScheduler.pm:180
>>>> perfSONAR_PS::Services::MP::TracerouteScheduler::run - Running
>>>> traceroute...
>>>>
>>>> -----------------
>>>>
>>>> From the test node perspective everything appears to be running
>>>> nominally. The last successful report received by the MA with the data
>>>> entered into the traceroute_ma database was on March 20, which happens to
>>>> be the day I did the yum update on the test nodes. That might be
>>>> coincidental, or not.
>>>>
>>>> From the /var/log/perfsonar/traceroute_master.log file on the MA I see
>>>> these messages:
>>>> 2014/03/31 12:43:53 (1650) INFO> daemon.pl:572 main::psService -
>>>> Received incoming connection from:
>>>> 198.119.22.34
>>>> 2014/03/31 12:43:59 (11059) ERROR> daemon.pl:582 main::psService - No
>>>> HTTP Request received from host:
>>>> 198.119.22.34
>>>> 2014/03/31 14:04:59 (1650) INFO> daemon.pl:572 main::psService -
>>>> Received incoming connection from:
>>>> 198.119.22.35
>>>> 2014/03/31 14:05:04 (13463) ERROR> daemon.pl:582 main::psService - No
>>>> HTTP Request received from host:
>>>> 198.119.22.35
>>>>
>>>> These are the only messages from the test node and MA traceroute logs
>>>> that indicate to me that something might be broken.
>>>>
>>>>
>>>> Thanks,
>>>> -George
>>>>
>>>>
>>>> From: Andrew Lake
>>>> <>
>>>> Date: Monday, March 31, 2014 12:06 PM
>>>> To: George Uhl
>>>> <>
>>>> Cc:
>>>> ""
>>>>
>>>> <>
>>>> Subject: Re: [perfsonar-user] MA failures in the aftermath of a yum
>>>> update
>>>>
>>>> Hi,
>>>>
>>>> I can help with a couple of these:
>>>>
>>>> - For the maddash issue, see this FAQ
>>>> http://psps.perfsonar.net/toolkit/FAQs.html#Q71
>>>> - For the traceroute issue, what is in traceroute_master.log and
>>>> traceroute_scheduler.netlogger.log on one of the test nodes? It's
>>>> possible the traceroute tests are not completing for some reason (e.g.
>>>> ICMP blocked). Those may shed some more light. I think otherwise your
>>>> traceroute test config looks fine (though I may be missing something).
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>>
>>>>
>>>> On Mar 31, 2014, at 10:16 AM, "Uhl, George D. (GSFC-423.0)[ARTS]"
>>>> <>
>>>> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I did a yum update on my Measurement Archive server over the weekend
>>>>> and now the maddash page on the server is broken as well as the
>>>>> traceroute_ma service graphs from the toolkit page. The maddash page
>>>>> displays the page header and returns nothing else. The traceroute
>>>>> service graph page displays "Error: No Measurement Archives
>>>>> available." This is a custom built measurement archive so perhaps not
>>>>> all the packages that might be available from a PS toolkit yum update
>>>>> are going to be available.
>>>>>
>>>>> From the yum update log on the MA server:
>>>>>
>>>>> # cat yum.log | grep perf
>>>>> Mar 28 14:58:37 Installed:
>>>>> perl-perfSONAR_PS-MeshConfig-GUIAgent-3.3.2-3.pSPS.noarch
>>>>> Mar 29 21:22:45 Updated:
>>>>> perl-perfSONAR_PS-TracerouteMA-config-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:07 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-config-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:07 Updated: iperf-2.0.5-11.el6.x86_64
>>>>> Mar 29 21:23:58 Updated:
>>>>> perl-perfSONAR_PS-serviceTest-3.3.2-4.pSPS.noarch
>>>>> Mar 29 21:23:58 Updated:
>>>>> perl-perfSONAR_PS-SimpleLS-BootStrap-client-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:58 Updated: perl-perfSONAR_PS-SNMPMA-3.3-4.pSPS.noarch
>>>>> Mar 29 21:23:58 Updated:
>>>>> perl-perfSONAR_PS-TracerouteMA-server-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:59 Updated:
>>>>> perl-perfSONAR_PS-LSRegistrationDaemon-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:59 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-server-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:59 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-client-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:23:59 Updated:
>>>>> perl-perfSONAR_PS-TracerouteMA-client-3.3.2-1.pSPS.noarch
>>>>> Mar 29 21:24:07 Installed:
>>>>> perl-perfSONAR_PS-Toolkit-3.3.2-13.pSPS.noarch
>>>>> Mar 29 21:24:07 Updated:
>>>>> perl-perfSONAR_PS-MeshConfig-Shared-3.3.2-3.pSPS.noarch
>>>>> Mar 29 21:24:07 Updated: perl-perfSONAR_PS-Nagios-3.3.2-2.pSPS.noarch
>>>>> Mar 29 21:25:10 Updated:
>>>>> perl-perfSONAR_PS-MeshConfig-JSONBuilder-3.3.2-3.pSPS.noarch
>>>>> Mar 29 21:25:23 Updated: perf-2.6.32-431.11.2.el6.x86_64
>>>>> Mar 29 21:25:38 Erased: perl-perfSONAR_PS-TopologyService
>>>>>
>>>>> From the yum log of one of the test nodes:
>>>>> # cat /var/log/yum.log | grep perf
>>>>> Mar 20 15:16:45 Updated:
>>>>> perl-perfSONAR_PS-TracerouteMA-config-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:17:50 Updated:
>>>>> perl-perfSONAR_PS-MeshConfig-Shared-3.3.2-3.pSPS.noarch
>>>>> Mar 20 15:17:51 Updated:
>>>>> perl-perfSONAR_PS-SimpleLS-BootStrap-client-3.3.1-1.pSPS.noarch
>>>>> Mar 20 15:18:02 Updated:
>>>>> perl-perfSONAR_PS-LSRegistrationDaemon-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:18:12 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-config-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:18:17 Updated: iperf-2.0.5-11.el6.x86_64
>>>>> Mar 20 15:20:55 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-server-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:20:56 Updated:
>>>>> perl-perfSONAR_PS-TracerouteMA-client-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:20:57 Updated:
>>>>> perl-perfSONAR_PS-MeshConfig-Agent-3.3.2-3.pSPS.noarch
>>>>> Mar 20 15:20:58 Updated:
>>>>> perl-perfSONAR_PS-perfSONARBUOY-client-3.3.2-1.pSPS.noarch
>>>>> Mar 20 15:22:54 Updated: perf-2.6.32-431.5.1.el6.x86_64
>>>>> Mar 26 12:08:12 Updated: perf-2.6.32-431.11.2.el6.x86_64
>>>>> Mar 26 12:08:15 Updated:
>>>>> perl-perfSONAR_PS-SimpleLS-BootStrap-client-3.3.2-1.pSPS.noarch
>>>>>
>>>>>
>>>>> From the maddash-server.netlogger.log:
>>>>> level=INFO ts=2014-03-30T02:18:02.536201Z event=maddash.init.start
>>>>> guid=9de487d2-8c5e-4574-8125-3faf7e97d381
>>>>> level=ERROR ts=2014-03-30T02:18:06.292021Z event=maddash.init.end
>>>>> guid=9de487d2-8c5e-4574-8125-3faf7e97d381 status=-1 msg="Error loading
>>>>> database: Column 'TEMPLATENAME' is either not in any table in the FROM
>>>>> list or appears within a join specification and is outside the scope of
>>>>> the join specification or appears in a HAVING clause and is not in the
>>>>> GROUP BY list. If this is a CREATE or ALTER TABLE statement then
>>>>> 'TEMPLATENAME' is not a column in the target table."
>>>>> level=INFO ts=2014-03-31T01:04:00.152609Z event=maddash.init.start
>>>>> guid=78268c7e-f7c9-43a0-81eb-402faaf6ab07
>>>>> level=ERROR ts=2014-03-31T01:04:02.302381Z event=maddash.init.end
>>>>> guid=78268c7e-f7c9-43a0-81eb-402faaf6ab07 status=-1 msg="Error loading
>>>>> database: Column 'TEMPLATENAME' is either not in any table in the FROM
>>>>> list or appears within a join specification and is outside the scope of
>>>>> the join specification or appears in a HAVING clause and is not in the
>>>>> GROUP BY list. If this is a CREATE or ALTER TABLE statement then
>>>>> 'TEMPLATENAME' is not a column in the target table."
>>>>>
>>>>> The maddash yaml file (which defines the maddash pages of two meshes)
>>>>> was built from the MeshConfig GUI Agent and can be viewed here:
>>>>> https://ensight.eos.nasa.gov/maddash.yaml
>>>>> The json file that defines one of the meshes can be viewed here:
>>>>> https://ensight.eos.nasa.gov/enpl.json
>>>>>
>>>>> -----
>>>>>
>>>>> Before I did the yum update, I did a clean_pSBdb of the traceroute_ma
>>>>> database to clear out old data. This left the database with only an
>>>>> empty DATES table. However, over time the traceroute_ma database never
>>>>> was repopulated with the daily tables containing traceroute test
>>>>> results and remains empty.
>>>>>
>>>>> When a traceroute node reports into the MA, the MA's traceroute_ma.log
>>>>> reports:
>>>>> 2014/03/30 11:44:47 (26029) INFO> Traceroute.pm:111
>>>>> perfSONAR_PS::Services::MA::Traceroute::init - Setting service access
>>>>> point to
>>>>> http://archive.eos.nasa.gov:8086/perfSONAR_PS/services/tracerouteMA
>>>>> 2014/03/30 11:44:47 (26029) WARN> Traceroute.pm:126
>>>>> perfSONAR_PS::Services::MA::Traceroute::init - Setting
>>>>> 'service_description' to 'perfSONAR_PS Traceroute MA at NASA ESDIS
>>>>> Network Prototyping Lab'.
>>>>> 2014/03/30 11:44:47 (26029) WARN> Traceroute.pm:133
>>>>> perfSONAR_PS::Services::MA::Traceroute::init - Setting 'service_name'
>>>>> to 'Traceroute MA'.
>>>>> 2014/03/30 11:44:47 (26029) WARN> Traceroute.pm:140
>>>>> perfSONAR_PS::Services::MA::Traceroute::init - Setting 'service_type'
>>>>> to 'MA'.
>>>>> 2014/03/30 11:44:47 (26033) ERROR> daemon.pl:670 main::registerLS -
>>>>> Problem running register LS: No data found in given time range
>>>>> 2014/03/30 11:53:37 (26032) INFO> daemon.pl:572 main::psService -
>>>>> Received incoming connection from:198.119.22.34
>>>>> 2014/03/30 11:53:42 (26118) ERROR> daemon.pl:582 main::psService - No
>>>>> HTTP Request received from host:198.119.22.34
>>>>>
>>>>> The owmesh.conf for the MA can be viewed here:
>>>>> https://ensight.eos.nasa.gov/ma_owmesh.conf
>>>>> The owmesh.conf for one of the test nodes can be viewed here:
>>>>> https://ensight.eos.nasa.gov/node_owmesh.conf
>>>>> The output of a "generate_configuration --verbose" can be viewed here:
>>>>> https://ensight.eos.nasa.gov/node_gen_config
>>>>>
>>>>> Please let me know if you need more information.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> George Uhl
>>>>> NASA ESDIS / SGT Inc.
>>>>> Code 423
>>>>> NASA Goddard Space Flight Center
>>>>> Greenbelt, MD 20771
>>>>> Office: 301-614-5155
>>>>> Fax: 301-614-5700
>>>>> email:
>>>>>



