perfsonar-user - Re: [perfsonar-user] perfSonar 3.3rc4 stuck in bwctl restart loop

Subject: perfSONAR User Q&A and Other Discussion

From: Peter van Heusden <>
Cc:
Subject: Re: [perfsonar-user] perfSonar 3.3rc4 stuck in bwctl restart loop
Date: Fri, 24 May 2013 10:18:33 +0200

Thanks. I changed line 62 of
/opt/perfsonar_ps/toolkit/web/root/admin/regular_testing/templates/test_configuration.tmpl
to read:

<td style="width: 200px">TOS Bits</td><td>[% IF current_test.parameters.tos_bits %] [% current_test.parameters.tos_bits %] [% ELSE %] 0 [% END %] </td>

(i.e. added the ELSE clause)

and now I don't need to go around manually fixing files after changes.
This sets the TOS to 0 by default; I presume that is OK?

Peter
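
A minimal sketch of the same defaulting rule with validation added, in case it helps when reasoning about the fix. `normalize_tos_bits` is a hypothetical helper, not part of the perfSONAR toolkit; it assumes the TOS Bits field arrives as a string and that valid values are integers from 0 to 255 (the 8-bit IPv4 TOS byte). Unlike a plain truthiness test, it rejects a literal "NaN" instead of passing it through.

```python
def normalize_tos_bits(raw) -> int:
    """Hypothetical helper: map the TOS Bits form field to an int in 0-255.

    An empty or missing value defaults to 0, as the template's ELSE clause does.
    Anything that is not an integer in range (including "NaN") is rejected.
    """
    if raw is None or str(raw).strip() == "":
        return 0
    try:
        value = int(str(raw).strip())
    except ValueError:
        raise ValueError(f"TOS bits must be an integer, got {raw!r}") from None
    if not 0 <= value <= 255:
        raise ValueError(f"TOS bits must be in 0-255, got {value}")
    return value


print(normalize_tos_bits(""))    # 0
print(normalize_tos_bits("32"))  # 32
```

Raising an error rather than silently substituting 0 for "NaN" is a design choice: it surfaces the bad value at the point of entry instead of letting it reach the config file.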

On 24/05/2013 03:06, Brian Tierney wrote:
> Good catch! I'll add this to the issue tracker to fix.
>
>
> On May 23, 2013, at 2:07 PM, Peter van Heusden <> wrote:
>
>> OK, I found the cause of the problem!
>>
>> Somehow the web user interface is setting the TOS Bits field of the
>> bandwidth test I had set up to "NaN". This triggers the following
>> lines of bwmaster.pl (around line 1243), which add "-S NaN" to the
>> bwctl command line. bwctl then, of course, fails and is restarted, and
>> so on.
>>
>> push @cmd, ( "-S", $val ) if (
>> $val = $conf->get_val(
>> TESTSPEC => $ms->{'TESTSPEC'},
>> ATTR => 'BWTosBits'
>> )
>> );
>>
>> I manually removed the BWTosBits entry in
>> /opt/perfsonar_ps/perfsonarbuoy_ma/etc/owmesh.conf and the restart loop
>> has now stopped.
>>
>> Thanks,
>> Peter
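
In Perl, any non-empty string other than "0" is true, so the `if ($val = ...)` test in the bwmaster.pl excerpt accepts "NaN" as readily as a real TOS value. The sketch below (Python, for illustration only) shows a stricter check when assembling the command. `parse_tos` and `build_bwctl_cmd` are hypothetical names, and only the `-c`, `-s` and `-S` options are shown, whereas bwmaster.pl passes many more. Assuming `get_val` returns the raw string, a BWTosBits of "0" is also false in Perl, so the sketch likewise omits `-S` for 0.

```python
from typing import List, Optional


def parse_tos(value) -> Optional[int]:
    """Return value as an int in 0-255, or None if it is missing or invalid."""
    if value is None:
        return None
    try:
        tos = int(str(value).strip())
    except ValueError:  # e.g. "NaN" or ""
        return None
    return tos if 0 <= tos <= 255 else None


def build_bwctl_cmd(receiver: str, sender: str, tos_bits=None) -> List[str]:
    """Hypothetical sketch: only the -c, -s and -S options are shown."""
    cmd = ["bwctl", "-c", receiver, "-s", sender]
    tos = parse_tos(tos_bits)
    # None (invalid) and 0 (the default, also false in Perl) add no -S option.
    if tos:
        cmd += ["-S", str(tos)]
    return cmd


print(build_bwctl_cmd("192.168.2.132", "192.168.2.104", "NaN"))
# ['bwctl', '-c', '192.168.2.132', '-s', '192.168.2.104']
print(build_bwctl_cmd("192.168.2.132", "192.168.2.104", "32"))
# ['bwctl', '-c', '192.168.2.132', '-s', '192.168.2.104', '-S', '32']
```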
>>
>> On 23/05/2013 22:48, Peter van Heusden wrote:
>>> Yes, those work fine. E.g.:
>>>
>>> [root@ps sysconfig]# bwctl -c 192.168.2.132 -s 192.168.2.104
>>> bwctl: Using tool: iperf
>>> bwctl: 15 seconds until test results available
>>>
>>> RECEIVER START
>>> bwctl: exec_line: iperf -B ps2.sanbi.ac.za -s -f b -m -p 5176 -t 10
>>> bwctl: start_tool: 3578330236.026986
>>> ------------------------------------------------------------
>>> Server listening on TCP port 5176
>>> Binding to local address ps2.sanbi.ac.za
>>> TCP window size: 87380 Byte (default)
>>> ------------------------------------------------------------
>>> [ 15] local 192.168.2.132 port 5176 connected with 192.168.2.104 port 5176
>>> [ ID] Interval       Transfer          Bandwidth
>>> [ 15]  0.0-10.0 sec  530448384 Bytes  422283774 bits/sec
>>> [ 15] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>>> bwctl: stop_exec: 3578330249.065921
>>>
>>> RECEIVER END
>>>
>>> Is there any other log file that I could be looking in? The messages in
>>> /var/log/messages don't explain *why* bwctl is being restarted. Also,
>>> there is a bwctl process running the entire time, so I'm not sure *what*
>>> is being restarted!
>>>
>>> Thanks!
>>> Peter
>>> On 23/05/2013 22:28, Aaron Brown wrote:
>>>> Hey Peter,
>>>>
>>>> Can you run bwctl tests by hand?
>>>>
>>>> Cheers,
>>>> Aaron
>>>>
>>>> On May 23, 2013, at 4:27 PM, Peter van Heusden <> wrote:
>>>>
>>>>> I made that change, and added tock.meraka.csir.co.za (a stratum 1
>>>>> server about 1000 miles away). ntpq -p now shows:
>>>>>
>>>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>>>> ==============================================================================
>>>>> *zibbi.meraka.cs 238.72.153.243   2 u   40   64    1   72.996   -3.658   6.944
>>>>> +firewall.sanbi. 41.73.38.11      3 u   22   64    7    0.070   -4.745   0.173
>>>>> -ntp0.za.uu.net  216.171.120.36   3 u   20   64    7    5.677    0.208  15.795
>>>>> -ntp2.is.co.za   146.64.58.41     2 u   21   64    7    5.525   -9.424  16.250
>>>>> +tock.meraka.csi .PPS.            1 u    -   64   15   73.582   -3.607  21.681
>>>>>
>>>>>
>>>>> and bwmaster.pl is still restarting.
>>>>>
>>>>> :(
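
For reading `ntpq -p` output like the table in this message, a small parser can help. This is a sketch assuming the whitespace-separated column layout shown (one peer per line, no spaces inside refid); lines wrapped by a mail client would need rejoining first. It decodes the `reach` column, which ntpq prints in octal as an 8-bit register of the most recent polls (377 means all eight were answered). `parse_ntpq`, `Peer` and `TALLY` are hypothetical names, and `TALLY` covers only the tally codes that appear in this sample.

```python
from dataclasses import dataclass

# Sample taken from the ntpq -p output in this thread.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*zibbi.meraka.cs 238.72.153.243   2 u   40   64    1   72.996   -3.658   6.944
+firewall.sanbi. 41.73.38.11      3 u   22   64    7    0.070   -4.745   0.173
-ntp0.za.uu.net  216.171.120.36   3 u   20   64    7    5.677    0.208  15.795
-ntp2.is.co.za   146.64.58.41     2 u   21   64    7    5.525   -9.424  16.250
+tock.meraka.csi .PPS.            1 u    -   64   15   73.582   -3.607  21.681
"""

# Tally codes that appear in the sample; anything else is reported as "other".
TALLY = {"*": "system peer", "+": "candidate", "-": "outlier"}


@dataclass
class Peer:
    tally: str
    remote: str
    stratum: int
    poll_s: int
    reach: int  # decoded from ntpq's octal display
    delay_ms: float
    offset_ms: float
    jitter_ms: float

    @property
    def answered(self) -> int:
        """Number of set bits in the 8-bit reach register."""
        return bin(self.reach).count("1")


def parse_ntpq(text: str) -> list:
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header and the ==== separator.
        if not line.strip() or line.startswith("=") or line.split()[0] == "remote":
            continue
        tally, fields = line[0], line[1:].split()
        remote, _refid, st, _type, _when, poll, reach, delay, offset, jitter = fields
        peers.append(Peer(tally, remote, int(st), int(poll), int(reach, 8),
                          float(delay), float(offset), float(jitter)))
    return peers


for p in parse_ntpq(SAMPLE):
    role = TALLY.get(p.tally, "other")
    print(f"{p.remote:<16} {role:<11} offset {p.offset_ms:+.3f} ms, "
          f"reach {p.reach:o} ({p.answered} answered)")
```

For example, tock's reach of 15 (octal) decodes to binary 1101: three answered polls with one miss between them.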
>>>>>
>>>>> On 23/05/2013 21:46, Pedro Queirós wrote:
>>>>>> Try this:
>>>>>>
>>>>>> server ntp1.meraka.csir.co.za iburst minpoll 4 maxpoll 6
>>>>>> server ntp.sanbi.ac.za iburst minpoll 4 maxpoll 6
>>>>>> server ntp0.za.uu.net iburst minpoll 4 maxpoll 6
>>>>>> server ntp2.is.co.za iburst minpoll 4 maxpoll 6
>>>>>>
>>>>>> Aside from that, it looks good. If possible, try to get access
>>>>>> to a good (e.g. low-delay) stratum 1 NTP server near your network.
>>>>>>
>>>>>> If bwmaster.pl keeps restarting, I'd suggest looking into
>>>>>> something else; let us know about it!
>>>>>>
>>>>>> Ah, don't forget to restart ntpd after changing the config file!
>>>>>>
>>>>>> Pedro
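
In ntp.conf, `minpoll` and `maxpoll` are log2 exponents of the poll interval in seconds, so `minpoll 4 maxpoll 6` keeps polling between 16 and 64 seconds (ntpd's defaults are 6 and 10, i.e. 64 to 1024 seconds). The sketch below computes those intervals and generates the suggested server lines; `poll_interval_seconds` and `server_lines` are hypothetical helpers for illustration.

```python
def poll_interval_seconds(exponent: int) -> int:
    """minpoll/maxpoll are log2 of the poll interval in seconds."""
    return 2 ** exponent


def server_lines(hosts, minpoll=4, maxpoll=6):
    """Build ntp.conf 'server' lines in the form suggested above."""
    return [f"server {h} iburst minpoll {minpoll} maxpoll {maxpoll}" for h in hosts]


hosts = ["ntp1.meraka.csir.co.za", "ntp.sanbi.ac.za",
         "ntp0.za.uu.net", "ntp2.is.co.za"]
print("\n".join(server_lines(hosts)))
print(poll_interval_seconds(4), poll_interval_seconds(6))  # 16 64
```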
>>>>>>
>>>>>>
>>>>>> On Thu, May 23, 2013 at 8:31 PM, Peter van Heusden <> wrote:
>>>>>> logfile /var/log/ntpd
>>>>>> driftfile /var/lib/ntp/ntp.drift
>>>>>> statsdir /var/lib/ntp/
>>>>>> statistics loopstats peerstats clockstats
>>>>>> filegen loopstats file loopstats type day enable
>>>>>> filegen peerstats file peerstats type day enable
>>>>>> filegen clockstats file clockstats type day enable
>>>>>>
>>>>>> # You should have at least 4 NTP servers
>>>>>>
>>>>>> server ntp1.meraka.csir.co.za iburst
>>>>>> server ntp.sanbi.ac.za iburst
>>>>>> server ntp0.za.uu.net iburst
>>>>>> server chronos.es.net iburst
>>>>>> server ntp2.is.co.za iburst
>>>>>>
>>>>>> thanks!
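
The comment in this ntp.conf recommends at least four servers. One quick way to check a config against that is to count the active `server` lines. This sketch strips comments and, as a simplifying assumption, does not count `pool` or `peer` directives; `ntp_servers` is a hypothetical helper and `PETER_CONF` is an excerpt of the config above.

```python
PETER_CONF = """\
driftfile /var/lib/ntp/ntp.drift
# You should have at least 4 NTP servers

server ntp1.meraka.csir.co.za iburst
server ntp.sanbi.ac.za iburst
server ntp0.za.uu.net iburst
server chronos.es.net iburst
server ntp2.is.co.za iburst
"""


def ntp_servers(conf_text: str) -> list:
    """Return the host of every active 'server' line (comments stripped)."""
    servers = []
    for raw in conf_text.splitlines():
        words = raw.split("#", 1)[0].split()
        if len(words) >= 2 and words[0] == "server":
            servers.append(words[1])
    return servers


servers = ntp_servers(PETER_CONF)
print(len(servers), "servers:", ", ".join(servers))
if len(servers) < 4:
    print("warning: fewer than 4 NTP servers configured")
```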
>>>>>>
>>>>>>
>>>>>> On 23/05/2013 21:17, Pedro Queirós wrote:
>>>>>>> Peter, from the ntpq -p output I can see your NTP config is faulty.
>>>>>>>
>>>>>>> Can you provide the /etc/ntp.conf file?
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Pedro
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 23, 2013 at 8:04 PM, Jason Zurawski <> wrote:
>>>>>>> While that's not true of the list you sent, static lists frequently
>>>>>>> fall into disrepair. A couple of months back we tried this in the
>>>>>>> APAN region and found that about 1/5 of the servers still worked well
>>>>>>> after being posted to an 'official' wiki.
>>>>>>>
>>>>>>> It doesn't hurt to ask those on the front lines who are adopting pS
>>>>>>> for some insider info; that request applies to everyone, even those
>>>>>>> not in South Africa.
>>>>>>>
>>>>>>> Thanks;
>>>>>>>
>>>>>>> -jason
>>>>>>>
>>>>>>> On May 23, 2013, at 11:58 AM, Michael Sinatra <> wrote:
>>>>>>>
>>>>>>>> On 05/23/2013 11:44, Jason Zurawski wrote:
>>>>>>>>> Hey Peter;
>>>>>>>>>
>>>>>>>>> I would nuke the ESnet server too; NTP works best when the servers
>>>>>>>>> are within the same timezone/continent, since that gives the
>>>>>>>>> algorithms a level playing field to choose from.
>>>>>>>> Whether a server is outside your timezone or continent doesn't
>>>>>>>> really matter in itself. The key is whether the path to and from a
>>>>>>>> given NTP server is more likely than not to be asymmetric. I don't
>>>>>>>> really see much evidence of that with any of the servers in the
>>>>>>>> list, except possibly ntp.mtnbusiness....
>>>>>>>>
>>>>>>>>
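
The point about asymmetry follows from NTP's on-wire calculation (RFC 5905). With client timestamps T1 (request sent) and T4 (reply received) and server timestamps T2 (request received) and T3 (reply sent), the estimated offset is ((T2 - T1) + (T3 - T4)) / 2 and the round-trip delay is (T4 - T1) - (T3 - T2). The sketch below simulates an exchange in milliseconds; `simulate` and its parameters are hypothetical. When the outbound and return legs differ, half the difference appears as offset error even though the clocks agree, so the error is bounded by half the delay (roughly 36.5 ms for the 72.996 ms path to zibbi in the ntpq output earlier in this thread) rather than by distance as such.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """RFC 5905 on-wire offset and round-trip delay.

    t1/t4 are read from the client clock, t2/t3 from the server clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(outbound_ms, return_ms, true_offset_ms=0, processing_ms=1):
    """Hypothetical exchange: server clock is true_offset_ms ahead of client."""
    t1 = 0
    t2 = t1 + outbound_ms + true_offset_ms  # server receives request
    t3 = t2 + processing_ms                 # server sends reply
    t4 = t3 - true_offset_ms + return_ms    # client receives reply
    return ntp_offset_delay(t1, t2, t3, t4)


print(simulate(35, 35))                    # (0.0, 70): symmetric path, no error
print(simulate(40, 30))                    # (5.0, 70): 10 ms asymmetry -> 5 ms error
print(simulate(35, 35, true_offset_ms=3))  # (3.0, 70): real offset recovered
```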
>>>>>>>>> A related note is that we are always looking to add new servers
>>>>>>>>> into our list around the world - if you know of 'open' clocks in
>>>>>>>>> the region that we can front-load into the list, that can be
>>>>>>>>> arranged.
>>>>>>>> Well, there's a published list of them here:
>>>>>>>>
>>>>>>>> http://support.ntp.org/bin/view/Servers/WebHome
>>>>>>>>
>>>>>>>> :)
>>>>>>>>
>>>>>>>> michael
>>>>>>>>
>>>>>>
>>>> ESnet/Internet2 Focused Technical Workshop
>>>> Network Issues for Life Sciences Research
>>>> July 17 - 18, 2013, Berkeley CA
>>>> http://events.internet2.edu/2013/ftw-life-sciences/
>>>>
