
perfsonar-user - Re: [perfsonar-user] perfSonar 3.3rc4 stuck in bwctl restart loop

Subject: perfSONAR User Q&A and Other Discussion

  • From: Brian Tierney
  • To: Peter van Heusden
  • Subject: Re: [perfsonar-user] perfSonar 3.3rc4 stuck in bwctl restart loop
  • Date: Fri, 24 May 2013 08:54:47 -0700


That looks right. Thanks! We'll add this for the final 3.3 release.


On May 24, 2013, at 1:18 AM, Peter van Heusden wrote:

> Thanks. I changed line 62 of
> /opt/perfsonar_ps/toolkit/web/root/admin/regular_testing/templates/test_configuration.tmpl
> to read:
>
> <td style="width: 200px">TOS Bits</td><td>[% IF current_test.parameters.tos_bits %] [% current_test.parameters.tos_bits %] [% ELSE %] 0 [% END %] </td>
>
> (i.e. added the ELSE clause)
>
> and now I don't need to go around manually fixing files after changes.
> This sets the TOS to 0 by default; I presume that is OK?
>
> Peter
>
> On 24/05/2013 03:06, Brian Tierney wrote:
>> Good catch! I'll add this to the issue tracker to fix.
>>
>>
>> On May 23, 2013, at 2:07 PM, Peter van Heusden wrote:
>>
>>> Ok, I found the cause of the problem!
>>>
>>> Somehow the web user interface is setting the TOS Bits field of the
>>> bandwidth test I had set up to "NaN". This then leads to the following
>>> lines (around line 1243) of bwmaster.pl triggering and adding a "-S
>>> NaN" to the bwctl command line. Then bwctl of course fails and is
>>> restarted... etc etc.
>>>
>>> push @cmd, ( "-S", $val ) if (
>>>     $val = $conf->get_val(
>>>         TESTSPEC => $ms->{'TESTSPEC'},
>>>         ATTR     => 'BWTosBits'
>>>     )
>>> );
>>>
>>> I manually removed the BWTosBits entry in
>>> /opt/perfsonar_ps/perfsonarbuoy_ma/etc/owmesh.conf and the restart loop
>>> has now stopped.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On 23/05/2013 22:48, Peter van Heusden wrote:
>>>> Yes, those work fine. E.g.:
>>>>
>>>> [root@ps sysconfig]# bwctl -c 192.168.2.132 -s 192.168.2.104
>>>> bwctl: Using tool: iperf
>>>> bwctl: 15 seconds until test results available
>>>>
>>>> RECEIVER START
>>>> bwctl: exec_line: iperf -B ps2.sanbi.ac.za -s -f b -m -p 5176 -t 10
>>>> bwctl: start_tool: 3578330236.026986
>>>> ------------------------------------------------------------
>>>> Server listening on TCP port 5176
>>>> Binding to local address ps2.sanbi.ac.za
>>>> TCP window size: 87380 Byte (default)
>>>> ------------------------------------------------------------
>>>> [ 15] local 192.168.2.132 port 5176 connected with 192.168.2.104 port 5176
>>>> [ ID] Interval Transfer Bandwidth
>>>> [ 15] 0.0-10.0 sec 530448384 Bytes 422283774 bits/sec
>>>> [ 15] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>>>> bwctl: stop_exec: 3578330249.065921
>>>>
>>>> RECEIVER END
>>>>
>>>> Is there any other log file that I could be looking in? The messages in
>>>> /var/log/messages don't explain *why* bwctl is being restarted. Also,
>>>> there is a bwctl process running the entire time, so I'm not sure *what*
>>>> is being restarted!
>>>>
>>>> Thanks!
>>>> Peter
>>>> On 23/05/2013 22:28, Aaron Brown wrote:
>>>>> Hey Peter,
>>>>>
>>>>> Can you run bwctl tests by hand?
>>>>>
>>>>> Cheers,
>>>>> Aaron
>>>>>
>>>>> On May 23, 2013, at 4:27 PM, Peter van Heusden wrote:
>>>>>
>>>>>> I made that change, and added tock.meraka.csir.co.za - a stratum 1
>>>>>> server that is about 1000 miles away. ntpq -p now shows:
>>>>>>
>>>>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>>>>> ==============================================================================
>>>>>> *zibbi.meraka.cs 238.72.153.243   2 u   40   64    1   72.996   -3.658   6.944
>>>>>> +firewall.sanbi. 41.73.38.11      3 u   22   64    7    0.070   -4.745   0.173
>>>>>> -ntp0.za.uu.net  216.171.120.36   3 u   20   64    7    5.677    0.208  15.795
>>>>>> -ntp2.is.co.za   146.64.58.41     2 u   21   64    7    5.525   -9.424  16.250
>>>>>> +tock.meraka.csi .PPS.            1 u    -   64   15   73.582   -3.607  21.681
>>>>>>
>>>>>>
>>>>>> and bwmaster.pl is still restarting.
>>>>>>
>>>>>> :(
>>>>>>
>>>>>> On 23/05/2013 21:46, Pedro Queirós wrote:
>>>>>>> Try this:
>>>>>>>
>>>>>>> server ntp1.meraka.csir.co.za iburst minpoll 4 maxpoll 6
>>>>>>> server ntp.sanbi.ac.za iburst minpoll 4 maxpoll 6
>>>>>>> server ntp0.za.uu.net iburst minpoll 4 maxpoll 6
>>>>>>> server ntp2.is.co.za iburst minpoll 4 maxpoll 6
>>>>>>>
>>>>>>> Aside from that, it looks good. If possible, try to have access
>>>>>>> to a good (e.g. low-delay) NTP stratum 1 server near your network.
>>>>>>>
>>>>>>> If bwmaster.pl continues restarting, I'd suggest looking into
>>>>>>> something else. Let us know!
>>>>>>>
>>>>>>> Ah, don't forget to restart ntpd after changing the config file!
>>>>>>>
>>>>>>> Pedro
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 23, 2013 at 8:31 PM, Peter van Heusden wrote:
>>>>>>> logfile /var/log/ntpd
>>>>>>> driftfile /var/lib/ntp/ntp.drift
>>>>>>> statsdir /var/lib/ntp/
>>>>>>> statistics loopstats peerstats clockstats
>>>>>>> filegen loopstats file loopstats type day enable
>>>>>>> filegen peerstats file peerstats type day enable
>>>>>>> filegen clockstats file clockstats type day enable
>>>>>>>
>>>>>>> # You should have at least 4 NTP servers
>>>>>>>
>>>>>>> server ntp1.meraka.csir.co.za iburst
>>>>>>> server ntp.sanbi.ac.za iburst
>>>>>>> server ntp0.za.uu.net iburst
>>>>>>> server chronos.es.net iburst
>>>>>>> server ntp2.is.co.za iburst
>>>>>>>
>>>>>>> thanks!
>>>>>>>
>>>>>>>
>>>>>>> On 23/05/2013 21:17, Pedro Queirós wrote:
>>>>>>>> Peter, from the ntpq -p output I can see your NTP config is faulty.
>>>>>>>>
>>>>>>>> Can you provide the /etc/ntp.conf file?
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>> Pedro
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 23, 2013 at 8:04 PM, Jason Zurawski wrote:
>>>>>>>> While not true in this case that you sent - static lists fall into
>>>>>>>> disrepair frequently. A couple of months back we tried this in the
>>>>>>>> APAN region and found about 1/5 still worked well after being posted
>>>>>>>> to an 'official' wiki.
>>>>>>>>
>>>>>>>> It doesn't hurt to ask those on the front lines that are adopting pS
>>>>>>>> for some insider info - that request applies to all, even if they
>>>>>>>> are not in South Africa.
>>>>>>>>
>>>>>>>> Thanks;
>>>>>>>>
>>>>>>>> -jason
>>>>>>>>
>>>>>>>> On May 23, 2013, at 11:58 AM, Michael Sinatra wrote:
>>>>>>>>
>>>>>>>>> On 05/23/2013 11:44, Jason Zurawski wrote:
>>>>>>>>>> Hey Peter;
>>>>>>>>>>
>>>>>>>>>> I would nuke the ESnet server too - NTP works best when the
>>>>>>>>>> servers are within the same timezone/continent, it gives the
>>>>>>>>>> algorithms a level playing field to choose from.
>>>>>>>>> It's not really the case that a server outside of the timezone or
>>>>>>>>> continent matters. The key is whether it's more likely than not
>>>>>>>>> that there will be asymmetry in the path to/from a given NTP
>>>>>>>>> server. I don't really see much evidence of that with any of the
>>>>>>>>> servers in the list, except for possibly ntp.mtnbusiness....
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> A related note is that we are always looking to add new servers
>>>>>>>>>> into our list around the world - if you know of 'open' clocks in
>>>>>>>>>> the region that we can front-load into the list, that can be
>>>>>>>>>> arranged.
>>>>>>>>> Well, there's a published list of them here:
>>>>>>>>>
>>>>>>>>> http://support.ntp.org/bin/view/Servers/WebHome
>>>>>>>>>
>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>> michael
>>>>>>>>>
>>>>>>>
>>>>> ESnet/Internet2 Focused Technical Workshop
>>>>> Network Issues for Life Sciences Research
>>>>> July 17 - 18, 2013, Berkeley CA
>>>>> http://events.internet2.edu/2013/ftw-life-sciences/
>>>>>
>
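
A note on the root cause found in this thread: the quoted bwmaster.pl check only tests whether the configured BWTosBits value is true, so the string "NaN" passes and ends up on the bwctl command line as "-S NaN". A stricter check might look roughly like the sketch below. It is written in Python purely for illustration (the toolkit code is Perl), and the function name tos_args is hypothetical, not part of perfSONAR or bwctl. The only details taken from the thread are the -S flag and the fact that an unset or zero value adds no flag; the 0-255 range is an assumption based on TOS being a single byte.

```python
def tos_args(raw):
    """Return ["-S", value] for a usable TOS byte, or [] to omit the flag.

    Hypothetical helper: mirrors the behaviour described in the thread
    (no -S when the value is unset or 0) but also rejects non-numeric
    input such as "NaN" instead of passing it through.
    """
    if raw is None:
        return []
    try:
        value = int(str(raw).strip())
    except ValueError:
        # Non-numeric input, e.g. "NaN" or an empty string.
        return []
    if not 0 < value <= 255:
        # 0 means "no TOS set"; anything outside one byte is rejected.
        return []
    return ["-S", str(value)]


# Build a command line the way the thread describes, with the guard applied.
cmd = ["bwctl", "-c", "192.168.2.132", "-s", "192.168.2.104"]
cmd += tos_args("NaN")  # adds nothing, so bwctl is not given a bad -S
```

With this kind of guard a bad value written by the web interface would be ignored at launch time rather than causing a restart loop, although fixing the template (as done above in the thread) is still needed so the bad value is never written to owmesh.conf.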



