ndt-users - Re: Error running web100 3.5.0
- From: Chris Welti <>
- To: Richard Carlson <>
- Cc: "" <>
- Subject: Re: Error running web100 3.5.0
- Date: Tue, 01 Sep 2009 14:39:49 +0200
- Organization: SWITCH
Hi Rich,
I've just upgraded one of our NDT servers from 3.4.4 to 3.5.6.
I'm observing the same behaviour as listed in this thread:
** Starting test 1 of 1 **
Connected to: lsmp2 -- Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client-to-server [C2S]) . . . . . 91.78Mb/s
running 10s inbound test (server-to-client [S2C]) . . . . . . 85.01Mb/s
S2C throughput test: Received wrong type of the message
ERROR MSG: Server (S2C throughput test): Invalid S2C throughput received
S2C throughput test FAILED!
Server unable to determine bottleneck link type.
Information: Other network traffic is congesting the link
There are test results for both C2S and S2C tests, but an error is reported
anyway.
On the server there are also error messages (Protocol error! S2C throughput
test FAILED!)
Anyway, I've tried installing all the earlier 3.5.x versions, and it seems to me
that this bug was introduced with v3.5.3.
With v3.5.2 or v3.5.1 it works like a charm on the same system (Debian lenny,
2.6.26-web100).
v3.5.3, v3.5.4, v3.5.5 and v3.5.6 all produce the same error output as above.
Maybe that helps...
Regards,
Chris
Galuschka Christoph wrote:
> Hello Richard,
>
> I'm currently running the tests with kernel 2.6.30.5 and IE6. Java, if
> relevant, is 1.6.11.
> The funny thing is that I do get results in the web browser; the client
> just doesn't finish correctly.
>
> I will fix the wait() comment issue on Monday (sorry, I'm not the best
> at C).
>
> Regards
> Christoph
>
> -----------------------------------------
> Ing. Christoph Galuschka
>
> TIWAG-Tiroler Wasserkraft AG
> Bereich IT/Betrieb und Services
> Eduard-Wallnöfer-Platz 2
> 6010 Innsbruck
> T: +43 (0)50607 21832
> F: +43 (0)50607 41832
> www.tiroler-wasserkraft.at
> -----------------------------------------
> Firmenbuchgericht Innsbruck, FN 44133b
> Sitz der Gesellschaft: Innsbruck
> DVR: 0164089
>
> ------------------------------------------------------------------------
> *From:* Richard Carlson
> *Sent:* Sat 8/29/2009 15:28
> *To:* Galuschka Christoph
> *Cc:*
>
> *Subject:* Re: Error running web100 3.5.0
>
> Hi Chris;
>
> What browser are you using? What kernel are you using on the server?
> I'll try and duplicate this in my lab.
>
> The problem is that the server thinks that the server-to-client test
> failed, even though it completed successfully.
> More in-line
>
> On Aug 29, 2009, at 2:16 AM, Galuschka Christoph wrote:
>
>> Hello Richard,
>>
>> thanks for posting the new releases.
>> I've installed 3.5.6 and I still get some errors resulting in an
>> incomplete measurement. Here is the output from web100srv:
>> >>
>> ANL/Internet2 NDT ver 3.5.6
>> Variables file = /usr/local/ndt/web100_variables
>> log file = /usr/local/ndt/web100srv.log
>> Debug level set to 5
>> [snip snip snip]
> Everything was normal up to this point.
>> fwd.saddr = dd70b0a:3003, rev.saddr = f006c0a:3461
>> 01:02:56.724367 10.11.215.13:3003 --> 10.108.0.15:3461 Collected
>> pkt-pair data max = 18667
>> 01:02:56.724367 10.108.0.15:3461 --> 10.11.215.13:3003 Collected
>> pkt-pair data max = 65475
>> Read ' 1 0 0 0 4 661 18667 6971 5501 5721 0 5377 976.37 0 0 0 1 0 7'
>> from monitor pipe
>> Read ' 0 0 0 1 367 9334 40681 26321 35413 65475 39990 34285 663.83
>> 39864 40036 171967 0 39990 7' from monitor pipe
>> 550764 kbps inbound
> This is the measured s2c speed.
>
>> libweb100: warning: accessing depricated variable AckPktsIn
>> Variable 0 (AckPktsIn): web100_snap_read(): invalid arguments
>> libweb100: warning: accessing depricated variable AckPktsOut
> [snip snip snip]
> The server walks through the list of variables twice, once for the 'read'
> group and once for the 'tuning' group. You can ignore these errors - they
> are non-events.
>> >>> send_msg: type=5, len=18
>>
> [snip snip snip]
> The data was successfully sent back to the client.
>> Signal 11 received by process 3746
>> Signal 17 received by process 3741
> The child process received the terminate signal and the child process
> terminated.
>
>> Protocol error!
>> >>> send_msg: type=7, len=61
>> S2C throughput test FAILED!
> This says the s2c test failed and the server sent that message to the
> client; however, as noted above, the test succeeded.
>
>> Finished testing C2S = 690.88 Mbps, S2C = -0.00 Mbps
>> Client --> Server data detects link = OC-12
>> Client <-- Server Ack's detect link = Gigabit Ethernet
>> Server --> Client data detects link = OC-12
>> Server <-- Client Ack's detect link = OC-12
>> CWND limited test = 43453.26 while unlimited = -0.71
>> Better throughput when CWND is limited, may be duplex mismatch
>> >>> send_msg: type=8, len=42
>> >>> send_msg: type=8, len=76
>> >>> send_msg: type=8, len=89
>> >>> send_msg: type=8, len=77
>> >>> send_msg: type=8, len=82
>> >>> send_msg: type=8, len=53
>> >>> send_msg: type=9, len=0
>> Opened
>> '/usr/local/ndt/serverdata/2009/08/29/20090829T07:02:36.826169000Z_10.108.0.15:3444.meta'
>> metadata log file
>> Successfully returned from run_test() routine
>> Signal 17 received by process 3740
>> now = 1251529386 Process started at 1251529356, run time = 30
>> Select exited with rc = -1
>> Queue pointer = 3741, testing = 1, waiting = 1, zombie_check = 0
>> Received SIGCHLD signal for active web100srv process [3740]
>> wait3() returned 0 for PID=3741
>> wexitstatus = '0'
>> Attempting to clean up child 3741, head pid = 3741
>> Child process 3741 causing head pointer modification
>> Removing Child from head, decrementing waiting now = 0
>> Timer not running, waiting for new connection
> And everything exits properly.
>
>> >>
>> This is the result from the browser:
>> >>
>> Connecting to '10.11.215.13' [/10.11.215.13] to run test
>> Connected to: 10.11.215.13 -- Using IPv4 address
>> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
>> checking for firewalls . . . . . . . . . . . . . . . . . . . Done
>> running 10s outbound test (client-to-server [C2S]) . . . . . 690.87Mb/s
>> running 10s inbound test (server-to-client [S2C]) . . . . . . 550.79Mb/s
>> S2C throughput test: Received wrong type of the message
>> ERROR MSG: Server (S2C throughput test): Invalid S2C throughput received
>> S2C throughput test FAILED!
>> The slowest link in the end-to-end path is a a 622 Mbps OC-12 subnet
>> >>
>
> So the question is, why did the server mistakenly report an error?
>>
>> After running the test I re-read your email and checked
>> src/testoptions.c line 730 about the comment wait(NULL). I see the /*
>> */ are still there so I removed them and recompiled everything. This
>> did not help very much - here is output from web100srv with the /* */
>> removed:
>> >>
>>
> [snip snip snip]
>
> Sorry for not being clear. The 3.5.6 code uses the waitpid() call
> instead of the wait() call. They are functionally equivalent here, so
> when you removed the comment markers, you ended up waiting twice after
> the c2s test completes. This causes the server to time out, and no s2c
> test is run. Just remove or comment out one of the two wait()/waitpid()
> lines and rebuild/install to get back to one wait call.
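The point about waiting twice can be illustrated with a small standalone sketch. This is not the NDT source: the function names spawn_child() and errno_of_second_wait() are made up for illustration, and it assumes a POSIX system where the calling process has no other children. It reaps one child with waitpid(-1, NULL, 0), which behaves like wait(NULL), and then calls wait() a second time to show that there is nothing left to collect.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one child that exits immediately.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);               /* child: nothing to do, just exit */
    return pid;
}

/* Hypothetical helper: reap one child, then wait a second time.
 * Returns the errno set by the second wait(), 0 if that call
 * unexpectedly reaped something, or -1 if the setup itself failed. */
int errno_of_second_wait(void)
{
    if (spawn_child() < 0)
        return -1;

    /* First reap: waitpid(-1, NULL, 0) is equivalent to wait(NULL). */
    if (waitpid(-1, NULL, 0) < 0)
        return -1;

    /* Second reap: with no children left, this fails right away. */
    errno = 0;
    if (wait(NULL) == -1)
        return errno;           /* ECHILD when there is nothing to reap */
    return 0;
}
```

In a process with no other children, errno_of_second_wait() returns ECHILD because the second call has nothing to reap. If another child were still running, the second call would instead block until that child exits, which is one plausible reading of the timeout described above.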
>
> Regards;
> Rich
>
>>
>> I hope the debugging output helps...
>>
>> thanks and best regards
>> Christoph Galuschka
>>
>> ------------------------------------------------------------------------
>> *From:* Richard Carlson
>> *Sent:* Fri 8/28/2009 14:25
>> *To:* Galuschka Christoph
>> *Cc:*
>> *Subject:* Re: Error running web100 3.5.0
>>
>> Hi Chris;
>>
>> Sorry about that. There is a bug in the 3.5.0 release. You can
>> download the latest version (3.5.6, which will be posted soon) or you
>> can easily patch the 3.5.0 release. Just edit the src/testoptions.c
>> file and go to line 730. You should find the line /* wait(NULL); */
>> - which of course makes this a comment. Remove the "/*" and "*/"
>> characters (so it's no longer a comment) and rebuild/reinstall the
>> package. This should clear the fault.
>>
>> Rich
>>
>> On Aug 27, 2009, at 4:47 AM,
>>
>> <mailto:>
>> wrote:
>>
>> > Hello,
>> >
>> > I've just finished installing ndt-3.5.0 on a fresh SuSE 11.1 system
>> > (incl. all prerequisites: patch for kernel 2.6.27,
>> > web100_userland-1.7). The server runs fine and I do get bandwidth results.
>> >
>> > However, web100srv produces an error and the test fails to complete
>> > successfully. This is the output from -ddd:
>> > ANL/Internet2 NDT ver 3.5.0
>> > Variables file = /usr/local/ndt/web100_variables
>> > log file = /usr/local/ndt/web100srv.log
>> > Debug level set to 1
>> > server ready on port 3001
>> > web100_init() read 69 variables from file
>> > Starting test suite:
>> >> Middlebox test
>> >> Simple firewall test
>> >> C2S throughput test
>> >> S2C throughput test
>> > <-- Middlebox test -->
>> > -- port: 3003
>> > Sending 1456 Byte packets over the network
>> > Signal 17 received by process 22352
>> > <-------------------->
>> > <-- Simple firewall test -->
>> > -- port: 42133
>> > -- time: 1
>> > -- oport: 2571
>> > <-------------------------->
>> > <-- C2S throughput test -->
>> > -- port: 3002
>> > listening for Inet connection on testOptions->c2ssockfd, fd=3
>> > Sending 'GO' signal, to tell client to head for the next test
>> > Opening network interface 'eth2' for packet-pair timing
>> > installing pkt filter for 'host 10.110.109.104 and port 2574'
>> > Initial pkt src data = 8068484
>> > New packet trace started -- initializing counters
>> > 365314 kbps outbound
>> > Signal USR1(10) sent to child [22355]
>> > Signal 10 received by process 22355
>> > 03:16:15.649224 03:16:15.649224 128 bytes read ' 0 0 84 694
>> > 7815 18212 77975 16876 70005 144937 1 1558 232.14 0 0 0 1 0' from
>> > monitor pipe
>> > 128 bytes read ' 1 0 0 99 558 1644 45429 40869 16975 3745 1 1
>> > 274.82 86 14 109221 0 0' from monitor pipe
>> > <------------------------->
>> > <-- S2C throughput test -->
>> > -- port: 3003
>> > waiting for data on testOptions->s2csockfd
>> > Signal 11 received by process 22355
>> > Signal 17 received by process 22352
>> > Opening network interface 'eth2' for packet-pair timing
>> > installing pkt filter for 'host 10.110.109.104 and port 2580'
>> > Initial pkt src data = 8068484
>> > Signal 17 received by process 22352
>> > New packet trace started -- initializing counters
>> > sent 716955648 bytes to client in 10.00 seconds
>> > Buffer control counters Total = 87519, new data = 0, Draining Queue
>> > = 0
>> > Signal USR2(12) sent to child [22357]
>> > Signal 12 received by process 22357
>> > 03:16:26.019890 03:16:26.019890 Read ' 0 0 1 1 5 6 9612 1476
>> > 1235 1247 0 4235 454.46 0 0 0 1 0' from monitor pipe
>> > Read ' 0 0 0 3 1179 9323 46700 9154 13887 148207 4687 19061 273.91
>> > 10056 242043 102 0 4687' from monitor pipe
>> > 573470 kbps inbound
>> > libweb100: warning: accessing depricated variable AckPktsIn
>> > libweb100: warning: accessing depricated variable AckPktsOut
>> > Variable 13 (CwndRestores) not found in KIS
>> > Variable 22 (MaxCaCwnd) not found in KIS
>> > Variable 30 (MaxSaCwnd) not found in KIS
>> > Variable 13 (CwndRestores) not found in KIS
>> > Variable 22 (MaxCaCwnd) not found in KIS
>> > Variable 30 (MaxSaCwnd) not found in KIS
>> > Signal 11 received by process 22357
>> > Signal 17 received by process 22352
>> > Protocol error!
>> > S2C throughput test FAILED!
>> > Client --> Server data detects link = 10 Gigabit Enet
>> > Client <-- Server Ack's detect link = OC-12
>> > Server --> Client data detects link = OC-12
>> > Server <-- Client Ack's detect link = 10 Gigabit Enet
>> >
>> > If I'm not mistaken, the 2 lines:
>> >>>
>> > libweb100: warning: accessing depricated variable AckPktsIn
>> > libweb100: warning: accessing depricated variable AckPktsOut
>> >>>
>> > probably are the source of the problem.
>> >
>> > Any ideas what I might have missed?
>> >
>> > thanks and best regards
>> > Christoph
>>
>> Richard Carlson
>> 1000 Oakbrook Dr
>> Ann Arbor, MI 48104
>>
>> P: 734-352-7043
>> C: 630-251-4572
>>
>
> Richard Carlson
> 1000 Oakbrook Dr
> Ann Arbor, MI 48104
>
> P: 734-352-7043
> C: 630-251-4572
>