
ndt-users - Re: Error running web100 3.5.0


  • From: Richard Carlson <>
  • To: Chris Welti <>
  • Cc: "" <>
  • Subject: Re: Error running web100 3.5.0
  • Date: Tue, 1 Sep 2009 08:28:25 -0500

Hi Chris;

Thanks for the update. I think I released a bad version of 3.5.6. I'll get a new release out in the next few hours.

Sorry for the disruption.

Rich

On Sep 1, 2009, at 7:39 AM, Chris Welti wrote:

Hi Rich,

I've just upgraded one of our NDT servers from 3.4.4 to 3.5.6.
I'm observing the same behaviour as listed in this thread:

** Starting test 1 of 1 **
Connected to: lsmp2 -- Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client-to-server [C2S]) . . . . . 91.78Mb/s
running 10s inbound test (server-to-client [S2C]) . . . . . . 85.01Mb/s
S2C throughput test: Received wrong type of the message
ERROR MSG: Server (S2C throughput test): Invalid S2C throughput received
S2C throughput test FAILED!
Server unable to determine bottleneck link type.
Information: Other network traffic is congesting the link

There are test results for both C2S and S2C tests, but an error is reported anyway.
On the server there are also error messages ("Protocol error!", "S2C throughput test FAILED!").

Anyway, I had previously tried installing all the 3.5.x versions, and it seems to me that this bug was introduced in v3.5.3.
With v3.5.2 or v3.5.1 it works like a charm on the same system (Debian lenny, 2.6.26-web100).
v3.5.3, v3.5.4, v3.5.5 and v3.5.6 all produce the same error output as above.

Maybe that helps... Regards,
Chris

Galuschka Christoph wrote:
Hello Richard,

I'm currently running the tests with kernel 2.6.30.5 and IE6. Java, if
relevant, is 1.6.11.
The odd thing is that I do get results in the web browser; the client
just doesn't finish correctly.

I will fix the wait() comment on Monday (sorry, I'm not the best
at C).

Regards
Christoph

-----------------------------------------
Ing. Christoph Galuschka

TIWAG-Tiroler Wasserkraft AG
Bereich IT/Betrieb und Services
Eduard-Wallnöfer-Platz 2
6010 Innsbruck
T: +43 (0)50607 21832
F: +43 (0)50607 41832
www.tiroler-wasserkraft.at
-----------------------------------------
Commercial Register Court Innsbruck, FN 44133b
Registered office: Innsbruck
DVR: 0164089

------------------------------------------------------------------------
*From:* Richard Carlson
*Sent:* Sat 8/29/2009 15:28
*To:* Galuschka Christoph
*Cc:*

*Subject:* Re: Error running web100 3.5.0

Hi Chris;

What browser are you using? What kernel are you using on the server?
I'll try and duplicate this in my lab.

The problem is that the server thinks that the server-to-client test
failed, even though it completed successfully.
More inline below.

On Aug 29, 2009, at 2:16 AM, Galuschka Christoph wrote:

Hello Richard,

Thanks for posting the new releases.
I've installed 3.5.6 and I still get some errors resulting in an
incomplete measurement. Here is the output from web100srv:

ANL/Internet2 NDT ver 3.5.6
Variables file = /usr/local/ndt/web100_variables
log file = /usr/local/ndt/web100srv.log
Debug level set to 5
[snip snip snip]
Everything was normal up to this point.
fwd.saddr = dd70b0a:3003, rev.saddr = f006c0a:3461
01:02:56.724367 10.11.215.13:3003 --> 10.108.0.15:3461 Collected pkt-pair data max = 18667
01:02:56.724367 10.108.0.15:3461 --> 10.11.215.13:3003 Collected pkt-pair data max = 65475
Read ' 1 0 0 0 4 661 18667 6971 5501 5721 0 5377 976.37 0 0 0 1 0 7' from monitor pipe
Read ' 0 0 0 1 367 9334 40681 26321 35413 65475 39990 34285 663.83 39864 40036 171967 0 39990 7' from monitor pipe
550764 kbps inbound
This is the measured S2C speed.
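The hex saddr values in the fwd/rev line of this log are the same two endpoints that appear in the packet-pair lines (10.11.215.13 and 10.108.0.15). A minimal sketch for decoding them, assuming the debug code prints the raw IPv4 address word (stored in network byte order) as a host integer on a little-endian machine, so the first octet ends up in the low byte; `saddr_to_text` is a hypothetical helper, not part of NDT:

```c
#include <assert.h>   /* only for the usage check */
#include <stdint.h>
#include <stdio.h>
#include <string.h>   /* only for the usage check */

/* Hypothetical helper: turn a logged saddr word into dotted-quad text.
   Assumes the first octet is in the least significant byte, which is how a
   network-order address looks when printed as an integer on x86. */
void saddr_to_text(uint32_t raw, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)(raw & 0xffu),
             (unsigned)((raw >> 8) & 0xffu),
             (unsigned)((raw >> 16) & 0xffu),
             (unsigned)((raw >> 24) & 0xffu));
}
```

For example, `saddr_to_text(0xdd70b0a, buf)` yields "10.11.215.13" (the server side, port 3003) and `saddr_to_text(0xf006c0a, buf)` yields "10.108.0.15" (the client side, port 3461).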

libweb100: warning: accessing depricated variable AckPktsIn
Variable 0 (AckPktsIn): web100_snap_read(): invalid arguments
libweb100: warning: accessing depricated variable AckPktsOut
[snip snip snip]
The server walks through the list of variables twice, once for the 'read' group and once for the 'tuning' group. You can ignore these errors; they are harmless.
send_msg: type=5, len=18

[snip snip snip]
The data was successfully sent back to the client.
Signal 11 received by process 3746
Signal 17 received by process 3741
The child process received the terminate signal and exited.
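The numbers in the "Signal N received" lines are platform-specific. On Linux, where the log's own "USR1(10)" and "USR2(12)" labels agree with <signal.h>, 11 is SIGSEGV, 17 is SIGCHLD, and SIGTERM is 15. A small sketch for checking the mapping on the server itself; `logged_signal_name` and `print_logged_signals` are hypothetical helpers, and the comments assume Linux numbering:

```c
#define _POSIX_C_SOURCE 200809L  /* declares strsignal() under strict C modes */
#include <assert.h>   /* only for the usage check */
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Map a signal number seen in web100srv output to its <signal.h> name. */
const char *logged_signal_name(int signo)
{
    switch (signo) {
    case SIGUSR1: return "SIGUSR1";  /* 10 on Linux: "Signal USR1(10) sent to child" */
    case SIGSEGV: return "SIGSEGV";  /* 11 on Linux: "Signal 11 received by process" */
    case SIGUSR2: return "SIGUSR2";  /* 12 on Linux: "Signal USR2(12) sent to child" */
    case SIGTERM: return "SIGTERM";  /* 15 on Linux */
    case SIGCHLD: return "SIGCHLD";  /* 17 on Linux: "Signal 17 received by process" */
    default:      return "other";
    }
}

/* Print each number from the logs with the local C library's description. */
void print_logged_signals(void)
{
    const int seen[] = { 10, 11, 12, 17 };
    for (size_t i = 0; i < sizeof seen / sizeof seen[0]; i++)
        printf("%2d %-8s %s\n", seen[i], logged_signal_name(seen[i]),
               strsignal(seen[i]));
}
```

Calling `print_logged_signals()` on the NDT host shows how that system numbers the signals, which is worth checking before reading meaning into a bare number in the log.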

Protocol error!
send_msg: type=7, len=61
S2C throughput test FAILED!
This says the S2C test failed, and the server sent that message to the
client. However, as noted above, the test succeeded.
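The client's "Received wrong type of the message" is consistent with it waiting for a test message and getting the type=7 message the server logs here instead. A sketch of that check, assuming the NDT control protocol frames each message as a 1-byte type followed by a 2-byte big-endian length, and assuming the type codes TEST_MSG = 5, MSG_ERROR = 7, MSG_RESULTS = 8 and MSG_LOGOUT = 9 (these fit the send_msg type= values in the logs, but the thread does not confirm them); `struct msg_hdr`, `parse_hdr` and `expect_type` are hypothetical names:

```c
#include <assert.h>   /* only for the usage check */
#include <stddef.h>
#include <stdint.h>

/* Assumed NDT message type codes (see the note above). */
enum { TEST_MSG = 5, MSG_ERROR = 7, MSG_RESULTS = 8, MSG_LOGOUT = 9 };

/* Hypothetical header layout: 1-byte type, 2-byte big-endian body length. */
struct msg_hdr {
    uint8_t  type;
    uint16_t len;
};

/* Decode the 3-byte header at buf; returns 0 on success, -1 if too short. */
int parse_hdr(const unsigned char *buf, size_t n, struct msg_hdr *h)
{
    if (n < 3)
        return -1;
    h->type = buf[0];
    h->len = (uint16_t)((buf[1] << 8) | buf[2]);  /* network byte order */
    return 0;
}

/* Nonzero if the header carries the type the caller was waiting for; a
   client that gets anything else would report a wrong message type. */
int expect_type(const struct msg_hdr *h, uint8_t expected)
{
    return h->type == expected;
}
```

Under these assumptions, the header for "send_msg: type=7, len=61" is the bytes {7, 0, 61}; it parses as MSG_ERROR with a 61-byte body and fails `expect_type(&h, TEST_MSG)`.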

Finished testing C2S = 690.88 Mbps, S2C = -0.00 Mbps
Client --> Server data detects link = OC-12
Client <-- Server Ack's detect link = Gigabit Ethernet
Server --> Client data detects link = OC-12
Server <-- Client Ack's detect link = OC-12
CWND limited test = 43453.26 while unlimited = -0.71
Better throughput when CWND is limited, may be duplex mismatch
send_msg: type=8, len=42
send_msg: type=8, len=76
send_msg: type=8, len=89
send_msg: type=8, len=77
send_msg: type=8, len=82
send_msg: type=8, len=53
send_msg: type=9, len=0
Opened '/usr/local/ndt/serverdata/2009/08/29/20090829T07:02:36.826169000Z_10.108.0.15:3444.meta' metadata log file
Successfully returned from run_test() routine
Signal 17 received by process 3740
now = 1251529386 Process started at 1251529356, run time = 30
Select exited with rc = -1
Queue pointer = 3741, testing = 1, waiting = 1, zombie_check = 0
Received SIGCHLD signal for active web100srv process [3740]
wait3() returned 0 for PID=3741
wexitstatus = '0'
Attempting to clean up child 3741, head pid = 3741
Child process 3741 causing head pointer modification
Removing Child from head, decrementing waiting now = 0
Timer not running, waiting for new connection
And everything exits properly.


This is the result from the browser:

Connecting to '10.11.215.13' [/10.11.215.13] to run test
Connected to: 10.11.215.13 -- Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
checking for firewalls . . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client-to-server [C2S]) . . . . . 690.87Mb/s
running 10s inbound test (server-to-client [S2C]) . . . . . . 550.79Mb/s
S2C throughput test: Received wrong type of the message
ERROR MSG: Server (S2C throughput test): Invalid S2C throughput received
S2C throughput test FAILED!
The slowest link in the end-to-end path is a a 622 Mbps OC-12 subnet


So the question is, why did the server mistakenly report an error?

After running the test, I re-read your email and checked the commented-out
wait(NULL) at line 730 of src/testoptions.c. The /* */ were still there, so I
removed them and recompiled everything. This did not help much; here is the
output from web100srv with the /* */ removed:


[snip snip snip]

Sorry for not being clear. The 3.5.6 code uses a waitpid() call instead of
the wait() call. The two are functionally equivalent here, so by removing the
comment you now wait twice after the C2S test completes. This causes the
server to time out, and no S2C test is run. Just remove or comment out one of
the two wait()/waitpid() lines and rebuild/reinstall to get back to a single
wait call.
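POSIX defines wait(NULL) as equivalent to waitpid(-1, NULL, 0): either call reaps whichever child has exited. Once the C2S child has been reaped, a second call has nothing left to collect from that test, so it blocks until some other child exits (or fails with ECHILD if there are none). A minimal sketch of that effect, using WNOHANG on the second call so it reports instead of blocking; the two forked children are hypothetical stand-ins for the C2S child and another long-running process, not NDT's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>   /* only for the usage check */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns what a second, non-blocking wait sees after the first wait has
   already reaped the finished child: 0 means another child is still running,
   so a blocking wait(NULL) at this point would sit until that child exits. */
int second_wait_result(void)
{
    pid_t finished = fork();
    if (finished < 0)
        return -2;
    if (finished == 0)
        _exit(0);                   /* stand-in for the completed C2S child */

    pid_t running = fork();
    if (running < 0) {
        waitpid(finished, NULL, 0);
        return -2;
    }
    if (running == 0) {
        sleep(5);                   /* stand-in for a child that is still busy */
        _exit(0);
    }

    waitpid(finished, NULL, 0);     /* first wait: reaps the C2S stand-in */
    int r = (int)waitpid(-1, NULL, WNOHANG);  /* second wait: same target as wait(NULL) */

    kill(running, SIGKILL);         /* tidy up the demo child */
    waitpid(running, NULL, 0);
    return r;
}
```

`second_wait_result()` returns 0 here: the extra wait finds no exited child, so its blocking form would stall for as long as the other child runs, which matches the timeout described above.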

Regards;
Rich


I hope the debugging output helps...

thanks and best regards
Christoph Galuschka

------------------------------------------------------------------------
*From:* Richard Carlson [mailto:]
*Sent:* Fri 8/28/2009 14:25
*To:* Galuschka Christoph
*Cc:* <mailto:>

*Subject:* Re: Error running web100 3.5.0

Hi Chris;

Sorry about that. There is a bug in the 3.5.0 release. You can
download the latest version (3.5.6, which will be posted soon), or you
can easily patch the 3.5.0 release. Just edit the src/testoptions.c
file and go to line 730. You should find the line /* wait(NULL); */,
which of course makes it a comment. Remove the "/*" and "*/"
characters (so it is no longer a comment) and rebuild/reinstall the
package. This should clear up this fault.
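The thread does not show the code around line 730, but the effect of a single wait(NULL) on an exited child can be sketched: until the parent waits, the child lingers as a zombie; one wait reaps it. This minimal demo, with the hypothetical function `wait_reaps_zombie`, uses waitid() with WNOWAIT only to pause until the child has exited without reaping it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>   /* only for the usage check */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if an exited child stays in the process table (a zombie) until
   the parent waits, and is gone after exactly one wait(NULL). */
int wait_reaps_zombie(void)
{
    pid_t child = fork();
    if (child < 0)
        return 0;
    if (child == 0)
        _exit(0);                               /* child finishes and exits */

    siginfo_t info;
    /* Block until the child has exited, but leave it unreaped (WNOWAIT). */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) != 0)
        return 0;
    int zombie_before = (kill(child, 0) == 0);  /* entry still exists */

    wait(NULL);                                 /* the single wait: reaps it */
    int gone_after = (kill(child, 0) == -1 && errno == ESRCH);
    return zombie_before && gone_after;
}
```

`wait_reaps_zombie()` returns 1: the exited child is still visible to kill(pid, 0) until the one wait(NULL), and afterwards the PID no longer exists.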

Rich

On Aug 27, 2009, at 4:47 AM, <mailto:> wrote:

Hello,

I've just finished installing ndt-3.5.0 on a fresh SuSE 11.1 system
(including all prerequisites: the patch for kernel 2.6.27 and
web100_userland-1.7). The server runs fine and I do get bandwidth results.

However, web100srv produces an error that prevents the test from
completing successfully. This is the output from -ddd:
ANL/Internet2 NDT ver 3.5.0
Variables file = /usr/local/ndt/web100_variables
log file = /usr/local/ndt/web100srv.log
Debug level set to 1
server ready on port 3001
web100_init() read 69 variables from file
Starting test suite:
Middlebox test
Simple firewall test
C2S throughput test
S2C throughput test
<-- Middlebox test -->
-- port: 3003
Sending 1456 Byte packets over the network
Signal 17 received by process 22352
<-------------------->
<-- Simple firewall test -->
-- port: 42133
-- time: 1
-- oport: 2571
<-------------------------->
<-- C2S throughput test -->
-- port: 3002
listening for Inet connection on testOptions->c2ssockfd, fd=3
Sending 'GO' signal, to tell client to head for the next test
Opening network interface 'eth2' for packet-pair timing
installing pkt filter for 'host 10.110.109.104 and port 2574'
Initial pkt src data = 8068484
New packet trace started -- initializing counters
365314 kbps outbound
Signal USR1(10) sent to child [22355]
Signal 10 received by process 22355
03:16:15.649224 03:16:15.649224 128 bytes read ' 0 0 84 694 7815 18212 77975 16876 70005 144937 1 1558 232.14 0 0 0 1 0' from monitor pipe
128 bytes read ' 1 0 0 99 558 1644 45429 40869 16975 3745 1 1 274.82 86 14 109221 0 0' from monitor pipe
<------------------------->
<-- S2C throughput test -->
-- port: 3003
waiting for data on testOptions->s2csockfd
Signal 11 received by process 22355
Signal 17 received by process 22352
Opening network interface 'eth2' for packet-pair timing
installing pkt filter for 'host 10.110.109.104 and port 2580'
Initial pkt src data = 8068484
Signal 17 received by process 22352
New packet trace started -- initializing counters
sent 716955648 bytes to client in 10.00 seconds
Buffer control counters Total = 87519, new data = 0, Draining Queue = 0
Signal USR2(12) sent to child [22357]
Signal 12 received by process 22357
03:16:26.019890 03:16:26.019890 Read ' 0 0 1 1 5 6 9612 1476 1235 1247 0 4235 454.46 0 0 0 1 0' from monitor pipe
Read ' 0 0 0 3 1179 9323 46700 9154 13887 148207 4687 19061 273.91 10056 242043 102 0 4687' from monitor pipe
573470 kbps inbound
libweb100: warning: accessing depricated variable AckPktsIn
libweb100: warning: accessing depricated variable AckPktsOut
Variable 13 (CwndRestores) not found in KIS
Variable 22 (MaxCaCwnd) not found in KIS
Variable 30 (MaxSaCwnd) not found in KIS
Variable 13 (CwndRestores) not found in KIS
Variable 22 (MaxCaCwnd) not found in KIS
Variable 30 (MaxSaCwnd) not found in KIS
Signal 11 received by process 22357
Signal 17 received by process 22352
Protocol error!
S2C throughput test FAILED!
Client --> Server data detects link = 10 Gigabit Enet
Client <-- Server Ack's detect link = OC-12
Server --> Client data detects link = OC-12
Server <-- Client Ack's detect link = 10 Gigabit Enet

If I'm not mistaken, the two lines:

libweb100: warning: accessing depricated variable AckPktsIn
libweb100: warning: accessing depricated variable AckPktsOut

are probably the source of the problem.

Any ideas what I might have missed?

thanks and best regards
Christoph

Richard Carlson


<mailto:>
1000 Oakbrook Dr
Ann Arbor, MI 48104

P: 734-352-7043
C: 630-251-4572
