
ndt-users - Re: Slow Inbound Tests

  • From: Clayton Keller
  • Subject: Re: Slow Inbound Tests
  • Date: Mon, 17 Oct 2005 08:44:43 -0500

Richard Carlson wrote:
Hi Clay;

The trace you sent does show a problem. At this point I don't see a need for more, but it would be useful to see what the 2.6.12 kernel does. So I'd suggest you revert back to the 2.6.12 kernel and I'll try and figure out how to get the kernel problem resolved.

Rich

At 09:21 AM 10/17/2005, Clayton Keller wrote:

Richard Carlson wrote:

Hi Craig;
No, this NDT bug affects all servers. I ran into it while testing from multiple clients. Clients 2, 3, & 4 would get the "Other client testing please wait..." type message. Client 2 would not get the final results until client 4 finished. I'll add this patch to my next distribution, or you can apply it now if you are experiencing problems.
Since this didn't fix Clay's problem, I may need to rethink how the tests are done. Right now the server simply streams data out for 10 seconds, sending as much as it can. Given the way TCP works, there is a good chance that the server will build up a queue in the send buffer (the bus is faster than the wire). This buffer has to drain before the test is complete. Packet loss or other factors could mean that this draining takes a long time, so the client simply sits there waiting. If it takes too long, the server process will time out and terminate, so the client will never get the final results.
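As a rough illustration of the queueing described above, here is a minimal C sketch. It is not taken from web100srv: it uses a local socketpair() in place of a TCP connection, and the helper names fill_send_queue and drain_until_eof are hypothetical. It only shows that send() returns once the kernel has accepted the data, so whatever has been queued still has to drain to the receiver after the sender has stopped.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push data into fd until the kernel refuses more.
 * Leaves fd in non-blocking mode. Returns bytes accepted, or -1 on error. */
long fill_send_queue(int fd)
{
    char buf[4096];
    long total = 0;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    memset(buf, 'x', sizeof buf);

    for (;;) {
        ssize_t n = send(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;          /* accepted by the kernel, not yet read */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;               /* queue is full: the sender must wait */
        if (n < 0 && errno == EINTR)
            continue;
        return -1;
    }
    return total;
}

/* Hypothetical helper: read from fd until the peer signals end of stream.
 * Returns the number of bytes read, or -1 on error. */
long drain_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            break;               /* EOF: everything queued has arrived */
        if (errno == EINTR)
            continue;
        return -1;
    }
    return total;
}
```

To try it, create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call fill_send_queue(sv[0]), then shutdown(sv[0], SHUT_WR), and compare the result with drain_until_eof(sv[1]). On a real network path the drain rate is set by the wire and by any loss recovery, which is where the long waits come from.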
More later.
Rich
At 08:26 AM 10/14/2005, Pepmiller, Craig E. wrote:

Ok, so this is only seen when the NDT machine is configured for multiple
simultaneous clients?

Thanks-
-Craig

-----Original Message-----
From: Richard Carlson
Sent: Wednesday, October 12, 2005 2:56 PM
To: Clayton Keller;

Subject: Re: Slow Inbound Tests

Hi Clayton;

This is a bug in the web100srv code. I forgot to shut down the control
socket at the end of the test. If there are multiple clients, the
final results are sent in a LIFO manner, so the first client has to
wait until all subsequent clients are done before its results are
returned.

I'll issue a patched version soon. In the meantime, you can patch your
version by hand by adding the line "shutdown(ctlsockfd, SHUT_RDWR);" to
the web100srv.c file (on line 1126).

Let me know if that fixes things.

Rich


---------------------------------------------------------------
Original code:
    if (admin_view == 1) {
        totalcnt = calculate(SumRTT, CountRTT, CongestionSignals,
                             PktsOut, DupAcksIn, AckPktsIn,
                             CurrentMSS, SndLimTimeRwin, SndLimTimeCwnd,
                             SndLimTimeSender,
                             MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut,
                             mismatch, bad_cable,
                             (int)bwout, (int)bwin, c2sdata, s2cack, 1,
                             debug);
        gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans,
                 Timeouts,
                 Sndbuf, MaxRwinRcvd, CurrentCwnd, mismatch,
                 bad_cable, totalcnt,
                 debug);
    }

    /* printf("Saved data to log file\n"); */

    /* exit(0); */
}

main(argc, argv)

----------------------------------------------------------
Modified code
    if (admin_view == 1) {
        totalcnt = calculate(SumRTT, CountRTT, CongestionSignals,
                             PktsOut, DupAcksIn, AckPktsIn,
                             CurrentMSS, SndLimTimeRwin, SndLimTimeCwnd,
                             SndLimTimeSender,
                             MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut,
                             mismatch, bad_cable,
                             (int)bwout, (int)bwin, c2sdata, s2cack, 1,
                             debug);
        gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans,
                 Timeouts,
                 Sndbuf, MaxRwinRcvd, CurrentCwnd, mismatch,
                 bad_cable, totalcnt,
                 debug);
    }
    shutdown(ctlsockfd, SHUT_RDWR);
    /* printf("Saved data to log file\n"); */

    /* exit(0); */
}

main(argc, argv)
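
To see what the added shutdown() call does for the waiting client, here is a small standalone C sketch. It is not web100srv code: the helpers send_results_and_shutdown and read_until_eof are hypothetical, and only the name ctlsockfd and the shutdown(ctlsockfd, SHUT_RDWR) call come from the patch. The point is that once the server side shuts the control socket down, the client's read loop sees end of stream right after the results instead of blocking.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical server-side helper: send the final results, then shut the
 * control socket down in both directions, as the patch does.
 * Returns 0 on success, -1 on error. */
int send_results_and_shutdown(int ctlsockfd, const char *results)
{
    size_t left = strlen(results);

    while (left > 0) {
        ssize_t n = send(ctlsockfd, results, left, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        results += n;
        left -= (size_t)n;
    }
    return shutdown(ctlsockfd, SHUT_RDWR);
}

/* Hypothetical client-side helper: read into out (capacity cap) until the
 * peer ends the stream, then NUL-terminate.
 * Returns the number of bytes read, or -1 on error or if out is too small. */
long read_until_eof(int fd, char *out, size_t cap)
{
    size_t used = 0;

    for (;;) {
        if (used + 1 >= cap)
            return -1;           /* no room left to read up to EOF */
        ssize_t n = recv(fd, out + used, cap - 1 - used, 0);
        if (n > 0) {
            used += (size_t)n;
            continue;
        }
        if (n == 0)
            break;               /* peer shut down: results are complete */
        if (errno == EINTR)
            continue;
        return -1;
    }
    out[used] = '\0';
    return (long)used;
}
```

Without the shutdown() (or a close()) on the server side, read_until_eof would keep blocking in recv() after the results arrive, which matches the symptom of a client waiting for a test that has already finished.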




At 01:54 PM 10/12/2005, Clayton Keller wrote:
>I wanted to address this to the list. I believe there was a similar
>post a week or so back, but I wanted to address this clean.
>
>I currently have web100srv running from /etc/init.d/ndt with the
>following:
>
>/usr/local/sbin/web100srv -a -m -l /var/log/web100/web100srv.log
>
>The system is running on Fedora Core 4 using a patched 2.6.13 kernel
>from kernel.org.
>
>The server itself is also sitting behind a PIX firewall.
>
>We have noticed that the Outbound Test will run rather quickly, but
>when the Inbound (server to client) test is run, it can take upwards
>of several minutes to complete, many times as much as 4 minutes. There
>are other times where, from the end user's point of view, it appears
>the test never completes, although you can see results for the test
>appear in the web100.log file. The test, though, will continue to sit
>on the "running 10s inbound test (server to client) . . . . . ."
>portion of the test, and many users are beginning to just close out
>the window.
>
>At this point I am looking for general issues that I can look into and
>possibly run debug against as far as these tests are concerned.
>
>Clayton Keller

------------------------------------


Richard

Did you want me to grab any more traces on newer versions of the 2.6.13.x kernel or more on the current kernel it is running? Or should I revert back to my 2.6.12.5 kernel and see how performance improves?

I saw from an earlier post to a different thread that you appear to be seeing some items in the traces that allude to issues with the 2.6.13.x kernel.

Clay


------------------------------------

Richard,

I will get that server running on that kernel as soon as I can. I'll let you know what we see test-wise when I have a chance. Would you like any traces on the 2.6.12.5 kernel as well, or should I wait to see whether there are any issues first?


