ndt-users - Re: Slow Inbound Tests

Re: Slow Inbound Tests


  • From: Richard Carlson <>
  • To: Clayton Keller <>,
  • Subject: Re: Slow Inbound Tests
  • Date: Thu, 13 Oct 2005 21:59:54 -0400

Hi Clay;

I ran some tests from my office and again later from home a few minutes ago. So far nothing is jumping out at me. From work I ran 4 or 5 tests and I don't recall anything more than I said earlier this evening.

From home I cross my 802.11b wireless link to the cable modem box and then on to your server. I'm getting typical results (400 Kb upload and 4 Mb download speed reported). However, I did see a noticeable delay from the time the inbound test started until the results showed up. I turned on the Java console and noticed something strange. The client writes out the raw results of the inbound test, and I'm seeing exactly the same number of bytes (16,678,912) being sent on every test. I also see that it is taking about 32 seconds to complete this transfer. However, if you look at the NDT "more details" page you will find a Duration value that is around 12 seconds.
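As a rough check on those numbers, here is a small sketch of the arithmetic. It assumes the speeds quoted above are in megabits per second with 1 Mb = 1,000,000 bits; the function names are made up for illustration and are not part of NDT.

```c
#include <assert.h>
#include <stdio.h>

/* Average throughput in megabits per second (1 Mb = 1,000,000 bits).
 * Hypothetical helper for illustration only; not NDT code. */
double throughput_mbps(long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes * 8.0 / seconds / 1e6;
}

/* Prints the rate implied by each of the two timings quoted above. */
void print_rates(void)
{
    const long long bytes = 16678912LL; /* byte count from the Java console */

    printf("over 32 s: %.2f Mb/s\n", throughput_mbps(bytes, 32.0));
    printf("over 12 s: %.2f Mb/s\n", throughput_mbps(bytes, 12.0));
}
```

Under that assumption, the 32-second transfer works out to about 4.17 Mb/s, which is close to the 4 Mb download speed reported, while a 12-second Duration would imply about 11.12 Mb/s for the same byte count.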

The more I look at this the more suspicious I get about this possible bug in the Linux 2.6.13.2 kernel. Would it be possible to back up to the 2.6.12 kernel? The other thing that would help is to run the web100srv process with the -t option. This option will create a TCPDUMP formatted file (ndttrace.ip.port) for each test. I'd like to see what the outbound file looks like.

Here's what's going on. Yesterday I was working on a revised duplex-mismatch detection algorithm and I noticed some strange behavior in the traces. I'm seeing the kernel blast out data without using the standard TCP slow start algorithms. Now, the 2.6.13 kernel introduced a new modular congestion control scheme, replacing the old fixed scheme used up until the 2.6.12 kernel. What makes this even stranger is that the behavior of the server changes depending on the client (Windows and FreeBSD are bad, while another Linux client looks OK). The problems I see are aggravated by the mismatch condition causing loss. I'm wondering if the problem you are seeing is related. (Why else would the server send exactly the same number of bytes 3 times in a row?)

So it would help if you could send me one or two of the ndttrace files the "-t" option creates, especially from tests that show a long delay. It would also help if you could look at the Java console output to see if you are also getting a fixed amount of data (16 MB) on every test. Lastly, it would help if you could build a 2.6.12 kernel and let me know what happens (assuming this really is an FC-4 host with the 2.6.13 kernel).

Regards;
Rich

At 04:11 PM 10/13/2005, Clayton Keller wrote:

Clayton Keller wrote:
Richard Carlson wrote:

Hi Clayton;

This is a bug in the web100srv code. I forgot to shut down the control socket at the end of the test. If there are multiple clients then the final results are sent in a LIFO manner, so the first client needs to wait until all subsequent clients are done before the results are returned.

I'll issue a patched version soon. In the meantime you can patch your version by hand by adding the line "shutdown(ctlsockfd, SHUT_RDWR);" to the web100srv.c file (on line 1126).

Let me know if that fixes things.

Rich


---------------------------------------------------------------
Original code:
    if (admin_view == 1) {
        totalcnt = calculate(SumRTT, CountRTT, CongestionSignals, PktsOut, DupAcksIn, AckPktsIn,
                             CurrentMSS, SndLimTimeRwin, SndLimTimeCwnd, SndLimTimeSender,
                             MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut, mismatch, bad_cable,
                             (int)bwout, (int)bwin, c2sdata, s2cack, 1, debug);
        gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans, Timeouts,
                 Sndbuf, MaxRwinRcvd, CurrentCwnd, mismatch, bad_cable, totalcnt,
                 debug);
    }

    /* printf("Saved data to log file\n"); */

    /* exit(0); */
}

main(argc, argv)

----------------------------------------------------------
Modified code
    if (admin_view == 1) {
        totalcnt = calculate(SumRTT, CountRTT, CongestionSignals, PktsOut, DupAcksIn, AckPktsIn,
                             CurrentMSS, SndLimTimeRwin, SndLimTimeCwnd, SndLimTimeSender,
                             MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut, mismatch, bad_cable,
                             (int)bwout, (int)bwin, c2sdata, s2cack, 1, debug);
        gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans, Timeouts,
                 Sndbuf, MaxRwinRcvd, CurrentCwnd, mismatch, bad_cable, totalcnt,
                 debug);
    }
    shutdown(ctlsockfd, SHUT_RDWR);
    /* printf("Saved data to log file\n"); */

    /* exit(0); */
}

main(argc, argv)
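
For anyone unsure what the added shutdown() call changes, the following is a minimal standalone sketch, not web100srv code. It assumes a local AF_UNIX socket pair as a stand-in for the NDT control connection, and the function name is hypothetical. It shows that shutdown() makes the other end see end-of-file straight away, even though the descriptor has not been closed yet.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes msg on one end of a local socket pair, calls shutdown() on that
 * end WITHOUT closing the descriptor, then drains the other end.
 * Returns the number of bytes read before end-of-file, or -1 on error.
 * Hypothetical helper for illustration; the socket pair stands in for
 * the server's control connection to the client.
 */
long bytes_before_eof_after_shutdown(const char *msg)
{
    int sv[2];
    char buf[128];
    long total = 0;
    ssize_t n = -1;
    size_t len;

    assert(msg != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    len = strlen(msg);
    if (write(sv[0], msg, len) == (ssize_t)len &&
        shutdown(sv[0], SHUT_RDWR) == 0) {
        /* sv[0] is still open here; only the connection was shut down. */
        while ((n = read(sv[1], buf, sizeof buf)) > 0)
            total += n;
        /* The loop ends with n == 0 (end-of-file) or n < 0 (error). */
    }

    close(sv[0]);
    close(sv[1]);
    return (n == 0) ? total : -1;
}
```

Calling bytes_before_eof_after_shutdown("final results") should return 13: the reader receives the queued data and then end-of-file without waiting for any close(). On a TCP socket, shutdown() likewise acts on the connection itself rather than on one descriptor, which is presumably why the one-line patch stops a client from waiting on other clients.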




At 01:54 PM 10/12/2005, Clayton Keller wrote:

I wanted to address this to the list. I believe there was a similar post a week or so back but I wanted to address this clean.

I currently have web100srv running from /etc/init.d/ndt with the following:

/usr/local/sbin/web100srv -a -m -l /var/log/web100/web100srv.log

The system is running on Fedora Core 4 using a patched 2.6.13 kernel from kernel.org.

The server itself is also sitting behind a PIX firewall.

We have noticed that the Outbound Test will run rather quickly, but when the Inbound (server to client) test is run it can take several minutes to complete, many times as much as 4 minutes. There are other times where, from the end user's point of view, it appears the test never completes, although you can see results for the test appear in the web100.log file. The test, though, will continue to sit on the "running 10s inbound test (server to client) . . . . . ." portion of the test, and many users are beginning to just close out the window.

At this point I am looking for general issues that I can look into and possibly run debug against as far as these tests are concerned.

Clayton Keller



------------------------------------



Richard A. Carlson e-mail:
Network Engineer phone: (734) 352-7043
Internet2 fax: (734) 913-4255
1000 Oakbrook Dr; Suite 300
Ann Arbor, MI 48104
Thanks Richard,
I've got a test box I will apply this to and test on, then look at applying it to the live box and get some good "in the field testing". I'll let you know how things go.
Clay
Rich,

It appears that there is still a long delay in the amount of time it takes to complete the Inbound test (several minutes from my own personal attempts). Could we be looking at network issues on my end? At times I do see the following message returned as well:

"Information: Other network traffic is congesting the link"

If you would like to try it out for yourself and see if you can recreate this delay, the test is available at:

http://speedtest.ruraltel.net:7123/

I'd be grateful for any other info I can get, as far as things to look at and run debug against.

Clay



