
ndt-users - Re: new server and slow off-lan server-to-client speeds


Re: new server and slow off-lan server-to-client speeds


  • From: Dale Blount <>
  • To: Richard Carlson <>
  • Cc:
  • Subject: Re: new server and slow off-lan server-to-client speeds
  • Date: Mon, 20 Feb 2006 08:40:57 -0500

I just noticed I accidentally replied only to Rich; resending to the list
now:

On Fri, 2006-02-17 at 20:11 -0500, Richard Carlson wrote:
> Hi Dale;
>
> Answers in-line
> At 11:23 AM 2/17/2006, Dale Blount wrote:
> >On Thu, 2006-02-16 at 17:02 -0500, Richard Carlson wrote:
> > > Hi Dale;
> > >
> > > At 02:07 PM 2/16/2006, Dale Blount wrote:
> > > >On Wed, 2006-02-15 at 10:40 -0500, Richard Carlson wrote:
> > > > > Hi Dale;
> > > > >
> > > > > I don't recall if I replied to this earlier but here it is again.
> > > > >
> > > > > I'm seeing this problem more and more and I am currently trying to
> > > > > find a real fix for this problem. Here's what's happening.
> > > > >
> > > > > The NDT server starts sending data to the remote client. It enters a
> > > > > simple send loop and pumps data into the network for 10 seconds. It
> > > > > then exits the loop and sends the final results over to the client.
> > > > >
> > > > > The problem is that, while in this loop, it is possible for the OS
> > > > > to transfer data to the TCP stack faster than the TCP stack can pump
> > > > > it out into the network. This results in a large standing queue
> > > > > (visible with the netstat -nat command). So at the end of 10 seconds
> > > > > the code stops pumping more data, but the client keeps reading until
> > > > > the queue empties. Note the Web100 Duration variable has a value of
> > > > > 34,893,925 microseconds, or almost 35 seconds.
> > > > >
> > > > > One temporary step is to limit the max buffer space (tcp_wmax) to
> > > > > something in the 512 KB to 1 MB range. This will keep the queue from
> > > > > building up too much, but it's really just a band-aid until I can
> > > > > figure out how to monitor the queue length to prevent such large
> > > > > queues in the first place.
> > > > >
> > > >
> > > >Rich,
> > > >
> > > >I can't find a tcp_wmax setting, but here is what I have set:
> > >
> > > Sorry, that was a typo on my part. It should have been tcp_wmem, not
> > > _wmax.
> > >
> > > >net.core.wmem_max = 131072
> > > >net.core.rmem_max = 131072
> > > >net.ipv4.tcp_wmem = 4096 16384 131072
> > > >net.ipv4.tcp_rmem = 4096 16384 131072
> > > >
> >
> >
> >so my "net.ipv4.tcp_wmem = 4096 16384 131072" is ok to limit the queue
> >length?
>
> Yes, however this will limit you to about 128 KB, so depending on the path
> length you might start seeing reduced performance. For example, a round
> trip time of 11 msec would limit the maximum speed to ~95 Mbps:
> (128 KB / 11 msec) * 8 b/B = 95.33 Mbps. This is OK for a campus network,
> so this would only be a problem if the clients are outside this network
> and can run at 100 Mbps all the way to the server.
>
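
For anyone following along, that ~95 Mbps figure is just the send window
divided by the round-trip time. A throwaway C snippet (not NDT code; the
128 KB and 11 msec values are the ones from this thread) that reproduces
the arithmetic:

#include <stdio.h>

int main(void)
{
    /* Send window capped by net.ipv4.tcp_wmem: 131072 bytes (128 KB). */
    double window_bytes = 128.0 * 1024.0;
    /* Example round-trip time from the thread: 11 msec. */
    double rtt_seconds = 0.011;

    /* Throughput is bounded by window / RTT; multiply by 8 for bits. */
    double max_bps = window_bytes / rtt_seconds * 8.0;
    printf("max throughput ~ %.2f Mbps\n", max_bps / 1e6);  /* ~95.33 Mbps */
    return 0;
}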

I've just set it to 128k really as an attempt to get this problem figured
out. Setting it to 512k still seems to have no effect.
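
On the "real fix" side, Rich mentions wanting to monitor the queue length
directly rather than just capping tcp_wmem. I haven't tried this in NDT
itself, but on Linux the SIOCOUTQ ioctl reports how many bytes are sitting
unsent in a socket's send buffer, so something along these lines might be a
starting point (the helper name and the 256 KB threshold are made up for
illustration; only SIOCOUTQ itself is a real interface):

#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCOUTQ */

/* Return the number of bytes queued in the socket's send buffer that the
 * TCP stack has not yet sent, or -1 if the ioctl fails. */
int send_queue_bytes(int sockfd)
{
    int queued = 0;
    if (ioctl(sockfd, SIOCOUTQ, &queued) < 0)
        return -1;
    return queued;
}

/* Inside the 10-second send loop the server could then back off when the
 * standing queue gets large, for example:
 *
 *     if (send_queue_bytes(sockfd) > 256 * 1024)
 *         usleep(1000);    // give the stack a chance to drain
 */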


> > > >Upload always works OK, but on anything but the LAN, download is right
> > > >around 75k. It doesn't really matter if it's set to 128kb/512kb/2Mb,
> > > >it's always 70-80kb (both on a 5Mbps upload cable modem and a 768kbps
> > > >upload DSL link, both 3 hops from the NDT server).
> > > >
> > > >The old server that this is replacing is still around, and speedtests
> > > >to it work just fine. Could the newer hardware alone be causing this
> > > >whole problem? I've tried the sysctl settings from the old server with
> > > >the same results.
> > >
> > > It could be a hardware issue or an OS issue. Did you change/upgrade the
> > > OS level too? I noticed a problem with my server when I went from Linux
> > > 2.6.12 to 2.6.13.
> >
> >I went from 2.6.12.2-web100 to 2.6.15.3-web100. Distro is the same
> >version.
>
> I started having problems when I moved from 2.6.12 to 2.6.13; the .14 & .15
> kernels also failed. I finally replaced the e100.c file in the .15
> distribution with the one from the .12 tree and my problems went away.
>
> > > I finally tracked it down to a change in the Intel FastEthernet (e100)
> > > NIC driver that came with the new OS. I replaced the new e100.c file
> > > with the one from the .12 kernel and everything started working again.
> > > I also have a report of a problem with a built-in NIC, and the problem
> > > was resolved when a PCI-bus-based NIC was installed. Perhaps this is a
> > > bigger problem than I realize.
> > >
> >
> >I also moved from a Dlink PCI card to an onboard TG3 chipset.
>
> What NIC driver does this chipset use? One option is to try what I did and
> use the old drivers/net/xxx.c file. Simply rename the file in the .15 tree
> and then copy in the file from the .12 tree. Then run make modules; make
> modules_install and reboot.
>
>

It's a TG3 chipset; I just installed the driver from 2.6.12.2 into the
2.6.15 kernel. Same results.


>
> > > > > If anyone has any suggestions on how to do this, please let me know.
> > > > >
> > > >
> > > >Couldn't the client be adjusted to stop reading after 10 seconds? It
> > > >could then report the data transferred so far.
> > >
> > > There is a timer that runs in the client to clean things up, but either
> > > there's a bug in my code or something else is wrong and the timer isn't
> > > working. I am currently testing a patch on my server at
> > > http://web100.internet2.edu:7123. It would help if you could try this
> > > server and let me know if the tests run long or what happens.
> > >
> >
> >LAN: Duration: 12510326
> > running 10s outbound test (client to server) . . . 8.30Mb/s
> > running 10s inbound test (server to client) . . . 16.56Mb/s
> >
> >
> >CABLE: Duration: 17116116
> > running 10s outbound test (client to server) . . . . . 1.40Mb/s
> > running 10s inbound test (server to client) . . . . . . 1.80Mb/s
>
> So, if I read this right, you get much higher speeds testing to my server
> with my new code than you do to your local server. I'll try and get a
> patch built and release a new version early next week. Thanks for the
> feedback.
>

My old local server is fine... it's just this new one with problems.
Here's an interesting point, though. A Linux client connected to the
same 7Mb/768kb DSL connection as the box I was doing testing from
earlier reports the correct speeds. The Linux client continues to
report correctly using the driver from 2.6.12.2; the Windows XP clients
continue to fail.
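
Coming back to the client-side timer question from earlier in the thread:
the kind of hard 10-second cutoff I had in mind would look roughly like the
sketch below. This is only an illustration of the idea, not the actual NDT
client code (bounded_read, TEST_SECONDS, and the buffer size are made up):

#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

#define TEST_SECONDS 10

/* Read from sockfd until EOF or until TEST_SECONDS have elapsed, whichever
 * comes first, and return the bytes received in that window.  select()
 * keeps a sender with a large standing queue from holding the client in
 * read() past the deadline. */
long bounded_read(int sockfd)
{
    char   buf[8192];
    long   total = 0;
    time_t deadline = time(NULL) + TEST_SECONDS;

    for (;;) {
        time_t now = time(NULL);
        if (now >= deadline)
            break;

        struct timeval tv = { deadline - now, 0 };
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sockfd, &rfds);

        if (select(sockfd + 1, &rfds, NULL, NULL, &tv) <= 0)
            break;                      /* timed out or error */

        ssize_t n = read(sockfd, buf, sizeof(buf));
        if (n <= 0)
            break;                      /* connection closed or error */
        total += n;
    }
    return total;                       /* report what was transferred so far */
}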

Dale





