ndt-users - Re: new server and slow off-lan server-to-client speeds
- From: Dale Blount <>
- To: Richard Carlson <>
- Cc:
- Subject: Re: new server and slow off-lan server-to-client speeds
- Date: Mon, 20 Feb 2006 08:40:57 -0500
I just noticed I accidentally replied only to Rich; resending to the list
now:
On Fri, 2006-02-17 at 20:11 -0500, Richard Carlson wrote:
> Hi Dale;
>
> Answers in-line
> At 11:23 AM 2/17/2006, Dale Blount wrote:
> >On Thu, 2006-02-16 at 17:02 -0500, Richard Carlson wrote:
> > > Hi Dale;
> > >
> > > At 02:07 PM 2/16/2006, Dale Blount wrote:
> > > >On Wed, 2006-02-15 at 10:40 -0500, Richard Carlson wrote:
> > > > > Hi Dale;
> > > > >
> > > > > I don't recall if I replied to this earlier but here it is again.
> > > > >
> > > > > I'm seeing this problem more and more and I am currently trying to
> > find a
> > > > > real fix for this problem. Here's what's happening.
> > > > >
> > > > > The NDT server starts sending data to the remote client. It
> > > > > enters a simple send loop and pumps data into the network for 10
> > > > > seconds. It then exits the loop and sends the final results over
> > > > > to the client.
> > > > >
> > > > > The problem is that, while in this loop, the OS can transfer data
> > > > > to the TCP stack faster than the TCP stack can pump it out into
> > > > > the network. This results in a large standing queue (visible with
> > > > > the netstat -nat command). So at the end of 10 seconds the code
> > > > > stops pumping more data, but the client keeps reading until the
> > > > > queue empties. Note the Web100 Duration variable has a value of
> > > > > 34,893,925 microseconds, or almost 35 seconds.
> > > > >
> > > > > One temporary step is to limit the max buffer space (tcp_wmax) to
> > > > > something in the 512 KB to 1 MB range. This will keep the queue
> > > > > from building up too much, but it's really just a band-aid until I
> > > > > can figure out how to monitor the queue length to prevent such
> > > > > large queues in the first place.
> > > > >
> > > >
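The standing queue Rich describes shows up in the Send-Q column of netstat. A sketch of a filter for it (column positions assume the usual Linux `netstat -nat` layout):

```shell
# List TCP sockets with data queued in the send buffer.
# `netstat -nat` columns are:
#   Proto  Recv-Q  Send-Q  Local-Address  Foreign-Address  State
# A non-zero field 3 (Send-Q) means bytes are queued in the stack,
# not yet acknowledged by the remote end.
netstat -nat | awk 'NR > 2 && $3 > 0 { print $3 " bytes queued on " $4 " -> " $5 }'
```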
> > > >Rich,
> > > >
> > > >I can't find a tcp_wmax setting, but here is what I have set:
> > >
> > > Sorry, that was a typo on my part. It should have been tcp_wmem, not
> > > tcp_wmax.
> > >
> > > >net.core.wmem_max = 131072
> > > >net.core.rmem_max = 131072
> > > >net.ipv4.tcp_wmem = 4096 16384 131072
> > > >net.ipv4.tcp_rmem = 4096 16384 131072
> > > >
> >
> >
> >so my "net.ipv4.tcp_wmem = 4096 16384 131072" is ok to limit the queue
> >length?
>
> Yes, however this will limit you to about 128 KB, so depending on the path
> length you might start seeing reduced performance. For example, a round
> trip time of 11 msec would limit the maximum speed to about 95 Mbps:
> (128 KB / 11 msec) * 8 b/B = 95.3 Mbps. This is OK for a campus network,
> so this would only be a problem if the clients are outside this network
> and can run at 100 Mbps all the way to the server.
>
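Rich's buffer/RTT arithmetic can be checked with a one-liner (taking 128 KB as 131072 bytes):

```shell
# throughput = window size * 8 bits/byte / round-trip time
# 128 KB window, 11 ms RTT:
awk 'BEGIN { printf "%.1f Mbps\n", 131072 * 8 / 0.011 / 1e6 }'
# -> 95.3 Mbps
```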
I've just set it to 128k in an attempt to get this problem figured out.
Setting it to 512k still seems to have no effect.
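For reference, the cap Dale describes corresponds to a sysctl fragment along these lines (values mirror the 128 KB settings quoted earlier in the thread; the third tcp_wmem field is the per-socket maximum that bounds the standing queue):

```shell
# /etc/sysctl.conf fragment -- cap the TCP send buffer at 128 KB
net.core.wmem_max = 131072
# min, default, max (bytes); the max limits how large the send queue can grow
net.ipv4.tcp_wmem = 4096 16384 131072
```

Apply with `sysctl -p` (no reboot needed).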
> > > >Upload always works OK, but on anything but the lan, download is right
> > > >around 75k. It doesn't really matter if it's set to 128kb/512kb/2Mb,
> > > >it's always 70-80kb (both on a 5Mbps upload cable modem and a 768kbps
> > > >upload dsl link, both 3 hops from the ndt server).
> > > >
> > > >The old server that this is replacing is still around, and speedtests
> > > >to
> > > >it work just fine. Could the newer hardware alone be causing this
> > > >whole
> > > >problem? I've tried the sysctl settings from the old server with the
> > > >same results.
> > >
> > > It could be a hardware issue or an OS issue. Did you change/upgrade
> > > the OS level too? I noticed a problem with my server when I went from
> > > Linux 2.6.12 to 2.6.13.
> >
> >I went from 2.6.12.2-web100 to 2.6.15.3-web100. Distro is the same
> >version.
>
> I started having problems when I moved from 2.6.12 to 2.6.13, the .14 & .15
> kernels also failed. I finally replaced the e100.c file in the .15
> distribution with the one from the .12 tree and my problems went away.
>
> > > I finally tracked it down to a change in the Intel FastEthernet
> > > (e100) NIC driver that came with the new OS. I replaced the new
> > > e100.c file with the one from the .12 kernel and everything started
> > > working again. I also have a report of a problem with a built-in NIC,
> > > where the problem was resolved when a PCI-bus-based NIC was installed.
> > > Perhaps this is a bigger problem than I realize.
> > >
> >
> >I also moved from a Dlink PCI card to an onboard TG3 chipset.
>
> What NIC driver does this chipset use? One option is to try what I did and
> use the old driver/net/xxx.c file. Simply rename the file in the .15 tree
> and then copy in the file from the .12 tree. Then run make modules; make
> modules_install and reboot.
>
>
It's a TG3 chipset; I just installed the driver from 2.6.12.2 into the
2.6.15 kernel. Same results.
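For anyone else trying this, the driver-swap procedure Rich outlines amounts to something like the following sketch (the /usr/src paths and the tg3.c file name are assumptions based on this thread, not confirmed by it; e100.c would be the file for Rich's Intel FastEthernet case):

```shell
# Swap the 2.6.15 tg3 driver source for the 2.6.12 one, then rebuild modules.
cd /usr/src/linux-2.6.15.3/drivers/net
mv tg3.c tg3.c.15                                  # keep the newer driver around
cp /usr/src/linux-2.6.12.2/drivers/net/tg3.c .     # copy in the older driver
cd /usr/src/linux-2.6.15.3
make modules && make modules_install               # rebuild and install modules
# then reboot so the rebuilt module is loaded
```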
>
> > > > > If anyone has any suggestions on how to do this, please let me know.
> > > > >
> > > >
> > > >Couldn't the client be adjusted to stop reading after 10 seconds? It
> > > >could then report the data transferred so far.
> > >
> > > There is a timer that runs in the client to clean things up, but
> > > either there's a bug in my code or something else is wrong and the
> > > timer isn't working. I am currently testing a patch on my server at
> > > http://web100.internet2.edu:7123. It would help if you could try this
> > > server and let me know if the tests run long or what happens.
> > >
> >
> >LAN: Duration: 12510326
> > running 10s outbound test (client to server) . . . 8.30Mb/s
> > running 10s inbound test (server to client) . . . 16.56Mb/s
> >
> >
> >CABLE: Duration: 17116116
> > running 10s outbound test (client to server) . . . . . 1.40Mb/s
> > running 10s inbound test (server to client) . . . . . . 1.80Mb/s
>
> So, if I read this right, you get much higher speeds testing to my server
> with my new code than you do to your local server. I'll try to get a
> patch built and release a new version early next week. Thanks for the
> feedback.
>
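Incidentally, the Duration values quoted above are in microseconds, so a quick conversion shows the cable test ran about 17 s against a nominal 10 s test window:

```shell
# Convert the reported Web100 Duration values (microseconds) to seconds.
awk 'BEGIN { printf "LAN: %.1f s  CABLE: %.1f s\n", 12510326 / 1e6, 17116116 / 1e6 }'
# -> LAN: 12.5 s  CABLE: 17.1 s
```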
My old local server is fine... it's just this new one with problems.
Here's an interesting point, though: a Linux client connected to the
same 7Mb/768kb DSL connection as the box I was testing from earlier
reports the correct speeds. The Linux client continues to report
correctly using the driver from 2.6.12.2; the Windows XP clients
continue to fail.
Dale
- new server and slow off-lan server-to-client speeds, Dale Blount, 02/10/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/15/2006
- Re: new server and slow off-lan server-to-client speeds, Clayton Keller, 02/15/2006
- Re: new server and slow off-lan server-to-client speeds, Dale Blount, 02/16/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/16/2006
- Re: new server and slow off-lan server-to-client speeds, Dale Blount, 02/17/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/17/2006
- Re: new server and slow off-lan server-to-client speeds, Dale Blount, 02/20/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/17/2006
- Re: new server and slow off-lan server-to-client speeds, Dale Blount, 02/17/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/16/2006
- <Possible follow-up(s)>
- RE: new server and slow off-lan server-to-client speeds, Rick Tyrell, 02/20/2006
- RE: new server and slow off-lan server-to-client speeds, Rick Tyrell, 02/21/2006
- Re: new server and slow off-lan server-to-client speeds, Richard Carlson, 02/15/2006