ndt-users - Re: Slow Inbound Tests

Re: Slow Inbound Tests


  • From: Clayton Keller <>
  • To: Richard Carlson <>
  • Cc:
  • Subject: Re: Slow Inbound Tests
  • Date: Wed, 19 Oct 2005 13:33:09 -0500

Rich,

I may have confused myself a little between the two threads. Regarding
the additions in the INSTALL file: should I apply all of these tunings,
or leave what I had in my sysctl.conf file but with the 4M change, and
then also include the other recommendations from #9?

Also, 3.1.4b includes the patch you sent to me, which I added in myself
for the shutdown issues you saw, correct?

Clay

Richard Carlson wrote:
> Hi Clay;
>
> At 09:43 AM 10/19/2005, Clayton Keller wrote:
>
>> Rich,
>>
>> The server currently is not doing much else. Load averages on the server
>> sit pretty much at 0.00. It is a Pentium-4 3.40GHz with 2GB of RAM.
>> There is not anything else running on it that is causing any heavy loads
>> or additional traffic at this time.
>>
>> Currently, I have the following lines added to the /etc/sysctl.conf
>> file, which I acquired from the README:
>>
>> # Recommended sysctl settings from web100 README
>> net.core.wmem_max = 8388608
>> net.core.rmem_max = 8388608
>> net.ipv4.tcp_wmem = 4096 65536 8388608
>> net.ipv4.tcp_rmem = 4096 87380 8388608
>> net.ipv4.tcp_default_win_scale = 7
>> net.ipv4.tcp_moderate_rcvbuf = 1
>
>
> OK, the changes I suggest are minor. Just change the tcp_wmem and
> tcp_rmem max value to 4M (4194304) from the current 8M value. You can
> also change the tcp_default_win_scale value to 6.
>
> Let me know what happens.
>
> Rich
>
>> I can go ahead and make the adjustments that you recommended, but didn't
>> know if I should be making any further changes as well.
>>
>> I will run some further tests with the new settings and also with the
>> "-m" flag removed. However, I wanted to run the sysctl.conf settings
>> that we currently have by you first, and see if I should look at further
>> changes there.
>>
>> Clay
>>
>> Richard Carlson wrote:
>> > Hi Clay
>> >
>> > OK, I looked at the traces and the web100 stats and there are a couple
>> > of things that stand out.
>> >
>> > 1) your server is set to use 16 MB buffers.
>> > 2) this inbound test ran for 18 seconds (Duration and SndLimTimeCwnd)
>> > 3) the trace (.2790) shows that data stops flowing, but the connection
>> > isn't closing gracefully (no TCP FIN packets being exchanged). [This
>> > might be another bug in my server code]
>> >
>> > It's not clear to me why the test is running so long. What else is
>> > running on this server? Is it very busy? What does "/usr/bin/top"
>> > report? Finally, what messages appear in the client's Java console
>> > window? The client will report how long it spent reading data from the
>> > network.
>> >
>> > Things to try:
>> > * One thing would be to reduce the maximum sender buffer size. Try
>> > making the max 4 MB instead of 16. Edit the /etc/sysctl.conf file and
>> > change the following lines.
>> > # increase Linux autotuning TCP buffer limits
>> > net.ipv4.tcp_rmem = 4096 87380 16777216
>> > net.ipv4.tcp_wmem = 4096 87380 16777216
>> > to
>> > # increase Linux autotuning TCP buffer limits
>> > net.ipv4.tcp_rmem = 4096 87380 4194304
>> > net.ipv4.tcp_wmem = 4096 87380 4194304
>> >
>> > and then run the "/sbin/sysctl -p" command.
>> >
>> > One possible problem is that the server is faster than the network so
>> > data is being placed in the send queue. The connection wouldn't
>> > shut-down until the queue is empty. So even if the NDT process stops
>> > sending after 10 seconds, it could take some time to drain the queue.
>> > With a 4 MB queue it would take less time to drain.
>> >
>> > That said, it isn't clear why the client is hanging for so long. I
>> > guess it's also possible that my shutdown patch isn't working properly
>> > in the multi-client mode. Can you try running the web100srv process
>> > without the -m flag? This will cause the server to handle clients in a
>> > FIFO manner. If the server is busy the incoming clients will receive a
>> > message saying the server is busy and a test will begin in xx seconds.
>> > The client is updated every time another client's test finishes. I know
>> > the shutdown() patch fixed a hang there; if possible give it a try and
>> > let me know what happens.
>> >
>> > That's all I can think of right now, I'll think about it some more
>> > tonight and run some tests tomorrow.
>> >
>> > Rich
>> >
>> > At 09:08 AM 10/18/2005, Clayton Keller wrote:
>> >
>> >> Rich,
>> >>
>> >> We are still seeing issues with the Inbound tests even after reverting
>> >> to the 2.6.12.5 kernel. This is not the Fedora Source kernel that
>> >> Martin is using, but the stock kernel.org download.
>> >>
>> >> I would like to go ahead and submit another trace for you. Is there a
>> >> possibility that the issues we are seeing are network/bandwidth issues
>> >> on our part?
>> >>
>> >> From my connection which is on a different network, the Outbound test
>> >> took approx. 10 seconds while the Inbound test took well over one
>> >> minute. The info you are receiving is from a connection on that same
>> >> network. The Inbound test took about one minute before it reported its
>> >> results back to the user.
>> >>
>> >> I apologize, but I am not quite sure what all info is found in the
>> >> trace, so I guess that is why I am asking you if there are external
>> >> issues on our end that may be part of the cause.
>> >>
>> >> Also, I could look at using one of the Fedora kernels and patch it
>> >> like Martyn did.
>> >>
>> >> Clay
>> >>
>> >>
>> >>
>> >> Richard Carlson wrote:
>> >>
>> >>> Hi Clay;
>> >>> The trace you sent does show a problem. At this point I don't see a
>> >>> need for more, but it would be useful to see what the 2.6.12 kernel
>> >>> does. So I'd suggest you revert back to the 2.6.12 kernel and I'll
>> >>> try and figure out how to get the kernel problem resolved.
>> >>> Rich
>> >>> At 09:21 AM 10/17/2005, Clayton Keller wrote:
>> >>>
>> >>>> Richard Carlson wrote:
>> >>>>
>> >>>>> Hi Craig;
>> >>>>> No, this NDT bug affects all servers. I ran into it while testing
>> >>>>> from multiple clients. Clients 2, 3, & 4 would get the "Other
>> >>>>> client testing please wait..." type message. Client 2 would not
>> >>>>> get the final results until client 4 finished. I'll add this patch
>> >>>>> to my next distribution, or you can apply it now if you are
>> >>>>> experiencing some problems.
>> >>>>> Since this didn't fix Clay's problem, I may need to rethink how the
>> >>>>> tests are done. Right now the server simply streams data out for
>> >>>>> 10 seconds, sending as much as it can. Given the way TCP works,
>> >>>>> there is a probability that the server will build up a queue in the
>> >>>>> Send buffer (the bus is faster than the wire). This buffer will
>> >>>>> need to drain before the test is complete. Packet loss, or other
>> >>>>> factors could mean that this draining takes a long time so the
>> >>>>> client simply sits there waiting. If it takes too long, the server
>> >>>>> process will time-out and terminate so the client will never get
>> >>>>> the final results.
>> >>>>> More later.
>> >>>>> Rich
>> >>>>> At 08:26 AM 10/14/2005, Pepmiller, Craig E. wrote:
>> >>>>>
>> >>>>>> Ok, so this is only seen when the NDT machine is configured for
>> >>>>>> multiple simultaneous clients?
>> >>>>>>
>> >>>>>> Thanks-
>> >>>>>> -Craig
>> >>>>>>
>> >>>>>> -----Original Message-----
>> >>>>>> From: Richard Carlson
>> >>>>>> [mailto:]
>> >>>>>> Sent: Wednesday, October 12, 2005 2:56 PM
>> >>>>>> To: Clayton Keller;
>> >>>>>>
>> >>>>>> Subject: Re: Slow Inbound Tests
>> >>>>>>
>> >>>>>> Hi Clayton;
>> >>>>>>
>> >>>>>> This is a bug in the web100srv code. I forgot to shutdown the
>> >>>>>> control socket at the end of the test. If there are multiple
>> >>>>>> clients then the final results are sent in a LIFO manner, so the
>> >>>>>> first client needs to wait until all subsequent clients are done
>> >>>>>> before the results are returned.
>> >>>>>>
>> >>>>>> I'll issue a patched version soon. In the mean time you can patch
>> >>>>>> your version by hand by adding the line
>> >>>>>> "shutdown(ctlsockfd, SHUT_RDWR);" to the web100srv.c file (on
>> >>>>>> line 1126).
>> >>>>>>
>> >>>>>> Let me know if that fixes things.
>> >>>>>>
>> >>>>>> Rich
>> >>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------
>> >>>>>> Original code:
>> >>>>>>     if (admin_view == 1) {
>> >>>>>>         totalcnt = calculate(SumRTT, CountRTT, CongestionSignals,
>> >>>>>>                 PktsOut, DupAcksIn, AckPktsIn, CurrentMSS,
>> >>>>>>                 SndLimTimeRwin, SndLimTimeCwnd, SndLimTimeSender,
>> >>>>>>                 MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut,
>> >>>>>>                 mismatch, bad_cable, (int)bwout, (int)bwin,
>> >>>>>>                 c2sdata, s2cack, 1, debug);
>> >>>>>>         gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans,
>> >>>>>>                 Timeouts, Sndbuf, MaxRwinRcvd, CurrentCwnd,
>> >>>>>>                 mismatch, bad_cable, totalcnt, debug);
>> >>>>>>     }
>> >>>>>>
>> >>>>>> /* printf("Saved data to log file\n"); */
>> >>>>>>
>> >>>>>> /* exit(0); */
>> >>>>>> }
>> >>>>>>
>> >>>>>> main(argc, argv)
>> >>>>>>
>> >>>>>> ----------------------------------------------------------
>> >>>>>> Modified code
>> >>>>>>     if (admin_view == 1) {
>> >>>>>>         totalcnt = calculate(SumRTT, CountRTT, CongestionSignals,
>> >>>>>>                 PktsOut, DupAcksIn, AckPktsIn, CurrentMSS,
>> >>>>>>                 SndLimTimeRwin, SndLimTimeCwnd, SndLimTimeSender,
>> >>>>>>                 MaxRwinRcvd, CurrentCwnd, Sndbuf, DataBytesOut,
>> >>>>>>                 mismatch, bad_cable, (int)bwout, (int)bwin,
>> >>>>>>                 c2sdata, s2cack, 1, debug);
>> >>>>>>         gen_html((int)bwout, (int)bwin, MinRTT, PktsRetrans,
>> >>>>>>                 Timeouts, Sndbuf, MaxRwinRcvd, CurrentCwnd,
>> >>>>>>                 mismatch, bad_cable, totalcnt, debug);
>> >>>>>>     }
>> >>>>>>     shutdown(ctlsockfd, SHUT_RDWR);
>> >>>>>> /* printf("Saved data to log file\n"); */
>> >>>>>>
>> >>>>>> /* exit(0); */
>> >>>>>> }
>> >>>>>>
>> >>>>>> main(argc, argv)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> At 01:54 PM 10/12/2005, Clayton Keller wrote:
>> >>>>>> >I wanted to address this to the list. I believe there was a
>> >>>>>> >similar post a week or so back, but I wanted to address this
>> >>>>>> >clean.
>> >>>>>> >
>> >>>>>> >I currently have web100srv running from /etc/init.d/ndt with the
>> >>>>>> >following:
>> >>>>>> >
>> >>>>>> >/usr/local/sbin/web100srv -a -m -l /var/log/web100/web100srv.log
>> >>>>>> >
>> >>>>>> >The system is running on Fedora Core 4 using a patched 2.6.13
>> >>>>>> >kernel from kernel.org.
>> >>>>>> >
>> >>>>>> >The server itself is also sitting behind a PIX firewall.
>> >>>>>> >
>> >>>>>> >We have noticed that the Outbound Test will run rather quickly,
>> >>>>>> >but when the Inbound (server to client) test is run it can take
>> >>>>>> >upwards of several minutes to complete, many times as much as
>> >>>>>> >4 minutes. There are other times where, from the end user's
>> >>>>>> >point of view, it appears the test never completes, although you
>> >>>>>> >can see results for the test appear in the web100.log file. The
>> >>>>>> >test will continue to sit on the "running 10s inbound test
>> >>>>>> >(server to client) . . . . . ." portion of the test, and many
>> >>>>>> >users are beginning to just close out the window.
>> >>>>>> >
>> >>>>>> >At this point I am looking for general issues that I can look
>> >>>>>> >into and possibly run debug against as far as these tests are
>> >>>>>> >concerned.
>> >>>>>> >
>> >>>>>> >Clayton Keller
>> >>>>>>
>> >>>>>> ------------------------------------
>> >>>>
>> >>>>
>> >>>>
>> >>>> Richard
>> >>>>
>> >>>> Did you want me to grab any more traces on newer versions of the
>> >>>> 2.6.13.x kernel or more on the current kernel it is running? Or
>> >>>> should I revert back to my 2.6.12.5 kernel and see how performance
>> >>>> improves?
>> >>>>
>> >>>> I saw from an earlier post to a different thread that it appears
>> >>>> you are seeing some items in the traces that are alluding to issues
>> >>>> pertaining to the 2.6.13.x kernel.
>> >>>>
>> >>>> Clay
>> >>>
>> >>>
>> >>> ------------------------------------
>> >>>
>> >>> Richard A. Carlson e-mail:
>> >>>
>> >>> Network Engineer phone: (734) 352-7043
>> >>> Internet2 fax: (734) 913-4255
>> >>> 1000 Oakbrook Dr; Suite 300
>> >>> Ann Arbor, MI 48104
>> >>
>> >>
>> >>
>> >>
>> >> TCP/Web100 Network Diagnostic Tool v5.3.3e
>> >> click START to begin
>> >> Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
>> >> running 10s outbound test (client to server) . . . . . 894.71Kb/s
>> >> running 10s inbound test (server to client) . . . . . . 3.86Mb/s
>> >> Your PC is connected to a Cable/DSL modem
>> >> Information: Other network traffic is congesting the link
>> >>
>> >>
>> >> WEB100 Kernel Variables:
>> >> Client: localhost/127.0.0.1
>> >> AckPktsIn: 3330
>> >> AckPktsOut: 0
>> >> BytesRetrans: 81420
>> >> CongAvoid: 2639
>> >> CongestionOverCount: 0
>> >> CongestionSignals: 27
>> >> CountRTT: 2802
>> >> CurCwnd: 22080
>> >> CurMSS: 1380
>> >> CurRTO: 248
>> >> CurRwinRcvd: 258060
>> >> CurRwinSent: 5888
>> >> CurSsthresh: 16560
>> >> DSACKDups: 0
>> >> DataBytesIn: 0
>> >> DataBytesOut: 8879328
>> >> DataPktsIn: 0
>> >> DataPktsOut: 6192
>> >> DupAcksIn: 481
>> >> ECNEnabled: 0
>> >> FastRetran: 27
>> >> MaxCwnd: 63480
>> >> MaxMSS: 1380
>> >> MaxRTO: 295
>> >> MaxRTT: 111
>> >> MaxRwinRcvd: 258060
>> >> MaxRwinSent: 5888
>> >> MaxSsthresh: 41400
>> >> MinMSS: 1380
>> >> MinRTO: 229
>> >> MinRTT: 20
>> >> MinRwinRcvd: 238740
>> >> MinRwinSent: 5888
>> >> NagleEnabled: 1
>> >> OtherReductions: 0
>> >> PktsIn: 3330
>> >> PktsOut: 6192
>> >> PktsRetrans: 59
>> >> X_Rcvbuf: 16777216
>> >> RcvWinScale: 8
>> >> SACKEnabled: 3
>> >> SACKsRcvd: 510
>> >> SendStall: 0
>> >> SlowStart: 152
>> >> SampleRTT: 42
>> >> SmoothedRTT: 48
>> >> X_Sndbuf: 16777216
>> >> SndWinScale: 2
>> >> SndLimTimeRwin: 0
>> >> SndLimTimeCwnd: 18404625
>> >> SndLimTimeSender: 8258
>> >> SndLimTransRwin: 0
>> >> SndLimTransCwnd: 1
>> >> SndLimTransSender: 1
>> >> SndLimBytesRwin: 0
>> >> SndLimBytesCwnd: 8879328
>> >> SndLimBytesSender: 0
>> >> SubsequentTimeouts: 0
>> >> SumRTT: 127937
>> >> Timeouts: 0
>> >> TimestampsEnabled: 0
>> >> WinScaleRcvd: 2
>> >> WinScaleSent: 8
>> >> DupAcksOut: 0
>> >> StartTimeUsec: 118172
>> >> Duration: 18416093
>> >> c2sData: 2
>> >> c2sAck: 2
>> >> s2cData: 9
>> >> s2cAck: 3
>> >> half_duplex: 0
>> >> link: 100
>> >> congestion: 1
>> >> bad_cable: 0
>> >> mismatch: 0
>> >> spd: 0.00
>> >> bw: 3.49
>> >> loss: 0.004360465
>> >> avgrtt: 45.66
>> >> waitsec: 0.00
>> >> timesec: 18.00
>> >> order: 0.1444
>> >> rwintime: 0.0000
>> >> sendtime: 0.0004
>> >> cwndtime: 0.9996
>> >> rwin: 1.9688
>> >> swin: 128.0000
>> >> cwin: 0.4843
>> >> rttsec: 0.045659
>> >> Sndbuf: 16777216
>> >> aspd: 8.63416
>> >
>> >
>> > ------------------------------------
>> >
>> >
>> >
>> > Richard A. Carlson e-mail:
>> >
>> > Network Engineer phone: (734) 352-7043
>> > Internet2 fax: (734) 913-4255
>> > 1000 Oakbrook Dr; Suite 300
>> > Ann Arbor, MI 48104
>> >
>
>
> ------------------------------------
>
>
>
> Richard A. Carlson e-mail:
>
> Network Engineer phone: (734) 352-7043
> Internet2 fax: (734) 913-4255
> 1000 Oakbrook Dr; Suite 300
> Ann Arbor, MI 48104
>
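
A rough way to see why Rich's suggestion of a smaller send buffer should shorten the tail of the inbound test: if the send queue is full when the 10-second timer ends, the connection cannot close until the queue drains at the line rate. The sketch below assumes the queue is completely full and that the 3.86 Mb/s inbound rate reported by the client holds steady. Both are simplifying assumptions, and drain_seconds is a name made up for this illustration.

```python
def drain_seconds(queue_bytes: int, rate_bits_per_sec: float) -> float:
    """Upper bound on the time to empty a full send queue at a steady rate."""
    return queue_bytes * 8 / rate_bits_per_sec


# Inbound rate reported by the client in this thread (3.86 Mb/s),
# interpreted here as 3.86 * 10^6 bits per second (an assumption).
RATE_BPS = 3.86e6

for label, size in (("16 MB", 16 * 1024 * 1024), ("4 MB", 4 * 1024 * 1024)):
    print(f"{label} queue: about {drain_seconds(size, RATE_BPS):.1f} s to drain")
```

Under these assumptions a full 16 MB queue needs roughly 35 seconds to drain and a full 4 MB queue roughly 9 seconds. In practice the queue may be far from full, so these are ceilings rather than predictions.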



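
The derived values at the end of the web100 dump quoted in this thread (loss, avgrtt, cwndtime, bw) can be reproduced from the raw counters. The formulas below are inferred from the numbers in the thread, not taken from the NDT source, so treat them as an assumption: loss as CongestionSignals/PktsOut, avgrtt as SumRTT/CountRTT, cwndtime as the congestion-window share of the three SndLimTime counters, and bw as an MSS/(RTT*sqrt(loss)) estimate in Mb/s with 1 Mb taken as 2^20 bits. The function name derive is made up for this sketch.

```python
import math

# Raw counters copied from the web100 dump quoted in this thread.
stats = {
    "CongestionSignals": 27,
    "PktsOut": 6192,
    "SumRTT": 127937,
    "CountRTT": 2802,
    "CurMSS": 1380,
    "SndLimTimeRwin": 0,
    "SndLimTimeCwnd": 18404625,
    "SndLimTimeSender": 8258,
}


def derive(s: dict) -> dict:
    """Recompute derived values using formulas assumed for this sketch."""
    loss = s["CongestionSignals"] / s["PktsOut"]
    avgrtt_ms = s["SumRTT"] / s["CountRTT"]
    total_time = s["SndLimTimeRwin"] + s["SndLimTimeCwnd"] + s["SndLimTimeSender"]
    cwndtime = s["SndLimTimeCwnd"] / total_time
    # Loss-based throughput estimate, converted from bytes/s to Mb/s.
    bw_mbps = s["CurMSS"] / ((avgrtt_ms / 1000) * math.sqrt(loss)) * 8 / 1024 / 1024
    return {"loss": loss, "avgrtt": avgrtt_ms, "cwndtime": cwndtime, "bw": bw_mbps}


for name, value in derive(stats).items():
    print(f"{name}: {value:.6f}")
```

With the counters from the thread this gives approximately loss 0.00436, avgrtt 45.66 ms, cwndtime 0.9996 and bw 3.49 Mb/s, which agrees with the values in the dump. The connection spent nearly all of its time congestion-window limited, and the loss-based estimate sits close to the measured 3.86 Mb/s.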

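
The fix discussed in this thread is a single shutdown(ctlsockfd, SHUT_RDWR) call on the control socket. The general socket behaviour it presumably relies on can be shown with a local socket pair: once one end is shut down, the peer reads any queued data and then sees end-of-file instead of blocking. This is only an illustration of that behaviour in Python, not the web100srv code, and the function name is made up for the example.

```python
import socket


def peer_sees_eof_after_shutdown() -> bool:
    """Send a message, shut the sender down, and check what the peer reads."""
    server_end, client_end = socket.socketpair()
    try:
        server_end.sendall(b"final results")
        # Equivalent in spirit to shutdown(fd, SHUT_RDWR) in C.
        server_end.shutdown(socket.SHUT_RDWR)
        data = client_end.recv(1024)  # queued data is still delivered
        eof = client_end.recv(1024)   # empty bytes signal end-of-file
        return data == b"final results" and eof == b""
    finally:
        server_end.close()
        client_end.close()


print("peer saw data then EOF:", peer_sees_eof_after_shutdown())
```

Without the shutdown (or a close), the second recv would block until the other end did something, which is consistent with clients waiting for results that never seemed to arrive.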