
ndt-users - Solaris 10 TCP/IP weirdness uncovered by NDT

  • From: Simon Leinen <>
  • To:
  • Subject: Solaris 10 TCP/IP weirdness uncovered by NDT
  • Date: Thu, 26 May 2005 16:43:26 +0200

I found out something very strange about the new TCP/IP implementation
in Solaris 10 (which is called "FireEngine" and claims numerous
performance improvements).

Here's the "Statistics" output of an NDT run from my SunBlade 2500
workstation in Zurich to our NDT server http://ndt.switch.ch/ in
Geneva. Since the workstation is fast and its network connection
slow, and I have tuned my TCP buffers, the transfers easily fill the
100 Mb/s bottleneck that is the link between the Fast Ethernet switch
next to my office and the 100/1000 switch for our building (that link
has a little cross traffic from a few users as well as multicast
noise):

WEB100 Enabled Statistics:
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client to server) . . . . . 96.24Mb/s
running 10s inbound test (server to client) . . . . . . 94.44Mb/s

------ Client System Details ------
OS data: Name = SunOS, Architecture = sparc, Version = 5.10
Java data: Vendor = Sun Microsystems Inc., Version = 1.5.0-beta

------ Web100 Detailed Analysis ------
100 Mbps FastEthernet link found.
Link set to Half Duplex mode
Information: throughput is limited by other network traffic.
Good network cable(s) found
Normal duplex operation found.

Web100 reports the Round trip time = 51.0 msec; the Packet size = 1448
Bytes; and
There were 341 packets retransmitted, 3026 duplicate acks received, and
3498 SACK blocks received
The connection was idle 0 seconds (0%) of the time
This connection is network limited 99.91% of the time.
Contact your local network administrator to report a network problem

Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: ON
RFC 1323 Window Scaling: ON
Information: Network Middlebox is modifying MSS variable
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End

What caught my eye in these NDT results was the high RTT of 51.0 ms.
The basic RTT between here in Zurich and the server in Geneva (CERN)
is about 4.7 ms. At first I thought the NDT server didn't measure
correctly, but then I found that RTTs with my workstation do indeed
increase *very* significantly while it is running the NDT test. In
fact I measured much higher RTTs using ping, and found that the 51 ms
are probably just normal queueing in the server->client direction - so
"uncovered by NDT" is not quite to the point :-).

Here are the ping times from my Solaris 10 workstation to our NDT
server while I am running the test in the NDT applet:

leinen@diotima[leinen]; ping -s -A inet cemp1
PING cemp1: 56 data bytes
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=0. time=4.80 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=1. time=4.77 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=2. time=4.78 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=3. time=4.73 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=4. time=4.72 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=5. time=9.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=6. time=8.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=7. time=7.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=8. time=6.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=9. time=5.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=10. time=4.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=11. time=3.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=12. time=2.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=13. time=1.87e+03 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=14. time=868. ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=15. time=45.3 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=16. time=49.2 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=17. time=51.4 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=18. time=51.8 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=19. time=52.2 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=20. time=53.1 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=21. time=52.5 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=22. time=29.0 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=23. time=51.2 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=24. time=51.8 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=25. time=16.2 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=26. time=4.83 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=27. time=4.80 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=28. time=4.81 ms
64 bytes from cemp1-eth1.switch.ch (130.59.35.130): icmp_seq=29. time=4.79 ms
^C
----cemp1 PING Statistics----
30 packets transmitted, 30 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 4.72/1807./9.87e+03/3.07e+03

The first five RTTs (seq 0-4) are from before the test starts. This
is the base RTT of 4.7 ms between Zurich and Geneva.

The next ten RTTs (seq 5-14) are from while the first test (client to
server) was running. Note that they were all received only *after*
that part of the test was done, so the maximum delay (seq=5) is almost
ten full seconds! In fact, looking at tcpdump on both the workstation
and the NDT server, I could see that the ICMP *requests* were only
sent after the client->server test was done. It looks as if my Sun
either queued the entire NDT TCP transfer (about 117 MBytes?) at once,
with the ICMP ECHO requests waiting patiently in the queue until it
was empty, or gives outgoing TCP traffic strictly higher priority than
ICMP traffic.
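
Rough back-of-the-envelope arithmetic (my own figures, not anything NDT
reports): at the measured 96.24 Mb/s, ten seconds of data is roughly
115-120 MBytes, i.e. about the ~117 MByte figure above, and draining
such a backlog at the same rate takes close to ten seconds, which
matches the 9.87 s delay of the first echo request. Each later request,
sent one second further into the test, sees about one second less queue
ahead of it, hence the 9.87/8.87/.../0.87 s staircase.

    echo '96.24 * 10 / 8' | bc -l    # MBytes sent during the 10 s test: ~120
    echo '117 * 8 / 96.24' | bc -l   # seconds to drain ~117 MBytes at 96.24 Mb/s: ~9.7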

The next ten RTTs (seq 15-24) are from when the second test (server to
client) ran. When pinging the NDT server from another machine on the
same network segment, I measure a delay increase of 40-50 ms too, so
this is probably due to queueing in the router towards the 100 Mb/s
link.
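
As a rough sanity check (again my own arithmetic): 40-50 ms of extra
delay at 100 Mb/s corresponds to something like 500-625 KBytes sitting
in the router's output queue, which seems a plausible amount of
buffering for such an interface.

    echo '0.050 * 100000000 / 8' | bc -l   # bytes buffered at 50 ms and 100 Mb/s: 625000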

The last five RTTs (seq 25-29) are from after the tests: seq 25 is
still somewhat delayed, while the other four show the base RTT again.

When I probe with DNS queries over UDP instead of ping, the impact is
similar (each dig below runs in the background, which is why the
replies come back out of order while the path is congested):

leinen@diotima[leinen]; for seq in {0,1,2}{0,1,2,3,4,5,6,7,8,9}; do ( a=`dig @130.59.35.130 a www.switch.ch. |grep "Query time"` && echo "$seq: $a" & ); sleep 0.999; done
00: ;; Query time: 6 msec
01: ;; Query time: 6 msec
02: ;; Query time: 6 msec
03: ;; Query time: 6 msec
04: ;; Query time: 5 msec
11: ;; Query time: 3624 msec
06: ;; Query time: 3721 msec
09: ;; Query time: 373 msec
12: ;; Query time: 2610 msec
05: ;; Query time: 4745 msec
08: ;; Query time: 1689 msec
13: ;; Query time: 1581 msec
10: ;; Query time: 4648 msec
07: ;; Query time: 2711 msec
14: ;; Query time: 559 msec
15: ;; Query time: 49 msec
16: ;; Query time: 58 msec
17: ;; Query time: 52 msec
18: ;; Query time: 53 msec
19: ;; Query time: 54 msec
20: ;; Query time: 54 msec
21: ;; Query time: 57 msec
22: ;; Query time: 59 msec
23: ;; Query time: 51 msec
24: ;; Query time: 53 msec
25: ;; Query time: 6 msec
26: ;; Query time: 6 msec
27: ;; Query time: 6 msec
28: ;; Query time: 6 msec
29: ;; Query time: 6 msec

So an outbound bulk TCP transfer that fills the path can starve small
ICMP or UDP users on Solaris 10.

It should be noted that this happens only with IPv4 - when I use IPv6
for the ICMP or UDP transactions, they show normal response times.
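
If someone wants to check this on another Solaris 10 box without the
NDT applet, something along the following lines should show the same
effect. This is only a sketch, not what I actually ran (the NDT applet
produced the numbers above): it assumes iperf is installed with an
iperf server listening on the far end (any bulk TCP sender will do),
and that the far end answers DNS queries, as 130.59.35.130 does.

    SERVER=130.59.35.130        # placeholder - substitute your own target
    iperf -c $SERVER -t 10 &    # 10-second bulk TCP transfer, client -> server
    for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
        # one small UDP probe per second, backgrounded as in the dig loop above
        ( dig @$SERVER a www.switch.ch. | grep "Query time" ) &
        sleep 1
    done
    wait                        # collect the background jobs

Running "ping -s -A inet $SERVER" in a second window at the same time
shows the ICMP side of the picture, and repeating the whole thing over
IPv6 should give normal response times if the problem really is
IPv4-only.
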
--
Simon.


