Re: [ndt-dev] Websocket Client - Upload Speed Problem


  • From: Richard Carlson <>
  • To: Don Slaunwhite <>, "" <>
  • Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem
  • Date: Fri, 10 Jul 2015 22:33:25 -0400

Don;

I put the data into the attached spreadsheet and sorted the variables to do a side-by-side comparison.

First off, the collected data comes from the server-to-client (download) test. This gives us some insight, but to get more you'll need to turn on packet tracing on the server and look at that data off-line.

I'll try to look at the code later this weekend; I still suspect that there is a 10x math error in the code that calculates the upload speed.
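
To show the kind of bug I mean, here is a hypothetical Python sketch (not the actual NDT client code): a speed calculation that slips one unit-conversion constant reports a number that is cleanly 10x low while everything else looks normal.

    # Hypothetical sketch, NOT the actual NDT client code: how one wrong
    # unit-conversion constant produces a clean 10x reporting error.
    def upload_speed_mbps(bytes_sent, seconds):
        # Correct conversion: bytes -> megabits per second
        return bytes_sent * 8 / seconds / 1_000_000

    def upload_speed_buggy(bytes_sent, seconds):
        # Plausible bug: the kbit/s intermediate is divided by 10_000
        # instead of 1_000 on the way to Mbit/s
        kbps = bytes_sent * 8 / seconds / 1_000
        return kbps / 10_000

    sent, secs = 650_000_000, 10.0         # made-up numbers for illustration
    print(upload_speed_mbps(sent, secs))   # 520.0 -- what should be reported
    print(upload_speed_buggy(sent, secs))  # 52.0  -- a websocket-upload-sized number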

In any case, I do see that the websocket client is not working as well as it should, as noted in the spreadsheet.

The first thing I notice is that both clients are setting (or not setting) the TCP options to the same values, except that the flash client is using a slightly smaller receive window.

The websocket client has a theoretical max speed of 311 Mbps, yet achieves 488 Mbps (491.87 Mbps by the spd variable). The theoretical value is calculated from the packet size, RTT, and loss rate. The server noted a loss rate of 0 for the flash client and 0.000002364 (the loss variable) for the websocket client. While that doesn't seem like a lot, it makes a big difference on 20 msec paths. Since you wouldn't expect the network loss rate to vary between these two tests, I suspect the client is losing packets; that is, it can't empty the buffer fast enough, so packets are dropped.
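
For reference, the theoretical limit NDT prints is, as far as I recall, the standard Mathis et al. model, rate <= MSS / (RTT * sqrt(loss)), reported in "binary" Mbps. A quick Python sketch using the Web100 values from the websocket test below reproduces the reported 311.12 figure:

    import math

    # Mathis et al. steady-state TCP model: rate <= MSS / (RTT * sqrt(p)).
    # NDT reports "Mbps" using binary mega (1024*1024 bits), so do the same.
    def theoretical_limit_mbps(mss_bytes, rtt_sec, loss):
        bytes_per_sec = mss_bytes / (rtt_sec * math.sqrt(loss))
        return bytes_per_sec * 8 / (1024 * 1024)

    # CurMSS, rttsec, and loss from the websocket test below
    print(theoretical_limit_mbps(1460, 0.023284, 0.000002364))  # ~311.1 Mbps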

Next I notice that both clients report the path as an OC-48 (2.4 Gbps) path. The server performs a packet-pair test for every packet it sends and receives. During both tests the network delivered packets at line speed (or nearly so), so I don't suspect a physical path problem.
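
Roughly how the packet-pair heuristic works, as a sketch (the bin edges here are illustrative assumptions, not copied from the NDT source):

    # Sketch of packet-pair link classification. Each back-to-back packet
    # pair yields an instantaneous rate = size / inter-arrival gap, which
    # is binned into a link class. Bin edges are illustrative assumptions.
    LINK_BINS = [
        (0.064, "dial-up"), (3, "DSL/cable"), (10, "10 Mbps Ethernet"),
        (45, "T3/DS3"), (100, "FastE"), (622, "OC-12"),
        (1000, "GigE"), (2400, "OC-48"), (10000, "10 GigE"),
    ]

    def classify_pair(pkt_bytes, gap_sec):
        mbps = pkt_bytes * 8 / gap_sec / 1e6
        return next((label for bound, label in LINK_BINS if mbps <= bound),
                    "10 GigE+")

    # Two 1460-byte packets arriving ~5.5 microseconds apart
    print(classify_pair(1460, 5.5e-6))  # ~2124 Mbps -> "OC-48"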

I also notice that the websocket client reports the PC's receive buffer as limiting the throughput to 'NaN Mbps'. NaN means 'Not a Number', so I suspect a bug in the client code that calculates this value.
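
The receive-buffer limit line should reduce to window / RTT. Assuming the same formula the flash client evidently used (it reproduces the flash report exactly), the websocket test's own numbers give a perfectly ordinary value, which is why I suspect client-side arithmetic rather than bad input data:

    # The "your PC buffer limits throughput to X" line should just be
    # MaxRwinRcvd / RTT, in NDT's binary-mega Mbps. Formula assumed from
    # the flash report below, which it reproduces exactly.
    def buffer_limit_mbps(max_rwin_bytes, rtt_sec):
        rwin_mbit = max_rwin_bytes * 8 / (1024 * 1024)
        return rwin_mbit / rtt_sec

    print(buffer_limit_mbps(8470784, 0.023299))  # ~2773.8 -> flash reports 2773.81
    print(buffer_limit_mbps(6527488, 0.023284))  # ~2138.8 -> websocket reports "NaN"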

Looking at the Web100 variables I see the following.

avgrtt (average RTT): 23.3 msec for both clients.

bw (theoretical max): the websocket client is limited by loss.

CongAvoid (number of times the congestion window was increased by the congestion avoidance algorithm): Websocket - 120,998. This says that TCP was in the congestion avoidance state for a long time! In this state the server increases the congestion window by one packet per RTT, a very slow process on a 20 msec path, and another indication of why the throughput is so low.

CWNDpeaks: an NDT-derived number counting the number of times the congestion window increased and then dropped back down again. This is what gives TCP its classic sawtooth pattern; these are the peaks of the teeth. If you have the snaplog option on, you can generate a congestion window plot and see these peaks. Notice that the flash client has 52 peaks to the websocket client's 6, meaning it is cycling much faster.

loss: again, an NDT-derived value showing the packet loss rate.

order: another NDT-derived value showing how often packets arrived out of order. The websocket client reported 1.65%. Again, this is probably happening inside the client, as it is losing multiple packets.

SndLimTrans{Cwnd, Rwin, Sender}: these are Web100 variables that count how many times the server switched from one state to another. Note that the flash client toggled rapidly between congestion-window limited and sender limited, while the websocket client sat in the congestion-window-limited state the majority of the time.
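
For reference, the cwndtime/rwintime/sendtime fractions in the results are just each SndLimTime* counter divided by their sum; a sketch with the raw microsecond counters from both tests reproduces the percentages quoted in the reports below.

    # Each state fraction is its SndLimTime* counter (microseconds) over the sum.
    def state_fractions(cwnd_us, rwin_us, sender_us):
        total = cwnd_us + rwin_us + sender_us
        return {"cwnd": cwnd_us / total,
                "rwin": rwin_us / total,
                "sender": sender_us / total}

    print(state_fractions(9620260, 224548, 335983))  # websocket: ~94.5% cwnd-limited
    print(state_fractions(1735558, 42777, 8416704))  # flash: ~82.6% sender-limited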

My conclusion is that the websocket client is not processing network packets efficiently. Packets arrive while the network ring buffer is full, so they get discarded. The client's TCP stack notices this and informs the server, and the server takes the appropriate action and cuts its send rate.

What I can't tell is whether this is due to the client living inside a VM, with the virtual NIC simply not keeping up with the client code, or some other problem. Have you tried running the clients on real hardware?

That's what I see from the data. Again, if you want client-to-server (c2s) data, you need to enable packet tracing on the server.

Rich



On 07/10/2015 03:17 PM, Don Slaunwhite wrote:
Hi Rich,

Here is an example of some data from two tests to the Calgary server. The first uses the Websocket client and the second uses the Flash client. They were run one right after the other. These results are indicative of what we are seeing regularly: Websocket uploads are always much slower.

I am not TCP-literate enough to evaluate the results effectively, so any help analysing the logs would be great.

Thanks,
Don

Websocket Test from VM to Calgary
====================================

Upload 52 Mbps Ping 23 ms Download 488 Mbps

Client System Details
Client version: v1.0.0.0
OS data:: Windows Server 2012, Architecture:x86
Flash Info: Version = WIN 18,0,0,203
The slowest link in the end-to-end path is a 2.4 Gbps OC-48 subnet
Information: Other network traffic is congesting the link
This connection is network limited 94.49% of the time
2.4 Gbps OC-48 link found.
Link set to Full Duplex mode
Information: throughput is limited by other network traffic.
Good network cable(s) found
Normal duplex operation found.
Web100 reports the Round trip time = 23.28ms
the Packet size = 1460bytes
No packet loss - but packets arrived out-of-order 1.65% of the time
Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgement:
ON
RFC 896 Nagle Algorithm:
ON
RFC 3168 Explicit Congestion Notification:
OFF
RFC 1323 Time Stamping:
OFF
RFC 1323 Window Scaling:
ON - Server=8, Client=8
The theoretical network limit is 311.12 Mbps
The NDT server has a 2048.00 KByte buffer which limits the throughput to 1374.33 Mbps
Your PC/Workstation has a 6374.50 KByte buffer which limits the throughput to NaN Mbps
The network based flow control limits the throughput to 939.09 Mbps
Client Data reports link is OC-48
Client Acks report link is GigE
Server Data reports link is OC-48
Server Acks report link is OC-48
WEB 100 - Detailed Test Results

AckPktsIn : 130975
AckPktsOut : 0
aspd : 0.00000
avgrtt : 23.28
bad_cable : 0
bw : 311.12
BytesRetrans : 0
c2sAck : 7
c2sData : 8
c2sRate : 51541
ClientToServerSpeed : 51.541
CongAvoid : 120998
congestion : 1
CongestionOverCount : 0
CongestionSignals : 1
CountRTT : 128569
CurCwnd : 1582640
CurMSS : 1460
CurRTO : 223
CurRwinRcvd : 3785728
CurRwinSent : 7040
CurSsthresh : 1432260
cwin : 21.8657
CWND-Limited : -nan
CWNDpeaks : 6
cwndtime : 0.9449
DataBytesIn : 573
DataBytesOut : 625955378
DataPktsIn : 1
DataPktsOut : 422956
DSACKDups : 0
DupAcksIn : 2159
DupAcksOut : 0
Duration : 10183018
ECNEnabled : 0
FastRetran : 0
half_duplex : 0
Jitter : 51
link : 0
loss : 0.000002364
MaxCwnd : 2865980
maxCWNDpeak : 2865980
MaxMSS : 1460
MaxRTO : 274
MaxRTT : 74
MaxRwinRcvd : 6527488
MaxRwinSent : 7040
MaxSsthresh : 1432260
minCWNDpeak : 627800
MinMSS : 1460
MinRTO : 223
MinRTT : 23
MinRwinRcvd : 0
MinRwinSent : 5840
mismatch : 0
NagleEnabled : 1
order : 0.0165
OtherReductions : 5
PktsIn : 130976
PktsOut : 422956
PktsRetrans : 0
RcvWinScale : 7
rttsec : 0.023284
rwin : 49.8008
rwintime : 0.0221
s2cAck : 8
s2cData : 8
s2cRate : 488203.7666613683
SACKEnabled : 3
SACKsRcvd : 35
SampleRTT : 23
SendStall : 1
sendtime : 0.0330
ServerToClientSpeed : 488.2037666613683
SlowStart : 2512
SmoothedRTT : 23
Sndbuf : 4194304
SndLimBytesCwnd : 608691440
SndLimBytesRwin : 586080
SndLimBytesSender : 16677858
SndLimTimeCwnd : 9620260
SndLimTimeRwin : 224548
SndLimTimeSender : 335983
SndLimTransCwnd : 70
SndLimTransRwin : 10
SndLimTransSender : 65
SndWinScale : 8
spd : 491.87
StartTimeUsec : 822769
SubsequentTimeouts : 0
SumRTT : 2993620
swin : 32.0000
Timeouts : 0
timesec : 10.00
TimestampsEnabled : 0
waitsec : 0.00
WinScaleRcvd : 8
WinScaleSent : 7
X_Rcvbuf : 87380
X_Sndbuf : 4194304


Flash Test from VM to Calgary
====================================

Upload 711 Mbps Ping 21 ms Download 776 Mbps

Client System Details
Client version: v1.0.0.0
OS data:: Windows Server 2012, Architecture: x86
Flash Info: Version = WIN 18,0,0,203
The slowest link in the end-to-end path is a 2.4 Gbps OC-48 subnet
Information: Other network traffic is congesting the link
This connection is sender limited 82.56% of the time
This connection is network limited 17.02% of the time
2.4 Gbps OC-48 link found.
Link set to Full Duplex mode
Information: throughput is limited by other network traffic.
Good network cable(s) found
Normal duplex operation found.
Web100 reports the Round trip time = 23.3ms
the Packet size = 1460bytes
No packet loss - but packets arrived out-of-order 0.50% of the time
C2S throughput test: Packet queuing detected: 0.28%
S2C throughput test: Packet queuing detected: -2.41%
Web100 reports TCP negotiated the optional Performance Settings to:
RFC 2018 Selective Acknowledgement:
ON
RFC 896 Nagle Algorithm:
ON
RFC 3168 Explicit Congestion Notification:
OFF
RFC 1323 Time Stamping:
OFF
RFC 1323 Window Scaling:
ON; Scaling Factors - Server=8, Client=7
The theoretical network limit is 47808.50 Mbps
The NDT server has a 2048.00 KByte buffer which limits the throughput to 1373.45 Mbps
Your PC/Workstation has a 8272.25 KByte buffer which limits the throughput to 2773.81 Mbps
The network based flow control limits the throughput to 1033.62 Mbps
Client Data reports link is OC-48
Client Acks report link is OC-48
Server Data reports link is OC-48
Server Acks report link is 10 Gig
WEB 100 - Detailed Test Results

Timeouts : 0
waitsec : 0
PktsRetrans : 0
timesec : 10
SndLimBytesSender : 882965584
CongestionOverCount : 0
link : 100
MinRwinRcvd : 64768
DupAcksIn : 957
rwintime : 0.0042
SubsequentTimeouts : 0
MaxRwinRcvd : 8470784
sendtime : 0.8256
MinRwinSent : 5840
MaxRwinSent : 5888
cwndtime : 0.1702
Sndbuf : 4194304
rttsec : 0.023299
CongAvoid : 0
X_Sndbuf : 4194304
rwin : 64.627
OtherReductions : 105
DataPktsOut : 680854
swin : 32
minCWNDpeak : 1874640
FastRetran : 0
cwin : 24.0823
X_Rcvbuf : 87380
AckPktsOut : 0
spd : 790.47
DupAcksOut : 0
SACKsRcvd : 63
order : 0.005
MaxMSS : 1460
CurCwnd : 2823640
PktsIn : 189616
CWNDpeaks : 52
MaxCwnd : 3156520
maxCWNDpeak : 3156520
SmoothedRTT : 23
SndLimTimeRwin : 42777
StartTimeUsec : 686140
SndLimTimeCwnd : 1735558
SndLimTimeSender : 8416704
Duration : 10195551
DataBytesOut : 1007364024
AckPktsIn : 189616
SendStall : 0
SndLimTransRwin : 1
SlowStart : 21762
SndLimTransCwnd : 1680
aspd : 0
SndLimTransSender : 1681
DataPktsIn : 0
MaxSsthresh : 0
CurRTO : 223
MaxRTO : 231
SampleRTT : 23
DataBytesIn : 0
MinRTO : 221
CurSsthresh : 2147483647
MinRTT : 21
MaxRTT : 43
DSACKDups : 0
CurRwinRcvd : 5015040
SndWinScale : 8
MinMSS : 1460
c2sData : 8
CurRwinSent : 5888
c2sAck : 8
s2cData : 8
s2cAck : 9
PktsOut : 680854
ECNEnabled : 0
mismatch : 0
NagleEnabled : 1
congestion : 1
SACKEnabled : 3
bad_cable : 0
TimestampsEnabled : 0
half_duplex : 0
SndLimBytesRwin : 79920
CongestionSignals : 0
BytesRetrans : 0
RcvWinScale : 7
SndLimBytesCwnd : 124318520
bw : 47808.5
WinScaleRcvd : 8
CountRTT : 188334
loss : 0
WinScaleSent : 7
CurMSS : 1460
avgrtt : 23.3
SumRTT : 4387999


-----Original Message-----
From: [mailto:] On Behalf Of Richard Carlson
Sent: July-08-15 8:28 PM
To:
Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem

Don;

Forgive me for sounding like a broken record on this topic, but the NDT (Network Diagnostic Tool) system was specifically designed to go beyond merely reporting upload/download speeds. The server captures dozens of TCP variables and analyzes them to identify why a given result was posted.

You don't need to guess whether there were packet retransmissions; the NDT server tells you that. You don't need to guess whether the delay is high, low, or varying; just look at the test results. You don't need to guess whether the client's configuration is limiting throughput; just look at the test results.

If the results given at the end of the test aren't enough, then turn on more diagnostics: capture the packet trace, and get TCP variables at 5 msec intervals over the 10 sec test. Use the data that's being collected and reported to better understand what is going on.

If you need help reading the test results, post a question to the ndt-dev
email list. We'll be glad to help walk you through the results.

What I would start looking at is this: what is the TCP window size set to? The 100% difference in speed between Calgary and Montreal probably means the receive window is the limiting factor (assuming the RTT to Montreal is 2x the RTT to Calgary). Then look at the time spent in the various states; sitting in the send or receive states indicates a host resource issue. Look at the number of packets sent and ACKs received.

Given that the Websocket upload numbers are 10x lower than the others, I'd wonder if there isn't a math error in the code somewhere and you are simply seeing a reporting error, not a testing error. Looking at the packet traces and other diagnostic data would quickly point out that type of code bug.

Rich (NDT grandfather).

On 07/08/2015 01:38 PM, Don Slaunwhite wrote:
Hi Jordan,

It seems we may have multiple email threads going on about this issue. I'll respond with details on the other thread. 8) But to help clarify for the people here:

We were running a Windows 2012 Server as the Amazon VM. At first we chose one with "Low" network bandwidth, but we also created another with "High" network bandwidth. On those Windows machines we have testing software that runs our IPT test through the Chrome browser; it basically alternates flash, then websocket, and so on. The time between runs is roughly 30 seconds, and we let it run overnight.

We did see slower values coming from the Toronto site, so we ran the same tests to Calgary and Montreal. Those sites gave us much better results (in both Flash and Websocket), but Flash was still significantly faster. I'm not sure if it is our implementation of the Flash/websocket clients or what.

For instance in Montreal

Flash Upload - 304 Mbps
Flash Download - 220 Mbps
Websocket Upload - 45 Mbps
Websocket Download - 230 Mbps

And in Calgary

Flash Upload - 616 Mbps
Flash Download - 542 Mbps
Websocket Upload - 44 Mbps
Websocket Download - 472 Mbps

So even on other servers we are definitely seeing a degradation of Websocket upload. Now perhaps it truly is something with the VM, but even then it seems a bit odd. We have no firewall/anti-virus on. What sort of things have you seen go wrong with VM testing?

We did do some external real-life testing via our employees' home systems (not through VPN etc.) and we still saw slower Websocket speeds, though nothing of this magnitude of difference.

For example:

(about 200 tests from around Ottawa)

20.8 Mbps - Flash Chrome Download Avg
3.034 Mbps - Flash Chrome Upload Avg
21.2 Mbps - Websocket Chrome Download Avg
2.56 Mbps - Websocket Chrome Upload Avg

So not as big a difference, but still large enough to wonder, especially combined with the VM testing above.

Do you have a dataset from your testing that we could look at? Honestly, if you have a good sample size of data that we can feel comfortable with, then we can start looking at what exactly in our implementation is going wrong (if it is indeed our implementation).

Thanks,
Don


-----Original Message-----
From: Jordan McCarthy [mailto:]
Sent: July-07-15 3:53 PM
To: Don Slaunwhite
Cc:
Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem

Hi everybody,
We've also been monitoring the performance characteristics of the
WebSockets client closely, both before and after the client's official
publication in NDT 3.7.0, and we haven't been able to reproduce the disparity
that CIRA has encountered. We've run Websocket-based tests from several
different browser combinations on a variety of operating systems and
consumer-grade connections, during various times of the day, and haven't
encountered any appreciable differences (within any given
machine/connection/time of day combination). Additionally, for the sake of
thoroughness we've run C-client tests from the same connections, and the
numbers we got from the C client runs were pretty comparable with what we
were getting out of all of the Websockets tests.

Don: could you tell us a little bit more about your testing methodology? I'm
guessing you spun up a Linux VM, and used X-forwarding to get access to an
instance of the browser running on the VM?

Off the top of my head that sounds reasonable, but we've definitely seen
weird artifacts introduced by running tests out of VM environments, so
perhaps that could be throwing things off somewhat.

Jordan

Jordan McCarthy
Open Technology Institute @ New America
Public Key: 0xC08D8042 | 4A61 3D39 4125 127D 65EA DDC2 BFBD A2E9 C08D 8042

On 07/06/2015 02:45 PM, Don Slaunwhite wrote:
Hi Everyone,

My name is Don Slaunwhite and I'm a Product Manager at CIRA. We have been using the NDT tests as part of our Internet Performance Test up here in Canada.

We have been working on transitioning our test to the Websocket client, but we have been seeing some very different upload-speed results compared to the flash client.

We did a lot of internal/external testing, and in every case the upload speeds for the websocket version were lower (most times significantly) than with our current flash client. The download speeds are comparable, with websocket usually coming in a bit faster.

For example, we set up a VM at Amazon to run some (hopefully!) controlled tests using Chrome and Firefox.

Chrome averages, based on ~200 tests:

Flash: 19.3 Mbps upload, 49.8 Mbps download
Websocket: 9.3 Mbps upload, 54.3 Mbps download

Firefox averages, based on ~300 tests:

Flash: 27.4 Mbps upload, 50.1 Mbps download
Websocket: 11.1 Mbps upload, 57.2 Mbps download

In each case the websocket upload is significantly lower. I'm trying to determine whether this is expected behaviour with the websocket code and, if not, what might be causing this type of speed degradation.

We are running client versions 3.7.0 (Flash has a buffer size of 32K) against mlab servers in Toronto.

I realize there will be new functionality/capability with the multiple-stream releases, but right now I'd like to try and focus on one major change at a time, so any ideas on speed differences between Flash and Websocket using just 3.7.0 would be really helpful.

Thanks,

Don

Attachment: canada-ndt.xls
Description: MS-Excel spreadsheet