RE: [ndt-dev] Websocket Client - Upload Speed Problem


  • From: Don Slaunwhite <>
  • To: Richard Carlson <>
  • Subject: RE: [ndt-dev] Websocket Client - Upload Speed Problem
  • Date: Mon, 13 Jul 2015 13:32:49 +0000
  • Accept-language: en-CA, en-US

Hi Rich,

That's super info. I did notice the congestion avoidance number but only
because Flash was 0 and websocket was so high. I really didn't get what it
meant, so thank you very much for the great breakdown.

We have run tests on external real hardware (employees' home computers) and we
have seen a slight speed degradation compared to Flash: 3.034 Mbps Flash
upload average versus 2.56 Mbps Websocket upload average, based on 200 tests.

But honestly, my thought there was that given the low upload bandwidth most
people have at home, the variance isn't as large as on a high-bandwidth pipe
(i.e. the tests from the VM). This was one of the reasons we were trying to
get a VM with high bandwidth, so we could test this scenario.

OK, so in order to figure this out:

1. Could anyone on this list with a high-bandwidth connection on real hardware
try running a test from the links below

Flash - http://dev-soi-web-01.cira.ca/
Websocket - http://dev-soi-web-01.cira.ca/websocket/?testtype=websocket

and let us know what your Montreal/Calgary results are. We may or may not
have another issue with connectivity to Toronto, so I'd like to leave it out
of the testing at this point.

2. Rich, if you are able, could you take a look at the websocket code and let
us know if you see anything that may be problematic?

3. We (CIRA) don't have direct access to the M-Lab server, so I'll see about
getting the c2s data capture turned on.

This brings up one last related question that I've always wondered about, and
hopefully it's not too silly a question: why, by default, does NDT only
capture and show S2C data and not C2S as well?

Thanks,
Don



-----Original Message-----
From: Richard Carlson
[mailto:]

Sent: July-10-15 10:33 PM
To: Don Slaunwhite;

Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem

Don;

I put the data into the attached spreadsheet and sorted the variables to do a
side-by-side comparison.

First off, the collected data comes from the server-to-client test
(download). This means we have some insights but to get more you'll need to
turn on packet tracing on the server and look at that data off-line.

I'll try to look at the code later this weekend; I still suspect that there is
a 10x math error in the code that calculates the upload speed.
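
For what it's worth, a clean 10x difference usually points at a unit-conversion
slip rather than a measurement problem. A purely hypothetical sketch (not the
actual NDT client code) of the kind of bug I mean, using the
c2sRate/ClientToServerSpeed pair from the dump below:

    # Hypothetical illustration of a 10x reporting slip -- not the actual NDT client code.
    # The server reports c2sRate in kbit/s; ClientToServerSpeed is that value / 1000.
    c2s_rate_kbps = 51541                     # kbit/s, as in the Web100 dump below

    correct_mbps = c2s_rate_kbps / 1000.0     # 51.541 Mbps -- matches ClientToServerSpeed
    buggy_mbps   = c2s_rate_kbps / 10000.0    # 5.154 Mbps  -- one stray zero gives a clean 10x error
    print(correct_mbps, buggy_mbps)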

In any case, I do see that the websocket client is not working as well as it
should, as noted in the spreadsheet.

The first thing I notice is that both clients are setting (or not setting) the
TCP options to the same values, except that the flash client is using a
slightly smaller receive window.

The websocket client has a theoretical max speed of 311 Mbps, and achieves
488 Mbps (or 491 Mbps). The theoretical value is calculated based on the
packet size, RTT, and loss rate. The server noted a loss rate of 0 for the
flash client and a tiny but non-zero rate (loss = 0.000002364 in the dump
below) for the websocket client. While that doesn't seem like a lot, it does
make a big difference on 20 msec paths. Since you wouldn't expect the network
loss rate to vary between these two tests, I would suspect that the client is
losing packets. That is, it can't empty the buffer fast enough and packets
are dropped.
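
For reference, the "theoretical value" appears to follow the Mathis
steady-state bound, MSS / (RTT * sqrt(loss)); the numbers in both dumps below
are consistent with that, so here is a quick back-of-the-envelope check (my
reconstruction, not code taken from NDT):

    import math

    def mathis_bound_mbps(mss_bytes, rtt_sec, loss):
        # Mathis et al. steady-state TCP bound, scaled with 1024*1024 bits per 'Mbit',
        # which is what reproduces NDT's reported 'bw' values.
        if loss == 0:
            loss = 1e-10   # NDT appears to substitute a tiny floor when no loss was seen
        return (mss_bytes / (rtt_sec * math.sqrt(loss))) * 8 / (1024 * 1024)

    # Websocket test: MSS 1460, rttsec 0.023284, loss 0.000002364
    print(mathis_bound_mbps(1460, 0.023284, 0.000002364))   # ~311, matching bw : 311.12

    # Flash test: zero observed loss, so the bound is effectively unbounded
    print(mathis_bound_mbps(1460, 0.023299, 0.0))            # ~47809, matching bw : 47808.5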

Next, I notice that both clients report the path as an OC-48 (2.4 Gbps) path.
The server performs a packet-pair test for every packet it sends and receives.
During both tests the network delivered packets at line speed (or nearly so),
so I don't suspect a physical path problem.

I also notice that the websocket client reports the PC's receive buffer as
limiting the throughput to 'NaN Mbps'. NaN means 'Not a Number', so I suspect
a bug in the client code that is not calculating this value correctly.
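
Assuming that line is meant to report the receive-window limit (window size
divided by RTT, which is what the flash client's 2773.81 Mbps figure works out
to), here is roughly what it should have said, and how an unset input quietly
turns it into NaN:

    def rwin_limit_mbps(rcv_window_bytes, rtt_sec):
        # Receive-window-limited throughput: at most one full window in flight per RTT.
        # Scaled with 1024*1024 bits per 'Mbit' to stay consistent with NDT's other figures.
        return (rcv_window_bytes * 8 / (1024 * 1024)) / rtt_sec

    # Flash test: 8272.25 KByte buffer, 23.299 ms RTT -> ~2773.8 Mbps, matching its report
    print(rwin_limit_mbps(8470784, 0.023299))

    # Websocket test: 6374.50 KByte buffer, 23.284 ms RTT -> ~2138.8 Mbps is what it should say
    print(rwin_limit_mbps(6527488, 0.023284))

    # If the buffer size was never parsed, the arithmetic silently propagates NaN,
    # which is exactly the kind of bug the 'NaN Mbps' line suggests.
    print(rwin_limit_mbps(float("nan"), 0.023284))            # nan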

Looking at the Web100 variables I see the following.

avgrtt (Average RTT) 23.3 msec for both clients

bw (theoretical max) websocket client is limited by loss

CongAvoid (number of times the congestion window was increased by the
congestion avoidance algorithm) Websocket - 120,998: This says that TCP was
in the congestion avoidance state for a long time! In this state the server
is increasing the number of packets it sends by 1 every RTT. A very slow
process on a 20 msec path and this is another indication of why the
throughput is so low.
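
To put numbers on how slow that is: after a loss the window is roughly halved
and then grows by about one segment per RTT, so rebuilding a window of this
size takes far longer than the 10-second test. A rough sketch of the
arithmetic (my own illustration of the point, using the Web100 values from the
dump below):

    # Rough arithmetic for how long linear (congestion-avoidance) growth takes after a loss,
    # using the websocket test's Web100 values below.
    mss = 1460                # bytes per segment (CurMSS)
    rtt = 0.023284            # seconds (rttsec)
    max_cwnd = 2865980        # bytes (MaxCwnd)

    segments_to_recover = (max_cwnd / 2) / mss    # ~982 segments to regrow after a halving
    time_to_recover = segments_to_recover * rtt   # ~22.9 seconds, i.e. one RTT per segment

    print(round(segments_to_recover), round(time_to_recover, 1))
    # More than twice the length of the 10-second test, which is why sitting in the
    # congestion-avoidance state for most of the test hurts throughput so badly.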

CWNDpeaks: this is an NDT-derived number counting the number of times the
congestion window increased and then dropped back down again. This is what
gives TCP its classic sawtooth pattern; these are the peaks of the teeth.
If you have the snaplog option on, you can generate a congestion window plot
and see these peaks. Notice that the flash client has 52 peaks, meaning it
is cycling much faster.
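
If you do pull the congestion-window samples out of a snaplog, counting the
peaks is straightforward. A small sketch (this assumes you already have the
CurCwnd samples as a plain list; it is an illustration, not one of the NDT
tools):

    def count_cwnd_peaks(cwnd_samples):
        # Count local maxima in a congestion-window time series (the tips of TCP's sawtooth).
        peaks = 0
        for prev, cur, nxt in zip(cwnd_samples, cwnd_samples[1:], cwnd_samples[2:]):
            if prev < cur and cur > nxt:   # the window grew and then dropped back down
                peaks += 1
        return peaks

    # Toy sawtooth: three ramps that each end in a drop -> 3 peaks
    print(count_cwnd_peaks([10, 20, 30, 15, 25, 35, 17, 27, 37, 18]))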

loss: again an NDT-derived value showing the packet loss rate.

order: another NDT-derived value showing how many times packets arrived out
of order. The websocket client reported 1.65%. Again, this is probably
happening inside the client as it loses multiple packets.

SndLimTrans{Cwnd, Rwin, Sender}: these are web100 variables that count how
many times the server switched from one state to another. Note that the
flash client toggled rapidly between Congestion window limited and sender
window limited while the websocket client sat in the congestion window state
for the majority of the time.
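
Those transition counters pair with the SndLimTime* counters (microseconds
spent in each state), and the time-in-state fractions can be recomputed
directly from them; this appears to be where the "94.49% network limited" line
and the cwndtime/rwintime/sendtime values come from. A quick check against the
websocket dump below:

    # Recompute time-in-state fractions from the Web100 SndLimTime* counters (microseconds).
    # Websocket test values from the dump below:
    t_cwnd, t_rwin, t_sender = 9620260, 224548, 335983

    total = t_cwnd + t_rwin + t_sender
    print(round(t_cwnd / total, 4),     # 0.9449 -> matches cwndtime and '94.49% network limited'
          round(t_rwin / total, 4),     # 0.0221 -> matches rwintime
          round(t_sender / total, 4))   # 0.0330 -> matches sendtime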

My conclusion is that the websocket client is not processing network packets
efficiently. The packets are arriving but the network ring buffer is full, so
they are getting discarded. The client's TCP stack notices this and informs
the server, which takes the appropriate action and cuts its send rate.

What I can't tell is whether this is due to the client living inside a VM
(the virtual NIC just isn't keeping up with the client code) or whether there
is some other problem. Have you tried running the clients on real hardware?

That's what I see from the data. Again if you want some c2s data you need to
enable packet tracing on the server.

Rich



On 07/10/2015 03:17 PM, Don Slaunwhite wrote:
> Hi Rich,
>
> Here is an example of some data from two tests to the Calgary server. The
> first uses the Websocket client and the second uses the Flash client. They
> were run one right after the other. These results are indicative of what we
> are seeing regularly: Websocket uploads are always much slower.
>
> I am not TCP literate enough to evaluate the results effectively so any
> help on analysing the logs would be great.
>
> Thanks,
> Don
>
> Websocket Test from VM to Calgary
> ====================================
>
> Upload 52 Mbps Ping 23 ms Download 488 Mbps
>
> Client System Details
> Client version: v1.0.0.0
> OS data: Windows Server 2012, Architecture: x86
> Flash Info: Version = WIN 18,0,0,203
> The slowest link in the end-to-end path is a 2.4 Gbps OC-48 subnet
> Information: Other network traffic is congesting the link
> This connection is network limited 94.49% of the time
> 2.4 Gbps OC-48 link found.
> Link set to Full Duplex mode
> Information: throughput is limited by other network traffic.
> Good network cable(s) found
> Normal duplex operation found.
> Web100 reports the Round trip time = 23.28 ms; the Packet size = 1460 bytes
> No packet loss - but packets arrived out-of-order 1.65% of the time
> Web100 reports TCP negotiated the optional Performance Settings to:
> RFC 2018 Selective Acknowledgement: ON
> RFC 896 Nagle Algorithm: ON
> RFC 3168 Explicit Congestion Notification: OFF
> RFC 1323 Time Stamping: OFF
> RFC 1323 Window Scaling: ON - Server=8, Client=8
> The theoretical network limit is 311.12 Mbps
> The NDT server has a 2048.00 KByte buffer which limits the throughput to 1374.33 Mbps
> Your PC/Workstation has a 6374.50 KByte buffer which limits the throughput to NaN Mbps
> The network based flow control limits the throughput to 939.09 Mbps
> Client Data reports link is OC-48
> Client Acks report link is GigE
> Server Data reports link is OC-48
> Server Acks report link is OC-48
> WEB 100 - Detailed Test Results
>
> AckPktsIn : 130975
> AckPktsOut : 0
> aspd : 0.00000
> avgrtt : 23.28
> bad_cable : 0
> bw : 311.12
> BytesRetrans : 0
> c2sAck : 7
> c2sData : 8
> c2sRate : 51541
> ClientToServerSpeed : 51.541
> CongAvoid : 120998
> congestion : 1
> CongestionOverCount : 0
> CongestionSignals : 1
> CountRTT : 128569
> CurCwnd : 1582640
> CurMSS : 1460
> CurRTO : 223
> CurRwinRcvd : 3785728
> CurRwinSent : 7040
> CurSsthresh : 1432260
> cwin : 21.8657
> CWND-Limited : -nan
> CWNDpeaks : 6
> cwndtime : 0.9449
> DataBytesIn : 573
> DataBytesOut : 625955378
> DataPktsIn : 1
> DataPktsOut : 422956
> DSACKDups : 0
> DupAcksIn : 2159
> DupAcksOut : 0
> Duration : 10183018
> ECNEnabled : 0
> FastRetran : 0
> half_duplex : 0
> Jitter : 51
> link : 0
> loss : 0.000002364
> MaxCwnd : 2865980
> maxCWNDpeak : 2865980
> MaxMSS : 1460
> MaxRTO : 274
> MaxRTT : 74
> MaxRwinRcvd : 6527488
> MaxRwinSent : 7040
> MaxSsthresh : 1432260
> minCWNDpeak : 627800
> MinMSS : 1460
> MinRTO : 223
> MinRTT : 23
> MinRwinRcvd : 0
> MinRwinSent : 5840
> mismatch : 0
> NagleEnabled : 1
> order : 0.0165
> OtherReductions : 5
> PktsIn : 130976
> PktsOut : 422956
> PktsRetrans : 0
> RcvWinScale : 7
> rttsec : 0.023284
> rwin : 49.8008
> rwintime : 0.0221
> s2cAck : 8
> s2cData : 8
> s2cRate : 488203.7666613683
> SACKEnabled : 3
> SACKsRcvd : 35
> SampleRTT : 23
> SendStall : 1
> sendtime : 0.0330
> ServerToClientSpeed : 488.2037666613683
> SlowStart : 2512
> SmoothedRTT : 23
> Sndbuf : 4194304
> SndLimBytesCwnd : 608691440
> SndLimBytesRwin : 586080
> SndLimBytesSender : 16677858
> SndLimTimeCwnd : 9620260
> SndLimTimeRwin : 224548
> SndLimTimeSender : 335983
> SndLimTransCwnd : 70
> SndLimTransRwin : 10
> SndLimTransSender : 65
> SndWinScale : 8
> spd : 491.87
> StartTimeUsec : 822769
> SubsequentTimeouts : 0
> SumRTT : 2993620
> swin : 32.0000
> Timeouts : 0
> timesec : 10.00
> TimestampsEnabled : 0
> waitsec : 0.00
> WinScaleRcvd : 8
> WinScaleSent : 7
> X_Rcvbuf : 87380
> X_Sndbuf : 4194304
>
>
> Flash Test from VM to Calgary
> ====================================
>
> Upload 711 Mbps Ping 21 ms Download 776 Mbps
>
> Client System Details
> Client version: v1.0.0.0
> OS data: Windows Server 2012, Architecture: x86
> Flash Info: Version = WIN 18,0,0,203
> The slowest link in the end-to-end path is a 2.4 Gbps OC-48 subnet
> Information: Other network traffic is congesting the link
> This connection is sender limited 82.56% of the time
> This connection is network limited 17.02% of the time
> 2.4 Gbps OC-48 link found.
> Link set to Full Duplex mode
> Information: throughput is limited by other network traffic.
> Good network cable(s) found
> Normal duplex operation found.
> Web100 reports the Round trip time = 23.3 ms; the Packet size = 1460 bytes
> No packet loss - but packets arrived out-of-order 0.50% of the time
> C2S throughput test: Packet queuing detected: 0.28%
> S2C throughput test: Packet queuing detected: -2.41%
> Web100 reports TCP negotiated the optional Performance Settings to:
> RFC 2018 Selective Acknowledgement: ON
> RFC 896 Nagle Algorithm: ON
> RFC 3168 Explicit Congestion Notification: OFF
> RFC 1323 Time Stamping: OFF
> RFC 1323 Window Scaling: ON; Scaling Factors - Server=8, Client=7
> The theoretical network limit is 47808.50 Mbps
> The NDT server has a 2048.00 KByte buffer which limits the throughput to 1373.45 Mbps
> Your PC/Workstation has a 8272.25 KByte buffer which limits the throughput to 2773.81 Mbps
> The network based flow control limits the throughput to 1033.62 Mbps
> Client Data reports link is OC-48
> Client Acks report link is OC-48
> Server Data reports link is OC-48
> Server Acks report link is 10 Gig
> WEB 100 - Detailed Test Results
>
> Timeouts : 0
> waitsec : 0
> PktsRetrans : 0
> timesec : 10
> SndLimBytesSender : 882965584
> CongestionOverCount : 0
> link : 100
> MinRwinRcvd : 64768
> DupAcksIn : 957
> rwintime : 0.0042
> SubsequentTimeouts : 0
> MaxRwinRcvd : 8470784
> sendtime : 0.8256
> MinRwinSent : 5840
> MaxRwinSent : 5888
> cwndtime : 0.1702
> Sndbuf : 4194304
> rttsec : 0.023299
> CongAvoid : 0
> X_Sndbuf : 4194304
> rwin : 64.627
> OtherReductions : 105
> DataPktsOut : 680854
> swin : 32
> minCWNDpeak : 1874640
> FastRetran : 0
> cwin : 24.0823
> X_Rcvbuf : 87380
> AckPktsOut : 0
> spd : 790.47
> DupAcksOut : 0
> SACKsRcvd : 63
> order : 0.005
> MaxMSS : 1460
> CurCwnd : 2823640
> PktsIn : 189616
> CWNDpeaks : 52
> MaxCwnd : 3156520
> maxCWNDpeak : 3156520
> SmoothedRTT : 23
> SndLimTimeRwin : 42777
> StartTimeUsec : 686140
> SndLimTimeCwnd : 1735558
> SndLimTimeSender : 8416704
> Duration : 10195551
> DataBytesOut : 1007364024
> AckPktsIn : 189616
> SendStall : 0
> SndLimTransRwin : 1
> SlowStart : 21762
> SndLimTransCwnd : 1680
> aspd : 0
> SndLimTransSender : 1681
> DataPktsIn : 0
> MaxSsthresh : 0
> CurRTO : 223
> MaxRTO : 231
> SampleRTT : 23
> DataBytesIn : 0
> MinRTO : 221
> CurSsthresh : 2147483647
> MinRTT : 21
> MaxRTT : 43
> DSACKDups : 0
> CurRwinRcvd : 5015040
> SndWinScale : 8
> MinMSS : 1460
> c2sData : 8
> CurRwinSent : 5888
> c2sAck : 8
> s2cData : 8
> s2cAck : 9
> PktsOut : 680854
> ECNEnabled : 0
> mismatch : 0
> NagleEnabled : 1
> congestion : 1
> SACKEnabled : 3
> bad_cable : 0
> TimestampsEnabled : 0
> half_duplex : 0
> SndLimBytesRwin : 79920
> CongestionSignals : 0
> BytesRetrans : 0
> RcvWinScale : 7
> SndLimBytesCwnd : 124318520
> bw : 47808.5
> WinScaleRcvd : 8
> CountRTT : 188334
> loss : 0
> WinScaleSent : 7
> CurMSS : 1460
> avgrtt : 23.3
> SumRTT : 4387999
>
>
>
>
>
>
> -----Original Message-----
> From:
>
>
> [mailto:]
> On Behalf Of Richard Carlson
> Sent: July-08-15 8:28 PM
> To:
>
> Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem
>
> Don;
>
> Forgive me for sounding like a broken record on this topic, but the NDT
> (Network Diagnostic Tool) system was specifically designed to go beyond
> merely reporting upload/download speeds. The server captures dozens of TCP
> variables and analyzes them to identify why a given result was posted.
>
> You don't need to guess if there were packet retransmissions, the NDT
> server tells you that. You don't need to guess if the delay is high, low,
> or varying, just look at the test results. You don't need to guess if the
> client's configuration is limiting throughput, just look at the test
> results.
>
> If the results given at the end of the test aren't enough, then turn on
> more diagnostics, capture the packet trace, get TCP variables at 5 msec
> intervals for the 10 sec test. Use the data that's being collected, and
> reported, to better understand what is going on.
>
> If you need help reading the test results, post a question to the ndt-dev
> email list. We'll be glad to help walk you through the results.
>
> What I would start looking at is,
>
> What is the TCP window size set to? The 100% increase in speed between
> Calgary and Montreal probably means the receive window is the limiting
> factor (assuming the RTT to Montreal is 2x the RTT to Calgary). Then look
> at the time spent in the various states. Sitting in the send or
> receive states indicates a host resource issue. Look at the number of
> packets sent and ACKs received.
>
> Given that the Websocket Upload numbers are 10x lower than the others, I'd
> wonder if there isn't a math error in the code somewhere and you are simply
> seeing a reporting error, not a testing error. Looking at the packet
> traces and other diagnostic data will quickly point out that type of code
> bug.
>
> Rich (NDT grandfather).
>
> On 07/08/2015 01:38 PM, Don Slaunwhite wrote:
>> Hi Jordan,
>>
>> It seems we may have multiple email threads going on about this issue.
>> I'll respond with details on the other thread. 8) But to help clarify for
>> the people here.
>>
>> We were running a Windows 2012 Server as the Amazon VM. At first we chose
>> one with "Low" network bandwidth, but we also created another with "High"
>> network bandwidth. On those Windows machines we have testing software
>> which runs our IPT test through the Chrome browser. It basically just
>> alternates flash, then websocket, etc. The time between runs is roughly 30
>> seconds, and we let it run overnight.
>>
>> We did see slower values coming from the Toronto site, so we ran the same
>> tests to Calgary and Montreal. Those sites gave us much better results (in
>> both Flash and Websocket) but still Flash was significantly faster. I'm
>> not sure if it is our implementation of the Flash/websockets client or
>> what.
>>
>> For instance in Montreal
>>
>> Flash Upload - 304Mbps
>> Flash Download - 220Mbps
>> Websocket Upload - 45Mbps
>> Websocket Download - 230Mbps
>>
>> And in Calgary
>>
>> Flash Upload - 616Mbps
>> Flash Download - 542Mbps
>> Websocket Upload - 44Mbps
>> Websocket Download - 472Mbps
>>
>> So even on other servers we are definitely seeing a degradation of
>> Websocket upload. Now perhaps it truly is something with the VM, but even
>> then it seems a bit odd. We have no firewall/anti-virus on. What sort of
>> things have you seen go wrong with VM testing?
>>
>> We did do some external real-life testing via our employees' home
>> systems (not through VPN etc.) and we still saw slower Websocket
>> speeds (but not anything of this magnitude of difference).
>>
>> For example:
>>
>> (about 200 tests from around Ottawa)
>>
>> 20.8 Mbps - Flash Chrome Download Avg
>> 3.034 Mbps - Flash Chrome Upload Avg
>> 21.2 Mbps - Websocket Chrome Download Avg
>> 2.56 Mbps - Websocket Chrome Upload Avg
>>
>> So not as big a difference, but still large enough to make us wonder,
>> especially combined with the VM testing above.
>>
>> Do you have a dataset from your testing that we could look at?
>> Honestly, if you have a good sample size of data that we can feel
>> comfortable with, then we can start looking at what exactly in our
>> implementation is going wrong (if it is indeed our implementation).
>>
>> Thanks,
>> Don
>>
>>
>> -----Original Message-----
>> From: Jordan McCarthy
>> [mailto:]
>> Sent: July-07-15 3:53 PM
>> To: Don Slaunwhite
>> Cc:
>> ;
>>
>>
>> Subject: Re: [ndt-dev] Websocket Client - Upload Speed Problem
>>
>>
>> Hi everybody,
>> We've also been monitoring the performance characteristics of
>> the WebSockets client closely, both before and after the client's official
>> publication in NDT 3.7.0, and we haven't been able to reproduce the
>> disparity that CIRA has encountered. We've run Websocket-based tests from
>> several different browser combinations on a variety of operating systems
>> and consumer-grade connections, during various times of the day, and
>> haven't encountered any appreciable differences (within any given
>> machine/connection/time of day combination). Additionally, for the sake
>> of thoroughness we've run C-client tests from the same connections, and
>> the numbers we got from the C client runs were pretty comparable with what
>> we were getting out of all of the Websockets tests.
>>
>> Don: could you tell us a little bit more about your testing methodology?
>> I'm guessing you spun up a Linux VM, and used X-forwarding to get access
>> to an instance of the browser running on the VM?
>>
>> Off the top of my head that sounds reasonable, but we've definitely seen
>> weird artifacts introduced by running tests out of VM environments, so
>> perhaps that could be throwing things off somewhat.
>>
>> Jordan
>>
>> Jordan McCarthy
>> Open Technology Institute @ New America
>> Public Key: 0xC08D8042 | 4A61 3D39 4125 127D 65EA DDC2 BFBD A2E9 C08D 8042
>>
>> On 07/06/2015 02:45 PM, Don Slaunwhite wrote:
>>> Hi Everyone,
>>>
>>>
>>>
>>> My name is Don Slaunwhite and I’m a Product Manager at CIRA. We have
>>> been utilizing the NDT tests as part of our Internet Performance
>>> Test up here in Canada.
>>>
>>>
>>>
>>> We have been working on transitioning to the Websocket client with
>>> our test, but we have been seeing some very different results in
>>> upload speeds as compared to the flash client.
>>>
>>>
>>>
>>> We did a lot of internal/external testing, and in every case the
>>> upload speeds for the websocket version were lower (most times
>>> significantly) than with our current flash client. The download speeds
>>> are comparable, with websocket usually coming in a bit faster.
>>>
>>>
>>>
>>> For example, we set up a VM at Amazon to run some (hopefully!)
>>> controlled tests using Chrome and Firefox.
>>>
>>>
>>>
>>> Chrome Averages based on ~200 tests
>>>
>>> Flash 19.3 Mbps Upload
>>> Flash 49.8 Mbps Download
>>> Websocket 9.3 Mbps Upload
>>> Websocket 54.3 Mbps Download
>>>
>>> Firefox Averages based on ~300 tests
>>>
>>> Flash 27.4 Mbps Upload
>>> Flash 50.1 Mbps Download
>>> Websocket 11.1 Mbps Upload
>>> Websocket 57.2 Mbps Download
>>>
>>>
>>>
>>> In each case the websocket upload is significantly lower. I’m trying
>>> to determine if this is expected behaviour with the websocket code,
>>> and if not, what might be causing this type of speed degradation.
>>>
>>>
>>>
>>> We are running with client version 3.7.0 (Flash has a buffer size of
>>> 32K) against M-Lab servers in Toronto.
>>>
>>>
>>>
>>> I realize there will be new functionality/capability with the
>>> multiple stream releases, but right now I’d like to try and focus on
>>> one major change at a time, so any ideas on speed differences
>>> between Flash and Websocket using just 3.7.0 would be really helpful.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Don
>>>
>>>
>>>
>>


