thrulay-users - Re: Thrulay tests on sunnvale 10 g nic machines

Subject: Discussion list for thrulay, a network testing tool

  • From: stanislav shalunov
  • To: "Cottrell, Les"
  • Cc: "Logg, Connie A.", "Li, Yee-Ting", "R. Hughes-Jones"
  • Subject: Re: Thrulay tests on sunnvale 10 g nic machines
  • Date: 18 Oct 2005 22:27:23 -0400

"Cottrell, Les" writes:

> Increasing -l from the default 8192B to 65536B using two streams
> between Sun v20zs with dual 1.8GHz Opterons (connected by Cisco 6509
> switch ports) increases the throughput from 3.185Gbits/s to 4.751
> Gbits/s for a 60 second run. The MTU was 9KB (best we could do with
> 1500B was 4.0Gbits/s). The client cpu utilization was 99.9% for the
> 8192B blocking and 95.2% for the 65536B (using the time
> command). With blocking at 131072B we got up to 4.812Gbits/s with
> 87.1% utilization and with blocking of 262144B we got 4.68Gbits/s
> and 75.4% utilization. We ran the last one 7 times to look at noise
> and got 4.80 +- 0.06 Gbits/s and 78% +- 1.4%. I thought 100% cpu
> utilization meant the equivalent of one of the two cpus being fully
> utilized. Watching the top display it appears that at any given
> moment it uses only one cpu (even for 2 streams) and switches
> between them.

Les,

If you're shooting to saturate the pipe (or get to 8Gb/s, or
whatever), it might make sense to try to use the TSC library as
described in my other message. A large part of the purpose of the
library is to reduce CPU use, so it would help some. I don't know how
much it'd help, but it would essentially remove the CPU overhead
associated with timestamping and measuring the round-trip time.
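
For a rough sense of scale, the quoted figures can be turned into
blocks sent per second at each -l setting. This is only a
back-of-the-envelope sketch: it assumes one timestamp per block, which
is not confirmed in this thread, and the function name is made up for
illustration. The throughput and CPU numbers are copied from the
quoted message.

```python
# Rough estimate of how many blocks per second the sender handles
# at each block size reported in the quoted measurements.
# Assumption (not confirmed in this thread): one timestamp per block.


def blocks_per_second(gbits_per_s: float, block_bytes: int) -> float:
    """Blocks sent per second at the given throughput and block size."""
    return gbits_per_s * 1e9 / 8 / block_bytes


# (block size in bytes, throughput in Gbit/s, client CPU %) from the quoted text
RUNS = [
    (8192, 3.185, 99.9),
    (65536, 4.751, 95.2),
    (131072, 4.812, 87.1),
    (262144, 4.68, 75.4),
]

if __name__ == "__main__":
    for block, gbps, cpu in RUNS:
        rate = blocks_per_second(gbps, block)
        print(f"{block:>7} B  {gbps:.3f} Gbit/s  {cpu:5.1f}% CPU  {rate:9.0f} blocks/s")
```

If the one-timestamp-per-block assumption holds, the 8192B run handles
on the order of twenty times as many blocks per second as the 262144B
run, so any per-block cost would weigh most heavily at the default
block size.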

The client, indeed, is single-threaded (this seemed to work better in
terms of fairness on single-threaded boxes, but results can differ, of
course). Usually, with all the interrupts and data copies, the kernel
can find things to do for the other CPU. I want to understand this
better, though. I would need to check to see whether a multi-process
implementation was tried.

Tuning the window size down from 8MB might help, too, but it didn't
look like the kernel had any internal queuing problems with the
results labeled as ``new drivers'', so the effect would likely be
marginal.

--Stas

--
Stanislav Shalunov http://www.internet2.edu/~shalunov/

This message is designed to be viewed upside down.
