
ndt-users - Re: Am I seeing the right results?

Subject: ndt-users list created

  • From: Richard Carlson
  • To: Peter Van Epp
  • Subject: Re: Am I seeing the right results?
  • Date: Fri, 05 Oct 2007 09:57:55 -0400

All

A couple of comments.

You can usually use the ethtool program to change the number of packets the NIC needs to receive before generating an interrupt. While taking an interrupt for every packet may be a problem for real applications, getting single packets is actually a benefit for the NDT system. This is because the NDT server is dedicated to testing and is looking for specific problems. So I recommend setting the rx-frames variable to 1 using the "ethtool -C <interface> rx-frames 1" command, where <interface> is the server's NIC (for example, eth0).
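
For anyone scripting this on a test server, here is a minimal sketch of wrapping that ethtool call. The function names and the "eth0" interface are assumptions for illustration, not part of NDT. The change needs root, and some drivers may reject the rx-frames setting, in which case ethtool exits with an error.

```python
import subprocess


def coalesce_cmd(interface, rx_frames=1):
    """Build the ethtool argument list that sets rx-frames on one interface."""
    if rx_frames < 0:
        raise ValueError("rx_frames must be non-negative")
    return ["ethtool", "-C", interface, "rx-frames", str(rx_frames)]


def apply_coalesce(interface, rx_frames=1):
    """Run ethtool; raises CalledProcessError if the driver refuses the value."""
    subprocess.run(coalesce_cmd(interface, rx_frames), check=True)


if __name__ == "__main__":
    # Only print the command here; call apply_coalesce("eth0") as root to apply it.
    print(" ".join(coalesce_cmd("eth0")))
```

Running "ethtool -c <interface>" (lower-case c) afterwards shows the coalescing settings currently in effect.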

The TCP Westwood team has written some papers on how to deal with interrupt coalescing issues. A future version of NDT may try to implement this same algorithm.

We are also looking at other methods to validate the pkt-pair link detection algorithm. More about this soon (I hope).

Rich

At 02:01 PM 9/29/2007, Peter Van Epp wrote:
On Sat, Sep 29, 2007 at 11:14:40AM +0200, Simon Leinen wrote:
> Peter Van Epp writes:
> > The "slowest link is gig" is likely caused by the server NIC
> > card having interrupt reduction (it has a correct name but I don't
> > remember it :-)) on.
>
> (People seem to prefer the fancier names of "interrupt coalescence" or
> "interrupt moderation" :-)
>
> http://kb.pert.geant2.net/PERTKB/InterruptCoalescence
>
> > NDT guesses link speed by packet interarrival time from the NIC; if
> > it delivers multiple packets per interrupt, that timing is disrupted
> > (this can usually be disabled in the NIC driver although perhaps not
> > easily).
>
> Very interesting, I hadn't known that. But how does NDT measure
> *packet* interarrival times - doesn't it only do TCP (where the
> application only sees a byte stream)?

It uses pcap to get the raw packets from the interface (including
kernel timestamps) for at least some parts of the testing.
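
To make the interarrival argument concrete, here is a minimal sketch of a packet-pair estimate. This is not NDT's actual code; the function name, the 1500-byte frame size and the sample timestamps are assumptions chosen for illustration. The estimate is simply frame size in bits divided by the gap between consecutive arrival timestamps.

```python
def packet_pair_estimates(timestamps, frame_bytes):
    """Return link speed estimates (bits/s) from consecutive arrival timestamps."""
    estimates = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap <= 0:
            continue  # identical or reordered timestamps carry no information
        estimates.append(frame_bytes * 8 / gap)
    return estimates


# 1500-byte frames back to back on a 100 Mbit/s link arrive 120 us apart.
clean = [0.0, 120e-6, 240e-6]

# Hypothetical coalesced case: the NIC hands over the same frames in one
# batch, so the timestamps end up only about 1.2 us apart.
coalesced = [240e-6, 241.2e-6, 242.4e-6]

for label, stamps in (("clean", clean), ("coalesced", coalesced)):
    mbps = [round(e / 1e6) for e in packet_pair_estimates(stamps, 1500)]
    print(label, mbps, "Mbit/s")
```

With the clean timestamps the estimate comes out at about 100 Mbit/s. With the batched timestamps the same frames look like they crossed a link of roughly 10 Gbit/s, which is how a slower link can be reported as an OC192.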

>
> > Before I turned it off on our gig link it used to claim I had an
> > OC192 (which was of course news to me). Throughput looks about right
> > for a well performing 100 meg link though.
>
> Because interrupt coalescence is quickly becoming prevalent (even my
> laptop has it), it would be useful to think about measurement methods
> that are "robust" to it.
>
> In general, I would favour it if everybody used kernel timestamps
> (e.g. SO_TIMESTAMP), and every adapter that performs interrupt
> coalescence would decorate incoming frames with hardware timestamps.
> That wouldn't require much (if any) new hardware on the adapters, just
> a little more logic in the driver to convert hardware timestamps into
> OS-level timestamps.

It is more difficult than this. Interrupt coalescence is built
into the Ethernet chips, which have internal buffers and no access to the
kernel time. The answer is Endace DAG cards (www.endace.com) which keep an on
board time source (syncable to GPS via ntp if desired) that stamps the packet
in its internal buffer (and presumably builds the ethernet interface out of
discrete chips rather than one of the single chip solutions). They include an
onboard CPU and large packet cache (most chips have at most a 64K cache, I
believe a DAG is around 4 megs) and thus can capture correctly in the face of
bus contention as well. If you are doing disk I/O on the machine capturing you
will likely lose packets on a conventional gig enet card due to PCI bus
contention even at around 100 megabits per second (at least that has been my
experience on argus). The disk I/O ties up the bus too long and the card buffer
overruns and loses packets.
DAG cards are the preferred choice for measurement of all kinds but are a
little pricey, around $8K US for a gig and around $30K for OC192 when last
I asked (which is a while ago).
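
A rough back-of-the-envelope sketch of why the small on-chip buffer overruns: the time a buffer can absorb traffic while the bus is unavailable is its size in bits divided by the arrival rate. The assumptions here are mine: "64K" is taken as 64 KiB, "4 megs" as 4 MiB, and nothing drains from the buffer while the bus is tied up.

```python
def fill_time_seconds(buffer_bytes, line_rate_bps):
    """Time until a buffer overflows if packets arrive at line rate and none drain."""
    return buffer_bytes * 8 / line_rate_bps


NIC_BUFFER = 64 * 1024         # assumed size of a conventional chip's buffer
DAG_BUFFER = 4 * 1024 * 1024   # assumed size of the DAG card's packet cache

cases = [
    ("64K buffer at 100 Mbit/s", NIC_BUFFER, 100e6),
    ("64K buffer at 1 Gbit/s", NIC_BUFFER, 1e9),
    ("4M buffer at 1 Gbit/s", DAG_BUFFER, 1e9),
]
for label, size, rate in cases:
    print(f"{label}: {fill_time_seconds(size, rate) * 1000:.2f} ms")
```

Under these assumptions a 64K buffer rides out a bus stall of only about 5 ms at 100 megabits and about half a millisecond at a gigabit, while a 4 meg cache lasts over 30 ms at a gigabit.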

>
> Still I have no idea on how to provide such timestamps to an
> application that only uses TCP...
> --
> Simon.

As noted, pcap, although at higher speeds I've been told the callback
mechanism eventually becomes a problem (around 600 megs I think) and you need
to change to something more exotic and thus less portable. I believe Endace
has a custom API for the truly speed challenged.

Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada

------------------------------------



Richard A. Carlson e-mail:

Network Engineer phone: (734) 352-7043
Internet2 fax: (734) 913-4255
1000 Oakbrook Dr; Suite 300
Ann Arbor, MI 48104


