
Re: thread competition [was Re: [perfsonar-user] 40-100 GigE Hardware Recommendations?]


  • From: "Wefel, Paul" <>
  • To: Eric Pouyoul <>, "" <>
  • Cc: Roy Hockett <>, Azher Mughal <>
  • Subject: Re: thread competition [was Re: [perfsonar-user] 40-100 GigE Hardware Recommendations?]
  • Date: Thu, 27 Mar 2014 17:57:50 +0000
  • Accept-language: en-US

We spent a fair amount of time troubleshooting 40G performance on Sandy Bridge, as Blue Waters relies heavily on 40G Ethernet for its external I/O systems.  Several issues came up: the first was deficiencies in the Sandy Bridge (SB) processor architecture, and the second was issues with the Mellanox 40G driver, which Mellanox has since corrected.

The SB processor issues were several.  The first was the ATR (Aging Timer Rollover) deadlock breaker in the Integrated I/O unit.  It's best explained by this excerpt from an Intel tuning document:
"When the cores are very nearly idle and multiple PCIe* ports are attempting to achieve high bandwidth transfers, the deadlock breaker can kick in occasionally causing significant bandwidth performance degradation when it does. Certain real-world use cases are exposed to this issue. Specifically, “CPU Offload” usage models that rely heavily on PCIe throughput (such as GPGPUs or high bandwidth NICs) while the cores are generally idle may experience this performance anomaly."

Changing the ATR settings requires a BIOS mod from the server manufacturer.

The other issue had to do with the number of buffers each processor had allocated for read and write I/O on the QPI bus.  I can't find my documents right now on the difference between SB and IVB, but Intel recognized the problem and fixed it in Ivy Bridge (IVB).  I think SB has something like 16 read buffers and IVB has 64.

There was an issue with C-states as well.  You want to turn them off.
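
The usual way is a BIOS setting or the intel_idle.max_cstate= / processor.max_cstate= kernel boot parameters.  As a rough sketch only (it assumes a Linux kernel that exposes the cpuidle sysfs interface, and it has to run as root), you can also check and disable the deep states from userspace:

import glob, os, sys

# Deep C-states (C3/C6) add wakeup latency that can stall sustained
# PCIe/NIC throughput; write 1 to each deep state's "disable" knob.
for state in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpuidle/state[0-9]*")):
    with open(os.path.join(state, "name")) as f:
        name = f.read().strip()
    if name in ("POLL", "C1", "C1E"):
        continue  # leave the shallow states alone
    try:
        with open(os.path.join(state, "disable"), "w") as f:  # needs root
            f.write("1")
        print("disabled %s (%s)" % (name, state))
    except IOError as e:
        sys.stderr.write("could not disable %s: %s\n" % (name, e))

The state names vary by platform, so treat the filter list above as an example.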

The processor/PCIe slot/NIC assignment depends on the nature of the traffic through the system.  If you are going to have peer-to-peer traffic between NICs, keep them on the same processor socket.

If you don't have peer-to-peer traffic between the NICs, then spread them evenly across processors, which is what we are doing.  With iperf we can achieve 79 Gb/s in this configuration.  We use two separate 40G NICs instead of a single dual-port 40G NIC.
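
For what it's worth, a quick way to sanity-check which socket a NIC's slot hangs off (and the negotiated PCIe link) is to read the sysfs attributes behind each interface's device symlink.  This is just a sketch for a Linux PCIe NIC; numa_node reads -1 when the platform doesn't report it:

import glob, os

def read(dev, attr):
    path = os.path.join(dev, attr)
    return open(path).read().strip() if os.path.exists(path) else "n/a"

# /sys/class/net/<iface>/device is a symlink to the PCI device, so the
# NUMA node and PCIe link attributes are visible right there.
for dev in sorted(glob.glob("/sys/class/net/*/device")):
    iface = dev.split("/")[4]
    print("%-10s numa_node=%s link=%s x%s (max %s x%s)" % (
        iface, read(dev, "numa_node"),
        read(dev, "current_link_speed"), read(dev, "current_link_width"),
        read(dev, "max_link_speed"), read(dev, "max_link_width")))

Once you know the node, pinning iperf or nuttcp to cores on that node (numactl --cpunodebind, or taskset) keeps the test traffic from crossing QPI unnecessarily.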

-paul


From: Eric Pouyoul <>
Date: Thursday, March 27, 2014 at 10:51 AM
To: "" <>
Cc: Roy Hockett <>, Azher Mughal <>
Subject: Re: thread competition [was Re: [perfsonar-user] 40-100 GigE Hardware Recommendations?]

Unfortunately no, I do not; I did ask the same question.

Eric


On Thu, Mar 27, 2014 at 10:08 AM, Jason Zurawski <> wrote:
Hi Roy;

Unfortunately I can't.  I am CCing Eric Pouyoul from ESnet who may have more insight.

-jason

On Mar 27, 2014, at 9:55 AM, Roy Hockett <> wrote:

> Azher and Jason,
>       Can you say a bit more about the thread competition issue?  How have you seen this manifest itself?
> What was changed in the Ivy Bridge 2690 V2 processor to fix this issue?
>
> Thanks,
> -Roy Hockett
>
> Network Architect,
> ITS Communications Systems and Data Centers
> University of Michigan
> Tel: (734) 763-7325
> Fax: (734) 615-1727
> email:
>
> On Mar 20, 2014, at 2:07 PM, Azher Mughal wrote:
>
>> Hi Roy, All,
>>
>> Regarding 100GE NICs, I heard from a vendor (Netronome) that their NIC will occupy multiple Gen3 x8 slots :) rather than x16 (as in the case of Mellanox).
>>
>> For the processor, as Jason mentioned, the Ivy Bridge 2690 v2 should be used rather than the Sandy Bridge version, as it has fixes for thread competition.
>>
>> I agree my documentation is getting a little old; I will try to update it in the coming weeks.
>>
>> Cheers
>> -Azher
>>
>>
>>
>> On 3/20/2014 10:50 AM, Roy Hockett wrote:
>>> Mark,
>>> I built and tested two 40GE boxes for SC13, with mixed results.
>>>
>>> Keys that I found were the following:
>>>
>>> Single 40Gbps connected Server
>>> ------------------------------------------------
>>> Sandy/Ivy Bridge Family processor
>>> - Support for PCIe V3
>>> - Minimum x8 lane slot
>>>
>>> Dual 40Gbps connected server
>>> ------------------------------------------------
>>> Sandy/Ivy Bridge Family processor
>>> - 2 CPUs, with PCIe v3 slots attached to each CPU
>>> - Support for PCIe V3
>>> - Minimum x8 lane slot
>>>
>>> I had boxes that had both PCI slots connected to the same CPU and ran
>>> into issues when I tried to go above 60Gbps of traffic with nuttcp, which
>>> is why I recommend making sure the PCI slots are connected to different
>>> CPUs to avoid this issue.
>>>
>>> Network Card
>>> ---------------------
>>> The 40GE card was a ConnectX®-3 VPI.  I know there are a couple of others out, but
>>> I have not been able to test them.
>>>
>>> I am told by vendors that 100Gbps cards will be PCIe v3 x16.
>>> Be careful and look at the server architecture layout, like the attached, so you
>>> can see how the PCIe slots are connected to the CPUs and whether they run 8 lanes
>>> or 16 lanes, even though the connector is x16.
>>>
>>> If you are doing disk-to-disk tests, the RAID controller and disks are important.
>>>
>>> Many built-in RAID controllers have limitations, so make sure you understand them.
>>>
>>> I know LSI has a 12Gbps RAID controller that uses a PCIe v3 x8 slot, so even
>>> if you aren't purchasing this now, it might be good to have an extra PCIe slot so
>>> you can use one later if you want.
>>>
>>>
>>> Azher Mughal did this back in 2011, but the information is still relevant.
>>>
>>> http://supercomputing.caltech.edu/archive/sc11/40gekit_setup.html
>>>
>>>
>>> <Mail Attachment.png>
>>>
>>>
>>> Thanks,
>>> -Roy Hockett
>>>
>>> Network Architect,
>>> ITS Communications Systems and Data Centers
>>> University of Michigan
>>> Tel: (734) 763-7325
>>> Fax: (734) 615-1727
>>> email:
>>>
>>> On Mar 20, 2014, at 12:08 PM, Christopher A Konger <> wrote:
>>>
>>>> Joe Breen at Utah and Conan Moore at Colorado are also building/testing 40G pS boxes.
>>>>
>>>> I have CC'd them on this in case you want to reach out directly (I think they have already conferred with Jason).
>>>>
>>>> Chris Konger / 864-656-8140
>>>>
>>>> From: [mailto:] On Behalf Of Jason Zurawski
>>>> Sent: Thursday, March 20, 2014 8:44 AM
>>>> To: Mark Gardner
>>>> Cc:
>>>> Subject: Re: [perfsonar-user] 40-100 GigE Hardware Recommendations?
>>>>
>>>> Hey Mark;
>>>>
>>>> I spec'ed one out last fall, but didn't purchase or do any testing.  It uses a 40G card that we have tested on ESnet DTNs (Mellanox MCX353A-TCBT ConnectX3), an Ivy Bridge motherboard/processor combo, and lots of memory.  The spec is below; use at your own risk, of course :)
>>>>
>>>> Thanks;
>>>>
>>>> -jason
>>>>
>>>>
>>>>
>>>> On Mar 20, 2014, at 8:22 AM, Mark Gardner <> wrote:
>>>>
>>>>> Has anyone built a perfSONAR node intended to support 40-100 Gbps Ethernet?
>>>>>
>>>>> We do not have that level of connectivity yet but are planning for the
>>>>> future. I think I have sufficient funds that I can purchase beefier
>>>>> hardware now to support those rates in the future. Does anyone have a
>>>>> spec that they would be willing to share?
>>>>>
>>>>> Mark
>>>>> --
>>>>> Mark Gardner
>>>>> Network Research Manager
>>>>> Virginia Tech



