Hey Joon;
Great, at least we now see there is a real impact on underpowered
machines.
Thanks for the hard work on this one;
-jason
Hyojoon Kim wrote:
Hi Jason,
I have copy-and-pasted some parts of the mpstat output below; it was
captured while the bandwidth measurement was degraded to around
600Mbps, i.e., while several stateful iptables rules were in place.
You can see CPU #0 hits 0% idle and stays there for several seconds
(around 7-8 seconds).
When there are no iptables rules, CPU #0 never hits 0% idle; it gets
close to 0%, but it does not stay there for that long.
-Joon
03:00:50 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:51 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:00:51 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:00:51 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:00:51 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:52 PM  all    0.50    0.00   14.43    0.00    0.00    6.47    0.00    0.00    0.00   78.61
03:00:52 PM    0    0.00    0.00   29.00    0.00    0.00   13.00    0.00    0.00    0.00   58.00
03:00:52 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:00:52 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:53 PM  all    0.50    0.00   31.16    0.00    0.00   19.10    0.00    0.00    0.00   49.25
03:00:53 PM    0    1.01    0.00   61.62    0.00    0.00   37.37    0.00    0.00    0.00    0.00
03:00:53 PM    1    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99

03:00:53 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:54 PM  all    0.00    0.00   31.50    0.00    0.00   18.50    0.00    0.00    0.00   50.00
03:00:54 PM    0    0.00    0.00   62.38    0.00    0.00   37.62    0.00    0.00    0.00    0.00
03:00:54 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:00:54 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:55 PM  all    0.00    0.00   35.18    0.00    0.00   15.08    0.00    0.00    0.00   49.75
03:00:55 PM    0    0.00    0.00   70.00    0.00    0.00   30.00    0.00    0.00    0.00    0.00
03:00:55 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:00:55 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:00:56 PM  all    0.50    0.00   34.83    0.50    0.00   16.42    0.00    0.00    0.00   47.76
03:00:56 PM    0    0.00    0.00   67.68    0.00    0.00   32.32    0.00    0.00    0.00    0.00
03:00:56 PM    1    0.00    0.00    4.00    1.00    0.00    0.00    0.00    0.00    0.00   95.00

…

03:01:00 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:01:01 PM  all    0.00    0.00   35.00    0.00    0.00   15.00    0.00    0.00    0.00   50.00
03:01:01 PM    0    0.00    0.00   70.00    0.00    0.00   30.00    0.00    0.00    0.00    0.00
03:01:01 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:01:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:01:02 PM  all    0.50    0.00   20.00    0.50    0.00   11.50    0.00    0.00    0.00   67.50
03:01:02 PM    0    0.00    0.00   36.36    0.00    0.00   23.23    0.00    0.00    0.00   40.40
03:01:02 PM    1    0.00    0.00    3.96    0.99    0.00    0.00    0.00    0.00    0.00   95.05
On Jan 4, 2016, at 12:56 PM, Jason Zurawski <> wrote:
Hey Joon;
Another suggestion would be to look at mpstat (e.g. mpstat -P ALL 1) in
the various test scenarios to see if things are getting pegged. If
Mark's synopsis is correct, the security stuff is probably the
bottleneck.
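For example, to log a 30-second window while a test runs (interval and
count are standard mpstat arguments):

$ mpstat -P ALL 1 30 > mpstat.log   # 1-second samples, 30 of them, all CPUs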
Thanks;
-jason
Mark Feit wrote:
I can’t confirm it for the LIVA
because I don’t have one, but iptables is well-known to be a drag in
high-traffic situations. You won’t notice this as much on bigger
machines because they tend to have enough horsepower that the extra
processing
time doesn’t matter. Your change effectively disables all of it, so
the increase in throughput makes perfect sense. The tradeoff is that
the machine is no longer protected from the outside.
Iptables processes the rules in each chain in the order
they appear, and the perfSONAR chain isn't arranged to get time- and
rate-sensitive traffic processed as quickly as possible.
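If anyone wants to experiment, one illustrative tweak (a sketch, not
the shipped ruleset; the port range below is a placeholder for your
actual test ports) is to promote an ACCEPT to the top of the INPUT
chain so measurement traffic skips the rest of the rule walk:

# iptables -I INPUT 1 -p tcp --dport 5000:5600 -j ACCEPT   # placeholder test-port range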
—Mark
Hyojoon Kim wrote:
Hi all,
After some digging, I am pretty convinced that the low bandwidth
results of the LIVA X box are due to:
- the "perfsonar-toolkit-security" package installation.
"lsmod" shows that nf_conntrack-related kernel modules get loaded
after the perfSONAR security package is installed, along with several
iptables rules. Once that happens, a bwctl measurement never goes over
700Mbps when the test is initiated from the LIVA X box. Flushing the
iptables rules and unloading the nf_conntrack-related modules fixes
the issue, restoring the bwctl measurement to 940Mbps.
Can someone confirm this?
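To check on your own box, something like the following shows the
moving parts (module names can vary by kernel, and flushing the rules
leaves the box unprotected):

# lsmod | grep conntrack                       # are the conntrack modules loaded?
# iptables -L -n -v                            # rules installed by the security package
# iptables -F                                  # flush all rules (removes protection!)
# modprobe -r nf_conntrack_ipv4 nf_conntrack   # unload conntrack; may fail while rules use it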
Of course, you can remove the security package after testing:
# apt-get remove perfsonar-toolkit-security
# apt-get autoremove
# cd /etc/iptables/
# rm rules.v4 rules.v6
# reboot
If this is confirmed, maybe it makes sense to put a note at this link,
in the "Installation Instructions" section. Or at this link, with a
note saying something like "Don't install the security package if you
have a low-cost node, like a LIVA X".
Happy holidays, everyone.
Thanks,
Joon
On Dec 30, 2015, at 10:20 AM, Hyojoon Kim <> wrote:
Hi Jason,
Sorry for the late reply; was a bit out for the holidays,
and I wanted to actually replicate the problem.
- A tcpdump capture from when the LIVA box only achieves around
513Mbps is here. Note that if the test is initiated *from* the Dell
server, it does achieve 935Mbps.
- TCP settings are probably not exactly uniform. Let me know which
settings you want to see, and I will provide them from "sysctl -a"
(see the sketch after this list).
- The Dell server has CentOS.
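For the TCP basics mentioned above, these standard keys pull the usual
suspects:

$ sysctl net.ipv4.tcp_congestion_control       # congestion control algorithm
$ sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem   # TCP buffer min/default/max
$ sysctl net.core.rmem_max net.core.wmem_max   # kernel buffer ceilings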
Thanks,
Joon
On Dec 24, 2015, at 9:35 AM, Jason Zurawski <> wrote:
Hey Joon;
Interesting stuff - it's a little late to weigh in, but some thoughts
as I read this:
- You noted that when the Liva was the receiver (and the Dell was
sending), things were 'ok'. The opposite way (Dell receiving, Liva
sending), things degraded. Did you happen to capture any packet traces
during this? I would be curious to see if there was something funky
going on that would force a slowdown (pause frames, window
manipulation, etc.) either from one of the OSs or the NICs. (A capture
sketch follows this list.)
- Were the TCP settings uniform on both ends? Same congestion control,
same socket sizes, etc.?
- Was the Dell a CentOS system, or also an Ubuntu?
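For the capture, something along these lines should do (a generic
sketch; the interface name and peer address are placeholders, and
-s 128 keeps the headers while limiting file size):

$ tcpdump -i eth0 -s 128 -w liva-test.pcap host <peer-address>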
Glad it's 'working', not glad that things are this touchy :)
-jason
Daniel Doyle wrote:
Joon,
Excellent! And thanks for passing along the information.
I'll have to dig a bit on this and figure out what the offending bit
was, but in the meanwhile I will update the page to reflect these
findings in case others are poking around and run into
the same issue.
-Dan
On Dec 22, 2015, at 6:24 PM, Hyojoon Kim <> wrote:
Just to give an update on this issue:
Now I get 940Mbps with bwctl :-)
* Side note:
I don’t know what makes the other LIVA box with
Ubuntu-14.04-desktop-amd64 unable to achieve 940Mbps. I might dig into
it when I have the time. Just FYI, things I did differently on the
under-achieving box are:
- It’s a different OS (Ubuntu 14.04)
- I installed two more packages ("perfsonar-toolkit-security" and
"perfsonar-toolkit-sysctl") in addition to the "perfsonar-testpoint"
package
- I did some Linux tuning after it was
not able to achieve 940 Mbps.
Thanks,
Joon
On Dec 22, 2015, at 2:00 PM, Hyojoon Kim <> wrote:
Thanks for all the suggestions and comments!
To answer some:
* I did remove the "ondemand" option, as I saw that note somewhere too
before I ran the test. I'm sure it did something, but I still get
around 550Mbps with bwctl.
* I did the Linux Tuning *after* I got 550Mbps at my first trial, hoping
it would fix it. No luck.
* Interestingly, I get around 940Mbps when I initiate the bwctl test
*from* a Dell node *to* the LIVA X box. But the other direction still
gives me around 550Mbps.
* One thing I noticed:
- When I initiate the test from the Dell node, it opens an ephemeral
port (e.g., 45250) on the Dell node:
    local LIVAXBOX port 5593 connected with DELL port 45240
- From the LIVA X box to the Dell server, it is:
    local DELL port 5220 connected with LIVAXBOX port 5220
Thanks,
Joon
On Dec 22, 2015, at 1:43 PM, Uhl, George D. (GSFC-423.0)[SGT INC] <> wrote:
When I was researching the Liva X capabilities I ran across an email
from Larry Blunk of Merit that he posted on the list this past October.
One thing to note is that Ubuntu enables the "ondemand" init script
by default which puts the CPU in "powersave" mode. In testing the
LIVA, I found that this seems to limit throughput a bit in performance
tests. I get around 900Mbps TCP throughput vs. 940Mbps in
"performance" mode. Also saw some packet loss when doing UDP tests
with 1500 byte datagrams at 1 Gbps in powersave mode. You can disable
the script with the following, which will leave the CPU in performance
mode:
update-rc.d -f ondemand remove
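To verify the governor actually changed (this sysfs path is standard
on Linux):

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # expect "performance"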
-Larry Blunk
Merit
Disabling the “ondemand” mode capability might work for you. I did it
and I was able to achieve 940 Mbps.
-George
From: <>
on behalf of Brian Tierney <>
Reply-To: "" <>
Date: Tuesday, December 22, 2015 at 1:12 PM
To: Daniel Doyle <>
Cc: Hyojoon Kim <>, "" <>
Subject: Re: [perfsonar-user] Bandwidth measurement with LIVA X 2GB/32GB
eMMC
On Tue, Dec 22, 2015 at 12:00 PM, Daniel Doyle <> wrote:
Hi Joon,
Sorry to hear about that. A couple of questions / debugging ideas:
- Are you seeing this in both directions?
- Have you tried using the same port with a different device?
- If you're using / not using jumbo frames, has the device been
configured accordingly? (A quick MTU check is sketched after this
list.)
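A quick MTU check on both ends (the interface name here is just an
example):

$ ip link show eth0 | grep mtu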
fwiw, I did not apply any tunings out of the box on a LIVA X and got
>900Mbps on a local network. It's possible some of those tunings
might not be appropriate for a machine like that, but I haven't dug
into it.
I agree that might be the issue. The tunings on fasterdata might not
be right for a small node. See if the original default Debian settings
are better.
-Dan
On Dec 22, 2015, at 11:58 AM, Hyojoon Kim <> wrote:
Hi all,
We decided to play with a LIVA X 2GB/32GB eMMC node with perfSONAR 3.5.
However, I am getting around 535Mbps instead of 940+Mbps when I initiate
a bwctl test to another local perfSONAR node we have. Between two Dell
R610 servers with perfSONAR, we do get over
950Mbps, so it’s likely not a network problem.
I have generally followed the instructions here (http://docs.perfsonar.net/install_debian.html),
and installed three packages:
apt-get install perfsonar-testpoint
apt-get install perfsonar-toolkit-security
apt-get install perfsonar-toolkit-sysctl
I also did the Linux Tuning described here:
http://fasterdata.es.net/host-tuning/linux/
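For reference, that page's tuning is applied through sysctl; a
representative sketch with example values, not the page's exact
numbers:

# sysctl -w net.core.rmem_max=67108864             # example receive-buffer ceiling
# sysctl -w net.core.wmem_max=67108864             # example send-buffer ceiling
# sysctl -w net.ipv4.tcp_congestion_control=htcp   # fasterdata has suggested htcp; module availability varies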
Does anyone know what I am missing? I do have the Ubuntu 14.04.3
Desktop OS on this. Maybe switching to the well-tested Ubuntu 12.04
would be better?
===
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
$ uname -a
Linux perfbox-livaxtest-01 3.19.0-39-generic #44~14.04.1-Ubuntu SMP Wed
Dec 2 10:00:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
===
Thanks,
Joon