
[perfsonar-user] failed tests after using mod_interface_route


  • From: "Christopher J. Tengi" <>
  • To: perfsonar-user <>
  • Subject: [perfsonar-user] failed tests after using mod_interface_route
  • Date: Fri, 16 Oct 2015 19:17:38 +0000
  • Accept-language: en-US

We have 3 multi-homed perfSONAR toolkit 3.5 servers on the campus, and want
to run both delay and bandwidth tests between them. We would also like to
run tests against perfSONAR nodes off-site. All 3 local nodes have 3
interfaces: management, delay, and bandwidth. All of the management
interfaces are on the same subnet, all of the delay interfaces are on the
same subnet, and all of the bandwidth interfaces are on the same subnet.

Initially, the only route on each server was the default route, which used
the management interface. We have a mesh configuration that defines the
tests between these servers as well as a number of single-homed devices
(CuBoxes) on various subnets around the campus. Note that we are only
running delay tests to these single-homed devices.

With just the default route in place, delay tests from some of the CuBoxes to
the “delay” interfaces on the servers were failing. This was due to the
remote frames coming in on the correct, delay interface on the multi-homed
server, but having the responses go out the management interface, as the
default route dictated. Our router’s sanity check ACL would reject the reply
frames that had the delay subnet source address coming in on the management
subnet SVI. Note that tests where both end-points were on the delay subnet
worked just fine. The same can be said for tests where both end-points were
on the bandwidth subnet.
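
For reference, the asymmetry should be visible with a forced route lookup. The
CuBox address below is made up purely for illustration; before we added any
interface routes, the reply lookup only consulted the main table:

==========
# hypothetical CuBox address, used only to illustrate the lookup
ip route get 128.112.136.50 from 128.112.228.23
# roughly: 128.112.136.50 from 128.112.228.23 via 128.112.128.114 dev em1
# i.e. the reply leaves em1 (management) even though the request arrived on em2
==========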

We fixed the cross-subnet delay test issues by employing the
/opt/perfsonar_ps/toolkit/scripts/mod_interface_route command to force
replies out the same interface that the remote traffic came in on. We initially
did this only for the delay subnet, but realized that we would need the same
thing done for the bandwidth subnet if we wanted to do wide-area bandwidth
testing. Once we ran mod_interface_route for the bandwidth subnet, the
bandwidth testing between the 3 servers, all on the same subnet, started
having problems.

Before we ran mod_interface_route on the bandwidth interfaces, we were
getting over 900 Mb/s on the 1 Gb/s interfaces. After running the command on
all 3 servers, we are down to more like 5 Mb/s. The difference between
before and after the command is that traceroute now shows a bounce through
the bandwidth subnet router before hitting the other node, which used to be
just 1 hop away.

I think I understand why this is occurring and I’m sure it is happening on
the delay subnet, as well. I assume that this behavior would not negatively
impact tests to remote sites, and, indeed, they would not be possible
*without* running mod_interface_route. However, the new interface-specific
route has pretty much killed bandwidth testing within the subnet, and I am
concerned that there are negative impacts over on the delay subnet, as well,
for tests staying on the subnet.
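
I have not captured the per-interface routing tables here, but based on the
traceroute I suspect each one holds only a default route via that subnet's
router, with no connected route for the subnet itself. Something like:

==========
# contents guessed from the observed behaviour, not copied from the server
ip route show table em3_source_route
# probably just:
#   default via 128.112.172.1 dev em3
# i.e. no 128.112.172.0/22 link route, so even an on-subnet peer is
# reached via the router instead of directly over the wire
==========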

Here are the commands I ran on one of our servers, for reference:

==========
# management subnet
/opt/perfsonar_ps/toolkit/scripts/mod_interface_route --command add --device
em1 --ipv4_gateway 128.112.128.114
# delay subnet
/opt/perfsonar_ps/toolkit/scripts/mod_interface_route --command add --device
em2 --ipv4_gateway 128.112.228.1
# bandwidth subnet
/opt/perfsonar_ps/toolkit/scripts/mod_interface_route --command add --device
em3 --ipv4_gateway 128.112.172.1
==========

And yes, that first IP gateway address is correct. Don’t ask. :-) Here is
the output of “ip rule list” on the same machine:

==========
0: from all lookup local
200: from 128.112.228.23 lookup em2_source_route
200: from 128.112.172.174 lookup em3_source_route
200: from 128.112.128.61 lookup em1_source_route
32766: from all lookup main
32767: from all lookup default
==========

And, finally, the output from “ip route ls” on the same server:

==========
128.112.228.0/22 dev em2 proto kernel scope link src 128.112.228.23
128.112.172.0/22 dev em3 proto kernel scope link src 128.112.172.174
128.112.128.0/21 dev em1 proto kernel scope link src 128.112.128.61
default via 128.112.128.114 dev em1
==========
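
To confirm where the on-subnet bandwidth traffic goes now, a lookup like the
one below (the peer address is a made-up example on our bandwidth /22) should
show it matching the 200-priority rule and, I expect, going via the gateway
rather than straight out the link:

==========
# 128.112.172.175 is a made-up on-subnet peer, for illustration only
ip route get 128.112.172.175 from 128.112.172.174
# roughly: 128.112.172.175 from 128.112.172.174 via 128.112.172.1 dev em3
# instead of the expected direct "dev em3 ... scope link" answer
==========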


Any thoughts on how I can get my bandwidth back? :-) Have I missed
something here, or should I have done something differently? Any suggestions
or comments are welcome.


Thanks,
/Chris


PS The servers were all running the 3.4 toolkit code yesterday when I first
ran the mod_interface_route commands and trashed my bandwidth tests. I
didn’t think the upgrade to 3.5 would fix things, but I figured I should be
running the latest and greatest.



