
perfsonar-user - [perfsonar-user] Problem with PS node move

Subject: perfSONAR User Q&A and Other Discussion


[perfsonar-user] Problem with PS node move


  • From: Casey Russell <>
  • To:
  • Subject: [perfsonar-user] Problem with PS node move
  • Date: Wed, 3 Apr 2019 12:21:19 -0500

Group,

     On 3/23 we moved one of our PS nodes from one building on a campus to another.  The subnets moved with it, and it went from being direct-attached to one KanREN router, to being direct-attached to another KanREN router.  After that move we observed the following change:

     All tests in our mesh continued to operate normally except for latency tests inbound to this node.  If you look at our dashboard, you'll note that IPv4 and IPv6 latency tests work fine when initiated outbound from KSU (the node we moved), but fail when any other host in the mesh tries to initiate a test inbound.  My first thought was that an ACL didn't get applied properly, but I've reviewed them and they seem sane.  For reference, we use the host-based firewall and Juniper MX firewall filters on the router side to secure the hosts.  The Juniper firewall rule seems the same as it was before, and I can't see any reason the host-based filtering would have changed with a physical move.


     Strangely, the psconfig-pscheduler-agent.log on one of our non-KSU hosts reports on 03/21 (two days before the move) that it scheduled 70 tasks.  Yesterday (and every day since the move) it reported scheduling the same number of tasks.  So the external hosts don't seem to have a problem reaching KSU or setting up the test initially.  However, the latency tests don't show any results in MaDDash or in the individual host test results.

2019/03/21 18:01:27 INFO pid=7183 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=226 guid=3060949A-4C2C-11E9-BD6C-F71A4CD1F608 msg=Added 70 new tasks, and deleted 0 old tasks

2019/04/02 17:52:39 INFO pid=7183 prog=perfSONAR_PS::PSConfig::PScheduler::Agent::_run_end line=226 guid=ED1E3036-5598-11E9-BD6C-F71A4CD1F608 msg=Added 70 new tasks, and deleted 0 old tasks
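For anyone wanting to make the same before/after comparison across the whole log rather than two hand-picked lines, here is a minimal sketch.  It assumes the agent's summary lines have exactly the "Added N new tasks, and deleted M old tasks" form shown above; the function name task_counts is made up for illustration, and the log path in the usage comment is an assumption that may differ on your install.

```shell
# Print the date, time, and added/deleted task counts for every
# agent run summary found on stdin. Lines that do not match are skipped.
task_counts() {
    sed -n 's|^\([0-9/]*\) \([0-9:]*\) .*msg=Added \([0-9]*\) new tasks, and deleted \([0-9]*\) old tasks.*$|\1 \2 added=\3 deleted=\4|p'
}

# Usage (the path is an assumption, adjust it for your host):
#   task_counts < /var/log/perfsonar/psconfig-pscheduler-agent.log
```

A change in the added= count between runs before and after 3/23 would point at task setup; an unchanged count, as in my case, points somewhere later in the pipeline.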

Similarly, I don't have any problem hand-running latency tests to KSU from our outside hosts using owping or any of the other latency tools (I don't have the output from a latencybg test, since its default behavior is to run for a full day).

To further troubleshoot, I went to one of my external hosts to verify that the "pscheduler schedule" shows tests to KSU.  It does.

]# pscheduler schedule | grep -v throughput | grep -v trace | grep -A 5 -B 2 ps-ksu-lt
2019-04-03T12:14:35Z - 2019-04-04T12:14:35Z  (Running)
latencybg --data-ports 8760-9960 --source-node ps-fhsu-lt.perfsonar.kanren.net
  --dest ps-ksu-lt.perfsonar.kanren.net --packet-padding 0 --flip --bucket-width
  0.001 --dest-node ps-ksu-lt.perfsonar.kanren.net --source ps-fhsu-
  lt.perfsonar.kanren.net --ip-version 4 --packet-interval 0.1 --packet-count 600
  (Run with tool 'powstream')

--
2019-04-03T12:14:35Z - 2019-04-04T12:14:35Z  (Running)
latencybg --data-ports 8760-9960 --source-node ps-fhsu-lt.perfsonar.kanren.net
  --dest ps-ksu-lt.perfsonar.kanren.net --packet-padding 0 --flip --bucket-width
  0.001 --dest-node ps-ksu-lt.perfsonar.kanren.net --source ps-fhsu-
  lt.perfsonar.kanren.net --ip-version 6 --packet-interval 0.1 --packet-count 600
  (Run with tool 'powstream')

(to see the public URLs, replace localhost with ps-fhsu-lt.perfsonar.kanren.net)

When I go to those URLs, of course, the tests are still running, so I'll have to try them again tomorrow to see what the results were.
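One caveat about the grep pipeline above: the schedule output wraps long task lines, and a hostname can be split across two lines (as "ps-fhsu-" / "lt.perfsonar.kanren.net" is in the output I pasted), so a plain grep for a hostname can miss entries.  A rough sketch of a filter that copes with the wrapping follows.  It assumes each entry starts with a timestamp line like the ones above; the function name latencybg_runs_to is made up for illustration.

```shell
# Print the time range and state of each latencybg run whose --dest
# starts with the host given as $1. Reads `pscheduler schedule` output on stdin.
latencybg_runs_to() {
    awk -v dest="$1" '
        function emit(   flat) {
            flat = body
            gsub(/[ \t]/, "", flat)   # undo wrapping, including split hostnames
            if (header != "" && index(flat, "latencybg") == 1 &&
                index(flat, "--dest" dest) > 0)
                print header
        }
        # A line starting with an ISO 8601 date opens a new schedule entry.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
            emit(); header = $0; body = ""; next
        }
        { body = body $0 }
        END { emit() }
    '
}

# Usage:
#   pscheduler schedule | latencybg_runs_to ps-ksu-lt.perfsonar.kanren.net
```

Because all whitespace is removed before matching, "--dest HOST" and "--dest-node HOST" stay distinguishable, and only the former is matched.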

Does anyone have any thoughts on what could be happening here other than an inbound ACL or MTU issue?  The tests should be storing data to a central repository, 


Sincerely,
Casey Russell
Network Engineer
KanREN
phone: 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047



