perfsonar-user - Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell <>
- To: Dan Doyle <>
- Cc: "" <>
- Subject: Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed
- Date: Thu, 3 Aug 2017 10:51:58 -0500
Casey,

It actually just dawned on me while looking at this - I don't see the usual "created by meshconfig" information in the task spec. Are you / someone else running tests by hand to verify things? If so, it may be entirely possible that "perfsonar-meshconfig" isn't running at all or is failing for whatever reason to set up the tasks in your pscheduler instance. You may want to verify that the meshconfig is actually succeeding - /etc/perfsonar/meshconfig-agent.conf has the right <mesh> blocks, and the logs aren't full of errors.

Here's an example of a task created by meshconfig for reference:

https://ps-bryant-lt.perfsonar.kanren.net/pscheduler/tasks/287657f6-8707-45e9-8d0a-2aa6a491607c

That would suggest that meshconfig is running.

If meshconfig agent is working, you might want to verify that the information in the actual .json file references your host correctly.

On Aug 3, 2017, at 9:58 AM, Dan Doyle <> wrote:

Casey,

It looks like there may be several issues at play here in the mesh, but I'll try to just focus on the host you identified. For reference, pScheduler has a very rich API that you can pull all the raw data for runs and such out of. Here are all of the scheduled tasks it has for "latencybg" tasks, i.e. the long-lived powstreams typical in a latency mesh.

Note: you mentioned ps-bryant-bw, but the linked mesh shows ps-bryant-lt. I'm not sure if this is the same host with a different interface, but I'm going to continue with the -lt example.

Clicking any of those will show all the parameters for a run.
You can add /runs to the end of any of them to see the actual results of each run (a given task may have multiple runs inside of it), which typically includes the raw output and archiving information.

Here's the task from "ps-bryant-lt.perfsonar.kanren.net" to "56m-ps.sox.net", which is a host that seems to be working in a number of spots, so probably is a decent example.

Here's a sample run from that task that has finished:

https://ps-bryant-lt.perfsonar.kanren.net/pscheduler/tasks/eeb9a0ab-62f6-4b1e-81b3-25437760d1ce/runs/1638c950-8057-48ac-bdde-1e62e3f22c93

It looks like it worked fine, but I don't see any "archiving" information, which would suggest that it finished the test and then did nothing with the results.

I would probably start by looking at /etc/perfsonar/meshconfig-agent-tasks.conf and verify that the <measurement_archive> blocks in there are correct. If it's supposed to be learning the MAs from the mesh, ensure that "configure_archives 1" is set in /etc/perfsonar/meshconfig-agent.conf, otherwise ensure that it's either commented out or set to 0.

Hope that helps point things in a useful direction - if not feel free to re-engage. I can take a look at your conf files as well, though I'd also say if you're using API keys for any MAs please be sure to strip any of that information out first.

On Aug 2, 2017, at 6:38 PM, Casey Russell <> wrote:

Group,

One of our hosts participates in a larger regional mesh. About the time of the upgrades to PS 4.0, many hosts in the mesh "went yellow", meaning no results are found for tests in that grid square. I've just recently begun looking into why that is. While I can tell that it's either an inability to store results on my local host, or an inability of the dashboard host to read them, I can't wrap my head around what changed in 4.0 and what I need to do to fix it.
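As a rough illustration of the "configure_archives" check suggested earlier in this thread, here is a minimal Python sketch. It assumes the agent config is a simple "key value" text file with "#" comments and that the last uncommented occurrence wins; the function name and the sample text are hypothetical and not part of perfSONAR.

```python
import re


def configure_archives_enabled(conf_text: str) -> bool:
    """Return True if an uncommented 'configure_archives' directive is set to 1.

    Assumption: one directive per line, '#' starts a comment, and the last
    uncommented occurrence is the one that counts.
    """
    enabled = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"configure_archives\s+(\S+)", line)
        if match:
            enabled = match.group(1) == "1"
    return enabled


if __name__ == "__main__":
    sample_on = "use_toolkit 1\nconfigure_archives 1\n"      # hypothetical content
    sample_off = "use_toolkit 1\n# configure_archives 1\n"   # directive commented out
    print(configure_archives_enabled(sample_on))   # True
    print(configure_archives_enabled(sample_off))  # False
```

On a real host the text would come from /etc/perfsonar/meshconfig-agent.conf; the sketch only reads the setting and does not validate the <measurement_archive> blocks.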
A number of the hosts in the mesh just seemed to work through the conversion just fine although, according to the JSON, they're running the same tests and storing the same *(flipped) results from remote hosts.

I suspect it's related to the thread I've copied in below between George Uhl and Andrew Lake from back in April, but even having read it, I can't quite grasp why some hosts in the mesh are working and some aren't.

The mesh is at: http://ps.onenet.net/maddash-webui/index.cgi?grid=Quilt%20Latency

My host is ps-bryant-bw.perfsonar.kanren.net. You can see the entire horizontal row (save one host) is yellow. That's the row where all the results should be stored in/retrieved from the local MA on my machine.

I've verified that IPtables isn't blocking access to esmond from off-network (port 443/80), and I've tried adding IP based authentication for the remote hosts. Both to no effect.

Any suggestions appreciated.

On Thu, Apr 27, 2017 at 9:01 AM, Andrew Lake <> wrote:

On April 26, 2017 at 7:00:48 PM, Uhl, George D. (GSFC-423.0)[SGT INC] () wrote:
Thanks for the clarification, Andy. So back in the 3.5.1 days I had identified pS nodes that I don’t manage as no_agent hosts with the intention of having my managed pS nodes initiate the bi-directional tests and send the bi-directional results to the central MA. Is that feature no longer available in pS 4.0?

You can still do no_agent with force bidirectional and your host will still initiate the test (i.e. be responsible for creating the pscheduler task), but in the case of throughput, traceroute and ping tests the source is always the one that sends it to the archiver regardless of the initiator. Your OWAMP tests (latency and latencybg in pscheduler terminology) still work the same way since we can use the --flip option to have the local side be the only pscheduler participant and thus responsible for the archiving even when it is not the source.
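The archiving rule described in the reply above can be summarized in a small Python sketch. The test-type labels are taken from the wording of this thread and are not guaranteed to match pscheduler's own identifiers; the function name is hypothetical.

```python
# Test types for which, per the explanation above, the source always archives.
SOURCE_ARCHIVES = {"throughput", "traceroute", "ping"}
# Test types where the initiating (local) side archives, using flip when
# it is not the source.
INITIATOR_ARCHIVES = {"latency", "latencybg"}


def archiving_host(test_type: str, source: str, initiator: str) -> str:
    """Return the host expected to send results to the archiver in pS 4.0."""
    if test_type in SOURCE_ARCHIVES:
        return source          # regardless of who created the task
    if test_type in INITIATOR_ARCHIVES:
        return initiator       # local side is the only pscheduler participant
    raise ValueError(f"unknown test type: {test_type!r}")


if __name__ == "__main__":
    managed = "enpl-pt2-10g.eos.nasa.gov"
    no_agent = "mcln-ps.maxgigapop.net"
    # Reverse-path throughput test: created by the managed host,
    # but sourced from the no_agent host.
    print(archiving_host("throughput", source=no_agent, initiator=managed))
    # Reverse-path latency test: the managed host still archives.
    print(archiving_host("latencybg", source=no_agent, initiator=managed))
```

With the hosts from this thread, the first call names the no_agent host as the archiver, which is why it must be able to reach the central archive itself; the second call names the managed host.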
Thanks,
George

From: Andrew Lake <>
Date: Wednesday, April 26, 2017 at 4:27 PM
To: George Uhl <>, "" <>
Subject: Re: [perfsonar-user] mesh tests fail to archive results from reverse path tests

Hi George,

Sorry for the delay. The source of the test is always responsible for archiving and is the side that has all the info about whether it succeeded or not. If you swap mcln-ps.maxgigapop.net into the URL you should see what you are after:

# pscheduler result --archivings https://mcln-ps.maxgigapop.net/pscheduler/tasks/36bb4f1d-dd27-4e5b-8975-36d36b014af2/runs/2819c8ac-23be-480f-8ef5-30760ea0e5c4

throughput --duration PT30S --source mcln-ps.maxgigapop.net --ip-version 4 --dest enpl-pt2-10g.eos.nasa.gov --window-size 1310720 --parallel 1

* Stream ID 4
Interval       Throughput     Retransmits    Current Window
0.0 - 1.0      9.03 Gbps      0              2.03 MBytes
1.0 - 2.0      8.87 Gbps      0              2.03 MBytes
2.0 - 3.0      8.70 Gbps      0              2.03 MBytes
3.0 - 4.0      8.92 Gbps      0              2.03 MBytes
4.0 - 5.0      8.81 Gbps      0              2.03 MBytes
5.0 - 6.0      8.50 Gbps      0              2.03 MBytes
6.0 - 7.0      8.05 Gbps      0              2.03 MBytes
7.0 - 8.0      7.79 Gbps      0              2.03 MBytes
8.0 - 9.0      7.49 Gbps      0              2.03 MBytes
9.0 - 10.0     8.43 Gbps      0              2.03 MBytes
10.0 - 11.0    8.64 Gbps      0              2.03 MBytes
11.0 - 12.0    8.46 Gbps      0              2.03 MBytes
12.0 - 13.0    8.10 Gbps      0              2.03 MBytes
13.0 - 14.0    7.80 Gbps      0              2.03 MBytes
14.0 - 15.0    7.23 Gbps      0              2.03 MBytes
15.0 - 16.0    7.04 Gbps      0              2.03 MBytes
16.0 - 17.0    7.10 Gbps      0              2.03 MBytes
17.0 - 18.0    6.99 Gbps      0              2.03 MBytes
18.0 - 19.0    7.26 Gbps      0              2.03 MBytes
19.0 - 20.0    7.32 Gbps      0              2.03 MBytes
20.0 - 21.0    7.38 Gbps      0              2.03 MBytes
21.0 - 22.0    7.37 Gbps      0              2.03 MBytes
22.0 - 23.0    7.29 Gbps      0              2.03 MBytes
23.0 - 24.0    7.21 Gbps      0              2.03 MBytes
24.0 - 25.0    7.11 Gbps      0              2.03 MBytes
25.0 - 26.0    7.07 Gbps      0              2.03 MBytes
26.0 - 27.0    7.06 Gbps      0              2.03 MBytes
27.0 - 28.0    6.99 Gbps      0              2.03 MBytes
28.0 - 29.0    7.07 Gbps      0              2.03 MBytes
29.0 - 30.0    7.07 Gbps      0              2.03 MBytes

Summary
Interval       Throughput     Retransmits
0.0 - 30.0     7.74 Gbps      0

Archivings:
  To esmond, Finished
    2017-04-25T03:53:53-05:00 400: Invalid JSON returned
    2017-04-25T03:54:58-05:00 400: Invalid JSON returned
    2017-04-25T04:04:05-05:00 400: Invalid JSON returned
    2017-04-25T05:06:55-05:00 400: Invalid JSON returned
    2017-04-25T06:07:27-05:00 400: Invalid JSON returned
    2017-04-25T07:07:32-05:00 400: Invalid JSON returned
    2017-04-25T08:12:35-05:00 400: Invalid JSON returned
    2017-04-25T09:13:20-05:00 400: Invalid JSON returned
    2017-04-25T10:16:57-05:00 400: Invalid JSON returned
    2017-04-25T11:17:02-05:00 400: Invalid JSON returned
    2017-04-25T12:17:08-05:00 400: Invalid JSON returned
    2017-04-25T13:17:13-05:00 400: Invalid JSON returned
    2017-04-25T14:18:37-05:00 400: Invalid JSON returned
    2017-04-25T15:22:30-05:00 Archiver permanently abandoned registering test after 14 attempt(s): 400: Invalid JSON returned

Does archive.eos.nasa.gov allow mcln-ps.maxgigapop.net to connect to it on port 443? Having the source be responsible for the archiving is a change from 3.5 and a result of some of the architectural changes.

Thanks,
Andy

On April 25, 2017 at 3:09:01 PM, Uhl, George D. (GSFC-423.0)[SGT INC] () wrote:
Since the pS 4.0 upgrade I’ve noticed that some test results are not getting archived to my central archive. In these cases I manage one of the hosts and I test to a no_agent host. It’s the test results sourced from the no_agent host that fail to be archived. Drilling down into the pscheduler results on my managed host shows tests in both directions run successfully but only the managed->no_agent test results get archived. Nothing obvious in my meshconfig-agent-tasks.conf file stands out to me that would indicate a cause.

From my managed host:

# pscheduler result --archivings https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/bb0dd703-6559-4543-83d6-b3844bba516a/runs/356db092-d154-4a30-b728-1a256431635c

2017-04-25T06:53:59-04:00 on enpl-pt2-10g.eos.nasa.gov and mcln-ps.maxgigapop.net with iperf3:

throughput --duration PT30S --source enpl-pt2-10g.eos.nasa.gov --ip-version 4 --dest mcln-ps.maxgigapop.net --window-size 1310720 --parallel 1

* Stream ID 4
Interval       Throughput     Retransmits    Current Window
0.0 - 1.0      6.19 Gbps      0              2.03 MBytes
1.0 - 2.0      6.29 Gbps      0              2.03 MBytes
2.0 - 3.0      6.28 Gbps      0              2.03 MBytes
3.0 - 4.0      6.22 Gbps      0              2.03 MBytes
4.0 - 5.0      6.15 Gbps      0              2.03 MBytes
5.0 - 6.0      6.08 Gbps      0              2.03 MBytes
6.0 - 7.0      5.89 Gbps      0              2.03 MBytes
7.0 - 8.0      5.56 Gbps      0              2.03 MBytes
8.0 - 9.0      5.06 Gbps      0              2.03 MBytes
9.0 - 10.0     4.65 Gbps      0              2.03 MBytes
10.0 - 11.0    4.38 Gbps      0              2.03 MBytes
11.0 - 12.0    4.23 Gbps      0              2.03 MBytes
12.0 - 13.0    4.25 Gbps      0              2.03 MBytes
13.0 - 14.0    4.39 Gbps      0              2.03 MBytes
14.0 - 15.0    6.29 Gbps      0              2.03 MBytes
15.0 - 16.0    6.67 Gbps      0              2.03 MBytes
16.0 - 17.0    6.64 Gbps      0              2.03 MBytes
17.0 - 18.0    6.64 Gbps      0              2.03 MBytes
18.0 - 19.0    6.68 Gbps      0              2.03 MBytes
19.0 - 20.0    6.67 Gbps      0              2.03 MBytes
20.0 - 21.0    6.66 Gbps      0              2.03 MBytes
21.0 - 22.0    6.63 Gbps      0              2.03 MBytes
22.0 - 23.0    6.63 Gbps      0              2.03 MBytes
23.0 - 24.0    6.65 Gbps      0              2.03 MBytes
24.0 - 25.0    6.68 Gbps      0              2.03 MBytes
25.0 - 26.0    6.65 Gbps      0              2.03 MBytes
26.0 - 27.0    6.66 Gbps      0              2.03 MBytes
27.0 - 28.0    6.68 Gbps      0              2.03 MBytes
28.0 - 29.0    6.66 Gbps      0              2.03 MBytes
29.0 - 30.0    6.66 Gbps      0              2.03 MBytes

Summary
Interval       Throughput     Retransmits
0.0 - 30.0     6.06 Gbps      0

Archivings:
  To esmond, Finished
    2017-04-25T06:54:40-04:00 Succeeded

From the no_agent host:

# pscheduler result --archivings https://enpl-pt2-10g.eos.nasa.gov/pscheduler/tasks/36bb4f1d-dd27-4e5b-8975-36d36b014af2/runs/2819c8ac-23be-480f-8ef5-30760ea0e5c4

2017-04-25T04:53:08-04:00 on mcln-ps.maxgigapop.net and enpl-pt2-10g.eos.nasa.gov with iperf3:

throughput --duration PT30S --source mcln-ps.maxgigapop.net --ip-version 4 --dest enpl-pt2-10g.eos.nasa.gov --window-size 1310720 --parallel 1

* Stream ID 4
Interval       Throughput     Retransmits    Current Window
0.0 - 1.0      9.03 Gbps      0              2.03 MBytes
1.0 - 2.0      8.87 Gbps      0              2.03 MBytes
2.0 - 3.0      8.70 Gbps      0              2.03 MBytes
3.0 - 4.0      8.92 Gbps      0              2.03 MBytes
4.0 - 5.0      8.81 Gbps      0              2.03 MBytes
5.0 - 6.0      8.50 Gbps      0              2.03 MBytes
6.0 - 7.0      8.05 Gbps      0              2.03 MBytes
7.0 - 8.0      7.79 Gbps      0              2.03 MBytes
8.0 - 9.0      7.49 Gbps      0              2.03 MBytes
9.0 - 10.0     8.43 Gbps      0              2.03 MBytes
10.0 - 11.0    8.64 Gbps      0              2.03 MBytes
11.0 - 12.0    8.46 Gbps      0              2.03 MBytes
12.0 - 13.0    8.10 Gbps      0              2.03 MBytes
13.0 - 14.0    7.80 Gbps      0              2.03 MBytes
14.0 - 15.0    7.23 Gbps      0              2.03 MBytes
15.0 - 16.0    7.04 Gbps      0              2.03 MBytes
16.0 - 17.0    7.10 Gbps      0              2.03 MBytes
17.0 - 18.0    6.99 Gbps      0              2.03 MBytes
18.0 - 19.0    7.26 Gbps      0              2.03 MBytes
19.0 - 20.0    7.32 Gbps      0              2.03 MBytes
20.0 - 21.0    7.38 Gbps      0              2.03 MBytes
21.0 - 22.0    7.37 Gbps      0              2.03 MBytes
22.0 - 23.0    7.29 Gbps      0              2.03 MBytes
23.0 - 24.0    7.21 Gbps      0              2.03 MBytes
24.0 - 25.0    7.11 Gbps      0              2.03 MBytes
25.0 - 26.0    7.07 Gbps      0              2.03 MBytes
26.0 - 27.0    7.06 Gbps      0              2.03 MBytes
27.0 - 28.0    6.99 Gbps      0              2.03 MBytes
28.0 - 29.0    7.07 Gbps      0              2.03 MBytes
29.0 - 30.0    7.07 Gbps      0              2.03 MBytes

Summary
Interval       Throughput     Retransmits
0.0 - 30.0     7.74 Gbps      0

Archivings:
This task had no archivings.
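For anyone comparing archiving output like the listings quoted in this thread, a minimal Python sketch for tallying the attempts may help. It assumes one attempt per line in the form "<ISO timestamp> <message>", which is how the flattened output in this archive appears to have been laid out originally; the function name and result keys are hypothetical.

```python
import re

# One attempt per line: ISO-8601 timestamp with numeric offset, then a message.
ATTEMPT = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})\s+(.+)$"
)


def summarize_archivings(text: str) -> dict:
    """Count succeeded, failed, and abandoned archiving attempts."""
    counts = {"succeeded": 0, "failed": 0, "abandoned": 0}
    for raw in text.splitlines():
        match = ATTEMPT.match(raw.strip())
        if not match:
            continue  # headers such as "To esmond, Finished" are skipped
        message = match.group(2)
        if message == "Succeeded":
            counts["succeeded"] += 1
        elif "permanently abandoned" in message:
            counts["abandoned"] += 1
        else:
            counts["failed"] += 1
    return counts


if __name__ == "__main__":
    sample = """\
To esmond, Finished
2017-04-25T03:53:53-05:00 400: Invalid JSON returned
2017-04-25T03:54:58-05:00 400: Invalid JSON returned
2017-04-25T15:22:30-05:00 Archiver permanently abandoned registering test after 14 attempt(s): 400: Invalid JSON returned
"""
    print(summarize_archivings(sample))
    # {'succeeded': 0, 'failed': 2, 'abandoned': 1}
```

A run that reports "This task had no archivings." yields all zeros, which distinguishes "never tried to archive" from "tried and failed".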
- [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Casey Russell, 08/02/2017
- Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Dan Doyle, 08/03/2017
- Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Dan Doyle, 08/03/2017
- Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Casey Russell, 08/03/2017
- Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Dan Doyle, 08/03/2017
- Re: [perfsonar-user] mesh config, bidirectional tests and storing results locally - can't wrap my head around what changed, Dan Doyle, 08/03/2017