perfsonar-user - Re: [perfsonar-user] Issues with pscheduler and limits

Subject: perfSONAR User Q&A and Other Discussion

From: Daniel Spisak <>
To: Mark Feit <>
Subject: Re: [perfsonar-user] Issues with pscheduler and limits
Date: Thu, 2 Mar 2017 16:26:11 -0800

I'm on an airplane currently, but I can get you the output of the tests I did in a few hours. In short, I was seeing "failed" tests for RTT and trace. My manual tests were latency and throughput, which I got to succeed after my limits file edit. I tried running the RTT and trace tests manually; they run, but fail with no output to suggest what the issue is. I then looked up the perfSONAR 3.5.x to pScheduler command reference and was able to execute 3.5.x-style test commands, which succeeded. All hosts were able to ping each other and were on the same virtual network segment inside VirtualBox. I would love to do a video chat with you on this; I'll ping you privately about it.

On Mar 2, 2017 3:20 PM, "Mark Feit" <> wrote:

Daniel Spisak writes:

So I have made some progress on my deployment and come across a problem with pscheduler and the limits.

I tried to edit the limits file's bogon block to exclude the 172 space, since I use it in my test build. However, any time I tried running a pscheduler command manually, it would fail and claim it was still matching against the bogon block.

Can you post the exact output from that and a copy of your limits file and let me know what version of the software you’re running?

The default limits file as it stands now (see the sources at https://github.com/perfsonar/toolkit/blob/master/etc/default_service_configs/pscheduler_limits.conf) has an exclusion for the 172 block, so if you’re using an older one that you changed, you might want to incorporate the new stuff.

The “bogons” identifier should contribute to the “hostiles” classification, which is probably what you’re tripping. Also be aware that changes to the limits files can take as long as 15 seconds to be noticed.
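As a rough illustration of how an exclusion is expected to interact with a bogon list, here is a minimal Python sketch. It is not pScheduler's implementation: the prefix lists, the `matches_bogons` name, and the assumption that "the 172 block" means 172.16.0.0/12 are all hypothetical and chosen only for the example.

```python
import ipaddress

# Hypothetical sample data, not the contents of any real limits file.
BOGONS = ["0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
# Assumed exclusions: the private blocks, including the 172 space.
EXCLUDE = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def matches_bogons(address, bogons=BOGONS, exclude=EXCLUDE):
    """Return True if address falls in a bogon prefix that is not excluded."""
    ip = ipaddress.ip_address(address)
    # An excluded prefix wins over the bogon list.
    if any(ip in ipaddress.ip_network(net) for net in exclude):
        return False
    return any(ip in ipaddress.ip_network(net) for net in bogons)


if __name__ == "__main__":
    for addr in ("172.16.5.10", "127.0.0.1", "8.8.8.8"):
        print(addr, matches_bogons(addr))
```

With the exclusion in place, a 172.16.x.x requester no longer matches the bogon list; with an empty exclusion list (as an older file might effectively have), the same address matches and would be treated as hostile.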

So I said screw it and removed the entire bogon block from the limits JSON, and then I could run manual pscheduler commands. However, when I run pscheduler monitor, it shows all of the scheduled tests as failing, while I can see my manual tests succeeding.

If you’re seeing runs where the state is “Non-Starter,” that’s an indication that the scheduler wanted to schedule the run and couldn’t because there wasn’t time available on all participants’ timelines _or_ because the run wouldn’t have passed the limits at the proposed times. (Limits get checked once when the task is first submitted and again when each run is scheduled. There are a few types of limits that can be used to deny scheduling at certain times.) Because pScheduler has a timeline (where Regular Testing would do its own thing and try to make measurements when their times came up), runs are laid out up to 24 hours in advance and must meet the limits when they’re scheduled.

If you’re seeing runs with a state of “Failed,” that means the task passed the limits when it was first submitted to the system and each of the runs passed the limits again when scheduled _and_ the run started, but something unrelated to limits caused it to fail. There is sufficient information stored in the system to determine what happened and command-line tools to pull the information out.
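The difference between the two states described above can be sketched in a few lines of Python. This is only a model of the explanation, not pScheduler code: the `run_state` function and its parameters are hypothetical, and the "Finished" label for a successful run is an assumption not stated in this thread.

```python
def run_state(time_available, passed_limits_when_scheduled, run_succeeded):
    """Model the outcome of one run of a task that was already accepted.

    The task is assumed to have passed the limits once at submission;
    this covers only what happens when an individual run is scheduled.
    """
    # The run never starts if no slot exists on every participant's
    # timeline or if the limits deny it at the proposed time.
    if not time_available or not passed_limits_when_scheduled:
        return "Non-Starter"
    # The run started, so any failure from here on is unrelated to limits.
    # "Finished" is an assumed name for the success state.
    return "Finished" if run_succeeded else "Failed"


if __name__ == "__main__":
    print(run_state(True, False, False))  # denied by limits at schedule time
    print(run_state(True, True, False))   # started, then failed
```

Read this way, a run reported as "Failed" points away from the limits file and toward the test itself or the environment it ran in.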

If you’d like to drop me a line privately, we can jump on video and have a look at what’s happening on your system.

--Mark

[perfsonar-user] Issues with pscheduler and limits, Daniel Spisak, 03/01/2017
- Re: [perfsonar-user] Issues with pscheduler and limits, Mark Feit, 03/02/2017
  - Re: [perfsonar-user] Issues with pscheduler and limits, Daniel Spisak, 03/03/2017
