
perfsonar-user - Re: [perfsonar-user] Problems debugging pscheduler: "Run was preempted."

Subject: perfSONAR User Q&A and Other Discussion

  • From: Casey Russell <>
  • To: Mark Feit <>
  • Cc: Brian Candler <>, "" <>
  • Subject: Re: [perfsonar-user] Problems debugging pscheduler: "Run was preempted."
  • Date: Thu, 5 Sep 2019 08:37:58 -0500

Mark and Brian,

     Sorry to tag onto this (perhaps unnecessarily), but I've been chasing a problem since about 8/21, when my nodes upgraded to 4.2.0.  A question, perhaps for Brian: what were the first visible signs that led you to start chasing this?  Since the upgrade, my throughput tests have been wildly unreliable.  I've only been able to chase it sporadically, since I've had some unplanned leave time in the last two weeks.  But is this why you started chasing this particular problem yourself?  Or should I keep looking into my problem separately?

[Attached image: image.png]



Sincerely,
Casey Russell
Network Engineer
KanREN
phone: 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047
XSEDE Campus Champion



On Thu, Sep 5, 2019 at 7:17 AM Mark Feit <> wrote:

(Sorry; I thought I’d sent this, but it got buried in a bunch of other windows.  I’ll reply to your others separately.)

 

Brian Candler writes:

/var/log/pscheduler/pscheduler.log just says:

Sep  1 16:29:24 perf1 runner INFO     6945: Run was preempted.

OK, at this point I'm stumped.  I think I've dug further into the innards of pscheduler than any end-user really is supposed to do.  And all I've found is: both ad-hoc and scheduled tests are being dropped on the floor, with message "Run was preempted.", and I believe this is because run_can_proceed() is returning false.
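For anyone else hitting this, a minimal sketch for pulling the preempted runs out of /var/log/pscheduler/pscheduler.log so you can see how often and when it happens.  It assumes every entry follows the format of the single line quoted above (syslog-style timestamp, host, program, level, a number, then the message).  The regex, the function names, and the guess that the number identifies the run are my own assumptions, not anything pScheduler documents.

```python
import re
from collections import Counter

# Assumed entry format, taken from the one log line quoted in this thread:
#   Sep  1 16:29:24 perf1 runner INFO     6945: Run was preempted.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<program>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<number>\d+):\s+(?P<message>.*)$"
)

PREEMPTED = "Run was preempted."


def preempted_runs(lines):
    """Yield (timestamp, number) for each 'Run was preempted.' entry.

    The number is assumed (not confirmed) to identify the run.
    Lines that don't match the assumed format are skipped.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("message") == PREEMPTED:
            yield match.group("timestamp"), int(match.group("number"))


def count_by_day(lines):
    """Count preempted runs per day, keyed like 'Sep 1' (first-seen order)."""
    return Counter(" ".join(ts.split()[:2]) for ts, _ in preempted_runs(lines))
```

Usage: open the log (usually as root) and pass the file object to count_by_day(), then print the items.  A sudden jump in per-day counts around an upgrade date would be one quick way to tell whether throughput gaps line up with preemption.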

I think you’ve just passed the programming section of the interview.  ;-)

4.2 includes a new feature in the limit system that allows tasks where there is potential schedule contention (which, for all practical purposes, is throughput) to be prioritized.  This was announced in the release notes and the mechanics are covered in http://docs.perfsonar.net/release_candidates/4.2.0/config_pscheduler_limits.html#priorities-which-runs-happen-and-which-do-not.  The original use cases were to preempt repetitive tasks so ad-hoc testing could happen faster and to allow remote, ad-hoc testing on systems where the repetitive stuff is important.
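To make the priority idea concrete, here is a toy model (not pScheduler's actual scheduler, and not the limit-file syntax) of runs competing for the same exclusive time on one host: runs are considered highest-priority first, and anything that overlaps an already-accepted run is marked preempted.  The Run fields, the integer priorities, and the greedy ordering are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; times are arbitrary integer units."""
    name: str
    start: int
    end: int       # exclusive
    priority: int  # larger number = higher priority (assumption)


def overlaps(a, b):
    """True if the half-open intervals [start, end) of two runs intersect."""
    return a.start < b.end and b.start < a.end


def resolve(runs):
    """Split runs into (accepted, preempted) on a single host.

    Higher-priority runs claim their time first; sorted() is stable even
    with reverse=True, so equal-priority runs keep their arrival order.
    """
    accepted, preempted = [], []
    for run in sorted(runs, key=lambda r: r.priority, reverse=True):
        if any(overlaps(run, other) for other in accepted):
            preempted.append(run)
        else:
            accepted.append(run)
    return accepted, preempted
```

For example, a repetitive throughput run at priority 0 covering [0, 30) and an ad-hoc run at priority 10 covering [10, 40) leave the ad-hoc run accepted and the repetitive one preempted, which is the "let ad-hoc testing happen faster" use case in miniature.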

What wasn’t made clear is that the configuration that ships with the toolkit gives marginally higher priority to runs of tasks that originate on the local system.  That’s on me since I put it in there.  I may disable that in 4.2.1 so we can regroup.

Since throughput involves two systems, any run will need time scheduled at both ends.  Because the receiving end isn’t where the task originated, the run at that end could end up with a lower priority than some other run and be preempted.  The scheduler attempts to work around this where it can if the task is given enough slip, but on systems with congested schedules it may not be able to, and the run is preempted.  I’m going to look at whether the first participant should explicitly ask the others for whatever priority it got.  There are some implications around having to trust whoever’s asking for a higher priority not to abuse it, so making a decision on that will require some thought.
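A rough sketch of the two-ended problem, under assumptions of my own: each participant keeps a list of claimed intervals with priorities, a task gets a small bonus (the hypothetical LOCAL_BONUS) only on the host where it originated, and the run is placed at the first start time within its slip window that every participant accepts.  An existing claim of equal or higher priority blocks the run.  None of these names reflect pScheduler's internals; "perf2" is a made-up second host.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the "marginally higher" priority the toolkit's
# default limits give to tasks that originate on the local system.
LOCAL_BONUS = 1


@dataclass(frozen=True)
class Claim:
    """Time already claimed on one host, in arbitrary integer units."""
    start: int
    end: int  # exclusive
    priority: int


def effective_priority(base, origin, host):
    """Base priority plus the local bonus if the task originated on host."""
    return base + (LOCAL_BONUS if origin == host else 0)


def host_accepts(claims, start, end, priority):
    """True unless an overlapping claim has equal or higher priority."""
    for claim in claims:
        overlapping = claim.start < end and start < claim.end
        if overlapping and claim.priority >= priority:
            return False
    return True


def place_run(calendars, participants, origin, base, start, duration, slip, step=1):
    """Return the earliest start in [start, start + slip] that every
    participant accepts, or None if the run would be preempted."""
    for candidate in range(start, start + slip + 1, step):
        end = candidate + duration
        if all(
            host_accepts(calendars[host], candidate, end,
                         effective_priority(base, origin, host))
            for host in participants
        ):
            return candidate
    return None
```

With perf2 holding its own local task over [0, 30) at priority 1, a throughput run originating on perf1 gets priority 1 on perf1 but only 0 on perf2.  With no slip, place_run() returns None (preempted); with 30 units of slip in steps of 5, it lands at 30.  That is the receiving-end effect described above, and why more slip helps only when the schedule has gaps.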

Going back to what I said at the start: whilst I would appreciate hints to help fix the specific problem here, I also think that pscheduler could do a better job of reporting problems.  If it decides that it's not a good idea to launch a tool, I think it should say *why* it has decided this.  Or at least document this error message: googling "+pscheduler run was preempted" turns up nothing.

The “run was preempted” message is new with the priority feature.  I have an informal list of what the task states mean (https://github.com/perfsonar/pscheduler/wiki/Run-States), but no entry in my informal list of error messages (https://github.com/perfsonar/pscheduler/wiki/Error-Messages).  I’ll add something to the latter.  I’m not sure what I could add to the message beyond “run was preempted by a higher-priority task” to make it more useful.

pScheduler stores a lot of diagnostic information that you can get at by running “pscheduler result --diags <RUN-URL>”.  The limit system diagnostics are shown for the lead participant only, but I can slip a change into 4.2.1 that will add them for the others.
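If you want to collect those diagnostics from a script, a minimal Python wrapper around that command might look like this.  Only the "pscheduler result --diags <RUN-URL>" invocation comes from this thread; the function names are illustrative, and you have to supply the run URL yourself.

```python
import shlex
import shutil
import subprocess


def diags_command(run_url):
    """Build the argv for 'pscheduler result --diags <RUN-URL>'."""
    return ["pscheduler", "result", "--diags", run_url]


def show_command(run_url):
    """Return the command as a shell-safe string, e.g. for logging."""
    return shlex.join(diags_command(run_url))


def fetch_diags(run_url):
    """Run the command and return its stdout.

    Requires the pscheduler CLI on PATH; raises CalledProcessError if it
    exits non-zero.
    """
    if shutil.which("pscheduler") is None:
        raise FileNotFoundError("pscheduler CLI not found on PATH")
    completed = subprocess.run(
        diags_command(run_url), capture_output=True, text=True, check=True
    )
    return completed.stdout
```

Passing the argument list (rather than a shell string) to subprocess.run avoids quoting problems with the URL.  Keep in mind that, per the note above, the limit-system part of the diagnostics currently appears only for the lead participant.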

--Mark

 
