perfsonar-user - Re: [perfsonar-user] All/Most pscheduler tasks are Non-Starters

Subject: perfSONAR User Q&A and Other Discussion

  • From: Mark Feit <>
  • To: "Strelka, Justin" <>, "" <>
  • Subject: Re: [perfsonar-user] All/Most pscheduler tasks are Non-Starters
  • Date: Thu, 2 May 2024 16:52:20 +0000

Strelka, Justin writes:

 

I am setting up a new perfSONAR host running 5.0.8 on RHEL 9. I have set up some repeating tests on the box and have run into an issue where my tasks all become Non-Starters after a period of time. I currently have two types of tasks running on the host: latency and throughput. Below is an outline of both tasks as submitted, along with a screenshot of the pscheduler monitor command listing all/most new tasks as Non-Starters. I'm not sure what my issue is here.

 

Non-starters happen when one of the test participants (two of them for throughput, one for latency) is unable to find time on the schedule to run it or is being uncooperative for some other reason.  Judging from your monitor screenshot, schedule conflicts don’t look like the cause; if they were, there would be pending tasks among the non-starters.

 

You can diagnose this by looking at the schedule for a few minutes on either side of right now and picking out a non-starting run (I set this one up to fail on purpose):

 

$ pscheduler schedule -PT5M +PT5M

2024-05-02T13:23:45Z - 2024-05-02T13:24:01Z (Non-Starter)

idleex --duration PT15S (Run with tool 'sleep')

https://work/pscheduler/tasks/069104b0-c936-40db-9a00-ea95ed2e0d08/runs/24c027be-f65f-46f6-9960-71b27f97c9a6

 

Then ask pScheduler what happened:

 

$ pscheduler result https://work/pscheduler/tasks/069104b0-c936-40db-9a00-ea95ed2e0d08/runs/24c027be-f65f-46f6-9960-71b27f97c9a6

2024-05-02T13:23:45+00:00 on work with sleep:

 

idleex --duration PT15S

 

Run never started: No times available for this run.

 

Since you have a lot of them, browsing the log (/var/log/pscheduler/pscheduler.log) will give you a bigger picture of why pScheduler is throwing in the towel on these runs:

 

May  2 13:23:46 work scheduler[430664]: scheduler INFO  48917: Posting non-starting run at 2024-05-02T13:23:45Z for task 069104b0-c936-40db-9a00-ea95ed2e0d08: No times available for this run.
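
If you would rather not eyeball the log, a small script can tally the reasons for you. This is only a minimal sketch in Python and not part of pScheduler: the name tally_non_starters is made up for illustration, and the pattern assumes every non-starter entry looks like the one line quoted above, which may not hold for other versions or messages.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted above (assumption: other
# non-starter entries follow the same "...for task <uuid>: <reason>" shape).
NON_STARTER = re.compile(
    r"Posting non-starting run at (?P<when>\S+) "
    r"for task (?P<task>[0-9a-f-]+): (?P<reason>.*)$"
)


def tally_non_starters(lines):
    """Count non-starting runs, grouped by the reason that was logged."""
    reasons = Counter()
    for line in lines:
        match = NON_STARTER.search(line)
        if match:
            reasons[match.group("reason").strip()] += 1
    return reasons


# The sample entry from this message, used here in place of a real log file.
sample = [
    "May  2 13:23:46 work scheduler[430664]: scheduler INFO  48917: "
    "Posting non-starting run at 2024-05-02T13:23:45Z for task "
    "069104b0-c936-40db-9a00-ea95ed2e0d08: No times available for this run."
]

for reason, count in tally_non_starters(sample).most_common():
    print(count, reason)
```

To use it on a real system, pass an open file handle for /var/log/pscheduler/pscheduler.log instead of the sample list; the function only needs something it can iterate over line by line.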

 

Other things worth mentioning:

 

You’ll want to add the slip parameter to the schedule blocks in your pSConfig file.  The CLI uses a default slip of PT5M, on the assumption that people running ad hoc measurements are willing to wait a few minutes for them to run.  pSConfig applies no default slip, which means that if it decides to schedule a measurement at 2:30, it happens then or not at all.  More slip decreases the chance of a collision with other runs, so add as much as you’re willing to tolerate.
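
As a rough illustration of where the slip goes, the sketch below builds a schedule block in Python and prints it as JSON for pasting into a pSConfig template. The schedule name, the PT4H repeat interval and the PT10M slip are placeholders I made up, not values from your configuration; only the presence of a slip key alongside repeat is the point.

```python
import json

# Hypothetical schedule block: the name and both durations are
# placeholders chosen for illustration, not recommended values.
schedules = {
    "every_4h_with_slip": {
        "repeat": "PT4H",   # how often the task repeats (ISO 8601 duration)
        "slip": "PT10M",    # how late a run may start and still be accepted
    }
}

# Print the fragment in the shape it would take inside the template.
print(json.dumps({"schedules": schedules}, indent=4))
```

Tasks in the template would then refer to the schedule by whatever name you give it in place of the placeholder.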

 

Because running throughput at the same time as latency will distort the latency results, their plugins are configured to avoid each other.  There’s a variant of latency called latencybg that will run in parallel with anything else and produces a stream of results over a longer duration.  We have the two types as a trade-off: people running the streaming flavor still get results and can choose to ignore distortions from overlapping throughput tests.  I bring this up because the latency tests you’re running will chew up about 55 seconds of schedule time each.  They’ll run in parallel with each other but are likely to collide with throughputs, especially if there’s no slip.
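
To make the collision concrete, here is a toy model in Python. It is not pScheduler's scheduling algorithm, just a sketch under simple assumptions: first_free_start is a made-up helper, every busy interval is treated as exclusive (so it ignores latency tests running in parallel with each other), and the 30-second throughput slot is an invented example. Only the 55-second latency figure comes from the paragraph above.

```python
from datetime import datetime, timedelta


def first_free_start(wanted, duration, slip, busy):
    """Return the earliest start in [wanted, wanted + slip] whose run avoids
    every (start, end) interval in busy, or None if no such start exists."""
    candidate = wanted
    latest = wanted + slip
    for busy_start, busy_end in sorted(busy):
        if candidate + duration <= busy_start:
            break                  # fits before this and all later intervals
        if busy_end > candidate:
            candidate = busy_end   # pushed past the conflicting run
    return candidate if candidate <= latest else None


wanted = datetime(2024, 5, 2, 14, 30, 0)
latency = timedelta(seconds=55)

# Invented throughput run occupying 14:29:50 through 14:30:20.
throughput = [(wanted - timedelta(seconds=10), wanted + timedelta(seconds=20))]

no_slip = first_free_start(wanted, latency, timedelta(0), throughput)
with_slip = first_free_start(wanted, latency, timedelta(minutes=5), throughput)

print("no slip:  ", no_slip)     # None stands for a non-starter in this model
print("PT5M slip:", with_slip)
```

In this model the zero-slip run has nowhere to go once the throughput slot overlaps it, while the run with five minutes of slip simply starts when the throughput finishes.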

 

 

Hope that helps get you started.  Let us know what you find out from the logs on your system.

 

--Mark

 

 



