grouper-users - RE: [grouper-users] database loader job with START_TO_START_INTERVAL

Subject: Grouper Users - Open Discussion List

  • From: Chris Hyzer
  • To: Scott Koranda
  • Cc: grouper-users
  • Subject: RE: [grouper-users] database loader job with START_TO_START_INTERVAL
  • Date: Thu, 24 Jul 2014 19:07:54 +0000
  • Accept-language: en-US

Ok, good. I think this direction makes sense. :)

Thanks,
Chris

-----Original Message-----
From: Scott Koranda
Sent: Thursday, July 24, 2014 3:03 PM
To: Chris Hyzer
Cc: grouper-users
Subject: Re: [grouper-users] database loader job with START_TO_START_INTERVAL

On Wed, Jul 16, 2014 at 8:22 PM, Chris Hyzer wrote:
> Quartz is 1.6.0
>
> The Grouper job implements StatefulJob so multiple instances won't run at once:
>
> /**
> * class which will run a loader job
> * implements StatefulJob so multiple dont run at once
> */
> public class GrouperLoaderJob implements Job, StatefulJob {

Excellent.

Not that I did not believe you, but to be sure I tested this and verified it.
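
The guarantee discussed above (a second instance of the same job does not start while one is still running) can be modeled in plain Java. This is only a sketch of the behavior, not Quartz or Grouper code: the class and method names are hypothetical, and it skips an overlapping run, whereas a real scheduler may delay it instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-Java model of "only one instance of a job at a time" (hypothetical, not Quartz). */
final class NonConcurrentJobGuard {
    // true while a run is in progress
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the job unless a previous run is still in progress.
     *
     * @return true if the job ran, false if it was skipped
     */
    boolean runIfIdle(Runnable job) {
        // Atomically claim the "running" flag; fail fast if someone else holds it.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            // Always release the flag, even if the job throws.
            running.set(false);
        }
    }
}
```

A trigger that fires while a run is in progress gets false back from runIfIdle and can simply wait for the next cycle.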

So instead of using monit and a gsh script I will revert to just using
CRON for the loader job.

I see that you added some detail to the wiki. Thanks. I have edited it
just a bit.

>
> Just curious, what is it in the three loader jobs that requires them to run
> concurrently? I would think if new groups are seldom added and the third
> job uses a previous job's groups as members, it would generally still work
> fine and would catch up in the next hour, right? All the memberships in
> the existing groups would work fine. Can you describe a case where you
> need the dependency in the jobs?

The issue was not dependency, just an attempt to reduce the latency
for the case when new groups and memberships are added in the SOR. The
idea was to get the new 'all' group and its memberships into the
change log as quickly as possible after the information shows up in
the SOR. Since I can rely on CRON to not run multiple copies of the
same job I can simply increase the frequency at which the job(s) will
run and on those days where there is an order of magnitude more
provisioning to be done a few cycles will simply be skipped (or
delayed, which is also fine).
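
As an illustration of increasing the frequency, a Quartz cron expression for "every N minutes, starting at a chosen minute offset" can be built as below. The helper name is hypothetical; the resulting string is the kind of value that would go in grouperLoaderQuartzCron, assuming the standard six-field Quartz layout (seconds, minutes, hours, day-of-month, month, day-of-week).

```java
/** Hypothetical helper that builds Quartz cron strings for frequent loader runs. */
final class QuartzCronSketch {

    /**
     * Returns a Quartz cron expression that fires every everyMinutes minutes,
     * starting at offsetMinute past the hour (for example 7, 22, 37, 52).
     */
    static String everyNMinutes(int offsetMinute, int everyMinutes) {
        if (everyMinutes < 1 || everyMinutes > 59) {
            throw new IllegalArgumentException("everyMinutes must be 1..59");
        }
        if (offsetMinute < 0 || offsetMinute >= everyMinutes) {
            throw new IllegalArgumentException("offsetMinute must be 0..everyMinutes-1");
        }
        // Fields: seconds minutes hours day-of-month month day-of-week
        return "0 " + offsetMinute + "/" + everyMinutes + " * * * ?";
    }
}
```

Giving each job a different offset keeps several frequent jobs from all firing in the same minute.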

>
> I'm not aware of other people having this issue, but I wonder if we had a
> job depend on another job if we could have it sleep if it noticed the other
> job was running until it was done... I would think this would be a rare
> requirement, but I'm interested to hear about it :) Or another way to chain
> them together.

I think it will depend on the extent to which Grouper is part of a
"real time" provisioning system from a SOR to another system (like
Google groups) where the requirement is loosely "as soon as the group
is available in the SOR I want to be able to see it in Google and send
mail to it".

Thanks,

Scott



>
> Thanks,
> Chris
>
> -----Original Message-----
> From: Scott Koranda
> Sent: Wednesday, July 16, 2014 2:30 PM
> To: Chris Hyzer; grouper-users
> Subject: Re: [grouper-users] database loader job with
> START_TO_START_INTERVAL
>
> Hi,
>
> On Wed, Jul 16, 2014 at 1:04 PM, Chris Hyzer wrote:
>> Why do you not just have loader jobs with a quartz-cron schedule to run
>> these hourly?
>
> My concern is that on some days due to new courses being added or
> removed the load will be heavy and take much longer than an hour to
> complete.
>
> I do not know if the loader scheduler is sophisticated enough to not
> run a second instance of a loader job when one is already running, nor
> how gracefully the race condition would be handled by Grouper.
>
>> Sorry, I know you explained it, but can you expand? I think quartz will
>> not run the same job twice if already running.
>
> Can you tell me precisely which version of Quartz is used in Grouper
> 2.1.5? I will then try to read up and determine what the expected
> behavior is. A pointer to the right place in the source would also be
> helpful so I can see what the implementation looks like (unless you
> already have this documented, in which case I apologize--I looked for
> it but could not find it so please point me to it).
>
> If Quartz will not run the same job twice if already running then that
> will really help (and I will document it on the wiki).
>
>> Also, for the dependencies, if the students job isn’t done when the
>> instructors job runs, it shouldn’t really matter since it just adds groups
>> to other groups and will catch up if there is a race condition in the next
>> hour...
>
> It has more to do with latency--if I know the jobs run in series
> without any delay in between them then the latency is just that much
> reduced and the changes make it to the change log and the custom
> change log consumer faster.
>
>> Once the jobs are set up they should run quickly and should generally be
>> done by the time the next one is scheduled.
>
> My testing shows it depends on the number of rows that need to be
> added and deleted. During a term turnover it can be a substantial
> increase in time.
>
> Thanks,
>
> Scott
>
>>
>> Thanks,
>> Chris
>>
>> -----Original Message-----
>> From: Scott Koranda
>> Sent: Tuesday, July 15, 2014 4:41 PM
>> To: Chris Hyzer
>> Cc: grouper-users
>> Subject: Re: [grouper-users] database loader job with
>> START_TO_START_INTERVAL
>>
>> Hi,
>>
>> I have decided to take this outside of the loader and instead use the
>> 'monit' utility to check once every hour if a GSH script is running.
>> The GSH script runs the 3 loader jobs in series that together
>> provision the groups we need provisioned (students, instructors, then
>> 'all', much like the Penn example in the documentation).
>>
>> monit will check every hour if the GSH script is running and if not it
>> will start it again.
>>
>> Rather than creating, running, and then destroying the loader jobs on
>> the fly I prefer to use loaderRunOneJob() for jobs that are
>> permanently saved in Grouper. The issue then is to make sure the
>> loader never actually runs the jobs.
>>
>> I have tried to do that by setting the grouperLoaderQuartzCron for the
>> jobs to "0 0 0 * * ? 2099" so that, theoretically, they should only
>> run in the year 2099 (by which time I hope to be retired and no longer
>> responsible for any group provisioning).
>>
>> Is there any reason to expect that Grouper and the loader will not
>> respect that cron configuration?
>>
>> Thanks,
>>
>> Scott
>>
>> On Wed, Jul 9, 2014 at 2:08 PM, Scott Koranda wrote:
>>> Understood.
>>>
>>> The issue is that we want the loader jobs to run as often as possible
>>> to help beat down provisioning latency. But at certain times of the
>>> year (when semesters turn over) the amount of work, and hence the
>>> amount of time it takes for the loader job to run, spikes. The danger
>>> then is that two instances run at the same time.
>>>
>>> I do not see how to reconcile those requirements/needs with CRON so I
>>> was testing the START_TO_START_INTERVAL functionality, but I did not
>>> expect random start times.
>>>
>>> I appreciate any recommendations you can make.
>>>
>>> Thanks,
>>>
>>> Scott
>>>
>>> On Wed, Jul 9, 2014 at 2:03 PM, Chris Hyzer wrote:
>>>> Hmmmm, yes, you would :)
>>>>
>>>> For a while now I have always used cron scheduling. Pick a random time
>>>> for it to run and run it every hour at that time (so they don’t all run
>>>> at the same time). Then you can have a better idea about this :)
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Scott Koranda
>>>> Sent: Wednesday, July 09, 2014 3:01 PM
>>>> To: Chris Hyzer
>>>> Cc: grouper-users
>>>> Subject: Re: [grouper-users] database loader job with
>>>> START_TO_START_INTERVAL
>>>>
>>>> Thanks.
>>>>
>>>> If I kick one off manually do I have to worry about the possibility of
>>>> two running simultaneously, or will having one instance running
>>>> prevent the other one from running?
>>>>
>>>> Thanks,
>>>>
>>>> Scott
>>>>
>>>> On Wed, Jul 9, 2014 at 1:58 PM, Chris Hyzer wrote:
>>>>> It doesn’t necessarily start when the loader starts:
>>>>>
>>>>> // start time is a random offset, between 0 and intervalSeconds
>>>>> int startSeconds = (int)(Math.random() * intervalSeconds);
>>>>> Date startTime = new Date(System.currentTimeMillis() + (startSeconds*1000));
>>>>>
>>>>> Don’t want all START_TO_STARTs to start when the loader starts or you
>>>>> could have performance problems :)
>>>>>
>>>>> If you want to kick one off manually, you can do that from GSH:
>>>>>
>>>>> https://spaces.internet2.edu/display/Grouper/GrouperShell+%28gsh%29#GrouperShell%28gsh%29-Loader
>>>>>
>>>>> Thanks,
>>>>> Chris
>>>>>
>>>>> -----Original Message-----
>>>>> From: Scott Koranda
>>>>> Sent: Wednesday, July 09, 2014 1:46 PM
>>>>> To: grouper-users
>>>>> Subject: [grouper-users] database loader job with
>>>>> START_TO_START_INTERVAL
>>>>>
>>>>> Hi,
>>>>>
>>>>> I created a loader job that uses START_TO_START_INTERVAL with an
>>>>> interval of 3600 seconds.
>>>>>
>>>>> My understanding is that when I restart the loader process the loader
>>>>> job should start immediately, and then run again one hour after it
>>>>> completes.
>>>>>
>>>>> Is that correct?
>>>>>
>>>>> I do not see any evidence in grouper_error.log (at the INFO level)
>>>>> that the loader started the job after it was restarted. Should I?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Scott
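
The random first-start behavior for START_TO_START_INTERVAL jobs quoted earlier in this thread can be written as a small standalone sketch. The class and method names are hypothetical and this is not the Grouper source; it takes a Random so the offset can be reproduced in a test, and it multiplies by 1000L so a very large interval cannot overflow an int.

```java
import java.util.Date;
import java.util.Random;

/** Hypothetical standalone version of the random first-start calculation. */
final class StartToStartOffsetSketch {

    /** Picks a start offset in seconds, uniformly in [0, intervalSeconds). */
    static int randomStartSeconds(int intervalSeconds, Random random) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        return random.nextInt(intervalSeconds);
    }

    /** Converts the offset into an absolute first start time. */
    static Date firstStartTime(long nowMillis, int startSeconds) {
        // 1000L forces long arithmetic before the addition.
        return new Date(nowMillis + startSeconds * 1000L);
    }
}
```

With an interval of 3600 seconds, the first run can land anywhere in the hour after the loader starts, which is consistent with not seeing the job start immediately after a restart.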


