grouper-dev - RE: [grouper-dev] ldap errors and real time provisioning

Subject: Grouper Developers Forum

From: Chris Hyzer
To: "Michael R. Gettes", Shilen Patel
Cc: Tom Zeller, Grouper Dev
Subject: RE: [grouper-dev] ldap errors and real time provisioning
Date: Wed, 20 Jun 2012 13:24:35 +0000
Accept-language: en-US

With Penn's loader jobs we get data from the warehouse for some groups. In
the warehouse, people sometimes have a temporary pennid before a permanent
one is assigned, or there is a conflict that needs to be resolved. In either
case we get "subject not found" in Grouper for those few people, and it
blocks that one loader job. I think that is a case where someone could say:
I know we sometimes have subject-not-found problems, I'm OK with that, just
ignore them; the next time the job runs, it will resolve itself.
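
A minimal sketch of that "just ignore it" switch, assuming a loader that resolves warehouse identifiers against a subject lookup. The class, the `ignoreSubjectNotFound` flag, and the sample identifiers are hypothetical names for illustration and are not part of Grouper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Grouper.
final class LenientLoaderSketch {

    /** Outcome of one loader run: resolved subjects and skipped identifiers. */
    static final class LoadResult {
        final List<String> added = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /**
     * Resolves each warehouse identifier against a subject lookup table.
     * An unresolved identifier is skipped when ignoreSubjectNotFound is true;
     * otherwise it aborts the whole run, which is the blocking behavior
     * described above.
     */
    static LoadResult load(List<String> warehouseIds,
                           Map<String, String> subjectsById,
                           boolean ignoreSubjectNotFound) {
        LoadResult result = new LoadResult();
        for (String id : warehouseIds) {
            String subject = subjectsById.get(id);
            if (subject != null) {
                result.added.add(subject);
            } else if (ignoreSubjectNotFound) {
                result.skipped.add(id); // left for a later run to pick up
            } else {
                throw new IllegalStateException("Subject not found: " + id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> subjects = Map.of("1001", "alice", "1002", "bob");
        LoadResult r = load(List.of("1001", "TEMP-7", "1002"), subjects, true);
        System.out.println("added=" + r.added + " skipped=" + r.skipped);
    }
}
```

Keeping the skipped identifiers in the result, rather than only logging them, would let an operator see which people are waiting on a permanent id.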

Thanks,
Chris
________________________________________
From: on behalf of Michael R. Gettes
Sent: Wednesday, June 20, 2012 8:54 AM
To: Shilen Patel
Cc: Tom Zeller; Grouper Dev
Subject: Re: [grouper-dev] ldap errors and real time provisioning

excellent point on the multiple targets. We have a homegrown replication
environment I hope to divest ourselves of in the next 2 years. Experience
indicates treating each target separately is desirable. Block the target
having trouble but not the others.
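
One way to picture "block the target having trouble but not the others" is to keep a separate change log position per target. The sketch below assumes exactly that; the class, the target names, and the `provision` callback are hypothetical and do not describe how the PSP is implemented:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: one change log cursor per provisioning target.
final class PerTargetCursorSketch {

    /** Last sequence number successfully applied, keyed by target name. */
    private final Map<String, Long> lastApplied = new LinkedHashMap<>();

    PerTargetCursorSketch(List<String> targets) {
        for (String target : targets) {
            lastApplied.put(target, 0L);
        }
    }

    /**
     * Offers the ordered sequence numbers to every target.
     * provision.test(target, seq) returns true when the change was applied.
     * A target stops at its first failure and keeps its old position;
     * the remaining targets are unaffected.
     */
    void run(List<Long> sequenceNumbers, BiPredicate<String, Long> provision) {
        for (Map.Entry<String, Long> entry : lastApplied.entrySet()) {
            for (long seq : sequenceNumbers) {
                if (seq <= entry.getValue()) {
                    continue; // already applied to this target
                }
                if (!provision.test(entry.getKey(), seq)) {
                    break; // block this target only
                }
                entry.setValue(seq);
            }
        }
    }

    long position(String target) {
        return lastApplied.get(target);
    }

    public static void main(String[] args) {
        PerTargetCursorSketch cursors =
                new PerTargetCursorSketch(List.of("ldapA", "ldapB"));
        // ldapB rejects change 2; ldapA accepts everything.
        cursors.run(List.of(1L, 2L, 3L),
                (target, seq) -> !(target.equals("ldapB") && seq == 2));
        System.out.println("ldapA at " + cursors.position("ldapA"));
        System.out.println("ldapB at " + cursors.position("ldapB"));
    }
}
```

Because each target resumes from its own position, order of operations is preserved per target even while another target is blocked.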

As for the not-all-subjects case at Duke, I am sure there is a valid reason
for wanting to do what you do, but I respectfully question supporting this
case. If you want to slice communities to different targets, then there
should be valid data to determine how to slice them. Then, if a not-found
condition occurs, it is an error. Did you do this at Duke because the data
wasn't available to Grouper to slice the communities to the various targets?

These issues all boil down to data integrity and a "replication mindset".
Violating replication with all sorts of exceptions really isn't replication.
I know, some will say this is provisioning, not replication. Yes, I can
agree it falls under the notion of provisioning as this is the larger
function you are trying to achieve but the mechanism employed is a form of
replication and I believe replication is much closer to a binary function of
"it works or it doesn't". I also believe this mindset is easier to support
functionally and technically (explaining to people and writing the code).

/mrg

On Jun 20, 2012, at 8:35, Shilen Patel wrote:

> I had an action item from the last call to comment on error handling. I
> agree with where this is going and I think having a way of blocking rather
> than ignoring errors would be very helpful. This should make incremental
> provisioning more reliable and therefore have less need for running the
> bulk sync often, which can be very expensive depending on the number of
> objects that you have in Grouper and the performance of your target.
>
> I would just add that there are probably some errors that should be
> treated differently than other errors. At Duke, we use the change log to
> directly provision 3 different clusters of directories and not all of them
> have all subjects. So for instance, it would be nice if a subject not
> found in a provisioning target could either be retried or ignored
> depending on configuration.
>
> Also, if you have multiple provisioning targets configured with the PSP,
> would an error on one target end up blocking updates to all other targets
> until the one target is fixed? I suppose that would depend on whether the
> PSP uses one change log consumer vs multiple? Is that possible? Along
> those lines, it would be nice if these options could be different based on
> the target.
>
> Thanks!
>
> -- Shilen
>
>
> On 6/19/12 5:37 PM, "Tom Zeller" wrote:
>
>> I'll commit retryOnError = false to grouper-loader.properties for now.
>>
>> Thanks.
>>
>> On Tue, Jun 19, 2012 at 12:54 PM, Michael R. Gettes wrote:
>>> I recommend retryOnError be false by default. RetryOnError true, I
>>> believe, should be something someone consciously changes and clearly
>>> documented. I won't put up a fight if others feel strongly for the
>>> opposite.
>>>
>>> /mrg
>>>
>>> On Jun 19, 2012, at 13:24, Tom Zeller wrote:
>>>
>>>> I am adding a retryOnError option to the psp change log consumer, what
>>>> should the default be ?
>>>>
>>>> Currently, retryOnError is false, meaning do not retry a change log
>>>> entry.
>>>>
>>>> Should retryOnError be true for 2.1.1 ?
>>>>
>>>> Thanks,
>>>> TomZ
>>>>
>>>> On Thu, May 31, 2012 at 1:56 PM, Michael R. Gettes wrote:
>>>>> See https://bugs.internet2.edu/jira/browse/GRP-799
>>>>>
>>>>> I hope it is sufficient.
>>>>>
>>>>> /mrg
>>>>>
>>>>> On May 31, 2012, at 12:45, Chris Hyzer wrote:
>>>>>
>>>>>> The change log is designed for this behavior if you implement the
>>>>>> consumer this way (i.e. after Michael submits a jira, TomZ could put
>>>>>> that switch in). Just return the last index of the change log that
>>>>>> was processed, and it will do nothing until the next minute, when it
>>>>>> will try that same record again. If we want an error queue, maybe that
>>>>>> could be built into the change log so other consumers could benefit
>>>>>> as well. If TomZ does implement Michael's request, it would probably
>>>>>> be nice if the full sync would somehow update the current change log
>>>>>> index to the max index, so that if real-time was stuck due to a
>>>>>> missing subject it would start up again after the full sync at the
>>>>>> point where the full sync started. If the incrementals were stalled
>>>>>> for some reason (for longer than a certain period of time), you would
>>>>>> be notified, I believe, via the grouper diagnostics if you have that
>>>>>> hooked up to nagios or whatever.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: On Behalf Of Tom Zeller
>>>>>> Sent: Thursday, May 31, 2012 12:08 PM
>>>>>> To: Grouper Dev
>>>>>> Subject: Re: [grouper-dev] ldap errors and real time provisioning
>>>>>>
>>>>>> Submit a bug or improvement to jira so we can estimate
>>>>>> implementation.
>>>>>>
>>>>>> For this particular scenario, I think most of the work involves
>>>>>> defining "failure", which will most likely be some sort of
>>>>>> javax.naming.NamingException. The simplest thing to do may be to
>>>>>> (block and) retry any NamingException. Another option may be to make
>>>>>> decisions based on the error message of the NamingException.
>>>>>>
>>>>>> The configuration should probably reside in grouper-loader.properties,
>>>>>> near other change log consumer settings. Perhaps a toggle,
>>>>>> onNamingException = retry | ignore.
>>>>>>
>>>>>> Right now, NamingExceptions are ignored, meaning they are logged and
>>>>>> the next change log record is processed.
>>>>>>
>>>>>> Or, maybe the configuration property should consist of actions
>>>>>> followed by comma separated exceptions or error messages
>>>>>>
>>>>>> retry=NamingException, commit failed
>>>>>> ignore=AttributeInUseException
>>>>>>
>>>>>> Not sure about that last one, hopefully someone has a better idea.
>>>>>>
>>>>>> TomZ
>>>>>>
>>>>>> On Thu, May 31, 2012 at 10:27 AM, Michael R. Gettes wrote:
>>>>>>> What can I do to convince you to, in the very least, provide an
>>>>>>> option to block on failures? It is how I would want to run it.
>>>>>>>
>>>>>>> /mrg
>>>>>>>
>>>>>>> On May 31, 2012, at 10:53, Tom Zeller wrote:
>>>>>>>
>>>>>>>> For 2.1.0, I decided to avoid blocking and rely on full
>>>>>>>> synchronizations, which may be scheduled in grouper-loader.properties,
>>>>>>>> to repair real time provisioning failures.
>>>>>>>>
>>>>>>>> When I was dealing with error handling in the psp change log consumer,
>>>>>>>> I thought of the Northern Exposure episode where the computer prompts
>>>>>>>> "Abort, Retry, Fail ?" and the user is unable to answer (freaks out)
>>>>>>>> and turns off the computer.
>>>>>>>>
>>>>>>>> I felt that blocking change log processing was probably the least
>>>>>>>> desirable option.
>>>>>>>>
>>>>>>>> A failure queue is interesting, but it may be important to preserve
>>>>>>>> the order of operations, so we'll need to think that through. We might
>>>>>>>> need to configurably map provisioned target exceptions to abort |
>>>>>>>> retry | fail | ignore handling.
>>>>>>>>
>>>>>>>> In this particular scenario, we would need to map the "commit failed"
>>>>>>>> ldap error to "retry", probably waiting some configurable interval
>>>>>>>> (60s, 5min, ?) before retrying.
>>>>>>>>
>>>>>>>> TomZ
>>>>>>>>
>>>>>>>> On Thu, May 31, 2012 at 9:30 AM, Gagné Sébastien wrote:
>>>>>>>>> I was asking myself the same question. Maybe a missing group in
>>>>>>>>> the LDAP; it could have been manually deleted by another
>>>>>>>>> application. Maybe a missing subject? (But that would be caught in
>>>>>>>>> Grouper before the LDAP request.)
>>>>>>>>>
>>>>>>>>> We are still experimenting with the provisioning and the grouper
>>>>>>>>> loader, and we had many occasions where data didn't match (login vs
>>>>>>>>> full DN). That might affect my current impression. When the
>>>>>>>>> configuration is done correctly I suppose the data will always
>>>>>>>>> match.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Michael R. Gettes
>>>>>>>>> Sent: May 31, 2012 10:17
>>>>>>>>> To: Gagné Sébastien
>>>>>>>>> Cc: Lynn Garrison; Grouper Dev
>>>>>>>>> Subject: Re: [grouper-dev] ldap errors and real time provisioning
>>>>>>>>>
>>>>>>>>> what kind of "bad data" are you considering?
>>>>>>>>>
>>>>>>>>> /mrg
>>>>>>>>>
>>>>>>>>> On May 31, 2012, at 9:56, Gagné Sébastien wrote:
>>>>>>>>>
>>>>>>>>>> I agree that would be an interesting feature, but the reaction should
>>>>>>>>>> depend on the LDAP error. Some errors could be caused by bad data in
>>>>>>>>>> one record, and these shouldn't block the provisioning of all the
>>>>>>>>>> other changelog entries. I think this is where an error queue might
>>>>>>>>>> be useful: you try them all, and if one has bad data it goes into the
>>>>>>>>>> error queue to be retried later, while all the others still complete
>>>>>>>>>> successfully. Of course, if the LDAP server has a problem you'll have
>>>>>>>>>> a huge error queue, but those entries would have been waiting in the
>>>>>>>>>> changelog anyway. I think it's important for the error queue to be
>>>>>>>>>> retried periodically.
>>>>>>>>>>
>>>>>>>>>> There's the PSP daily full sync that more or less solves this problem.
>>>>>>>>>> If you enable it, all the failed transactions will be synced later,
>>>>>>>>>> once the LDAP server is back online. I believe this sync isn't based
>>>>>>>>>> on the changelog but on a diff between Grouper and the LDAP.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: On behalf of Michael R. Gettes
>>>>>>>>>> Sent: May 31, 2012 09:31
>>>>>>>>>> To: Lynn Garrison
>>>>>>>>>> Cc: Grouper Dev
>>>>>>>>>> Subject: Re: [grouper-dev] ldap errors and real time provisioning
>>>>>>>>>>
>>>>>>>>>> +1 to this request. Failures should block processing. I view
>>>>>>>>>> this as similar to data replication: the idea is to keep the data
>>>>>>>>>> in sync, and if there are problems in the sync process, they
>>>>>>>>>> should block or, at the very least, be placed into an error
>>>>>>>>>> queue. I hate the error queue notion but I do realize lots of
>>>>>>>>>> products do things this way these days.
>>>>>>>>>>
>>>>>>>>>> /mrg
>>>>>>>>>>
>>>>>>>>>> On May 31, 2012, at 9:26, Lynn Garrison wrote:
>>>>>>>>>>
>>>>>>>>>>> Is there a way to stop the real time provisioning if there
>>>>>>>>>>> are problems with the ldap server? We moved to testing real
>>>>>>>>>>> time provisioning with openldap. During the provisioning
>>>>>>>>>>> testing, the file system became full and ldap updates started
>>>>>>>>>>> returning errors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2012-05-31 09:15:16,001: [DefaultQuartzScheduler_Worker-8]
>>>>>>>>>>> ERROR BaseSpmlProvider.execute(388) - - Target 'psp' - Modify
>>>>>>>>>>> XML:
>>>>>>>>>>> <modifyResponse xmlns='urn:oasis:names:tc:SPML:2:0'
>>>>>>>>>>> status='failure'
>>>>>>>>>>> requestID='2012/05/31-09:15:15.993' error='customError'>
>>>>>>>>>>> <errorMessage>[LDAP: error code 80 - commit
>>>>>>>>>>> failed]</errorMessage>
>>>>>>>>>>> </modifyResponse>
>>>>>>>>>>>
>>>>>>>>>>> psp continued to process the change log events. By the
>>>>>>>>>>> time we realized what was happening, all the change log events
>>>>>>>>>>> had been processed and only half the members were provisioned to
>>>>>>>>>>> the group.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Lynn
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
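
The quoted thread combines two ideas: map each kind of target failure to retry or ignore, and have the consumer return the last change log index it handled so that a retried entry is offered again on the next run. The sketch below assumes that combination. `RetryPolicySketch`, `Action`, and `Target` are hypothetical names for illustration, not the Grouper or PSP API; only the `javax.naming` exception classes are real.

```java
import java.util.List;
import java.util.Map;
import javax.naming.CommunicationException;
import javax.naming.NamingException;
import javax.naming.directory.AttributeInUseException;

// Hypothetical sketch: not the Grouper change log consumer API.
final class RetryPolicySketch {

    enum Action { RETRY, IGNORE }

    /** Stand-in for a provisioning target such as an LDAP directory. */
    interface Target {
        void apply(long sequenceNumber) throws NamingException;
    }

    private final Map<Class<? extends NamingException>, Action> overrides;
    private final Action defaultAction;

    RetryPolicySketch(Map<Class<? extends NamingException>, Action> overrides,
                      Action defaultAction) {
        this.overrides = overrides;
        this.defaultAction = defaultAction;
    }

    /** Looks up the exact exception class; subclasses are not matched. */
    Action actionFor(NamingException e) {
        return overrides.getOrDefault(e.getClass(), defaultAction);
    }

    /**
     * Applies entries in order and returns the sequence number of the last
     * entry that was handled (applied, or failed with an IGNORE action).
     * Processing stops at the first failure mapped to RETRY, so the caller
     * sees the same entry again on its next run.
     */
    long process(long lastProcessed, List<Long> sequenceNumbers, Target target) {
        long last = lastProcessed;
        for (long seq : sequenceNumbers) {
            try {
                target.apply(seq);
            } catch (NamingException e) {
                if (actionFor(e) == Action.RETRY) {
                    return last; // block here; order of operations is kept
                }
                // IGNORE: a real consumer would log the error and move on
            }
            last = seq;
        }
        return last;
    }

    public static void main(String[] args) {
        RetryPolicySketch policy = new RetryPolicySketch(
                Map.of(AttributeInUseException.class, Action.IGNORE),
                Action.RETRY);
        Target flaky = seq -> {
            if (seq == 2) throw new AttributeInUseException("already a member");
            if (seq == 4) throw new CommunicationException("commit failed");
        };
        long last = policy.process(0, List.of(1L, 2L, 3L, 4L, 5L), flaky);
        System.out.println("last processed index: " + last);
    }
}
```

Defaulting to `RETRY` here is only for the demonstration; the thread itself settles on retryOnError = false as the shipped default.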
