
grouper-users - [grouper-users] RE: PSP bulkSync problems

Subject: Grouper Users - Open Discussion List

  • From: Dave Churchley <>
  • To: "Bee-Lindgren, Bert A" <>, "" <>
  • Subject: [grouper-users] RE: PSP bulkSync problems
  • Date: Tue, 15 Dec 2015 19:34:20 +0000

1.       The messages are spread throughout the sync, perhaps including some little clusters. We’re not aware of any particular LDAP hiccups at those times.

2.       From what we’ve seen so far with per-group syncs, there doesn’t appear to be much consistency to the failures. Do you know if it’s possible to do a per-stem sync?

4.    Errors started almost immediately, not clustered around 6 hours.

 

We now actually think there might be a wider issue with LDAP lookups in our AD – some other services also seem to be affected. At the moment we don’t know the cause, so we’re not sure whether there’s actually anything we could do in Grouper to help the matter or whether we’d see similar issues if we rolled back to our old version.

 

This afternoon we’ve also been trying the realtime provisioning and run into some unexpected issues there, too. I’ll post a separate question to the list about that as it doesn’t relate to bulkSync.

 

Thanks

Dave

 

 

From: Bee-Lindgren, Bert A [mailto:]
Sent: 15 December 2015 17:52
To: Dave Churchley <>;
Subject: Re: PSP bulkSync problems

 

Continuing...

 

1) Failing all the time would certainly be better in terms of finding the problem. Are the messages spread evenly through your multi-hour run of the bulkSync? Any chance there were LDAP hiccups during that time?

 

2) Yes, I'd expect either the current group's sync or (probably) the entire bulkSync to fail. I'd think this would be preferable to mistaken member removals. Combining this with per-group syncs could help us start working towards understanding: is it particular subject lookups that fail within a group, or does a group whose subject lookups all succeed in one sync sometimes fail on other runs?

 

4) This caching time could be interesting if the errors started at, or were clustered around, that 6-hour cache TTL.

 


From: Dave Churchley <>
Sent: Tuesday, December 15, 2015 8:31 AM
To: Bee-Lindgren, Bert A;
Subject: RE: PSP bulkSync problems

 

Thanks Bert

 

In response to your questions/comments:

 

1.       Yes, the “Unable to resolve identifier” messages are all followed by a subject id. I can find them in gsh and it seems that the bulkSync can find them some of the time as well. These subjects are in more than one group and we’re only seeing the “Unable to resolve identifier” for some of their groups and not others.

2.       With regards to the it looks to me like this would cause the whole bulkSync process to terminate if it couldn’t find a subject. Have I understood that right?

3.       This is good news!

4.       The only change since I sent you the files before is in the ehcache.xml file. The documentation at https://spaces.internet2.edu/display/Grouper/Grouper+Provisioning#GrouperProvisioning-ConfigureSubjectAPICache seems to be a bit vague but, based on the advice at https://www.youtube.com/watch?v=D4n7BUzVjC8&t=8m20s, I’ve set timeToIdleSeconds and timeToLiveSeconds to be 6 hours (which was how long the bulkSync was taking in testing).
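
As a rough way to double-check the cache change described in point 4, the sketch below scans an ehcache.xml document and lists any `cache` element whose `timeToIdleSeconds` or `timeToLiveSeconds` is shorter than the expected bulkSync duration (6 hours, i.e. 21600 seconds). The function name, the 6-hour default, and the choice to skip `defaultCache` are assumptions made for illustration; it is not part of Grouper or PSP. It also assumes the usual ehcache convention that a value of 0 means "no expiry".

```python
import xml.etree.ElementTree as ET

# 6 hours expressed in seconds, matching the bulkSync duration seen in testing.
SIX_HOURS = 6 * 60 * 60  # 21600


def short_lived_caches(ehcache_xml, minimum_seconds=SIX_HOURS):
    """Return (cache name, attribute, value) for expiry settings below the minimum.

    Hypothetical helper: only <cache> elements are inspected, not <defaultCache>.
    """
    root = ET.fromstring(ehcache_xml)
    findings = []
    for cache in root.iter("cache"):
        # Eternal caches ignore both expiry attributes.
        if cache.get("eternal") == "true":
            continue
        for attr in ("timeToIdleSeconds", "timeToLiveSeconds"):
            value = cache.get(attr)
            if value is None:
                continue
            seconds = int(value)
            # 0 is treated as "never expires", so it is not reported.
            if 0 < seconds < minimum_seconds:
                findings.append((cache.get("name"), attr, seconds))
    return findings


def check_file(path, minimum_seconds=SIX_HOURS):
    """Read an ehcache.xml file from disk and report short-lived caches."""
    with open(path, encoding="utf-8") as handle:
        return short_lived_caches(handle.read(), minimum_seconds)
```

Calling `check_file("ehcache.xml")` (the path is a placeholder) returns an empty list when every inspected cache lives at least as long as the sync.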


 

 

Thanks
Dave

 

From: Bee-Lindgren, Bert A []
Sent: 15 December 2015 12:29
To: Dave Churchley <>;
Subject: Re: PSP bulkSync problems

 

A few things/ideas...

1) Those "Unable to resolve identifier" messages should include a subject that could not be found. If that is the case, what happens when you try to findSubject them in gsh?

2) There is code in place to error when subject lookups fail. This is centered around the OnNotFound.fail option. I can't find a direct reference documenting this option, but this thread indicates that it can be added to filter configurations.

 

3) You can sync the groups while realtime provisioning is running. It's remotely possible that a realtime update could be removed by a longer-running full sync of the group (one that read data that had become stale, and was realtime-processed, before the full sync was complete). However, a) this is unlikely, b) it would be fixed the next time the group was full-synced, and c) it is probably better than the current situation.

 

4) Have there been any changes in your config files? Knowing that will help us continue to investigate what is happening.

 

 

Hoping this helps,

  Bert

________________________________________
From: Dave Churchley <>
Sent: Tuesday, December 15, 2015 5:46 AM
To: Bee-Lindgren, Bert A;
Subject: PSP bulkSync problems

Good morning

We went ahead and moved to v2.2.2 over the weekend and yesterday started off a psp bulkSync with AD. This ran for over 24 hours with no signs of reaching the end.

We've also noticed about 200 instances of the following error which we're assuming is being caused by a timeout:
WARN  PsoReference.getReferences(137) -  - Pso reference 'membersLdap' - Unable to resolve identifier
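
One way to see whether the same subjects keep failing is to tally these warnings per subject id. The sketch below is a minimal illustration, not a Grouper tool: it assumes the subject id appears on the same line directly after the phrase "Unable to resolve identifier" (the sample line above does not show one, so that layout is an assumption), and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumed layout: the subject id, if present, is the first token after the phrase.
WARNING = re.compile(r"Unable to resolve identifier\W*(?P<subject>\S+)?")

# Placeholder key used when a warning line carries no id after the phrase.
NO_ID = "<no id on line>"


def tally_unresolved(lines):
    """Count 'Unable to resolve identifier' warnings per subject id."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match is None:
            continue  # not one of the warnings we care about
        subject = match.group("subject")
        # Strip trailing quotes or punctuation that may surround the id.
        key = subject.strip("'\".,;") if subject else NO_ID
        counts[key or NO_ID] += 1
    return counts


def tally_file(path):
    """Tally warnings from a log file on disk (path supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return tally_unresolved(log)
```

Sorting the result with `tally_file(path).most_common()` puts the most frequently unresolved subjects first, which can then be looked up individually in gsh.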

This has meant that some users have been removed from groups incorrectly - the users are in AD and haven't been removed from all the groups they're a member of. We've now stopped the bulkSync before it removes more people from groups incorrectly.

We're considering our options for what to do next. Any ideas would be welcome!

At the moment, the plan is to individually sync the groups that we know have been affected by the above issue and then switch on real time psp provisioning and allow users back onto the system. There are still a number of large groups (15000+ members) that we know the bulkSync didn't get around to syncing. Is it safe to try to sync these groups individually whilst the real time provisioning is running?

I'd appreciate any help or advice anyone can offer.

Thanks
Dave

>-----Original Message-----
>From: [mailto:grouper-users-
>] On Behalf Of Dave Churchley
>Sent: 03 December 2015 16:36
>To: Bee-Lindgren, Bert A <>; grouper-
>
>Subject: [grouper-users] RE: Very slow PSP bulkSync in 2.2.2
>
>Thanks Bert
>
>Any help you can give us would be greatly appreciated.
>
>For info, we're using a MySQL database and have about 7000 groups
>provisioning to AD. I've attached the relevant conf files.
>
>Thanks
>Dave
>
>>-----Original Message-----
>>From: Bee-Lindgren, Bert A []
>>Sent: 03 December 2015 15:55
>>To: Dave Churchley <>; grouper-
>>
>>Subject: Re: Very slow PSP bulkSync in 2.2.2
>>
>>Dave,
>>
>>I'll be happy to help with this. I'm focusing on the next generation of PSP, so
>>I'm only a 70% expert in the 2.2.2 version. Therefore, it may take a few rounds
>>back and forth, but I'll do my best to reach your historical performance.
>>
>>Let's start here:
>>What are your LDAP-connection parameters (masking private details, of
>>course)?
>>
>>Sincerely,
>>  Bert
>>
>>
>>________________________________________
>>From: <grouper-users-
>>> on behalf of Dave Churchley
>><>
>>Sent: Thursday, December 3, 2015 10:21 AM
>>To:
>>Subject: [grouper-users] Very slow PSP bulkSync in 2.2.2
>>
>>Good afternoon
>>
>>I'm working on Grouper at Newcastle University where we've been running
>>1.6.3 for many years now. We're planning to upgrade to 2.2.2 soon, with real
>>time PSP provisioning to AD backed up with a bulkSync overnight.
>>
>>In the testing we've done so far, the bulkSync consistently takes about 6
>>hours to run which is a lot longer than we'd like. We've tried various things to
>>speed this up, including increasing the available memory and adjusting the
>>settings in ehcache.xml (as described in training video 4,
>>https://www.youtube.com/watch?v=D4n7BUzVjC8).
>>
>>Our current 1.6.3 setup takes about 2 hours to provision to AD using the old
>>ldappc method, so 6 hours seems like a very long time to us. Does anyone
>>have any suggestions of how we can reduce this, please?
>>
>>Thanks in advance for any help!
>>
>>Thanks
>>Dave Churchley
>>Newcastle University
>>



