grouper-users - Re: [grouper-users] ldap source vs jdbc source

Subject: Grouper Users - Open Discussion List

  • From: Rob Hebron <>
  • To:
  • Subject: Re: [grouper-users] ldap source vs jdbc source
  • Date: Mon, 14 Mar 2011 19:08:01 +0000

Lynn,

We had a similar problem at Cardiff University a few years ago: we needed Group provisioning to be faster than seemed to be possible using LDAPPC. The solution was to track groups for changes using the changelog process, and send these through to the directory.

We used our existing Identity Management infrastructure to process events sent as JSON objects (code for this is now in Grouper, and documentation is on the Wiki). Grouper runs a thread per changelog consumer, so it would be possible to run multiple threads for any operation (with care). We went for separate threads for STEM, GROUP and MEMBERSHIP events, and we get changes sent through to our LDAP directories within a couple of minutes.
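
To illustrate the thread-per-event-type idea, here is a minimal sketch in Python. It is not the Grouper ESB consumer code: the JSON layout (a "type" field), the function name run_consumers and the handler callback are all assumptions made for the example. Events of the same type stay in order because each type has a single worker.

```python
import json
import queue
import threading

# Hypothetical event types, mirroring the STEM / GROUP / MEMBERSHIP split.
EVENT_TYPES = ("STEM", "GROUP", "MEMBERSHIP")


def run_consumers(raw_events, handler):
    """Route JSON events to one worker thread per event type.

    `handler` is called once per decoded event and must be thread-safe.
    An event whose "type" is not in EVENT_TYPES raises KeyError.
    """
    queues = {t: queue.Queue() for t in EVENT_TYPES}

    def worker(q):
        while True:
            event = q.get()
            if event is None:  # sentinel: no more events for this type
                break
            handler(event)

    threads = [threading.Thread(target=worker, args=(queues[t],))
               for t in EVENT_TYPES]
    for th in threads:
        th.start()

    # Decode each event and hand it to the queue for its type.
    for raw in raw_events:
        event = json.loads(raw)
        queues[event["type"]].put(event)

    # Tell every worker to stop, then wait for them to drain their queues.
    for t in EVENT_TYPES:
        queues[t].put(None)
    for th in threads:
        th.join()
```

The "with care" part is ordering across types: a MEMBERSHIP event can be handled before the GROUP event that creates its group, so a real handler would need to retry or tolerate that.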

Cardiff doesn't use the ESB consumer exclusively though - in practice Grouper Loader, LDAPPC-NG and the ESB changelog consumer are now used in combination.

Rob

On 11/03/11 19:08, Lynn Garrison wrote:
Chris,

On Mar 11, 2011, at 12:55 PM, Chris Hyzer wrote:

You could try Jim Fox's vt-ldap source which has better performance. I think
the link for it is here:

http://staff.washington.edu/fox/grouper/dist/

I need to take a look at that. I know that you are planning on
incorporating it into the standard grouper api package. Will that be 1.7 or
2.0?
How are you loading your group into Grouper? Grouper loader, WS, GSH, API?
15 minutes for 30k members is roughly 33 members/sec, which is pretty good I think.


I am using gsh. I wanted to do a quick test. The next step is to
try other ways to load the data.
I can't really comment on ldappcng performance.

For loading into grouper though, it has to do a bunch of work, so even if it
does a subject query for each row, the order of magnitude of the operation
overall will still be the same if you remove the subject query. I would
assume it would be similar for ldappcng as well. However, you will be better
off if you can load by specifying the subjectId and sourceId so it goes to
one source with one query to resolve. If you only specify a netId, and call
idOrIdentifier, and not sourceId, that will query each source (~4?), with two
queries (8 total). The loader is pretty efficient about this.
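
As a rough illustration of the query counts above, here is a small Python sketch. The function name and parameters are made up for the example, and the defaults (4 sources, 2 queries per source) are just the approximate figures quoted in this message, not measured values.

```python
def subject_queries(members, sources=4, queries_per_source=2, source_known=True):
    """Estimate how many subject lookups a load of `members` rows triggers.

    source_known=True  : subjectId and sourceId are given, one query per row.
    source_known=False : only an identifier is given, so every source is
                         asked, with `queries_per_source` queries each.
    """
    per_member = 1 if source_known else sources * queries_per_source
    return members * per_member


with_source = subject_queries(30000)                         # 30000 lookups
without_source = subject_queries(30000, source_known=False)  # 240000 lookups
```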

Generally Grouper is designed assuming there is not a lot of membership
churn, right? I.e. you will have your 30k students in a group, and each day,
you might add or remove a couple hundred, and then a few days a year, you
have +- 5k. Those loads will take a while as you see for the few days where
the new students are entered.

Do you think these performance numbers will be a problem for you?


The provisioning to ldap timing will be a problem for us. We use GPFS
for all of our shared file systems and we control group access to the shared
file systems with groups in ldap. So all of our groups have to be provisioned to
ldap. Currently we support three types of groups - standing (department),
course and user managed groups. The standing groups are created once and
changes to the groups are made once a day. Course groups are created once a
semester and changes are made once a day. The user managed groups are
maintained only in ldap and changes appear as soon as they occur. One of
the reasons that we are looking at grouper is to make the changes to standing
and course groups in real time.
During our testing we ran into several areas of concern. We were
running ldappcng with a bulksync and interval of 180 seconds. On the first
interval, the sync took about 45 minutes because the large group had to be
provisioned. During that interval, I added several more small groups (2
members) to grouper. The groups didn't appear in ldap until the large group
was provisioned. Once the large group was in sync, provisioning of new
groups and members happened very quickly. I added one member to the large
group, and the sync was back up to taking 45 minutes. It appeared that it
modified all members of the group instead of just the one added. At least
that is what I believe that the SPML was telling me. The large group is our
faculty staff group and could potentially change multiple times a day.

Concerns
1. Minor change to membership appears to re-provision the group
2. Provisioning appears to be single threaded
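
For the first concern, the behaviour one would hope for is that only the difference between what is already in ldap and what Grouper now says gets written. A minimal Python sketch of that idea follows. It does not describe how ldappcng works internally, and membership_delta is a hypothetical name used only for this example.

```python
def membership_delta(provisioned, desired):
    """Return (to_add, to_remove) so that only changed members are written.

    provisioned : members currently on the ldap group
    desired     : members the group should have according to Grouper
    """
    provisioned, desired = set(provisioned), set(desired)
    to_add = sorted(desired - provisioned)
    to_remove = sorted(provisioned - desired)
    return to_add, to_remove


# Adding one member to an existing group should yield a single change,
# no matter how large the group already is.
adds, removes = membership_delta(["amy", "bob", "cat"], ["amy", "bob", "cat", "dan"])
```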

Btw, I don't think it will change your performance that much, but I think
that a JDBC source has opportunities in the future to have some performance
improvements and feature improvements since it could bulk load subjects and
page/sort better, but I'm not an ldap person, so maybe we could do a similar
thing there too. At Penn we just use the jdbc2 source...

Thanks,
Chris

-----Original Message-----
From: [mailto:] On Behalf Of Lynn Garrison
Sent: Friday, March 11, 2011 11:02 AM
To:
Subject: [grouper-users] ldap source vs jdbc source

Our test environment at Penn State is configured with an ldap source;
oracle for the gsa. We recently executed a test load with our largest group
- ~32k faculty/staff members. We loaded all the members to grouper and then
provisioned them to ldap using ldappcng. We executed the test several times.

The load into grouper executed in 15 to 18 minutes.
The provision to ldap with ldappcng executed in 45 to 50 minutes.
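
For comparison, the two timings can be converted to members per second. This is plain arithmetic on the figures above, taking the group as 32,000 members; the helper name is made up for the example.

```python
def members_per_second(members, minutes):
    """Average throughput for processing `members` entries in `minutes`."""
    return members / (minutes * 60)


# (slowest, fastest) rates for each step, using the ~32k member group.
load = (members_per_second(32000, 18), members_per_second(32000, 15))
provision = (members_per_second(32000, 50), members_per_second(32000, 45))
```

That puts the load into grouper at roughly 30 to 36 members/sec and the provision to ldap at roughly 11 to 12 members/sec.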

We are using the source api and ldappcng version 1.6.3.
Questions:

1. Are these reasonable times?
2. Would we see an improvement in the ldappcng execution time if we
were using a jdbc source?


We are looking at replacing the current mechanism for managing groups
- 66000+ groups, ~32k members in the largest group. One of the requirements
is that all groups be provisioned to ldap and available for use as soon as they
are created.

