grouper-dev - Re: [grouper-dev] changelog implementation sketch

Subject: Grouper Developers Forum

From: Kathryn Huxtable
To: Grouper Dev
Subject: Re: [grouper-dev] changelog implementation sketch
Date: Thu, 12 Jun 2008 13:11:20 -0500

I've been thinking this through since the call and I think maybe you're right about not adding a record for each of 11,000 changes. At KU we did add all those changes, and then our equivalent of ldappc (rdbpc) retrieved them with a single query and batched them according to subject.

It seems that if you were to store the data in three blocks, one for each group changed, with each block containing all the subject changes, similar work could be done. The batching by subject would be done differently, but I doubt there would be any order-of-magnitude change in complexity, and possibly not even any change in performance.
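A minimal sketch of that regrouping, assuming each block is a mapping from a group name to lists of added and deleted subject ids. The function name, the block layout and the subject ids are hypothetical and not taken from Grouper, ldappc or rdbpc:

```python
from collections import defaultdict


def batch_by_subject(group_blocks):
    """Pivot per-group change blocks into per-subject batches.

    group_blocks: {group: {"add": [subjects], "delete": [subjects]}}
    returns:      {subject: {"add": [groups], "delete": [groups]}}
    """
    per_subject = defaultdict(lambda: {"add": [], "delete": []})
    for group, block in group_blocks.items():
        for op in ("add", "delete"):
            for subject in block.get(op, []):
                per_subject[subject][op].append(group)
    return dict(per_subject)


# Three blocks, one per changed group (hypothetical subject ids).
blocks = {
    "B": {"add": ["alice", "bob"]},
    "C": {"add": ["alice", "bob"]},
    "E": {"delete": ["alice"]},
}
batches = batch_by_subject(blocks)
```

The pivot visits each subject change once, so the work is linear in the number of changes whether they arrive as individual records or as per-group blocks.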

-K

On Jun 12, 2008, at 5:42 AM, GW Brown, Information Systems and Computing wrote:

Here is a scenario to encapsulate some of the issues raised in the conference call and my thoughts and questions on them:

Group A (5000 members) is added to Group B which is a member of Group C.
Group D has 1000 members in common with Group A.
Group E = Group D complement Group C (i.e. the members of D that are not in C)

So, at the top level there is one change - Group A becomes a member of Group B

Further changes:
1) 5000 effective additions to Group B
2) 5001 effective additions to Group C
3) 1000 effective deletions from E
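These counts can be checked with plain set arithmetic. The sketch below makes several assumptions that are not stated in the scenario: B and C have no other members beforehand, D has 500 members besides the 1000 it shares with A, and groups appear as members under hypothetical ids such as "group:A":

```python
# Hypothetical subject ids; only the sizes 5000 and 1000 come from the scenario.
a_members = {f"s{i}" for i in range(5000)}
d_members = {f"s{i}" for i in range(1000)} | {f"d{i}" for i in range(500)}


def effective(b_direct):
    """Return the effective memberships of B, C and E for B's direct members."""
    b = set(b_direct)
    if "group:A" in b:
        b |= a_members           # members of A become effective members of B
    c = b | {"group:B"}          # B is a member of C
    e = d_members - c            # E = D complement C
    return b, c, e


b0, c0, e0 = effective(set())          # before the change
b1, c1, e1 = effective({"group:A"})    # after A is added to B

b_effective_adds = b1 - b0 - {"group:A"}   # exclude the immediate addition of A
c_effective_adds = c1 - c0                 # A itself plus A's members
e_effective_dels = e0 - e1

print(len(b_effective_adds), len(c_effective_adds), len(e_effective_dels))
# prints: 5000 5001 1000
```

Group C gains one more effective member than Group B because Group A itself is an immediate member of B but only an effective member of C.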

The actual changes could all be represented as one block of LDIF (or other format), or we could have 3 blocks, one for each group changed.

In the change log we would log the high level change, and also the fact that Groups C and E were changed (recording the parent change). A notification system could propagate 1 change or 3 - the receiving system can parse the change format and deal with it how it likes.

I don't think we should be trying to notify the 11000 effective changes individually.

In LDAP, group memberships are often represented as a 'subject' with a multi-valued group attribute rather than as a group with a multi-valued members attribute. Experience with LDAPPC has shown that it is more efficient to work out which subjects have changed memberships and push the required changes rather than simply go through a whole group membership and make repeated changes to the same subjects.
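As an illustration of pushing one change per subject, here is a rough sketch that renders a single LDIF modify record for a subject whose group attribute has changed. The attribute name memberOf, the DNs and the function name are assumptions, not what LDAPPC does, and the sketch skips the base64 encoding and line folding that LDIF requires for some values:

```python
def subject_ldif(subject_dn, added_groups, deleted_groups, attr="memberOf"):
    """Render one LDIF modify record covering all changes for one subject.

    attr is an assumed attribute name; a real directory may use another.
    """
    if not added_groups and not deleted_groups:
        return ""  # nothing to push for this subject
    lines = [f"dn: {subject_dn}", "changetype: modify"]
    for op, groups in (("add", added_groups), ("delete", deleted_groups)):
        if not groups:
            continue
        lines.append(f"{op}: {attr}")
        lines.extend(f"{attr}: {group}" for group in groups)
        lines.append("-")  # separator between modifications in one record
    return "\n".join(lines) + "\n"


# Hypothetical DNs for one subject and three groups.
ldif = subject_ldif(
    "uid=alice,ou=people,dc=example,dc=edu",
    ["cn=B,ou=groups,dc=example,dc=edu", "cn=C,ou=groups,dc=example,dc=edu"],
    ["cn=E,ou=groups,dc=example,dc=edu"],
)
print(ldif)
```

All three group changes for the subject go out in one record, so the subject's entry is modified once rather than once per group.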

So, for LDAPPC it seems to make more sense to record changes against Member or, in addition to the group changes, also record individual effective changes so they can easily be queried. This would lead to many new rows and would likely be slow if executed from Java, but would probably perform fine if changes to the grouper_memberships table invoked a database trigger. Could the rows written by the trigger be related to a high level change, i.e. would the trigger have access to that information?
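One possible answer to the trigger question, sketched with SQLite purely for illustration: the application writes the id of the in-flight high level change to a small context table, and the trigger reads it when logging each row. The column names, the current_change and effective_changes tables and the trigger name are all hypothetical, and the grouper_memberships table here is a simplified stand-in for the real schema. A single-row context table like this would not be safe with concurrent writers, where a per-session value would be needed instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grouper_memberships (owner_id TEXT, member_id TEXT);
-- hypothetical: holds the id of the high level change currently being applied
CREATE TABLE current_change (change_id INTEGER);
-- hypothetical: one row per individual effective change
CREATE TABLE effective_changes (
    change_id INTEGER, owner_id TEXT, member_id TEXT, op TEXT
);
CREATE TRIGGER log_membership_add AFTER INSERT ON grouper_memberships
BEGIN
    INSERT INTO effective_changes
    VALUES ((SELECT change_id FROM current_change),
            NEW.owner_id, NEW.member_id, 'add');
END;
""")

with conn:  # one transaction per high level change
    conn.execute("INSERT INTO current_change VALUES (42)")
    conn.executemany(
        "INSERT INTO grouper_memberships VALUES (?, ?)",
        [("B", "alice"), ("C", "alice")],
    )
    conn.execute("DELETE FROM current_change")

rows = conn.execute(
    "SELECT change_id, owner_id, member_id, op "
    "FROM effective_changes ORDER BY owner_id"
).fetchall()
print(rows)
```

Each logged row carries the parent change id, so the individual effective changes can be queried per member and still be traced back to the high level change that caused them.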

On the other hand, LDAPPC could parse group changes and determine the membership changes that way.

Would LDAPPC query the database for changes directly or should it use API methods?

Does the multi-valued group attribute in LDAP use full names and/or uuids to represent groups? Are there implications for Stem name changes which may affect lots of group names?

Gary

--On 10 June 2008 10:49 -0500 Tom Zeller wrote:

On Tue, Jun 10, 2008 at 10:12 AM, GW Brown wrote:

--On 10 June 2008 09:17 -0500 Tom Zeller wrote:

Representing changes from a group-centric point of view via LDIF seems
straightforward, but what about a member-centric view, e.g. how someone's
group memberships have changed?

What is the context here? Are we back to the subject id changing or is
this to make it easier to query from a member point of view i.e. show me
all the memberships for 'x' on a given date?

The latter, make it easier/possible to query changes from a member point
of view.

If the actual change definitions - the LDIF or whatever format we use -
for a group can be queried using 'like' we could have an inefficient
means of finding relevant changes which would allow us to compute a
subject's memberships at a given point in time.

I agree: querying LDIF via SQL seems inefficient, perhaps even improper.
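For comparison, if individual changes were stored as structured records rather than as LDIF text, a subject's memberships at a given time could be computed by replaying them in order. A minimal sketch, assuming a hypothetical record layout of (timestamp, group, subject, operation) and made-up sample data:

```python
from datetime import datetime


def memberships_at(changes, subject, when):
    """Replay changes up to and including `when` and return the subject's groups."""
    groups = set()
    for timestamp, group, subj, op in sorted(changes, key=lambda c: c[0]):
        if timestamp > when:
            break               # later changes do not affect the answer
        if subj != subject:
            continue
        if op == "add":
            groups.add(group)
        elif op == "delete":
            groups.discard(group)
    return groups


# Hypothetical change records, not real Grouper data.
changes = [
    (datetime(2008, 6, 1), "E", "alice", "add"),
    (datetime(2008, 6, 10), "B", "alice", "add"),
    (datetime(2008, 6, 10), "C", "alice", "add"),
    (datetime(2008, 6, 10), "E", "alice", "delete"),
    (datetime(2008, 6, 11), "B", "bob", "add"),
]

before = memberships_at(changes, "alice", datetime(2008, 6, 5))
after = memberships_at(changes, "alice", datetime(2008, 6, 12))
```

With records like these in a table, the same question becomes an indexed query on subject and timestamp instead of a 'like' search over change text.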

----------------------
GW Brown, Information Systems and Computing
