grouper-dev - RE: [grouper-dev] changelog implementation sketch

Subject: Grouper Developers Forum

From: "GW Brown, Information Systems and Computing" <>
To: Chris Hyzer <>, Tom Barton <>, Tom Zeller <>
Cc: Grouper Dev <>
Subject: RE: [grouper-dev] changelog implementation sketch
Date: Wed, 11 Jun 2008 11:13:49 +0100

--On 11 June 2008 02:09 -0400 Chris Hyzer <> wrote:

Some thoughts:

1. From doing the POC in hooks, I noticed that not all info is available
at the time of the Hibernate call. E.g. when adding group A to group B, all
the members of A are given a membership in group B, but the subjectIds
aren't even known; it just uses the member UUIDs. Similarly, if a subject
is added to group A, and group A is a member of group B, then a membership
record is added for the subject in group B, but the group name is not
known, only the UUID. So if we are going to have a friendly representation
of changes, it would take more DB select queries just to get that info.

Hopefully my response to your hooks POC shows how we could make the information available without doing additional queries, though that isn't to say it wouldn't involve a reasonable amount of refactoring.
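Where the friendly names do still have to be looked up, the extra cost can at least be bounded per operation rather than per membership row. The following is a minimal Python/SQLite sketch, assuming a hypothetical `members(uuid, subject_id)` table; `resolve_subject_ids` and the per-operation `cache` dict are illustrative names, not Grouper API:

```python
import sqlite3


def resolve_subject_ids(conn, member_uuids, cache):
    """Map member UUIDs to subject IDs with at most one query per call.

    `cache` is a dict that can be kept for the life of one operation (or
    transaction). UUIDs already in it are not looked up again, and all the
    misses are fetched together in a single SELECT ... IN (...), rather than
    one SELECT per membership row. Unknown UUIDs map to None.
    """
    missing = sorted({u for u in member_uuids if u not in cache})
    if missing:
        placeholders = ",".join("?" * len(missing))
        rows = conn.execute(
            f"SELECT uuid, subject_id FROM members WHERE uuid IN ({placeholders})",
            missing,
        )
        cache.update(rows)  # each row is a (uuid, subject_id) pair
    return {u: cache.get(u) for u in member_uuids}
```

Each call issues at most one SELECT however many memberships the operation touched. Very large batches would need chunking, since SQLite caps the number of bound parameters per statement.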

2. There are lots of queries going on for each operation. E.g. adding a
member to a group takes multiple queries (as above). If each of those
queries results in an audit query, and each audit query goes to the DB for
a sequential ID and potentially reads more info, I don't see how we avoid
a slowdown by a factor of 1.5-3 (depending on the operation). But of
course we can test this.

Grabbing a sequential ID ought to be quick, but the inserts would be a concern. That is why I originally suggested LOBs, so we'd have one insert. Effective membership changes mean we may often have a variable number of inserts, though not usually hundreds or thousands.
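The single-insert idea can be sketched as follows, assuming a hypothetical `change_log` table in SQLite, with a JSON text column standing in for the LOB and `log_operation` as an illustrative helper name:

```python
import json
import sqlite3

# Hypothetical change-log table: one row per operation. The variable number of
# effective membership changes is serialized into one text column, standing in
# for the LOB, so the operation costs a single INSERT.
SCHEMA = """
CREATE TABLE change_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- sequential ID
    op      TEXT NOT NULL,
    payload TEXT NOT NULL                       -- JSON list, stands in for a LOB
)
"""


def log_operation(conn, op, effective_changes):
    """Record one operation and all its effective changes with one INSERT.

    Returns the sequential ID assigned to the change-log row.
    """
    cur = conn.execute(
        "INSERT INTO change_log (op, payload) VALUES (?, ?)",
        (op, json.dumps(effective_changes)),
    )
    return cur.lastrowid
```

However many effective memberships an operation produces, it costs one sequence value and one INSERT. The trade-off is that the payload is opaque to SQL until something parses it.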

3. I think packing data into a field (e.g. the LDIF design) for other
systems to extract later often sounds good at first, but later we might
wish it were relational. If we are storing all data about what has
changed, how about we:

a. Make a shadow table for each table we want to track
b. Insert a "shadow" record in the appropriate table for each DB operation
c. This would be done in Java, but people who want high performance
could turn off the Java part and just use triggers (this should make it
significantly faster)
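A trigger-maintained shadow table along those lines might look roughly like this. It is a SQLite sketch driven from Python; the `memberships` and `memberships_shadow` tables and the trigger names are hypothetical, and a production database would use its own trigger dialect:

```python
import sqlite3

# Hypothetical schema: a base memberships table plus a shadow table that
# receives one row per insert/delete, written by triggers instead of Java.
DDL = """
CREATE TABLE memberships (
    group_id  TEXT NOT NULL,
    member_id TEXT NOT NULL,
    PRIMARY KEY (group_id, member_id)
);

CREATE TABLE memberships_shadow (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    action     TEXT NOT NULL,                         -- 'insert' or 'delete'
    group_id   TEXT NOT NULL,
    member_id  TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Copy the new row into the shadow table after every insert.
CREATE TRIGGER memberships_after_insert AFTER INSERT ON memberships
BEGIN
    INSERT INTO memberships_shadow (action, group_id, member_id)
    VALUES ('insert', NEW.group_id, NEW.member_id);
END;

-- Copy the removed row into the shadow table after every delete.
CREATE TRIGGER memberships_after_delete AFTER DELETE ON memberships
BEGIN
    INSERT INTO memberships_shadow (action, group_id, member_id)
    VALUES ('delete', OLD.group_id, OLD.member_id);
END;
"""


def make_db():
    """Return an in-memory database with the tables and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn
```

Application code only ever touches `memberships`; the shadow rows appear as a side effect. Note that each base-table write still costs a second write to the shadow table.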

Even using triggers you'd be doing a lot of writes. A nightly batch (or more frequent process) which takes large changes and populates the shadow tables could work. My guess is, as Tom B. suggests, that the point-in-time analysis will be infrequent - but important when it is required.
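A periodic batch job that expands coarse per-operation records into relational shadow rows could look like this minimal sketch. It assumes a hypothetical `change_log` table whose `payload` column holds a JSON list of changes, a hypothetical `membership_shadow` table, and a single batch runner; `run_batch` is an illustrative name:

```python
import json
import sqlite3

# Hypothetical tables: change_log is written cheaply at request time (one row
# per operation, JSON payload); membership_shadow is the relational table the
# periodic batch job fills in later.
DDL = """
CREATE TABLE change_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL
);
CREATE TABLE membership_shadow (
    seq       INTEGER NOT NULL,   -- change_log.seq this row was expanded from
    action    TEXT NOT NULL,
    group_id  TEXT NOT NULL,
    member_id TEXT NOT NULL
);
"""


def run_batch(conn):
    """Expand unprocessed change_log rows into membership_shadow rows.

    The high-water mark is the largest change_log.seq already expanded, so the
    job can be rerun on any schedule without double-processing. Assumes a
    single batch runner. Returns the number of shadow rows written.
    """
    written = 0
    with conn:  # commit the whole batch, or nothing if an exception is raised
        (last_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM membership_shadow"
        ).fetchone()
        pending = conn.execute(
            "SELECT seq, payload FROM change_log WHERE seq > ? ORDER BY seq",
            (last_seq,),
        ).fetchall()
        for seq, payload in pending:
            rows = [
                (seq, c["action"], c["group"], c["member"])
                for c in json.loads(payload)
            ]
            conn.executemany(
                "INSERT INTO membership_shadow VALUES (?, ?, ?, ?)", rows
            )
            written += len(rows)
    return written
```

The request path pays for one insert per operation, and the fan-out into per-membership rows happens off-line. An operation whose payload is an empty list produces no shadow rows, so it is simply re-read, harmlessly, on the next run.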

Just brainstorming here... :)

Kind regards,
Chris

-----Original Message-----
From: Tom Barton
[mailto:]
Sent: Tuesday, June 10, 2008 10:50 PM
To: Tom Zeller
Cc: GW Brown, Information Systems and Computing; Grouper Dev
Subject: Re: [grouper-dev] changelog implementation sketch

For audit, the first order of business is to preserve the information,
and for that purpose (alone) we don't need to store the info in both a
group-centric and a member-centric way.

Actually answering point-in-time questions should be of rather low
frequency compared to using the info for incremental change
notification, and it's only the latter that has a need for low latency.

So I think we can pick the single representation (group-centric or
member-centric) best suited to sensibly representing changes. And since
not all changes to group info are membership related, it looks like
group-centric is the only single one that meets the need.
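A group-centric change record that also covers non-membership changes, and from which a member-centric view is derived by filtering rather than stored twice, might be shaped like this. `GroupChange`, its fields, and the change-type strings are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GroupChange:
    """One group-centric change record (hypothetical shape).

    Every record is keyed by the group it affects. member_id is set only for
    membership changes, so non-membership changes (a rename, an attribute
    update) fit the same structure.
    """
    seq: int
    group: str
    change_type: str                # e.g. 'add_member', 'delete_member', 'rename'
    member_id: Optional[str] = None
    detail: Optional[str] = None    # e.g. the new name for a rename


def changes_for_member(log: List[GroupChange], member_id: str) -> List[GroupChange]:
    """Derive the member-centric view by filtering, instead of storing it twice."""
    return [c for c in log if c.member_id == member_id]
```

A rename has no member at all, which is why a member-centric representation alone could not hold it.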

Tom

Tom Zeller wrote:
> On Tue, Jun 10, 2008 at 10:12 AM, GW Brown, wrote:
>
>> --On 10 June 2008 09:17 -0500 Tom Zeller <> wrote:
>>
>>> Representing changes from a group-centric point of view via LDIF
>>> seems straightforward, but what about member-centrically? e.g. how
>>> someone's group memberships have changed.
>>
>> What is the context here? Are we back to the subject id changing, or
>> is this to make it easier to query from a member point of view, i.e.
>> show me all the memberships for 'x' on a given date?
>
> The latter: make it easier (or possible) to query changes from a
> member point of view.
>
>> If the actual change definitions (the LDIF or whatever format we use)
>> for a group can be queried using 'like', we could have an inefficient
>> means of finding relevant changes, which would allow us to compute a
>> subject's memberships at a given point in time.
>
> I agree: querying LDIF via SQL seems inefficient, perhaps even improper.
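For contrast with scanning LDIF text using 'like', here is a minimal sketch of answering the point-in-time question from relational change rows. It assumes a hypothetical SQLite `membership_changes` table with ISO-8601 timestamps (which compare correctly as strings when all use the same format) and that sequence order matches time order; `memberships_at` is an illustrative name:

```python
import sqlite3

# Hypothetical relational change table, indexed for member-centric lookups.
DDL = """
CREATE TABLE membership_changes (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    changed_at TEXT NOT NULL,      -- ISO-8601 timestamp
    action     TEXT NOT NULL,      -- 'add' or 'delete'
    group_name TEXT NOT NULL,
    member_id  TEXT NOT NULL
);
CREATE INDEX membership_changes_member ON membership_changes (member_id, seq);
"""


def memberships_at(conn, member_id, as_of):
    """Return the set of groups member_id belonged to at time as_of.

    Replays only that member's changes, in sequence order, up to as_of.
    """
    groups = set()
    for action, group in conn.execute(
        "SELECT action, group_name FROM membership_changes "
        "WHERE member_id = ? AND changed_at <= ? ORDER BY seq",
        (member_id, as_of),
    ):
        if action == "add":
            groups.add(group)
        else:
            groups.discard(group)
    return groups
```

The index on `(member_id, seq)` lets the query touch just one member's rows, rather than pattern-matching every group's change text.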

----------------------
GW Brown, Information Systems and Computing
