grouper-users - RE: [grouper-users] Penn's organizational hierarchy

Subject: Grouper Users - Open Discussion List

From: Chris Hyzer
To: Chris Hyzer, Niels van Dijk
Cc: Grouper Users Mailing List
Subject: RE: [grouper-users] Penn's organizational hierarchy
Date: Tue, 26 May 2009 16:45:14 -0400
Accept-language: en-US

I tried the idea of retrieving all registry memberships for every group in a
groupList job in one query, and it didn't help much. I checked our logs: the
main loader job, which took 11 minutes on its second run, has since been
taking about 1 minute 20 seconds to manage 30k memberships (with or without
the change). I'm not sure why the 3rd through 8th runs were faster than the
2nd. If group privileges are not being managed on all the include/exclude
groups, it takes half that. Since loading memberships in one query didn't
help much, I defaulted it to false. If anyone wants to try this somewhat
experimental feature, you can set the following in grouper-loader.properties
in 1.4.2+:

loader.getAllGroupListMembershipsAtOnce = true

https://bugs.internet2.edu/jira/browse/GRP-281
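
To make the difference concrete, here is a minimal sketch of the two fetch
strategies, written in Python rather than Grouper's Java. It is not the loader's
actual code: FakeRegistry, members_of, members_of_all and reconcile are
hypothetical names made up for illustration, and the "registry" is just an
in-memory list that counts how many queries it receives. Both strategies
produce the same inserts and deletes; only the number of registry queries
changes.

```python
from collections import defaultdict


class FakeRegistry:
    """Hypothetical in-memory stand-in for the registry; it counts queries."""

    def __init__(self, rows):
        self.rows = list(rows)  # (group_name, subject_id) pairs
        self.query_count = 0

    def members_of(self, group_name):
        # One query per group (the behavior when the setting is false).
        self.query_count += 1
        return {s for g, s in self.rows if g == group_name}

    def members_of_all(self, group_names):
        # One query for every group in the list (the experimental behavior).
        self.query_count += 1
        wanted = set(group_names)
        result = defaultdict(set)
        for g, s in self.rows:
            if g in wanted:
                result[g].add(s)
        return result


def reconcile(source_rows, registry, all_at_once=False):
    """Return (inserts, deletes) as sorted lists of (group, subject) pairs.

    Only groups that appear in the source data are considered in this sketch.
    """
    desired = defaultdict(set)
    for g, s in source_rows:
        desired[g].add(s)

    if all_at_once:
        current = registry.members_of_all(desired.keys())
    else:
        current = {g: registry.members_of(g) for g in desired}

    inserts, deletes = [], []
    for g, want in desired.items():
        have = current.get(g, set())
        inserts += [(g, s) for s in want - have]  # in source, not in registry
        deletes += [(g, s) for s in have - want]  # in registry, not in source
    return sorted(inserts), sorted(deletes)


# Example data (made up): two groups, one insert and one delete needed.
source = [("org:a", "s1"), ("org:a", "s2"), ("org:b", "s3")]
registry_rows = [("org:a", "s2"), ("org:a", "s9"), ("org:b", "s3")]

per_group = FakeRegistry(registry_rows)
bulk = FakeRegistry(registry_rows)
print(reconcile(source, per_group), per_group.query_count)
print(reconcile(source, bulk, all_at_once=True), bulk.query_count)
```

In this toy version the bulk path saves one query per extra group, which is
consistent with the result above: if the per-group queries are not the
bottleneck, collapsing them into one does not buy much.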

Thanks,
Chris

> -----Original Message-----
> From: Chris Hyzer
> Sent: Friday, May 22, 2009 8:38 AM
> To: 'Niels van Dijk'
> Cc: Grouper Users Mailing List;
>
> Subject: RE: [grouper-users] Penn's organizational hierarchy
>
> Btw, the loader could be tuned to be more efficient, though it seems
> like it has acceptable performance at the moment... Right now there is
> one query which is run to load the source data, whether it is a simple
> loader job or a group list job. Then for each group managed (one group
> for a simple job, many groups for a group list job), there is one query
> to list all current members in the registry (note, I'm not sure if
> subjects are resolved, I hope not). The two lists are reconciled, and
> any inserts or deletes are performed.
>
> For group list jobs, I think it would make it much faster if there was
> one query against the registry to get the memberships of all groups in
> the list.
>
> I think we are ok with holding everything in memory (e.g. for Penn's
> org list, that would be 28k subjectIds and group names). That should
> be 3 megs of data (less depending on Java String pooling). That
> doesn't seem unreasonable. However, I could picture ordering the query
> by group name and subject id, and cycling through the results in order
> without having to bring everything into memory...
>
> Another optimization could be, if the db connection of the loader job
> is the same as the grouper registry (including if you use a dblink), I
> could picture joining to the registry so that query would return the
> inserts and deletes only (or two queries, one for inserts, one for
> deletes, if one doesn't work out).
>
> We should profile the loader to make sure we get the right bottlenecks
> though... :) If anyone has thoughts or concerns please let me know.
>
> Thanks,
> Chris
>
> > -----Original Message-----
> > From: Niels van Dijk
> > Sent: Friday, May 22, 2009 3:36 AM
> > To: Chris Hyzer
> > Cc: Grouper Users Mailing List;
> >
> > Subject: Re: [grouper-users] Penn's organizational hierarchy
> >
> > Hello Chris,
> >
> > Thanks for the interesting document. Are you able to tell something
> > about the performance of grouper in the setup you describe?
> >
> > thanks in advance,
> > Regards,
> >
> > Niels
> >
> > Chris Hyzer wrote:
> > > Hey,
> > >
> > > I implemented the Grouper org hook at Penn. I have permission to
> > > share my experience outside of Penn so I made a document here:
> > >
> > > https://wiki.internet2.edu/confluence/display/GrouperWG/Penn+organizational+hierarchy
> > >
> > > This is the size of our org implementation:
> > >
> > > 27,000 people in orgs at Penn
> > > 2,200 orgs
> > > 3,000 org groups (more due to include/exclude lists)
> > > 500,000 org memberships (there are a lot due to the rollups, and
> > > include/exclude lists)
> > >
> > > Note that the bulk of the work was in figuring out how our org data
> > > is structured, and how I can expose it in views to something that the
> > > loader can process. If you want to get this working at your
> > > institution this document should be helpful.
> > >
> > > Btw, Loris, does this help with your org issues?
> > >
> > > Thanks,
> > > Chris
> >
> > --
> > Niels van Dijk
> > Advanced Services
> >
> > T: +31 302 305 337 / M: +31 651 347 657
> > SURFnet - PO Box 19035 - NL-3501 DA Utrecht - The Netherlands -
> >
> > http://www.surfnet.nl
> > SURFnet grensverleggend netwerk voor hoger onderwijs en onderzoek

