grouper-users - Re: [grouper-users] Penn's organizational hierarchy

Subject: Grouper Users - Open Discussion List

From: Gabby Capitanchik
To: Chris Hyzer
Cc: Niels van Dijk, Grouper Users Mailing List, Tim gray
Subject: Re: [grouper-users] Penn's organizational hierarchy
Date: Wed, 27 May 2009 09:47:51 +0100
Organization: The University of Edinburgh

Chris,

I was wondering if you could give some rough figures for how many queries your Grouper installation can handle per minute (or per hour)?

Cheers

Gabby Capitanchik

Chris Hyzer wrote:

I tried the idea where the loader retrieves all registry memberships for every
group in a groupList job in one query, and it didn't really help all that
much. I checked our logs: the main loader job, which took 11 minutes on its
second run, has been taking about 1 minute 20 seconds to manage 30k
memberships (with or without the change). I'm not sure why the 3rd through
8th runs were faster than the 2nd. If group privileges are not being managed
on all include/exclude groups, it takes half that time. Since loading
memberships in one query didn't help much, I defaulted that to false... if
anyone wants to try this somewhat experimental feature, you can set this
grouper-loader.properties setting in 1.4.2+:

loader.getAllGroupListMembershipsAtOnce = true

https://bugs.internet2.edu/jira/browse/GRP-281

Thanks,
Chris

-----Original Message-----
From: Chris Hyzer
Sent: Friday, May 22, 2009 8:38 AM
To: 'Niels van Dijk'
Cc: Grouper Users Mailing List
Subject: RE: [grouper-users] Penn's organizational hierarchy

Btw, the loader could be tuned to be more efficient, though it seems
to have acceptable performance at the moment... Right now, one query
is run to load the data, whether it is a simple loader job or a group
list job. Then, for each group managed (one group for a simple job,
many groups for a group list job), there is one query to list all
current members in the registry (note: I'm not sure whether subjects
are resolved; I hope not). The two lists are reconciled, and any
inserts or deletes are performed.
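
A minimal sketch of the reconciliation step described above, assuming each list is just a collection of subject ids. The function name `reconcile` and the sample ids are hypothetical and are not taken from the Grouper loader code:

```python
def reconcile(source_members, registry_members):
    """Return (inserts, deletes) needed to make the registry match the source."""
    source = set(source_members)
    registry = set(registry_members)
    inserts = sorted(source - registry)   # in the source, missing from the registry
    deletes = sorted(registry - source)   # in the registry, no longer in the source
    return inserts, deletes


# Hypothetical data: what the loader query returned vs. what the registry holds.
inserts, deletes = reconcile(["alice", "bob", "carol"], ["bob", "dave"])
print(inserts)  # ['alice', 'carol']
print(deletes)  # ['dave']
```

Members present in both lists need no work, which is why a run with little change in the source data should touch few rows.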

For group list jobs, I think it would make it much faster if there was
one query against the registry to get the memberships of all groups in
the list.
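
One way to picture the single-query idea is sketched below, using an in-memory SQLite database as a stand-in for the registry. The table `registry_memberships`, its columns, and the function `memberships_for_groups` are assumptions made for illustration; they are not the real Grouper schema or API:

```python
import sqlite3
from collections import defaultdict

# Stand-in registry with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registry_memberships (group_name TEXT, subject_id TEXT)")
conn.executemany(
    "INSERT INTO registry_memberships VALUES (?, ?)",
    [("org:a", "s1"), ("org:a", "s2"), ("org:b", "s3"), ("org:c", "s9")],
)


def memberships_for_groups(conn, group_names):
    """Fetch current members of all listed groups with one query, keyed by group."""
    group_names = list(group_names)
    if not group_names:
        return {}
    # One placeholder per group; a very long list would need a temp table instead.
    placeholders = ",".join("?" for _ in group_names)
    rows = conn.execute(
        "SELECT group_name, subject_id FROM registry_memberships "
        f"WHERE group_name IN ({placeholders})",
        group_names,
    )
    by_group = defaultdict(set)
    for group_name, subject_id in rows:
        by_group[group_name].add(subject_id)
    return dict(by_group)


current = memberships_for_groups(conn, ["org:a", "org:b"])
print(sorted(current["org:a"]))  # ['s1', 's2']
```

The per-group reconciliation can then read from the returned dictionary instead of issuing one registry query per group.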

I think we are ok with holding everything in memory (e.g. for Penn's
org list, that would be 28k subjectIds and group names). That should
be about 3 megs of data (less, depending on Java String pooling). That
doesn't seem unreasonable. However, I could picture ordering the query
by group name and subject id, and cycling through the results in order
without having to bring everything into memory...
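
The ordered, streaming alternative could look roughly like the sketch below. It assumes both result sets arrive sorted by (group name, subject id) with no duplicates, and that the database sort order matches Python's tuple ordering, which depends on collation in practice. The function name `ordered_diff` is hypothetical:

```python
def ordered_diff(source_rows, registry_rows):
    """Yield ('insert', row) or ('delete', row) from two sorted row streams.

    Only one row from each stream is held at a time, so neither result
    set has to be loaded into memory in full.
    """
    src = iter(source_rows)
    reg = iter(registry_rows)
    s = next(src, None)
    r = next(reg, None)
    while s is not None or r is not None:
        if r is None or (s is not None and s < r):
            yield ("insert", s)      # in the source only
            s = next(src, None)
        elif s is None or r < s:
            yield ("delete", r)      # in the registry only
            r = next(reg, None)
        else:
            s = next(src, None)      # same row on both sides: nothing to do
            r = next(reg, None)


source = [("org:a", "s1"), ("org:a", "s2"), ("org:b", "s1")]
registry = [("org:a", "s2"), ("org:b", "s1"), ("org:b", "s3")]
print(list(ordered_diff(source, registry)))
# [('insert', ('org:a', 's1')), ('delete', ('org:b', 's3'))]
```

In real use the two arguments would be database cursors iterated row by row rather than lists.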

Another optimization: if the db connection of the loader job is the
same as the grouper registry's (including via a dblink), I could
picture joining to the registry so that the query returns only the
inserts and deletes (or two queries, one for inserts and one for
deletes, if a single query doesn't work out).
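
The two-query variant of that join might be sketched as follows, with SQLite standing in for a database that can see both the loader source and the registry. The tables `loader_source` and `registry_memberships` and their columns are hypothetical, not the real Grouper schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loader_source (group_name TEXT, subject_id TEXT);
    CREATE TABLE registry_memberships (group_name TEXT, subject_id TEXT);
    INSERT INTO loader_source VALUES
        ('org:a', 's1'), ('org:a', 's2'), ('org:b', 's3');
    INSERT INTO registry_memberships VALUES
        ('org:a', 's2'), ('org:b', 's3'), ('org:b', 's4');
""")

# Rows in the source with no matching registry row: memberships to insert.
INSERTS_SQL = """
    SELECT s.group_name, s.subject_id
    FROM loader_source s
    LEFT JOIN registry_memberships r
      ON r.group_name = s.group_name AND r.subject_id = s.subject_id
    WHERE r.subject_id IS NULL
    ORDER BY s.group_name, s.subject_id
"""

# Rows in the registry with no matching source row: memberships to delete.
DELETES_SQL = """
    SELECT r.group_name, r.subject_id
    FROM registry_memberships r
    LEFT JOIN loader_source s
      ON s.group_name = r.group_name AND s.subject_id = r.subject_id
    WHERE s.subject_id IS NULL
    ORDER BY r.group_name, r.subject_id
"""

inserts = conn.execute(INSERTS_SQL).fetchall()
deletes = conn.execute(DELETES_SQL).fetchall()
print(inserts)  # [('org:a', 's1')]
print(deletes)  # [('org:b', 's4')]
```

With this approach only the differences cross the wire, so a run where nothing changed returns no rows at all.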

We should profile the loader to make sure we target the right
bottlenecks though... :) If anyone has thoughts or concerns please let
me know.

Thanks,
Chris

-----Original Message-----
From: Niels van Dijk
Sent: Friday, May 22, 2009 3:36 AM
To: Chris Hyzer
Cc: Grouper Users Mailing List
Subject: Re: [grouper-users] Penn's organizational hierarchy

Hello Chris,

Thanks for the interesting document. Can you tell us something
about the performance of Grouper in the setup you describe?

thanks in advance,
Regards,

Niels

Chris Hyzer wrote:

Hey,

I implemented the Grouper org hook at Penn. I have permission to
share my experience outside of Penn so I made a document here:

https://wiki.internet2.edu/confluence/display/GrouperWG/Penn+organizational+hierarchy

This is the size of our org implementation:

27,000 people in orgs at Penn
2,200 orgs
3,000 org groups (more due to include/exclude lists)
500,000 org memberships (there are a lot due to the rollups and include/exclude lists)

Note that the bulk of the work was in figuring out how our org data
is structured, and how I can expose it in views that the loader can
process. If you want to get this working at your institution, this
document should be helpful.

Btw, Loris, does this help with your org issues?

Thanks,
Chris

--
Niels van Dijk
Advanced Services

T: +31 302 305 337 / M: +31 651 347 657
SURFnet - PO Box 19035 - NL-3501 DA Utrecht - The Netherlands -

http://www.surfnet.nl
SURFnet grensverleggend netwerk voor hoger onderwijs en onderzoek

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Penn's organizational hierarchy, Chris Hyzer, 05/21/2009
- Re: [grouper-users] Penn's organizational hierarchy, Niels van Dijk, 05/22/2009
  - RE: [grouper-users] Penn's organizational hierarchy, Chris Hyzer, 05/22/2009
  - RE: [grouper-users] Penn's organizational hierarchy, Chris Hyzer, 05/22/2009
    - Re: [grouper-users] Penn's organizational hierarchy, Niels van Dijk, 05/22/2009
  - RE: [grouper-users] Penn's organizational hierarchy, Chris Hyzer, 05/26/2009
    - Re: [grouper-users] Penn's organizational hierarchy, Gabby Capitanchik, 05/27/2009
- Re: [grouper-users] Penn's organizational hierarchy, Loris Bennett, 05/22/2009
  - RE: [grouper-users] Penn's organizational hierarchy, Chris Hyzer, 05/22/2009
