grouper-dev - Re: [grouper-dev] Performance of Group Searches
Subject: Grouper Developers Forum
- From: Shilen Patel <>
- To: Tom Barton <>
- Cc: Grouper Dev <>
- Subject: Re: [grouper-dev] Performance of Group Searches
- Date: Fri, 19 Oct 2007 09:03:05 -0400
Tom Barton wrote:
Excellent. You are most definitely not insane. :-)
Shilen Patel wrote:
1. Using ehcache and a new cache type in grouper.ehcache.xml, I've added caching of Member objects in GrouperAccessAdapter.
Given #3 below, how does this actually net an improvement? Inquiring minds want to know ...
So this is actually an improvement to the privilege checking. For each group returned by the database query, a check is done to see whether the person has access to view the group. The GrouperAccessAdapter method that performs this check is passed a Subject object, so it queries for the corresponding Member object. Since the Member object is not cached, a Member query is done for every privilege check. That's 3119 extra queries in my example.
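To make the query arithmetic above concrete, here is a minimal Java sketch of memoizing the Subject-to-Member lookup across privilege checks. It is not Grouper's GrouperAccessAdapter or its ehcache configuration: `MemberCacheSketch`, `CachingLookup`, the string subject id `jdoe`, and the bounded `LinkedHashMap` (standing in for an ehcache region) are all illustrative assumptions. For one subject checked against 3120 groups, the uncached path issues 3120 Member lookups and the cached path issues one, i.e. 3119 avoided queries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class MemberCacheSketch {

    /** Bounded LRU map: LinkedHashMap in access order, evicting once maxEntries is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // true = access order, so the eldest entry is the least recently used
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    /** Wraps a loader (a hypothetical stand-in for the per-check Member query) with a bounded cache. */
    static final class CachingLookup<K, V> {
        private final LruCache<K, V> cache;
        private final Function<K, V> loader;

        CachingLookup(int maxEntries, Function<K, V> loader) {
            this.cache = new LruCache<>(maxEntries);
            this.loader = loader;
        }

        V get(K key) {
            V value = cache.get(key);
            if (value == null) { // miss: fall through to the "database"
                value = loader.apply(key);
                cache.put(key, value);
            }
            return value;
        }
    }

    /**
     * Simulates one view-privilege check per returned group for a single subject and
     * returns how many simulated Member queries were issued.
     */
    static int memberQueries(int groupCount, boolean cached) {
        int[] queries = {0};
        Function<String, String> memberQuery = subjectId -> {
            queries[0]++; // one simulated SELECT per call
            return "member:" + subjectId;
        };
        CachingLookup<String, String> lookup = new CachingLookup<>(1000, memberQuery);
        for (int i = 0; i < groupCount; i++) {
            String member = cached ? lookup.get("jdoe") : memberQuery.apply("jdoe");
            // A real check would now test whether `member` may view group i.
        }
        return queries[0];
    }

    public static void main(String[] args) {
        System.out.println(memberQueries(3120, false)); // 3120
        System.out.println(memberQueries(3120, true));  // 1
    }
}
```

The cache bound only matters when many distinct subjects are checked; a single-subject search like this one needs just one entry, so even a small cache removes the repeated lookups.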
2. The next modification relates to scoping the results. To determine whether a group or stem (X) is a child of another stem (Y), the API currently walks recursively up X's hierarchy to see whether Y is found. Instead, I changed it to just compare the object names: if the name of X starts with the name of Y, then X is a child of Y.
If this makes a substantial difference in performance, it's probably worth it to bake this assumption into code and forever link names to locations. If not, it might be best to continue to evaluate this relationship structurally. Would this constrain our options when we get around to implementing support for moving and copying groups & stem hierarchies around?
In my example, this was the biggest performance win because the "ECON" stems were several levels below the root stem. As long as the group name (or some other available attribute of the group) encodes the location as it does now, this should not constrain the options for moving and copying stems and groups. However, every group and stem name in the source hierarchy would have to be updated, but maybe that's acceptable given the kind of operation taking place?
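The two containment checks discussed above can be contrasted in a short Java sketch (Java 9+ for `Map.of`/`List.of`). It assumes colon-delimited full names such as `edu:econ:faculty`, an empty string for the root stem, and a hypothetical `parentOf` map standing in for the per-level parent lookups; none of these names come from the Grouper API. The prefix test appends the delimiter so that `edu:economics` is not mistaken for a child of `edu:econ`. The hypothetical `moveStem` illustrates the cost raised in the reply: once containment is read from names, moving a stem means rewriting the name of every descendant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class StemNameSketch {

    static final char DELIM = ':'; // assumed hierarchy delimiter in full names

    /** Structural check: walk X's parent chain looking for Y, one lookup per level. */
    static boolean isChildByWalk(String x, String y, Map<String, String> parentOf) {
        String current = parentOf.get(x);
        while (current != null) {
            if (current.equals(y)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /** Name-based check: X is below Y iff X's name starts with Y's name plus the delimiter. */
    static boolean isChildByName(String x, String y) {
        if (y.isEmpty()) {
            return !x.isEmpty(); // the empty name stands for the root stem
        }
        return x.startsWith(y + DELIM);
    }

    /** Moving a stem under name-based containment: every descendant name is rewritten. */
    static List<String> moveStem(List<String> names, String oldStem, String newStem) {
        if (oldStem.isEmpty()) {
            throw new IllegalArgumentException("the root stem cannot be moved");
        }
        List<String> renamed = new ArrayList<>();
        for (String name : names) {
            if (name.equals(oldStem)) {
                renamed.add(newStem);
            } else if (isChildByName(name, oldStem)) {
                renamed.add(newStem + name.substring(oldStem.length()));
            } else {
                renamed.add(name); // outside the moved subtree: unchanged
            }
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of(
                "edu:econ:faculty", "edu:econ", "edu:econ", "edu", "edu", "");
        System.out.println(isChildByWalk("edu:econ:faculty", "edu", parentOf)); // true
        System.out.println(isChildByName("edu:econ:faculty", "edu"));           // true
        System.out.println(isChildByName("edu:economics", "edu:econ"));         // false
        System.out.println(moveStem(List.of("edu:econ", "edu:econ:faculty", "edu:economics"),
                "edu:econ", "edu:soc:econ"));
        // [edu:soc:econ, edu:soc:econ:faculty, edu:economics]
    }
}
```

The walk costs one lookup per level of depth for every candidate, which is why deeply nested stems like the ECON example benefit most from the single string comparison.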
3. To do the actual database search for the groups, the API currently first gets a list of all group attribute ids with 1 query. For the ECON example above, that results in a list of 3120 group attribute ids. Next, the API performs 3120 queries to retrieve all of the group attribute data, and then another 3120 queries to get the group data, for a total of 6241 queries. Furthermore, if you later call group.getName() on all 3120 groups, that results in 15,000 more queries. I reduced all of that to a single query that takes about 5 seconds. I've set the group attributes as a property of the group so that additional queries to get group attributes are not needed. I did not use ehcache for this, although that might be something to think about. Any thoughts on whether there will be problems if group attributes are queried and saved ahead of time like this?
When you say attributes are made a property of the group, are you referring to a Hibernate construct? Is that the same for Hibernate 2 & 3? Would a caching approach lose effectiveness as the number of search hits exceeds the cache size?
I can't think of a downside to pre-fetching group attributes.
I actually didn't modify the Hibernate configuration. Each group already had a HashMap attributes property with getter and setter methods, and the getter queried the database for the group's attributes. The change I made was to retrieve the attributes while the search is taking place, so that the attributes of all the groups are retrieved at once rather than queried for each group individually. I could see a benefit to caching here if Group objects were expected to exist for long periods while their attributes are being updated by something else, but I cannot think of a specific case where we would need to worry about this right now. Cache size and time to live are definitely things to consider.
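The difference between per-group fetching and the single batched query can be modeled in Java. This is a toy in-memory model, not Grouper's Hibernate mapping: `GroupSearchSketch`, `FakeDb`, `Group`, and their methods are hypothetical, and each `FakeDb` method call counts as one query. The per-group path issues 1 + 2 * 3120 = 6241 queries, matching the count above, while the batched path issues one query and attaches each group's attributes up front, so the lazy getter never has to go back to the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupSearchSketch {

    /** Hypothetical in-memory "database"; every method call below counts as one query. */
    static final class FakeDb {
        final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        int queries = 0;

        List<String> findIds(String nameTerm) {
            queries++;
            List<String> ids = new ArrayList<>();
            rows.forEach((id, attrs) -> {
                if (attrs.get("name").contains(nameTerm)) ids.add(id);
            });
            return ids;
        }

        Group loadGroup(String groupId) {
            queries++;
            return new Group(groupId, null, this);
        }

        Map<String, String> loadAttributes(String groupId) {
            queries++;
            return new HashMap<>(rows.get(groupId));
        }

        /** One joined query: each matching group comes back with its attributes populated. */
        List<Group> findGroupsWithAttributes(String nameTerm) {
            queries++;
            List<Group> groups = new ArrayList<>();
            rows.forEach((id, attrs) -> {
                if (attrs.get("name").contains(nameTerm)) {
                    groups.add(new Group(id, new HashMap<>(attrs), this));
                }
            });
            return groups;
        }
    }

    static final class Group {
        final String id;
        private Map<String, String> attributes; // null means "not fetched yet"
        private final FakeDb db;
        Group(String id, Map<String, String> attributes, FakeDb db) {
            this.id = id;
            this.attributes = attributes;
            this.db = db;
        }
        /** Lazy getter: issues a per-group query only if attributes were not prefetched. */
        Map<String, String> getAttributes() {
            if (attributes == null) attributes = db.loadAttributes(id);
            return attributes;
        }
        String getName() { return getAttributes().get("name"); }
    }

    /** Per-group pattern: 1 id query, then one group query and one attribute query per hit. */
    static List<Group> searchPerGroup(FakeDb db, String term) {
        List<Group> result = new ArrayList<>();
        for (String id : db.findIds(term)) {
            Group group = db.loadGroup(id);
            group.getAttributes();
            result.add(group);
        }
        return result;
    }

    /** Batched pattern: one query, attributes attached to each Group up front. */
    static List<Group> searchBatched(FakeDb db, String term) {
        return db.findGroupsWithAttributes(term);
    }

    static FakeDb sampleDb(int n) {
        FakeDb db = new FakeDb();
        for (int i = 0; i < n; i++) {
            Map<String, String> attrs = new HashMap<>();
            attrs.put("name", "edu:econ:group" + i);
            db.rows.put("id" + i, attrs);
        }
        return db;
    }

    public static void main(String[] args) {
        FakeDb perGroup = sampleDb(3120);
        searchPerGroup(perGroup, "econ");
        System.out.println(perGroup.queries); // 6241 = 1 + 2 * 3120
        FakeDb batched = sampleDb(3120);
        searchBatched(batched, "econ").forEach(Group::getName); // no extra queries
        System.out.println(batched.queries);  // 1
    }
}
```

Because the attributes are copied at search time, a long-lived `Group` in this model would not see later attribute changes; that staleness window is the trade-off the thread raises when discussing cache size and time to live.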
Thanks,
-- Shilen