Subject: Grouper Developers Forum
- From: Shilen Patel
- To: Grouper Dev
- Subject: Performance of Group Searches
- Date: Thu, 18 Oct 2007 17:24:23 -0400
I've been working on GRP-48, which involves improving group search performance in the Grouper API. I've found a few ways to make some huge performance improvements, but before I get too far into code changes and testing, I thought I would describe what I'm doing. This is primarily to make sure I'm not breaking any design decisions that I may not be aware of.
So first here are the performance results. I'll use a specific search example using Duke's test Grouper installation. We have 3120 "ECON" courses somewhere within the stem duke:siss:courses. Note that these results do not use the Grouper UI.
A search for ECON at the duke stem using a non-GrouperSystem session currently takes 134 seconds. With code changes - 22 seconds.
A search for ECON at the duke stem using a GrouperSystem session currently takes 109 seconds. With code changes - 6 seconds.
A search for ECON at the root stem using a non-GrouperSystem session currently takes 63 seconds. With code changes - 22 seconds.
A search for ECON at the root stem using a GrouperSystem session currently takes 39 seconds. With code changes - 6 seconds.
After the modifications, in the cases where a non-GrouperSystem session is used, about 75 percent of the time is actually spent on privilege checking. I haven't yet looked for performance improvements in this area. I've also noticed that the Grouper UI does some privilege checking of its own during group searches, but I don't understand why. Shouldn't this already be taken care of in the API? Gary, can you comment on this?
So I've made 3 primary modifications to get the performance results described above.
1. Using ehcache and a new cache type in grouper.ehcache.xml, I've added caching of Member objects in GrouperAccessAdapter.
2. The next modification is related to scoping the results. To determine if a group or a stem (X) is a child of another stem (Y), the API currently does some recursive checks up the hierarchy of X to see if Y is found. Instead I made a modification to just check the object names. If the name of X starts with the name of Y, then X is a child of Y.
3. To do the actual database search for the groups, the API currently first gets a list of all group attribute ids by doing 1 query. For the ECON example above, that would result in a list of 3120 group attribute ids. Next, the API performs 3120 queries to retrieve all of the group attribute data. Then there are another 3120 queries to get the group data. So that's 6241 queries in total (1 + 3120 + 3120). Furthermore, say sometime in the future you want to call group.getName() on all of the 3120 groups; that will result in 15,000 more queries. Anyway, I reduced all that down to 1 query that takes about 5 seconds. I've set the group attributes as a property of the group so that additional queries to get group attributes are not needed. I did not use ehcache for this, although that might be something to think about. Any thoughts on whether there will be problems if group attributes are queried and saved ahead of time like this?
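To make the name-based scoping check from modification 2 concrete, here is a minimal sketch in Java. The class and method names are hypothetical and are not taken from the Grouper API. It assumes ":" is the hierarchy delimiter (as in duke:siss:courses) and that the root stem has an empty name. One caveat worth noting: a bare startsWith would also treat a stem named duke:sis as a parent of duke:siss:courses, so the sketch appends the delimiter before comparing.

```java
// Hypothetical helper, not part of the Grouper API.
final class StemScope {
    // Assumption: names are delimited by ':' as in duke:siss:courses.
    private static final String DELIMITER = ":";

    private StemScope() {}

    /** Returns true if childName lies strictly below stemName in the naming hierarchy. */
    static boolean isChildOf(String childName, String stemName) {
        if (stemName.isEmpty()) {
            // Assumption: the empty name denotes the root stem,
            // so every named object is below it.
            return !childName.isEmpty();
        }
        // Appending the delimiter keeps "duke:sis" from matching "duke:siss:courses".
        return childName.startsWith(stemName + DELIMITER);
    }
}
```

With this version a stem is not considered a child of itself; whether the search scope should include the stem itself is a separate design choice.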
So hopefully all this makes sense. Please let me know if you have any thoughts or think what I'm doing is completely insane.