
grouper-users - [grouper-users] RE: grouper loader performance

Subject: Grouper Users - Open Discussion List

  • From: "Doppala, Karthik" <>
  • To: Chris Hyzer <>, "" <>
  • Subject: [grouper-users] RE: grouper loader performance
  • Date: Thu, 11 Sep 2014 18:49:49 +0000
  • Accept-language: en-US

About two months ago we migrated from 2.1.2 to 2.1.5. Prior to 2.1.5 we were using subject_id and had to switch to subject_identifier for functional reasons. We fine-tuned our SQL Server DB and significantly improved performance, but it is still not good enough.

There seem to be a lot of factors. We observed that performance degrades as more groups and memberships are added, and I believe the amount of data in the change log tables also matters. In our test environment it took us 11 hours to load 1.7 million memberships. The first group had around 240K members and took just over an hour; by the end, throughput had dropped to about 140K members/hour.

Between groups, we defragmented the DB indexes and kept 116 non-primary-key indexes disabled. This kept the physical disk layout and buffer space adequate for the growing data during the load. We chose those 116 indexes through analysis: they were the ones where updates outnumbered seeks, scans, and lookups.

As I mentioned, this was in the test environment. When we moved to production, performance deteriorated further (more network traffic, physical DB server, etc. being the reasons). Even now, the daily syncs for some of the largest groups (~400K members) take around 2-3 hours, even though the number of memberships modified is very small.



From: [mailto:] On Behalf Of Chris Hyzer
Sent: Wednesday, September 10, 2014 8:18 PM
Subject: [grouper-users] grouper loader performance


There were questions on the IAM online today about grouper loader performance.  Here is an example at Penn:


Looking at the grouper_loader_log table you can see how big the groups are, how long things take, and how many inserts and deletes there were.


Our ezproxy group has 85,000 members. (key is subject_id)


When there are few changes (e.g. a couple of days ago), we had 90 insertions and 40 deletions.  It took 6 seconds to get the 85,000 rows to operate on, and 2.5 minutes to see what's in Grouper and do the 130 operations.
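
To illustrate why a run with few changes can still take minutes, here is a minimal Python sketch of the general "compare source rows with current members" step. This is not Grouper's actual code; the function name diff_memberships and the sample IDs are hypothetical, and the only assumption is that the comparison is keyed on a single subject identifier.

```python
def diff_memberships(source_ids, current_ids):
    """Return (to_add, to_remove) so that the current members match the source."""
    source = set(source_ids)
    current = set(current_ids)
    to_add = source - current      # in the source query, not yet members
    to_remove = current - source   # members no longer returned by the source
    return to_add, to_remove


# Hypothetical sample data, not taken from any real group.
source_rows = ["u1", "u2", "u3", "u5"]
existing_members = ["u1", "u2", "u4"]

to_add, to_remove = diff_memberships(source_rows, existing_members)
print(sorted(to_add), sorted(to_remove))
```

Even when to_add and to_remove are tiny, both full lists still have to be read before they can be compared, so the cost of a quiet run depends on group size as well as on the number of changes.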


We did a query change today where the memberships changed drastically, and it did 13k additions and 11k deletions and it took 50 minutes.
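
For a rough comparison of the two runs above, the figures can be turned into operations per minute. This is only back-of-the-envelope arithmetic on the numbers quoted in this message (13k and 11k are rounded), and the helper name ops_per_minute is made up for the example.

```python
def ops_per_minute(operations, minutes):
    """Average membership operations (inserts + deletes) per minute."""
    return operations / minutes


# Quiet run: 90 insertions + 40 deletions in 2.5 minutes.
# The 2.5 minutes also includes reading what is already in Grouper.
small_run = ops_per_minute(90 + 40, 2.5)

# Large run: roughly 13k additions + 11k deletions in 50 minutes.
large_run = ops_per_minute(13_000 + 11_000, 50)

print(small_run, large_run)
```

The quiet run comes out far lower per operation, which suggests most of its 2.5 minutes went to reading existing memberships rather than to the 130 changes themselves.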


Generally with the loader in steady state and keying off of subject_id it will run pretty quickly.





