
grouper-users - [grouper-users] RE: grouper loader performance

Subject: Grouper Users - Open Discussion List

  • From: Chris Hyzer <>
  • To: "Doppala, Karthik" <>, "" <>
  • Subject: [grouper-users] RE: grouper loader performance
  • Date: Thu, 11 Sep 2014 19:28:33 +0000
  • Accept-language: en-US

There are basically 4 queries being run:

 

1.       The query against the loader source that returns the members who should be in the group.  That will return 400k rows.  Does it return only subject_identifier, or also subject_source_id?  (Adding the source id will improve performance as well.)

2.       The query to get the current members of the group.  This is an API call, and should return 400k rows as well

3.       The subject lookups to resolve subject_identifier to subject_id (not sure if these are batched or not)

4.       The adds/removes to sync up the group
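To make the four steps concrete, here is a minimal sketch of the comparison they amount to. It is not the loader's actual code: the function names, the dict used for subject resolution, and the batch size are all hypothetical, and the real loader works against the database and the Grouper API rather than in-memory collections.

```python
def plan_sync(source_identifiers, group_subject_ids, identifier_to_id):
    """Work out which subject ids to add to and remove from a group.

    source_identifiers: subject_identifiers returned by the loader query (step 1)
    group_subject_ids:  subject ids currently in the group (step 2)
    identifier_to_id:   hypothetical lookup table standing in for the
                        subject_identifier -> subject_id resolution (step 3)
    """
    # Step 3: resolve identifiers; collect the ones that cannot be resolved.
    resolved = set()
    unresolvable = []
    for identifier in source_identifiers:
        subject_id = identifier_to_id.get(identifier)
        if subject_id is None:
            unresolvable.append(identifier)
        else:
            resolved.add(subject_id)

    # Step 4: the adds and removes are two set differences.
    current = set(group_subject_ids)
    to_add = sorted(resolved - current)
    to_remove = sorted(current - resolved)
    return to_add, to_remove, unresolvable


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


lookup = {"jsmith": "1001", "mjones": "1002", "akhan": "1003"}
adds, removes, missing = plan_sync(
    ["jsmith", "mjones", "akhan", "nobody"],  # loader source
    ["1002", "1003", "1004"],                 # current group members
    lookup,
)
print(adds, removes, missing)
```

Even when the adds and removes are few, steps 1 to 3 still touch every row, which is why the key the source query returns matters so much for large groups.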

 

Can you run with the p6spy driver to see which queries are taking a long time?  Maybe that would help… just set the driver in grouper.hibernate.properties (or, I guess, the loader properties for loader connections) to be the p6spy one, and then in spy.properties put the path of a file for the log.  This would be for your test system :)

 

Thanks,

Chris

 

 

 

 

From: Doppala, Karthik [mailto:]
Sent: Thursday, September 11, 2014 2:50 PM
To: Chris Hyzer;
Subject: RE: grouper loader performance

 

About two months ago we migrated from 2.1.2 to 2.1.5. Prior to 2.1.5 we were using subject_id and had to switch to subject_identifier for functional reasons. We fine-tuned our SQL Server DB and significantly improved the performance, but it is still not good enough. There seem to be a lot of factors: we observed that performance degrades as more and more groups and memberships are added, and I believe the amount of data in the change log tables also matters. In our test environment it took us 11 hours to load 1.7 million memberships. The first group had around 240k members and took just over an hour, and toward the end the rate came down to 140K members/hour. Between each group, the DB indexes were defragged, and 116 non-primary-key indexes were kept disabled. This kept the physical disk space organization and buffer adequate for the growing data during the load activity. The 116 indexes were chosen (through analysis) as those where updates outnumbered seeks, scans and lookups. As I mentioned, this was in the test environment, and when we moved to production we saw the performance deteriorate further (more network traffic, a physical DB server, etc. being the reasons). Even now the daily syncs for some of the largest groups (~400K) take around 2-3 hours, even though very few memberships are modified.

 

 

From: [] On Behalf Of Chris Hyzer
Sent: Wednesday, September 10, 2014 8:18 PM
To:
Subject: [grouper-users] grouper loader performance

 

There were questions on the IAM online today about grouper loader performance.  Here is an example at Penn:

 

Looking at the grouper_loader_log table you can see how big the groups are, how long things take, and how many inserts and deletes there were.

 

Our ezproxy group has 85,000 members. (key is subject_id)

 

When there are few changes (e.g. a couple of days ago), we had 90 insertions and 40 deletions.  It took 6 seconds to get the 85,000 rows to operate on, and 2.5 minutes to see what's in Grouper and do the 130 operations.

 

We did a query change today where the memberships changed drastically, and it did 13k additions and 11k deletions and it took 50 minutes.
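As a rough comparison of those two runs, the arithmetic can be sketched as below. The helper name is made up for illustration, and the bulk figures are the rounded 13k and 11k quoted above, so the results are approximate.

```python
def ops_per_minute(operations, minutes):
    """Average membership changes applied per minute of run time."""
    return operations / minutes


# Steady-state run: 90 insertions + 40 deletions in 2.5 minutes.
steady_rate = ops_per_minute(90 + 40, 2.5)

# Bulk run: about 13k additions + 11k deletions in 50 minutes.
bulk_rate = ops_per_minute(13_000 + 11_000, 50)

print(f"steady: {steady_rate:.0f} ops/min, bulk: {bulk_rate:.0f} ops/min")
```

The steady-state figure looks low only because the 2.5 minutes also includes reading what is already in the group; with 130 changes, that fixed cost presumably dominates the run.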

 

Generally with the loader in steady state and keying off of subject_id it will run pretty quickly.

 

Thanks,

Chris

 

 



