grouper-dev - RE: [grouper-dev] subject batching performance improvement

Subject: Grouper Developers Forum

From: Chris Hyzer <>
To: Tom Barton <>, "" <>
Subject: RE: [grouper-dev] subject batching performance improvement
Date: Mon, 28 Nov 2011 05:18:34 +0000
Accept-language: en-US

JDBC2 performance results:

Resolving 2000 subjects by batch took: 1271ms

Resolving 2000 subjects individually took: 37997ms

This JDBC2 implementation uses at most 180 bind variables per query, and it is about 30 times faster. So if LDAP does at most 100 per query, there should be a speedup of the same order of magnitude… :)
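To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the only inputs are the two timings above and the maximum number of bind variables per query; the class and method names are made up for illustration and are not part of Grouper.

```java
// Hypothetical helper, not part of Grouper: back-of-the-envelope batching arithmetic.
final class BatchMath {

  /** Queries needed to resolve subjectCount subjects with at most maxBindVars per query (ceiling division). */
  static int queriesNeeded(int subjectCount, int maxBindVars) {
    if (subjectCount < 0 || maxBindVars <= 0) {
      throw new IllegalArgumentException("subjectCount must be >= 0 and maxBindVars > 0");
    }
    return (subjectCount + maxBindVars - 1) / maxBindVars;
  }

  /** Ratio of the one-at-a-time time to the batched time. */
  static double speedup(long individualMillis, long batchedMillis) {
    return (double) individualMillis / batchedMillis;
  }

  public static void main(String[] args) {
    System.out.println(queriesNeeded(2000, 180)); // 12 queries instead of 2000
    System.out.println(queriesNeeded(2000, 100)); // 20 queries instead of 2000
    System.out.println(speedup(37997, 1271));     // just under 30
  }
}
```

So with a maximum of 100 per query the 2000 subjects would still need only 20 round trips, which is why a speedup of the same order seems plausible.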

The JDBC implementation is very similar, so the results should be similar. I don't have a JDBC (non-2) source with tons of data, so I don't know for sure; someone could run a program like the one below against the GROUPER_2_0_2 tag and their subject source to find out…

Thanks,

Chris

/**
 * @author mchyzer
 * $Id$
 */
package edu.internet2.middleware.grouper.poc;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import edu.internet2.middleware.grouper.SubjectFinder;
import edu.internet2.middleware.grouper.hibernate.HibernateSession;
import edu.internet2.middleware.subject.Subject;

/**
 * Compare batched and one-at-a-time subject resolution.
 */
public class SubjectPerformance {

  /**
   * @param args
   */
  public static void main(String[] args) {

    //lets get 5k pennids (500 + 500 to warm up, 2000 + 2000 to time)
    List<String> pennids = HibernateSession.bySqlStatic().listSelect(String.class,
        "select penn_id from person_source_v where rownum < 6000", null);

    int index = 0;

    {
      //lets do a batch and a single one to prime the pump
      List<String> fiveHundred = new ArrayList<String>();
      for (int i=0;i<500;i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }

      Map<String, Subject> result = SubjectFinder.findByIds(fiveHundred, "pennperson");
      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }

      fiveHundred = new ArrayList<String>();
      result = new HashMap<String, Subject>();
      for (int i=0;i<500;i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }

      for (String pennid : fiveHundred) {
        Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
        if (subject != null) {
          result.put(pennid, subject);
        }
      }
      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }
    }

    long start = System.nanoTime();

    //############### lets try batched

    List<String> twoThousand = new ArrayList<String>();
    for (int i=0;i<2000;i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }

    Map<String, Subject> result = SubjectFinder.findByIds(twoThousand, "pennperson");
    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }

    System.out.println("Resolving 2000 subjects by batch took: " + ((System.nanoTime() - start) / 1000000) + "ms");

    //############### lets try not batched

    start = System.nanoTime();

    twoThousand = new ArrayList<String>();
    result = new HashMap<String, Subject>();
    for (int i=0;i<2000;i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }

    //one query per subject
    for (String pennid : twoThousand) {
      Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
      if (subject != null) {
        result.put(pennid, subject);
      }
    }
    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }

    System.out.println("Resolving 2000 subjects individually took: " + ((System.nanoTime() - start) / 1000000) + "ms");
  }
}

From: [mailto:] On Behalf Of Tom Barton
Sent: Sunday, November 27, 2011 10:22 AM
To:
Subject: Re: [grouper-dev] subject batching performance improvement

Thanks Chris. Two questions for the dev team:

1. I think we realized that batching is also possible in JNDI context, yes? Is someone looking into that? For 2.0.2?

2. It'd be great to have performance numbers with and without batching for some simple set of test cases. Is that possible?

Thanks,
Tom

On 11/26/2011 11:21 PM, Chris Hyzer wrote:

I finished coding the subject resolution batching performance improvement. Penn has a subject picker which finds subjects who are employees, and a search used to take more than 20 seconds, which caused a timeout. Now it takes a few seconds, and the number of queries and the amount of data are bounded (not by N :) ).

https://bugs.internet2.edu/jira/browse/GRP-713

Subject batching to not require N queries has been added to the following places:

UI subject picker when subject is in a group
UI get members
UI get members lite
WS get members
WS get privileges
WS get memberships

https://bugs.internet2.edu/jira/browse/GRP-712

There should be batching (for jdbc and jdbc2 source adapters) so that multiple subjects can be resolved at once. Note: source adapters which do not implement a batched method to retrieve multiple subjects will retrieve them one at a time, as before.
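The fallback described in that note can be sketched roughly as follows. This is only an illustration of the idea: the interfaces `SimpleSource` and `BatchSource` and the class `BatchOrFallback` are hypothetical names, not the real subject API types, and subjects are represented as plain strings to keep the example self-contained.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical types for illustration only; these are NOT the Grouper/subject API.
interface SimpleSource {
  /** Resolve one subject by id, or return null if it is not found. */
  String findById(String id);
}

/** A source that can additionally resolve many ids in a single call. */
interface BatchSource extends SimpleSource {
  Map<String, String> findByIds(Collection<String> ids);
}

final class BatchOrFallback {

  /** Use the batched method when the source has one, else resolve one at a time. */
  static Map<String, String> resolve(SimpleSource source, Collection<String> ids) {
    if (source instanceof BatchSource) {
      return ((BatchSource) source).findByIds(ids);
    }
    // fallback: N lookups, same result shape as the batched call
    Map<String, String> result = new LinkedHashMap<String, String>();
    for (String id : ids) {
      String subject = source.findById(id);
      if (subject != null) {
        result.put(id, subject);
      }
    }
    return result;
  }
}
```

Callers get the same map either way, so code that wants batching does not need to know which kind of source adapter is configured.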

If you use the jdbc2 source adapter, you don't have to configure anything differently; it will just work.

If you use the jdbc source adapter, compare against the new sources.example.xml; you can set these properties:

     
     <init-param>
       <param-name>useInClauseForIdAndIdentifier</param-name>
       <param-value>true</param-value>
     </init-param>

     
     <init-param>
       <param-name>identifierAttributes</param-name>
       <param-value>LOGINID</param-value>
     </init-param>

Then you can change your queries for id or identifier to have an inclause variable, and assign that variable for the inclause part:

     <search>
         <searchType>searchSubject</searchType>
         <param>
             <param-name>sql</param-name>
             <param-value>
select
   s.subjectid as id, s.name as name,
   (select sa2.value from subjectattribute sa2 where name='name' and sa2.SUBJECTID = s.subjectid) as lfname,
   (select sa3.value from subjectattribute sa3 where name='loginid' and sa3.SUBJECTID = s.subjectid) as loginid,
   (select sa4.value from subjectattribute sa4 where name='description' and sa4.SUBJECTID = s.subjectid) as description,
   (select sa5.value from subjectattribute sa5 where name='email' and sa5.SUBJECTID = s.subjectid) as email
from
   subject s
where
   {inclause}
            </param-value>
         </param>
         <param>
             <param-name>inclause</param-name>
             <param-value>
s.subjectid = ?
            </param-value>
         </param>
     </search>
     <search>
         <searchType>searchSubjectByIdentifier</searchType>
         <param>
             <param-name>sql</param-name>
             <param-value>
select
   s.subjectid as id, s.name as name,
   (select sa2.value from subjectattribute sa2 where name='name' and sa2.SUBJECTID = s.subjectid) as lfname,
   (select sa3.value from subjectattribute sa3 where name='loginid' and sa3.SUBJECTID = s.subjectid) as loginid,
   (select sa4.value from subjectattribute sa4 where name='description' and sa4.SUBJECTID = s.subjectid) as description,
   (select sa5.value from subjectattribute sa5 where name='email' and sa5.SUBJECTID = s.subjectid) as email
from
   subject s, subjectattribute a
where
   a.name='loginid' and s.subjectid = a.subjectid and {inclause}
             </param-value>
         </param>
         <param>
             <param-name>inclause</param-name>
             <param-value>
   a.value = ?
            </param-value>
         </param>
     </search>
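To illustrate what the `{inclause}` placeholder makes possible, here is a sketch of one way a batch of ids could be turned into bounded queries. It is an assumption for illustration, not the SQL that the jdbc source adapter actually generates: the fragment is repeated once per id and joined with `or`, and the ids are split into chunks no larger than the maximum number of bind variables. `InClauseSketch`, `expand` and `chunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Grouper code: expand an {inclause} template for a batch of ids.
final class InClauseSketch {

  /** Replace {inclause} with the fragment repeated idCount times, joined by "or". */
  static String expand(String sqlTemplate, String inclause, int idCount) {
    if (idCount < 1) {
      throw new IllegalArgumentException("need at least one id");
    }
    StringBuilder clause = new StringBuilder("(");
    for (int i = 0; i < idCount; i++) {
      if (i > 0) {
        clause.append(" or ");
      }
      clause.append(inclause.trim());
    }
    clause.append(")");
    return sqlTemplate.replace("{inclause}", clause.toString());
  }

  /** Split ids into consecutive chunks of at most maxBindVars elements. */
  static <T> List<List<T>> chunk(List<T> ids, int maxBindVars) {
    if (maxBindVars < 1) {
      throw new IllegalArgumentException("maxBindVars must be positive");
    }
    List<List<T>> chunks = new ArrayList<List<T>>();
    for (int from = 0; from < ids.size(); from += maxBindVars) {
      int to = Math.min(from + maxBindVars, ids.size());
      chunks.add(new ArrayList<T>(ids.subList(from, to)));
    }
    return chunks;
  }
}
```

Each chunk would then be run as one prepared statement, with the chunk's ids bound to the `?` placeholders in order.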

There are now these SubjectFinder methods:

SubjectFinder.findByIds(Collection<String>)
SubjectFinder.findByIds(Collection<String>, String)
SubjectFinder.findByIdentifiers(Collection<String>)
SubjectFinder.findByIdentifiers(Collection<String>, String)
SubjectFinder.findByIdsOrIdentifiers(Collection<String>)
SubjectFinder.findByIdsOrIdentifiers(Collection<String>, String)

There is this method in Member to resolve the subjects for a collection of members at once:

Member.resolveSubjects(Collection<Member>, boolean)

This method will resolve subjects in membership array results:

Membership.resolveSubjects(Collection<Object[]>)

This method will resolve subjects in privilege objects:

PrivilegeHelper.resolveSubjects(Collection<GrouperPrivilege>, boolean)

Thanks,

Chris

[grouper-dev] subject batching performance improvement, Chris Hyzer, 11/27/2011
- Re: [grouper-dev] subject batching performance improvement, Tom Barton, 11/27/2011
  - RE: [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/27/2011
    - Re: [grouper-dev] subject batching performance improvement, Tom Zeller, 11/27/2011
    - RE: [grouper-dev] subject batching performance improvement, Jim Fox, 11/28/2011
  - RE: [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/28/2011
- <Possible follow-up(s)>
- RE: [grouper-dev] subject batching performance improvement, Thomas Barton, 11/28/2011
- RE: [grouper-dev] subject batching performance improvement, Thomas Barton, 11/28/2011
