
grouper-dev - RE: [grouper-dev] subject batching performance improvement

Subject: Grouper Developers Forum

  • From: Thomas Barton <>
  • To: "" <>, "" <>
  • Subject: RE: [grouper-dev] subject batching performance improvement
  • Date: Mon, 28 Nov 2011 08:02:37 -0600

A worthy improvement!

Tom (mobile)


-----Original Message-----
From: Chris Hyzer []
Received: Sunday, 27 Nov 2011, 11:18pm
To: Thomas Barton []; []
Subject: RE: [grouper-dev] subject batching performance improvement

JDBC2 performance results:

Resolving 2000 subjects by batch took: 1271ms

Resolving 2000 subjects individually took: 37997ms

 

This implementation (JDBC2) uses at most 180 bind variables per query, and batching is about 30 times faster (37997ms vs. 1271ms).  So if LDAP allows at most 100 per query, the speedup should be of the same order of magnitude... :)
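
To make the bind variable limit concrete, here is a minimal sketch of splitting a list of ids into per-query batches, assuming the cap of 180 ids per query quoted above. The names BatchSketch and partition are made up for illustration; this is not the Grouper code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of Grouper. */
final class BatchSketch {

  /** Split ids into consecutive batches of at most maxPerQuery elements. */
  static <T> List<List<T>> partition(List<T> ids, int maxPerQuery) {
    if (maxPerQuery < 1) {
      throw new IllegalArgumentException("maxPerQuery must be positive");
    }
    List<List<T>> batches = new ArrayList<List<T>>();
    for (int from = 0; from < ids.size(); from += maxPerQuery) {
      int to = Math.min(from + maxPerQuery, ids.size());
      // copy the sublist so each batch is independent of the source list
      batches.add(new ArrayList<T>(ids.subList(from, to)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> ids = new ArrayList<String>();
    for (int i = 0; i < 2000; i++) {
      ids.add("id" + i);
    }
    // 180 is the per-query cap mentioned above (an assumption for other sources)
    List<List<String>> batches = partition(ids, 180);
    System.out.println(batches.size() + " queries instead of " + ids.size());
  }
}
```

With 2000 ids and a cap of 180 this gives 12 batches (eleven of 180 and one of 20), so 12 round trips instead of 2000.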

 

The JDBC implementation is very similar, so its results should be similar.  I don't have a JDBC (non-2) source with lots of data, so I don't know for sure.  Someone could run a program like the one below against the GROUPER_2_0_2 tag and their own subject source to find out...

 

Thanks,

Chris

 

/**
 * @author mchyzer
 * $Id$
 */
package edu.internet2.middleware.grouper.poc;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import edu.internet2.middleware.grouper.SubjectFinder;
import edu.internet2.middleware.grouper.hibernate.HibernateSession;
import edu.internet2.middleware.subject.Subject;

/**
 * Times batched vs. one-at-a-time subject resolution against a subject source.
 */
public class SubjectPerformance {

  /**
   * @param args
   */
  public static void main(String[] args) {

    //lets get enough pennids (the runs below consume 5000 of them)
    List<String> pennids = HibernateSession.bySqlStatic().listSelect(String.class,
        "select penn_id from person_source_v where rownum < 6000", null);

    //position in pennids, so every lookup uses ids not seen before
    int index = 0;

    {
      //lets do a batch and a single one to prime the pump
      List<String> fiveHundred = new ArrayList<String>();

      for (int i=0;i<500;i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }

      Map<String, Subject> result = SubjectFinder.findByIds(fiveHundred, "pennperson");

      //sanity check: almost all of the ids should resolve
      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }

      fiveHundred = new ArrayList<String>();

      result = new HashMap<String, Subject>();

      for (int i=0;i<500;i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }

      for (String pennid : fiveHundred) {
        Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
        if (subject != null) {
          result.put(pennid, subject);
        }
      }

      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }
    }

    long start = System.nanoTime();

    //############### lets try batched
    //resolve 2000 subjects with one findByIds call
    List<String> twoThousand = new ArrayList<String>();

    for (int i=0;i<2000;i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }

    Map<String, Subject> result = SubjectFinder.findByIds(twoThousand, "pennperson");

    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }

    System.out.println("Resolving 2000 subjects by batch took: " + ((System.nanoTime() - start) / 1000000) + "ms");

    //############### lets try not batched
    start = System.nanoTime();

    twoThousand = new ArrayList<String>();

    result = new HashMap<String, Subject>();

    for (int i=0;i<2000;i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }

    //resolve the next 2000 subjects one lookup at a time
    for (String pennid : twoThousand) {
      Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
      if (subject != null) {
        result.put(pennid, subject);
      }
    }

    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }

    System.out.println("Resolving 2000 subjects individually took: " + ((System.nanoTime() - start) / 1000000) + "ms");

  }

}

 

 

From: [mailto:] On Behalf Of Tom Barton
Sent: Sunday, November 27, 2011 10:22 AM
To:
Subject: Re: [grouper-dev] subject batching performance improvement

 

Thanks Chris. Two questions for the dev team:

1. I think we realized that batching is also possible in JNDI context, yes? Is someone looking into that? For 2.0.2?

2. It'd be great to have performance numbers with and without batching for some simple set of test cases. Is that possible?

Thanks,
Tom

On 11/26/2011 11:21 PM, Chris Hyzer wrote:

I finished coding the subject resolution batching performance improvement... Penn has a subject picker which finds subjects who are employees, and a search would take more than 20 seconds, which caused a timeout.  Now it takes a few seconds, and the number of queries and the amount of data are bounded (not by N :) ).

 

https://bugs.internet2.edu/jira/browse/GRP-713

 

Subject batching to not require N queries has been added to the following places:

UI subject picker when subject is in a group
UI get members
UI get members lite
WS get members
WS get privileges
WS get memberships

 

https://bugs.internet2.edu/jira/browse/GRP-712

 

There should be batching (for the jdbc and jdbc2 source adapters) so that multiple subjects can be resolved at once. Note: source adapters which do not implement a batched method to retrieve multiple subjects will retrieve them one at a time, as they did before.
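
The fallback described in the note can be pictured roughly like this. It is a minimal sketch, not the Grouper source adapter API: SketchSource, its two methods, and the use of plain strings in place of Subject objects are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a source adapter; not the Grouper Source API. */
interface SketchSource {

  /** Look up one subject by id, or return null if it is not found. */
  String findById(String id);

  /**
   * Fallback for adapters without a batched query: one lookup per id.
   * An adapter that can batch would override this with a single query.
   */
  default Map<String, String> findByIds(Collection<String> ids) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    for (String id : ids) {
      String subject = findById(id);
      if (subject != null) {
        result.put(id, subject);
      }
    }
    return result;
  }
}

final class FallbackSketch {
  public static void main(String[] args) {
    final int[] lookups = {0};
    // an adapter that only knows how to resolve one id at a time
    SketchSource source = new SketchSource() {
      public String findById(String id) {
        lookups[0]++;
        return id.startsWith("missing") ? null : "name-" + id;
      }
    };
    Map<String, String> found = source.findByIds(Arrays.asList("a", "b", "missing1"));
    System.out.println(found.size() + " resolved using " + lookups[0] + " lookups");
  }
}
```

Callers always use the collection method, so they get batching where an adapter supports it and the old one-at-a-time behavior elsewhere.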

If you use the jdbc2 source adapter, you don't have to configure anything differently; it will just work.

If you use the jdbc source adapter, compare against the new sources.example.xml; you can set these properties:

     <!-- if you are going to use the inclause attribute
       on the search to make the queries batchable when searching
       by id or identifier -->
     <init-param>
       <param-name>useInClauseForIdAndIdentifier</param-name>
       <param-value>true</param-value>
     </init-param>
     
     <!-- comma separate the identifiers for this row, this is for the findByIdentifiers if using an in clause -->
     <init-param>
       <param-name>identifierAttributes</param-name>
       <param-value>LOGINID</param-value>
     </init-param>

Then you can change your queries for id or identifier to have an inclause variable, and assign that variable for the inclause part:

     <search>
         <searchType>searchSubject</searchType>
         <param>
             <param-name>sql</param-name>
             <param-value>
select
   s.subjectid as id, s.name as name,
   (select sa2.value from subjectattribute sa2 where name='name' and sa2.SUBJECTID = s.subjectid) as lfname,
   (select sa3.value from subjectattribute sa3 where name='loginid' and sa3.SUBJECTID = s.subjectid) as loginid,
   (select sa4.value from subjectattribute sa4 where name='description' and sa4.SUBJECTID = s.subjectid) as description,
   (select sa5.value from subjectattribute sa5 where name='email' and sa5.SUBJECTID = s.subjectid) as email
from
   subject s
where
   {inclause}
            </param-value>
         </param>
         <param>
             <param-name>inclause</param-name>
             <param-value>
s.subjectid = ?
            </param-value>
         </param>
     </search>
     <search>
         <searchType>searchSubjectByIdentifier</searchType>
         <param>
             <param-name>sql</param-name>
             <param-value>
select
   s.subjectid as id, s.name as name,
   (select sa2.value from subjectattribute sa2 where name='name' and sa2.SUBJECTID = s.subjectid) as lfname,
   (select sa3.value from subjectattribute sa3 where name='loginid' and sa3.SUBJECTID = s.subjectid) as loginid,
   (select sa4.value from subjectattribute sa4 where name='description' and sa4.SUBJECTID = s.subjectid) as description,
   (select sa5.value from subjectattribute sa5 where name='email' and sa5.SUBJECTID = s.subjectid) as email
from
   subject s, subjectattribute a
where
   a.name='loginid' and s.subjectid = a.subjectid and {inclause}
             </param-value>
         </param>
         <param>
             <param-name>inclause</param-name>
             <param-value>
   a.value = ?
            </param-value>
         </param>
     </search>
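
The message does not spell out how {inclause} is filled in for a batch. One plausible reading, sketched below, is that the inclause fragment is repeated once per id and the copies are joined with "or". The class InClauseSketch, the method expand, the "or" join and the added parentheses are all assumptions for illustration, not the actual Grouper implementation.

```java
/** Illustrative only: a guess at how a batched query could be built. */
final class InClauseSketch {

  /**
   * Replace {inclause} in sql with count copies of the inclause fragment,
   * joined with "or" and wrapped in parentheses.
   */
  static String expand(String sql, String inclause, int count) {
    if (count < 1) {
      throw new IllegalArgumentException("need at least one bind variable");
    }
    StringBuilder joined = new StringBuilder("(");
    for (int i = 0; i < count; i++) {
      if (i > 0) {
        joined.append(" or ");
      }
      // the config values carry surrounding whitespace, so trim each copy
      joined.append(inclause.trim());
    }
    joined.append(")");
    return sql.replace("{inclause}", joined.toString());
  }

  public static void main(String[] args) {
    String sql = "select s.subjectid as id from subject s where {inclause}";
    // three ids to resolve -> three bind variables in one query
    System.out.println(expand(sql, " s.subjectid = ? ", 3));
  }
}
```

The parentheses matter for a query like the identifier search above, where {inclause} follows other conditions joined with "and"; without them the "or" terms would bind more loosely than intended.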




There are now these SubjectFinder methods:

SubjectFinder.findByIds(Collection<String>)
SubjectFinder.findByIds(Collection<String>, String)
SubjectFinder.findByIdentifiers(Collection<String>)
SubjectFinder.findByIdentifiers(Collection<String>, String)
SubjectFinder.findByIdsOrIdentifiers(Collection<String>)
SubjectFinder.findByIdsOrIdentifiers(Collection<String>, String)

There is this method in member to resolve subjects in a collection of members at once:

Member.resolveSubjects(Collection<Member>, boolean)

This method will resolve subjects in membership array results:

Membership.resolveSubjects(Collection<Object[]>)

This method will resolve subjects in privilege objects:

PrivilegeHelper.resolveSubjects(Collection<GrouperPrivilege>, boolean)

 

 

Thanks,

Chris



