grouper-dev - RE: [grouper-dev] subject batching performance improvement
Subject: Grouper Developers Forum
- From: Chris Hyzer <>
- To: Tom Barton <>, "" <>
- Subject: RE: [grouper-dev] subject batching performance improvement
- Date: Mon, 28 Nov 2011 05:18:34 +0000
JDBC2 performance results:

Resolving 2000 subjects by batch took: 1271ms
Resolving 2000 subjects individually took: 37997ms

This implementation of JDBC uses at most 180 bind variables per query, and it is about 30 times faster. So if LDAP does at most 100 per query, the speedup should be of the same order of magnitude… :)

The JDBC implementation is very similar, so its results should be similar too. I don't have a JDBC (non-2) source with lots of data, so I don't know for sure; someone could find out by running a program like the one below against
the GROUPER_2_0_2 tag and their subject source.

Thanks,
Chris

/**
 * @author mchyzer
 * $Id$
 */
package edu.internet2.middleware.grouper.poc;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import edu.internet2.middleware.grouper.SubjectFinder;
import edu.internet2.middleware.grouper.hibernate.HibernateSession;
import edu.internet2.middleware.subject.Subject;

/**
 * Compares batched vs. one-at-a-time subject resolution.
 */
public class SubjectPerformance {

  /**
   * @param args
   */
  public static void main(String[] args) {

    //let's get ~6k pennids (5000 are used below)
    List<String> pennids = HibernateSession.bySqlStatic().listSelect(String.class,
        "select penn_id from person_source_v where rownum < 6000", null);

    int index = 0;

    {
      //let's do a batch and a single one to prime the pump
      List<String> fiveHundred = new ArrayList<String>();
      for (int i = 0; i < 500; i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }
      Map<String, Subject> result = SubjectFinder.findByIds(fiveHundred, "pennperson");
      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }

      fiveHundred = new ArrayList<String>();
      result = new HashMap<String, Subject>();
      for (int i = 0; i < 500; i++) {
        fiveHundred.add(pennids.get(index));
        index++;
      }
      for (String pennid : fiveHundred) {
        Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
        if (subject != null) {
          result.put(pennid, subject);
        }
      }
      if (result.size() < 490) {
        throw new RuntimeException("Problem! " + result.size());
      }
    }

    long start = System.nanoTime();

    //############### let's try batched
    //2000 ids not used for priming
    List<String> twoThousand = new ArrayList<String>();
    for (int i = 0; i < 2000; i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }
    Map<String, Subject> result = SubjectFinder.findByIds(twoThousand, "pennperson");
    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }
    System.out.println("Resolving 2000 subjects by batch took: "
        + ((System.nanoTime() - start) / 1000000) + "ms");

    //############### let's try not batched
    start = System.nanoTime();
    twoThousand = new ArrayList<String>();
    result = new HashMap<String, Subject>();
    for (int i = 0; i < 2000; i++) {
      twoThousand.add(pennids.get(index));
      index++;
    }
    for (String pennid : twoThousand) {
      Subject subject = SubjectFinder.findByIdAndSource(pennid, "pennperson", false);
      if (subject != null) {
        result.put(pennid, subject);
      }
    }
    if (result.size() < 1900) {
      throw new RuntimeException("Problem! " + result.size());
    }
    System.out.println("Resolving 2000 subjects individually took: "
        + ((System.nanoTime() - start) / 1000000) + "ms");
  }
}

From: [mailto:]
On Behalf Of Tom Barton

Thanks Chris. Two questions for the dev team:

I finished coding the subject resolution batching performance improvement… Penn has a subject picker that finds subjects who are employees; a search used to take more than 20 seconds, which caused a timeout. Now it takes a few seconds, and the number of queries and the amount of data are bounded (not by N :) ).

https://bugs.internet2.edu/jira/browse/GRP-713

Subject batching that does not require N queries has been added to the following places:

https://bugs.internet2.edu/jira/browse/GRP-712

There should be batching (for the jdbc and jdbc2 source adapters) so that multiple subjects can be resolved at once. Note: source adapters that do not implement a batched method to retrieve multiple subjects will retrieve them one at a time, as before.

Thanks,
Chris
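The JDBC2 numbers earlier in this thread (at most 180 bind variables per query) imply that a batched lookup of 2000 ids needs ceil(2000 / 180) = 12 queries instead of 2000. Below is a minimal sketch of that chunking, assuming a plain SQL `IN (...)` clause with one `?` placeholder per id. The class name `BatchedInClauseSketch`, the table `subject_table`, and its columns are hypothetical; this is not the actual Grouper JDBC2 adapter code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedInClauseSketch {

  /** Splits ids into consecutive chunks of at most maxPerQuery elements. */
  static List<List<String>> partition(List<String> ids, int maxPerQuery) {
    if (maxPerQuery <= 0) {
      throw new IllegalArgumentException("maxPerQuery must be positive");
    }
    List<List<String>> chunks = new ArrayList<>();
    for (int start = 0; start < ids.size(); start += maxPerQuery) {
      int end = Math.min(start + maxPerQuery, ids.size());
      chunks.add(ids.subList(start, end));
    }
    return chunks;
  }

  /** Builds one parameterized query with a "?" per id (hypothetical table/columns). */
  static String buildQuery(int placeholderCount) {
    String placeholders = String.join(", ", Collections.nCopies(placeholderCount, "?"));
    return "select subject_id, name from subject_table where subject_id in ("
        + placeholders + ")";
  }

  public static void main(String[] args) {
    List<String> ids = new ArrayList<>();
    for (int i = 0; i < 2000; i++) {
      ids.add("id" + i);
    }
    List<List<String>> chunks = partition(ids, 180);
    // One query per chunk: 11 full chunks of 180 plus a final chunk of 20.
    System.out.println("queries: " + chunks.size());
    System.out.println("last chunk size: " + chunks.get(chunks.size() - 1).size());
    System.out.println(buildQuery(3));
  }
}
```

With a limit of 100 per query the same 2000 ids would take 20 queries; in both cases the query count grows with N divided by the limit rather than with N.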
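The note about adapters that lack a batched method suggests a default-fallback design: batching is used where an adapter supports it, and everything else keeps the old one-query-per-subject behavior. Here is a minimal sketch of that idea, assuming a hypothetical `SubjectLookup` interface (not the real Grouper `Source` API) whose default `findByIds` loops over single lookups, while a batch-capable adapter overrides it with one simulated query. Subjects are reduced to plain strings for brevity.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchFallbackSketch {

  /** Hypothetical adapter interface; not the real Grouper Source API. */
  interface SubjectLookup {
    /** Returns the subject for an id, or null if not found. */
    String findById(String id);

    /** Default: one lookup per id (N queries). Batch-capable adapters override this. */
    default Map<String, String> findByIds(Collection<String> ids) {
      Map<String, String> result = new LinkedHashMap<>();
      for (String id : ids) {
        String subject = findById(id);
        if (subject != null) {
          result.put(id, subject);
        }
      }
      return result;
    }
  }

  /** Adapter without batching: relies on the default one-at-a-time loop. */
  static class SingleOnlyLookup implements SubjectLookup {
    final Map<String, String> store;
    int queries = 0;

    SingleOnlyLookup(Map<String, String> store) { this.store = store; }

    public String findById(String id) {
      queries++; // each single lookup counts as one query
      return store.get(id);
    }
  }

  /** Adapter with batching: one simulated "where id in (...)" query per call. */
  static class BatchingLookup extends SingleOnlyLookup {
    BatchingLookup(Map<String, String> store) { super(store); }

    @Override
    public Map<String, String> findByIds(Collection<String> ids) {
      queries++;
      Map<String, String> result = new LinkedHashMap<>();
      for (String id : ids) {
        String subject = store.get(id);
        if (subject != null) {
          result.put(id, subject);
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Map<String, String> store = new HashMap<>();
    store.put("1", "alice");
    store.put("2", "bob");
    store.put("3", "carol");
    List<String> ids = Arrays.asList("1", "2", "3", "4");
    SingleOnlyLookup single = new SingleOnlyLookup(store);
    BatchingLookup batch = new BatchingLookup(store);
    System.out.println(single.findByIds(ids) + " queries=" + single.queries);
    System.out.println(batch.findByIds(ids) + " queries=" + batch.queries);
  }
}
```

Both adapters return the same three subjects (id "4" is simply missing), but the single-only adapter reports queries=4 while the batching one reports queries=1, so callers never need to know which kind of adapter they are talking to.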
- [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/27/2011
- Re: [grouper-dev] subject batching performance improvement, Tom Barton, 11/27/2011
- RE: [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/27/2011
- Re: [grouper-dev] subject batching performance improvement, Tom Zeller, 11/27/2011
- RE: [grouper-dev] subject batching performance improvement, Jim Fox, 11/28/2011
- RE: [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/28/2011
- RE: [grouper-dev] subject batching performance improvement, Chris Hyzer, 11/27/2011
- RE: [grouper-dev] subject batching performance improvement, Thomas Barton, 11/28/2011
- RE: [grouper-dev] subject batching performance improvement, Thomas Barton, 11/28/2011
- Re: [grouper-dev] subject batching performance improvement, Tom Barton, 11/27/2011