
grouper-dev - RE: [grouper-dev] speedup by caching in the ldap source adapter ?

Subject: Grouper Developers Forum

  • From: Chris Hyzer
  • To: Tom Zeller, "Michael R. Gettes"
  • Cc: Grouper Dev
  • Subject: RE: [grouper-dev] speedup by caching in the ldap source adapter ?
  • Date: Fri, 18 May 2012 18:11:22 +0000

FYI, I ran some performance numbers on translating pennids to pennkeys in
ldap. I didn't do a uid=* since I don't want to get yelled at if something
bad happened (600k total). Instead I tried it one by one and batched by
100. One by one with nothing in the cache, 20k queries took about 2 minutes.
One by one after running it once (primed the cache?) took about 30 seconds.
Querying 100 at a time took about 12 seconds and doesn't need a server-side
cache (each filter probably contains a different 100). I don't know what the
optimum batch size is: 200 was slower, and 50 was faster (10 seconds total
instead of 12).

Thanks,
Chris

###############
## 1 at a time
###############

Retrieved pennids: 5598ms
Shuffled pennids: 5ms
Retrieved from ldap (1000): 7216ms
Retrieved from ldap (2000): 12040ms
Retrieved from ldap (3000): 16976ms
Retrieved from ldap (4000): 23398ms
Retrieved from ldap (5000): 32562ms
Retrieved from ldap (6000): 39335ms
Retrieved from ldap (7000): 47543ms
Retrieved from ldap (8000): 53950ms
Retrieved from ldap (9000): 57965ms
Retrieved from ldap (10000): 69532ms
Retrieved from ldap (11000): 77387ms
Retrieved from ldap (12000): 86075ms
Retrieved from ldap (13000): 89965ms
Retrieved from ldap (14000): 91959ms
Retrieved from ldap (15000): 99919ms
Retrieved from ldap (16000): 101779ms
Retrieved from ldap (17000): 103607ms
Retrieved from ldap (18000): 106131ms
Retrieved from ldap (19000): 108994ms
Retrieved from ldap (20000): 110865ms
Retrieved from ldap all: 111045ms
Found 18854 pennkeys

############
## 1 at a time, 2nd run (cached on ldap server?)
############

Retrieved pennids: 4224ms
Shuffled pennids: 3ms
Retrieved from ldap (1000): 1490ms
Retrieved from ldap (2000): 3152ms
Retrieved from ldap (3000): 4781ms
Retrieved from ldap (4000): 6386ms
Retrieved from ldap (5000): 8060ms
Retrieved from ldap (6000): 9586ms
Retrieved from ldap (7000): 11118ms
Retrieved from ldap (8000): 12646ms
Retrieved from ldap (9000): 14170ms
Retrieved from ldap (10000): 15727ms
Retrieved from ldap (11000): 17263ms
Retrieved from ldap (12000): 18796ms
Retrieved from ldap (13000): 20313ms
Retrieved from ldap (14000): 21871ms
Retrieved from ldap (15000): 23501ms
Retrieved from ldap (16000): 25018ms
Retrieved from ldap (17000): 26583ms
Retrieved from ldap (18000): 28125ms
Retrieved from ldap (19000): 29628ms
Retrieved from ldap (20000): 31153ms
Retrieved from ldap all: 31305ms
Found 18854 pennkeys

#############
## 100 at a time
#############

Retrieved pennids: 4402ms
Shuffled pennids: 3ms
Retrieved from ldap (1000): 690ms
Retrieved from ldap (2000): 1354ms
Retrieved from ldap (3000): 1995ms
Retrieved from ldap (4000): 2629ms
Retrieved from ldap (5000): 3268ms
Retrieved from ldap (6000): 3918ms
Retrieved from ldap (7000): 4565ms
Retrieved from ldap (8000): 5190ms
Retrieved from ldap (9000): 5808ms
Retrieved from ldap (10000): 6439ms
Retrieved from ldap (11000): 7068ms
Retrieved from ldap (12000): 7697ms
Retrieved from ldap (13000): 8318ms
Retrieved from ldap (14000): 8948ms
Retrieved from ldap (15000): 9591ms
Retrieved from ldap (16000): 10231ms
Retrieved from ldap (17000): 10859ms
Retrieved from ldap (18000): 11486ms
Retrieved from ldap (19000): 12112ms
Retrieved from ldap (20000): 12738ms
Retrieved from ldap all: 12738ms
Found 18854 pennkeys

###############
## 100 at a time, second try (it's shuffled so server caching doesn't really help)
###############

Retrieved pennids: 4477ms
Shuffled pennids: 3ms
Retrieved from ldap (1000): 712ms
Retrieved from ldap (2000): 1370ms
Retrieved from ldap (3000): 2036ms
Retrieved from ldap (4000): 2675ms
Retrieved from ldap (5000): 3300ms
Retrieved from ldap (6000): 3933ms
Retrieved from ldap (7000): 4557ms
Retrieved from ldap (8000): 5183ms
Retrieved from ldap (9000): 5819ms
Retrieved from ldap (10000): 6451ms
Retrieved from ldap (11000): 7067ms
Retrieved from ldap (12000): 7715ms
Retrieved from ldap (13000): 8351ms
Retrieved from ldap (14000): 8976ms
Retrieved from ldap (15000): 9609ms
Retrieved from ldap (16000): 10243ms
Retrieved from ldap (17000): 10870ms
Retrieved from ldap (18000): 11501ms
Retrieved from ldap (19000): 12131ms
Retrieved from ldap (20000): 12756ms
Retrieved from ldap all: 12756ms
Found 18854 pennkeys
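
To put the totals above on a per-lookup basis, here is a small sketch that divides each "Retrieved from ldap all" time by the number of timed pennids. The class and method names are made up for illustration, and the count of 20,000 timed ids is an assumption read off the logs (20,100 ids fetched, the first 100 used as untimed warm-up).

```java
/** Hypothetical helper, not part of the test class: summarizes the logged totals. */
final class LdapTimingSummary {

  private LdapTimingSummary() {
  }

  /** Average milliseconds spent per pennid. */
  static double msPerId(long totalMs, int ids) {
    return (double) totalMs / ids;
  }

  /** How many times faster "otherMs" is than "baselineMs". */
  static double speedup(long baselineMs, long otherMs) {
    return (double) baselineMs / otherMs;
  }

  public static void main(String[] args) {
    int timedIds = 20000;            // assumption: 20,100 ids minus 100 warm-up
    long coldOneByOne = 111045L;     // "1 at a time"
    long warmOneByOne = 31305L;      // "1 at a time, 2nd run"
    long batchedBy100 = 12738L;      // "100 at a time"

    System.out.printf("cold 1 by 1:  %.2f ms per id%n", msPerId(coldOneByOne, timedIds));
    System.out.printf("warm 1 by 1:  %.2f ms per id%n", msPerId(warmOneByOne, timedIds));
    System.out.printf("100 per call: %.2f ms per id%n", msPerId(batchedBy100, timedIds));
    System.out.printf("batched vs cold: %.1fx, batched vs warm: %.1fx%n",
        speedup(coldOneByOne, batchedBy100), speedup(warmOneByOne, batchedBy100));
  }
}
```

That works out to roughly 5.6 ms, 1.6 ms and 0.64 ms per pennid, so batching by 100 is about 8.7 times faster than the cold one-by-one run and about 2.5 times faster than the warm one.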


At the risk of someone telling me it's not a good test, here is the source:




/**
 * @author mchyzer
 * $Id$
 */
package edu.internet2.middleware.grouper.ldap;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import edu.internet2.middleware.grouper.hibernate.HibernateSession;
import edu.internet2.middleware.grouper.misc.GrouperStartup;
import edu.internet2.middleware.grouper.util.GrouperUtil;


/**
 * Proof of concept: time pennid to pennkey lookups in ldap,
 * one search per pennid versus 100 pennids per search filter.
 */
public class LdapPennPoc {

  /**
   * Look up every pennid with its own ldap search.
   * The first 100 lookups are an untimed warm-up.
   * @param pennids
   */
  public static void runAllOneByOne(List<String> pennids) {

    Set<String> pennkeysSet = new HashSet<String>();

    //try 100 without timing
    for (int i = 0; i < 100; i++) {
      List<String> pennkeys = retrieveOnePennkey(pennids, i);

      pennkeysSet.addAll(GrouperUtil.nonNull(pennkeys));

      // for (int j=0;j<pennkeys.size();j++) {
      //
      //   System.out.println("Pennkey: " + pennkeys.get(j));
      // }
    }

    long start = System.nanoTime();
    for (int i = 100; i < pennids.size(); i++) {
      List<String> pennkeys = retrieveOnePennkey(pennids, i);

      pennkeysSet.addAll(GrouperUtil.nonNull(pennkeys));

      //print the elapsed time every 1000 lookups
      if (i % 1000 == 0) {
        System.out.println("Retrieved from ldap (" + i + "): "
            + ((System.nanoTime() - start) / 1000000) + "ms");
      }
    }

    System.out.println("Retrieved from ldap all: "
        + ((System.nanoTime() - start) / 1000000) + "ms");
    System.out.println("Found " + pennkeysSet.size() + " pennkeys");
  }


  /**
   * Search ldap for the pennname of the single pennid at index i.
   * @param pennids
   * @param i
   * @return the list
   */
  private static List<String> retrieveOnePennkey(List<String> pennids, int i) {
    return LdapSession.list(String.class, "pennProd", "ou=pennnames",
        LdapSearchScope.ONELEVEL_SCOPE, "(pennid=" + pennids.get(i) + ")",
        "pennname");
  }

  /**
   * Search ldap for the pennnames of one batch of 100 pennids with a single
   * OR filter. Only full batches are queried: if fewer than 100 pennids
   * remain at this batch, an empty list is returned.
   * @param pennids
   * @param startIndex batch number (batch n covers indexes n*100 to n*100+99)
   * @return the list
   */
  private static List<String> retrieve100Pennkeys(List<String> pennids, int startIndex) {

    StringBuilder filter = new StringBuilder(" (| ");

    //(...K1...) (...K2...) (...K3...) (...K4...))");
    boolean foundOne = false;
    for (int i = startIndex * 100;
        (i < startIndex * 100 + 100) && (startIndex * 100 + 99 < pennids.size());
        i++) {

      filter.append("(pennid=").append(pennids.get(i)).append(") ");

      foundOne = true;

    }
    if (!foundOne) {
      return new ArrayList<String>();
    }

    filter.append(" )");

    //System.out.println(filter);

    return LdapSession.list(String.class, "pennProd", "ou=pennnames",
        LdapSearchScope.ONELEVEL_SCOPE, filter.toString(), "pennname");
  }



  /**
   * Look up the pennids 100 per ldap search.
   * The first batch is an untimed warm-up.
   * @param pennids
   */
  public static void runAll100atTime(List<String> pennids) {

    Set<String> pennkeysSet = new HashSet<String>();

    List<String> pennkeys = retrieve100Pennkeys(pennids, 0);

    pennkeysSet.addAll(GrouperUtil.nonNull(pennkeys));

    long start = System.nanoTime();
    for (int i = 1; i < pennids.size() / 100; i++) {
      pennkeys = retrieve100Pennkeys(pennids, i);
      //print the elapsed time every 10 batches (1000 pennids)
      if (i % 10 == 0) {
        System.out.println("Retrieved from ldap (" + (i * 100) + "): "
            + ((System.nanoTime() - start) / 1000000) + "ms");
      }
      pennkeysSet.addAll(GrouperUtil.nonNull(pennkeys));

    }

    System.out.println("Retrieved from ldap all: "
        + ((System.nanoTime() - start) / 1000000) + "ms");
    System.out.println("Found " + pennkeysSet.size() + " pennkeys");

  }


  /**
   * Load up to 20100 employee pennids from the Grouper database, shuffle
   * them, then run one of the two timing tests.
   * @param args
   */
  public static void main(String[] args) {
    GrouperStartup.runDdlBootstrap = false;
    GrouperStartup.ignoreCheckConfig = true;

    long start = System.nanoTime();

    //get all pennid
    List<String> pennids = HibernateSession.bySqlStatic().listSelect(String.class,
        "select distinct(GMLV.SUBJECT_ID) from grouper_memberships_lw_v gmlv "
        + " where GMLV.SUBJECT_SOURCE = 'pennperson' and gmlv.list_name = 'members' "
        + " and GMLV.GROUP_NAME = 'penn:community:employee' and rownum <= 20100 ", null);

    System.out.println("Retrieved pennids: "
        + ((System.nanoTime() - start) / 1000000) + "ms");

    start = System.nanoTime();

    //shuffle so consecutive lookups do not hit neighboring entries
    Collections.shuffle(pennids);

    System.out.println("Shuffled pennids: "
        + ((System.nanoTime() - start) / 1000000) + "ms");

    //runAllOneByOne(pennids);

    runAll100atTime(pennids);

  }

}
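
Since the batch size seemed to matter (50 was a bit faster than 100, 200 was slower), here is a rough sketch of the same OR-filter batching with the batch size as a parameter. It is not Grouper code: the class and method names are hypothetical, and it only builds the filter strings. Unlike the test above, it also emits a shorter final batch and escapes the special characters listed in RFC 4515, which doesn't matter for numeric pennids but would for arbitrary search values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Grouper: builds batched ldap OR filters. */
final class LdapBatchFilters {

  private LdapBatchFilters() {
  }

  /** Escape the characters RFC 4515 says must be escaped in a filter value. */
  static String escape(String value) {
    StringBuilder out = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '\\': out.append("\\5c"); break;
        case '*':  out.append("\\2a"); break;
        case '(':  out.append("\\28"); break;
        case ')':  out.append("\\29"); break;
        case '\0': out.append("\\00"); break;
        default:   out.append(c);
      }
    }
    return out.toString();
  }

  /**
   * Build one filter per batch, e.g. (|(pennid=1)(pennid=2)).
   * The last batch may hold fewer than batchSize values.
   */
  static List<String> orFilters(String attribute, List<String> values, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be at least 1");
    }
    List<String> filters = new ArrayList<String>();
    for (int start = 0; start < values.size(); start += batchSize) {
      int end = Math.min(start + batchSize, values.size());
      StringBuilder filter = new StringBuilder("(|");
      for (String value : values.subList(start, end)) {
        filter.append('(').append(attribute).append('=')
            .append(escape(value)).append(')');
      }
      filters.add(filter.append(')').toString());
    }
    return filters;
  }
}
```

Each returned string could be passed as the filter argument of the LdapSession.list(...) call used in the test, which makes it easy to time batch sizes of 25, 50, 100 and so on against the same shuffled list.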



-----Original Message-----
From: Tom Zeller
Sent: Friday, May 18, 2012 1:14 PM
To: Michael R. Gettes
Cc: Grouper Dev
Subject: Re: [grouper-dev] speedup by caching in the ldap source adapter ?

Actually, your half-way approach is what I was proposing : an
optional search for all subjects performed once at startup to
prime/warm the subject cache. Here's a potential configuration snippet
(sources.xml, with xml tags removed) :

searchSubject
filter : (&amp;(uid=%TERM%)(objectclass=person))
scope : subtree_scope
base : ou=people,dc=example,dc=edu

<!-- Set to true to cache subjects once at startup. -->
cache : true
<!-- The initial capacity should be slightly greater than the number
of subjects. -->
cacheInitialCapacity : 250000
<!-- The value used to replace %TERM% in the search filter. -->
cacheSearchValue : *
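
To make the three cache settings above concrete, here is a minimal sketch of how they might drive a warmed cache. It is not the source adapter's actual code: the class, the SubjectSearch interface and the use of plain strings for subjects are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a subject cache that is optionally primed at startup. */
final class WarmedSubjectCache {

  /** Stand-in for whatever runs the configured ldap search (assumed, not a Grouper API). */
  interface SubjectSearch {
    /** Returns subject id mapped to subject data for every entry matching the term. */
    Map<String, String> search(String term);
  }

  private final Map<String, String> cache;
  private final SubjectSearch search;

  /**
   * @param warm plays the role of "cache"
   * @param initialCapacity plays the role of "cacheInitialCapacity"
   * @param warmTerm plays the role of "cacheSearchValue", e.g. "*"
   */
  WarmedSubjectCache(SubjectSearch search, boolean warm, int initialCapacity, String warmTerm) {
    this.search = search;
    this.cache = new ConcurrentHashMap<String, String>(initialCapacity);
    if (warm) {
      // one wide search at startup instead of one search per subject later
      cache.putAll(search.search(warmTerm));
    }
  }

  /** Return the cached subject, falling back to a single search on a miss. */
  String lookup(String id) {
    String cached = cache.get(id);
    if (cached != null) {
      return cached;
    }
    String found = search.search(id).get(id);
    if (found != null) {
      cache.put(id, found);
    }
    return found;
  }

  int size() {
    return cache.size();
  }
}
```

With warm set to false this behaves like an ordinary lazy cache, which is what makes the priming search optional for sites whose ldap servers cannot tolerate a uid=* query.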

For what it's worth, I spent some time looking at the OpenIDM product,
and it too relies on _query_all_ids for provisioning.

TomZ

On Thu, May 17, 2012 at 7:15 PM, Michael R. Gettes wrote:
> Hi Tom,
>
> I am very concerned about this - allowing to do uid=* is killer on the
> cache for the LDAP server.  It's a really nasty thing to do.  Bottom-line,
> it's not just a caching problem for grouper or a provisioning system, it's
> a caching problem for the components you're querying, like LDAP.  I should
> also note, as you know, there are many parameters on LDAP servers to govern
> limits on searching and such and SOME ldap servers offer limits on a DN
> basis while others do not.  Maybe a "half-way" approach would be to offer
> an optional "priming search" at start-up where those who can do as you are
> suggesting would be able to do so.
>
> I have 389 and because of my situation at CMU I have to search every member
> of every group before making any modifications to the group.  A group with
> 22K members takes about 8 minutes to process after the first pass on the
> group when the set of member DNs gets loaded into the LDAP cache.  The
> first time through can take 2-3 times longer.  Additionally, I have the
> problem that uid=* would return a great deal more than the 22K needed for
> my largest group, I have about 400K objects and about 50% have uid
> attributes.  We are not configured to pull the full 400K objects into cache
> for LDAP.
>
> I hope this helps.
>
> /mrg
>
> On May 17, 2012, at 19:43, Tom Zeller wrote:
>
>> When provisioning an ldap target, it seems that querying for members
>> takes the most time, especially when provisioning the "everyone"
>> group. Even if the source resolver has a large cache, at least one
>> ldap search is performed for every grouper membership.
>>
>> To avoid the performance penalty and scaling issues with
>> one-ldap-search-per-membership, I added a simple cache to the ldap
>> source adapter, and this cache is warmed by slurping the target ldap
>> directory at startup.
>>
>> In other words, rather than n "(uid=...)" searches, just do one
>> "(uid=*)" search.
>>
>> I hesitate to give numbers, but on my local box, provisioning a group
>> with 100k members takes approximately 25 minutes (not hours !) without
>> the warmed-up cache. With the warmed-up cache, provisioning a group
>> with 100k members takes more like 5 minutes.
>>
>> Another option is to make sure that all subject attributes needed for
>> provisioning are written to the grouper_members table, and avoid ldap
>> member searches entirely.
>>
>> What do you think ?
>>
>> TomZ
>


