grouper-users - Re: [grouper-users] Re: Did not get all the way through the batch
Subject: Grouper Users - Open Discussion List
List archive
- From: Jeffrey Crawford <>
- To: David Langenberg <>
- Cc: Gouper Users List <>
- Subject: Re: [grouper-users] Re: Did not get all the way through the batch
- Date: Thu, 12 Feb 2015 12:12:51 -0800
So yes can stand down on this one with the idle timeout on the LDAP server this completes now, sorry for the false alarm.
Jeffrey
---------------------------------------
On Thu, Feb 12, 2015 at 11:50 AM, Jeffrey Crawford <> wrote:
I was just looking at this with some other admins, We are using OpenDJ however I think this issue is more related to the Load Balancer dropping connections silently, I didn't notice before that grouper holds persistent connections to LDAP, after I discovered that things started to make more sense I'm going to try again with the LDAP servers idle timeout set to two minutes.Stay TunedJeffrey---------------------------------------On Thu, Feb 12, 2015 at 9:51 AM, David Langenberg <> wrote:What's your LDAP server?Dave--On Fri, Feb 6, 2015 at 11:26 AM, Jeffrey Crawford <> wrote:Okay a little bit of a follow up, It looks like the bulkSync is not hanging so much as it's stalling pretty bad. I've stopped using the scheduled sync in the loader and tried bulkSync. It seems to process the first group pretty quickly, but then after that when it's looking up records, it takes a 15 minute pause between lookupRequest and lookupResponse, I'm going to try a snip the relevant information from the logs but I can provide more if something is missing:
-----------------------------
2015-02-05 23:51:30,767: [main] INFO Psp.execute(1417) - - Psp 'psp' - Sync SyncRequest[id=G000104134,requestID=2015/02/05-23:51:3
0.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-05 23:51:30,767: [main] INFO Psp.execute(1152) - - Psp 'psp' - Diff DiffRequest[id=G000104134,requestID=2015/02/05-23:51:30.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-05 23:51:30,767: [main] INFO Psp.execute(980) - - Psp 'psp' - Calc CalcRequest[id=G000104134,requestID=2015/02/05-23:51:30.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-05 23:51:30,778: [main] INFO Psp.execute(1009) - - Psp 'psp' - Calc CalcResponse[id=G000104134,status=success,requestID=2015/02/05-23:51:30.767,pso=PSO[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]
2015-02-05 23:51:30,779: [main] INFO BaseSpmlProvider.execute(285) - - Target 'psp' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/05-23:51:30.779]
2015-02-05 23:51:30,779: [main] INFO BaseSpmlProvider.execute(285) - - Target 'ldap1' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/05-23:51:30.779]<~15 min pause>
2015-02-06 00:06:56,408: [main] INFO BaseSpmlProvider.execute(314) - - Target 'ldap1' - Lookup LookupResponse[pso=PSO[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]],status=success,requestID=2015/02/05-23:51:30.779]
2015-02-06 00:06:56,411: [main] INFO Psp.execute(1446) - - Psp 'psp' - Sync SyncResponse[id=G000104134,status=success,requestID=20
15/02/05-23:51:30.767,SynchronizedResponse[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID
=<null>]]]
2015-02-06 00:06:56,411: [main] INFO Psp.execute(1417) - - Psp 'psp' - Sync SyncRequest[id=ID00002,requestID=2015/02/06-00:06:5
6.411,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-06 00:06:56,411: [main] INFO Psp.execute(1152) - - Psp 'psp' - Diff DiffRequest[id=ID00002,requestID=2015/02/06-00:06:5
6.411,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-06 00:06:56,412: [main] INFO Psp.execute(980) - - Psp 'psp' - Calc CalcRequest[id=ID00002,requestID=2015/02/06-00:06:56
.412,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]
2015-02-06 00:06:56,425: [main] INFO Psp.execute(1009) - - Psp 'psp' - Calc CalcResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.412,pso=PSO[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]
2015-02-06 00:06:56,425: [main] INFO BaseSpmlProvider.execute(285) - - Target 'psp' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/06-00:06:56.425]
2015-02-06 00:06:56,425: [main] INFO BaseSpmlProvider.execute(285) - - Target 'ldap1' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/06-00:06:56.425]<~15 min pause>
2015-02-06 00:22:22,055: [main] INFO BaseSpmlProvider.execute(314) - - Target 'psp' - Lookup LookupResponse[pso=PSO[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]],status=success,requestID=2015/02/06-00:06:56.425]
2015-02-06 00:22:22,056: [main] INFO Psp.execute(1181) - - Psp 'psp' - Diff DiffResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.411,synchronized=SynchronizedResponse[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]
2015-02-06 00:22:22,056: [main] INFO Psp.execute(1446) - - Psp 'psp' - Sync SyncResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.411,SynchronizedResponse[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]-----------------------------
-----------------------------The corresponding ldap queries are responding quickly which can be seen from "etime" but it seems to be having long gaps between each query, the ldap logs for the above look like:
-----------------------------
-----------------------------
[05/Feb/2015:23:36:05 -0800] CONNECT conn=5775882 from=xxx.xxx.xxx.xxx:42305 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[05/Feb/2015:23:36:05 -0800] BIND REQ conn=5775882 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[05/Feb/2015:23:36:05 -0800] BIND RES conn=5775882 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=1
[05/Feb/2015:23:36:05 -0800] SEARCH REQ conn=5775882 op=1 msgID=2 base="uid=goofy,ou=people,dc=ucsc,dc=edu" scope=baseObject filter
="(objectclass=*)" attrs="objectclass,objectClass,ou,description,cn,member"
[05/Feb/2015:23:36:05 -0800] SEARCH RES conn=5775882 op=1 msgID=2 result=0 nentries=1 etime=0
[05/Feb/2015:23:36:05 -0800] CONNECT conn=5775883 from=xxx.xxx.xxx.xxx:42306 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[05/Feb/2015:23:36:05 -0800] BIND REQ conn=5775883 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[05/Feb/2015:23:36:05 -0800] BIND RES conn=5775883 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=0
[05/Feb/2015:23:36:05 -0800] SEARCH REQ conn=5775883 op=1 msgID=2 base="ou=people,dc=ucsc,dc=edu" scope=wholeSubtree filter="(ucscPe
rsonID=ID00001)" attrs="entryDN"
[05/Feb/2015:23:36:05 -0800] SEARCH RES conn=5775883 op=1 msgID=2 result=0 nentries=1 etime=0
[05/Feb/2015:23:36:05 -0800] UNBIND REQ conn=5775883 op=2 msgID=3<no activity>
[06/Feb/2015:00:06:56 -0800] CONNECT conn=5777352 from=xxx.xxx.xxx.xxx:42373 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[06/Feb/2015:00:06:56 -0800] BIND REQ conn=5777352 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[06/Feb/2015:00:06:56 -0800] BIND RES conn=5777352 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=1
[06/Feb/2015:00:06:56 -0800] SEARCH REQ conn=5777352 op=1 msgID=2 base="uid=dduck,ou=people,dc=ucsc,dc=edu" scope=baseObject filt
er="(objectclass=*)" attrs="objectclass,objectClass,ou,description,cn,member"
[06/Feb/2015:00:06:56 -0800] SEARCH RES conn=5777352 op=1 msgID=2 result=0 nentries=1 etime=0
[06/Feb/2015:00:06:56 -0800] CONNECT conn=5777353 from=xxx.xxx.xxx.xxx:42374 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[06/Feb/2015:00:06:56 -0800] BIND REQ conn=5777353 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[06/Feb/2015:00:06:56 -0800] BIND RES conn=5777353 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=0
[06/Feb/2015:00:06:56 -0800] SEARCH REQ conn=5777353 op=1 msgID=2 base="ou=people,dc=ucsc,dc=edu" scope=wholeSubtree filter="(ucscPe
rsonID=ID00002)" attrs="entryDN"
[06/Feb/2015:00:06:56 -0800] SEARCH RES conn=5777353 op=1 msgID=2 result=0 nentries=1 etime=0
[06/Feb/2015:00:06:56 -0800] UNBIND REQ conn=5777353 op=2 msgID=3
[06/Feb/2015:00:06:56 -0800] DISCONNECT conn=5777353 reason="Client Unbind"
------------------------------------
------------------------------------From what I can tell when the LookupRequest triggers, it is performing an LDAP query right away. I tested using a direct query on the command line to make sure it was returning promptly, so I can confirm it returns pretty much instantly. The psp however doesn't continue until 15 min later. I've been using using jvisualvm with the cpu profiler turned on. Only the top three methods that have any significant time profiled are:com.sun.jndi.ldap.Connection.run () = 13,884,632 mssun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run () = 4,833,007 msjava.util.concurrent.ThreadPoolExecutor$Worker.run () = 420,001 msI'm not sure if the jndi is high because of pooling or if something is getting stuck there. We are using java 1.7.0_71 which should have the JNDI deadlock problem fix.Sorry for the long winded email, let me know how I can help debug the bulkSyncThanksJeffrey C.On Fri, Jan 30, 2015 at 1:57 PM, Jeffrey Crawford <> wrote:any ideas anyone?JeffreyBoth pilots and IT professionals require training and currency before charging into clouds!---------------------------------------On Wed, Jan 28, 2015 at 1:39 PM, Jeffrey Crawford <> wrote:This error shows up when we schedule the changeLog.psp.fullSync.quartzCron to run every Tuesday evening. It starts to run and then it looks like there is some sort of conflict between the changelog provisioner and the bulk sync process. Am I doing something incorrectly?If I restart the loader it goes through, the following is the full message:
------------------------------------------------------
2015-01-28 13:37:00,129: [DefaultQuartzScheduler_Worker-2] INFO PspChangeLogConsumer.processChangeLogEntries(481) - - PSP Consumer 'psp' - Full sync is running, returning sequence number '548601'
2015-01-28 13:37:00,187: [DefaultQuartzScheduler_Worker-2] ERROR ChangeLogHelper.processRecords(281) - - Did not get all the way through the batch! 548601 != 548701
2015-01-28 13:37:00,187: [DefaultQuartzScheduler_Worker-2] ERROR GrouperLoaderJob.runJob(433) - - Error on job: CHANGE_LOG_consumer_psp
java.lang.RuntimeException: Error in loader job: null, check logs: Did not get all the way through the batch! 548601 != 548701
at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.runJob(GrouperLoaderJob.java:422)
at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.execute(GrouperLoaderJob.java:315)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
2015-01-28 13:37:00,200: [DefaultQuartzScheduler_Worker-2] ERROR GrouperLoaderJob.execute(318) - - Error running up job
java.lang.RuntimeException: Error in loader job: null, check logs: Did not get all the way through the batch! 548601 != 548701
at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.runJob(GrouperLoaderJob.java:422)
at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.execute(GrouperLoaderJob.java:315)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
2015-01-28 13:37:02,037: [DefaultQuartzScheduler_Worker-10] INFO EventLog.info(156) - - [b8ed644e26334776af4d6bc5f4ab9f2b,'GrouperSystem','application'] session: start (19ms)JeffreyBoth pilots and IT professionals require training and currency before charging into clouds!---------------------------------------David LangenbergIdentity & Access ManagementThe University of Chicago
- [grouper-users] Re: Did not get all the way through the batch, Jeffrey Crawford, 02/06/2015
- Re: [grouper-users] Re: Did not get all the way through the batch, David Langenberg, 02/12/2015
- Re: [grouper-users] Re: Did not get all the way through the batch, Jeffrey Crawford, 02/12/2015
- Re: [grouper-users] Re: Did not get all the way through the batch, Jeffrey Crawford, 02/12/2015
- Re: [grouper-users] Re: Did not get all the way through the batch, Jeffrey Crawford, 02/12/2015
- Re: [grouper-users] Re: Did not get all the way through the batch, David Langenberg, 02/12/2015
Archive powered by MHonArc 2.6.16.