Skip to Content.
Sympa Menu

grouper-users - Re: [grouper-users] Re: Did not get all the way through the batch

Subject: Grouper Users - Open Discussion List

List archive

Re: [grouper-users] Re: Did not get all the way through the batch


Chronological Thread 
  • From: Jeffrey Crawford <>
  • To: David Langenberg <>
  • Cc: Gouper Users List <>
  • Subject: Re: [grouper-users] Re: Did not get all the way through the batch
  • Date: Thu, 12 Feb 2015 11:50:18 -0800

I was just looking at this with some other admins, We are using OpenDJ however I think this issue is more related to the Load Balancer dropping connections silently, I didn't notice before that grouper holds persistent connections to LDAP, after I discovered that things started to make more sense I'm going to try again with the LDAP servers idle timeout set to two minutes.

Stay Tuned

Jeffrey
---------------------------------------

On Thu, Feb 12, 2015 at 9:51 AM, David Langenberg <> wrote:
What's your LDAP server?  

Dave

On Fri, Feb 6, 2015 at 11:26 AM, Jeffrey Crawford <> wrote:
Okay a little bit of a follow up, It looks like the bulkSync is not hanging so much as it's stalling pretty bad. I've stopped using the scheduled sync in the loader and tried bulkSync. It seems to process the first group pretty quickly, but then after that when it's looking up records, it takes a 15 minute pause between lookupRequest and lookupResponse, I'm going to try a snip the relevant information from the logs but I can provide more if something is missing:

-----------------------------
2015-02-05 23:51:30,767: [main] INFO  Psp.execute(1417) -  - Psp 'psp' - Sync SyncRequest[id=G000104134,requestID=2015/02/05-23:51:3
0.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-05 23:51:30,767: [main] INFO  Psp.execute(1152) -  - Psp 'psp' - Diff DiffRequest[id=G000104134,requestID=2015/02/05-23:51:30.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-05 23:51:30,767: [main] INFO  Psp.execute(980) -  - Psp 'psp' - Calc CalcRequest[id=G000104134,requestID=2015/02/05-23:51:30.767,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-05 23:51:30,778: [main] INFO  Psp.execute(1009) -  - Psp 'psp' - Calc CalcResponse[id=G000104134,status=success,requestID=2015/02/05-23:51:30.767,pso=PSO[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]

2015-02-05 23:51:30,779: [main] INFO  BaseSpmlProvider.execute(285) -  - Target 'psp' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/05-23:51:30.779]

2015-02-05 23:51:30,779: [main] INFO  BaseSpmlProvider.execute(285) -  - Target 'ldap1' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/05-23:51:30.779]

<~15 min pause>

2015-02-06 00:06:56,408: [main] INFO  BaseSpmlProvider.execute(314) -  - Target 'ldap1' - Lookup LookupResponse[pso=PSO[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]],status=success,requestID=2015/02/05-23:51:30.779]

2015-02-06 00:06:56,411: [main] INFO  Psp.execute(1446) -  - Psp 'psp' - Sync SyncResponse[id=G000104134,status=success,requestID=20
15/02/05-23:51:30.767,SynchronizedResponse[psoID=PSOIdentifier[id='uid=dduck,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID
=<null>]]]

2015-02-06 00:06:56,411: [main] INFO  Psp.execute(1417) -  - Psp 'psp' - Sync SyncRequest[id=ID00002,requestID=2015/02/06-00:06:5
6.411,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-06 00:06:56,411: [main] INFO  Psp.execute(1152) -  - Psp 'psp' - Diff DiffRequest[id=ID00002,requestID=2015/02/06-00:06:5
6.411,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-06 00:06:56,412: [main] INFO  Psp.execute(980) -  - Psp 'psp' - Calc CalcRequest[id=ID00002,requestID=2015/02/06-00:06:56
.412,returnData=everything,schemaEntityRef=SchemaEntityRef[targetID=ldap1,entityName=member1,isContainer=false]]

2015-02-06 00:06:56,425: [main] INFO  Psp.execute(1009) -  - Psp 'psp' - Calc CalcResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.412,pso=PSO[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]

2015-02-06 00:06:56,425: [main] INFO  BaseSpmlProvider.execute(285) -  - Target 'psp' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/06-00:06:56.425]

2015-02-06 00:06:56,425: [main] INFO  BaseSpmlProvider.execute(285) -  - Target 'ldap1' - Lookup LookupRequest[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>],returnData=everything,requestID=2015/02/06-00:06:56.425]

<~15 min pause>

2015-02-06 00:22:22,055: [main] INFO  BaseSpmlProvider.execute(314) -  - Target 'psp' - Lookup LookupResponse[pso=PSO[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]],status=success,requestID=2015/02/06-00:06:56.425]

2015-02-06 00:22:22,056: [main] INFO  Psp.execute(1181) -  - Psp 'psp' - Diff DiffResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.411,synchronized=SynchronizedResponse[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]

2015-02-06 00:22:22,056: [main] INFO  Psp.execute(1446) -  - Psp 'psp' - Sync SyncResponse[id=ID00002,status=success,requestID=2015/02/06-00:06:56.411,SynchronizedResponse[psoID=PSOIdentifier[id='uid=mmouse,ou=people,dc=ucsc,dc=edu',targetID=ldap1,containerID=<null>]]]
-----------------------------
-----------------------------

The corresponding ldap queries are responding quickly which can be seen from "etime" but it seems to be having long gaps between each query, the ldap logs for the above look like:

-----------------------------
-----------------------------
[05/Feb/2015:23:36:05 -0800] CONNECT conn=5775882 from=xxx.xxx.xxx.xxx:42305 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[05/Feb/2015:23:36:05 -0800] BIND REQ conn=5775882 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[05/Feb/2015:23:36:05 -0800] BIND RES conn=5775882 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=1
[05/Feb/2015:23:36:05 -0800] SEARCH REQ conn=5775882 op=1 msgID=2 base="uid=goofy,ou=people,dc=ucsc,dc=edu" scope=baseObject filter
="(objectclass=*)" attrs="objectclass,objectClass,ou,description,cn,member"
[05/Feb/2015:23:36:05 -0800] SEARCH RES conn=5775882 op=1 msgID=2 result=0 nentries=1 etime=0
[05/Feb/2015:23:36:05 -0800] CONNECT conn=5775883 from=xxx.xxx.xxx.xxx:42306 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[05/Feb/2015:23:36:05 -0800] BIND REQ conn=5775883 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[05/Feb/2015:23:36:05 -0800] BIND RES conn=5775883 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=0
[05/Feb/2015:23:36:05 -0800] SEARCH REQ conn=5775883 op=1 msgID=2 base="ou=people,dc=ucsc,dc=edu" scope=wholeSubtree filter="(ucscPe
rsonID=ID00001)" attrs="entryDN"
[05/Feb/2015:23:36:05 -0800] SEARCH RES conn=5775883 op=1 msgID=2 result=0 nentries=1 etime=0
[05/Feb/2015:23:36:05 -0800] UNBIND REQ conn=5775883 op=2 msgID=3

<no activity>

[06/Feb/2015:00:06:56 -0800] CONNECT conn=5777352 from=xxx.xxx.xxx.xxx:42373 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[06/Feb/2015:00:06:56 -0800] BIND REQ conn=5777352 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[06/Feb/2015:00:06:56 -0800] BIND RES conn=5777352 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=1
[06/Feb/2015:00:06:56 -0800] SEARCH REQ conn=5777352 op=1 msgID=2 base="uid=dduck,ou=people,dc=ucsc,dc=edu" scope=baseObject filt
er="(objectclass=*)" attrs="objectclass,objectClass,ou,description,cn,member"
[06/Feb/2015:00:06:56 -0800] SEARCH RES conn=5777352 op=1 msgID=2 result=0 nentries=1 etime=0
[06/Feb/2015:00:06:56 -0800] CONNECT conn=5777353 from=xxx.xxx.xxx.xxx:42374 to=xxx.xxx.xxx.xxx:389 protocol=LDAP
[06/Feb/2015:00:06:56 -0800] BIND REQ conn=5777353 op=0 msgID=1 version=3 type=SIMPLE dn="cn=xxxxxxxxx"
[06/Feb/2015:00:06:56 -0800] BIND RES conn=5777353 op=0 msgID=1 result=0 authDN="cn=xxxxxxxxxxxx,cn=xxxx DNs,cn=xxxxx" etime=0
[06/Feb/2015:00:06:56 -0800] SEARCH REQ conn=5777353 op=1 msgID=2 base="ou=people,dc=ucsc,dc=edu" scope=wholeSubtree filter="(ucscPe
rsonID=ID00002)" attrs="entryDN"
[06/Feb/2015:00:06:56 -0800] SEARCH RES conn=5777353 op=1 msgID=2 result=0 nentries=1 etime=0
[06/Feb/2015:00:06:56 -0800] UNBIND REQ conn=5777353 op=2 msgID=3
[06/Feb/2015:00:06:56 -0800] DISCONNECT conn=5777353 reason="Client Unbind"
------------------------------------
------------------------------------

From what I can tell when the LookupRequest triggers, it is performing an LDAP query right away. I tested using a direct query on the command line to make sure it was returning promptly, so I can confirm it returns pretty much instantly. The psp however doesn't continue until 15 min later. I've been using using jvisualvm with the cpu profiler turned on. Only the top three methods that have any significant time profiled are:
com.sun.jndi.ldap.Connection.run ()                         = 13,884,632 ms
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run () = 4,833,007 ms
java.util.concurrent.ThreadPoolExecutor$Worker.run ()       = 420,001 ms

I'm not sure if the jndi is high because of pooling or if something is getting stuck there. We are using java 1.7.0_71 which should have the JNDI deadlock problem fix.

Sorry for the long winded email, let me know how I can help debug the bulkSync

Thanks
Jeffrey C.


On Fri, Jan 30, 2015 at 1:57 PM, Jeffrey Crawford <> wrote:
any ideas anyone?

Jeffrey

Both pilots and IT professionals require training and currency before charging into clouds!
---------------------------------------

On Wed, Jan 28, 2015 at 1:39 PM, Jeffrey Crawford <> wrote:
This error shows up when we schedule the changeLog.psp.fullSync.quartzCron to run every Tuesday evening. It starts to run and then it looks like there is some sort of conflict between the changelog provisioner and the bulk sync process. Am I doing something incorrectly?

If I restart the loader it goes through, the following is the full message:
------------------------------------------------------
2015-01-28 13:37:00,129: [DefaultQuartzScheduler_Worker-2] INFO  PspChangeLogConsumer.processChangeLogEntries(481) -  - PSP Consumer 'psp' - Full sync is running, returning sequence number '548601'
2015-01-28 13:37:00,187: [DefaultQuartzScheduler_Worker-2] ERROR ChangeLogHelper.processRecords(281) -  - Did not get all the way through the batch! 548601 != 548701
2015-01-28 13:37:00,187: [DefaultQuartzScheduler_Worker-2] ERROR GrouperLoaderJob.runJob(433) -  - Error on job: CHANGE_LOG_consumer_psp
java.lang.RuntimeException: Error in loader job: null, check logs: Did not get all the way through the batch! 548601 != 548701
        at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.runJob(GrouperLoaderJob.java:422)
        at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.execute(GrouperLoaderJob.java:315)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
2015-01-28 13:37:00,200: [DefaultQuartzScheduler_Worker-2] ERROR GrouperLoaderJob.execute(318) -  - Error running up job
java.lang.RuntimeException: Error in loader job: null, check logs: Did not get all the way through the batch! 548601 != 548701
        at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.runJob(GrouperLoaderJob.java:422)
        at edu.internet2.middleware.grouper.app.loader.GrouperLoaderJob.execute(GrouperLoaderJob.java:315)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
2015-01-28 13:37:02,037: [DefaultQuartzScheduler_Worker-10] INFO  EventLog.info(156) -  - [b8ed644e26334776af4d6bc5f4ab9f2b,'GrouperSystem','application'] session: start (19ms)



Jeffrey

Both pilots and IT professionals require training and currency before charging into clouds!
---------------------------------------





--
David Langenberg
Identity & Access Management
The University of Chicago




Archive powered by MHonArc 2.6.16.

Top of Page