Re: [grouper-users] Grouper applications stuck after time of inactivity

  • From: Sonia Zorba <>
  • To:
  • Subject: Re: [grouper-users] Grouper applications stuck after time of inactivity
  • Date: Fri, 9 Sep 2016 16:51:26 +0200

Hi,

I eventually resolved this issue.

Analyzing the traffic with Wireshark, I found that after a period of inactivity the LDAP server no longer responds to the client.
What follows is a sequence of "TCP Retransmission" packets that stops when the maximum retry count is reached (see attached screenshot).

The hang is also visible with the command netstat -o:

Before:

Proto Recv-Q Send-Q Local Address               Foreign Address             State  
tcp        0      0 <MY-SERVER>:41500           <LDAP-SERVER>:ldap          ESTABLISHED off (0.00/0/0)

During the problem:

Proto Recv-Q Send-Q Local Address               Foreign Address             State  
tcp        0    188 <MY-SERVER>:41500           <LDAP-SERVER>:ldap          ESTABLISHED on (41,84/8/0)
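
(If I read netstat's timer column correctly, the format is timer-name (countdown/retransmissions/keepalive-probes): here the retransmission timer is running, with 41.84 seconds to the next retry, 8 retransmissions already sent, and 188 bytes stuck in the Send-Q.)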

The problem seems to be firewall-related; I quote from http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ :

"It's a very common issue, when you are behind a NAT proxy or a firewall, to be disconnected without a reason. This behavior is caused by the connection tracking procedures implemented in proxies and firewalls, which keep track of all connections that pass through them. Because of the physical limits of these machines, they can only keep a finite number of connections in their memory. The most common and logical policy is to keep newest connections and to discard old and inactive connections first."

Some libraries offer keep-alive settings to work around this problem, but the vt-ldap library apparently does not.
Its ConnectLdapValidator class doesn't realize these connections should be reopened, because they are in a "half-open" (dropped) state.
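
For comparison, enabling keep-alive from code only requires setting SO_KEEPALIVE on the sockets the library opens. Here is a minimal sketch, assuming your LDAP stack lets you plug in a custom socket factory (the JNDI LDAP provider does, through the "java.naming.ldap.factory.socket" environment property; the class name KeepAliveSocketFactory is made up):

import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;

// Hypothetical factory that enables SO_KEEPALIVE on every socket it creates.
// The JNDI LDAP provider loads the class named in
// "java.naming.ldap.factory.socket" and calls getDefault() reflectively.
public class KeepAliveSocketFactory extends SocketFactory {

    private static final SocketFactory INSTANCE = new KeepAliveSocketFactory();

    public static SocketFactory getDefault() {
        return INSTANCE;
    }

    private static Socket withKeepAlive(Socket s) throws IOException {
        s.setKeepAlive(true); // ask the kernel to probe this connection when idle
        return s;
    }

    @Override
    public Socket createSocket() throws IOException {
        return withKeepAlive(new Socket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return withKeepAlive(new Socket(host, port));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localAddress, int localPort) throws IOException {
        return withKeepAlive(new Socket(host, port, localAddress, localPort));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return withKeepAlive(new Socket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort) throws IOException {
        return withKeepAlive(new Socket(address, port, localAddress, localPort));
    }
}

Plain Java can only switch SO_KEEPALIVE on, though: the probe timing still comes from the kernel defaults (net.ipv4.tcp_keepalive_time is 7200 seconds on Linux), usually much longer than a firewall's idle timeout. That is one reason an LD_PRELOAD shim, which can also set the per-socket timing options, is more practical here.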

My solution was to install the following library:

https://github.com/flonatel/libdontdie

and load it at Tomcat startup by adding these lines to $CATALINA_HOME/bin/setenv.sh:

export DD_DEBUG=1                  # log what the library does to syslog
export DD_TCP_KEEPALIVE_TIME=300   # seconds of idle time before the first probe
export DD_TCP_KEEPALIVE_INTVL=60   # seconds between probes
export DD_TCP_KEEPALIVE_PROBES=5   # unanswered probes before the connection is dropped
export LD_PRELOAD=/usr/lib/libdontdie.so
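
With these values the kernel sends the first probe after 300 seconds of idle time and further probes every 60 seconds; after 5 unanswered probes the connection is declared dead, so a dropped connection is detected within roughly 300 + 5*60 = 600 seconds at worst.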

When the library is active, netstat -o shows the following:

Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 <MY-SERVER>:41500           <LDAP-SERVER>:ldap          ESTABLISHED keepalive (232,51/0/0)
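
(The keepalive timer is now armed, with 232.51 seconds left before the next probe; the decimal comma presumably comes from the machine's locale.)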

Cheers,
Sonia


On 02/09/2016 19:10, Sonia Zorba wrote:
Hi,

this started happening when we configured the application to use the external LDAP. At the time there were no other applications on the server. It happens with both Tomcat 6 and Tomcat 8.

I restarted Tomcat several times, trying to change some configurations, but nothing changed.
I initially used the API binary without patches.

Today I installed the patches using the Grouper installer (on api, ui and ws). All was OK. About 3 hours later I accessed the Grouper UI and it was stuck, waiting for something:

2016-09-02 14:31:43,692: [http-nio-8080-exec-10] DEBUG BaseSourceAdapter.getSubjectByIdOrIdentifier(277) -  - method: getSubjectByIdOrIdentifier, idOrIdentifier: GrouperSystem, exceptionIfNull: false
2016-09-02 14:43:33,055: [http-nio-8080-exec-1] DEBUG LdapSourceAdapter.getLdapUnique(862) -  - method: getLdapUnique, search: Search [params=scope: SUBTREE_SCOPE, filter: (& (uid=%TERM%) (objectclass=inetOrgPerson)), base: ou=people,dc=inaf,dc=it, searchType=searchSubject], searchValue: GrouperSystem, attributeNames: Array size: 9: [0]: uid

It waited more than 10 minutes, then the page became unreachable.

In catalina.out I found:

02-Sep-2016 14:43:46.124 SEVERE [http-nio-8080-exec-7] org.apache.catalina.realm.JNDIRealm.authenticate Exception performing authentication
 javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'ou=people,dc=inaf,dc=it'

After restarting the server, all was OK again.

Now, 4 hours later, the same behavior...

I think my problem could be firewall/connection-related, similar to these:

http://shibboleth.1660669.n2.nabble.com/Sjibb-LDAP-pool-hangs-on-firewall-TCP-session-truncation-td7626121.html
https://bz.apache.org/bugzilla/show_bug.cgi?id=45834

but I'm not a network expert and I don't know what I could configure in Grouper/Tomcat to avoid this issue (if there is anything I can do).

This is my grouperPatchStatus.properties:

# added by grouper-installer
grouper_v2_3_0_api_patch_0.date = 2016/09/02 11:20:28
# added by grouper-installer
grouper_v2_3_0_api_patch_0.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_1.date = 2016/09/02 11:20:31

# added by grouper-installer
grouper_v2_3_0_api_patch_1.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_2.date = 2016/09/02 11:20:33

# added by grouper-installer
grouper_v2_3_0_api_patch_2.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_3.date = 2016/09/02 11:20:35

# added by grouper-installer
grouper_v2_3_0_api_patch_3.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_4.date = 2016/09/02 11:20:37

# added by grouper-installer
grouper_v2_3_0_api_patch_4.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_5.date = 2016/09/02 11:20:39

# added by grouper-installer
grouper_v2_3_0_api_patch_5.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_6.date = 2016/09/02 11:20:42

# added by grouper-installer
grouper_v2_3_0_api_patch_6.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_7.date = 2016/09/02 11:20:46

# added by grouper-installer
grouper_v2_3_0_api_patch_7.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_8.date = 2016/09/02 11:20:49

# added by grouper-installer
grouper_v2_3_0_api_patch_8.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_9.date = 2016/09/02 11:20:51

# added by grouper-installer
grouper_v2_3_0_api_patch_9.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_10.date = 2016/09/02 11:20:53

# added by grouper-installer
grouper_v2_3_0_api_patch_10.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_11.date = 2016/09/02 11:20:55

# added by grouper-installer
grouper_v2_3_0_api_patch_11.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_12.date = 2016/09/02 11:20:57

# added by grouper-installer
grouper_v2_3_0_api_patch_12.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_13.date = 2016/09/02 11:20:58

# added by grouper-installer
grouper_v2_3_0_api_patch_13.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_14.date = 2016/09/02 11:21:00

# added by grouper-installer
grouper_v2_3_0_api_patch_14.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_15.date = 2016/09/02 11:21:02

# added by grouper-installer
grouper_v2_3_0_api_patch_15.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_16.date = 2016/09/02 11:21:03

# added by grouper-installer
grouper_v2_3_0_api_patch_16.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_17.date = 2016/09/02 11:21:05

# added by grouper-installer
grouper_v2_3_0_api_patch_17.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_18.date = 2016/09/02 11:21:07

# added by grouper-installer
grouper_v2_3_0_api_patch_18.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_19.date = 2016/09/02 11:21:09

# added by grouper-installer
grouper_v2_3_0_api_patch_19.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_20.date = 2016/09/02 11:21:11

# added by grouper-installer
grouper_v2_3_0_api_patch_20.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_21.date = 2016/09/02 11:21:13

# added by grouper-installer
grouper_v2_3_0_api_patch_21.state = applied

# added by grouper-installer
grouper_v2_3_0_api_patch_22.date = 2016/09/02 11:21:14

# added by grouper-installer
grouper_v2_3_0_api_patch_22.state = applied

Cheers,
Sonia


----- Original Message -----
From: "Hyzer, Chris" 
To: "Sonia Zorba" , "grouper-users" 
Sent: Thursday, September 1, 2016 3:53:34 PM
Subject: RE: [grouper-users] Grouper applications stuck after time of inactivity

I think those errors are not accurate...

Have you reloaded the webapp, or do you restart Tomcat when applying changes? Is this app the only one in that Tomcat, or do other apps run there too? Which patches exactly do you have installed? See grouperPatchStatus.properties.

Did this only start happening recently?

Thanks
Chris

-----Original Message-----
From:  [] On Behalf Of Sonia Zorba
Sent: Thursday, September 01, 2016 4:15 AM
To: 
Subject: [grouper-users] Grouper applications stuck after time of inactivity

Hi,

I've been experiencing strange behavior for several days.

I have Grouper 2.3.0 installed on Tomcat 8 (web services + UI).

My sources are:
- an LDAP on an external machine;
- an LDAP on localhost.

I'm the only person currently using this test machine, so during the
night there are no requests to Grouper. When I open the UI application
or call the web service for the first time in the morning, the
applications wait about 15 minutes; after that, everything works correctly.

I increased the logging levels, but I only see a long wait:

2016-09-01 08:56:18,701: [http-nio-8080-exec-4] DEBUG LdapSourceAdapter.getLdapResultsHelper(740) -  - searchType: searchSubject, filter: (& (uid=GrouperSystem) (objectclass=inetOrgPerson))

2016-09-01 09:11:57,658: [http-nio-8080-exec-4] DEBUG LdapSourceAdapter.getLdapUnique(862) -  - method: getLdapUnique, search: Search [params=scope: SUBTREE_SCOPE, filter: (& (uid=%TERM%) (objectclass=inetOrgPerson)), base: ou=[...], searchType=searchSubject], searchValue: GrouperSystem, attributeNames: Array size: 9: [0]: uid


This happens when Grouper calls the external LDAP (the localhost one
responds immediately).

I tried calling the services and the UI with curl all night, via a cron
job running every minute, and the problem didn't occur.
However, on the first access from another test webapp that uses the
GrouperClient API, that webapp waited 15 minutes too! So the problem
seems tied to each individual client or session created.

I read that this could be a "connection pool leak" problem, so I would
like to forcibly close abandoned connections. How can I configure this in
the Grouper settings?

Moreover, in my catalina.out I see these memory leak warnings:

31-Aug-2016 23:55:18.765 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

  java.lang.Object.wait(Native Method)

  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)

  com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)

31-Aug-2016 23:55:18.766 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-AdminTaskTimer] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

  java.lang.Object.wait(Native Method)

  java.util.TimerThread.mainLoop(Timer.java:552)

  java.util.TimerThread.run(Timer.java:505)

31-Aug-2016 23:55:18.767 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#0] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

  java.lang.Object.wait(Native Method)

  com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)

31-Aug-2016 23:55:18.767 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

  java.lang.Object.wait(Native Method)

  com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)

31-Aug-2016 23:55:18.768 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#2] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

  java.lang.Object.wait(Native Method)

  com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)

31-Aug-2016 23:55:18.768 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [net.sf.ehcache.CacheManager@3e2b44dd] but has failed to stop it. This is very likely to create a memory leak.


I don't know if this could be related to the wait too. I added these
JAVA_OPTS, but it didn't help:

JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true -Xms512M -Xmx1024m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:PermSize=64m -XX:MaxPermSize=512m"


Your help would be really appreciated.

Thanks,
Sonia


Attachment: wireshark-screenshot.jpg
Description: JPEG image



