grouper-users - Re: [grouper-users] Grouper applications stuck after time of inactivity
Subject: Grouper Users - Open Discussion List
List archive
- From: Sonia Zorba <>
- To:
- Subject: Re: [grouper-users] Grouper applications stuck after time of inactivity
- Date: Fri, 9 Sep 2016 16:51:26 +0200
Hi, I eventually resolved this issue. Analyzing the situation with Wireshark I found the LDAP server, after a time of inactivity, doesn't respond anymore to the client. Then there is a sequence of "TCP Retransmission" packets that stops when it reaches the maximum attempt count (see attached screenshot). The stuck is also visible using the command netstat -o: Before: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 <MY-SERVER>:41500 <LDAP-SERVER>:ldap ESTABLISHED off (0.00/0/0) During the problem: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 188 <MY-SERVER>:41500 <LDAP-SERVER>:ldap ESTABLISHED on (41,84/8/0) The problem seems to be firewall-related, I quote from http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ : "It's a very common issue, when you are behind a NAT proxy or a firewall, to be disconnected without a reason. This behavior is caused by the connection tracking procedures implemented in proxies and firewalls, which keep track of all connections that pass through them. Because of the physical limits of these machines, they can only keep a finite number of connections in their memory. The most common and logical policy is to keep newest connections and to discard old and inactive connections first." Some libraries offer keep alive settings to solve this problem, but the vt-ldap library seems not. Its ConnectLdapValidator class doesn't understand these connections should be reopened, because they are in a "half-opened" (dropped) state. My solution was to install the following library: https://github.com/flonatel/libdontdie and loading it on Tomcat startup, adding these lines on $CATALINA_HOME/bin/setenv.sh export DD_DEBUG=1 export DD_TCP_KEEPALIVE_TIME=300 export DD_TCP_KEEPALIVE_INTVL=60 export DD_TCP_KEEPALIVE_PROBES=5 export LD_PRELOAD=/usr/lib/libdontdie.so When the library is active netstat -o shows the following: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 <MY-SERVER>:41500 <LDAP-SERVER>:ldap ESTABLISHED keepalive (232,51/0/0) Cheers, Sonia On 02/09/2016 19:10, Sonia Zorba wrote:
Hi, this started happening when we configured the application to use the external LDAP. At the time there weren't other applications on the server. This happens both with Tomcat 6 and Tomcat 8. I restarted Tomcat several time, trying to change some configurations, but nothing is changed. I initially used the API binary without patches. Today I have installed the patches using the Grouper installer (on api, ui and ws). All was ok. About 3 hours later, I accessed the Grouper UI and it was stuck, waiting for something: 2016-09-02 14:31:43,692: [http-nio-8080-exec-10] DEBUG BaseSourceAdapter.getSubjectByIdOrIdentifier(277) - - method: getSubjectByIdOrIdentifier, idOrIdentifier: GrouperSystem, exceptionIfNull: false 2016-09-02 14:43:33,055: [http-nio-8080-exec-1] DEBUG LdapSourceAdapter.getLdapUnique(862) - - method: getLdapUnique, search: Search [params=scope: SUBTREE_SCOPE, filter: (& (uid=%TERM%) (objectclass=inetOrgPerson)), base: ou=people,dc=inaf,dc=it, searchType=searchSubject], searchValue: GrouperSystem, attributeNames: Array size: 9: [0]: uid It waited more than 10 minutes, then the page became unreachable. In catalina.out I have found: 02-Sep-2016 14:43:46.124 SEVERE [http-nio-8080-exec-7] org.apache.catalina.realm.JNDIRealm.authenticate Exception performing authentication javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'ou=people,dc=inaf,dc=it' Restarting the server all was ok again. Now, 4 hours later, the same behavior... I think my problem could be firewall/connection related, similar to these: http://shibboleth.1660669.n2.nabble.com/Sjibb-LDAP-pool-hangs-on-firewall-TCP-session-truncation-td7626121.html https://bz.apache.org/bugzilla/show_bug.cgi?id=45834 but I'm not a network expert and I don't know what I could configure in Grouper/Tomcat to avoid this issue (if there is something that I can do). This is my grouperPatchStatus.properties: # added by grouper-installer grouper_v2_3_0_api_patch_0.date = 2016/09/02 11:20:28 # added by grouper-installer grouper_v2_3_0_api_patch_0.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_1.date = 2016/09/02 11:20:31 # added by grouper-installer grouper_v2_3_0_api_patch_1.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_2.date = 2016/09/02 11:20:33 # added by grouper-installer grouper_v2_3_0_api_patch_2.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_3.date = 2016/09/02 11:20:35 # added by grouper-installer grouper_v2_3_0_api_patch_3.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_4.date = 2016/09/02 11:20:37 # added by grouper-installer grouper_v2_3_0_api_patch_4.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_5.date = 2016/09/02 11:20:39 # added by grouper-installer grouper_v2_3_0_api_patch_5.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_6.date = 2016/09/02 11:20:42 # added by grouper-installer grouper_v2_3_0_api_patch_6.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_7.date = 2016/09/02 11:20:46 # added by grouper-installer grouper_v2_3_0_api_patch_7.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_8.date = 2016/09/02 11:20:49 # added by grouper-installer grouper_v2_3_0_api_patch_8.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_9.date = 2016/09/02 11:20:51 # added by grouper-installer grouper_v2_3_0_api_patch_9.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_10.date = 2016/09/02 11:20:53 # added by grouper-installer grouper_v2_3_0_api_patch_10.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_11.date = 2016/09/02 11:20:55 # added by grouper-installer grouper_v2_3_0_api_patch_11.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_12.date = 2016/09/02 11:20:57 # added by grouper-installer grouper_v2_3_0_api_patch_12.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_13.date = 2016/09/02 11:20:58 # added by grouper-installer grouper_v2_3_0_api_patch_13.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_14.date = 2016/09/02 11:21:00 # added by grouper-installer grouper_v2_3_0_api_patch_14.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_15.date = 2016/09/02 11:21:02 # added by grouper-installer grouper_v2_3_0_api_patch_15.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_16.date = 2016/09/02 11:21:03 # added by grouper-installer grouper_v2_3_0_api_patch_16.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_17.date = 2016/09/02 11:21:05 # added by grouper-installer grouper_v2_3_0_api_patch_17.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_18.date = 2016/09/02 11:21:07 # added by grouper-installer grouper_v2_3_0_api_patch_18.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_19.date = 2016/09/02 11:21:09 # added by grouper-installer grouper_v2_3_0_api_patch_19.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_20.date = 2016/09/02 11:21:11 # added by grouper-installer grouper_v2_3_0_api_patch_20.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_21.date = 2016/09/02 11:21:13 # added by grouper-installer grouper_v2_3_0_api_patch_21.state = applied # added by grouper-installer grouper_v2_3_0_api_patch_22.date = 2016/09/02 11:21:14 # added by grouper-installer grouper_v2_3_0_api_patch_22.state = applied Cheers, Sonia ----- Original Message ----- From: "Hyzer, Chris" To: "Sonia Zorba" , "grouper-users" Sent: Thursday, September 1, 2016 3:53:34 PM Subject: RE: [grouper-users] Grouper applications stuck after time of inactivity I think those errors are not accurate... Have you reloaded the webapp, or do you restart tomcat when applying changes? Is this app the only one in that tomcat or do other apps run there too? Which patches do you have installed exactly? See grouperPatchStatus.properties Did this only start happening recently? Thanks Chris -----Original Message----- From: [] On Behalf Of Sonia Zorba Sent: Thursday, September 01, 2016 4:15 AM To: Subject: [grouper-users] Grouper applications stuck after time of inactivity Hi, I'm experiencing a strange behavior from several days. I have a Grouper 2.3.0 installed on Tomcat 8 (web services + UI). My sources are: - an LDAP owned by an external machine; - an LDAP in localhost; I'm the only person currently using this testing machine, so in the night there are no requests to Grouper. When I open the UI application or call the webservice for the first time in the morning, the applications wait about 15 minutes, then all is working correctly. I increased the logging levels, but I only see a long wait: 2016-09-01 08:56:18,701: [http-nio-8080-exec-4] DEBUG LdapSourceAdapter.getLdapResultsHelper(740) - - searchType: searchSubject, filter: (& (uid=GrouperSystem) (objectclass=inetOrgPerson)) 2016-09-01 09:11:57,658: [http-nio-8080-exec-4] DEBUG LdapSourceAdapter.getLdapUnique(862) - - method: getLdapUnique, search: Search [params=scope: SUBTREE_SCOPE, filter: (& (uid=%TERM%) (objectclass=inetOrgPerson)), base: ou=[...], searchType=searchSubject], searchValue: GrouperSystem, attributeNames: Array size: 9: [0]: uid This happens when Grouper call the external LDAP (the localhost one responds immediately). I tried calling the services and the UI with curl all night with a cron job each minute and the problem didn't happen. However, in the first access with another testing webapp that uses the GrouperClient API, that webapp waited for 15 minutes too! So this seems associated to each single client or session created. I read that could be a problem of "connection pools leak", so I would try to forcedly close abandoned connections. How I can configure this in the Grouper settings? Morover, in my catalina.out I read these bad memory leak warning: 31-Aug-2016 23:55:18.765 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.lang.Object.wait(Native Method) java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43) 31-Aug-2016 23:55:18.766 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-AdminTaskTimer] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.lang.Object.wait(Native Method) java.util.TimerThread.mainLoop(Timer.java:552) java.util.TimerThread.run(Timer.java:505) 31-Aug-2016 23:55:18.767 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#0] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.lang.Object.wait(Native Method) com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683) 31-Aug-2016 23:55:18.767 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.lang.Object.wait(Native Method) com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683) 31-Aug-2016 23:55:18.768 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [C3P0PooledConnectionPoolManager[identityToken->12yj2ui9i1qqmmey649b1u|3b702e77]-HelperThread-#2] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: java.lang.Object.wait(Native Method) com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683) 31-Aug-2016 23:55:18.768 WARNING [localhost-startStop-4] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [grouper] appears to have started a thread named [net.sf.ehcache.CacheManager@3e2b44dd] but has failed to stop it. This is very likely to create a memory leak. I don't know if this could be related to the wait too. I added these JAVA_OPTS but it didn't help: JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true -Xms512M -Xmx1024m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:PermSize=64m -XX:MaxPermSize=512m" Your help will be really appreciate. Thanks, Sonia |
Attachment:
wireshark-screenshot.jpg
Description: JPEG image
- [grouper-users] Grouper applications stuck after time of inactivity, Sonia Zorba, 09/01/2016
- RE: [grouper-users] Grouper applications stuck after time of inactivity, Hyzer, Chris, 09/01/2016
- Re: [grouper-users] Grouper applications stuck after time of inactivity, Sonia Zorba, 09/02/2016
- Re: [grouper-users] Grouper applications stuck after time of inactivity, Sonia Zorba, 09/09/2016
- Re: [grouper-users] Grouper applications stuck after time of inactivity, Sonia Zorba, 09/02/2016
- RE: [grouper-users] Grouper applications stuck after time of inactivity, Hyzer, Chris, 09/01/2016
Archive powered by MHonArc 2.6.19.