
grouper-users - Re: [grouper-users] LDAP timeouts after Java upgrade

Subject: Grouper Users - Open Discussion List

  • From: Baron Fujimoto <>
  • To:
  • Subject: Re: [grouper-users] LDAP timeouts after Java upgrade
  • Date: Mon, 18 May 2020 15:55:21 -1000

On Tue, May 05, 2020 at 05:22:17PM -1000, Baron Fujimoto wrote:
On Tue, Apr 28, 2020 at 10:14:55AM +0100, Robert Bradley wrote:

On 28/04/2020 02:27, Baron Fujimoto wrote:
We're running Grouper 2.2.2 with LDAP (389DS) as a subject source.
We were previously using Java 1.8.0_212 successfully. However, I
recently upgraded the instance to use the current version of Java
(8u251), and after doing so noticed that while it initially appears
to work as expected, the LDAP connections eventually begin to time
out with the following error:

javax.naming.NamingException: LDAP response read timed out,
timeout used:-1ms

The timeouts start to occur after ~20 minutes. Netstat shows no
open connections to our LDAP at that point.
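
For reference, the "-1ms" in that message suggests that no JNDI read timeout was configured for the connection. A minimal sketch of a JNDI environment with explicit timeouts follows. The two `com.sun.jndi.ldap.*` property names are the standard ones for the JDK's LDAP provider; the class name, host name, and timeout values are placeholders, and whether vt-ldap passes these properties through unchanged in Grouper 2.2.2 is an assumption this thread does not confirm.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper: builds a JNDI environment with explicit LDAP timeouts.
class LdapTimeoutEnv {

    static Hashtable<String, String> buildEnv(String url, int connectMs, int readMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        // Upper bound on establishing the TCP connection, in milliseconds.
        env.put("com.sun.jndi.ldap.connect.timeout", Integer.toString(connectMs));
        // Upper bound on waiting for an LDAP response, in milliseconds.
        env.put("com.sun.jndi.ldap.read.timeout", Integer.toString(readMs));
        return env;
    }

    public static void main(String[] args) {
        // Placeholder URL and values; nothing is contacted here.
        Hashtable<String, String> env = buildEnv("ldap://ldap.example.edu:389", 5000, 10000);
        env.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Passing such an environment to `new InitialDirContext(env)` would bound how long a dead connection can stall a request. It would not by itself explain or remove the underlying failure.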

The grouper host is actually a node in a cluster behind a load
balancer, but our lb admins can't find any relevant ~20 minute
timeout value there.

I've empirically determined that this appears to happen with any
version of Java 8 higher than 221 (i.e. 231, 241, 251). I didn't
see anything in the JDK release notes for 231 that appears to be
relevant.
<https://www.oracle.com/technetwork/java/javase/8u-relnotes-2225394.html>


One thought is that it could be a similar JNDI bug to that described
in https://wiki.shibboleth.net/confluence/display/IDP30/LDAPonJava%3E8
and https://issues.shibboleth.net/jira/browse/IDP-1441. The problem
with that is that the fix probably involves upgrading Grouper to get
the ldaptive library instead of vt-ldap, and then configuring it to
use the UnboundID library instead of JNDI. I doubt that's a practical
option in the short term unless vt-ldap has a similar setting you can try.

FWIW, I also saw similar behavior when I attempted the same upgrade in
one of our CAS deployments, though the timeout errors happen rather quickly
there, so it doesn't appear to be Grouper-specific. AFAIK, the version of CAS
involved is using an ldaptive library, so that suggests that the problem may
not lie with the vt-ldap library. Unfortunately my searches don't turn up any
evidence that this has been a problem for others, so I'm kind of at a loss
now. :/

We're still wrestling with this, but have uncovered a few more details, in
case they provide any new insight into the problem.

1) Our LDAP is actually a cluster behind an F5 load balancer. If we point CAS
or Grouper at a non-load-balanced LDAP host, we do not see the timeout problem.
It appears that both JDK 8u231+ *and* LDAP behind the load balancer are
necessary conditions to trigger the timeout error.

So clearly it's some interaction between the two, possibly some half-closed
or zombie connection from the load balancer that the upgraded Java is not
dealing with properly.

2) We've empirically determined that if we shorten the LDAP pool validation
period for CAS from its default of 600s to, say, 60s, then this also mitigates
the timeout problem. The shortened pool validation period seems to be
sufficient to function as some sort of keepalive.

I tried something similar for Grouper, setting the validateTimerPeriod to a
very short value with the following in our ldap.properties:

edu.vt.middleware.ldap.pool.validatePeriodically = true
edu.vt.middleware.ldap.pool.validateTimerPeriod = 60

It seems to have the same mitigating effect for the Grouper UI, but not for
the Grouper WS as far as I can tell.
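
To illustrate why a short validation period can act as a keepalive, here is a minimal sketch of periodic pool validation. It is not the vt-ldap or ldaptive implementation; `PoolKeepalive`, `validateOnce`, and `start` are hypothetical names, and the connection type and probe are left generic. In a real pool the probe would presumably be a cheap LDAP operation, and the traffic it generates is what would keep an idle connection from going quiet long enough to be dropped somewhere in between.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch of a pool that probes its idle connections on a timer.
class PoolKeepalive<C> {
    private final List<C> idle = new CopyOnWriteArrayList<>();
    private final Predicate<C> validator;

    PoolKeepalive(Predicate<C> validator) {
        this.validator = validator;
    }

    void add(C conn) {
        idle.add(conn);
    }

    int size() {
        return idle.size();
    }

    // One validation pass: probe each idle connection and evict the ones
    // that fail. Returns the number of connections evicted.
    int validateOnce() {
        int evicted = 0;
        for (C conn : idle) {              // snapshot iteration, safe to remove
            boolean ok;
            try {
                ok = validator.test(conn);
            } catch (RuntimeException e) {
                ok = false;                // a failing probe counts as invalid
            }
            if (!ok && idle.remove(conn)) {
                evicted++;
            }
        }
        return evicted;
    }

    // Run a validation pass every periodSeconds; the caller shuts the timer down.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::validateOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

With a period of 60 seconds, every idle connection sees traffic at least once a minute, and connections that have already gone stale are evicted before a request can pick them up.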

--
UH Information Technology Services : Identity & Access Mgmt, Middleware
minutas cantorum, minutas balorum, minutas carboratum desendus pantorum


