grouper-users - [grouper-users] Grouper 2.2 changelog processing performance and PostgreSQL 9.1

Subject: Grouper Users - Open Discussion List

From: Robert Bradley <>
To:
Subject: [grouper-users] Grouper 2.2 changelog processing performance and PostgreSQL 9.1
Date: Wed, 05 Nov 2014 17:50:49 +0000

While testing Grouper 2.2.0 in our Oxford test environment, we kept
running into a problem similar to the one in the "ChangeLogTempToChangeLog
processing questions" thread back in October. Unlike that case, we are
using PostgreSQL 9.1 on a Debian Wheezy system, but the result was
similar: processing ~80000 changes took several days, at a rate of
around 15 changelog records per minute. Our postgres process was
running at ~100% of one CPU core with minimal I/O during this period.

I eventually tracked the cause of this down to the queries performed
within the method findPITMembershipsJoinedWithOldPITGroupSet(...) in
Hib3PITMembershipViewDAO. This uses an SQL WHERE clause of the form
"AND NOT EXISTS (SELECT 1 FROM grouper_pit_memberships_all_v ...)". In
theory, the database only needs to know whether the subquery returns
any row at all, so it should not fetch the entire table. In practice,
the resulting query takes 2-4s to run, versus milliseconds when
"LIMIT 1" is added to the subquery by hand.

The ideal solution for this problem is to upgrade to Postgres 9.2 or
later, since this is really a problem with the database and not Grouper.
However, for maintenance reasons we would prefer to use the postgresql
package in the Debian stable repository rather than maintaining our own
packages or installing the Debian testing version (9.4 beta 3). We found
that applying the following changes worked well as a workaround for this
bug:

$ cd ${GROUPER_SOURCE}/grouper/src/grouper/edu/internet2/middleware/grouper/internal/dao/hib3/
$ # Rewrite the negated form first, then any remaining plain "exists"
$ sed -i~ -e 's/not exists (select 1/1 not in (select 1/g' *.java
$ sed -i~ -e 's/exists (select 1/1 in (select 1/g' *.java
$ # Clean up backup files
$ rm *~
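
To see what the two substitutions do before touching any source files, a dry run along the following lines may help. This is only a sketch: the `rewrite` function name and the sample fragments (`some_view`, `v.id`, `x.id`) are made up for illustration and are not taken from the Grouper source. Note that the patterns are case-sensitive, so they only match lower-case "exists (select 1".

```shell
# Hypothetical helper: applies the same two sed expressions as above, in the
# same order, but to standard input instead of editing files in place.
rewrite() {
  sed -e 's/not exists (select 1/1 not in (select 1/g' \
      -e 's/exists (select 1/1 in (select 1/g'
}

# Made-up sample fragments (not real Grouper queries).
neg=$(printf '%s\n' 'and not exists (select 1 from some_view v where v.id = x.id)' | rewrite)
pos=$(printf '%s\n' 'and exists (select 1 from some_view v where v.id = x.id)' | rewrite)

echo "$neg"   # and 1 not in (select 1 from some_view v where v.id = x.id)
echo "$pos"   # and 1 in (select 1 from some_view v where v.id = x.id)
```

Because each subquery selects the constant 1, "1 in (...)" is true exactly when the subquery returns at least one row, which is why the rewritten form should be equivalent to the original "exists" test.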

The result is that changelog records are processed around 30 times
faster on average compared to the original Grouper 2.2.0 code. Of
course, this is fairly specific to PostgreSQL 9.1 and below, but may be
of help to others regardless.
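
As a rough cross-check of the figures quoted above, the arithmetic can be sketched as follows. This assumes, purely for illustration, that the ~30x average speed-up applies uniformly to the same ~80000-change backlog; the variable names are hypothetical.

```shell
changes=80000      # approximate backlog size quoted above
rate_before=15     # records per minute with stock Grouper 2.2.0
speedup=30         # approximate average speed-up after the workaround

# Integer arithmetic, so results are rounded down.
minutes_before=$(( changes / rate_before ))
rate_after=$(( rate_before * speedup ))
minutes_after=$(( changes / rate_after ))

echo "before: ${minutes_before} min (~$(( minutes_before / 60 )) h)"
echo "after:  ${minutes_after} min (~$(( minutes_after / 60 )) h)"
```

Under those assumptions the backlog drops from roughly 88 hours (a little under four days) to about 3 hours.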
--
Dr Robert Bradley
Identity and Access Management, IT Services, University of Oxford
