Skip to Content.
Sympa Menu

grouper-users - Re: [grouper-users] Maintaining Grouper database size

Subject: Grouper Users - Open Discussion List

List archive

Re: [grouper-users] Maintaining Grouper database size


Chronological Thread 
  • From: David Langenberg <>
  • To: "Hyzer, Chris" <>, Rory Larson <>, Shilen Patel <>
  • Cc: "" <>
  • Subject: Re: [grouper-users] Maintaining Grouper database size
  • Date: Wed, 31 Jan 2018 16:55:00 +0000
  • Accept-language: en-US
  • Authentication-results: spf=none (sender IP is ) ;
  • Ironport-phdr: 9a23:tp7C6RYKNEixIGNXLU5HkvD/LSx+4OfEezUN459isYplN5qZrsq9bnLW6fgltlLVR4KTs6sC17KP9fi4EUU7or+5+EgYd5JNUxJXwe43pCcHRPC/NEvgMfTxZDY7FskRHHVs/nW8LFQHUJ2mPw6arXK99yMdFQviPgRpOOv1BpTSj8Oq3Oyu5pHfeQpFiCagbb9oMBm6sRjau9ULj4dlNqs/0AbCrGFSe+RRy2NoJFaTkAj568yt4pNt8Dletuw4+cJYXqr0Y6o3TbpDDDQ7KG81/9HktQPCTQSU+HQRVHgdnwdSDAjE6BH6WYrxsjf/u+Fg1iSWIdH6QLYpUjm58axlVAHnhzsGNz4h8WHYlMpwjL5AoBm8oxBz2pPYbJ2JOPZ7eK7WYNEUSndbXstJVyJPHJ6yb5cBAeQCM+ZXrYfyp1oSohSxHgSsC//jyjpSi3Pqx6A30eIsGhzG0gw6GNIOtWzZotvvO6cMUOC60bTDwzPZYPNQ1jfy9o7IfQs/rvGWUrJ7bM3cxlQ1GAPbkFqQrZfoPy6J2eQNsmiU8/BvVe2oi2I9tQ5+vyWvy94qh4LUiIwVzVXE+j94wIYzPdC4SUh7YcK6H5tKsSGVKZd2Tt04T252pSk617sLsoO4cigS0Jkr2gLTZvOdf4WH4B/vTuicLDRiiH54ZL6yiA6+/VW+xuD/TMW4zVhHoypfntTItX0BzQHf5tSbRvZ740yvwyyA1xrJ5eFBOU00lbTUK5omwrMojpQerUPNEjPolEnvkaKYcFso9vGv6+v8fLrqvJicN5Joig7lNaQuh8q/DvkiPggWRWib/vi826P/8k3lQbVKifs2nrPesJDHOcQboqm5AwhW0oo59xm/CDKm3MwZnXkBMl1FZAqKg5X1N1zBOvz1AvWyj06inTpo3fzKI6HtD5rVInjGirjhfLJ960BGyAo0yNBS/49bBasfIP3tXk/8r8HYAQM5Mwy13+noEtp91oQEVW2RH6CZLbvesUWU6eI3P+mMeIgVtS7yK/g/4P7ukGc5lkEHfaWwwJsXc263Hu59I0iCZXrsg8wBEXsRvgYgVuDqiVuCUSJNaHaoWaIz+C07BJy8AYjdW4+tne/J4CDuVL1HdG1cThinEW3pbM/MD/IHaDOAL9VJkycPE6W5RoknkxyiqVm+g4BgNPuc3iQCrpXlnIxv4+TJjhY0rBRpCcDb3m2QGSU8pWMJQzIwlJh2vUF5ggOY0al8jvpwGcFQ6rVEXhpsZrDGyOkvJ9nsWw6JWt6PR1ugSNi8DnllQtsvztsmfk19Gt6rgReF0ia3VexG34eXDYA5p/qPl0P6INxwnjOfjPEs
  • Spamdiagnosticmetadata: NSPM
  • Spamdiagnosticoutput: 1:99

Sounds good to us. We'd appreciate those maint jobs.

Dave

--
David Langenberg
Asst Director, Identity Management
The University of Chicago

On 1/31/18, 10:50 AM,
"
on behalf of Hyzer, Chris"
<
on behalf of
>
wrote:

For audit records (this is the grouper_audit_entry_v):

Penn currently has 11,685,969 records in grouper_audit_entry_v
addGroupMemberhip and deleteGroupMembership has 80% of the entries
99% of entries have no logged_in_subject_id
99% are loader, blank, grouperShell
70k (less than 1% are from UI)

How about a daemon that:

Deletes records older than a year (month? Would be configurable) that
have no logged in subject id (not tied to a user doing something)
Deletes records older than a year (configurable) which are loader, blank,
or grouperShell system
There isn’t much left, but we could delete any record over 5 years if
people want (we have 90k records older than 5 years which are grouperUI or
grouperWS)

https://bugs.internet2.edu/jira/browse/GRP-1674


#####################

For PIT Penn has 10 million memberships in the PIT table.
7.5 million (at least) are loader jobs
5 million of loader jobs older than 2 years

How about a daemon that deletes all over 5 (configurable) years old, and
loader data older than 2 (configurable) years?


Thoughts?

Thanks
Chris

-----Original Message-----
From: Rory Larson
[mailto:]

Sent: Monday, January 29, 2018 4:22 PM
To: Hyzer, Chris
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Well, there is a "created_on" and a "last_updated" field, which are
apparently Unix-type dates. That would make it possible to delete everything
in the table prior to a chosen date. There's also "server_user_name", which
gives the account that ran the transaction, e.g., myself vs. the
grouperLoader. That would let us delete only the ones that were run
regularly under the grouperLoader, and keep the changes made by individual
users. I'm not sure how worried we should be about keeping that information.
On a 40 GB table, how long would a selected DML delete take? And if we do
it that way, we are still left with a huge table, because MySQL only marks
records for deletion without actually deleting them. In fact, wouldn't it
just make the table bigger by adding all the deletion transactions?

Thanks,
Rory


-----Original Message-----
From: Hyzer, Chris
[mailto:]

Sent: Monday, January 29, 2018 2:45 PM
To: Rory Larson
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Is there a way to look at the data of the audit table and see which
records to delete? I think any loader jobs audits can be deleted
(GrouperSystem is the entity?), but things users do through the UI for
example should be kept so you know who did what. Make sense? Take a look
and see if anything jumps out or we can look at our audit tables too...

Thanks
Chris

-----Original Message-----
From: Rory Larson
[mailto:]
Sent: Monday, January 29, 2018 3:42 PM
To: Hyzer, Chris
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Thanks, all, for the suggestions. I think I'm hearing that I don't want
to purge data from large point-in-time tables manually through SQL. Instead,
I should run an edu.internet2.middleware.grouper... command under gsh, such
as:


edu.internet2.middleware.grouper.pit.PITUtils.deleteInactiveRecords(new
Date(), true); or

edu.internet2.middleware.grouper.pit.PITUtils.deleteInactiveObjectsInStem("my:stem",
true);

This will eliminate inactive records or objects only, meaning that if the
people are still around, they will still be taking up space with
point-in-time data going back perhaps for many years, correct? And
additionally, we would still have to perform an OPTIMIZE TABLE on each PIT
table to (possibly) realize any gains?

I will plan to do this, but I'm unsure of how much space, if any, I'm
likely to recover.

Another suggestion was made off-line, that I should perform a TRUNCATE on
the grouper_audit_entry table. In fact, this table has grown to about 40 GB,
and is apparently the biggest source of our problem. I understand that this
is simply a transaction log table, and that nothing else depends on it.
Truncation means that the table would essentially be dropped and then
re-created, minus all the data. That would be done in SQL. Is there any
reason not to do this? If so, is there a better way to reduce its size?

Thanks,
Rory


-----Original Message-----
From: Hyzer, Chris
[mailto:]
Sent: Monday, January 29, 2018 1:03 PM
To: Shilen Patel
<>;
Rory Larson
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Im worried about the future state of using direct SQL, maybe something
else in future will be needed that we put in the GSH command... you should
definitely use that.



Thanks

Chris



-----Original Message-----

From:


[mailto:]
On Behalf Of Shilen Patel

Sent: Monday, January 29, 2018 1:34 PM

To: Rory Larson
<>

Cc:


Subject: Re: [grouper-users] Maintaining Grouper database size



If you’re going to trim the audit based on time, I’d suggest using the
gsh command since it’ll delete the records from the tables in the right order
(taking into account foreign keys). But as long as you delete them in the
right order via sql directly, that should be fine as well. Also, recently, I
documented how much space was being taken up at Duke (we use Oracle).
https://urldefense.proofpoint.com/v2/url?u=https-3A__spaces.internet2.edu_display_Grouper_Duke-2BDisk-2BSpace-2BUsage&d=DwIGaQ&c=Cu5g146wZdoqVuKpTNsYHVKLgTZS09MUACeOXHWmTvE&r=8UxP4WgXNK7VFhDT-iCutw&m=i5pbjVCyOSrgzJa3qHir_ok5n_Zw8MCNGbVSHFXVgZs&s=1qaggpDGSz3DUP03rlkM40RUuYd4lBXJ6ylGMs76vQQ&e=
I was surprised to see so much of the space being used by indexes. We
could possibly look at reducing the number of indexes to help.



- Shilen



On 1/29/18, 9:13 AM,
"
on behalf of Rory Larson"
<
on behalf of
>
wrote:



Hello,



We are encountering an issue of our Grouper database growing to a
size that threatens to use up the entire hard drive space available. The
database server is a physical machine with no room to expand, running a
MySQL-type database (5.5.56-MariaDB MariaDB Server).



The problem is not so much the size of the group data itself, but
with point-in-time tables that record what that data was back to the
beginning of the Grouper installation. Some of these have grown to huge
sizes, on the order of 20 GB or so. A suggestion has been made that we could
delete a lot of this prior to some reasonable point in time, especially
groups or memberships that no longer exist. This would be fine, but I'm
wondering if this can be done from the database command line on a per-table
basis, or whether there are dependencies that require doing this through gsh
function calls?



Also, because this is a MySQL-type database, shrinking the database
physical size after deletion of unnecessary data is not trivial. OPTIMIZE
TABLE ... might work, or it might make a table bigger. ALTER TABLE ... ROW
FORMAT=COMPRESSED is promising as tried on our test database, but takes about
four times as long as OPTIMIZE TABLE. In either case, updating has to be
shut down while running them, and they may require duplicating the table
during the operation. We are at 90% now, and can't afford any more big
tables. The ultimate MySQL solution seems to be to dump the database, delete
the whole thing, and reload it, which would mean complete downtime for
Grouper.



Does anyone else using a MySQL-type database have a system for
handling this problem and maintaining a reasonable database size?



Thanks,

Rory










Attachment: smime.p7s
Description: S/MIME cryptographic signature




Archive powered by MHonArc 2.6.19.

Top of Page