Skip to Content.
Sympa Menu

grouper-users - RE: [grouper-users] Maintaining Grouper database size

Subject: Grouper Users - Open Discussion List

List archive

RE: [grouper-users] Maintaining Grouper database size


Chronological Thread 
  • From: Jim Fox <>
  • To: "Hyzer, Chris" <>
  • Cc: Rory Larson <>, Shilen Patel <>, "" <>
  • Subject: RE: [grouper-users] Maintaining Grouper database size
  • Date: Wed, 31 Jan 2018 09:05:16 -0800 (PST)
  • Ironport-phdr: 9a23:v67+NR3NemlM3mhXsmDT+DRfVm0co7zxezQtwd8ZseIXKPad9pjvdHbS+e9qxAeQG9mDsrQc06L/iOPJYSQ4+5GPsXQPItRndiQuroEopTEmG9OPEkbhLfTnPGQQFcVGU0J5rTngaRAGUMnxaEfPrXKs8DUcBgvwNRZvJuTyB4Xek9m72/q99pHPfglEniaxba9vJxiqsAvdsdUbj5F/Iagr0BvJpXVIe+VSxWx2IF+Yggjx6MSt8pN96ipco/0u+dJOXqX8ZKQ4UKdXDC86PGAv5c3krgfMQA2S7XYBSGoWkx5IAw/Y7BHmW5r6ryX3uvZh1CScIMb7Vq4/Vyi84Kh3SR/okCYHOCA/8GHLkcx7kaZXrAu8qxBj34LYZYeYP+d8cKzAZ9MXXWRPUMZPWSJcAIyybIUPAOQOMulEtIT9okcCoAGkCAWwHu7iyDlFjWL2060g1OQhFBnL0gghH9ILrX/arNL1NagPWu2ywqnIyjrCYuhY2Djn84jIchEhofeDXbJsdsrRz1MjGB3YgVWNsIHoOS6e2OoKs2ie9eVgVOSvhnYmqwF3vjig2N0sio/XiYIJ1FzI7zt2z5soJdC+VUV1YsakHYNNuy2ENIZ6WMcvTmFytCokxbAKp4S3cDYUxJg53xLTduKLfouS7h79SOqcIi10iG5/dL++gRu57FKuxffmVsau1VZHtipFncfItnAKzxHT78eGSvxg/ke9xzqDyRrf6vpeLUApm6vXMYAuzaMtlpYLsETDGDH5mFnugaOLaEkp+fKk5uDnb7n8uJOROJF4hhvgPqgwhMCzGeE4PRIPX2if9+S8zrrj/UjhTbVEif03nbLUsZ7AKsQApq65AglV0ok/6xa4CTeqysoXkmQaLF5dYhKIk5DpO03SIPD/Ffq/jE6skDB2x/DeILLhGIzBLmXYkLj/Z7Z98FVRyA4yzdBE+5JUEa8NLOjyWk/3qNzXEAU5MwqqzOb7FtlxzJ0RVn+SAvzRDKSH+3+Z9O81Z6GnZJUUo3y1f/0u5+/8gGURmEQWO7Sx0J0RLn20A6I1DV+eZC/Og9JJP2Aa9l4wS/zCl1SZUT9VaGq1WeQx6ixtW9HuNpvKWo342O/J5yy8BJADIzkeUl0=


I'd like that. Doesn't have to be a daemon though. Just a gsh thing would work for me.

Jim


On Wed, 31 Jan 2018, Hyzer, Chris wrote:

Date: Wed, 31 Jan 2018 08:49:54
From: "Hyzer, Chris"
<>
To: Rory Larson
<>,
Shilen Patel
<>
Cc:
""

<>
Subject: RE: [grouper-users] Maintaining Grouper database size

For audit records (this is the grouper_audit_entry_v):

Penn currently has 11,685,969 records in grouper_audit_entry_v
addGroupMemberhip and deleteGroupMembership has 80% of the entries
99% of entries have no logged_in_subject_id
99% are loader, blank, grouperShell
70k (less than 1% are from UI)

How about a daemon that:

Deletes records older than a year (month? Would be configurable) that have
no logged in subject id (not tied to a user doing something)
Deletes records older than a year (configurable) which are loader, blank, or
grouperShell system
There isn’t much left, but we could delete any record over 5 years if people
want (we have 90k records older than 5 years which are grouperUI or grouperWS)

https://bugs.internet2.edu/jira/browse/GRP-1674


#####################

For PIT Penn has 10 million memberships in the PIT table.
7.5 million (at least) are loader jobs
5 million of loader jobs older than 2 years

How about a daemon that deletes all over 5 (configurable) years old, and
loader data older than 2 (configurable) years?


Thoughts?

Thanks
Chris

-----Original Message-----
From: Rory Larson
[mailto:]
Sent: Monday, January 29, 2018 4:22 PM
To: Hyzer, Chris
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Well, there is a "created_on" and a "last_updated" field, which are apparently
Unix-type dates. That would make it possible to delete everything in the table prior to a chosen date.
There's also "server_user_name", which gives the account that ran the transaction, e.g., myself
vs. the grouperLoader. That would let us delete only the ones that were run regularly under the
grouperLoader, and keep the changes made by individual users. I'm not sure how worried we should be about
keeping that information. On a 40 GB table, how long would a selected DML delete take? And if we do it
that way, we are still left with a huge table, because MySQL only marks records for deletion without
actually deleting them. In fact, wouldn't it just make the table bigger by adding all the deletion
transactions?

Thanks,
Rory


-----Original Message-----
From: Hyzer, Chris
[mailto:]
Sent: Monday, January 29, 2018 2:45 PM
To: Rory Larson
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Is there a way to look at the data of the audit table and see which records
to delete? I think any loader jobs audits can be deleted (GrouperSystem is
the entity?), but things users do through the UI for example should be kept
so you know who did what. Make sense? Take a look and see if anything jumps
out or we can look at our audit tables too...

Thanks
Chris

-----Original Message-----
From: Rory Larson
[mailto:]
Sent: Monday, January 29, 2018 3:42 PM
To: Hyzer, Chris
<>;
Shilen Patel
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Thanks, all, for the suggestions. I think I'm hearing that I don't want to
purge data from large point-in-time tables manually through SQL. Instead, I
should run an edu.internet2.middleware.grouper... command under gsh, such as:


edu.internet2.middleware.grouper.pit.PITUtils.deleteInactiveRecords(new
Date(), true); or

edu.internet2.middleware.grouper.pit.PITUtils.deleteInactiveObjectsInStem("my:stem",
true);

This will eliminate inactive records or objects only, meaning that if the
people are still around, they will still be taking up space with
point-in-time data going back perhaps for many years, correct? And
additionally, we would still have to perform an OPTIMIZE TABLE on each PIT
table to (possibly) realize any gains?

I will plan to do this, but I'm unsure of how much space, if any, I'm likely
to recover.

Another suggestion was made off-line, that I should perform a TRUNCATE on the
grouper_audit_entry table. In fact, this table has grown to about 40 GB, and
is apparently the biggest source of our problem. I understand that this is
simply a transaction log table, and that nothing else depends on it.
Truncation means that the table would essentially be dropped and then
re-created, minus all the data. That would be done in SQL. Is there any
reason not to do this? If so, is there a better way to reduce its size?

Thanks,
Rory


-----Original Message-----
From: Hyzer, Chris
[mailto:]
Sent: Monday, January 29, 2018 1:03 PM
To: Shilen Patel
<>;
Rory Larson
<>
Cc:

Subject: RE: [grouper-users] Maintaining Grouper database size

Im worried about the future state of using direct SQL, maybe something else
in future will be needed that we put in the GSH command... you should
definitely use that.



Thanks

Chris



-----Original Message-----

From:


[mailto:]
On Behalf Of Shilen Patel

Sent: Monday, January 29, 2018 1:34 PM

To: Rory Larson
<>

Cc:


Subject: Re: [grouper-users] Maintaining Grouper database size



If you’re going to trim the audit based on time, I’d suggest using the gsh command since it’ll delete
the records from the tables in the right order (taking into account foreign keys). But as long as
you delete them in the right order via sql directly, that should be fine as well. Also, recently, I
documented how much space was being taken up at Duke (we use Oracle).
https://urldefense.proofpoint.com/v2/url?u=https-3A__spaces.internet2.edu_display_Grouper_Duke-2BDisk-2BSpace-2BUsage&d=DwIGaQ&c=Cu5g146wZdoqVuKpTNsYHVKLgTZS09MUACeOXHWmTvE&r=8UxP4WgXNK7VFhDT-iCutw&m=i5pbjVCyOSrgzJa3qHir_ok5n_Zw8MCNGbVSHFXVgZs&s=1qaggpDGSz3DUP03rlkM40RUuYd4lBXJ6ylGMs76vQQ&e=
I was surprised to see so much of the space being used by indexes. We could possibly look at
reducing the number of indexes to help.



- Shilen



On 1/29/18, 9:13 AM,
"
on behalf of Rory Larson"
<
on behalf of
>
wrote:



Hello,



We are encountering an issue of our Grouper database growing to a size
that threatens to use up the entire hard drive space available. The database
server is a physical machine with no room to expand, running a MySQL-type
database (5.5.56-MariaDB MariaDB Server).



The problem is not so much the size of the group data itself, but with
point-in-time tables that record what that data was back to the beginning of
the Grouper installation. Some of these have grown to huge sizes, on the
order of 20 GB or so. A suggestion has been made that we could delete a lot
of this prior to some reasonable point in time, especially groups or
memberships that no longer exist. This would be fine, but I'm wondering if
this can be done from the database command line on a per-table basis, or
whether there are dependencies that require doing this through gsh function
calls?



Also, because this is a MySQL-type database, shrinking the database
physical size after deletion of unnecessary data is not trivial. OPTIMIZE
TABLE ... might work, or it might make a table bigger. ALTER TABLE ... ROW
FORMAT=COMPRESSED is promising as tried on our test database, but takes about
four times as long as OPTIMIZE TABLE. In either case, updating has to be
shut down while running them, and they may require duplicating the table
during the operation. We are at 90% now, and can't afford any more big
tables. The ultimate MySQL solution seems to be to dump the database, delete
the whole thing, and reload it, which would mean complete downtime for
Grouper.



Does anyone else using a MySQL-type database have a system for handling
this problem and maintaining a reasonable database size?



Thanks,

Rory












Archive powered by MHonArc 2.6.19.

Top of Page