grouper-users - RE: [grouper-users] Maintaining Grouper database size
Subject: Grouper Users - Open Discussion List
- From: Rory Larson <>
- To: "Black, Carey M." <>, "Hyzer, Chris" <>
- Cc: Shilen Patel <>, David Langenberg <>, Gail H Lift <>, "" <>
- Subject: RE: [grouper-users] Maintaining Grouper database size
- Date: Mon, 5 Feb 2018 22:22:45 +0000
I hate losing data too, but at some point it has to be sacrificed. Perhaps we can think of clean desks and purged databases as imposing a reasonable statute of limitations on our ability to determine past culpability.

I like Chris' suggestion below as a basic foundational step to keep the database size under control, according to the needs of the institution. Once that is in place, other improvements could certainly be added in later versions. But to start with, let's just go with that single, configurable deletion daemon. It would recognize four categories of record according to origin: UI; grouperShell; loader; and blank. Each could be configured locally to any number of retention time units (could we say days?), with 0 being "never delete". Out of the box, all four would be defaulted to zero, with the proffered suggestion of five years for UI records and one year for each of the others. Does this sound reasonable?

By the way, I was able to win back 38 GB of hard drive space for our server by simply truncating (dropping) the 40 GB grouper_audit_entry table. (If we ever need any of that data, we can probably restore it from backup.) This meant deleting not only the old records, but all of them. So we need to consider that not having a cleanup process in place before the database threatens to choke may actually mean losing more data than we would otherwise.

Thanks again to everyone for their helpful advice and sharing of expertise. I think the pressure on us has been relieved for now.

Rory

From: Black, Carey M. [mailto:]
In general, I think there is a clear need here for operational tools/processes to manage the DB data growth.

However, I also hate losing data. (Delete is a form of “loss”. Hopefully a willful choice, but still a loss.) Mostly because we lose the ability to ask a whole range of questions about “what really happened?” (While looking back instead of planning ahead. :) )

Maybe it would be better to have a model where this kind of audit data is moved from “Active” to “Archived” and then off to “delete”? Maybe a shadow table (or tables) where the “Archived” data can be held just out of sight of the operation of the UI/WS, but still around for other reporting?

Your idea of a configuration to define the duration of “Active” data (days/weeks/months, with a move from “Active” to “Archived” on that schedule) and of “Archived” data (days/weeks/months/years) sounds good. Then add a later schedule to move from “Archived” to delete.

I also think some may want to treat any membership change (regardless of source [UI/WS/Loader/etc…]) as equally valuable, while others might see “non-human” processes as less necessary to have in their active audit trail. So maybe the definition of that should be a separate config item? (AKA: “has a subject id” vs “no subject id” for the change.) Maybe even special groups that need more monitoring, or carve-outs for extra (or reduced) retention too.

I also wonder if there are some reports/summaries/monitoring that should be done before the delete that would preserve some details/trends while still letting go of the volume of data? Maybe there are some groups where it would be nice to monitor the count of members once a day, month, etc. across the cycles of the academic/financial calendar? Maybe seeing spikes/dips in Loader-loaded data by group/job? Maybe seeing growth/shrinkage of basis, ref, and access control policy groups in the system over time? Etc…

So I think it may be harder than just “archive/delete every N days”. It might even be an opportunity to tag with attributes to signal what to do for each group (maybe with a system config default if not tagged?). Thinking like Attestation, but for the definition of things like: “ArchiveAfter”, “DeleteAfter”, “CollectStatsEvery”…

--
Carey Matthew

From: [mailto:]
On Behalf Of Rory Larson

Agreed. That would be a very nice feature.

Would time-based deletes be based on create-date or last-mod-date? There seems to be a difference between these in the grouper_audit_entry table, though I'm not sure why a log record or point-in-time record would ever be modified.

Thanks,
Rory

From: Gail H Lift []

Sounds good here too. The configurable time intervals will make it easy to adjust to local needs.

On Wed, Jan 31, 2018 at 11:55 AM, David Langenberg <> wrote:
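
The retention rule proposed in this thread (one retention period in days per record origin, with 0 meaning "never delete") can be illustrated with a minimal Python sketch. This is not Grouper code and not an existing Grouper configuration: the names DEFAULT_RETENTION_DAYS, SUGGESTED_RETENTION_DAYS, cutoff_for and is_expired are hypothetical, five years is approximated as 5 * 365 days, and the sketch assumes records are aged by their creation timestamp, which the thread leaves as an open question (create-date vs last-mod-date).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical settings, in days, keyed by record origin.
# Out of the box everything is 0, which means "never delete".
DEFAULT_RETENTION_DAYS = {"UI": 0, "grouperShell": 0, "loader": 0, "blank": 0}

# The values suggested in the thread: five years for UI records
# (approximated here as 5 * 365 days) and one year for the others.
SUGGESTED_RETENTION_DAYS = {
    "UI": 5 * 365,
    "grouperShell": 365,
    "loader": 365,
    "blank": 365,
}


def cutoff_for(origin, now, retention_days):
    """Return the oldest timestamp to keep, or None if nothing may be deleted."""
    days = retention_days.get(origin, 0)  # unknown origins are never deleted
    if days <= 0:
        return None
    return now - timedelta(days=days)


def is_expired(origin, created_on, now, retention_days):
    """True if a record of this origin, created at created_on, may be deleted."""
    cutoff = cutoff_for(origin, now, retention_days)
    return cutoff is not None and created_on < cutoff


if __name__ == "__main__":
    now = datetime(2018, 2, 5, tzinfo=timezone.utc)
    old = now - timedelta(days=400)
    for origin in SUGGESTED_RETENTION_DAYS:
        print(origin, is_expired(origin, old, now, SUGGESTED_RETENTION_DAYS))
```

A real daemon would turn each non-None cutoff into a delete against the audit table for that origin; how origin is actually recorded in grouper_audit_entry is not specified in the thread and is left out of the sketch.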
- RE: [grouper-users] Maintaining Grouper database size, Rory Larson, 02/05/2018
- <Possible follow-up(s)>
- RE: [grouper-users] Maintaining Grouper database size, Hyzer, Chris, 02/16/2018
- Re: [grouper-users] Maintaining Grouper database size, Peter DiCamillo, 02/16/2018
- RE: [grouper-users] Maintaining Grouper database size, Hyzer, Chris, 02/16/2018
- Message not available
- RE: [grouper-users] Maintaining Grouper database size, Gail H Lift, 02/17/2018
- Message not available
- RE: [grouper-users] Maintaining Grouper database size, Hyzer, Chris, 02/16/2018
- Re: [grouper-users] Maintaining Grouper database size, Peter DiCamillo, 02/16/2018