grouper-users - Re: [grouper-users] Large number of changes and provisioning

Subject: Grouper Users - Open Discussion List

  • From: Jeffrey Crawford <>
  • To: "Waldbieser, Carl" <>
  • Cc: Grouper Users List <>
  • Subject: Re: [grouper-users] Large number of changes and provisioning
  • Date: Thu, 20 Jul 2017 09:19:48 -0700

I suppose we are kinda doing the same thing, except that I'm modifying the internal data and kicking off a full sync, as opposed to redirecting the psp messages via a message queue.

Although I agree the composite change could be improved, I can also see a case where someone may need to populate a large group, say, alumni who need to be able to retrieve transcripts. That would create a group of over 200,000 records in one go. For one thing, this could be done by someone we've delegated access to, so we may not be aware of it until load and update times start lagging.

It sounds like you may have set up a better, more configurable intermediate queuing system, which we don't have and probably aren't going to get at the moment. So some of our choices are more limited in that regard.

I suppose the config could be per provisioner; for example, a different max-changes-before-bulk-sync setting could be defined for each item in the grouper-loader config file. The bulk sync settings are already configurable per provisioner, so that would make sense to me. However, you make a good point that the setting for LDAP may need to be different from the setting for Google groups, etc.
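A per-provisioner threshold with a global fallback could look something like the sketch below. The property keys (`changeLog.bulkSyncThreshold.default`, `changeLog.consumer.<name>.bulkSyncThreshold`) and the consumer names `google` and `box` are hypothetical, chosen only to illustrate the lookup; they are not existing Grouper settings.

```python
# Hypothetical keys: none of these are real Grouper properties.
DEFAULT_KEY = "changeLog.bulkSyncThreshold.default"


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def bulk_sync_threshold(props: dict[str, str], consumer: str) -> int:
    """Return the consumer's own threshold, else the global default."""
    key = f"changeLog.consumer.{consumer}.bulkSyncThreshold"
    return int(props.get(key, props[DEFAULT_KEY]))


config = """
# hypothetical keys, for illustration only
changeLog.bulkSyncThreshold.default = 10000
changeLog.consumer.psp.bulkSyncThreshold = 30000
changeLog.consumer.google.bulkSyncThreshold = 2000
"""
props = parse_properties(config)
print(bulk_sync_threshold(props, "psp"))     # 30000
print(bulk_sync_threshold(props, "google"))  # 2000
print(bulk_sync_threshold(props, "box"))     # 10000 (falls back to default)
```

The fallback keeps the common case to one line of config while still letting an LDAP target and a Google target use very different cut-offs.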

I don't know how easy or difficult the above would be, since I've barely been in the code at all. I will say we use Grouper pretty much as a standalone product. The change log works pretty well for handling changes most of the time, but there is a point of diminishing returns when a large number of changes need to happen.

Jeffrey E. Crawford
Enterprise Service Team
    ^         ^
   / \  ^    / \    ^
  /   \/ \  /   \  / \
 /        \/     \/   \
/                      \

You have been assigned this mountain to prove to others that it *can* be moved.

On Thu, Jul 20, 2017 at 7:32 AM, Waldbieser, Carl <> wrote:
Jeffrey,

The issue is specific to a class of provisioners.  If I assume updates dominate the work performed by LDAP services, then the work performed by incremental updates to a group is O(n) in the number of membership changes, since each change is its own modify operation.  If I perform a single update where I know the end state, that is O(1).

Compare that to a target that is a database (without transactions, as those would only complicate the example).  Suppose the database represents each group member as a single row.  In that case, incremental updates and bulk updates must actually perform the same amount of work.
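The cost model in the two paragraphs above can be made concrete by counting update operations for the same list of changes. This is a rough sketch under the stated assumption that the number of update operations dominates; it ignores payload size, so the single LDAP replace of a 122,000-value member attribute is still a large operation in practice. The function names are illustrative only.

```python
Change = tuple[str, str]  # (op, subject), op is "add" or "remove"


def ldap_updates(changes: list[Change], know_end_state: bool) -> int:
    """Count LDAP modify operations needed to apply `changes` to one group."""
    if not changes:
        return 0
    if know_end_state:
        # One modify that replaces the group's member attribute wholesale.
        return 1
    # Incremental: one modify per add/remove event, i.e. O(n).
    return len(changes)


def db_row_writes(changes: list[Change], know_end_state: bool) -> int:
    """Row-per-member table: each add/remove is one row insert/delete.

    Knowing the end state makes no difference to the count here.
    """
    return len(changes)


# Remove everyone, then re-add everyone, as in the composite example.
members = [f"subject{i}" for i in range(122_000)]
changes = [("remove", m) for m in members] + [("add", m) for m in members]

print(ldap_updates(changes, know_end_state=False))   # 244000
print(ldap_updates(changes, know_end_state=True))    # 1
print(db_row_writes(changes, know_end_state=False))  # 244000
print(db_row_writes(changes, know_end_state=True))   # 244000
```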

In your specific example, it would actually be ideal that the composite could be edited in place without causing every member to be removed and many re-added.  It would be useful if one could create a new composite and "replace into" an existing composite.  In that case, only the actual differences would be reported.  This would more accurately reflect the *intent* of the changes an operator wanted to make.
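The "replace into" idea amounts to computing the difference between the current and desired membership, so that only real changes are reported. A minimal sketch with plain Python sets; `replace_into` is a hypothetical name, not an existing Grouper operation.

```python
def replace_into(current: set[str], desired: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) that turn `current` into `desired`."""
    return desired - current, current - desired


current = {"alice", "bob", "carol"}
desired = {"bob", "carol", "dave"}

to_add, to_remove = replace_into(current, desired)
print(sorted(to_add), sorted(to_remove))  # ['dave'] ['alice']

# Rebuilding the composite instead emits one event per old and per new member.
print(len(current) + len(desired), "vs", len(to_add) + len(to_remove))  # 6 vs 2
```

For a large composite where almost nobody actually changes, the difference shrinks from two events per member to a handful.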

The Lafayette LDAP provisioner does optimize for this case to some extent.  The provisioner does not process LDAP changes as soon as it receives them.  Instead, it collects the incremental changes in a database and processes them in batches at a short, configurable, regular interval (~20s in production).  This allows a couple of optimizations:
1) If a subject has multiple add/removes to a specific group, only the last operation needs to be processed.
2) If multiple subjects are added to or removed from a group, the group only needs to be updated once for that batch.  The longer the update interval, the more subjects you can process per batch.
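The two batching optimizations can be sketched as a single coalescing pass over one interval's events. This is an in-memory illustration only; the Lafayette provisioner stores pending changes in a database, and its actual code is not shown here. The group and subject names are made up.

```python
from collections import defaultdict


def coalesce(events):
    """Collapse a batch of (group, subject, op) events, op being "add" or "remove".

    Later events overwrite earlier ones, so only each subject's last op per
    group survives (optimization 1), and each group appears once, so it is
    updated once per batch (optimization 2).
    """
    batch = defaultdict(dict)
    for group, subject, op in events:
        batch[group][subject] = op
    return dict(batch)


events = [
    ("staff", "alice", "add"),
    ("staff", "alice", "remove"),
    ("staff", "bob", "add"),
    ("staff", "carol", "add"),
    ("faculty", "dave", "remove"),
]
batch = coalesce(events)
print(batch["staff"])  # {'alice': 'remove', 'bob': 'add', 'carol': 'add'}
print(len(events), "events ->", len(batch), "group updates")  # 5 events -> 2 group updates
```

Keeping only the last op assumes the target treats adding an existing member or removing an absent one as harmless.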

I have run into the scenario you are talking about.  Since I know it is going to create a lot of churn, my general approach is to temporarily route the changes to a null route (via our RabbitMQ exchange) so the messages are discarded.  Once I am finished with the change, I reinstate the original route and then fire off a bulk sync.  Your suggestion would make that happen automatically, and I agree it would be useful.  I am unsure, though, whether Grouper should *not* produce incremental changes for the change logger in this case.

Thanks,
Carl Waldbieser
ITS Systems Programmer
Lafayette College

----- Original Message -----
From: "Jeffrey Crawford" <>
To: "Grouper Users List" <>
Sent: Tuesday, July 18, 2017 2:26:52 PM
Subject: [grouper-users] Large number of changes and provisioning

We had an interesting case show up not long ago. Basically, there was a
change in a group that, in effect, removed everyone and then added them all
back (a composite group change). There were > 122,000 members in the group, so
it caused a huge backlog of changes that wound up taking quite a few hours.

Eventually I just stopped Grouper, set the psp entry in
grouper_change_log_consumer to the same sequence number as syncGroups, and
restarted, which performed a bulkSync. That only takes 30 minutes.
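The manual step above, pointing the psp consumer at the same sequence number as syncGroups, can be illustrated against a toy copy of the table. This sketch uses an in-memory SQLite database; it assumes the real table has `name` and `last_sequence_processed` columns and models only those two, and the sequence numbers are made up. Against a real Grouper database, Grouper should be stopped first, as in the email.

```python
import sqlite3

# Toy stand-in for grouper_change_log_consumer; only two columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grouper_change_log_consumer "
    "(name TEXT PRIMARY KEY, last_sequence_processed INTEGER)"
)
conn.executemany(
    "INSERT INTO grouper_change_log_consumer VALUES (?, ?)",
    [("syncGroups", 1000), ("psp", 400)],
)


def fast_forward(conn: sqlite3.Connection, consumer: str, reference: str) -> None:
    """Set `consumer`'s last processed sequence to match `reference`'s."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE grouper_change_log_consumer "
            "SET last_sequence_processed = ("
            "  SELECT last_sequence_processed FROM grouper_change_log_consumer"
            "  WHERE name = ?) "
            "WHERE name = ?",
            (reference, consumer),
        )


fast_forward(conn, "psp", "syncGroups")
row = conn.execute(
    "SELECT last_sequence_processed FROM grouper_change_log_consumer "
    "WHERE name = 'psp'"
).fetchone()
print(row[0])  # 1000
```

The skipped entries are never replayed, which is why a full sync of the target has to follow.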

Additionally, I noticed that our LDAP servers were backed up quite a
bit, as they were busy deleting records one at a time and then adding them
back one at a time.

It got me thinking that perhaps there should be a setting that identifies
how many records are about to change from the change log and, if it's over,
say, 10,000, then instead of processing the change log, it would sync up the
psp record to match syncGroups and perform a bulk sync. That is also easier
on the LDAP servers, as a bulk sync does a compare and only modifies what
needs to change.

This setting would be configurable by the admin, since different environments
might find, for example, that they can process 30,000 changes before the change
log takes longer than a bulk sync.
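The proposed setting boils down to a threshold check on the consumer's backlog. A minimal sketch of that policy, assuming the backlog is approximated by the gap between the latest change log sequence number and the consumer's last processed one (which also counts entries the provisioner would ignore); `choose_action` is a hypothetical name, and the example numbers assume roughly one change log entry per membership add or remove.

```python
def choose_action(last_sequence_processed: int, latest_sequence: int,
                  threshold: int = 10_000) -> str:
    """Hypothetical policy: bulk sync when the consumer's backlog exceeds threshold."""
    backlog = latest_sequence - last_sequence_processed
    return "bulk_sync" if backlog > threshold else "incremental"


# A 122,000-member remove-and-re-add yields roughly 244,000 entries.
print(choose_action(1_000, 245_000))                   # bulk_sync
print(choose_action(1_000, 6_000))                     # incremental
print(choose_action(1_000, 21_000, threshold=30_000))  # incremental
```

When the answer is `bulk_sync`, the consumer would then be moved past the backlog and a full sync run, instead of replaying each entry.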

Thoughts?

Jeffrey E. Crawford
Enterprise Service Team <>