grouper-dev - Re: [grouper-dev] reducing excessive psp memory consumption

Subject: Grouper Developers Forum

  • From: Tom Zeller
  • To: Tom Barton
  • Subject: Re: [grouper-dev] reducing excessive psp memory consumption
  • Date: Tue, 26 Jun 2012 17:37:04 -0500

I committed fixes for 2.1.1. The bulk sync code has been improved, and
the giant bulk java xml objects may optionally be omitted. The bug has
a few more details.

https://bugs.internet2.edu/jira/browse/GRP-812

The child objects of a bulk object are logged by default, so there is
a record if folks need one.

I am happy this made it into 2.1.1, which brings performance
improvements: bulk operations run faster and use less memory.
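
For anyone who wants a picture of the idea, here is a minimal sketch of
a bulk sync that logs each child as it goes and only keeps the children
in memory when asked to. It is not the committed code: BulkSyncSketch,
ChildResult, BulkResult, syncOne and the includeChildren flag are all
hypothetical names made up for illustration, and syncOne is a stand-in
that fails any id starting with "bad".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the psp code base.
class BulkSyncSketch {
    private static final Logger LOG = Logger.getLogger("psp.sketch");

    /** Result of one child sync; error is null on success. */
    static final class ChildResult {
        final String id;
        final String error;
        ChildResult(String id, String error) { this.id = id; this.error = error; }
        boolean failed() { return error != null; }
    }

    /** Bulk response: always carries errors, children only on request. */
    static final class BulkResult {
        final List<ChildResult> children = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
        boolean success() { return errors.isEmpty(); }
    }

    /** Stand-in for a real sync: ids starting with "bad" fail (assumption). */
    static ChildResult syncOne(String id) {
        return new ChildResult(id, id.startsWith("bad") ? "sync failed for " + id : null);
    }

    static BulkResult bulkSync(List<String> ids, boolean includeChildren) {
        BulkResult bulk = new BulkResult();
        for (String id : ids) {
            ChildResult child = syncOne(id);
            // Every child is logged, so a record exists even when it is not retained.
            LOG.info("sync " + child.id + (child.failed() ? " FAILED: " + child.error : " ok"));
            if (child.failed()) {
                bulk.errors.add(child.error);
            }
            if (includeChildren) {
                bulk.children.add(child); // memory grows with the number of children
            }
        }
        return bulk;
    }
}
```

With includeChildren set to false, the response holds only the error
list, so its size depends on the number of failures rather than on the
number of identifiers processed.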

On Sun, Jun 24, 2012 at 9:35 AM, Tom Barton wrote:
> TomZ,
>
> If I understand correctly, a bulk object is created during the course of a
> bulk operation, but the object itself is not needed operationally. If that's
> so, then yes, an option to skip it or create an alternate with much smaller
> footprint sounds appropriate.
>
> Could better error handling, as we've been speaking of in a different
> thread, provide diagnostic info that would reduce the added value of a
> complete bulk object? Riffing on your thought, maybe this suggests creating
> a stream of child objects that are discarded except for those children for
> which an error occurred.
>
> TomB
>
>
> On 6/21/2012 10:57 AM, Tom Zeller wrote:
>>
>> The excessive use of memory by the psp came up during the grouper-dev
>> call yesterday. I had consciously forgotten about this issue, but for
>> some unconscious reason I had been looking at the relevant code this
>> week.
>>
>> The psp provides, in addition to some spmlv2 crud operations, 6 types
>> of provisioning requests/responses, which may be boiled down to 2.
>> They are :
>>
>>  calc
>>  diff
>>  sync
>>  bulkCalc
>>  bulkDiff
>>  bulkSync
>>
>> where "bulk" means "for all source and target identifiers".
>>
>> The calc-diff-sync triplet is based on the list/show-verify-repair
>> operations from the Nexus provisioning code at Memphis done by Walter
>> Hoehn. I could always remember "--verify -v" and "--repair -r",
>> probably because they have the same number of characters, but I never
>> could remember whether it was "--list -l" or "--show -s". I also
>> thought that "-v" is usually verbose. So I went with --calc , --diff,
>> and --sync (same number of characters, alphabetical order of
>> operation).
>>
>> And the diff and sync operations are essentially the same, where
>> "--diff" equals "--dry-run --sync", meaning figure out all of the
>> changes but don't do them.
>>
>> It would be nice if either spml or scim would provide some
>> standardization of the calc-diff-sync triplet.
>>
>> Now to the memory issue. A bulk operation wraps multiple
>> calc|diff|sync operations. In pseudo-xml :
>>
>> <bulkCalc>
>>  <calc id='foo' />
>>  <calc id='bar' />
>>  ...
>> </bulkCalc>
>>
>> where a calc wraps an spmlv2 provisioned service object :
>>
>> <calc id='foo'>
>>  <identifier cn=foo,dc=edu />
>>  <attribute cn=foo />
>>  <reference memberOf cn=group,dc=edu />
>> </calc>
>>
>> and diff and sync wrap spmlv2 modify requests/responses :
>>
>> <diff|sync>
>>  <modify ...>
>>  <modify ...>
>>  ...
>> </diff|sync>
>>
>> Examples are available as test resources in the various psp-example-*
>> projects.
>>
>> Since spml supports pluggable profiles, the xml could be json. Right
>> now, attributes are dsml.
>>
>> So, during a bulk operation, a java object representing the bulk
>> response is created, which consists of a list of child objects. At the
>> end of the bulk operation, the bulk java object is marshalled into xml
>> for logging or optionally written to a file.
>>
>>
>> These bulk objects consume memory proportional to the number of child
>> objects. And, these objects are based on the spmlv2 toolkit library,
>> which has custom [un]marshallers. Yeah, pause there for a second.
>>
>>
>> A bulkDiff response is very useful in that it helps you figure out
>> whether or not your data is in sync, and can reveal patterns as to how
>> it might have gotten out of sync, like a failed job in a source system
>> somewhere. So these representations are useful, but I imagine that
>> very few people use them programmatically, since no one has asked for
>> a parser.
>>
>> We might provide a toggle whereby a bulk operation optionally includes
>> child objects. In other words, during scheduled provisioning cycles,
>> the bulk operation would return a response containing only "success"
>> or "failure" (with errors).
>>
>> Another option is to stream the child objects of a bulk operation. I
>> thought about adding a spring-mvc project to the psp, with rest
>> annotations. Maybe someone has already figured this out ?
>>
>> TomZ
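
The streaming option discussed in this thread, combined with the
suggestion to discard children except those with errors, might look
roughly like the sketch below. This is only an illustration under
stated assumptions: StreamingBulkSketch, Child, diffOne and the sink
callback are hypothetical names, not psp or spmlv2 toolkit APIs, and
diffOne is a stand-in that fails any id starting with "bad".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not part of the psp or the spmlv2 toolkit.
class StreamingBulkSketch {

    /** One child response; error is null when the child succeeded. */
    static final class Child {
        final String id;
        final String error;
        Child(String id, String error) { this.id = id; this.error = error; }
    }

    /** Stand-in for a real diff: ids starting with "bad" fail (assumption). */
    static Child diffOne(String id) {
        return new Child(id, id.startsWith("bad") ? "diff failed for " + id : null);
    }

    /**
     * Hands each child to the sink as soon as it is produced (for example
     * to write it to a log, a file, or an HTTP response), then lets it go.
     * Only the failed children are kept and returned.
     */
    static List<Child> bulkDiff(Iterator<String> ids, Consumer<Child> sink) {
        List<Child> failures = new ArrayList<>();
        while (ids.hasNext()) {
            Child child = diffOne(ids.next());
            sink.accept(child);
            if (child.error != null) {
                failures.add(child);
            }
        }
        return failures;
    }
}
```

Because the identifiers come from an iterator and successful children
are not retained, memory use here tracks the number of failures, not
the total number of children.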


