grouper-dev - Re: [grouper-dev] reducing excessive psp memory consumption

Subject: Grouper Developers Forum

  • From: Tom Barton
  • Subject: Re: [grouper-dev] reducing excessive psp memory consumption
  • Date: Sun, 24 Jun 2012 09:35:29 -0500


If I understand correctly, a bulk object is created during the course of a bulk operation, but the object itself is not needed operationally. If that's so, then yes, an option to skip it or to create an alternative with a much smaller footprint sounds appropriate.

Could better error handling, as we've been speaking of in a different thread, provide diagnostic info that would reduce the added value of a complete bulk object? Riffing on your thought, maybe this suggests creating a stream of child objects that are discarded except for those children for which an error occurred.


On 6/21/2012 10:57 AM, Tom Zeller wrote:
The excessive use of memory by the psp came up during the grouper-dev
call yesterday. I had consciously forgotten about this issue, but for
some unconscious reason had been looking at the relevant code this
The psp provides, in addition to some spmlv2 crud operations, 6 types
of provisioning requests/responses, which may be boiled down to 2.
They are :


where "bulk" means "for all source and target identifiers".

The calc-diff-sync triplet is based on the list/show-verify-repair
operations from the Nexus provisioning code at Memphis done by Walter
Hoehn. I could always remember "--verify -v" and "--repair -r",
probably because they have the same number of characters, but I never
could remember whether it was "--list -l" or "--show -s". I also
thought that "-v" is usually verbose. So I went with --calc, --diff,
and --sync (same number of characters, alphabetical order of

And the diff and sync operations are essentially the same, where
"--diff" equals "--dry-run --sync", meaning figure out all of the
changes but don't do them.
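
The relationship between diff and sync can be sketched in Java as below. This is only an illustration under loud assumptions: the class and method names are hypothetical and are not the psp API, and source and target are reduced to maps from identifier to a single non-null value. The point is that diff runs the same comparison as sync and simply skips the writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names are not taken from the psp.
class DryRunSync {

    // Computes the changes needed to make target match source.
    // The changes are applied to target only when dryRun is false.
    // Assumes neither map contains null values.
    static List<String> sync(Map<String, String> source, Map<String, String> target, boolean dryRun) {
        List<String> changes = new ArrayList<>();
        // Iterate over sorted copies so the output order is stable
        // and target can be modified safely inside the loops.
        for (Map.Entry<String, String> e : new TreeMap<>(source).entrySet()) {
            String current = target.get(e.getKey());
            if (current == null) {
                changes.add("add " + e.getKey());
            } else if (!current.equals(e.getValue())) {
                changes.add("modify " + e.getKey());
            } else {
                continue; // already in sync
            }
            if (!dryRun) {
                target.put(e.getKey(), e.getValue());
            }
        }
        for (String id : new TreeMap<>(target).keySet()) {
            if (!source.containsKey(id)) {
                changes.add("delete " + id);
                if (!dryRun) {
                    target.remove(id);
                }
            }
        }
        return changes;
    }

    // "--diff" equals "--dry-run --sync": same calculation, no writes.
    static List<String> diff(Map<String, String> source, Map<String, String> target) {
        return sync(source, target, true);
    }
}
```

Running diff again after a sync should report no changes, which gives a cheap check that the two operations agree.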

It would be nice if either spml or scim would provide some
standardization of the calc-diff-sync triplet.

Now to the memory issue. A bulk operation wraps multiple
calc|diff|sync operations. In pseudo-xml :

<calc id='foo' />
<calc id='bar' />

where a calc wraps an spmlv2 provisioned service object :

<calc 'foo'>
<identifier cn=foo,dc=edu />
<attribute cn=foo>
<reference memberOf cn=group,dc=edu>

and diff and sync wrap spmlv2 modify requests/responses :

<modify ...>
<modify ...>

Examples are available as test resources in the various psp-example-*

Since spml supports pluggable profiles, the xml could be json. Right
now, attributes are dsml.

So, during a bulk operation, a java object representing the bulk
response is created, which consists of a list of child objects. At the
end of the bulk operation, the bulk java object is marshalled into xml
for logging or optionally written to a file.

These bulk objects consume memory proportional to the number of child
objects. And, these objects are based on the spmlv2 toolkit library,
which has custom [un]marshallers. Yeah, pause there for a second.

A bulkDiff response is very useful in that it helps you figure out
whether or not your data is in sync, and can reveal patterns as to how
it might have gotten out of sync, like a failed job in a source system
somewhere. So these representations are useful, but I imagine that
very few people use them programmatically, since no one has asked for
a parser.

We might provide a toggle whereby a bulk operation optionally includes
child objects. In other words, during scheduled provisioning cycles,
the bulk operation would return a response containing only "success"
or "failure" (with errors).
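
A minimal sketch of such a toggle, assuming a hypothetical BulkResult class that is not part of the psp or the spmlv2 toolkit. Child responses are represented as plain strings purely for illustration. With the toggle off, memory grows with the number of errors rather than with the number of children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the psp or the spmlv2 toolkit.
class BulkResult {
    private final boolean includeChildren;
    private final List<String> children = new ArrayList<>();
    private final List<String> errors = new ArrayList<>();

    BulkResult(boolean includeChildren) {
        this.includeChildren = includeChildren;
    }

    // Records one child outcome; error is null when the child operation succeeded.
    // The child response is retained only when the toggle is on.
    void record(String id, String childResponse, String error) {
        if (error != null) {
            errors.add(id + ": " + error);
        }
        if (includeChildren) {
            children.add(childResponse);
        }
    }

    // "success" only if no child reported an error.
    String status() {
        return errors.isEmpty() ? "success" : "failure";
    }

    List<String> errors() {
        return Collections.unmodifiableList(errors);
    }

    List<String> children() {
        return Collections.unmodifiableList(children);
    }
}
```

A scheduled provisioning cycle would construct the result with the toggle off, while an interactive bulkDiff run could turn it on to get the full response.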

Another option is to stream the child objects of a bulk operation. I
thought about adding a spring-mvc project to the psp, with rest
annotations. Maybe someone has already figured this out?
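
Independent of any web framework, the streaming idea might look like the sketch below. It assumes each child has already been marshalled to a string and uses a hypothetical "bulk" wrapper element; neither the class name nor the element name comes from the psp. Each child is written as soon as it is produced, so only one child is held in memory at a time.

```java
import java.io.PrintWriter;
import java.util.Iterator;

// Hypothetical sketch; the class name and the "bulk" element are assumptions.
class StreamingBulk {

    // Writes each child response to out as it is pulled from the iterator,
    // instead of collecting all children in a list first.
    // Returns the number of children written.
    static int write(Iterator<String> children, PrintWriter out) {
        out.print("<bulk>\n");
        int count = 0;
        while (children.hasNext()) {
            out.print(children.next());
            out.print("\n");
            count++;
        }
        out.print("</bulk>\n");
        out.flush();
        return count;
    }
}
```

The iterator could be backed by whatever produces the calc, diff, or sync responses, and the writer could wrap a log file or an HTTP response stream.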

