
grouper-dev - [grouper-dev] reducing excessive psp memory consumption

Subject: Grouper Developers Forum




  • From: Tom Zeller
  • To: Grouper Dev
  • Subject: [grouper-dev] reducing excessive psp memory consumption
  • Date: Thu, 21 Jun 2012 10:57:12 -0500

The psp's excessive memory use came up during the grouper-dev call
yesterday. I had consciously forgotten about this issue, but for some
unconscious reason I had been looking at the relevant code this
week.

The psp provides, in addition to some spmlv2 crud operations, 6 types
of provisioning requests/responses, which may be boiled down to 2.
They are :

calc
diff
sync
bulkCalc
bulkDiff
bulkSync

where "bulk" means "for all source and target identifiers".

The calc-diff-sync triplet is based on the list/show-verify-repair
operations from the Nexus provisioning code at Memphis done by Walter
Hoehn. I could always remember "--verify -v" and "--repair -r",
probably because they have the same number of characters, but I never
could remember whether it was "--list -l" or "--show -s". I also
thought that "-v" is usually verbose. So I went with --calc, --diff,
and --sync (same number of characters, alphabetical order of
operation).

And the diff and sync operations are essentially the same, where
"--diff" equals "--dry-run --sync", meaning figure out all of the
changes but don't do them.
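
To make the "--diff equals --dry-run --sync" idea concrete, here is a
minimal sketch. It is not psp code: the class, the method names, and the
use of plain identifier sets are assumptions made for illustration only.
The point is that diff can be a thin wrapper over sync with a dry-run
flag, so both share one code path for working out the changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the psp code base.
class DryRunSyncSketch {

    // Works out the changes needed to make target match source and
    // applies them to target unless dryRun is true.
    static List<String> sync(Set<String> source, Set<String> target, boolean dryRun) {
        List<String> changes = new ArrayList<>();
        for (String id : new TreeSet<>(source)) {
            if (!target.contains(id)) changes.add("add:" + id);
        }
        for (String id : new TreeSet<>(target)) {
            if (!source.contains(id)) changes.add("delete:" + id);
        }
        if (!dryRun) {
            for (String change : changes) {
                String id = change.substring(change.indexOf(':') + 1);
                if (change.startsWith("add:")) {
                    target.add(id);
                } else {
                    target.remove(id);
                }
            }
        }
        return changes;
    }

    // diff is sync with the dry-run flag set: same changes, nothing applied.
    static List<String> diff(Set<String> source, Set<String> target) {
        return sync(source, target, true);
    }

    public static void main(String[] args) {
        Set<String> source = new TreeSet<>(Arrays.asList("a", "b"));
        Set<String> target = new TreeSet<>(Arrays.asList("b", "c"));
        System.out.println(diff(source, target));
        System.out.println(sync(source, target, false));
        System.out.println(target);
    }
}
```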

It would be nice if either spml or scim would provide some
standardization of the calc-diff-sync triplet.

Now to the memory issue. A bulk operation wraps multiple
calc|diff|sync operations. In pseudo-xml :

<bulkCalc>
<calc id='foo' />
<calc id='bar' />
...
</bulkCalc>

where a calc wraps an spmlv2 provisioned service object :

<calc id='foo' >
<identifier cn=foo,dc=edu />
<attribute cn=foo >
<reference memberOf cn=group,dc=edu >
</calc>

and diff and sync wrap spmlv2 modify requests/responses :

<diff|sync>
<modify ...>
<modify ...>
...
</diff|sync>

Examples are available as test resources in the various psp-example-*
projects.

Since spml supports pluggable profiles, the xml could be json. Right
now, attributes are dsml.

So, during a bulk operation, a java object representing the bulk
response is created, which consists of a list of child objects. At the
end of the bulk operation, the bulk java object is marshalled into xml
for logging or optionally written to a file.


These bulk objects consume memory proportional to the number of child
objects. And, these objects are based on the spmlv2 toolkit library,
which has custom [un]marshallers. Yeah, pause there for a second.


A bulkDiff response is very useful in that it helps you figure out
whether or not your data is in sync, and can reveal patterns as to how
it might have gotten out of sync, like a failed job in a source system
somewhere. So these representations are useful, but I imagine that
very few people use them programmatically, since no one has asked for
a parser.

We might provide a toggle whereby a bulk operation optionally includes
child objects. In other words, during scheduled provisioning cycles,
the bulk operation would return a response containing only "success"
or "failure" (with errors).
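
A minimal sketch of such a toggle, assuming a hypothetical response
class (BulkResponseSketch and its methods are made up for illustration
and are not the psp or spmlv2 toolkit API). With the toggle off, the
response keeps only a counter and the error messages, so its memory use
grows with the number of failures rather than with the number of child
objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the psp or spmlv2 toolkit API.
class BulkResponseSketch {
    private final boolean includeChildren;
    private final List<String> children = new ArrayList<>();
    private final List<String> errors = new ArrayList<>();
    private int processed;

    BulkResponseSketch(boolean includeChildren) {
        this.includeChildren = includeChildren;
    }

    // Records one child result. error is null when the child succeeded.
    // The child itself is retained only when the toggle is on.
    void record(String child, String error) {
        processed++;
        if (error != null) errors.add(error);
        if (includeChildren) children.add(child);
    }

    String status() {
        return errors.isEmpty() ? "success" : "failure";
    }

    int processed() {
        return processed;
    }

    List<String> errors() {
        return Collections.unmodifiableList(errors);
    }

    List<String> children() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        BulkResponseSketch scheduled = new BulkResponseSketch(false);
        scheduled.record("<calc id='foo' />", null);
        scheduled.record("<calc id='bar' />", "bar: target unavailable");
        System.out.println(scheduled.status() + " " + scheduled.errors());
    }
}
```

An interactive bulkDiff run would pass true and still get the full
child list.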

Another option is to stream the child objects of a bulk operation. I
thought about adding a spring-mvc project to the psp, with rest
annotations. Maybe someone has already figured this out?
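
Independent of spring-mvc, the streaming idea can be sketched with a
plain java.io.Writer. Everything here is an assumption for illustration
(the class, the streamBulk method, and the calcOne function standing in
for a real calc); it is not how the psp marshals today. Each child is
written as soon as it is produced and then dropped, so only one child is
held in memory at a time.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sketch; not psp code and not spring-mvc.
class StreamingBulkSketch {

    // Writes the wrapper element and each child to out as it is produced,
    // instead of collecting children in a list. Returns the child count.
    // Note: ids are not xml-escaped here; real code would need to do that.
    static int streamBulk(String wrapper, Iterator<String> ids,
                          Function<String, String> calcOne, Writer out) {
        try {
            out.write("<" + wrapper + ">\n");
            int count = 0;
            while (ids.hasNext()) {
                out.write(calcOne.apply(ids.next()));
                out.write("\n");
                count++;
            }
            out.write("</" + wrapper + ">\n");
            out.flush();
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        streamBulk("bulkCalc", Arrays.asList("foo", "bar").iterator(),
                id -> "<calc id='" + id + "' />", out);
        System.out.print(out);
    }
}
```

In a web setting the Writer would be the response body; for a scheduled
job it could be a file or the log.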

TomZ


