grouper-users - RE: [grouper-users] Grouper Loader - of another type

Subject: Grouper Users - Open Discussion List

RE: [grouper-users] Grouper Loader - of another type


  • From: Jim Fox <>
  • To: "Black, Carey M." <>
  • Cc: "Gettes, Michael" <>, "" <>
  • Subject: RE: [grouper-users] Grouper Loader - of another type
  • Date: Wed, 13 Feb 2019 19:58:25 -0800 (PST)


It seems there is some attachment to loader jobs that is preventing you from finding the flexibility you seek.

We send group events to a message topic (in AWS terms), and anyone can receive them. From these messages anyone can provision whatever resource they desire. It is beyond our control and I, personally, appreciate that. Yes, they might come back to our API for details, but we can support that. It's not a problem.

Jim

On Thu, 14 Feb 2019, Black, Carey M. wrote:

Date: Thu, 14 Feb 2019 03:38:44 +0000
From: "Black, Carey M." <>
To: "Gettes, Michael" <>
Cc: "" <>
Subject: RE: [grouper-users] Grouper Loader - of another type


Michael,


REST (or SOAP) endpoints would be a nice feature for a loader job to be able
to connect to. However, the inherent problem is
that (unlike LDAP/SQL) the constructs for "selecting data" are not as simple
as "aname" (attribute/column name).

At the exit of the REST/SOAP interface is a "TBD" document (JSON/XML formats
are common, but not strictly correct all of the time
either) that you need to be able to parse and "get the data you want" out of.
Simply put, it is not trivial to generically do that for
*just any* REST target. Especially since "REST" is very much subject to: "And thirdly,
the code is more what you'd call 'guidelines'
than actual rules."[1] (You at least need to invent an "extraction layer"
and a "mapping layer" to get data from the external formats to
an internal format.) And establishing those loader jobs would require those
extraction and mapping layers to be implemented
(configured and/or written) by the implementer.
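
As a rough illustration of the "extraction layer" and "mapping layer" split described above, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the JSON payload shape, the field names (`courses`, `roster`, `netid`), the `app:courses` stem, and the `group_name` / `subject_id` row keys are all hypothetical and are not taken from Grouper or from any real REST service.

```python
import json

# Hypothetical REST payload; the structure and field names are invented
# for illustration only.
SAMPLE = """
{"courses": [
  {"code": "CS101", "roster": [{"netid": "alice"}, {"netid": "bob"}]},
  {"code": "MA201", "roster": [{"netid": "alice"}]}
]}
"""


def extract(document):
    """Extraction layer: yield (course code, netid) pairs from parsed JSON."""
    for course in document.get("courses", []):
        for member in course.get("roster", []):
            yield course["code"], member["netid"]


def to_loader_row(code, netid, stem="app:courses"):
    """Mapping layer: turn one extracted pair into a flat, loader-style row.

    The stem and the row keys are assumptions for this sketch.
    """
    return {"group_name": f"{stem}:{code.lower()}", "subject_id": netid}


def rows_from_json(text):
    """Run extraction, then mapping, over one JSON document."""
    return [to_loader_row(code, netid) for code, netid in extract(json.loads(text))]


if __name__ == "__main__":
    for row in rows_from_json(SAMPLE):
        print(row)
```

The point of the sketch is that `extract` has to know the shape of one particular payload, which is why a generic version is hard: every new REST source needs its own extraction and mapping written or configured.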


I cannot say that it is impossible. However, it would be "a lot" of work
inside the Grouper project.

And since at the end you likely would need to code up some Java anyways... You
might as well just "do it all yourself" and skip the
fancy "generic" extraction/mapping layers/tools.




On the other hand... (If you are willing to load SQL staging tables...)


Personally, I am looking down a path to ingest REST based data too. I have
started down a path based on work done by another
University. REF:
https://spaces.at.internet2.edu/display/Grouper/Newcastle+University+Introduction
( Thank you by the way! )

Basically using an ETL ( Extraction, Transformation, Load ) tool to get the
job done.

NOTE: There are many ETL tools to pick from. Talend is free, and it has its
quirks. (REF:
https://www.talend.com/products/data-integration/data-integration-open-studio/
FWIW: Talend talks to just about every data
source you can think of. (https://www.talendforge.org/components/index.php ,
Edition "Talend Open Studio for Data Integration",
click "Show All", then do a find for what you are looking for. :) ) So I think
this general approach should work for lots of sources.
)


I have been able to get as far as reading from a bespoke REST (web service)
and loading the data into "shadow tables" in the Grouper
DB. Then I was going to trigger a set of SQL loader jobs to pull the data
into Grouper proper. :) (With a rather hacky way to
fork gsh scripts to kick off the loader jobs at the end of the ETL work. :( )



Now, if someone was very enterprising, I could also see a possibility for
the Grouper API to be called directly from inside Talend.
(It is Java based, and has the ability to call "custom code" as a built-in
feature.) While I quickly considered that idea, I also
dismissed it.

The work to maintain that code may, over time, not be worth the effort. And
given that the ETL model (at least in Talend)
appears to be a "row centric" model, you would likely be making more Grouper
API calls than the Loader would need to by ingesting an
SQL table. (And frankly, using some SQL trickery before the loader job and
using an incremental loader job would likely
reduce the Grouper API calls even more dramatically.) However,
starting the Loader job(s) without forking a process (to
spin up a gsh script) would be nice. I just have not decided how hard it
would be to get that level of integration into Talend.
Frankly, having a Web Service call that could start a Loader process might be
an ideal approach. However, that Grouper Web
Service would need some ACLs on it too. :)


Picture loading multiple tables with 100K(s) rows (via Talend).

Use some SQL to "diff" those rows with Grouper tables (or a previous copy of
the staging table) to find the "nothing to do here" rows
and drop them.

Then only process the "new" or "removed" rows in an SQL incremental loader
process (with add / remove defined for each row that is
left).
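
The diff step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than the Grouper DB, and the table names (`staging_previous`, `staging_current`), the column names, and the `add` / `remove` action labels are hypothetical. It compares against a previous copy of the staging table, which is one of the two options mentioned.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a previous and a current staging snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging_previous (group_name TEXT, subject_id TEXT);
        CREATE TABLE staging_current  (group_name TEXT, subject_id TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO staging_previous VALUES (?, ?)",
        [("app:g1", "alice"), ("app:g1", "bob"), ("app:g2", "carol")],
    )
    conn.executemany(
        "INSERT INTO staging_current VALUES (?, ?)",
        [("app:g1", "alice"), ("app:g2", "carol"), ("app:g2", "dave")],
    )
    return conn


def diff_staging(conn):
    """Return (adds, removes); unchanged rows drop out of both lists."""
    adds = conn.execute(
        "SELECT group_name, subject_id FROM staging_current "
        "EXCEPT SELECT group_name, subject_id FROM staging_previous "
        "ORDER BY group_name, subject_id"
    ).fetchall()
    removes = conn.execute(
        "SELECT group_name, subject_id FROM staging_previous "
        "EXCEPT SELECT group_name, subject_id FROM staging_current "
        "ORDER BY group_name, subject_id"
    ).fetchall()
    return adds, removes


def incremental_rows(adds, removes):
    """Tag each remaining row with the action an incremental job would apply."""
    return [("add", g, s) for g, s in adds] + [("remove", g, s) for g, s in removes]


if __name__ == "__main__":
    adds, removes = diff_staging(build_demo_db())
    for row in incremental_rows(adds, removes):
        print(row)
```

With the sample data, two of the three current rows match the previous snapshot and are dropped, so only one add and one remove are left for the incremental step.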


That approach pushes the ETL and diff processing outside of Grouper and
reduces the Grouper API work to a minimum (just adds
and removes). And it uses "standard, built-in features" as much as possible too.

True, there may be issues with the SQL diff process against the main Grouper
tables (thus the use of shadow table(s)). I also
considered offloading all of that DB work to a "read only replicated copy" of
the DB too. But that could lead to race conditions
(unless the groups are strictly controlled by the above process). And it likely
is more complicated than most users would want to
get into. Yet, it should work efficiently, effectively and scale well too.


My approach is still a work in progress. Others have already blazed this path
too. ( YMMV, Y(Value)MV too. )


Likely not the answer you wanted, but I bet you can get this up and running
in a few weeks with little to no code written. :) If
not, let me know.


--

Carey Matthew


[1] Quote from Captain Barbossa from the Pirates of the Caribbean movie series.
REF:
https://en.wikiquote.org/wiki/Pirates_of_the_Caribbean:_The_Curse_of_the_Black_Pearl


From:
<> On Behalf Of Gettes, Michael
Sent: Wednesday, February 13, 2019 8:45 PM
To: Richard Frovarp <>
Cc:
Subject: Re: [grouper-users] Grouper Loader - of another type


I'm wanting to manage many 10s of thousands of groups. I believe that would
mean many thousands of calls via web services, whereas
a loader job would handle this all in bulk: the memberships, the privs, the
names/descriptions all in one loader job. The scale, I
believe, is best handled by the loader job.


I hope this helps.


/mrg



On Feb 13, 2019, at 8:09 PM, Richard Frovarp <>
wrote:


Grouper newbie here. I would likely need something similar. Why not write
intermediary code to use Grouper web services?
That was my plan, so I'm curious as to what I'm missing.


________________________________________________________________________________________________________________________________


From:
<> on behalf of Gettes, Michael
<>
Sent: Wednesday, February 13, 2019 4:36:55 PM
To:
Subject: [grouper-users] Grouper Loader - of another type


Hi all,

Currently, grouper supports loader jobs of LDAP and SQL and an additional
capability to inject messages to process changes
related to an individual - a way of sparking loader jobs for one person
instead of in bulk - at least this is my
interpretation.

I have a need for loader jobs to be of an arbitrary nature - call a program,
written in any language, which might do REST
calls or whatever and return, in bulk, something similar to what the loader
job now receives from SQL/LDAP. This way I can
go against alternative sources without the need of staging the data into
LDAP/SQL but get all the benefits and scale of a
grouper loader job.

Does anyone else see a need for this? Grouper dev dudes? (and dudettes)? have
you considered this? I can only assume you have
since you guys have thought of a great many things for grouper.

Many thanks for your time and consideration especially if you choose to
respond.

/mrg






