
grouper-users - RE: [grouper-users] Grouper Shell : Export/Import encoding

Subject: Grouper Users - Open Discussion List

  • From: Chris Hyzer <>
  • To: "" <>, "" <>
  • Subject: RE: [grouper-users] Grouper Shell : Export/Import encoding
  • Date: Fri, 26 Mar 2010 02:57:07 -0400
  • Accept-language: en-US

It would be useful if you could confirm that this was a parsing problem
caused by the comments. In the meantime I opened a Jira for it, which is now
closed. Here are the details and the fix, in case you want to put it in your
source and build, or get it from SVN (you need the fix in the 1.4 install
that does the export):

https://bugs.internet2.edu/jira/browse/GRP-405

First of all, in the pre-1.6 export/import, the creator and modifier might
not be maintained. In the 1.6+ next-generation export/import, they are
maintained exactly as they were exported.

Second, the non-ASCII characters:

I tried this (from previous email):

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 3% resetRegistry()
Registry reset: all data deleted, and default data inserted, e.g. root stem
gsh 4% grouperSession = GrouperSession.startRootSession();
edu.internet2.middleware.grouper.GrouperSession:
6e3ec50d47704accb69f0325acd765a1,'GrouperSystem','application'
gsh 5% addRootStem("tést", "tést");
stem: name='tést' displayName='tést' uuid='31d6e793da3546f885214d37faebbc4e'
gsh 6% addGroup("tést", "àGroup", "àGroup");
group: name='tést:àGroup' displayName='tést:àGroup'
uuid='51d9d9fe3dd54e78a1fe8c5d49e31659'
gsh 7% exit

C:\mchyzer\grouper\v1_5\grouper\bin> gsh -xmlexport GrouperSystem
c:/temp/grouper.xml

export file is escaped:

<group extension='&#8230;Group'
displayExtension='&#8230;Group'
name='t&#8218;st:&#8230;Group'
displayName='t&#8218;st:&#8230;Group'
id='51d9d9fe3dd54e78a1fe8c5d49e31659'
contextId='72d3ffca40de432eae54874cdaac6f13'
>

but there are bad comments:

<!-- 't,st:...Group' -->

which caused this error:

unable to import from xml: Invalid byte 1 of 1-byte UTF-8 sequence.
edu.internet2.middleware.grouper.exception.GrouperException: Invalid byte 1
of 1-byte UTF-8 sequence.
at
edu.internet2.middleware.grouper.xml.XmlReader.getDocumentFromFile(XmlReader.java:65)
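
A minimal sketch of how a raw byte in a comment can make the whole import
fail, assuming the file carries no encoding declaration (so the parser reads
it as UTF-8) and the comment was written in a single-byte code page. The
class name, the element name and the byte 0x82 are illustrative choices, not
taken from the Grouper source, and the exact exception text depends on the
parser in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class RawCommentSketch {

    /** Returns true if the bytes parse as XML, false if the parser rejects them. */
    public static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (Exception e) {
            // A malformed UTF-8 byte sequence is reported here, even inside a comment.
            return false;
        }
    }

    /** Builds a small document whose comment contains the given raw byte. */
    public static byte[] docWithCommentByte(int rawByte) {
        byte[] head = "<registry><!-- 't".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "st' --></registry>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(rawByte);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 0x82 is not a valid first byte of any UTF-8 sequence.
        System.out.println("raw byte parses:   " + parses(docWithCommentByte(0x82)));
        // A plain ASCII letter in the same position is fine.
        System.out.println("ascii byte parses: " + parses(docWithCommentByte('e')));
    }
}
```

The point is only that comments are decoded like any other part of the file,
so one bad byte there is enough to stop the parse.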

I reset the registry:

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 0% resetRegistry()
This db user 'grouper' and url 'jdbc:mysql://localhost:3306/grouper' are
allowed to be changed in the grouper.properties
Continuing...
Registry reset: all data deleted, and default data inserted, e.g. root stem
gsh 2% exit

I searched for and replaced those chars in the comments in the XML file, and
now it imports:

C:\mchyzer\grouper\v1_5\grouper\bin>gsh -xmlimport GrouperSystem
c:/temp/grouper.xml

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 0% grouperSession = GrouperSession.startRootSession();
edu.internet2.middleware.grouper.GrouperSession:
d4ddc92109b44452a45d48116816add0,'GrouperSystem','application'
gsh 1% GroupFinder.findByName(grouperSession, "tést:àGroup", true);
group: name='tést:àGroup' displayName='tést:àGroup'
uuid='51d9d9fe3dd54e78a1fe8c5d49e31659'
gsh 2%

I can escape the comments in 1.4+, but I wonder if you are having the same
problem... My thought is that the XML parser unescapes the character
references on import, so they do not have to be explicitly unescaped. I added
a test case that will prove it is fixed, and I will test the 1.6
export/import for international characters.
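
A small sketch of that last point, using the JDK's built-in DOM parser rather
than Grouper's XmlReader. The class name, the method name and the sample
element are hypothetical. It shows a standard parser resolving numeric
character references in attribute values on its own, with no unescapeXml
call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefSketch {

    /** Parses the XML and returns the root element's "name" attribute. */
    public static String readName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        // &#233; is e-acute (U+00E9) and &#224; is a-grave (U+00E0).
        String name = readName("<group name='t&#233;st:&#224;Group'/>");
        // The parser hands back the real characters, not the references.
        System.out.println(name.equals("t\u00e9st:\u00e0Group"));
    }
}
```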

The fix is:

XmlWriter: line 66:
FROM:
return "<!-- " + s + " -->";

TO (and import the class at top of file):
return "<!-- " + StringEscapeUtils.escapeXml(s) + " -->";
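
For illustration, here is a self-contained sketch of what the changed line
produces. The escape method below is a simplified stand-in for
StringEscapeUtils.escapeXml (it assumes the behavior described in this
thread, where non-ASCII characters become numeric references), and the class
and method names are hypothetical, not the actual XmlWriter code.

```java
public class CommentEscapeSketch {

    /** Simplified stand-in: escapes XML specials and anything above 0x7F. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        // Non-ASCII becomes a decimal character reference.
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.append((char) cp);
                    }
            }
        }
        return sb.toString();
    }

    /** Mirrors the shape of the fixed line: the comment text is escaped first. */
    public static String comment(String s) {
        return "<!-- " + escape(s) + " -->";
    }

    public static void main(String[] args) {
        // prints: <!-- t&#233;st:&#224;Group -->
        System.out.println(comment("t\u00e9st:\u00e0Group"));
    }
}
```

With the comment text reduced to ASCII, the file no longer depends on which
encoding the writer and the parser each assume.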

This is fixed in SVN in 1.4, 1.5, and 1.6. The 1.6 next-generation
export/import did not have this problem; it just exports the chars as-is and
that seems to work fine. I also added the legacy export/import test cases to
the next-generation export/import, and they all pass.

Regards,
Chris

-----Original Message-----
From: Chris Hyzer
Sent: Thursday, March 25, 2010 1:51 PM
To:
'';


Subject: RE: [grouper-users] Grouper Shell : Export/Import encoding



> First of all, there is a problem with the columns creator_id, modifier_id,
> modify_time and create_time of most tables. Those columns are set to the
> user who imports the XML and the time of the import.
>
> Is there anything we can do in order to prevent this from happening? (We
> have 2 types of groups defined by this information: user created and
> automatic)

I don't think so, but the 1.6 version of xmlimport and export will not have
this issue.


> Second point is the XML encoding. As we use Grouper in a French context,
> we could have some accents (example: é, à, ...) which are correctly encoded
> in an ISO mode ( è => &#232; ) using "StringEscapeUtils.escapeXml" in
> Grouper. But when we import the XML, there is no
> "StringEscapeUtils.unescapeXml", so the data is inserted in the database
> as recovered (with encoding).
> Both databases are UTF-8 encoded.
>
> Is this a known issue, or is there a solution?

I tried this:

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 3% resetRegistry()
Registry reset: all data deleted, and default data inserted, e.g. root stem
gsh 4% grouperSession = GrouperSession.startRootSession();
edu.internet2.middleware.grouper.GrouperSession:
6e3ec50d47704accb69f0325acd765a1,'GrouperSystem','application'
gsh 5% addRootStem("tést", "tést");
stem: name='tést' displayName='tést' uuid='31d6e793da3546f885214d37faebbc4e'
gsh 6% addGroup("tést", "àGroup", "àGroup");
group: name='tést:àGroup' displayName='tést:àGroup'
uuid='51d9d9fe3dd54e78a1fe8c5d49e31659'
gsh 7% exit

C:\mchyzer\grouper\v1_5\grouper\bin> gsh -xmlexport GrouperSystem
c:/temp/grouper.xml

export file is escaped:

<group extension='&#8230;Group'
displayExtension='&#8230;Group'
name='t&#8218;st:&#8230;Group'
displayName='t&#8218;st:&#8230;Group'
id='51d9d9fe3dd54e78a1fe8c5d49e31659'
contextId='72d3ffca40de432eae54874cdaac6f13'
>

but there are bad comments:

<!-- 't,st:.Group' -->

which caused this error:

unable to import from xml: Invalid byte 1 of 1-byte UTF-8 sequence.
edu.internet2.middleware.grouper.exception.GrouperException: Invalid byte 1
of 1-byte UTF-8 sequence.
at
edu.internet2.middleware.grouper.xml.XmlReader.getDocumentFromFile(XmlReader.java:65)

I reset the registry:

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 0% resetRegistry()
This db user 'grouper' and url 'jdbc:mysql://localhost:3306/grouper' are
allowed to be changed in the grouper.properties
Continuing...
Registry reset: all data deleted, and default data inserted, e.g. root stem
gsh 2% exit

I search and replaced those chars in the comments in the XML file, and now it
imports:

C:\mchyzer\grouper\v1_5\grouper\bin>gsh -xmlimport GrouperSystem
c:/temp/grouper.xml

C:\mchyzer\grouper\v1_5\grouper\bin>gsh

gsh 0% grouperSession = GrouperSession.startRootSession();
edu.internet2.middleware.grouper.GrouperSession:
d4ddc92109b44452a45d48116816add0,'GrouperSystem','application'
gsh 1% GroupFinder.findByName(grouperSession, "tést:àGroup", true);
group: name='tést:àGroup' displayName='tést:àGroup'
uuid='51d9d9fe3dd54e78a1fe8c5d49e31659'
gsh 2%

I can escape the comments in 1.4+, but I wonder if you are having the same
problem... my thought is that the XML parser unescapes that stuff on import
so it doesn't have to be explicitly unescaped... I added a test case that
will prove it is fixed, and will test the 1.6 export/import for international
characters.

More soon...

Chris



