
grouper-users - Re: [grouper-users] xml-export: Inconsistent enconding in export file

Subject: Grouper Users - Open Discussion List




  • From: "GW Brown, Information Systems and Computing" <>
  • To: Loris Bennett <>, Grouper Users Mailing List <>
  • Subject: Re: [grouper-users] xml-export: Inconsistent enconding in export file
  • Date: Fri, 10 Oct 2008 08:01:03 +0100

Hi Loris,

That would explain it - my sources.xml has:

<?xml version="1.0" encoding="utf-8"?>

and so SourceManager tried unsuccessfully to read the character as UTF-8.
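A minimal Java sketch of what "tried unsuccessfully to read the character as UTF-8" means at the byte level. It assumes the file was actually saved in Cp1252, so that ä is the single byte 0xE4. A strict UTF-8 decoder treats 0xE4 as the lead byte of a three-byte sequence and rejects the following 't' (0x74). That matches the "Invalid byte 2 of 3-byte UTF-8 sequence" message in the original report. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns the decoded text, or null if the bytes are not valid UTF-8.
    static String decodeStrictUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA4, (byte) 't'}; // "ät" saved as UTF-8
        byte[] cp1252Bytes = {(byte) 0xE4, (byte) 't'};             // "ät" saved as Cp1252
        System.out.println(decodeStrictUtf8(utf8Bytes) != null);   // true
        System.out.println(decodeStrictUtf8(cp1252Bytes) != null); // false
    }
}
```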

In the absence of an explicit encoding, the system default would be used. The Java system property 'file.encoding' shows your default, which can vary depending on the OS.

On my system I get:

file.encoding=Cp1252


An easy way to check is to add the following target to a build.xml:

<target name="enc">
<echo message="file.encoding=${file.encoding}"/>
</target>
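The same check can be done from plain Java. This is a small sketch (the class name is hypothetical) that prints the property the Ant target echoes, plus the charset the JVM actually falls back to when no encoding is given.

```java
import java.nio.charset.Charset;

class ShowEncoding {
    // Same line the Ant target echoes, plus the charset used by default
    // (e.g. by FileReader or String.getBytes() when no charset is passed).
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Note that on JDK 18 and later the default charset is UTF-8 regardless of the OS (JEP 400), so a modern JVM will report something different from a 2008-era one on the same machine.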

If you want to, you can override the default by setting the system property, i.e.:

-Dfile.encoding=<encoding>
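An alternative to overriding the default, shown here as an assumption rather than what Grouper does, is for code that reads text through a Reader to name the charset explicitly, so the result no longer depends on file.encoding. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    // Reads a whole text file as UTF-8, whatever file.encoding is set to.
    static String readUtf8(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sources", ".xml");
        try {
            // Write the umlaut explicitly as UTF-8 (bytes 0xC3 0xA4).
            Files.write(tmp, "<source name='Freie Universit\u00e4t Berlin'/>"
                    .getBytes(StandardCharsets.UTF_8));
            System.out.println(readUtf8(tmp).contains("\u00e4")); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```

For XML files specifically, handing the raw InputStream to the parser is usually preferable, since the parser can then honour the file's own encoding declaration.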

Gary

--On 10 October 2008 08:40 +0200 Loris Bennett <> wrote:

Hi Gary,

In my sources.xml I have 'ä'. In hex it is C3A4, so the file is UTF-8. It
is perhaps somewhat surprising that this works, since there is no
<?xml ...?> declaration specifying the encoding.
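A quick way to confirm the byte values being discussed: the sketch below (hypothetical class name) encodes ä in UTF-8 and in ISO-8859-1. The UTF-8 form is C3A4, as observed, and the code point is 228, the same number used in &#228;. As for why a file without a declaration parses: the XML 1.0 specification requires a parser to assume UTF-8 when there is no encoding declaration, byte order mark or external encoding information, which is a likely explanation here.

```java
import java.nio.charset.StandardCharsets;

class UmlautBytes {
    // Formats bytes as uppercase hex, e.g. "C3A4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // 'ä', code point U+00E4
        System.out.println(hex(umlaut.getBytes(StandardCharsets.UTF_8)));      // C3A4
        System.out.println(hex(umlaut.getBytes(StandardCharsets.ISO_8859_1))); // E4
        System.out.println(umlaut.codePointAt(0));                            // 228
    }
}
```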

Are there any external dependencies involved in XML parsing that could
vary from platform to platform? I am running 64-bit Debian etch.

Anyway, thanks for correcting the problem.

Loris

On Thu, 2008-10-09 at 16:17 +0100, GW Brown, Information Systems and Computing wrote:
Hi Loris,

I think I've fixed this now (in CVS). The code that exports group and
stem attributes was escaping the output, but the source name was not
being escaped.

I modified
private void _writeSubjectSourceMetaData(Source sa)
in XmlExporter so that the name is escaped (the id was already escaped):

this.xml.internal_puts("name=" + Quote.single(XML.escape(sa.getName())));
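To illustrate why escaping fixes the problem, here is a hypothetical helper, not Grouper's XML.escape (whose exact behaviour is not shown in this thread). It assumes the escaper turns non-ASCII characters into numeric character references. The output is then pure ASCII, so it is byte-for-byte the same whether the exporter writes UTF-8, Cp1252 or ISO-8859-1.

```java
class XmlAttrEscape {
    // Hypothetical escaper: replaces markup characters with entities and every
    // non-ASCII code point with a numeric character reference (&#NNN;).
    static String escapeAttr(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp);                 // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // e.g. U+00E4 -> &#228;
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Mirrors the shape of the fixed line: name='...'
        System.out.println("name='" + escapeAttr("Freie Universit\u00e4t Berlin") + "'");
        // prints: name='Freie Universit&#228;t Berlin'
    }
}
```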

Out of interest, what did you have in sources.xml: ä or &#228;? When I
tried ä, SourceManager would not parse sources.xml.

Gary

--On 11 September 2008 12:24 +0200 Loris Bennett <> wrote:

> Hi,
>
> An import of an export from Grouper gave me the following error:
>
> [java] [Fatal Error] export-cats-and-dogs.xml:145:30: Invalid byte 2 of 3-byte UTF-8 sequence.
>
> Looking at the XML, I see that the umlaut in the 'name' attribute of the
> 'source' tag is encoded differently from the umlaut in the contents of
> the 'description' tag.
>
> <source id='fub'
> name='Freie Universität Berlin'
> class='edu.internet2.middleware.subject.provider.JDBCSourceAdapter'
> >
> <subjectType name='person'/>
> </source>
> </subjectSourceMetaData>
> </metadata>
> <data>
>
> <!-- 'fub' -->
> <stem extension='fub'
> displayExtension='FU Berlin'
> name='fub'
> displayName='FU Berlin'
> id='148c85b9-9ed8-4721-b33b-65c71025f938'
> >
> <description>Freie Universit&#228;t Berlin</description>
>
> Replacing the 'ä' with '&#228;' allows the import to succeed.
>
> Is this a known issue?
>
> Loris
>
> --
> Dr. Loris Bennett
> Computer Centre
> Freie Universität Berlin
> Berlin, Germany
>
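The failure in the original report can be reproduced with the JDK's own DOM parser. This sketch assumes, consistently with the error message, that the exporter wrote the unescaped ä as the single byte 0xE4 and that the file has no encoding declaration, so the parser reads it as UTF-8. The class and method names are hypothetical. The "[Fatal Error] file:line:column:" prefix in the report appears to be what the JDK parser prints when no ErrorHandler is set; here a DefaultHandler is installed so the error is thrown instead of printed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class ExportParseCheck {
    // Parses raw XML bytes and returns the root element's 'name' attribute,
    // or null if the parser rejects the bytes.
    static String parseName(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String expected = "Freie Universit\u00e4t Berlin";
        String raw = "<source id='fub' name='Freie Universit\u00e4t Berlin'/>";
        String escaped = "<source id='fub' name='Freie Universit&#228;t Berlin'/>";
        // Unescaped umlaut written as one ISO-8859-1 byte (0xE4): rejected.
        System.out.println(parseName(raw.getBytes(StandardCharsets.ISO_8859_1)) == null);           // true
        // Unescaped umlaut written as UTF-8 (0xC3 0xA4): accepted.
        System.out.println(expected.equals(parseName(raw.getBytes(StandardCharsets.UTF_8))));       // true
        // Numeric character reference: pure ASCII, accepted whatever the writer's encoding.
        System.out.println(expected.equals(parseName(escaped.getBytes(StandardCharsets.ISO_8859_1)))); // true
    }
}
```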



----------------------
GW Brown, Information Systems and Computing









