
shibboleth-dev - Re: [Shib-Dev] Shibboleth 2.0 IdP xml digital signature

Subject: Shibboleth Developers

  • From: "Adam Lantos" <>
  • To:
  • Subject: Re: [Shib-Dev] Shibboleth 2.0 IdP xml digital signature
  • Date: Thu, 30 Oct 2008 18:24:55 +0100

> It doesn't matter what it thinks the encoding is. Base64 operates on the
> octets and produces ASCII, which should transmit cleanly no matter what the
> container thinks the encoding is. At the other end, you get back the
> octets, and the XML parser in the SP follows the encoding specified in the
> XML declaration (or assumes UTF-8).
>
> At least, that's my understanding of the process. If I'm wrong, it would be
> in the base64 step, I guess, but I'm pretty sure base64 is ignorant of
> character encoding of the data being encoded, for the simple reason that it
> works on binary data too.


Of course, base64 works on octets. But Java is Unicode internally.
That is, every single step that produces octets from a java.lang.String
instance involves a character encoding.


>> I probably found the root of this problem inside opensaml2.
>
> I don't think so.
>
>> Here, at method populateVelocityContext,
>>
>> String messageXML = XMLHelper.
>> nodeToString(
>> messageContext.getOutboundSAMLMessage().getDOM());
>
> I would assume nodeToString runs ignorant of the container transfer
> encoding, and would probably produce UTF-8 XML by default.

It will, because all Strings are Unicode in Java.

>
>> String encodedMessage =
>> Base64.encodeBytes(
>> messageXML.getBytes(), Base64.DONT_BREAK_LINES);
>>
>> I think that messageXML.getBytes() should be messageXML.getBytes("utf-8").
>
> I don't see that it matters. XML is self-describing in a sense, and if the
> transport medium is safe (and base64 is), it shouldn't affect anything. In
> fact, this is exactly *why* we use base64. WS-Federation sticks the XML
> directly into the form, HTML-encoded, but that is extremely messy and
> vulnerable to this kind of problem.


Here is my stripped-down version of the charset problem:

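public class Main {
    public static void main(String[] args) {
        String unicodeString = "Ádám";
        // getBytes() without arguments uses the platform default charset
        byte[] bytes = unicodeString.getBytes();
        for (int i = 0; i < bytes.length; i++) {
            System.out.print(bytes[i] + " ");
        }
        System.out.println();
        // new String(byte[]) decodes with the platform default charset too
        String decodedString = new String(bytes);
        System.out.println(decodedString);
    }
}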

Compile it as UTF-8 source (e.g. javac -encoding UTF-8 Main.java).

Then run it in a Unicode-aware environment:

$ LC_ALL="en_US.utf8" java Main
-61 -127 100 -61 -95 109
Ádám

Then run it in a non-Unicode-aware environment:

$ LC_ALL="POSIX" java Main
63 100 63 109
?d?m


It is clear that the response XML should be UTF-8. When the IdP runs in
a non-Unicode environment, you get a signature verification problem,
since the digest is computed on the Unicode string while the bytes that
are sent come from the platform default charset. Additionally, all
non-ASCII characters are replaced with question marks.
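
To illustrate why the digest stops matching, here is a minimal sketch. It
is not the IdP's actual signing code: the class name DigestMismatchDemo,
the digest() helper, and the choice of SHA-256 are assumptions made for
the example, and US-ASCII stands in for the POSIX locale's default
charset. It only shows that a digest over the intended UTF-8 octets
differs from a digest over the octets left after '?' replacement.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class DigestMismatchDemo {
    // Hypothetical helper: digest the octets obtained by encoding
    // the text in the given charset.
    static byte[] digest(String text, Charset charset) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(text.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Ádám" written with escapes, so the source file encoding is irrelevant.
        String value = "\u00C1d\u00E1m";
        // Digest over the octets the signer intended (UTF-8).
        byte[] intended = digest(value, StandardCharsets.UTF_8);
        // Digest over the octets produced when non-ASCII characters
        // cannot be encoded and are replaced with '?'.
        byte[] lossy = digest(value, StandardCharsets.US_ASCII);
        System.out.println("digests match: " + Arrays.equals(intended, lossy));
    }
}
```

This prints "digests match: false", which is the situation a verifier
ends up in when it receives the lossy octets.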

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes()

"Encodes this String into a sequence of bytes using the platform's
default charset, storing the result into a new byte array. The
behavior of this method when this string cannot be encoded in the
default charset is unspecified."
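
For comparison, a small sketch of the overload that takes an explicit
charset, which does not depend on the locale. The class name
ExplicitCharsetDemo and the encode() helper are made up for the example;
StandardCharsets needs Java 7 or later (on older JVMs the String-named
variant getBytes("UTF-8") plays the same role).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo {
    // Hypothetical helper: encode with a charset chosen by the caller,
    // never with the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "Ádám" written with escapes, so the source file encoding is irrelevant.
        String name = "\u00C1d\u00E1m";

        // This is what the no-argument getBytes() would silently use.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Same result in every locale: [-61, -127, 100, -61, -95, 109]
        System.out.println(Arrays.toString(encode(name, StandardCharsets.UTF_8)));

        // Unmappable characters become '?' (63): [63, 100, 63, 109]
        System.out.println(Arrays.toString(encode(name, StandardCharsets.US_ASCII)));
    }
}
```

Unlike the no-argument method, getBytes(Charset) is specified to replace
unmappable characters with the charset's default replacement bytes, so
the second line is deterministic.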

So when you have multibyte attribute value data from LDAP, getBytes()
cannot encode the non-ASCII characters and replaces them with '?'
(and even this replacement is unspecified). I have now checked: the
SAML2 Attributes Profile mentions UTF-8 for LDAP attribute values, not
the SAML2 Profiles; my mistake. We agree that the XML the Shib IdP
produces must be UTF-8 in all cases. For now this requires the
container locale to be set to something Unicode-aware, i.e. a UTF-8
locale such as en_US.utf8.

I think at least the IdP troubleshooting FAQ should include this requirement.
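
For reference, the getBytes("utf-8") change I suggested for the base64
step could look roughly like the sketch below. It is only an
illustration under assumptions: it uses java.util.Base64 (Java 8+)
instead of the Base64 class from the opensaml2 snippet, and the class
and method names (Utf8Base64Demo, encodeMessage, decodeMessage) and the
sample XML are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Utf8Base64Demo {
    // Encode the serialized message to octets as UTF-8 explicitly,
    // then base64 those octets (no line breaks are inserted).
    static String encodeMessage(String messageXML) {
        byte[] octets = messageXML.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(octets);
    }

    // What a receiver would do: base64-decode, then read the octets as UTF-8.
    static String decodeMessage(String encodedMessage) {
        byte[] octets = Base64.getDecoder().decode(encodedMessage);
        return new String(octets, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample message containing "Ádám", written with escapes.
        String messageXML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Name>\u00C1d\u00E1m</Name>";
        String encodedMessage = encodeMessage(messageXML);
        System.out.println(encodedMessage);
        // The round trip preserves the non-ASCII characters in any locale.
        System.out.println("round trip ok: "
            + messageXML.equals(decodeMessage(encodedMessage)));
    }
}
```

With the charset fixed in the code, the result no longer depends on
LC_ALL or on the container's default encoding.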


cheers,
Adam


