shibboleth-dev - RE: DOMParser

Subject: Shibboleth Developers

From: "Howard Gilbert" <>
To: <>
Subject: RE: DOMParser
Date: Fri, 19 Nov 2004 14:16:14 -0500

> Schema location-wise, I couldn't quite tell, but I'm a strict believer that
> XML files don't specify schema locations to my code, and I decide what
> schemas are physically behind a namespace or use a separate catalog/config
> to determine it. I'm not sure if that's what you were concerned about or
> arguing for.

There are two kinds of schema links. One is in the base XML file being
parsed:

<ShibbolethOriginConfig
xmlns="urn:mace:shibboleth:origin:1.0"
...
xsi:schemaLocation="urn:mace:shibboleth:origin:1.0 origin.xsd"

The schemaLocation attribute pairs a namespace with some form of file
identifier. In many cases this is a URL. If, as here, it is a bare file
name, then a program processing the file has to resolve it, and different
tools make different assumptions. You do want a schemaLocation of some sort
in the file so that XML editors can validate the source and help you make
changes. The problem is that whatever you put in this field has to be
something the tool can use, and the exact syntax may vary from one tool to
another.

The good news is that all versions of our code either ignore this field
completely or look only at the trailing file name. Therefore they will
accept and correctly process "origin.xsd", "../schemas/origin.xsd",
"c:\foobar\origin.xsd", and "http://www.foobar.edu/schemas/origin.xsd",
because only the trailing origin.xsd part matters. So you can use any of
these forms to keep your XML editor happy, and our code will still do the
right thing.
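
A minimal sketch of that "trailing file name" rule, assuming the code simply
strips everything up to the last '/' or '\'. The class and method names
(SchemaNames, trailingFileName) are hypothetical, not taken from the
Shibboleth sources.

```java
// Hypothetical helper: keeps only the trailing file name of a schemaLocation
// hint, which is the only part the configuration code is said to rely on.
public class SchemaNames {
    static String trailingFileName(String location) {
        // Treat '/' (URLs, Unix paths) and '\' (Windows paths) as separators.
        // lastIndexOf returns -1 when a separator is absent, so a bare name
        // is returned whole.
        int cut = Math.max(location.lastIndexOf('/'), location.lastIndexOf('\\'));
        return location.substring(cut + 1);
    }

    public static void main(String[] args) {
        String[] forms = {
            "origin.xsd",
            "../schemas/origin.xsd",
            "c:\\foobar\\origin.xsd",
            "http://www.foobar.edu/schemas/origin.xsd"
        };
        for (String form : forms) {
            // Every line ends in "-> origin.xsd".
            System.out.println(form + " -> " + trailingFileName(form));
        }
    }
}
```

This is also why all four forms are interchangeable: they differ only in the
part that gets thrown away.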

The other link is from one schema to another:

<xs:import namespace="urn:mace:shibboleth:credentials:1.0"
schemaLocation="credentials.xsd"/>

Same deal, different syntax.

Now in both the Apache and JAXP systems, you have two ways to handle XSD
files. You can provide a list of XSD files to be preloaded and used by the
parser before it starts, and you can provide an EntityResolver callback
object that takes the schemaLocation String (in any form), parses it in some
way, and resolves it to a source of XML.
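
The EntityResolver route can be sketched roughly as follows. This assumes
the convention described for approach 1 below (schemas shipped as Java
resources under /schemas/); the class name ClasspathSchemaResolver is
hypothetical. It uses only the standard org.xml.sax.EntityResolver
interface.

```java
import java.net.URL;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: any "*.xsd" reference, whatever its form, is
// redirected to the Java resource /schemas/<trailing file name>.
public class ClasspathSchemaResolver implements EntityResolver {

    // Keep only the trailing file name and place it under /schemas/.
    static String resourceFor(String systemId) {
        int cut = Math.max(systemId.lastIndexOf('/'), systemId.lastIndexOf('\\'));
        return "/schemas/" + systemId.substring(cut + 1);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(".xsd")) {
            return null; // not a schema: let the parser use its default lookup
        }
        URL url = ClasspathSchemaResolver.class.getResource(resourceFor(systemId));
        if (url == null) {
            return null; // nothing bundled under that name: default lookup again
        }
        // The resource URL stands in for the canonical path under
        // WEB-INF/classes; the parser opens it itself.
        return new InputSource(url.toExternalForm());
    }
}
```

It would be installed with builder.setEntityResolver(new
ClasspathSchemaResolver()) on a javax.xml.parsers.DocumentBuilder. Returning
null for a missing resource is a lenient choice; a stricter resolver could
throw instead so that unknown schemas are never fetched from elsewhere.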

So, now there are three approaches.

1) This approach, which I think represents the old Java code, specifies no
preloaded XSD files and depends entirely on an EntityResolver to map every
reference to any "foo.xsd" file to a Java resource of the form
"/schemas/foo.xsd" and then to a canonical path of the form
"c:\tomcat5504\webapps\shibboleth\WEB-INF\classes\schemas\foo.xsd" or its
local equivalent on any system. This applies to both types of links, so it
is the origin.xml file itself that chooses its schema, normally origin.xsd.
The downside appears when the origin.xml file chooses to be validated by the
shibboleth-targetconfig-1.0.xsd schema instead. Then a target-format file
parses without error but produces a DOM with target elements that the
origin code will not understand. OK, it won't get very far, because the
first sanity check is that the root element is a "ShibbolethOriginConfig",
and that check will fail. Still, it seems squishy for code simply to trust
that the XML configuration file will choose the right schema.

2) The other extreme is for the code processing an origin.xml file (or any
Origin main configuration file) to know, at the Java code level, that an
origin configuration uses origin.xsd, credentials.xsd, and namemapper.xsd.
It then creates an array of the three canonical file names and passes it as
the argument to the setAttribute call that sets the JAXP_SCHEMA_SOURCE
property (or its Apache equivalent). Now there is no EntityResolver, or else
there is a dummy that always returns null, because all the files should have
been preloaded.

This is probably the best-controlled version, but it does require the
programmer to drill down from link to link until he has found all of the
schemas that are pulled in directly (from the main schema) or indirectly
(from a linked schema, or from a schema linked from a linked schema).

3) An intermediate version, which is roughly what the current SP code does,
is for the code processing a shibboleth.xml file to know that it starts with
the shibboleth-targetconfig-1.0.xsd file and simply pass that file to the
parser. However, it then also provides an EntityResolver that processes the
references in the schema's xs:import links. This way a schema file can link
to other schemas, but the schemaLocation in the base XML configuration file
is ignored (because the schema for that namespace has already been loaded by
the parser from the file name the code provided). I am more willing to trust
the links in an XSD file that I provide than the links in an XML file that
the customer wrote.
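
Approach 3 can be sketched with standard JAXP as below. Assumptions: the
schemas sit together in one local directory (schemaDir), and imports are
matched by trailing file name. The names HybridSchemas, resolveLocal and
targetBuilder are hypothetical. The property URIs are the standard JAXP
schemaLanguage and schemaSource properties. The claim that the document's
own schemaLocation is then ignored relies on the parser not reloading a
namespace it already holds, as the mail describes.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class HybridSchemas {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Resolver half: a schemaLocation from an xs:import (any form) is mapped
    // to the file with the same trailing name in schemaDir.
    static InputSource resolveLocal(File schemaDir, String systemId) {
        if (systemId == null || !systemId.endsWith(".xsd")) {
            return null; // not a schema: let the parser handle it normally
        }
        int cut = Math.max(systemId.lastIndexOf('/'), systemId.lastIndexOf('\\'));
        File local = new File(schemaDir, systemId.substring(cut + 1));
        return new InputSource(local.toURI().toString());
    }

    // Code half: only the entry schema is named, and it comes from the code,
    // not from the configuration file's own schemaLocation.
    static DocumentBuilder targetBuilder(File schemaDir) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
        dbf.setAttribute(JAXP_SCHEMA_SOURCE,
            new File(schemaDir, "shibboleth-targetconfig-1.0.xsd"));
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setEntityResolver((publicId, systemId) -> resolveLocal(schemaDir, systemId));
            db.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // report validation errors as failures
                }
            });
            return db;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The design choice matches the trust argument above: the entry point is fixed
in code, while links inside our own XSD files are still followed.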

We used to use approach 1 (EntityResolver), and the Origin may still do so.
The SP is based on approach 3 (hybrid). I would be willing to do the
research and move everything to approach 2 (JAXP_SCHEMA_SOURCE) if that
were the consensus.
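
A rough sketch of what approach 2 could look like, including a hypothetical
helper that automates the "drill down" research by following xs:import
links. The mail assumes this list is assembled by hand, so the closure
helper is an illustration only. PreloadedSchemas, closure and
preloadingBuilder are made-up names. Passing a File[] as the schemaSource
value is allowed by JAXP, which also forbids two array entries with the
same target namespace; that is why xs:include is not followed.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PreloadedSchemas {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Follow xs:import links breadth-first from the entry schema and return
    // every schema file name reached, entry first. Included files share their
    // parent's namespace and are loaded through the parent, so they are skipped.
    static Set<String> closure(String entry, Function<String, InputSource> open) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(entry);
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            while (!pending.isEmpty()) {
                String name = pending.poll();
                if (!seen.add(name)) {
                    continue; // already visited (this also breaks import cycles)
                }
                NodeList imports = db.parse(open.apply(name))
                    .getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "import");
                for (int i = 0; i < imports.getLength(); i++) {
                    String loc = ((Element) imports.item(i)).getAttribute("schemaLocation");
                    if (!loc.isEmpty()) {
                        // Keep only the trailing file name, as for config files.
                        int cut = Math.max(loc.lastIndexOf('/'), loc.lastIndexOf('\\'));
                        pending.add(loc.substring(cut + 1));
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("cannot read schema chain from " + entry, e);
        }
        return seen;
    }

    // Approach 2: every schema is named up front through JAXP_SCHEMA_SOURCE,
    // so no EntityResolver is installed.
    static DocumentBuilder preloadingBuilder(File schemaDir, Set<String> names) {
        File[] files = names.stream()
            .map(n -> new File(schemaDir, n))
            .toArray(File[]::new);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true); // the schema properties apply to a validating parser
        dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
        dbf.setAttribute(JAXP_SCHEMA_SOURCE, files);
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // report validation errors as failures
                }
            });
            return db;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keying schemas by trailing file name mirrors the convention described
earlier, but two unrelated schemas that happen to share a file name would
collide under it.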

DOMParser, Howard Gilbert, 11/19/2004
- RE: DOMParser, Scott Cantor, 11/19/2004
  - RE: DOMParser, Howard Gilbert, 11/19/2004
    - RE: DOMParser, Scott Cantor, 11/22/2004
      - RE: DOMParser, Howard Gilbert, 11/23/2004
- Re: DOMParser, Walter Hoehn, 11/19/2004
