perfsonar-dev - Re: [pS-dev] Alternative, fast XML parser
- From: Jochen Reinwand
- Subject: Re: [pS-dev] Alternative, fast XML parser
- Date: Tue, 10 Jul 2007 16:36:04 +0200
- Organization: DFN Verein
Hi all,
Some news on XML performance issues:
Since we had severe performance problems with our Perl perfSONAR services
(which normally occur only when perfSONAR-UI and CNM connect to the Hades MA),
we did quite a lot of performance testing over the last few days.
It didn't take long to find a lot of small bottlenecks: some minor "real"
bugs, and some other small pieces that we reimplemented differently to gain
better performance. Most of these improvements shortened the execution of the
problematic requests by a few minutes.
Just to give you an impression of what I'm talking about: David sends a
request to our MA in order to get the current data for all our measurements.
That makes about 3000 data triggers to be filled with data. Before our
efforts to increase performance, answering this request took hours! The
smaller problems I pointed out above accounted for only a few minutes of
that.
The _real_ bottleneck was quite a shock for me:
We are using the Perl interface to the standard Open Source XML library
libxml. One important step for creating a response to an MA request is to
find the data trigger for a given data id. After finding the trigger you have
to add the data. For the request David sends to our MA, around 3000 data
triggers have to be filled. OK, so creating that much XML content is the
problem? No! In our case it was the step of finding the trigger.
Let me show you some (pseudo) code. Both snippets do (nearly) the same thing.
The first is our original solution using standard DOM methods. The second is
the new version using the XPath based "find" function of XML::LibXML.
---
# "ns_nmwg" stands in for the NMWG namespace URI here
foreach my $datanode ($dom->getElementsByTagNameNS("ns_nmwg", "data")) {
    next unless $datanode->getAttribute("id") eq $dataid;
    ...
}
---
# find() returns an XML::LibXML::NodeList for this expression; take its first node
$datanode =
    $dom->find("/ns_nmwg:message/ns_nmwg:data[\@id='$dataid']")->get_node(1);
if (defined $datanode && $datanode->isa("XML::LibXML::Element")) {
    ...
}
---
Unbelievable that this small change makes the difference between a runtime of
hours and a runtime of minutes!!
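For anyone who wants to try the comparison themselves, here is a minimal,
self-contained sketch of the two lookups. It is not our service code: the
namespace URI, the "nmwg" prefix, the ids d1..dN and the sub names are made up
for illustration, and it uses an XML::LibXML::XPathContext with findnodes()
(the list-returning sibling of find()) so that the prefix is registered
explicitly. The default of 500 elements is only there to keep the run short;
pass 3000 as an argument to get closer to the request described above. How big
the gap turns out will depend on your machine and module versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use Time::HiRes ();

# Assumed namespace URI, for illustration only.
my $NS = 'http://example.org/nmwg/';

# Build a test message with $n empty data elements (ids d1 .. d$n).
sub build_message {
    my ($n) = @_;
    my $xml = qq{<nmwg:message xmlns:nmwg="$NS">};
    $xml .= qq{<nmwg:data id="d$_"/>} for 1 .. $n;
    $xml .= '</nmwg:message>';
    return XML::LibXML->load_xml(string => $xml);
}

# Variant 1: walk all data elements in Perl and compare the ids.
sub find_by_scan {
    my ($doc, $id) = @_;
    foreach my $node ($doc->getElementsByTagNameNS($NS, 'data')) {
        return $node if ($node->getAttribute('id') // '') eq $id;
    }
    return;
}

# Variant 2: let the XPath engine of libxml select the element.
# Note: $id is interpolated as-is, so ids containing a double quote
# would break the expression.
sub find_by_xpath {
    my ($xpc, $id) = @_;
    my ($node) = $xpc->findnodes(qq{/nmwg:message/nmwg:data[\@id="$id"]});
    return $node;
}

my $n   = shift(@ARGV) // 500;
my $doc = build_message($n);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('nmwg', $NS);

# Look up every id once with each variant and report the elapsed time.
for my $variant ([scan  => sub { find_by_scan($doc, $_[0]) }],
                 [xpath => sub { find_by_xpath($xpc, $_[0]) }]) {
    my ($name, $code) = @$variant;
    my $start = Time::HiRes::time();
    my $found = grep { defined $code->("d$_") } 1 .. $n;
    printf "%-5s found %d of %d in %.2f s\n",
        $name, $found, $n, Time::HiRes::time() - $start;
}
```

Run it as "perl lookup.pl 3000" (the file name is arbitrary). Both variants
have to look at every data element in the worst case; the difference is
whether that loop runs in Perl, creating a Perl object per node, or inside
the C library.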
Further lessons learned from our performance tests:
- It looks like libxml is _really_ fast, even faster than Perl code optimised
specifically for perfSONAR. Therefore: We are using XML::LibXML and we
are not planning to change that ;-)
- The data triggers make things quite slow...
regards,
Jochen
On Tuesday 19 June 2007 10:38, Jochen Reinwand wrote:
> Hi all,
>
> I just stumbled across an interesting XML Parser that also has Java
> bindings: http://vtd-xml.sourceforge.net/
>
> From the homepage:
>
> - VTD-XML is ideally suited for building SOA applications
> - The world's most memory-efficient (1.3x~1.5x the size of an XML document)
> random-access XML parser.
> - The world's fastest XML processor: On an Athlon64 3400+ PC, VTD-XML
> significantly (1.5x~2x) outperforms SAX parsers with NULL content handler,
> delivering 50~60 MB/sec sustained throughput, without sacrificing random
> access.
>
> To sum up: It's perfectly suited for large XML files and perfSONAR often
> has to deal with large XML files...
>
> regards,
> Jochen
--
Jochen Reinwand, DFN-Labor
Friedrich-Alexander-Universität Erlangen-Nürnberg
Regionales RechenZentrum Erlangen (RRZE)
Martensstraße 1, 91058 Erlangen, Germany
Tel. +49 9131 85-28689, -28800, Fax +49 9131 302941
www.win-labor.dfn.de