How can I prevent XML::XPath from fetching a DTD while processing an XML file?

My XML (a.xhtml) starts like this

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...

My code starts like this

use XML::XPath;

use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => "a.xhtml");

my $nodeset = $xp->find('/html/body//table'); 

It's very slow, and it turns out that it spends a lot of time getting the DTD (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd).

Is there a way to explicitly declare an HTTP proxy server in the Perl XML:: family? I hate to modify the original a.xhtml document like having a local copy of the DTD.

Answers


XML::XPath is based on XML::Parser. There is an option in XML::Parser to NOT use LWP to resolve external entities (such as DTDs). And XML::XPath lets you pass an XML::Parser objetc, to use as the parser.

So you can write this:

my $p = XML::Parser->new( NoLWP => 1);
my $xp= XML::XPath->new( parser => $p, filename => "a.xhtml");

Note that in this case you will loose all entities except numerical ones and the default ones (>, <, &, ' and "). The parser will not complain, but they will disappear silently (try including &alpha; in the table and printing it for example).

As a matter of fact you probably should not use XML::XPath, which is not actively maintained.

Try XML::LibXML, if you have no problem with installing libxml2, its interface is very similar to XML::XPath as they both implement the DOM. XML::LibXML is also much more powerful than XML::XPath, and faster to boot. If you want an expat/XML::Parser based module, they you might want to have a look at XML::Twig (that's blatant self-promotion as I am the author of the module, sorry). Also for HTML/dodgy XHTML, you can use HTML::TreeBuilder, which, with the addition of HTML::TreeBuilder::XPath (also by me), supports XPath.


Need Your Help

iPhone iOS how to programmatically check for characters like ✭ ♦ (and others from this set)?

iphone ipad ios6 ascii nscharacterset

I know that both ✭star and ♦ diamond are from the ASCII extended character set. But is there some NScharacterSet available on iOS that I can use to check for characters like these programmatically?

Should I use HTML5 Canvas or CSS3 Sprites to animate objects for a game?

animation html5 canvas

I'm in the process of starting to build a strategy game (think warcraft) for the web. I've been doing research on HTML5 Canvas and CSS3 sprites and still can't decide which technology to use.

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.