Regarding Tidy and XQUERY

Regarding Tidy and XQUERY

Sunil Sharma
Hi,
 
Can we use Tidy and XQuery to extract information from an HTML page that has a lot of JavaScript functions?
 
I am trying to extract information with these tools, but without any success. When I pass the subsequent URLs of the website to the program, the XML file it creates is always the one for the home page.
 
If you have an answer, please reply. If possible, please also provide some sample code that I can look into.
 
Thanks in advance
 
Regards
 
Sunil Sharma
Software Engineer

Re: Regarding Tidy and XQUERY

Bjoern Hoehrmann

* Sunil Sharma wrote:
>Can we use tidy and xquery to extract information from an html page which
>has a lot of javascript functions.

You can use Tidy to turn HTML-like documents into well-formed XHTML
documents and then run XQuery on the result as on any other XML
document. The only special thing here is that the elements will be in
the XHTML namespace, which you have to declare in the query if you use
element names in it. I'm not sure how JavaScript is relevant here:
Tidy won't execute any script. If you need the page as it looks after
its scripts have run, you might want to use one of the web browsers to
convert to XHTML (you'd inject a script that serializes the document
when it is in the state you are interested in).
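
Since sample code was asked for, here is a minimal sketch of the
Tidy-then-query pipeline. It is in Python rather than XQuery so that it
runs with the standard library alone; ElementTree's limited XPath
stands in for XQuery here. The helper names (tidy_to_xhtml,
extract_links), the sample document and the choice of Tidy options are
my own assumptions, and tidy_to_xhtml assumes the tidy command-line
tool is installed and on the PATH. The part that carries over to
XQuery is the namespace binding.

```python
import subprocess
import xml.etree.ElementTree as ET

# Tidy's XHTML output places every element in this namespace.
# Rough XQuery counterpart of the prefix binding used below:
#   declare namespace h = "http://www.w3.org/1999/xhtml";
#   //h:a
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def tidy_to_xhtml(html: str) -> str:
    """Convert HTML to XHTML with the `tidy` CLI (assumed to be on PATH)."""
    result = subprocess.run(
        # -numeric avoids named entities such as &nbsp;, which a plain
        # XML parser cannot resolve without the DTD.
        ["tidy", "-asxhtml", "-numeric", "-utf8", "-quiet",
         "--show-warnings", "no"],
        input=html, capture_output=True, text=True, encoding="utf-8",
    )
    # Tidy exits with 1 when it only issued warnings; 2 means errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (href, text) for every <a> element in an XHTML document."""
    root = ET.fromstring(xhtml)
    # Without the "h:" prefix this path would match nothing, because
    # the elements live in the XHTML namespace.
    return [
        (a.get("href", ""), "".join(a.itertext()).strip())
        for a in root.iterfind(".//h:a", XHTML_NS)
    ]


# Hypothetical stand-in for what Tidy would emit for a small page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title>
    <script type="text/javascript">function f() { return 1; }</script>
  </head>
  <body>
    <p><a href="/page1.html">Page one</a></p>
    <p><a href="/page2.html">Page <b>two</b></a></p>
  </body>
</html>"""

links = extract_links(SAMPLE)
```

For a real page you would call extract_links(tidy_to_xhtml(html)) on
the HTML you fetched for each URL. Note that the script element above
is only carried along as text; nothing in this pipeline executes it.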
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/