I am trying to extract information using these, but without any success. When I pass the URLs of the website's other pages to the program, it always creates the XML file for the home page instead of the page I asked for.
If you have an answer, please reply. If possible, please also provide some sample code so that I can look into it.
* Sunil Sharma wrote:
>Can we use tidy and xquery to extract information from an html page which
You can use Tidy to turn HTML-like documents into well-formed XHTML
documents and then use XQuery on them as on any other XML document. All
that might be special here is that the elements would be in the XHTML
namespace, which you have to specify in the query if you use element
execute any script. If you need that, you might want to use one of
the web browsers to convert to XHTML (you'd inject a script that
serializes the document when it is in the state you are interested in).
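To illustrate the namespace point: once Tidy has produced XHTML, every element is named in the http://www.w3.org/1999/xhtml namespace, so a query for plain `a` finds nothing. In XQuery you would declare the namespace in the prolog (for example `declare default element namespace "http://www.w3.org/1999/xhtml";`). The sketch below shows the same effect with Python's standard `xml.etree.ElementTree` instead of an XQuery processor. It assumes the page has already been run through Tidy; `TIDIED` is a hypothetical hand-written sample of such output, and `extract_links`/`naive_links` are illustrative names only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample of Tidy's XHTML output; the key detail is the
# default xmlns on <html>, which puts every element in the XHTML namespace.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body>
    <p>See <a href="/about">About</a> and <a href="/contact">Contact</a>.</p>
  </body>
</html>"""


def naive_links(xhtml_text):
    """Query without the namespace: matches nothing, because the elements
    are really named {http://www.w3.org/1999/xhtml}a, not plain 'a'."""
    root = ET.fromstring(xhtml_text)
    return root.findall(".//a")


def extract_links(xhtml_text):
    """Return (href, link text) pairs, binding prefix 'h' to the XHTML namespace."""
    root = ET.fromstring(xhtml_text)
    ns = {"h": XHTML_NS}
    return [(a.get("href"), "".join(a.itertext()))
            for a in root.iterfind(".//h:a", ns)]


if __name__ == "__main__":
    print(naive_links(TIDIED))    # -> []
    print(extract_links(TIDIED))  # -> [('/about', 'About'), ('/contact', 'Contact')]
```

The only change between the two functions is the namespace binding, which is the same thing the XQuery prolog declaration does for element names in path expressions.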
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Phone: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/