Tidy and removing attributes


Tidy and removing attributes

EL-HAID Nabil

Hello,

I am a student at a French university, working on a project in
association with France Telecom R&D. The aim of the project is to extract
information from stock-exchange ("bourse") web sites.
I ran into a problem during development: I don't know how to remove
all of the tags' attributes when I transform my input stream into an XML file
using Tidy. (I want to remove them because I work with an XSLT processor.)
For example, the input stream looks like this:

<body bgcolor="FFFFFF" bgsound="toto.wav"> ..... </body>

and output.xml should look like this:

<body> ..... </body>

My question is: how can I use Tidy to reach my goal?

Any help would be greatly appreciated.

Thank you




Re: Tidy and removing attributes

Bjoern Hoehrmann

* EL-HAID Nabil wrote:
><body bgcolor="FFFFFF" bgsound="toto.wav"> ..... </body>
>
>and output.xml should look like this:
>
><body> ..... </body>
>
>My question is: how can I use Tidy to reach my goal?

There is --drop-proprietary-attributes; if that does not help, you'll
have to write e.g. an XSLT document that removes the attributes you want
removed.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/


Re: Tidy and removing attributes

Christopher Woods

"Bjoern Hoehrmann" <[hidden email]> wrote in message
news:[hidden email]...

>
> * EL-HAID Nabil wrote:
>><body bgcolor="FFFFFF" bgsound="toto.wav"> ..... </body>
>>
>>and output.xml should look like this:
>>
>><body> ..... </body>
>>
>>My question is: how can I use Tidy to reach my goal?
>
> There is --drop-proprietary-attributes; if that does not help, you'll
> have to write e.g. an XSLT document that removes the attributes you want
> removed.

You can use the Tidy API to load and clean the source document and then iterate
over the document tree, outputting the nodes (elements) and text content.
This would give you complete control over which (if any) attributes you want
to output.
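
The tree-walking idea above can be sketched roughly as follows. This is not
the Tidy API itself (TidyLib is a C library). As a stand-in, it assumes Tidy
has already written well-formed XML, and it uses Python's standard
xml.etree.ElementTree module to walk the tree. The function name
strip_attributes and its keep parameter are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def strip_attributes(xml_text, keep=()):
    """Return xml_text with every attribute removed, except names listed in keep.

    Hypothetical helper: assumes xml_text is well-formed XML, such as the
    output of Tidy run in XML/XHTML output mode.
    """
    root = ET.fromstring(xml_text)
    # iter() visits the root element and all of its descendants.
    for element in root.iter():
        # Keep only the attributes the caller explicitly asked for.
        kept = {name: value for name, value in element.attrib.items()
                if name in keep}
        element.attrib.clear()
        element.attrib.update(kept)
    return ET.tostring(root, encoding="unicode")


sample = '<body bgcolor="FFFFFF" bgsound="toto.wav"><p class="x">text</p></body>'
print(strip_attributes(sample))                   # all attributes dropped
print(strip_attributes(sample, keep=("class",)))  # only class survives
```

One caveat with this stand-in: if the Tidy output declares the XHTML
namespace, ElementTree serializes element names with a generated prefix
unless the namespace is registered first with ET.register_namespace.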