removing dom nodes

folkert
Hi,

While iterating over the HTML document, I would like to remove some nodes.
How can I accomplish this? I don't think I saw anything in the API for
that.


Folkert van Heusden


Re: removing dom nodes

Marvin Reimer
Removing acceptable DOM nodes is not the task of html-tidy. You need an HTML parser for that, run after html-tidy has done its work.

Marvin Reimer

2015-07-12 13:25 GMT+02:00 folkert <[hidden email]>:
> Hi,
>
> While iterating the html-document, I would like to remove some nodes.
> How can I accomplish this? I don't think I saw anything in the api for
> that.
>
> Folkert van Heusden

Re: removing dom nodes

folkert
Then let's extend tidy so that it is not only an HTML purifier but a DOM
parser as well.

On Wed, Jul 15, 2015 at 11:11:41AM +0200, Marvin Reimer wrote:

> Removing acceptable DOM nodes is not the task of html-tidy. You need a HTML
> parser for that after doing html-tidy.

Folkert van Heusden

--
www.vanheusden.com/multitail - win a vlaai from multivlaai! Make sure
multitail gets included in Fedora Core, AIX, Solaris or HP/UX and win
a vlaai of your choice
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

Re: removing dom nodes

folkert
This is confusing: the libtidy on GitHub has actually merged my patch
that provides this DOM node removal.

What is your role in the GitHub versus SourceForge versions?

Re: removing dom nodes

Marvin Reimer
So why are you asking for that feature when you fixed it yourself??

Source link?

HTML Tidy on SourceForge is the original one (HTML4). It is no longer maintained as far as I know.
https://github.com/htacg/tidy-html5 is the current one, forked from the W3C version and now under htacg.

I'm not involved in either one; I'm just a heavy user.

2015-07-15 11:35 GMT+02:00 folkert <[hidden email]>:
> This is confusing: the libtidy on github actually merged my patch for
> providing that dom-removing.
>
> What is your role in the github versus sourceforge versions?

Re: removing dom nodes

folkert
> So why are you asking for that feature when you fixed it yourself??

I would like it to be in the main distribution.
That's because the product I'm developing would be using the adapted
libtidy (an open source product).

> Source link?

> html tidy on sourceforge is the original one (HTML4). It is no longer
> maintained as far as I know.
> https://github.com/htacg/tidy-html5 is the new actual one, forked from w3c
> and now htacg.

Right. That's the one with my change.

> I'm not involved in either of one, just a heavy user.


Folkert van Heusden

Re: removing dom nodes

Marvin Reimer
I'm sure this is not possible. Look at the source code tree; the last changes to the old libtidy went in a really long time ago:
http://tidy.cvs.sourceforge.net/viewvc/tidy/tidy/src/

You should open a GitHub issue on tidy-html5 and ask them.

Marvin

2015-07-15 11:49 GMT+02:00 folkert <[hidden email]>:
> I would like it to be in the main distribution.
> That's because the product i'm developing would be using the adapted
> libtidy (an open source product).

Re: removing dom nodes

folkert
It is in the GitHub version.

You're responding to a mail I sent before I submitted the patch to them.

On Wed, Jul 15, 2015 at 11:54:12AM +0200, Marvin Reimer wrote:

> I'm sure this is not possible. Look at the source code tree. It is really
> long ago the last changes got into the old libtidy:
> http://tidy.cvs.sourceforge.net/viewvc/tidy/tidy/src/
>
> You should open a github issue on tidy-html5 and ask them.
>
> Marvin


Folkert van Heusden

Re: removing dom nodes

folkert
Praveen,

In the e-mail you received when subscribing to this mailing list, you
can find instructions on how to unsubscribe.

On Wed, Jul 15, 2015 at 03:52:29PM +0530, Praveenkumar [SHLOKLABS] wrote:

> Remove my email id

Folkert van Heusden

Re: removing dom nodes

Geoff McLane
I see 'Tidy!' as two products.

There is libTidy which is a library of services exposed
through an API, tidy.h...

Then there is the console app 'tidy', and as Marvin correctly points
out "Removing acceptable DOM nodes is not the task of html-tidy.".
That is this console app tidy... it is only a 'cleaner'...

But Folkert found that libTidy does have an internal "remove
node", "remove attribute", ... services, and simply exposed that
capability as an API extension of the library...

This in no way changes what 'tidy' (the console app) does, and
does well...

But it extends the use of libTidy for those who link it into their
own app... and would welcome more extensions from anyone
using libTidy... what do you want it to do?

It is a fast html parser, and builds a tree of nodes... that
document tree can be enumerated node by node, and now
you can, if you want, discard an element or an attribute of an
element from that tree... before say writing it to a file/buffer...
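
As a rough illustration of the enumerate-and-discard idea described above, here is a minimal sketch in C. It deliberately uses a toy node type, not libTidy's own: `Node`, `node_new`, `node_append`, `node_discard`, `remove_named` and `node_count` are all hypothetical names made up for this example. The point it tries to show is the one detail that matters when removing during iteration: fetch the next sibling before the current node is freed, and continue from there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a parsed document tree; NOT libTidy's real node type. */
typedef struct Node {
    const char *name;          /* element name, e.g. "p" or "script" */
    struct Node *parent;
    struct Node *child;        /* first child */
    struct Node *next;         /* next sibling */
} Node;

Node *node_new(const char *name) {
    Node *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->name = name;
    return n;
}

/* Append child at the end of parent's child list; returns child. */
Node *node_append(Node *parent, Node *child) {
    child->parent = parent;
    if (!parent->child) {
        parent->child = child;
    } else {
        Node *last = parent->child;
        while (last->next) last = last->next;
        last->next = child;
    }
    return child;
}

/* Free a node together with its whole subtree (siblings are untouched). */
static void node_free(Node *n) {
    Node *c = n->child;
    while (c) {
        Node *next = c->next;
        node_free(c);
        c = next;
    }
    free(n);
}

/* Unlink n from its parent, free it, and return the sibling that
 * followed it, so the caller can carry on iterating safely. */
Node *node_discard(Node *n) {
    assert(n != NULL);
    Node *next = n->next;      /* remember this BEFORE freeing n */
    Node *p = n->parent;
    if (p) {
        if (p->child == n) {
            p->child = next;
        } else {
            Node *prev = p->child;
            while (prev->next != n) prev = prev->next;
            prev->next = next;
        }
    }
    node_free(n);
    return next;
}

/* Walk everything below parent and discard each element called name.
 * Returns how many elements were discarded (their subtrees go with them). */
int remove_named(Node *parent, const char *name) {
    int removed = 0;
    Node *c = parent->child;
    while (c) {
        if (strcmp(c->name, name) == 0) {
            c = node_discard(c);   /* c is now the next sibling */
            removed++;
        } else {
            removed += remove_named(c, name);
            c = c->next;
        }
    }
    return removed;
}

/* Count the nodes in a subtree, including its root. */
int node_count(const Node *n) {
    int total = 1;
    for (const Node *c = n->child; c; c = c->next) total += node_count(c);
    return total;
}
```

With the GitHub version of libTidy the same loop shape should apply: enumerate the children and, in place of the hypothetical `node_discard`, call the discard function exposed in tidy.h. It appears to be named `tidyDiscardElement` and to return the next node, but check your header to confirm before relying on that.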

Simple... and done...

Regards,
Geoff.


On 15/07/15 12:11, folkert wrote:

> It is in the github version.
>
> You're responding to a mail from before the submit to them.