Need help with TIDY Configuration File



Nilesh Chavan

Hello,

 

I’m using tidy to create well-formed HTML output from a loosely structured HTML file. The HTML files are missing many closing tags. Here’s my sample HTML input:

 

HTML input:

<p class="0">A

<p class="1"><em class="bf">ACCOUNTING BASIS</em>

<p class="2">Taxation, <cite class="section">3.3.3

<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3

<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em>

<p class="2">Taxation, <cite class="section">3.3.5

<p class="1"><em class="bf">ANCILLARY SERVICES</em>

<p class="2">Reimbursement

<p class="3">Payment methodology

<p class="4">Covered ancillary services, <cite class="section">5.1.2.2

<p class="1"><em class="bf">ANESTHESIOLOGY</em>

<p class="2">Anti-kickback statute

<p class="3">Case law and other guidance, <cite class="section">2.4.6.4

 

I’ve defined the following parameters in the tidy.config file:

 

Config File:

 

add-xml-decl:true

#output-xhtml:true

doctype:omit

hide-comments:yes

preserve-entities:yes

uppercase-tags:0

# DO NOT specify input encoding here unless it never,ever changes.

output-encoding:utf8

word-2000:false

# bare: replaces nbsps with regular spaces as a side-effect

# these nbsps are needed for clues so bare should be left false.

bare:true

enclose-text:yes

numeric-entities:yes

# clean: strips surplus tags from ms word originating docs.

# clean consolidates similar styles and uses references to them.

# trades document size for ease of parsing it -- leave this false.

clean:true

hide-comments:true

# wrap: zero if you want to disable line wrapping

wrap:0

# quote-nbsp: output non-breaking space characters as entities

quote-nbsp:false

show-warnings:false

#
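For anyone reproducing this, a run of tidy against the config above could be scripted roughly as follows. This is only a sketch in Python: the helper names `tidy_command` and `run_tidy` and the file names `input.html` and `output.html` are placeholders for illustration, not the actual setup behind this post. It assumes the `tidy` executable is on the PATH and relies on tidy's `-config`, `-o` and `-q` command-line options.

```python
import subprocess

def tidy_command(config_path, input_path, output_path):
    """Build the argument list for one tidy run (nothing is executed here)."""
    return [
        "tidy",
        "-q",                     # quiet: suppress nonessential output
        "-config", config_path,   # read options from the config file
        "-o", output_path,        # write the cleaned markup to this file
        input_path,
    ]

def run_tidy(config_path, input_path, output_path):
    """Run tidy and return (exit status, diagnostics).

    tidy exits with 0 when all went well, 1 if there were warnings
    and 2 if there were errors.
    """
    result = subprocess.run(
        tidy_command(config_path, input_path, output_path),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling `run_tidy("tidy.config", "input.html", "output.html")` would then write the cleaned markup to the (hypothetical) `output.html`.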

 

 

My output looks like this:

 

<p class="0">A</p>

<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>

<p class="2">Taxation, <cite class="section">3.3.3</cite></p>

<p class="1"><cite class="section"><em class="bf">ACCRUAL BASIS

ACCOUNTING,</em> <cite class="section">3.3.3</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">AFFILIATED

SERVICES GROUPS</em></cite></p>

<p class="2"><cite class="section">Taxation, <cite class=

"section">3.3.5</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">ANCILLARY

SERVICES</em></cite></p>

<p class="2"><cite class="section">Reimbursement</cite></p>

<p class="3"><cite class="section">Payment methodology</cite></p>

<p class="4"><cite class="section">Covered ancillary services,

<cite class="section">5.1.2.2</cite></cite></p>

<p class="1"><cite class="section"><em class="bf">ANESTHESIOLOGY</em></cite></p>

<p class="2"><cite class="section">Anti-kickback statute</cite></p>

<p class="3"><cite class="section">Case law and other guidance,

<cite class="section">2.4.6.4</cite></cite></p>

 

As you can see, unwanted <cite> tags are being added to the data.
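The problem can also be checked mechanically rather than by eye. The following is a minimal sketch in Python, assuming that a <cite> nested inside another <cite> never legitimately occurs in this data; the names `NestedCiteFinder` and `find_nested_cites` are made up for illustration. It is meant to be run on tidy's output, where every tag is closed.

```python
from html.parser import HTMLParser

class NestedCiteFinder(HTMLParser):
    """Record the position of every <cite> opened inside another <cite>."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of <cite> elements currently open
        self.nested_at = []   # (line, column) of each nested <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            if self.depth > 0:
                self.nested_at.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self.depth > 0:
            self.depth -= 1

def find_nested_cites(html):
    """Return the positions of nested <cite> start tags in html."""
    finder = NestedCiteFinder()
    finder.feed(html)
    finder.close()
    return finder.nested_at
```

An empty result means no <cite> was wrapped inside another one.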

 

I want the output to appear as follows:

 

Required output:

 

<p class="0">A</p>

<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>

<p class="2">Taxation, <cite class="section">3.3.3</cite></p>

<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3</cite></p>

<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em></p>

<p class="2">Taxation, <cite class="section">3.3.5</cite></p>

<p class="1"><em class="bf">ANCILLARY SERVICES</em></p>

<p class="2">Reimbursement</p>

<p class="3">Payment methodology</p>

<p class="4">Covered ancillary services, <cite class="section">5.1.2.2</cite></p>

<p class="1"><em class="bf">ANESTHESIOLOGY</em></p>

<p class="2">Anti-kickback statute</p>

<p class="3">Case law and other guidance, <cite class="section">2.4.6.4</cite></p>
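To make the requirement concrete: the required output amounts to closing each open <cite> at the end of its own paragraph, before the next <p> starts. The sketch below shows that transformation as a preprocessing step in Python, applied to the raw input before tidy sees it. It is a workaround outside tidy, not a tidy option, and it assumes that no <cite> is ever meant to span more than one paragraph; the function name `close_open_cites` is hypothetical.

```python
import re

def close_open_cites(html):
    """Append the missing </cite> tags to each paragraph chunk."""
    # Split in front of every <p ...> start tag, keeping the tag with its chunk.
    chunks = re.split(r"(?=<p[\s>])", html)
    fixed = []
    for chunk in chunks:
        opens = len(re.findall(r"<cite\b", chunk))
        closes = len(re.findall(r"</cite\s*>", chunk))
        missing = opens - closes
        if missing > 0:
            # Insert the closing tags before any trailing whitespace.
            body = chunk.rstrip()
            trailing = chunk[len(body):]
            chunk = body + "</cite>" * missing + trailing
        fixed.append(chunk)
    return "".join(fixed)
```

With every <cite> already closed, tidy should only have the missing </p> tags left to supply.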

 

Please advise what changes to the config file would produce the required output above.

 

Thanks in advance for your help!

 

 

Regards,

Nilesh Chavan.
