A Proposal for Alignment with HTACG


Jim Derry
Crossposted to
 
This message is addressed to HTML Tidy users, developers, maintainers, and other
interested parties in an effort to spur discussion regarding the present and
future of HTML Tidy, including a proposal for the continued maintenance and
development of HTML Tidy.
 
Simply put, my proposal is that responsibility for the current SourceForge
repository be turned over to HTACG.

That simple statement necessarily calls for a good deal of discussion, and
there is a lot of text below. Some of it will surely please each of you, and
some of it will certainly infuriate some of you. I hope that the "big picture"
of what I'm presenting will encourage you to support the HTACG project and the
opportunities it offers.

(I apologize for the Markdown-like format, but it's very legible and minimizes
the risk of reference mistakes.)


## What is HTACG

On 2015-January-15 I created the HTML Tidy Advocacy Community Group
([HTACG][4]), a [W3C Community Group][5], of which I am currently serving as
Chair. It "is dedicated to the continued support, development, and evolution of
the HTML Tidy command line application and library."

More specifically, it "aims to become the canonical release group for HTML Tidy,
which has been without a stable, public release since 2008. The Community
aspires to achieve the agreement and support of the original and current
developers to this end."

Certainly the above goals cannot be achieved without the cooperation of the
subscribers to this list.

(The above quotes are from our [official description][5]. Although the current
SourceForge repository is regarded as stable by the developers, the _intention_
of the statement is meant to indicate that there have been no _newer_ releases
or bug fixes).

Although HTACG is affiliated with the W3C, it's important to note that the W3C
does not direct HTACG. The community group belongs to the community.

For additional information please see our [HTACG Project Charter][6].


## Meaning of "turned over to HTACG"

The simple proposal "responsibility for the current SourceForge repository be
turned over to HTACG" means that the current maintainers grant access to the
repository to individuals as specified by HTACG. Certainly the current
maintainers are encouraged to affiliate with [HTACG][5] and take part in this
decision process.

The result, publicly, is that HTML Tidy becomes a community-driven,
community-led project. It's even possible that the current maintainers will
come to dominate HTACG, and should this happen then at least:

 - it's a community decision
 - it happens under the auspices of a public-facing organization rather than
   individuals.

Although the decision process for granting access has yet to be
[formally defined][6], it's a high priority for HTACG. In general HTACG members
will reach consensus based on public discussion. This discussion should consider
past and present contributions to HTACG and the HTML Tidy project. Strong regard
should be given to the input of the current Chair or Chairs.


## HTACG Leadership and Succession

As mentioned above, I am the current Chair. This was done for the sake of expediency
in kicking off HTACG. I do not imagine myself to be the "owner" of HTACG, and
the position of Chair is always available to other HTACG members via the
[Community Group Page][5].

The community should expect and desire turnover in the position of Chair. As
such another work in progress is a formal [succession document][6], which will
make provisions for turning over access to repository membership/ownership,
domain names, and other assets of HTACG.

A stable organization should be able to tolerate 100% turnover while remaining
functional.


## Current State of Tidy

HTACG was formed specifically to fill the need for an interested steward for
HTML Tidy. There have been no bug fixes or improvements to the SourceForge
repository in several years and issues go unresolved. Popular operating systems
ship with `tidy` that's not capable of working with HTML5, and popular software
repositories ship with less than capable versions of `tidy`, too.

Additionally, a prominent fork of HTML Tidy hosted by W3C, featuring support
for HTML5, had also grown stagnant, with no commits and no issues addressed for
some years.

In many corners of the Internet there are claims that "Tidy is dead," or "Tidy
is outdated," or "Tidy isn't maintained." These are fair assessments and HTACG
hopes to change both the facts and the perception.

HTACG has successfully [taken responsibility][7] for the aforementioned
prominent W3C fork. Due to a _perceived_ endorsement from [Dave Raggett][8],
HTACG had understood that this fork was the approved, natural successor of the
SourceForge project, and has taken steps with this thought in mind.

Due to incomplete knowledge of some details of HTML Tidy's history we were
unaware of a fracture between the W3C fork and the current SourceForge home. I
sincerely hope that our actions are seen as a sign of motivation and enthusiasm
towards HTML Tidy rather than any attempt to usurp the current project. Indeed,
the future depends on the current project.


## Why not fork?

Open source encourages forking, and there are successful forks of many popular
pieces of software. MariaDB (né MySQL) is a good example of this. Both MariaDB
and MySQL have large installed user bases and a large developer community.
Smaller projects, such as HTML Tidy, aren't as successful at this.

Although HTML Tidy is pervasive, the current developer community is small and
due to lack of maintenance has fractured into scores of personal, private forks.
A lot of these forkers have made improvements (most good, some bad) with high
value for sharing, but without a leader — a known group or organization — these
changes offer value to no one.

Tidy's past reputation is the best reason not to fork. HTACG intends to see
_Tidy_ thrive, not some offshoot that lacks its history. As distasteful as the
word "branding" is to many of us, Tidy is a brand, and it's a brand that
shouldn't be tarnished by withering away and dying.


## HTACG Actions to Date

To date HTACG has achieved the following:

 - Formed on 2015-January-15 ([initial announcement][10]).
 - Assumed control of the W3C fork. (Yes, we now better understand some of the
   circumstances behind the origin of this fork, and are striving to undo the
   damage that resulted).
 - Have setup a draft Project Charter.
 - Have setup the framework for a self-running, community workgroup (WIP).
 - Have reached out with our desire to work with the original maintainers and to
   ask them (you) to support and join our cause.
 - Have closed all but one current pull request in our working branch.
 - Have closed approximately 30 issues in our working branch.
 - Have moved to a modern semantic versioning system.
 - Have begun a new branding initiative.
 - Have promoted the HTML5 capabilities added by Björn.
 - Have put together an HTACG [filler website][4].
 - Have made steps towards a proper [HTML Tidy website][12].

 
## HTACG Tentative Plans

The several subsections below provide high-level details of what HTACG proposes
to do. Our goal is to be community-driven, so some or many of these are likely
to change based on what we collectively decide.

### Branding

"Branding" sounds like MBA nonsense in some people's ears, but branding and
positioning a project are important in order to attract new members to the team
and attract the interest of new developers. Tidy's early reputation was largely
gained through network effects, and while it's possible to leverage a network
effect in the future, Tidy requires a relaunch, and a relaunch requires some
branding.

 - Tidy itself is a brand. It has significant name recognition and is regarded
   as the de facto HTML cleaning tool by a significant userbase even today.

 - W3C is a brand. HTACG's affiliation with W3C as a Community Group lends
   significant credibility to the project without any of the dangers of the
   past. We are now completely aware of the on-again, off-again relationship
   with W3C. As a Community Group there is no danger of that happening again, as
   the primary affiliation is HTACG. HTACG can exist without the W3C if the
   community so decides.
   
 - HTACG itself is capable of becoming a brand. "Who writes Tidy these days?"
   
 - Modernized websites and graphics. If we don't want to be perceived as an
   artifact from 2002, we can't present the image of an artifact from 2002.
   Certainly this is superficial, but the population at large is superficial
   and we can't ignore image these days. It's no longer good enough to say,
   "If what we provide is good, then people will come."
 
 - Modernized communications channels. Similar to the above, there's a large
   element of the population that expects to subscribe to a Twitter feed.
   
In short, a project that _looks_ alive will attract the attention and support
that Tidy needs in order to _stay_ alive.


### Community Resources


#### Repositories

The true HTML Tidy is currently hosted at [SourceForge][9], while the branch
inherited by HTACG from the W3C is working out of [GitHub][7].

While CVS and git both have their advantages and disadvantages, I propose that
in the interest of community development, combined with responsible maintainers,
we adopt GitHub as the official working repository.

If desired we should consider maintaining a mirror of the repository on
SourceForge. Although this subjects us to additional administrative burden,
HTML Tidy has a long history on SourceForge and for many users it is still the
go-to destination for anything Tidy-related.

A mirror also affords an opportunity for the original maintainers to separate
from HTACG if they should determine that they are not satisfied with the
progress that HTACG is promising.


#### Issue Trackers

With the assumption that we work from GitHub, we should close the issue tracker
at SourceForge after migrating the issues to GitHub.


#### Websites

We should combine the existing websites. I have procured the domains htacg.org
and html-tidy.org, and they can be pointed to any arbitrary host. (Please note
that these domains will be surrendered to an appropriate, proper person in line
with our work-in-progress [succession plan][6].)

In consideration of the "branding" issues already described, the cohesive,
single website will be in need of an upgrade.

My proposal includes using GitHub hosting for these websites. Just as for
software projects, this provides the ability for HTACG members and the general
public to issue pull requests and post issues.


#### Mailing Lists

GitHub does not offer mailing list support. This still leaves us with three
main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and
[W3 Tidy][3]), which will be burdensome to monitor and support.

I suggest that we move to the set of HTACG mailing lists.

 - As my suggestion is to move towards GitHub and add distance from
   SourceForge, it is natural not to favor SourceForge's mailing list.

 - The original W3 mailing list has a long history; however, since some members
   have expressed disappointment in W3C's previous behaviors, perhaps it is
   good to distance ourselves.

 - The HTACG list is _also_ hosted at W3C; however, we have more control over
   it, and it provides relevancy to HTACG as an organization.
   
Clearly we as members must be prepared to monitor all of the existing mailing
lists during a transition period.


### Transparency and Working Documents

While debate about specific issues and implementations is suitable for issue
tracker threads, broader discussion of strategy, leadership, working documents,
standards, etc. should go to the appropriate public mailing list, where HTACG
members and non-members alike can provide feedback.

HTACG currently supports a set of working documents (many of which are
generously called "work in progress") in our [community repository][6]. As
a GitHub repository, these very same working documents are subject to community
comment and modification via pull requests.

It is HTACG's intention (abusing the oft-repeated ISO phrase) "to say what we
do and do what we say."

Current (generously-called) works-in-progress include:

 - Project Charter (the high level principles for HTACG)
 - Contributor agreement (so we aren't burdened by proprietary licenses)
 - Chair succession plan (so no one person can hold HTACG hostage)
 - Guidelines for providing commit access (whom do we trust?)
 - Guidelines for design criteria (code style, compiler specifications, etc.)
 - Guidelines for release criteria (when do we roll to "master"?)
 - Guidelines and instructions for regression testing.
 - Policy for accepting pull requests (for contributors and maintainers).
 - Roadmap, including a description of Tidy's versioning (where do we go?)


### Relaunch Branch

A lot of development has been based on the branch derived from Björn Höhrmann's
original patch for HTML5 and then taken by W3C. Although there may be some
design decisions that the current maintainers disagree with, the code is much
more up to date and several important contributions have been added based upon
Björn's work.

Therefore I suggest:

 - We start with the current HTACG develop-500 branch.
 
 - We run regression tests for all of the pre-HTML5 test cases. Successful
   tests (or bug fixes) should satisfy everyone that HTACG Tidy is nominally
   at the same level as SourceForge Tidy.
   
 - All HTACG members are requested to review the code and test cases for the
   new HTML5 functionality, and issues can be posted to the issue tracker if
   they are technical in nature, or posted to the mailing list if they are more
   strategic or fundamental in nature.
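
A minimal sketch of what such a regression run could look like, in Python. The
directory layout (`name.html` beside `name.expected`) and the function names
are assumptions made for illustration, not the layout of Tidy's actual test
suite:

```python
import subprocess
from pathlib import Path


def run_case(command: list[str], source: Path) -> str:
    """Feed one input file to the command on stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=source.read_text(encoding="utf-8"),
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # The exit status is deliberately not checked here: a cleaning tool may
    # exit non-zero for warnings while still writing usable output.
    return result.stdout


def run_regression(command: list[str], cases_dir: Path) -> list[str]:
    """Return the names of cases whose output differs from the expected file."""
    failures = []
    for source in sorted(cases_dir.glob("*.html")):
        # Hypothetical convention: expected output sits next to the input.
        expected = source.with_suffix(".expected").read_text(encoding="utf-8")
        if run_case(command, source) != expected:
            failures.append(source.name)
    return failures
```

Under this assumed layout, `run_regression(["tidy", "-q"], Path("cases"))`
would return the file names whose output no longer matches; an empty list
means no regressions were detected.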
   

### Revision Control History

Contributor history is an important aspect of FOSS development, and every
effort to recognize contributors should be made.

GitHub offers an automatic version control history that records the individual
who made a push, who accepted a pull request, and who originated a pull request.

The current development branch at GitHub did not adequately record the commit
history when it was first forked from SourceForge. However, due to the nature
of git, it seems that it might be possible to pull the SourceForge source while
maintaining its history, and then merge the current branch atop it while
maintaining the entire release history.
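
One way this could be approached is sketched below in Python, purely as an
illustration. It assumes the SourceForge CVS history has already been converted
to a git repository reachable at some URL; the remote name `sourceforge`, the
branch name `with-history`, and the function names are all hypothetical. Only
`develop-500` is taken from this proposal.

```python
import subprocess


def history_merge_plan(history_url: str, history_branch: str,
                       work_branch: str) -> list[list[str]]:
    """Build the git commands that place work_branch atop imported history."""
    remote = "sourceforge"  # arbitrary local name for the imported history
    return [
        ["git", "remote", "add", remote, history_url],
        ["git", "fetch", remote],
        # Start a new branch at the tip of the imported history...
        ["git", "checkout", "-b", "with-history", f"{remote}/{history_branch}"],
        # ...then merge the current work into it. Git 2.9 and later need this
        # flag because the two branches share no common ancestor commit.
        ["git", "merge", "--allow-unrelated-histories", work_branch],
    ]


def apply_plan(plan: list[list[str]], repo_dir: str) -> None:
    """Run each command in order inside repo_dir, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

The final merge is likely to stop with conflicts, since both histories contain
the same files; in that case `apply_plan` raises `CalledProcessError` and the
conflicts would need to be resolved by hand before committing.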


### Tidy History

The purpose of HTACG is, among other things, to keep HTML Tidy alive and well,
and that includes honoring its past. HTACG will ensure that all previous
contributors, maintainers, and participants are prominently recognized on its
websites using material sourced from SourceForge and Dave Raggett's W3C page.


## Summary

As you can see, in the 22 days since establishing HTACG, a lot of thought and
effort have been put into promoting and maintaining HTML Tidy. While it's true
that there is still a lot of work to be done, the framework for good governance
and stewardship has been put into place.

I hope that subscribers to this list can recognize that Tidy needs help in order
to remain relevant, and can grant support for this proposal or a modified form
of this proposal.

Thank you for the significant amount of time you have invested in reading this.


* * * 
 
 References:
  

--
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC

Re: A Proposal for Alignment with HTACG

Alice Wonder


On 02/02/2015 11:48 PM, Jim Derry wrote:
*snip*
>
> In many corners of the Internet there are claims that "Tidy is dead," or
> "Tidy is outdated," or "Tidy isn't maintained." These are fair assessments
> and HTACG hopes to change both the facts and the perception.

I have expressed those concerns on Tumblr.

It has been rather frustrating for me. I use tidy to clean up user input
before importing it into a PHP libxml2/DOMDocument XML DOM node, and the
stagnant state of the libtidy that ships in RHEL is really frustrating.

I have to manually go through the imported node checking every attribute
etc. for things that tidy should catch but doesn't because of the
stagnant state.

I am really glad to see signs that something is being done about this.


>
> Tidy's past reputation is the best reason not to fork. HTACG intends to see
> _Tidy_ thrive, not some offshoot that lacks its history. As distasteful as
> the word "branding" is to many of us, Tidy is a brand, and it's a brand that
> shouldn't be tarnished by withering away and dying.

Tidy should not be forked if at all possible because of libtidy. A lot
of third party tools (like php) have bindings to libtidy, and a fork
could be problematic in that respect. In my opinion it should be avoided
if at all possible.


>
>   - Tidy itself is a brand. It has significant name recognition and is
>     regarded as the de facto HTML cleaning tool by a significant userbase
>     even today.

Yes, even in its stale state, it is the best there is, at least that
exists in the FLOSS world.


> in the interest of community development, combined with responsible
> maintainers, we adopt Github as the official working repository.

I am 100% behind this. git is amazing and github has been a phenomenal
resource. With CVS / SVN I always ended up setting up my own server
which was a pita to maintain but since I started using github I haven't
ever felt a need to run a git server myself, it just works and works
everywhere and works well.

>
> If desired we should consider maintaining a mirror of the repository on
> SourceForge. Although this subjects us to additional administrative burden,
> HTML Tidy has a long history on SourceForge and for many users it is still
> the go-to destination for anything Tidy-related.

sourceforge isn't what they once were, I don't see the point.

>
> A mirror also affords an opportunity for the original maintainers to
> separate from HTACG if they should determine that they are not satisfied
> with the progress that HTACG is promising.

original maintainers can fork github at any point they want from any
revision they want, I don't think maintaining a sourceforge mirror for
that purpose makes sense.

I'm not opposed to sourceforge, I just don't see the point.

>
> My proposal includes using Github hosting for these websites. Just as for
> software projects, this provides the ability for HTACG members and the
> general public to issue pull requests and post issues.
>

Does W3C have hosting that can be used?

>
> #### Mailing Lists
>
> Github does not offer mailing list support. This still leaves us with three
> main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and
> [W3 Tidy][3]), which will be burdensome to monitor and support.
>
> I will make the suggestion that we move to the set of HTACG mailing lists.
>
>   - As my suggestion is to move towards Github and adding distance from
>     SourceForge, it is natural not to favor SourceForge's mailing list.
>   - The original W3 mailing list has a long history, however in that some
>     members have expressed disappointment in W3C's previous behaviors,
>     perhaps it is good to distance ourselves.
>   - The HTACG list is _also_ hosted at W3C, however we have more control
>     over it, and it provides relevancy to HTACG as an organization.
>
> Clearly we as members must be prepared to monitor all of the existing
> mailing lists during a transition period.

The HTACG mailing list would, I think, be the best one. I'm not on it (yet),
but if tidy is getting a fresh injection of activity, I think that would
be good. In my opinion.

For whatever my opinion is worth.

--
-=-
Sent from my laptop, may not be able to respond timely


Re: A Proposal for Alignment with HTACG

Richard A. O'Keefe
In reply to this post by Jim Derry
I use Tidy a lot and have been very pleased with it.
However, I've not been using HTML5 much yet, and
HTML5 has changed the rules to the point where it looks
to me as if a good SGML parser is no longer an alternative
to Tidy.

So I think it's *brilliant* altogether if Tidy gets updated.

>  - Have setup a draft Project Charter.

"to set up" is a phrasal verb (http://en.wikipedia.org/wiki/Phrasal_verb),
which means that the two words should be written as two words.
("setup" is the derived phrasal _noun_.)

>  - Have closed approximately 30 issues in our working branch.

You heroes.

Is there any prospect of "official" Tidy plugins for Brackets and WebStorm?
I got hopelessly lost trying to understand the code of Brackets, or I'd
offer to do it.

> Github does not offer mailing list support. This still leaves us with three
> main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and
> [W3 Tidy][3]), which will be burdensome to monitor and support.
>
> I will make the suggestion that we move to the set of HTACG mailing lists.

I have been subscribed to [hidden email] for a long time.
My experience with such shifts is that somehow my subscription _always_
gets lost.  If subscriptions can be moved over *automatically* with no
losses and no subscriber action required, fine.  If not, you *WILL*
lose people.
>    
>  - The orginal W3 mailing list has a long history, however in that some members
>    have expressed disappointment in W3C's previous behaviors, perhaps it is
>    good to distance ourselves.

Sorry, I don't buy that.  Talk about distancing yourselves from the
organisation with custody over the very format that Tidy is *about* makes
no sense to me.  W3?  Anyone working with Web stuff knows what that is.
HTACG?  What's that?  The W3C may have a poor history of keeping Tidy up to
date, but at this stage it's a *better* brand than HTACG because it *has* a
history behind it.  It is only the W3 brand that will keep HTACG from looking
like just another fork.



Re: A Proposal for Alignment with HTACG

Jim Derry
In reply to this post by Alice Wonder
On Tue, Feb 3, 2015 at 4:43 PM, Alice Wonder <[hidden email]> wrote:
> original maintainers can fork github at any point they want from any revision they want, I don't think maintaining a sourceforge mirror for that purpose makes sense.

There was some concern expressed privately that they might not like the direction we take things, and so this gesture is an olive branch to the current maintainers. They have a lot at stake, and although activity has been low, Tidy is still near and dear to their hearts. If we can get by without the SF repository, so much the better.

>> My proposal includes using Github hosting for these websites. Just as for
>> software projects, this provides the ability for HTACG members and the
>> general public to issue pull requests and post issues.
>>
>
> Does W3C have hosting that can be used?

W3C has a repository on github. The current HTACG repository was actually transferred from there, and it currently redirects to our repository:
    

My understanding is that community groups such as ours can request to move to the w3c/ repository. We haven't done so yet, but that certainly would be in line with our goals.

As for standard hosting, the W3C provides a WordPress-based portal for the group: http://www.w3.org/community/htacg/ -- However, it's very basic and we've not done much with it yet.


--
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC

Re: A Proposal for Alignment with HTACG

Jim Derry
In reply to this post by Richard A. O'Keefe
On Wed, Feb 4, 2015 at 8:46 AM, Richard A. O'Keefe <[hidden email]> wrote:

> >  - Have setup a draft Project Charter.
>
> "to set up" is a phrasal verb (http://en.wikipedia.org/wiki/Phrasal_verb),
> which means that the two words should be written as two words.
> ("setup" is the derived phrasal _noun_.)

As someone who is constantly irritated by such things, that's an embarrassing typo.


> Is there any prospect of "official" Tidy plugins for Brackets and WebStorm?
> I got hopelessly lost trying to understand the code of Brackets, or I'd
> offer to do it.

There's a _prospect_ for anything. Presently the team is focused on trying to release a stable 5.0.0 of HTML Tidy with a single build system. However, there is a lot of interest in other build systems and IDE templates, Perl and PHP bindings, text editor integration, and so on. These are _all_ certainly things that HTACG could be interested in helping to maintain/present/develop as the community grows. I don't know if anyone on the small team right now is familiar with Brackets and WebStorm, so I cannot make any promises.

> I have been subscribed to [hidden email] for a long time.
> My experience with such shifts is that somehow my subscription _always_
> gets lost.  If subscriptions can be moved over *automatically* with no
> losses and no subscriber action required, fine.  If not, you *WILL*
> lose people.

This point is well taken. If it were only my personal decision I suppose I would prefer to use the [hidden email] list, too. In the interest of making a proposal that would be palatable to the current maintainers, my actual suggestion was to distance ourselves from this list.

> >  - The [original] W3 mailing list has a long history, however in that some members
> >    have expressed disappointment in W3C's previous behaviors, perhaps it is
> >    good to distance ourselves.
>
> Sorry, I don't buy that.  Talk about distancing yourselves from the
> organisation with custody over the very format that Tidy is *about* makes
> no sense to me.  W3?  Anyone working with Web stuff knows what that is.

Yes, and that's why HTACG is a W3C Community Group. HTACG is counting on the current maintainers' cooperation, and there is some history between HTML Tidy's current maintainers and the W3C that has caused friction in the past. As I mentioned in an earlier reply to Alice on the list, my own personal preference is to use the current list; however, the proposal is geared towards acceptability to the current maintainers.


> HTACG?  What's that?  The W3C may have a poor history of keeping Tidy up to
> date, but at this stage it's a *better* brand than HTACG because it *has* a
> history behind it.  It is only the W3 brand that will keep HTACG from looking
> like just another fork.

HTACG is a convenient label to represent a community group that is "working with the W3C," but isn't the W3C. We cannot claim to be the W3C, and the W3C long ago expressed its lack of interest in maintaining Tidy. However, as you rightly point out, it's currently *only* this W3C affiliation (including the repository redirect) that gives HTACG any credibility whatsoever. Given how simple it is to create a community group, it's even arguable that the credibility is minimal, which makes it critical for our goals to work with the current SourceForge maintainers.


--
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC