XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

Dan Connolly

Mimasa, Shane,

I'm interested in a form of extensibility where a markup
language designer can make a new my:box element and
say "it's an HTML block element"; then, when a
document containing a my:box element is checked
for syntactic happiness, the checking tool uses
normal HTML schemas until it gets to my:box; then
it looks up my:box in the web, finds that it's
declared to be an HTML block, and finds that
an HTML block is allowed here, and carries on happily.

The TAG discussed this in Vancouver in October
http://www.w3.org/2001/tag/2006/10/04-tagmem-minutes#item05
It came up again yesterday in a discussion of RDFa
(in discussion of RDFinXHTML-35) and relates to recent
discussions of TagSoupIntegration-54.
http://www.w3.org/2001/tag/2007/02/12-tagmem-minutes.html#item02
http://www.w3.org/2001/tag/2007/02/05-tagmem-minutes#item04

XML Schema substitution groups are designed for this use case.
Legend has it you tried to use them in XHTML modularization but
it didn't work out or something. We're interested to know the
whole story.
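To pin the use case down, here is a minimal XSD sketch of the host side. It assumes, hypothetically, that XHTML declared an abstract head element html:block and referred to it from content models such as div's. This is an illustration, not an excerpt from the XHTML Modularization schemas, and the type name Flow.type is made up.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- Abstract head: never appears in instances itself; only members of
       its substitution group do. With no type given, its type is anyType. -->
  <element name="block" abstract="true"/>

  <!-- "An HTML block is allowed here": the content model names only
       the head, never the individual members. -->
  <complexType name="Flow.type" mixed="true">
    <sequence>
      <element ref="html:block" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>

  <!-- Built-in members of the group (p simplified to plain text). -->
  <element name="div" type="html:Flow.type" substitutionGroup="html:block"/>
  <element name="p" type="string" substitutionGroup="html:block"/>
</schema>
```

A new member such as my:box then only has to name html:block as its head; the host schema itself does not change.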

Shane, I understand you have some worked examples of XML Schemas
somewhere in this neighborhood?

When I was working on XML Schema, I convinced myself with some
examples that this sort of modularization works.
  http://www.w3.org/XML/2000/04schema-hacking/


I'm also interested in whether CDF/WICD can/should use substitution
groups.
http://www.w3.org/TR/WICD/


p.s. for reference... public-xml-versioning is a list that comes
out of joint TAG/XML Schema WG discussions of XML versioning.
http://lists.w3.org/Archives/Public/public-xml-versioning/

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E


Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

Chris Lilley

Hello public-xml-versioning,

Dan wrote:
> I'm interested in a form of extensibility where a markup
> language designer can make a new my:box element and
> say "it's an HTML block element"; then, when a
> document containing a my:box element is checked
> for syntactic happiness, the checking tool uses
> normal HTML schemas until it gets to my:box; then
> it looks up my:box in the web, finds that it's
> declared to be an HTML block, and finds that
> an HTML block is allowed here, and carries on happily.

That's interesting, but it seems to assume a top-down model where
extensions are tightly bound to their expected environment. What if I
want to use my:box inside Timed Text, or inside SVG?

> XML Schema substitution groups are designed for this use case.
> Legend has it you tried to use them in XHTML modularization but
> it didn't work out or something. We're interested to know the
> whole story.

I recall that Mark Birbeck demonstrated jumping through a large number
of hoops (some in the spec, and some in implementations) to show how to
do this in W3C XML Schema.

As a result of that, more groups became interested in using RelaxNG
(nowadays, with Schematron annotations in the RelaxNG and with NVDL to
do the dispatching).
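For comparison, the NVDL style of dispatching can be sketched as a script that hands each namespace section of a compound document to its own RELAX NG grammar. This is an illustration, not one of the CDF/WICD files; xhtml.rng and mine.rng are hypothetical schema names, and http://example.com/mine is simply an example namespace for my:box.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Sections in the XHTML namespace go to the XHTML grammar. -->
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <!-- Sections in the extension namespace (e.g. my:box) go to their own grammar. -->
  <namespace ns="http://example.com/mine">
    <validate schema="mine.rng"/>
  </namespace>
</rules>
```

What the XHTML grammar sees at the point where the my:box section was cut out is governed by NVDL's modes and actions (attach, for example), which this sketch leaves at their defaults.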

W3C XML Schema is very popular for database-like XML data, but RelaxNG
seems better suited (and much more popular) for document-like XML.

> Shane, I understand you have some worked examples of XML Schemas
> somewhere in this neighborhood?

I can point to some worked examples using RNG (and the same in DTD)

Here
http://www.rddl.org/xhtml-rddl.rng
is the schema for RDDL. It could hardly be simpler.

(RDDL is a language for namespace documents, defined using the W3C
XHTML Modularization).

Of course it's much more verbose in the DTD language
http://www.rddl.org/rddl-xhtml.dtd
because it's hacking around in a less expressive, namespace-unaware
language.

But it's basically doing the same thing: listing a bunch of modules.
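A hedged sketch of the RELAX NG counterpart to a substitution group follows. It assumes a modular XHTML grammar file (here called xhtml.rng) that defines a pattern named Block.class for block-level content; both names are assumptions and have not been checked against the RDDL schema above. A driver grammar includes it and widens the pattern with combine="choice".

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Pull in the (hypothetical) modular XHTML grammar unchanged. -->
  <include href="xhtml.rng"/>

  <!-- Add my:box as one more alternative wherever Block.class is referenced. -->
  <define name="Block.class" combine="choice">
    <ref name="my.box"/>
  </define>

  <!-- The extension element itself, in its own namespace; text-only for brevity. -->
  <define name="my.box">
    <element name="box" ns="http://example.com/mine">
      <text/>
    </element>
  </define>
</grammar>
```

Unlike an XSD substitution group, the widening is stated in a driver grammar that names both vocabularies, not in my:box's own declaration.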


> I'm also interested in whether CDF/WICD can/should use substitution
> groups.
> http://www.w3.org/TR/WICD/

They decided to use NVDL and RelaxNG instead.


--
 Chris Lilley
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG


Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

Dan Connolly

On Thu, 2007-02-15 at 18:02 +0100, Chris Lilley wrote:

> Hello public-xml-versioning,
>
> Dan wrote:
> > I'm interested in a form of extensibility where a markup
> > language designer can make a new my:box element and
> > say "it's an HTML block element"; then, when a
> > document containing a my:box element is checked
> > for syntactic happiness, the checking tool uses
> > normal HTML schemas until it gets to my:box; then
> > it looks up my:box in the web, finds that it's
> > declared to be an HTML block, and finds that
> > an HTML block is allowed here, and carries on happily.
>
> That's interesting, but it seems to assume a top-down model where
> extensions are tightly bound to their expected environment. What if I
> want to use my:box inside Timed Text, or inside SVG?

Yes, I'm influenced by the "XML functions" idea that Tim
has advocated in the context of a related issue that I
neglected to mention...
  http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34
  -> http://www.w3.org/DesignIssues/XML

It's largely top-down, i.e. compositional. See
also "4. Elaboration defined: top-down treewalk, signals and namespaces"
in a recent draft by Henry Thompson.
  http://www.w3.org/2001/tag/doc/elabInfoset.html

I expect that the 'HTML block' concept (substitution group?) could
be shared with Timed Text and SVG, though I haven't worked out any
of the details.

[...]
> I can point to some worked examples using RNG (and the same in DTD)
>
> Here
> http://www.rddl.org/xhtml-rddl.rng
> is the schema for RDDL. It could hardly be simpler.

If you have time to elaborate some other piece of the puzzle,
I'd appreciate it.

I'd like to see how that schema is used with other schemas in
a document.

Something analogous to one of these two...
http://www.w3.org/XML/2000/04schema-hacking/xhtml-mathml-ex.html
http://www.w3.org/XML/2000/04schema-hacking/comment-test.html

> > I'm also interested in whether CDF/WICD can/should use substitution
> > groups.
> > http://www.w3.org/TR/WICD/
>
> They decided to use NVDL and RelaxNG instead.

Anybody have pointers to more details about that? I don't
see "NVDL" in that particular tech report.

Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

Chris Lilley

On Thursday, February 15, 2007, 10:22:00 PM, Dan wrote:

DC> On Thu, 2007-02-15 at 18:02 +0100, Chris Lilley wrote:

>> Hello public-xml-versioning,
>>
>> Dan wrote:
>> > I'm interested in a form of extensibility where a markup
>> > language designer can make a new my:box element and
>> > say "it's an HTML block element"; then, when a
>> > document containing a my:box element is checked
>> > for syntactic happiness, the checking tool uses
>> > normal HTML schemas until it gets to my:box; then
>> > it looks up my:box in the web, finds that it's
>> > declared to be an HTML block, and finds that
>> > an HTML block is allowed here, and carries on happily.
>>
>> That's interesting, but it seems to assume a top-down model where
>> extensions are tightly bound to their expected environment. What if I
>> want to use my:box inside Timed Text, or inside SVG?

DC> Yes, I'm influenced by the "XML functions" idea that Tim
DC> has advocated in the context of a related issue that I
DC> neglected to mention...
DC>   http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34
DC>   -> http://www.w3.org/DesignIssues/XML

DC> It's largely top-down, i.e. compositional.

Yes. Which is why I was puzzled recently to hear 'top down' used as a
negative term, and used in opposition to 'Web-like'.


DC>  See
DC> also "4. Elaboration defined: top-down treewalk, signals and namespaces"
DC> in a recent draft by Henry Thompson.
DC>   http://www.w3.org/2001/tag/doc/elabInfoset.html

DC> I expect that the 'HTML block' concept (substitution group?) could
DC> be shared with Timed Text and SVG, though I haven't worked out any
DC> of the details.

I suspect you will find that it works sort of for TT, not at all for
SMIL, and only slightly for SVG.

It's another example of "it seems to work for HTML" != "works for any
generic XML".

I say "seems to" because it doesn't always work for HTML either. Let's
suppose HTML had no style element and no script element and I propose
to add them.

Suppose I tell you that cl:style and cl:script are html:block. Where
does that get you?
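To make the question concrete, a substitution-group version of that proposal might read as follows (the cl namespace URI is hypothetical). Read literally, it only makes cl:style and cl:script legal wherever html:block is legal. It does not put them in head, where XHTML's own style element lives, and it says nothing about what a user agent should do with them.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="http://example.org/cl">
  <import namespace="http://www.w3.org/1999/xhtml"/>

  <!-- With no type attribute, each member takes html:block's type.
       Both become legal wherever html:block is legal (body content)... -->
  <element name="style" substitutionGroup="html:block"/>
  <element name="script" substitutionGroup="html:block"/>
  <!-- ...but nothing here makes either element legal inside html:head. -->
</schema>
```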

DC> [...]
>> I can point to some worked examples using RNG (and the same in DTD)
>>
>> Here
>> http://www.rddl.org/xhtml-rddl.rng
>> is the schema for RDDL. It could hardly be simpler.

DC> If you have time to elaborate some other piece of the puzzle,
DC> I'd appreciate it.

I strongly suggest reading Robin Berjon's thoughts on composing
namespaces and designing schemas:

http://lists.w3.org/Archives/Public/www-archive/2005Sep/att-0014/schema-compounding-and-BP.html

DC> I'd like to see how that schema is used with other schemas in
DC> a document.

DC> something analogous to one of these two...
DC> http://www.w3.org/XML/2000/04schema-hacking/xhtml-mathml-ex.html
DC> http://www.w3.org/XML/2000/04schema-hacking/comment-test.html

>> > I'm also interested in whether CDF/WICD can/should use substitution
>> > groups.
>> > http://www.w3.org/TR/WICD/
>>
>> They decided to use NVDL and RelaxNG instead.

DC> Anybody have pointers to more details about that? I don't
DC> see "NVDL" in that particular tech report.

The first draft of CDI (Compound Documents by Inclusion) has not
been published. The question of composing multi-namespace,
multi-schema documents does not arise if they are only using linking
rather than inclusion.

Or were you asking for more details on NVDL in general?


Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

C. M. Sperberg-McQueen
In reply to this post by Dan Connolly

On 13 Feb 2007, at 10:13 , Dan Connolly wrote:

 > Mimasa, Shane,

 > I'm interested in a form of extensibility where a markup language
 > designer can make a new my:box element and say "it's an HTML block
 > element"; then, when a document containing a my:box element is
 > checked for syntactic happiness, the checking tool uses normal HTML
 > schemas until it gets to my:box; then it looks up my:box in the web,
 > finds that it's declared to be an HTML block, and finds that an HTML
 > block is allowed here, and carries on happily.

If we assume that my:box is in namespace http://example.com/mine, and
(I have not checked) that HTML has an element (perhaps an abstract
element) named 'block', then one way to do this is with this schema
document, retrievable from the namespace URI (either directly or via
RDDL).

   <schema
       xmlns ="http://www.w3.org/2001/XMLSchema"
       xmlns:my="http://example.com/mine"
       xmlns:html="http://www.w3.org/1999/xhtml"
       targetNamespace="http://example.com/mine" >
     <element name="box" substitutionGroup="html:block"/>
   </schema>

A schema processor processing your enclosing document will see
something like:

   <html  xmlns:my="http://example.com/mine" ...>
    ...
   <div><h3>More details</h3>
   <p>What is really neat about this idea is:</p>
   <my:box>
     IT WORKS
   </my:box>
   <p>And what's more, it was my idea.</p>
   ...
   </html>

and will know that in order to validate 'box' correctly, it's going to
need to find a declaration.  Unless you have instructed it otherwise,
a typical processor will then look for schema components for the
namespace http://example.com/mine.  They might be hard-coded into the
processor -- unlikely in this case.  The user might have told the
processor in advance to load components for that namespace from a
particular URI -- probably more likely, but not what you are
interested in, so we assume that didn't happen.  Or they might be
dereferenceable from the namespace URI.  Since I'm assuming this is
your namespace, and you are keen on making sure things can work using
the follow-your-nose principle, let's assume the schema document
above, or the equivalent, is at the namespace URI or pointed to from a
RDDL document there.  The schema processor reads it, and knows about
my:box.  It knows

   - There's a top-level element in namespace http://example.com/mine
     whose local name is 'box'.
   - That element wants to be allowed to appear wherever html:block
     can appear.
   - Its type is whatever the type of html:block is (you could have
     declared it with a restriction or an extension of that type, but
     in formulating the example you said "it's an HTML block element"
     and nothing more; I take you at your word).
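For the follow-your-nose step, a RDDL document at the namespace URI could point to the schema document with an entry along these lines. This is a sketch: it assumes the RDDL nature URI for W3C XML Schema and the RDDL schema-validation purpose URI, and mine.xsd is a hypothetical file name.

```xml
<rddl:resource xmlns:rddl="http://www.rddl.org/"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               xlink:type="simple"
               xlink:role="http://www.w3.org/2001/XMLSchema"
               xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
               xlink:href="mine.xsd">
  <!-- Human-readable description; the xlink attributes carry the
       machine-readable nature (role) and purpose (arcrole). -->
  <p xmlns="http://www.w3.org/1999/xhtml">XML Schema document declaring my:box.</p>
</rddl:resource>
```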

Your schema processor can now validate the element.

Without looking at the (X)HTML schema documents, I can't tell you
whether your instance is now valid or not.  It depends on how they are
defined.

Unless the schema author has taken active steps to get in your way,
your document should be valid.  Your schema document, together with
the schema document(s) defining the other namespaces found in your
document, creates a schema in which my:box acts like html:block, in
having the same type and being legal in the same locations.

If the author of the original schema wished to block this kind of
extension, however, there are several ways it could be blocked.  If
the HTML schema you're validating with took the trouble to forbid the
substitution of other elements for html:block, then your instance is
invalid.  If you restrict the type of block, or extend it, and the
schema took the trouble to forbid restriction of the type, or
extension of the type, or to allow the restriction or extension of the
type itself but forbid the substitution of restrictions or extensions
for instances of html:block, then your document is again invalid.
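The blocking options just listed correspond to the final and block attributes. A minimal sketch, with Block.type as a hypothetical type name and a deliberately loose content model:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- final on the type: no new types may be derived from it,
       by extension or by restriction ("#all" = both). -->
  <complexType name="Block.type" mixed="true" final="#all">
    <sequence>
      <any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>

  <!-- block="substitution": members of html:block's substitution group
       may not appear in its place in instances.
       block="extension restriction" would instead still admit members with
       the same type but reject members whose types are derived from Block.type. -->
  <element name="block" type="html:Block.type" block="substitution"/>
</schema>
```

Schema-wide defaults for the same controls can be set with blockDefault and finalDefault on the schema element.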

Similarly, if an agent running a validator wishes to block this kind
of extension, in order (for example) to tell whether your instance is
legal against the HTML schema WITHOUT EXTENSIONS, there are some ways
it can be blocked at validation time, too.  In particular, the
validator can be invoked with run-time options specifying "read these
schema documents AND NO OTHERS, and use the schema built from them,
without extensions."  At least, that's possible if the validator
provides user control over how schema documents are looked for.  Some
do, some don't; if you pay money for a validator, make sure it gives
you the kinds of control knobs you want to have.

 > XML Schema substitution groups are designed for this use case.

Yep.

 > Legend has it you tried to use them in XHTML modularization but it
 > didn't work out or something. We're interested to know the whole
 > story.

+1.

But it's probably worth pointing out that you, Dan, can as a user
extend the HTML schema as shown above, whether the HTML schema
documents use substitution groups or not.  (Unless, that is, the
schema author went out of their way to get in your way and close the
schema to this kind of extension.)

Full disclosure: sometimes extensibility as shown above is not quite
what you want.  Several things can go wrong; here are some of them.
(1) You want the my:box element to work just like an html:block, and
also like a MathML blort, and also like an SVG whammo element.

Sorry, in XML Schema 1.0, elements can only point to a single
substitution group head.  Your box element can be substitutable for
html:block, but not also for mathml:blort and svg:whammo.

Of course, if they are described as being substitutable for
html:block, things are slightly better.  If svg:whammo is
substitutable for html:block, then you can write

     <element name="box" substitutionGroup="svg:whammo"/>

and since substitution group membership is transitive (subject to
complex blocking rules which I can't explain and which you will never
encounter outside a markup pathology classroom), my:box is also
substitutable for whatever svg:whammo is substitutable for.  If you know
that you'll always use my:box with SVG, then this transitive
membership is fine; otherwise, it is likely to strike you as not
solving your real problem, which is that you want multiple
substitution group affiliations for my:box.

Some people have urged that XML Schema 1.1 allow elements to have
multiple substitution-group heads.  It might happen.  Actually
(speaking for myself, not the WG) I think there's a very good chance
that it WILL happen.  But it hasn't happened yet; if you want it to
happen, tell the XML Schema WG.

(2) You want the my:box element to be substitutable for several
different elements, you use XSD 1.1 to get that functionality, and
when you specify

     <element name="box"
              substitutionGroup="html:block svg:whammo mathml:blort"/>

the processor rejects your schema because there is some point at which
EITHER an html:block element OR an svg:whammo element OR a
mathml:blort element may occur, and the processor can't decide which
part of the content model your my:box element belongs to.
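A sketch of the kind of content model that produces the conflict, shown without its enclosing schema element, namespace declarations, or imports. svg:whammo is the same placeholder used above; Mixed.type is a hypothetical name. If my:box is declared with both html:block and svg:whammo as heads, an instance my:box element here could match either particle, which is what the Unique Particle Attribution check forbids.

```xml
<complexType name="Mixed.type">
  <!-- my:box is a member of both groups, so it competes for
       both element particles below: a UPA violation. -->
  <choice minOccurs="0" maxOccurs="unbounded">
    <element ref="html:block"/>
    <element ref="svg:whammo"/>
  </choice>
</complexType>
```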

Unfortunately, the user community of XSD 1.0 has not risen up and
demanded that XSD 1.1 eliminate the 'unique particle attribution
constraint' (aka 'UPA', aka the 'deterministic content model' rule,
which XSD took over from XML DTDs, and which XML DTDs inherited from
SGML, and for which no one has ever formulated a persuasive
rationale).  Pretty much the entire community of people interested in
document-oriented XML has said so, but to OO and Web Services people,
the use of XML for documents appears to represent an edge case that's
not worth worrying about.  So complaints about UPA have routinely been
dismissed as unimportant.  (This is perhaps one salient reason that
many schema authors prefer to work in Relax NG, which ditched the
determinism rule years ago.)

So while I expect the next draft of XSD 1.1 to have multiple
substitution-group heads, I don't expect it to have gotten rid of the
UPA constraint.  And some number of people who attempt to exploit
multiple substitution-group heads will find that UPA makes it
impossible to do so.  All I can say is: file bug reports.  Maybe
eventually the responsible WG will be responsive.

(3) You might want my:box to have a fresh, brand-new type of your own
devising, independent of and unrelated to the type assigned by the
schema to html:block.  If you do, you may be out of luck.  Some schema
authors will have chosen to design in extensibility points by
defining elements like

   <element name="block" abstract="true" type="anyType"/>

Since any type you can define is substitutable for xsd:anyType, this
kind of declaration gives you maximum freedom.
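For instance, against a head declared with anyType as above, a member with a brand-new content model of its own should be accepted, since an anonymous complex type like the one below is implicitly a restriction of anyType. The namespace is the example one used earlier; the title and body children are made up for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="http://example.com/mine">
  <import namespace="http://www.w3.org/1999/xhtml"/>

  <!-- A content model unrelated to any XHTML type, yet still a valid
       member because the head's type is anyType. -->
  <element name="box" substitutionGroup="html:block">
    <complexType>
      <sequence>
        <element name="title" type="string"/>
        <element name="body" type="string" minOccurs="0"/>
      </sequence>
    </complexType>
  </element>
</schema>
```

The same declaration would be rejected against the second style of head below, because this anonymous type is not derived from the head's specialized type.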

But other schema authors will have written

   <element name="block"
   type="my:block-type-so-specialized-no-one-else-can-use-it"/>

Since some members of the XSD 1.0 WG were very insistent on it, the
1.0 rule says that any element substitutable for 'block' must have
either the same type as 'block' or a type which is substitutable for
that of 'block'.  This makes element substitution groups feel a little
more like object inheritance classes, and it makes some sense in its
own way.  (If the OO people had been happier with this restriction,
I'd think it made sense.  But as a way of making XSD 1.0 work better
in OO terms, it seems to have had no effect at all.)

These problems do make the substitution groups of XSD 1.0 a little
less beautiful than I wish they were.  But there are a lot of cases
where substitution groups can be used without running into any of
these problems.  And where they work, I think substitution groups work
very nicely and could usefully be a lot more widely exploited.


--C. M. Sperberg-McQueen



Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

C. M. Sperberg-McQueen
In reply to this post by Chris Lilley


On 15 Feb 2007, at 10:02 , Chris Lilley wrote:

> Dan wrote:
>> I'm interested in a form of extensibility where a markup
>> language designer can make a new my:box element and
>> say "it's an HTML block element"; then, when a
>> document containing a my:box element is checked
>> for syntactic happiness, the checking tool uses
>> normal HTML schemas until it gets to my:box; then
>> it looks up my:box in the web, finds that it's
>> declared to be an HTML block, and finds that
>> an HTML block is allowed here, and carries on happily.
>
> That's interesting, but it seems to assume a top-down model where
> extensions are tightly bound to their expected environment. What if I
> want to use my:box inside Timed Text, or inside SVG?

It depends.  If the schemas for those vocabularies have
specified places where any element at all can go, then
my:box can go there.

If they haven't, then I can make my:box substitutable for
an appropriate element in those vocabularies.  (Unfortunately,
in XSD 1.0 I have to choose:  my:box is substitutable for
html:block or for svg:something-or-other, not both, unless
one of those is itself substitutable for the other.)

Any vocabulary defined as open (by having, for example,
an xsd:any wildcard in one or more content models) can
be combined with a vocabulary containing my:box without
trouble.
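An open extension point of that kind might look like this in XSD. It is a sketch only: the host namespace and Container.type are hypothetical and not taken from the SVG or Timed Text schemas.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.org/host">
  <!-- namespace="##other": elements from any namespace other than this
       schema's target namespace may appear here.
       processContents="lax": validate such an element if a declaration
       for it (e.g. my:box's) can be found, otherwise accept it unchecked. -->
  <complexType name="Container.type">
    <sequence>
      <any namespace="##other" processContents="lax"
           minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>
```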

If I want my:box to be allowed at OTHER locations, e.g. wherever
an html:block can appear, then I have to pick one element as
the substitution-group head of my:box.

You are right, of course, that most grammar-based schema
languages work in a kind of top-down fashion.  In a CFG,
the left-hand side is replaced by the right-hand side; in an
XML vocabulary, or any other bracketed grammar, the left-hand
side is kept around as a label for the right-hand side.

Does NVDL not work by traversing the tree, top down, and
saying, at appropriate nodes, "OK, this subtree goes
to that validation tool ..." ?

But substitution groups seem to me to go much more
bottom-up:  the content model which will, under the right
circumstances, accept a my:box element doesn't need to
say anything about my:box -- it's the declaration of my:box
that says "I can go ... over THERE."

--Michael Sperberg-McQueen