We need a EBNF spec


We need a EBNF spec

Bjoern Hoehrmann

Hi,

  I think W3C should publish a Recommendation or a Group Note defining
the EBNF format "defined" in http://www.w3.org/TR/REC-xml/#sec-notation
and elsewhere. This is needed because the definition in the XML 1.0
Recommendation is incomplete and W3C technical reports define more and
more variants of it for which it is not easy to tell whether they are
different or not.

For example, the XML 1.0 Recommendation does not define whether a symbol
like "Clock-value" may be used; on the right hand side this might be
interpreted as Clock - value, so maybe not, but e.g. SMIL 2.1 uses this
syntax. The result is that some EBNF parsers don't accept the grammars
in SMIL 2.1, which is bad. The lack of good parsers then leads to having
no means to verify grammars in technical reports, so the other errors in
the SMIL 2.1 grammars are even harder to find.

Some technical reports like http://www.w3.org/TR/xpath20/#id-grammar
also modify certain aspects of EBNF and/or include certain parts of the
original EBNF "specification", which makes it even harder to tell
whether EBNF in one technical report can be processed just like EBNF in
some other specification; you have to study the details first to do
that.

Some technical reports like http://www.w3.org/TR/2005/WD-its-20051122/
and http://www.w3.org/TR/2005/WD-emma-20050916/ then use ::= grammars
without defining the format at all (and in case of EMMA it's not EBNF
as defined in XML 1.0...) and yet other technical reports refer to EBNF
http://www.w3.org/TR/2005/WD-SVGMobile12-20051207/paths.html#PathDataBNF
as defined in XML 1.0 but the grammar does not actually use it, and some
like http://www.w3.org/TR/2005/WD-P3P11-20050701/ don't even use EBNF or
another standard format, but invent new variants of other formats.

Most of this is better though than the usual handwaving reference to
some vague terms to define certain lexical constraints.

I think that a complete stand-alone reference for this format will
encourage more working groups to use it rather than no formal grammar or
some other format, encourage normative references to it rather than
copying and pasting extended subsets across multiple technical reports,
and encourage tool development around EBNF, which will then help to verify
the technical reports, which in turn further encourages its use. It will
also help me to introduce {min,max} quantifiers into EBNF.
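
As a rough illustration of what a {min,max} quantifier could mean, the sketch below rewrites it using only sequence and the `?` operator that the XML 1.0 notation already has. The function name `expand_quantifier` and the regex-like reading of {min,max} are assumptions made for this example; nothing here is taken from an existing specification.

```python
def expand_quantifier(symbol, minimum, maximum):
    """Rewrite symbol{minimum,maximum} using only sequence and '?'.

    Hypothetical helper: assumes {min,max} means "at least min and at
    most max repetitions", as in common regular expression dialects.
    """
    if not 0 <= minimum <= maximum:
        raise ValueError("need 0 <= minimum <= maximum")
    required = [symbol] * minimum                    # mandatory occurrences
    optional = [symbol + "?"] * (maximum - minimum)  # optional occurrences
    return " ".join(required + optional)
```

For instance, `expand_quantifier("Digit", 2, 4)` yields `Digit Digit Digit? Digit?`. The expansion grows linearly with `maximum`, which is one reason a native quantifier would be more readable than the rewritten form.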

Writing the specification should be an easy mostly copy'n'paste job.

Thanks,
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: We need a EBNF spec

Ian Hickson

On Mon, 9 Jan 2006, Bjoern Hoehrmann wrote:
>
> I think W3C should publish a Recommendation or a Group Note defining the
> EBNF format "defined" in http://www.w3.org/TR/REC-xml/#sec-notation and
> elsewhere. This is needed because the definition in the XML 1.0
> Recommendation is incomplete and W3C technical reports define more and
> more variants of it for which it is not easy to tell whether they are
> different or not.

An alternative would be for the W3C to standardise on ISO 14977:1996 or
RFC 2234.

Personally I would discourage the use of BNF, however, as it makes it very
difficult to define error handling rules, and specifications often forget
to define how to go from the parsed tree to the semantics that the
specification defines, leaving it up to UA implementors to work out the
implied mapping.

For example, as far as I can tell, there is nothing in the XML 1.0 spec
that says what the syntax of an XML Declaration (as found in a prolog) is.
One can make a guess, but the spec doesn't say whether we are right. The
reliance on EBNF has made it easier to leave the mapping of the strict
syntax definitions to the actual semantics to implication than to make the
spec full and complete.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: We need a EBNF spec

Felix Sasaki

+1 for Bjoern's proposal to standardize the EBNF, and many thanks, as a
co-author of http://www.w3.org/TR/2005/WD-its-20051122/, for pointing out
that we need to define the EBNF format in that document. If you want to
make that point as a comment on the working draft, please tell me or just
send a mail to [hidden email]

Also an encouragement to use the EBNF. At least for specifications whose
semantics are very closely bound to a context-free grammar with a lot of
non-terminals (like XPath, XQuery, XML itself, ...), the EBNF helps the
reader and implementer a lot. Relying mainly on a formal grammar was a main
design principle of XML, see
http://www.textuality.com/sgml-erb/dd-1996-0001.html (principle 8).

Regards, Felix

On Mon, 09 Jan 2006 16:05:01 +0900, Ian Hickson <[hidden email]> wrote:

>
> On Mon, 9 Jan 2006, Bjoern Hoehrmann wrote:
>>
>> I think W3C should publish a Recommendation or a Group Note defining the
>> EBNF format "defined" in http://www.w3.org/TR/REC-xml/#sec-notation and
>> elsewhere. This is needed because the definition in the XML 1.0
>> Recommendation is incomplete and W3C technical reports define more and
>> more variants of it for which it is not easy to tell whether they are
>> different or not.
>
> An alternative would be for the W3C to standardise on ISO 14977:1996 or
> RFC 2234.
>
> Personally I would discourage the use of BNF, however, as it makes it very
> difficult to define error handling rules, and specifications often forget
> to define how to go from the parsed tree to the semantics that the
> specification defines, leaving it up to UA implementors to work out the
> implied mapping.
>
> For example, as far as I can tell, there is nothing in the XML 1.0 spec
> that says what the syntax of an XML Declaration (as found in a prolog) is.

http://www.w3.org/TR/REC-xml/#NT-XMLDecl does not fulfill your needs?

Regards, Felix.

> One can make a guess, but the spec doesn't say whether we are right. The
> reliance on EBNF has made it easier to leave the mapping of the strict
> syntax definitions to the actual semantics to implication than to make the
> spec full and complete.
>




Re: We need a EBNF spec

Dominique Hazael-Massieux-2
In reply to this post by Bjoern Hoehrmann
Hi Bjoern,

Le lundi 09 janvier 2006 à 07:46 +0100, Bjoern Hoehrmann a écrit :
> I think that a complete stand-alone reference for this format will
> encourage more working groups to make use of it rather than no formal
> grammar or some other format instead, encourage to make normative
> reference to it rather than copy and paste some extended subsets across
> multiple technical reports, encourage tool development around EBNF which
> will then help to verify the technical reports, which in turn further
> encourages making use of it.

A very interesting suggestion, indeed.

>  It will also help me to introduce {min,max}
> quantifiers into EBNF.
>
> Writing the specification should be an easy mostly copy'n'paste job.

Would you mind drafting such a document, then? The QA IG could certainly
publish it as a Note after some review time.

Dom
--
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:[hidden email]


Re: We need a EBNF spec

Ian Hickson
In reply to this post by Felix Sasaki

On Mon, 9 Jan 2006, Felix Sasaki wrote:

> >
> > Personally I would discourage the use of BNF, however, as it makes it
> > very difficult to define error handling rules, and specifications
> > often forget to define how to go from the parsed tree to the semantics
> > that the specification defines, leaving it up to UA implementors to
> > work out the implied mapping.
> >
> > For example, as far as I can tell, there is nothing in the XML 1.0
> > spec that says what the syntax of an XML Declaration (as found in a
> > prolog) is.
>
> http://www.w3.org/TR/REC-xml/#NT-XMLDecl does not fulfill your needs?

Nowhere in the prose does it say that the "XMLDecl" production is the XML
Declaration. That is entirely my point.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: We need a EBNF spec

Dan Connolly
In reply to this post by Ian Hickson

On Mon, 2006-01-09 at 07:05 +0000, Ian Hickson wrote:
[...]
> Personally I would discourage the use of BNF, however, as it makes it very
> difficult to define error handling rules, and specifications often forget
> to define how to go from the parsed tree to the semantics that the
> specification defines, leaving it up to UA implementors to work out the
> implied mapping.

Defining error handling rules is tricky, no doubt. But I wonder why
you say that BNF makes it more so. What do you prefer?


--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E



Re: We need a EBNF spec

Dan Connolly
In reply to this post by Bjoern Hoehrmann
On Mon, 2006-01-09 at 07:46 +0100, Bjoern Hoehrmann wrote:
> Hi,
>
>   I think W3C should publish a Recommendation or a Group Note defining
> the EBNF format "defined" in http://www.w3.org/TR/REC-xml/#sec-notation
> and elsewhere. This is needed because the definition in the XML 1.0
> Recommendation is incomplete and W3C technical reports define more and
> more variants of it for which it is not easy to tell whether they are
> different or not.

FYI, I have some related implementation experience to report.

There's a grammar in the SPARQL spec that follows the XML spec's
notation
  http://www.w3.org/TR/rdf-sparql-query/#grammar

Since important things deserve URIs*, we do readers the favor
of giving them the raw grammar, so they don't have to extract
it manually. (FYI, attached is a little XSLT ditty for
extracting the XML grammar from the XML version of the XML spec).
  http://www.w3.org/TR/rdf-sparql-query/parsers/sparql.bnf

And not only does the whole grammar deserve a URI, but the
symbols in the grammar. So we publish the grammar itself
in RDF/XML (and turtle):

 http://www.w3.org/TR/rdf-sparql-query/parsers/sparql.rdf
 http://www.w3.org/TR/rdf-sparql-query/parsers/sparql.ttl

The RDF vocabulary I used for modelling EBNF is a work-in-progress.
I wrote a little bit about it a couple weeks ago...

 bnf2turtle -- write a turtle version of an EBNF grammar
 http://dig.csail.mit.edu/breadcrumbs/node/85


TimBL used a similar RDF vocabulary to specify N3
 http://www.w3.org/2000/10/swap/grammar/n3-report.html
 http://www.w3.org/2000/10/swap/grammar/n3.n3
 <- http://www.w3.org/DesignIssues/Notation3

We're in discussion about working out the differences between our
grammars.

We're making slow progress on it. Anybody who wants to help it
go a little faster will please contact me.


* http://www.w3.org/TR/webarch/#pr-use-uris


> For example, the XML 1.0 Recommendation does not define whether a symbol
> like "Clock-value" may be used; on the right hand side this might be
> interpreted as Clock - value, so maybe not, but e.g. SMIL 2.1 uses this
> syntax. The result is that some EBNF parsers don't accept the grammars
> in SMIL 2.1, which is bad.

Let's see...
http://www.w3.org/TR/SMIL/smil-timing.html#Timing-TimingAttributeGrammars

Indeed, my code doesn't grok '-'s in symbols. I just used \w+

    elif s[0].isalpha():
        # \w+ covers letters, digits and '_' only, so a '-' ends the symbol
        i = re.match(r"\w+", s).end(0)
        return (('id', s[:i]), s[i:])
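
To make the difference concrete, here is a minimal sketch comparing the `\w+` pattern with a pattern that admits interior hyphens. `SYMBOL_HYPHEN` and `read_symbol` are hypothetical names introduced for this example and are not part of the code quoted above; which reading is correct is exactly what the XML 1.0 notation leaves open.

```python
import re

# Pattern from the fragment above: letters, digits and underscore only.
SYMBOL_WORD = re.compile(r"\w+")
# Hypothetical alternative: allow '-' directly between word characters.
SYMBOL_HYPHEN = re.compile(r"\w+(?:-\w+)*")

def read_symbol(pattern, s):
    """Return (('id', name), rest) for the symbol at the start of s."""
    i = pattern.match(s).end(0)
    return (('id', s[:i]), s[i:])
```

With `SYMBOL_WORD`, the input `Clock-value ::= x` yields the symbol `Clock` and leaves `-value ::= x` behind. With `SYMBOL_HYPHEN` the symbol is `Clock-value`. The second pattern still reads `Clock - value` (with spaces) as the symbol `Clock`, but it would swallow a difference written without spaces, such as `A-B`, so it trades one ambiguity for another.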

>  The lack of good parsers then leads to having
> no means to verify grammars in technical reports, so the other errors in
> the SMIL 2.1 grammars are even harder to find.


>
> Some technical reports like http://www.w3.org/TR/xpath20/#id-grammar
> also modify certain aspects of EBNF and/or include certain parts of the
> original EBNF "specification" which makes it even harder to recognize
> whether EBNF in one technical report can be processed just like EBNF in
> some other specification, you have to study the details first to do
> that.
>
> Some technical reports like http://www.w3.org/TR/2005/WD-its-20051122/
> and http://www.w3.org/TR/2005/WD-emma-20050916/ then use ::= grammars
> without defining the format at all (and in case of EMMA it's not EBNF
> as defined in XML 1.0...) and yet other technical reports refer to EBNF
> http://www.w3.org/TR/2005/WD-SVGMobile12-20051207/paths.html#PathDataBNF
> as defined in XML 1.0 but the grammar does not actually use it, and some
> like http://www.w3.org/TR/2005/WD-P3P11-20050701/ don't even use EBNF or
> another standard format, but invent new variants of other formats.
>
> Most of this is better though than the usual handwaving reference to
> some vague terms to define certain lexical constraints.
>
> I think that a complete stand-alone reference for this format will
> encourage more working groups to make use of it rather than no formal
> grammar or some other format instead, encourage to make normative
> reference to it rather than copy and paste some extended subsets across
> multiple technical reports, encourage tool development around EBNF which
> will then help to verify the technical reports, which in turn further
> encourages making use of it. It will also help me to introduce {min,max}
> quantifiers into EBNF.
>
> Writing the specification should be an easy mostly copy'n'paste job.
Umm... the variance you cite above suggests exactly the opposite;
it suggests that getting consensus on an EBNF spec will be quite
challenging.

> Thanks,
--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Attachment: extractGrammar.xsl (770 bytes)

Re: We need a EBNF spec

scott_boag

> > Some technical reports like http://www.w3.org/TR/xpath20/#id-grammar
> > also modify certain aspects of EBNF and/or include certain parts of the
> > original EBNF "specification"

Note that the differences are pretty minor at this point:

    *  All named symbols have a name that begins with an uppercase letter.

    *  It adds a notation for referring to productions in external specs.

    *  Comments or extra-grammatical constraints on grammar productions
       are between '/*' and '*/' symbols.
          o A 'xgc:' prefix is an extra-grammatical constraint, the
            details of which are explained in A.1.2 Extra-grammatical
            Constraints.
          o A 'ws:' prefix explains the whitespace rules for the
            production, the details of which are explained in A.2.4
            Whitespace Rules.
          o A 'gn:' prefix means a 'Grammar Note', and is meant as a
            clarification for parsing rules, and is explained in A.1.3
            Grammar Notes. These notes are not normative.
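
A tool that wants to process such productions has to separate the prefixed comments from the grammar proper. The following is a minimal sketch of that step, assuming each annotation sits in its own '/* ... */' comment and starts with one of the three prefixes listed above. The names `ANNOTATION` and `split_annotations` are hypothetical and do not come from the XPath/XQuery tooling.

```python
import re

# Matches /* prefix: text */ where prefix is one of xgc, ws, gn.
ANNOTATION = re.compile(r"/\*\s*(xgc|ws|gn):\s*(.*?)\s*\*/", re.DOTALL)

def split_annotations(production):
    """Return (production without annotated comments, [(prefix, text), ...])."""
    notes = ANNOTATION.findall(production)
    bare = ANNOTATION.sub("", production).strip()
    return bare, notes
```

For a made-up production such as `Example ::= "a" Other? /* xgc: some-constraint */ /* ws: explicit */`, this returns the bare rule `Example ::= "a" Other?` together with the pairs `('xgc', 'some-constraint')` and `('ws', 'explicit')`. Comments without one of the three prefixes are left in place.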

Somewhat relevant to this discussion, the grammar in the XPath/XQuery
specs is defined in
http://www.w3.org/XML/Group/xsl-query-specs/grammar/xpath-grammar.xml, and
semi-modeled in
http://www.w3.org/XML/Group/xsl-query-specs/grammar/grammar.dtd.  This
format was originally invented by James Clark, though I have
evolved/corrupted it over time.  It is used to define multiple subsets and
supersets of the language (xpath 2.0, XQuery, XQuery-fulltext,
XQuery-update, XQuery-Formal-Semantics).  This file goes through an XSLT
preprocess with the spec documents to form the <prod> productions.  We
also process xpath-grammar.xml through XSLT transforms to produce a
functional JavaCC parser (
http://www.w3.org/XML/Group/xsl-query-specs/grammar/parser/applets/xquery-updateApplet.html
, for instance).

This process is likely too heavy handed for the needs of many specs, but
it's worth keeping in mind I think.  I'm not quite sure if and how the
grammar.dtd format might intersect with the work that Dan and Tim have
done, but I would be interested in discussions.

In any case, +1 to making some sort of specification for W3C EBNF, just to
make it easier for future spec writers.

-scott


Re: We need a EBNF spec

Ian Hickson
In reply to this post by Dan Connolly

On Wed, 22 Feb 2006, Dan Connolly wrote:

>
> On Mon, 2006-01-09 at 07:05 +0000, Ian Hickson wrote:
> [...]
> > Personally I would discourage the use of BNF, however, as it makes it very
> > difficult to define error handling rules, and specifications often forget
> > to define how to go from the parsed tree to the semantics that the
> > specification defines, leaving it up to UA implementors to work out the
> > implied mapping.
>
> Defining error handling rules is tricky, no doubt. But I wonder why
> you say that BNF makes it more so. What do you prefer?

Prose.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'