Re: host-meta file format comments (draft-nottingham-site-meta-01)


Re: host-meta file format comments (draft-nottingham-site-meta-01)

Thomas Roessler




(diverting to www-talk, too...)

On 11 Feb 2009, at 01:20, Mark Nottingham wrote:

> Yeah, I'm not completely happy with it yet. The thought was that  
> since blank lines don't introduce ambiguity here, they're not  
> harmful. OTOH one of my goals for the format is to allow existing  
> HTTP header and MIME parsers (e.g., in Python) to be used on the  
> format, and they very well may barf on a blank line.

Well, they'll barf on blank lines and declare the header over;  
changing that within the parser (or just restarting it on the rest of  
the file) should be relatively cheap.
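A minimal sketch of the "restart it on the rest of the file" idea, assuming Python's standard email.parser.HeaderParser stands in for the existing MIME parser. The field names Field-A and Field-B and the helper parse_fields are made up for illustration and are not part of the draft:

```python
from email.parser import HeaderParser

# Hypothetical host-meta content; Field-A and Field-B are made-up names.
SAMPLE = "Field-A: one\n\nField-B: two\n"


def parse_fields(text):
    """Collect (name, value) pairs, restarting the parser after blank lines."""
    parser = HeaderParser()
    fields = []
    while text.strip():
        # Skip leading blank lines so the parser starts on a field line.
        msg = parser.parsestr(text.lstrip("\r\n"))
        if not msg.items():
            break  # Not header syntax; stop rather than loop forever.
        fields.extend(msg.items())
        # Everything after the first blank line ends up as the payload.
        text = msg.get_payload() or ""
    return fields


if __name__ == "__main__":
    print(HeaderParser().parsestr(SAMPLE).items())  # single pass
    print(parse_fields(SAMPLE))                     # with restarts
```

A single pass only sees the fields before the first blank line; the loop feeds the leftover payload back into the same unmodified parser, so no library internals need to change.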

BTW, I notice that this draft is silent on the HTTP header syntax's  
combining feature for multiple occurrences of the same field (last  
paragraph of 4.2, RFC 2616); I suspect that to be one of the more  
likely causes for surprises if HTTP header parsers are re-used.  (No  
such risk with MIME parsers.)
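To illustrate the combining rule, here is a rough sketch that applies the RFC 2616 section 4.2 behaviour (values of repeated fields joined with a comma) on top of Python's email.parser.HeaderParser. The helper name combined_fields and the sample Link values are assumptions for illustration only:

```python
from email.parser import HeaderParser

# Two occurrences of the same field; the values are made up.
SAMPLE = 'Link: </a>; rel="x"\nLink: </b>; rel="y"\n'


def combined_fields(text):
    """Merge repeated fields the way an HTTP header parser may do."""
    msg = HeaderParser().parsestr(text)
    combined = {}
    for name, value in msg.items():
        key = name.lower()  # field-names are not case-sensitive
        if key in combined:
            # RFC 2616, section 4.2: append later values, comma-separated.
            combined[key] += ", " + value
        else:
            combined[key] = value
    return combined


if __name__ == "__main__":
    msg = HeaderParser().parsestr(SAMPLE)
    print(msg.get_all("Link"))      # the MIME parser keeps both occurrences
    print(combined_fields(SAMPLE))  # an HTTP-style parser may hand back one
```

A consumer that expects one value per occurrence sees a single comma-joined string after combining, which is the kind of surprise a note in the draft could warn about.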

Finally, why disallow whitespace stuffed folding?  It's pretty useful
to make long lines editable, and I suspect that we're assuming
/host-meta to be the product of some human with emacs in their
hands. ;-)  Implementing it is easy, and a given if existing parsers
are used.
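As a rough indication of how little code folding needs, here is a sketch of unfolding in Python, assuming the RFC 822 convention that a line starting with a space or tab continues the previous field. The function name unfold and the sample field are hypothetical:

```python
def unfold(text):
    """Join continuation lines (leading space or tab) onto the previous line."""
    lines = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and lines:
            # Replace the line break plus leading whitespace by one space.
            lines[-1] += " " + line.strip()
        else:
            lines.append(line)
    return lines


if __name__ == "__main__":
    folded = 'Link: <http://example.com/a>;\n  rel="x"\nOther: 1\n'
    for line in unfold(folded):
        print(line)
```

A continuation line with nothing before it is kept as-is here; a stricter parser might reject it instead.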

> So, the right thing to do might be to explicitly disallow them, both  
> in BNF and prose. Eran, thoughts?

I'd just prefer to not have the BNF say "no empty lines", and then  
have prose that says the opposite, but with a SHOULD.

>>> 5. Minting New meta-fields
>>
>>> Applications that wish to mint new meta-fields for use in the
>>> host-meta format MUST register them in the host-meta
>>> field-registry, following the procedures in Section 7.2.
>>> Field-names MUST conform to the field-name ABNF Section 3, and
>>> field-value syntax MUST be well-defined (e.g., using ABNF, or a
>>> reference to the syntax of an existing header field-value).
>>> Field-values SHOULD use the ISO-859-1 character encoding. If a
>>> field-value applies to a scope other than the entire authority,
>>> that scope MUST be well-defined.
>>
>> Editorial nit: ISO-8859-1 is missing an 8 here.
>
> That one always gets me, thanks.
>
>> More substantially, is there any particular reason to not just go
>> with utf-8 here?  After all, the content type is
>> *application*/host-meta anyway.
>
> Same as above; allowing existing parsers and serialisation libraries  
> to be used. That said, there have been many arguments in HTTPbis  
> that existing libraries won't harm non-ASCII characters in transit,  
> but IIRC no one has actually gone out and surveyed what they do...

That suggests that it's a coin toss, unless the mythical "someone"  
does that work.  May I, in that event, suggest that we use a coin  
biased in favor of broader internationalization, i.e., UTF-8?




Re: host-meta file format comments (draft-nottingham-site-meta-01)

Mark Nottingham


On 11/02/2009, at 12:05 PM, Thomas Roessler wrote:

> (diverting to www-talk, too...)
>
> On 11 Feb 2009, at 01:20, Mark Nottingham wrote:
>
>> Yeah, I'm not completely happy with it yet. The thought was that  
>> since blank lines don't introduce ambiguity here, they're not  
>> harmful. OTOH one of my goals for the format is to allow existing  
>> HTTP header and MIME parsers (e.g., in Python) to be used on the  
>> format, and they very well may barf on a blank line.
>
> Well, they'll barf on blank lines and declare the header over;  
> changing that within the parser (or just restarting it on the rest  
> of the file) should be relatively cheap.

This assumes that people will be comfortable modifying libraries. IME  
people tend to treat them as magical black boxes that shouldn't be  
opened (or even questioned) under any circumstances...


> BTW, I notice that this draft is silent on the HTTP header syntax's  
> combining feature for multiple occurrences of the same field (last  
> paragraph of 4.2, RFC 2616); I suspect that to be one of the more  
> likely causes for surprises if HTTP header parsers are re-used.  (No  
> such risk with MIME parsers.)

I'll add a note.


> Finally, why disallow whitespace stuffed folding?  It's pretty  
> useful to make long lines editable, and I suspect that we're  
> assuming /host-meta to be the product of some human with emacs in  
> their hands. ;-)  Implementing it is easy, and a given if existing  
> parsers are used.

Not necessarily; it's not very widely supported, IME.


>> So, the right thing to do might be to explicitly disallow them,  
>> both in BNF and prose. Eran, thoughts?
>
> I'd just prefer to not have the BNF say "no empty lines", and then  
> have prose that says the opposite, but with a SHOULD.
>
>>>> 5. Minting New meta-fields
>>>
>>>> Applications that wish to mint new meta-fields for use in the
>>>> host-meta format MUST register them in the host-meta
>>>> field-registry, following the procedures in Section 7.2.
>>>> Field-names MUST conform to the field-name ABNF Section 3, and
>>>> field-value syntax MUST be well-defined (e.g., using ABNF, or a
>>>> reference to the syntax of an existing header field-value).
>>>> Field-values SHOULD use the ISO-859-1 character encoding. If a
>>>> field-value applies to a scope other than the entire authority,
>>>> that scope MUST be well-defined.
>>>
>>> Editorial nit: ISO-8859-1 is missing an 8 here.
>>
>> That one always gets me, thanks.
>>
>>> More substantially, is there any particular reason to not just go
>>> with utf-8 here?  After all, the content type is
>>> *application*/host-meta anyway.
>>
>> Same as above; allowing existing parsers and serialisation  
>> libraries to be used. That said, there have been many arguments in  
>> HTTPbis that existing libraries won't harm non-ASCII characters in  
>> transit, but IIRC no one has actually gone out and surveyed what  
>> they do...
>
> That suggests that it's a coin toss, unless the mythical "someone"  
> does that work.  May I, in that event, suggest that we use a coin  
> biased in favor of broader internationalization, i.e., UTF-8?

Well, the other side of the coin is interoperability, something that  
is also close to our collective hearts.

OTOH we're talking about a SHOULD here. Maybe it just needs more  
careful guidance; i.e., that you should stick to ASCII unless you're  
conveying elements for presentation to end users.


--
Mark Nottingham




Re: host-meta file format comments (draft-nottingham-site-meta-01)

Thomas Roessler

On 11 Feb 2009, at 02:18, Mark Nottingham wrote:

[ASCII vs UTF-8]

> OTOH we're talking about a SHOULD here. Maybe it just needs more  
> careful guidance; i.e., that you should stick to ASCII unless you're  
> conveying elements for presentation to end users.

Well, one point to consider is how you expect IRIs and IRI references  
to be represented.

There's one school of thought (more common in the IETF crowd) that  
says that these should be converted to ASCII early, and therefore  
shouldn't occur here.

The other school of thought (more common at W3C) says that they're  
fine in the places where XML and other document formats have always  
accepted URIs, and therefore should be representable in this spot.

There are some properties of the direction that the IDNA update effort  
is going into that suggest that the IETF school of thought is less  
likely to cause interoperability problems.
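For the "convert to ASCII early" school, a sketch of what the conversion might look like in Python: the host goes through the standard library's "idna" codec (which implements the older IDNA 2003 rules, not the update under discussion) and the rest is percent-encoded as UTF-8. The function name iri_to_uri is hypothetical, and the sketch assumes an absolute IRI with a host and no userinfo:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Map an absolute IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)
    # Host: IDNA ToASCII via the stdlib codec (IDNA 2003 rules).
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Other components: percent-encode non-ASCII characters as UTF-8,
    # leaving existing escapes ("%") and common delimiters alone.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/käse"))
```

With this done before the value is written, the file itself stays ASCII-only regardless of which encoding the spec ends up recommending.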

The other question is what the cost of violating this SHOULD is.  
Assume that some people have a really good reason to violate an ASCII  
or ISO-8859-1 SHOULD, and actually go for UTF-8.  You now get mixed  
character sets in a single metadata file.  I'm not sure that's  
desirable...
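A small Python sketch of why mixed character sets are awkward for a consumer: the same bytes read differently under each encoding, so a parser ends up guessing per value. The helper decode_field is hypothetical and only illustrates that guess:

```python
def decode_field(raw):
    """Guess the encoding of one field value: try UTF-8, fall back to Latin-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails,
        # which also means it cannot detect a wrong guess.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    utf8_bytes = "café".encode("utf-8")
    latin1_bytes = "café".encode("iso-8859-1")
    print(decode_field(utf8_bytes))
    print(decode_field(latin1_bytes))
    # Reading UTF-8 bytes as ISO-8859-1 silently yields the wrong text.
    print(utf8_bytes.decode("iso-8859-1"))
```

The fallback works for this sample, but a Latin-1 value whose bytes happen to form valid UTF-8 would be misread without any error.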

(BTW, are we just going down the rathole of defining yet another
tag-value format that's subtly different?  Maybe the spec should just
say "use HTTP header format, but with UTF-8", or "use RFC 822, but
with UTF-8".)

--
Thomas Roessler, W3C



Re: host-meta file format comments (draft-nottingham-site-meta-01)

Mark Nottingham


On 11/02/2009, at 12:28 PM, Thomas Roessler wrote:

> On 11 Feb 2009, at 02:18, Mark Nottingham wrote:
>
> [ASCII vs UTF-8]
>
>> OTOH we're talking about a SHOULD here. Maybe it just needs more  
>> careful guidance; i.e., that you should stick to ASCII unless  
>> you're conveying elements for presentation to end users.
>
> Well, one point to consider is how you expect IRIs and IRI  
> references to be represented.
>
> There's one school of thought (more common in the IETF crowd) that  
> says that these should be converted to ASCII early, and therefore  
> shouldn't occur here.
>
> The other school of thought (more common at W3C) says that they're  
> fine in the places where XML and other document formats have always  
> accepted URIs

IRIs?


> , and therefore should be representable in this spot.
>
> There are some properties of the direction that the IDNA update  
> effort is going into that suggest that the IETF school of thought is  
> less likely to cause interoperability problems.

That's my experience as well. It's all very well to say that IRIs should  
be usable everywhere, but they make things substantially more complex  
and error-prone. For example, I think it was a mistake for Atom to  
specify the use of IRIs everywhere, including as identifiers for  
relation types. However, that's a discussion that still needs to take  
place, and a different draft...


> The other question is what the cost of violating this SHOULD is.  
> Assume that some people have a really good reason to violate an  
> ASCII or ISO-8859-1 SHOULD, and actually go for UTF-8.  You now get  
> mixed character sets in a single metadata file.  I'm not sure that's  
> desirable...
>
> (BTW, are we just going down the rathole of defining yet another
> tag-value format that's subtly different?  Maybe the spec should just
> say "use HTTP header format, but with UTF-8", or "use RFC 822, but
> with UTF-8".)

But that's already a different thing; although arguably HTTP headers  
allow UTF-8 (Roy makes this point regularly and forcefully), the  
impact on existing software isn't clear.

I see two possible paths forward:

1) require ASCII, using encoding where human-viewable content is  
conveyed, or

2) require ASCII, or UTF-8 where human-viewable content is conveyed  
(i.e., only one of those two).
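Option 1 does not say which encoding would be used. Purely as an illustration, and assuming RFC 2047 encoded-words (the mechanism mail headers use), the Python standard library can keep a human-viewable value ASCII-only on the wire. The helper names encode_value and decode_value are made up:

```python
from email.header import Header, decode_header, make_header


def encode_value(text):
    """Encode human-readable text as an ASCII-only RFC 2047 encoded-word."""
    return Header(text, charset="utf-8").encode()


def decode_value(value):
    """Turn a field value, with or without encoded-words, back into text."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    wire = encode_value("Café")
    print(wire)                # ASCII-only form for the file
    print(decode_value(wire))  # original text for display
```

Other choices, such as percent-encoding, would serve option 1 equally well; the point is only that the file stays ASCII while the displayed text does not have to.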

Input?

--
Mark Nottingham




RE: host-meta file format comments (draft-nottingham-site-meta-01)

Eran Hammer-Lahav
In reply to this post by Thomas Roessler

(not sure how my work email got into this thread... but please replace it with this one)

> From: Mark Nottingham
> Sent: Tuesday, February 10, 2009 4:21 PM
>
> On 11/02/2009, at 12:38 AM, Thomas Roessler wrote:
>
> >> As with HTTP headers, field-names are not case-sensitive,
> >> unrecognised field-names SHOULD be silently ignored when parsing
> >> this format, and ordering of fields SHOULD NOT be considered
> >> significant unless specified otherwise. Additionally, although the
> >> syntax does not explicitly allow empty lines between fields,
> >> parsers SHOULD silently discard them (i.e., be permissive in what
> >> they accept). Field content is constrained by the specification
> >> indicated by its associated field-name.
> >
> > What's the cost of just permitting empty lines between fields in the
> > syntax vs having the current "SHOULD parse"?  The current text
> > sounds like a gratuitous interop problem to me.
>
> Yeah, I'm not completely happy with it yet. The thought was that since
> blank lines don't introduce ambiguity here, they're not harmful. OTOH
> one of my goals for the format is to allow existing HTTP header and
> MIME parsers (e.g., in Python) to be used on the format, and they very
> well may barf on a blank line.
>
> So, the right thing to do might be to explicitly disallow them, both
> in BNF and prose. Eran, thoughts?

I wanted to either allow them in both or explicitly disallow them in both. Allowing them has the advantage of disabling the ability to stick other payload after the line break. But I see the logic in following the HTTP header structure, which was my original inspiration for this general structure. I would say I am neutral, with a slight lean towards disallowing.

EHL


RE: host-meta file format comments (draft-nottingham-site-meta-01)

Eran Hammer-Lahav
In reply to this post by Thomas Roessler


> -----Original Message-----
> From: Thomas Roessler
> Sent: Tuesday, February 10, 2009 5:06 PM
>
> BTW, I notice that this draft is silent on the HTTP header syntax's
> combining feature for multiple occurrences of the same field (last
> paragraph of 4.2, RFC 2616); I suspect that to be one of the more
> likely causes for surprises if HTTP header parsers are re-used.  (No
> such risk with MIME parsers.)

It is somewhat implied, since the Link header supports it and the Link field uses the Link header as-is. My upcoming Link-Pattern field also supports it.

> Finally, why disallow whitespace stuffed folding?  It's pretty useful
> to make long lines editable, and I suspect that we're assuming
> /host-meta to be the product of some human with emacs in their
> hands. ;-)  Implementing it is easy, and a given if existing parsers
> are used.

I would actually like to be able to break long lines... it makes writing spec examples much easier.

EHL