reviving the file URI scheme

18 messages

reviving the file URI scheme

Matthew Kerwin
Hi, some of you may have seen that about six months ago I somewhat naively created an ID to resurrect the 'file' URI scheme.  In the intervening months I've spent a bit of time lurking on IETF and W3C mailing lists familiarising myself with the standardisation process, and studying up on how people are using and supporting file URIs, and updating the ID.  The latest version, 09, was published yesterday: <http://tools.ietf.org/html/draft-kerwin-file-scheme-09>

What I would really like is your opinions as experts, whether you think it's a worthwhile effort, or if my approach is suitable, or any specific issues (technical or editorial) with the ID itself.

An alternative approach I've considered is creating an Informational RFC that "deobsoletes" parts of RFC 1738, since it's a bit unclear whether RFCs 4248 (telnet) and 4266 (gopher) obsolete *all* of it, or just those scheme definitions.  If you think that would be a better (or worse, or silly) approach, I'd also like to hear so.

Cheers
--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Re: reviving the file URI scheme

Mike Brown
Matthew Kerwin wrote:

> Hi, some of you may have seen that about six months ago I somewhat naively
> created an ID to resurrect the 'file' URI scheme.  In the intervening
> months I've spent a bit of time lurking on IETF and W3C mailing lists
> familiarising myself with the standardisation process, and studying up on
> how people are using and supporting file URIs, and updating the ID.  The
> latest version, 09, was published yesterday:
> <http://tools.ietf.org/html/draft-kerwin-file-scheme-09>
>
> What I would really like is your opinions as experts, whether you think
> it's a worthwhile effort, or if my approach is suitable, or any specific
> issues (technical or editorial) with the ID itself.

The last go-'round on this went nowhere, so I wish you luck. It's good to see
someone putting in some effort on this again.

Maybe you already saw it, but back in 2005-2006, I thought maybe a
MediaWiki-based wiki would work better than the discussion list as a sort-of
shared whiteboard, but it never took off:

  https://offset.skew.org/wiki/URI/File_scheme

I think Larry Masinter is the only one besides me who ever used it. I'm happy
to keep the wiki going, though, and can set up accounts for whoever wants
them. Or it can just stay up as a historical reference; no problem.

This really isn't much help, and it's pretty typical of me to be distracted by
something so minor, but the first thing I'd change in your draft isn't
technical, but rather just a pet peeve of the copy editor in me:

Unless you're writing a math tutorial, you should avoid repeatedly telling the
reader to "note" things, or that things should be noted, or any other explicit
ways of highlighting content or directing the reader's attention. I feel you
can do it one time, but that's it...so make it a good one...and that's if
you're OK with addressing or referring to the reader at all, rather than just
stating facts and giving examples.

So, generally speaking, if something is too important for the reader to
overlook, maybe it should be mentioned sooner, or prefaced with some
background info that will indicate why it's important (without actually saying
"this is important", of course).

Another copy edit to consider is deciding whether "filesystem" is a word, or
if it's best written as the more formal "file system", and being consistent
about it.

As for the actual content, the question that comes to my mind when reading the
introduction is "does this document fulfill its goals?" So, does it
acknowledge specific interoperability issues and then demonstrate how they can
be resolved? Does it acknowledge specific syntax disagreements on the major
file systems and then provide a syntax that works for each of them? Does it
define an interoperable scheme parseable in the same way as other URIs, and
does it make it clear that this is a change from what we had before? The
answers should be easy to find.

The intro also says "Because that document has been made obsolete, this
document copies the 'file' URI scheme from it to allow that material to remain
on standards track." That was Paul Hoffman's text, and reflected his approach
of not actually changing anything at first. I think we kinda shot that down as
not being a good use of a new RFC, so I think this paragraph should be updated
to indicate that you're not just moving relatively useless material from RFC
1738 into a new RFC; you're building on it and making it into something
beneficial.

I think a concern that was raised before is that we need to acknowledge what
implementations are doing in the real world right now, and not break what
works. If the major 'file' URI consumers and producers all behave a certain
way which doesn't quite comport with the new draft's principles, then saying
that they have to all change in the interest of tidiness and conformance to a
new spec might be an uphill battle.

It looks like maybe you're aware of this; you expressly avoid prescribing how
to handle relative file paths. I'm just having trouble reconciling that with
the stated goals from the introduction.

For example, if an RFC 3986-compliant URI processor encounters a relative URI
reference with a base URI that includes a Windows drive letter as the first
path segment, it's at risk of obliterating or overwriting the drive letter
when it resolves the reference to absolute form. This is why the drive letter
sometimes is shoved into the authority component. If it's not OK to produce a
URI with a drivespec as authority (and it's discouraged by RFC 3986 sec. 3.2.3
as well), then how does leaving it up to the implementation to decide how to
handle it improve interoperability or compatibility with standard URI
processing tools?

Same issue with UNC paths...I think people expect the hostname to be 'sticky'.
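To make the drive-letter and UNC concern concrete, here is a minimal sketch using Python's standard `urllib.parse.urljoin`, a generic resolver with no special knowledge of drive letters or UNC shares. For the simple references below its results match the RFC 3986 resolution algorithm. The base URIs are hypothetical examples, not forms prescribed by the draft.

```python
from urllib.parse import urljoin

# Hypothetical base URIs, chosen only to illustrate the problem.
DRIVE_IN_PATH = "file:///C:/dir/file.txt"       # drive letter as first path segment
DRIVE_IN_AUTHORITY = "file://C:/dir/file.txt"   # drive letter shoved into the authority
UNC_SHARE = "file://server/share/dir/file.txt"  # UNC host in the authority


def resolve(base: str, ref: str) -> str:
    """Resolve a relative reference with a generic, scheme-agnostic resolver."""
    return urljoin(base, ref)


if __name__ == "__main__":
    examples = [
        (DRIVE_IN_PATH, "/foo"),        # absolute-path reference replaces "/C:/..."
        (DRIVE_IN_PATH, "../../x"),     # dot segments climb past the drive letter
        (DRIVE_IN_PATH, "../other.txt"),  # stays on the drive
        (DRIVE_IN_AUTHORITY, "/foo"),   # authority (and so the drive) is kept
        (UNC_SHARE, "/other.txt"),      # host is kept, share name is not
    ]
    for base, ref in examples:
        print(f"{base} + {ref} -> {resolve(base, ref)}")
```

When the drive letter is the first path segment, an absolute-path reference or enough `..` segments wipe it out. Moving the drive into the authority makes it sticky, at the price of an authority form that RFC 3986 discourages. In the UNC case the host survives, but the share name is treated as an ordinary path segment and can be lost.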

Maybe I'm overthinking it or just haven't spent enough time with it, though. I
haven't actually touched this stuff in years, so take what I say with a huge
grain of salt. On that note, I'll bow out of the discussion now, unless you
have questions for me.

Mike


Re: reviving the file URI scheme

Bjoern Hoehrmann
In reply to this post by Matthew Kerwin
* Matthew Kerwin wrote:

>Hi, some of you may have seen that about six months ago I somewhat naively
>created an ID to resurrect the 'file' URI scheme.  In the intervening
>months I've spent a bit of time lurking on IETF and W3C mailing lists
>familiarising myself with the standardisation process, and studying up on
>how people are using and supporting file URIs, and updating the ID.  The
>latest version, 09, was published yesterday:
><http://tools.ietf.org/html/draft-kerwin-file-scheme-09>
>
>What I would really like is your opinions as experts, whether you think
>it's a worthwhile effort, or if my approach is suitable, or any specific
>issues (technical or editorial) with the ID itself.

I think it is worthwhile and, after skimming the document, your approach
seems suitable. I will probably be available to review the document once
you consider it ready for Last Call.

>An alternative approach I've considered is creating an Informational RFC
>that "deobsoletes" parts of RFC 1738, since it's a bit unclear whether RFCs 4248
>(telnet) and 4266 (gopher) obsolete *all* of it, or just those scheme
>definitions.  If you think that would be a better (or worse, or silly)
>approach, I'd also like to hear so.

That sounds worse to me.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: reviving the file URI scheme

Marcos Caceres
In reply to this post by Matthew Kerwin


On Thursday, December 12, 2013 at 5:46 PM, Matthew Kerwin wrote:

> Hi, some of you may have seen that about six months ago I somewhat naively created an ID to resurrect the 'file' URI scheme. In the intervening months I've spent a bit of time lurking on IETF and W3C mailing lists familiarising myself with the standardisation process, and studying up on how people are using and supporting file URIs, and updating the ID. The latest version, 09, was published yesterday: <http://tools.ietf.org/html/draft-kerwin-file-scheme-09>
>
> What I would really like is your opinions as experts, whether you think it's a worthwhile effort, or if my approach is suitable, or any specific issues (technical or editorial) with the ID itself.
>
> An alternative approach I've considered is creating an Informational RFC that "deobsoletes" parts of RFC 1738, since it's a bit unclear whether RFCs 4248 (telnet) and 4266 (gopher) obsolete *all* of it, or just those scheme definitions. If you think that would be a better (or worse, or silly) approach, I'd also like to hear so.
>


You might also want to take a look at:
http://url.spec.whatwg.org/

IIRC, it tries to standardize the behavior of file:// across browsers also.




Re: reviving the file URI scheme

John Cowan
Marcos Caceres scripsit:

> http://url.spec.whatwg.org/
>
> IIRC, it tries to standardize the behavior of file:// across browsers also.

It's in his bibliography.  But like most (all?) WHATWG products, it is a
reference implementation, not a standard.

--
When I'm stuck in something boring              John Cowan
where reading would be impossible or            (who loves Asimov too)
rude, I often set up math problems for          [hidden email]
myself and solve them as a way to pass          http://www.ccil.org/~cowan
the time.      --John Jenkins


Re: reviving the file URI scheme

Marcos Caceres


On Friday, December 13, 2013 at 12:43 AM, John Cowan wrote:

> It's in his bibliography. But like most (all?) WHATWG products, it is a
> reference implementation, not a standard.

I think you might be confused: a browser is a reference implementation (in that you can reference it as attempting to implement a standard); a standard is a technical specification that has multiple implementations and is overseen by a standardization authority (in this case, the WHATWG).  
 
--
Marcos Caceres





Re: reviving the file URI scheme

John Cowan
Marcos Caceres scripsit:

>
>
> On Friday, December 13, 2013 at 12:43 AM, John Cowan wrote:
>
> > It's in his bibliography. But like most (all?) WHATWG products, it is a
> > reference implementation, not a standard.
>
> I think you might be confused: a browser is a reference implementation
> (in that you can reference it as attempting to implement a standard); a
> standard is a technical specification that has multiple implementations
> and is overseen by a standardization authority (in this case, the
> WHATWG).

A reference implementation is an implementation that itself constitutes the
standard; if you want to know what the standard prescribes, you fire up
the implementation and try it.  WHATWG standards are written in code
(it would be perfectly feasible to write a compiler for it), and that's
why they are reference implementations.

--
Work hard,                                      John Cowan
play hard,                                      [hidden email]
die young,                                      http://www.ccil.org/~cowan
rot quickly.


Re: reviving the file URI scheme

Marcos Caceres



On Friday, December 13, 2013 at 12:46 PM, John Cowan wrote:

> Marcos Caceres scripsit:
> >  
> >  
> > On Friday, December 13, 2013 at 12:43 AM, John Cowan wrote:
> >  
> > > It's in his bibliography. But like most (all?) WHATWG products, it is a
> > > reference implementation, not a standard.
> >  
> >  
> >  
> > I think you might be confused: a browser is a reference implementation
> > (in that you can reference it as attempting to implement a standard); a
> > standard is a technical specification that has multiple implementations
> > and is overseen by a standardization authority (in this case, the
> > WHATWG).
>  
>  
>  
> A reference implementation is an implementation that itself constitutes the
> standard; if you want to know what the standard prescribes, you fire up
> the implementation and try it. WHATWG standards are written in code
> (it would be perfectly feasible to write a compiler for it), and that's
> why they are reference implementations.
>  

I don’t understand what you mean by “they are written in code”?




Re: reviving the file URI scheme

Julian Reschke
On 2013-12-13 03:48, Marcos Caceres wrote:

>
>
>
> On Friday, December 13, 2013 at 12:46 PM, John Cowan wrote:
>
>> Marcos Caceres scripsit:
>>>
>>>
>>> On Friday, December 13, 2013 at 12:43 AM, John Cowan wrote:
>>>
>>>> It's in his bibliography. But like most (all?) WHATWG products, it is a
>>>> reference implementation, not a standard.
>>>
>>>
>>>
>>> I think you might be confused: a browser is a reference implementation
>>> (in that you can reference it as attempting to implement a standard); a
>>> standard is a technical specification that has multiple implementations
>>> and is overseen by a standardization authority (in this case, the
>>> WHATWG).
>>
>>
>>
>> A reference implementation is an implementation that itself constitutes the
>> standard; if you want to know what the standard prescribes, you fire up
>> the implementation and try it. WHATWG standards are written in code
>> (it would be perfectly feasible to write a compiler for it), and that's
>> why they are reference implementations.
>>
>
> I don’t understand what you mean by “they are written in code”?

They are written in pseudo-code, expressed in English (at least this one).

This might be very useful for people writing implementation code, but
it's not so helpful for people *using* the feature (like authoring file
URIs), or people trying to understand why something works the way it works.

Best regards, Julian



Executable specifications (was: Re: reviving the file URI scheme)

Martin J. Dürst
In reply to this post by Marcos Caceres
On 2013/12/13 11:48, Marcos Caceres wrote:

> On Friday, December 13, 2013 at 12:46 PM, John Cowan wrote:
>
>> Marcos Caceres scripsit:

>>> On Friday, December 13, 2013 at 12:43 AM, John Cowan wrote:
>>>
>>>> It's in his bibliography. But like most (all?) WHATWG products, it is a
>>>> reference implementation, not a standard.

>>> I think you might be confused: a browser is a reference implementation
>>> (in that you can reference it as attempting to implement a standard); a
>>> standard is a technical specification that has multiple implementations
>>> and is overseen by a standardization authority (in this case, the
>>> WHATWG).

>> A reference implementation is an implementation that itself constitutes the
>> standard; if you want to know what the standard prescribes, you fire up
>> the implementation and try it. WHATWG standards are written in code
>> (it would be perfectly feasible to write a compiler for it), and that's
>> why they are reference implementations.
>>
>
> I don’t understand what you mean by “they are written in code”?

I can't speak for John, but I think what he means is that WHATWG specs
are written in a kind of quite formal, pseudocode-like language.

I also had the idea that it should be possible to analyze this
pseudocode and do something with it (e.g. find inconsistencies,...), and
was thinking of e.g. propose this as a project to a student of mine

But my current thinking is that like the original 1950s claims of
machine translation being a solved problem within three to five years,
claims like "it would be perfectly feasible to write a compiler for it"
are mainly based on the fact that we all tend to underestimate the
complexity of human language. So any research project I'd start
currently would at this stage be more focused on exploring the limits
and irregularities of the language used in the WHATWG (and related W3C)
specs rather than aiming at making them executable.

Regards,   Martin.


Re: reviving the file URI scheme

Marcos Caceres
In reply to this post by Julian Reschke



On Friday, December 13, 2013 at 5:56 PM, Julian Reschke wrote:

> They are written in pseudo-code, expressed in English (at least this one).
>  
> This might be very useful for people writing implementation code, but  
> it's not so helpful for people *using* the feature (like authoring file  
> URIs), or people trying to understand why something works the way it works.  

If one is looking for developer documentation, then MDN or webplatform.org might be more appropriate than a spec.

However, if a spec doesn’t answer “why” something works in some particular way, then that’s a bug in the spec (certainly not a feature of it!).

Anyway, this is really a discussion for [hidden email], not the URI list.

My point was that the WHATWG’s URL spec already standardizes file://, so I’m trying to understand whether there is a need for the RFC as well, or whether the two can somehow be merged so we don’t end up with duplicate specs.


Re: reviving the file URI scheme

Matthew Kerwin
On 13 December 2013 20:35, Marcos Caceres <[hidden email]> wrote:

> My point was that the WHATWG’s URL spec already standardizes file://, so I’m trying to understand whether there is a need for the RFC as well, or whether the two can somehow be merged so we don’t end up with duplicate specs.

The WhatWG's URL living standard has rules for parsing a 'file' URI, but it doesn't have rules for generating one (you could infer them from the parsing rules, with a bit of creative license) nor does it specify any semantics for what to do with that URI once you've parsed it (e.g. mapping it to the filesystem).
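The gap between parsing and the other two pieces (generating a URI from a path, and mapping a URI back onto the file system) can be sketched as a pair of functions. The rules they encode are hypothetical choices for illustration only: local files, absolute paths, Windows drive letters written as `/C:/...` in the path, and `localhost` treated as the local host. They are not taken from the draft or from the WHATWG URL spec, and the function names are made up.

```python
import re
from urllib.parse import quote, unquote, urlsplit

# Hypothetical rules for this sketch only.
_DRIVE_PATH = re.compile(r"^[A-Za-z]:/")         # "C:/..." on the file-system side
_DRIVE_IN_URI_PATH = re.compile(r"^/[A-Za-z]:/")  # "/C:/..." on the URI side


def path_to_file_uri(path: str, windows: bool = False) -> str:
    """Generate a 'file' URI from an absolute local path."""
    # Only treat backslashes as separators for Windows paths; on POSIX
    # a backslash is an ordinary file-name character.
    p = path.replace("\\", "/") if windows else path
    if windows and _DRIVE_PATH.match(p):
        p = "/" + p  # "C:/x" becomes "/C:/x"
    if not p.startswith("/"):
        raise ValueError(f"not an absolute path: {path!r}")
    # Keep '/' and the drive colon literal; percent-encode the rest as needed.
    return "file://" + quote(p, safe="/:")


def file_uri_to_path(uri: str) -> str:
    """Map a local 'file' URI back to a path string."""
    parts = urlsplit(uri)
    if parts.scheme != "file" or parts.netloc not in ("", "localhost"):
        raise ValueError(f"not a local file URI: {uri!r}")
    path = unquote(parts.path)
    if _DRIVE_IN_URI_PATH.match(path):
        path = path[1:]  # "/C:/x" becomes "C:/x"
    return path
```

Even this toy version has to settle questions a parsing algorithm alone does not answer, such as whether backslashes are separators, where the drive letter goes, and which hosts count as local.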

It really does read more like a pseudocode reference implementation of the interesting bits of RFCs 3986, 1738, etc. than a spec, per se.


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Re: reviving the file URI scheme

John Cowan
In reply to this post by Mike Brown
Mike Brown scripsit:

> Unless you're writing a math tutorial, you should avoid repeatedly
> telling the reader to "note" things, or that things should be noted,
> or any other explicit ways of highlighting content or directing the
> reader's attention.

I disagree entirely.  The notion that readers always read with full attention
is not justified, and so calling their attention to particularly important
points is very appropriate.

--
As you read this, I don't want you to feel      John Cowan
sorry for me, because, I believe everyone       [hidden email]
will die someday.                               http://www.ccil.org/~cowan
        --From a Nigerian-type scam spam


Re: reviving the file URI scheme

Ian Hickson
In reply to this post by John Cowan
On Thu, 12 Dec 2013, John Cowan wrote:
>
> WHATWG standards are written in code (it would be perfectly feasible to
> write a compiler for it)

Man, if that was true that would be awesome on so many levels.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: reviving the file URI scheme

David Sheets
On Fri, Dec 13, 2013 at 9:19 PM, Ian Hickson <[hidden email]> wrote:
> On Thu, 12 Dec 2013, John Cowan wrote:
>>
>> WHATWG standards are written in code (it would be perfectly feasible to
>> write a compiler for it)
>
> Man, if that was true that would be awesome on so many levels.

http://lists.w3.org/Archives/Public/public-html/2007Jul/1103.html

> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>



Re: reviving the file URI scheme

Ian Hickson
On Fri, 13 Dec 2013, David Sheets wrote:
> On Fri, Dec 13, 2013 at 9:19 PM, Ian Hickson <[hidden email]> wrote:
> > On Thu, 12 Dec 2013, John Cowan wrote:
> >>
> >> WHATWG standards are written in code (it would be perfectly feasible
> >> to write a compiler for it)
> >
> > Man, if that was true that would be awesome on so many levels.
>
> http://lists.w3.org/Archives/Public/public-html/2007Jul/1103.html

"Some parts of one of the parsers can be mechanically converted to code"
is a far cry from "WHATWG standards are written in code", unfortunately.

There are actually much better examples if you want to look for parts of
WHATWG specs that are specified in compilable code, e.g. the table sorting
model has a parser that is literally written in pseudo-code in the HTML
spec source, and the postprocessor turns it into English!

Having parts of specs in computer-readable form isn't unusual; many IETF
specs use BNF, for example, from which it is somewhat easy to generate
syntax validators. However, this is a long way from what John said.
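As a small illustration of how a BNF rule maps onto a syntax validator, here is the `scheme` production from RFC 3986 (section 3.1) hand-translated into a Python regular expression. It is a sketch of the general idea, not a tool used by the IETF or the WHATWG, and `is_valid_scheme` is a hypothetical name.

```python
import re

# RFC 3986, section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_valid_scheme(s: str) -> bool:
    """Return True if s matches the RFC 3986 'scheme' rule exactly."""
    return SCHEME.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["file", "svn+ssh", "1file", "fi le"]:
        print(candidate, is_valid_scheme(candidate))
```

A rule this small translates almost mechanically; longer grammars with prose constraints attached are where the translation stops being automatic.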

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: reviving the file URI scheme

John Cowan
In reply to this post by Matthew Kerwin
Matthew Kerwin scripsit:

> What I would really like is your opinions as experts, whether you
> think it's a worthwhile effort, or if my approach is suitable, or any
> specific issues (technical or editorial) with the ID itself.

Specific issues:

Fragments shouldn't be mentioned at all, because they are defined by
the media type of the entity (like text/html), not by the URI scheme.
However, it would be useful to note that media types are typically
inferred from the last component of the URI path: if the component ends in
".txt", then text/plain is inferred, and so on.

For "(some) Macintosh OS versions" read "older MacOS versions".
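The media-type suggestion above can be sketched with Python's standard `mimetypes` module, which guesses a type from a file extension. The function name is hypothetical; extension mapping is a heuristic, and `mimetypes` also consults platform tables, so results for less common extensions may vary. The fragment is returned untouched because its meaning comes from the media type rather than from the 'file' scheme.

```python
import mimetypes
from typing import Optional, Tuple
from urllib.parse import urlsplit


def infer_media_type(uri: str) -> Tuple[Optional[str], str]:
    """Guess a file URI's media type from the extension of its last path
    segment, and return the fragment separately, uninterpreted."""
    parts = urlsplit(uri)
    # Drop the fragment before guessing, so "page.html#intro" still ends in ".html".
    without_fragment = parts._replace(fragment="").geturl()
    media_type, _encoding = mimetypes.guess_type(without_fragment)
    return media_type, parts.fragment


if __name__ == "__main__":
    for uri in ["file:///tmp/notes.txt", "file:///tmp/page.html#intro", "file:///tmp/README"]:
        print(uri, infer_media_type(uri))
```

A path with no extension yields `None`, which leaves the fragment with no defined meaning at all.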

You should point out that modern MacOS file names are in Normalization
Form D, and consequently, conversion between Normalization Forms C and
D is appropriate when translating from file: URIs to MacOS paths or
vice versa.

> An alternative approach I've considered is creating an Informational
> RFC that "deobsoletes" parts of RFC 1738, since it's a bit unclear
> whether RFCs 4248 (telnet) and 4266 (gopher) obsolete *all* of it, or just
> those scheme definitions.  If you think that would be a better (or
> worse, or silly) approach, I'd also like to hear so.

I see no benefit to this approach.

--
"Why yes, I'm ten percent Jewish on my manager's side."      John Cowan
    --Connie Francis                         http://www.ccil.org/~cowan


Re: reviving the file URI scheme

Martin J. Dürst
On 2013/12/16 16:11, John Cowan wrote:
> Matthew Kerwin scripsit:

> You should point out that modern MacOS file names are in Normalization
> Form D, and consequently, conversion between Normalization Forms C and
> D is appropriate when translating from file: URIs to MacOS paths or
> vice versa.

What does "modern" in "modern MacOS" mean? Of course a 1984 or so Mac
wouldn't be able to use Unicode, but this practice goes way, way back.

Also, it's not exactly Normalization Form D (NFD). It's definitely NFD
for many if not most parts of Unicode, but not everywhere. Please see
e.g. https://developer.apple.com/library/mac/qa/qa1173/_index.html for
more details.
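A minimal sketch of the conversion discussed here, assuming percent-encoded UTF-8 in NFC on the URI side and plain NFD on the file-system side. Since Apple's decomposition is not exactly NFD everywhere, a faithful implementation would need Apple's variant rather than `unicodedata.normalize("NFD", ...)`. The function names are hypothetical.

```python
import unicodedata
from urllib.parse import quote, unquote


def uri_path_to_macos_name(uri_path: str) -> str:
    """Percent-decode a file URI path and decompose it (plain NFD).

    This only approximates the macOS convention, which differs from NFD
    in some parts of Unicode.
    """
    return unicodedata.normalize("NFD", unquote(uri_path))


def macos_name_to_uri_path(fs_path: str) -> str:
    """Compose a decomposed file-system name (NFC), then percent-encode it."""
    return quote(unicodedata.normalize("NFC", fs_path))


if __name__ == "__main__":
    fs_name = uri_path_to_macos_name("/Users/me/caf%C3%A9.txt")
    print(repr(fs_name))
    print(macos_name_to_uri_path(fs_name))
```

Without the normalization step, the same name could appear as `caf%C3%A9` or `cafe%CC%81`, and a byte-wise comparison of the two URIs would treat them as different files.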

Regards,   Martin.