Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt


Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Matthew Kerwin
FYI: this is my file URI draft, now listing apps-discuss as the official place for discussion.

>>>

A new version of I-D, draft-kerwin-file-scheme-13.txt
has been successfully submitted by Matthew Kerwin and posted to the
IETF repository.

Name:           draft-kerwin-file-scheme
Revision:       13
Title:          The file URI Scheme
Document date:  2014-09-26
Group:          Individual Submission
Pages:          16
URL:            http://www.ietf.org/internet-drafts/draft-kerwin-file-scheme-13.txt
Status:         https://datatracker.ietf.org/doc/draft-kerwin-file-scheme/
Htmlized:       http://tools.ietf.org/html/draft-kerwin-file-scheme-13
Diff:           http://www.ietf.org/rfcdiff?url2=draft-kerwin-file-scheme-13

Abstract:
   This document specifies the "file" Uniform Resource Identifier (URI)
   scheme, replacing the definition in RFC 1738.

   It attempts to document current practices, while at the same time
   defining a common core which is intended to interoperate across the
   broad spectrum of existing implementations.

Note to Readers (To be removed by the RFC Editor)

   This draft should be discussed on the IETF Applications Area Working
   Group discussion list <[hidden email]>.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat




--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

RE: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

masinter

I applaud the move to document file:.

 

Three comments on http://tools.ietf.org/html/draft-kerwin-file-scheme

 

First comment:

Please align with the special processing for “file:” in:
https://specs.webplatform.org/url/webspecs/develop/

That spec (developed in sync with WHATWG URL) focuses on
parsing and processing of base + relative, when the scheme
(or base scheme) is “file:”, while the kerwin-file-scheme
talks more about mapping between local file paths and “file:”
IRIs (URLs).

But the overlap is significant, and I hope could be eliminated,
if the direction of moving the webplatform.org spec to
obsolete 3987 is accepted.

 

Second comment:

 

You describe the relationship between local file names
and ‘file:’ URLs for “UNIX-like”, “DOS- or Windows-based”
and “VMS Files-11”. These categories aren’t precise or
exhaustive, and file system processing of file names can
vary depending on version, language pack installed in OS,
as well as network protocol (NFS?).

I would rather see a more extensible organization of
the specification by establishing a template, and then
separately documenting, on a file-system by file-system
basis, how to translate and match, based not on the
URI but on the parsed components resulting from
URL-parsing. This would allow new or different file
systems to be documented.

 

Third comment:

 

The translation between file names and URLs and things
like them happens in several specs:

Content-Disposition for file download, multipart/form-data
for file upload, by common HTTP servers, when packaging files
together into archive types as being discussed by those proposing
a new ‘archive’ top-level media type.

It would be beneficial to converge these, since having incompatible
methods of translating local file names, depending on the
categorization, reduces interoperability.

Larry

--

http://larry.masinter.net

 

From: apps-discuss [mailto:[hidden email]] On Behalf Of Matthew Kerwin
Sent: Thursday, September 25, 2014 6:13 PM
To: IETF Apps Discuss; [hidden email]
Subject: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

 

FYI: this is my file URI draft, now listing apps-discuss as the official place for discussion.

 

--

  Matthew Kerwin
  http://matthew.kerwin.net.au/


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Nico Williams
In reply to this post by Matthew Kerwin

FWIW, HFS+ normalizes to something close to NFD.  Not adhering to the
recommendation to normalize might cause problems here.

Also, if file: is intended for local use only, then there may not be any
point to normalizing.  Hmmm, that will vary, no?  It will generally
depend on what the filesystem does.  Most just-use-8, some do better.
ZFS does normalization-insensitive matching; HFS+ normalizes to NFD.

Anyways, normalizing to NFC is probably the best thing to do, but
mentioning the above might give implementors the sense of urgency they
might need to actually implement this recommendation (or a clue as to
bug reports they run into).
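
To make the two forms concrete, here is a minimal sketch using Python's
standard unicodedata module.  The file name is made up for illustration,
and plain NFD is only a stand-in for what HFS+ stores (described above as
"something close to NFD"), so treat this as an approximation rather than
a model of any particular filesystem.

```python
import unicodedata

# Hypothetical file name, written with escapes so the source is unambiguous.
name_nfc = unicodedata.normalize("NFC", "r\u00e9sum\u00e9.txt")
# Plain NFD as an approximation of a decomposing filesystem.
name_nfd = unicodedata.normalize("NFD", name_nfc)

# Each precomposed e-acute becomes "e" + U+0301, so the NFD form is longer.
print(len(name_nfc), len(name_nfd))

# The two strings render identically but do not compare equal...
print(name_nfc == name_nfd)

# ...unless both sides are normalized to the same form first.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)
```

An application that compares names without normalizing sees two
different strings here, which is the kind of bug report mentioned above.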

Nico
--


RE: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Phillips, Addison-2
Although normalization is often a good idea... normalization might be a problem if the local filesystem allows normalized and non-normalized representations both to appear. You wouldn't be able to specify a non-normalized representation.

Addison

> -----Original Message-----
> From: Nico Williams [mailto:[hidden email]]
> Sent: Tuesday, December 09, 2014 2:27 PM
> To: Matthew Kerwin
> Cc: IETF Apps Discuss; [hidden email]
> Subject: Re: [apps-discuss] Fwd: FW: New Version Notification for draft-
> kerwin-file-scheme-13.txt
>
>
> FWIW, HFS+ normalizes to something close to NFD.  Not adhering to the
> recommendation to normalize might cause problems here.
>
> Also, if file: is intended for local use only, then there may not be any point to
> normalizing.  Hmmm, that will vary, no?  It will generally depend on what the
> filesystem does.  Most just-use-8, some do better.
> ZFS does normalization-insensitive matching; HFS+ normalizes to NFD.
>
> Anyways, normalizing to NFC is probably the best thing to do, but mentioning
> the above might give implementors the sense of urgency they might need to
> actually implement this recommendation (or a clue as to bug reports they run
> into).
>
> Nico
> --


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Nico Williams
On Tue, Dec 09, 2014 at 10:37:19PM +0000, Phillips, Addison wrote:
> Although normalization is often a good idea... normalization might be
> a problem if the local filesystem allows normalized and non-normalized
> representations both to appear. You wouldn't be able to specify a
> non-normalized representation.

At least for Latin scripts most input methods produce NFC-ish output.
This is true even on OS X even though its own HFS+ decomposes
file/directory names (!).  What can I say other than "add that to the
pile of mistakes we need a time machine to correct"?

Anyways, in general normalizing to NFC does less damage than any other
normalization, and not normalizing can cause weird things, but then, so
can normalizing.

My preference (for those who don't already know) is for all [new]
filesystems to do normalization-insensitive matching, as that's the
least problematic option.  But this isn't going to happen anytime soon,
apps can't tell if the filesystem is going to do this, and so on and so
forth.

Nico
--


RE: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Phillips, Addison-2
> Anyways, in general normalizing to NFC does less damage than any other
> normalization, and not normalizing can cause weird things, but then, so can
> normalizing.

Yes, I completely agree that NFC is the right choice and does the least harm. For wire-format exchange the ideal would be for NFC to be the rule.

file:// is a little different, since it is mainly trying to represent the characters/byte codes the file system is using while still trying to be recognizable to the user.  Maybe it doesn't matter, though: most user-agents provide a dialog box, auto-complete, or some other mechanism. Perhaps the rule is "NFC for the URI, file system/user agent provides re-normalization or normalization-independent selection for file systems that use a different normalization form, de-normalized files use escapes (== don't do that)"?
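
One possible reading of that rule, sketched in Python.  The function
name iri_segment is hypothetical, and the choice to keep NFC text as
literal characters (IRI style) while escaping everything else is an
assumption about what the rule means, not something the draft
specifies.

```python
import unicodedata
from urllib.parse import quote


def iri_segment(name: bytes) -> str:
    """Map one file name (raw octets) to a path segment.

    Assumed rule: names that are valid UTF-8 and already in NFC keep
    their non-ASCII characters as-is; anything else is percent-encoded
    octet by octet so the exact bytes survive.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all (e.g. a legacy 8-bit encoding): escape the octets.
        return quote(name, safe="")
    if unicodedata.normalize("NFC", text) != text:
        # De-normalized name: escape the octets so it stays addressable.
        return quote(name, safe="")
    # NFC name: escape only ASCII characters that need it.
    return "".join(ch if ord(ch) > 127 else quote(ch, safe="") for ch in text)


print(iri_segment(b"\xc3\xa4"))   # NFC, UTF-8
print(iri_segment(b"a\xcc\x88"))  # decomposed, UTF-8
print(iri_segment(b"\xe4"))       # not valid UTF-8
```

Under this reading the "don't do that" cases remain reachable, just not
in a form a user would want to type.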

Addison

Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Matthew Kerwin
In reply to this post by Phillips, Addison-2
On 10 December 2014 at 08:37, Phillips, Addison <[hidden email]> wrote:
> Although normalization is often a good idea... normalization might be a problem if the local filesystem allows normalized and non-normalized representations both to appear. You wouldn't be able to specify a non-normalized representation.


Do you have an example? I'm trying to think it through, but I keep going in circles. The one I think of is ext[2-4] where the filesystem stores octet sequences, and shell/applications/etc. use things like the user's locale environment when representing those octets as text strings. Are you saying that if we mandate NFC normalisation of URIs, you can't distinguish between files whose filename octets are {0xE4} vs {0xC3, 0xA4} (i.e. U+00E4 "ä" in Windows-1252 / UTF-8)?

Wouldn't "file://%E4" cover that?
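
A small Python sketch of the two octet sequences in question and how
percent-encoding keeps them apart.  It only uses the standard
urllib.parse helpers; the variable names are illustrative, and nothing
here is taken from the draft itself.

```python
from urllib.parse import quote, unquote_to_bytes

# The same character, U+00E4, as stored under two different encodings.
latin = "\u00e4".encode("cp1252")  # one octet: 0xE4
utf8 = "\u00e4".encode("utf-8")    # two octets: 0xC3 0xA4

# Percent-encoding works on octets, so the two names stay distinct.
escaped_latin = quote(latin, safe="")
escaped_utf8 = quote(utf8, safe="")
print(escaped_latin, escaped_utf8)

# Decoding the escapes gives back exactly the octets the filesystem holds.
assert unquote_to_bytes(escaped_latin) == latin
assert unquote_to_bytes(escaped_utf8) == utf8
```

Because the escapes describe octets rather than characters, Unicode
normalization of the URI text does not touch them.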


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

RE: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Phillips, Addison-2

Yeah, that’s exactly the example filesystem I had in mind.

 

Actually, my thought was that U+00E4 and U+0061.0308 would be:

 

{ 0xC3.A4 } vs. { 0x61.CC.88 }

 

These are both in UTF-8, are visually indistinguishable, and are identical under NFC, but fopen() cares which bag of bytes you grab.
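
The same example as a Python sketch, assuming only the standard
library.  The file:/// URIs at the end are illustrative strings built
by hand to show that the percent-encoded forms differ; they are not
output of any file-URI library.

```python
import unicodedata
from urllib.parse import quote

composed = "\u00e4"     # U+00E4
decomposed = "a\u0308"  # U+0061 U+0308

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
print(composed_bytes.hex("."), decomposed_bytes.hex("."))

# Identical under NFC...
same_under_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_under_nfc)

# ...but different octets, hence different percent-encoded URIs.
uri_composed = "file:///" + quote(composed_bytes, safe="")
uri_decomposed = "file:///" + quote(decomposed_bytes, safe="")
print(uri_composed, uri_decomposed)
```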

 

Addison

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Matthew Kerwin
Sent: Tuesday, December 09, 2014 3:00 PM
To: Phillips, Addison
Cc: Nico Williams; IETF Apps Discuss; [hidden email]
Subject: Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

 

On 10 December 2014 at 08:37, Phillips, Addison <[hidden email]> wrote:

Although normalization is often a good idea... normalization might be a problem if the local filesystem allows normalized and non-normalized representations both to appear. You wouldn't be able to specify a non-normalized representation.

 

Do you have an example? I'm trying to think it through, but I keep going in circles. The one I think of is ext[2-4] where the filesystem stores octet sequences, and shell/applications/etc. use things like the user's locale environment when representing those octets as text strings. Are you saying that if we mandate NFC normalisation of URIs, you can't distinguish between files whose filename octets are {0xE4} vs {0xC3, 0xA4} (i.e. U+00E4 "ä" in Windows-1252 / UTF-8)?

 

Wouldn't "file://%E4" cover that?


 

--

  Matthew Kerwin
  http://matthew.kerwin.net.au/


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Matthew Kerwin
In reply to this post by Matthew Kerwin
On 10 December 2014 at 09:26, Murray S. Kucherawy <[hidden email]> wrote:
> Hi Matthew,
>
> Are you looking to process this through APPSAWG or as an individual submission?
>
> -MSK


I'd prefer it to be through the WG.


--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Nico Williams
In reply to this post by Phillips, Addison-2
On Tue, Dec 09, 2014 at 10:57:17PM +0000, Phillips, Addison wrote:
> Yes, I completely agree that NFC is the right choice and does the
> least harm. For wire-format exchange the ideal would be for NFC to be
> the rule.
>
> file:// is a little different, since it is mainly trying to represent

It may not be for a wire-format, but it exists nonetheless to
interoperate, even if locally-only, between different apps, possibly
portable apps.  Therefore the issue comes up.

> the characters/byte codes the file system is using while still trying
> to be recognizable to the user.  Maybe it doesn't matter, though: most

Even on a typical Unix system that doesn't help: the filesystem
typically treats filenames as octet strings not including '/'.

> user-agents provide a dialog box, auto-complete, or some other
> mechanism. Perhaps the rule is "NFC for the URI, file system/user
> agent provides re-normalization or normalization-independent selection
> for file systems that use a different normalization form,
> de-normalized files use escapes (== don't do that)"?

This comes up in many places (NFSv4, WebDAV, ...).

Nico
--


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Nico Williams
In reply to this post by Matthew Kerwin
On Wed, Dec 10, 2014 at 08:59:35AM +1000, Matthew Kerwin wrote:

> On 10 December 2014 at 08:37, Phillips, Addison <[hidden email]> wrote:
> > Although normalization is often a good idea... normalization might be a
> > problem if the local filesystem allows normalized and non-normalized
> > representations both to appear. You wouldn't be able to specify a
> > non-normalized representation.
>
> Do you have an example? I'm trying to think it through, but I keep going in
> circles. The one I think of is ext[2-4] where the filesystem stores octet
> sequences, and shell/applications/etc. use things like the user's locale
> environment when representing those octets as text strings. Are you saying
> that if we mandate NFC normalisation of URIs, you can't distinguish between
> files whose filename octets are {0xE4} vs {0xC3, 0xA4} (i.e. U+00E4 "ä"
> in Windows-1252 / UTF-8)?
>
> Wouldn't "file://%E4" cover that?

Suppose one app doesn't normalize.  And another does.  The user might be
unable to type in the name of a file they want to open that exists.
This is a bit contrived because they'll be able to pick the file in a
file selection combo box, but still.

A classic example many years ago was a git repo that had such characters
in some filenames and which then broke on OS X.  I don't have a link
handy.

Nico
--


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

John Cowan-3
In reply to this post by Phillips, Addison-2
Phillips, Addison scripsit:

> These are both in UTF-8, are visually indistinguishable, and are
> identical under NFC, but fopen() cares which bag of bytes you grab.

The same is true on Windows, where filenames are 16-bit code units rather
than 8-bit code units.  In general, we simply cannot normalize file names,
because both Windows and Unix filesystems distinguish between names that
are equivalent under canonical equivalence.
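
As a rough illustration of the 16-bit case, the Python sketch below
lists the UTF-16 code units for the two spellings of "ä" discussed
earlier in the thread.  The helper name utf16_units is made up, and
this only shows the encoding; it does not touch a real Windows
filesystem.

```python
def utf16_units(name: str) -> list[int]:
    """Return the 16-bit code units of name (UTF-16, little-endian)."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Precomposed: a single code unit.
print([hex(u) for u in utf16_units("\u00e4")])
# Decomposed: base letter plus combining diaeresis, two code units.
print([hex(u) for u in utf16_units("a\u0308")])
```

A filesystem that compares code units one for one will treat these as
two different names, just as the octet comparison does on Unix.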

--
John Cowan          http://www.ccil.org/~cowan        [hidden email]
The whole of Gaul is quartered into three halves.
        --Julius Caesar


Re: [apps-discuss] Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

Nico Williams
On Tue, Dec 09, 2014 at 09:07:59PM -0500, John Cowan wrote:
> Phillips, Addison scripsit:
> > These are both in UTF-8, are visually indistinguishable, and are
> > identical under NFC, but fopen() cares which bag of bytes you grab.
>
> The same is true on Windows, where filenames are 16-bit code units rather
> than 8-bit code units.  In general, we simply cannot normalize file names,
> because both Windows and Unix filesystems distinguish between names that
> are equivalent under canonical equivalence.

The preferable thing to do is to have form-preserve-on-create and
form-insensitive lookups in the filesystem, which creates some unlikely
aliases, but mostly avoids real problems.
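
A toy model of that behaviour in Python, as a sketch only.  The
Directory class is hypothetical, it uses NFC as the comparison key
(any single form would do for the illustration), and it ignores
everything a real filesystem has to handle beyond the name table.

```python
import unicodedata


class Directory:
    """In-memory name table: form-preserving create, form-insensitive lookup."""

    def __init__(self) -> None:
        # Maps the normalized key to the name exactly as it was created.
        self._entries: dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        return unicodedata.normalize("NFC", name)

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self._entries:
            # A canonically equivalent name already exists: the "alias" case.
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name: str) -> str:
        # Raises KeyError if no equivalent name exists.
        return self._entries[self._key(name)]

    def listing(self) -> list[str]:
        return list(self._entries.values())


d = Directory()
d.create("a\u0308")                     # stored decomposed, as given
print(d.lookup("\u00e4") == "a\u0308")  # found via the precomposed spelling
```

The stored name keeps whatever form the creator used, so nothing is
rewritten behind the application's back, while lookups no longer depend
on which input method produced the name.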

In practice few filesystems do this, so applications layered above them
have to do something.  "Nothing" works most of the time.  Sometimes it
doesn't work, and when it doesn't, it hurts.  The example that comes to
mind is git on HFS+.  git has a configuration option to normalize:

core.precomposeunicode::
        This option is only used by Mac OS implementation of Git.
        When core.precomposeunicode=true, Git reverts the unicode decomposition
        of filenames done by Mac OS. This is useful when sharing a repository
        between Mac OS and Linux or Windows.
        (Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7).
        When false, file names are handled fully transparent by Git,
        which is backward compatible with older versions of Git.

Also, the ideal is that the filesystem stores Unicode filenames, and
consumers in any non-Unicode locales convert.

I think I have an I-D lying about discussing this.  This keeps coming
up.  Maybe we should publish it?  Though the file: URI scheme makes a
poor trigger for publishing it: it'll be easier to ignore normalization
in the file: scheme.

Nico
--