scheme specific case normalization

Jeremy Carroll



A comment on
http://tools.ietf.org/html?draft=draft-hansen-2717bis-2718bis-uri-guidelines-07.txt


In RFC 3986 I read:

[[
  Some schemes define additional subcomponents that consist of case-
insensitive data, giving an implicit license to normalizers to convert
this data to a common case (e.g., all lowercase).
]]
page 42, section 6.2.3


It would be helpful if the definition documents for "schemes with
subcomponents that consist of case-insensitive data" specified that
lowercase SHOULD usually be used. This is particularly pertinent in
applications such as XML Namespaces and the Semantic Web, where
character-by-character comparison is the norm, and unnormalized URIs
result in false negatives.
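To make the false-negative point concrete, here is a minimal Python sketch of the generic case normalization that RFC 3986 section 6.2.2.1 already permits. It is not taken from the draft or from RFC 3986. It lowercases the scheme and the host and leaves every other component exactly as written. It assumes an absolute URI and recognises an authority only when one starts with "//". The name normalize_case is hypothetical.

```python
def normalize_case(uri: str) -> str:
    """Lowercase the scheme and host of an absolute URI; keep all else as-is.

    Hypothetical helper. RFC 3986 treats scheme and host as case-insensitive,
    so lowercasing them should not change what the URI identifies. Userinfo,
    path, query and fragment are left alone because their case may matter.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon:
        return uri  # no scheme: not an absolute URI, leave it untouched
    scheme = scheme.lower()
    if rest.startswith("//"):
        # The authority runs from "//" to the first "/", "?" or "#".
        end = len(rest)
        for delim in "/?#":
            i = rest.find(delim, 2)
            if i != -1:
                end = min(end, i)
        # Only host[:port] is lowercased; userinfo keeps its original case.
        userinfo, at, hostport = rest[2:end].rpartition("@")
        rest = "//" + userinfo + at + hostport.lower() + rest[end:]
    return scheme + ":" + rest


a = "http://example.com/ns#"
b = "HTTP://Example.COM/ns#"
print(a == b)                                   # False: character-by-character
print(normalize_case(a) == normalize_case(b))   # True
print(normalize_case("mailto:Al@Example.COM"))  # mailto:Al@Example.COM (unchanged)
```

The last line shows the gap described above. The domain part of a mailto: address is case-insensitive, but the generic syntax does not say so. A generic normalizer therefore leaves it alone, and only the scheme's own definition could license lowercasing it.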

Suggested text along the lines of
[[
When a scheme defines subcomponents that consist of case-insensitive
data, then it SHOULD specify that implementations should accept
uppercase letters as equivalent to lowercase for the sake of robustness
but should only produce lowercase scheme names for consistency.
]]



Jeremy



Re: scheme specific case normalization

Dan Brickley-2

* Jeremy Carroll <[hidden email]> [2006-01-25 12:51+0000]

>
>
>
> A comment on
> http://tools.ietf.org/html?draft=draft-hansen-2717bis-2718bis-uri-guidelines-07.txt
>
>
> In RFC 3986 I read:
>
> [[
>  Some schemes define additional subcomponents that consist of case-
> insensitive data, giving an implicit license to normalizers to convert
> this data to a common case (e.g., all lowercase).
> ]]
> page 42, section 6.2.3
>
>
> It would be helpful if the definition documents for "schemes with
> subcomponents that consist of case-insensitive data" specified that
> lowercase SHOULD usually be used. This is particularly pertinent in
> applications such as XML Namespaces and the Semantic Web, where
> character-by-character comparison is the norm, and unnormalized URIs
> result in false negatives.
>
> Suggested text along the lines of
> [[
> When a scheme defines subcomponents that consist of case-insensitive
> data, then it SHOULD specify that implementations should accept
> uppercase letters as equivalent to lowercase for the sake of robustness
> but should only produce lowercase scheme names for consistency.
> ]]

I like your suggestion...

Dan




Re: scheme specific case normalization

Jeremy Carroll

Sorry, copy/paste error: delete "scheme names" from the last line:

i.e.
>> Suggested text along the lines of
>> [[
>> When a scheme defines subcomponents that consist of case-insensitive
>> data, then it SHOULD specify that implementations should accept
>> uppercase letters as equivalent to lowercase for the sake of robustness
>> but should only produce lowercase for consistency.
>> ]]
>



Re: scheme specific case normalization

Al Gilman
In reply to this post by Jeremy Carroll

-1

>A comment on
>http://tools.ietf.org/html?draft=draft-hansen-2717bis-2718bis-uri-guidelines-07.txt
>
>In RFC 3986 I read:
>
>[[
>  Some schemes define additional subcomponents that consist of case-
>insensitive data, giving an implicit license to normalizers to
>convert this data to a common case (e.g., all lowercase).
>]]
>page 42, section 6.2.3

That's a stretch.  There is an implicit requirement for comparators;
normalization outside comparison is moot.

>It would be helpful if the definition documents for "schemes with
>subcomponents that consist of case-insensitive data" specified that
>lowercase SHOULD usually be used. This is particularly pertinent in
>applications such as XML Namespaces and the Semantic Web, where
>character-by-character comparison is the norm, and unnormalized
>URIs result in false negatives.

These applications have made their bed with the false negatives in
them.  Let them sleep in it.

At least for email addresses, we should not be buggering the mnemonic
advantages of original case in a misguided advocacy of
man-in-the-middle normalization.

www.YourBusinessName.com reads right in a screen reader.
www.yourbusinessname.com does not.

etc.

If the field is case-insensitive, the end user should bear the burden
of normalization, and the data crossing the network should be
inviolate.
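The alternative argued for here is to normalize only inside the comparator. It can be sketched in the same hypothetical style. In this Python sketch, split_authority and same_uri are illustrative names, and the same absolute-URI assumption applies. Neither string is ever rewritten, so www.YourBusinessName.com keeps its capitals, while the scheme and host still compare case-insensitively.

```python
def split_authority(uri: str) -> tuple[str, str, str, str]:
    """Split an absolute URI into (scheme, prefix, host[:port], remainder).

    Hypothetical helper. prefix is "//" plus any "userinfo@", and is empty
    when the URI has no authority, so "file:/x" and "file:///x" stay distinct.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError(f"not an absolute URI: {uri!r}")
    if not rest.startswith("//"):
        return scheme, "", "", rest
    end = len(rest)  # authority ends at the first "/", "?" or "#"
    for delim in "/?#":
        i = rest.find(delim, 2)
        if i != -1:
            end = min(end, i)
    userinfo, at, hostport = rest[2:end].rpartition("@")
    return scheme, "//" + userinfo + at, hostport, rest[end:]


def same_uri(a: str, b: str) -> bool:
    """Compare two URIs without modifying either string.

    Scheme and host are compared case-insensitively; the userinfo prefix and
    the remainder (path, query, fragment) must match exactly.
    """
    sa, pa, ha, ra = split_authority(a)
    sb, pb, hb, rb = split_authority(b)
    return (sa.lower() == sb.lower() and pa == pb
            and ha.lower() == hb.lower() and ra == rb)


stored = "http://www.YourBusinessName.com/"
print(same_uri(stored, "http://www.yourbusinessname.com/"))            # True
print(same_uri("http://example.com/Path", "http://example.com/path"))  # False
print(stored)  # http://www.YourBusinessName.com/ (never rewritten)
```

The trade-off is that every consumer needs a comparator like this. Plain string equality, as used for XML namespace names, would still report the two spellings as different.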

>Suggested text along the lines of
>[[
>When a scheme defines subcomponents that consist of case-insensitive
>data, then it SHOULD specify that implementations should accept
>uppercase letters as equivalent to lowercase for the sake of
>robustness but should only produce lowercase scheme names for
>consistency.
>]]

Note the issue is not just scheme names, it is URI components per the
scheme syntax.

Al




RE: scheme specific case normalization

Larry Masinter
In reply to this post by Jeremy Carroll

From a process point of view: The RFC is about to be issued,
and there won't be substantive changes without a very
strong case that the text that's there is really wrong
or lacking.

This document's purpose is to establish a baseline
of guidelines for what new scheme definitions should
or shouldn't contain, but the actual process is "expert
review" with a period of mailing list discussion. So it's
always possible for a proposed scheme definition to have
additional considerations -- not in the document -- to be
raised. There's judgment involved.

Specifically:

"x-" private use: we decided against recommending this
long ago, for the same reasons why "x-" tokens have turned
out to be a bad idea in MIME types: the experiments are
successful gradually, and there's never an opportunity to
change the name from "x-blah" to "blah". So register the
name you want in the first place, albeit provisionally.

"scheme specific case normalization": the document's purpose
is to give guidelines for registration, not to make normative
assertions about what implementations should or shouldn't do.

consistent use of components: I think this is also a
matter of judgment. I regret that "file:" and "ftp:"
are inconsistent, but I think the first thing to do
is to update those specs. I've dropped the ball on updating
the "file:" specification (It's the oldest item on
my 'todo' list), but I'm still hopeful.




Re: scheme specific case normalization

Jeremy Carroll

Larry Masinter wrote:
> From a process point of view: The RFC is about to be issued,
> and there won't be substantive changes without a very
> strong case that the text that's there is really wrong
> or lacking.

OK. I have a very limited understanding of the IETF process. I agree that
this document is a substantial improvement on the older docs, and that
my comments don't merit disrupting the process.

>
> This document's purpose is to establish a baseline
> of guidelines for what new scheme definitions should
> or shouldn't contain, but the actual process is "expert
> review" with a period of mailing list discussion. So it's
> always possible for a proposed scheme definition to have
> additional considerations -- not in the document -- to be
> raised. There's judgment involved.
>
> Specifically:
>
> "x-" private use: we decided against recommending this
> long ago, for the same reasons why "x-" tokens have turned
> out to be a bad idea in MIME types: the experiments are
> successful gradually, and there's never an opportunity to
> change the name from "x-blah" to "blah". So register the
> name you want in the first place, albeit provisionally.
>

OK. I did try to find the earlier discussion; thanks for the explanation.

> "scheme specific case normalization": the document's purpose
> is to give guidelines for registration, not to make normative
> assertions about what implementations should or shouldn't do.
>

Seems a shame, but not worth holding things up for.

> consistent use of components: I think this is also a
> matter of judgment. I regret that "file:" and "ftp:"
> are inconsistent, but I think the first thing to do
> is to update those specs. I've dropped the ball on updating
> the "file:" specification (It's the oldest item on
> my 'todo' list), but I'm still hopeful.
>
>
Fixing the old specs would probably be a better approach to this problem.

Jeremy