respecting IETF customs?


respecting IETF customs?

masinter
As a side note to the ongoing URL discussion:

> If I wanted something from the IETF community, I would try to respect
> their customs. You are welcome to make an IETF Contribution
> submitting our ideas as an Internet-Draft, for instance. Your sarcasm also does
> not help.

I don't think the comments were sarcastic.

He makes a reasonable request, and facilitating around 'customs' is the
job of the IETF-W3C liaison group.

What customs, anyway?

This happens often.

Larry
--
http://larry.masinter.net




Re: respecting IETF customs?

Sam Ruby-2
On 12/06/2014 12:54 AM, Larry Masinter wrote:

> As a side note to the ongoing URL discussion:
>
>> If I wanted something from the IETF community, I would try to respect
>> their customs. You are welcome to make an IETF Contribution
>> submitting our ideas as an Internet-Draft, for instance. Your sarcasm also does
>> not help.
>
> I don't think the comments were sarcastic.
>
> He makes a reasonable request, and facilitating around 'customs'
> is the job of IETF-W3C liaison group.
>
> What customs, anyway?
>
> This happens often.

Thanks Larry!

I will say that if the IETF-W3C liaison group feels that submitting
this content as an Internet-Draft makes sense, I will follow through on
that.  After all, publishing this content on WebPlatform.org was a
result of me following up on a suggestion[1].  If there are other
serious suggestions, I WILL follow up on them.

Meanwhile, here is a more stable version of that content, presented in
W3C Working Draft form:

http://www.w3.org/TR/2014/WD-url-1-20141209/

It doesn't have some of the things I'm working on in it yet, but after
all, that's what "more stable" means.

In the "Participate" section in the front of that W3C Working Draft
you will find a list of alternate venues for feedback.  Here's another
venue:

http://discourse.specifiction.org/c/url

If creating another mailing list or tracking system would help, let's
make that happen.  Meanwhile, I will say that most of the activity is
happening here or finding its way back here:

https://www.w3.org/Bugs/Public/buglist.cgi?component=URL&resolution=---

An example where help would be very much appreciated: would it be
possible for somebody who not only is familiar with RFC 3986 but also
has a sense for what parts might be changeable and what parts can't
change to review the following:

https://url.spec.whatwg.org/interop/urltest-results/

That page shows how a number of user agents (including browsers and
non-browsers) parse strings as URLs/URIs.  Search that page for the
word "addressable".  Addressable[2] is intended to be a highly
RFC-compliant implementation.  Modulo any bugs it may have, it should be
fairly representative of what the combination of RFC 3986 and RFC 3987
requires.  If there are any differences, let's let the author of
addressable, Bob Aman, know.

And while that is a broad request, here is a much more focused one:
define some test cases that specify how relative references should be
evaluated against a base with an unknown URL/URI scheme:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233
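As a quick illustration of the gap, Python's standard-library resolver
simply refuses to resolve against an unknown scheme (a sketch; other
implementations make different choices, which is the point of the bug):

```python
from urllib.parse import urljoin

# A scheme urllib knows about participates in relative resolution:
urljoin("http://a/b/c", "d")            # 'http://a/b/d'

# An unknown scheme does not: the relative reference comes back unchanged.
urljoin("unknownscheme://a/b/c", "d")   # 'd'
```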

Sometimes reading prose in the various specifications is difficult, and
what you might want to explore isn't covered by the existing test cases.
Here are more interactive ways to compare the browser you are currently
using against the specification I previously pointed you to:

https://url.spec.whatwg.org/reference-implementation/liveview.html
https://url.spec.whatwg.org/reference-implementation/liveview2.html
https://url.spec.whatwg.org/reference-implementation/liveview3.html

I encourage you to try Larry's example[3]: unknownscheme://コーヒー.  I
don't know of any spec that recommends punycoding paths, but what the
URL spec I pointed to specifies matches IE but not Firefox or Chrome.
There is some room for determining what the correct behavior is here.

Julian[4] and Graham[5] seem interested in making this work consistent
with RFC 3986.  That doesn't surprise me at all; in fact, that's exactly
what I expected the IETF to recommend[6], possibly with some errata.  If
people are interested in making that work, I'm interested in working
with them.  I've pointed out some differences above, I'm willing to do
the research into how existing user agents behave, and I'm willing to
discuss how we can converge onto a single definition.

After all, "rough consensus and running code" is what we all want to
achieve, right?

- Sam Ruby

[1] http://lists.w3.org/Archives/Public/public-w3process/2014Nov/0177.html
[2] https://github.com/sporkmonger/addressable
[3] http://lists.w3.org/Archives/Public/public-ietf-w3c/2014Dec/0011.html
[4] http://lists.w3.org/Archives/Public/public-ietf-w3c/2014Dec/0015.html
[5] http://lists.w3.org/Archives/Public/public-ietf-w3c/2014Dec/0016.html
[6]
http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0000.html


Re: respecting IETF customs?

Bjoern Hoehrmann
* Sam Ruby wrote:
>I will say that if the IETF-W3C liaison group feels that submitting this
>content as an Internet-Draft makes sense, I will follow through on that.
>  After all, publishing this content on WebPlatform.org was a result of
>me following up on a suggestion[1].  If there are other serious
>suggestions, I WILL follow up on them.

You could also consider submitting a problem statement or other kind of
higher level document with pointers to your proposals. Something, in any
case, is better than nothing, if you want to raise awareness within and
get feedback from the IETF community.

>An example where help would be very much appreciated: would it be
>possible for somebody who not only is familiar with RFC 3986 but also
>has a sense for what parts might be changeable and what parts can't
>change to review the following:
>
>https://url.spec.whatwg.org/interop/urltest-results/

This page is rather difficult to digest. One problem is that there is no
indication of expected results, and the colour coding does not indicate,
for instance, where test results diverge from the relevant RFCs. My

  http://shadowregistry.org/js/misc/

presents tests and results in a form that makes such information more
readily available.

>And while that is a broad request, here is a much more focused request,
>define some test cases which will define how relative references should
>be evaluated against a base with an unknown URLs/URIs scheme:
>
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233

You already seem to have plenty of tests if you replace the scheme in
them, and if you have a setup that can automatically evaluate tests, I
would simply automatically generate test cases. For an example, see

  http://lists.w3.org/Archives/Public/www-archive/2011Aug/0001.html

I also note that RFC 3986 already fully defines this, and I am not aware
of differences in deployed code that cannot be changed in this regard.
If there are, they ought to be brought up on the `public-iri` or `uri`
list.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
 Available for hire in Berlin (early 2015)  · http://www.websitedev.de/ 


Re: respecting IETF customs?

Sam Ruby-2
On 12/06/2014 08:21 AM, Bjoern Hoehrmann wrote:

> * Sam Ruby wrote:
>> I will say that if the IETF-W3C liaison group feels that submitting this
>> content as an Internet-Draft makes sense, I will follow through on that.
>>   After all, publishing this content on WebPlatform.org was a result of
>> me following up on a suggestion[1].  If there are other serious
>> suggestions, I WILL follow up on them.
>
> You could also consider submitting a problem statement or other kind of
> higher level document with pointers to your proposals. Something, in any
> case, is better than nothing, if you want to raise awareness within and
> get feedback from the IETF community.

Is this something you would be willing to co-author with me?

As a starter set, I see three problem areas:

Nomenclature
---

URL/URI/IRI is just the beginning.  Over time different terms have been
used by different organizations.  One survey can be found here:
http://tantek.com/2011/238/b1/many-ways-slice-url-name-pieces

At a minimum, this information belongs as an appendix in some
RFC/Recommendation/Standard.  It could even stand alone.

Applicable standards
---

While the problem space seems like it would be reasonably
self-contained, in practice standards like IDNA and Unicode make
profound differences.  Even those standards have versions, and even
those versions have options, such as normalization forms and
UseSTD3ASCIIRules.

Two examples:
https://url.spec.whatwg.org/interop/urltest-results/61a4a14209
https://url.spec.whatwg.org/interop/urltest-results/683ac9869d

RFC 3986, for example, mentions IDNA and UTF-8, but doesn't nail down
these options.
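To see why the normalization-form option alone matters, here is a small
Python illustration (the compatibility character chosen for the example
is mine, not from any of the specs above):

```python
import unicodedata

lig = "\ufb01le"  # "ﬁle", beginning with U+FB01 LATIN SMALL LIGATURE FI

unicodedata.normalize("NFC", lig)   # 'ﬁle': NFC leaves the ligature intact
unicodedata.normalize("NFKC", lig)  # 'file': NFKC decomposes it to ASCII
```

Two hostnames that are identical under one form and distinct under the
other will parse to different authorities.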

Interop
---

Let's face it: every programming language these days has some form of
standard library, and in that library is some form of URL or URI parse
function.  Many are horribly broken, even after we take into account the
nomenclature and applicable-standards differences.  Here is a concrete
example:

http://intertwingly.net/blog/2004/07/31/URI-Equivalence

That was a decade ago.  At the time, C# was the winner, Perl a close
second, and Java a far distant third.  A decade later, I've rerun the
tests for Perl and Java, and sadly, they haven't changed.

If you take a survey of implementations, you will find that, in
addition to the outliers, there are two families of implementations.
One collects around RFC 3986; these are precise (in that they tend to
produce the same results) but not necessarily accurate in the face of
IDNA and Unicode considerations.  The other collects around browser
results; these are less precise (in that there are variations), but
tend overall to be more accurate with respect to the other applicable
standards.

By the way, another, more insidious problem lurks in places like
file:// URIs.  For now, I'll just leave it at that.

>> An example where help would be very much appreciated: would it be
>> possible for somebody who not only is familiar with RFC 3986 but also
>> has a sense for what parts might be changeable and what parts can't
>> change to review the following:
>>
>> https://url.spec.whatwg.org/interop/urltest-results/
>
> This page is rather difficult to digest. One problem is that there is no
> indication of expected results, and the colour coding does not indicate,
> for instance, where test results diverge from the relevant RFCs. My
>
>    http://shadowregistry.org/js/misc/
>
> presents tests and results in a form that makes such information more
> readily available.

Actually, it is color coded (skip to the bottom of the page), but it
doesn't start from the assumption that there is one right answer and
that there are a number of errant implementations that don't conform to
that right answer.  Such an assumption, if it could be made, would
indeed simplify the presentation.

Items colored reddish are examples where there doesn't seem to be
agreement on what the right answer is.

The next two colors cover the cases where the IETF or WHATWG
specifications are not in line with the consensus.

The final color is where IETF and WHATWG agree.  Even in those cases,
there often are a few outliers.

If you have other ideas on how to present this information, here's the
raw data captured for a number of user agents:

https://github.com/webspecs/url/tree/develop/evaluate/useragent-results

I welcome people to take this data and present it in other ways.  I
welcome but don't require contributions back: possible things I would be
interested in are other ways to present this data, more tests, or other
result sets that should be included.  For example, adding Perl to these
evaluation results would make perfect sense.

>> And while that is a broad request, here is a much more focused request,
>> define some test cases which will define how relative references should
>> be evaluated against a base with an unknown URLs/URIs scheme:
>>
>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233
>
> You already seem to have plenty of tests if you replace the scheme in
> them, and if you have a setup that can automatically evaluate tests, I
> would simply automatically generate test cases. For an example, see
>
>    http://lists.w3.org/Archives/Public/www-archive/2011Aug/0001.html
>
> I also note that RFC 3986 already fully defines this, and I am not aware
> of differences in deployed code that cannot be changed in this regard.
> If there are, they ought to be brought up on the `public-iri` or `uri`
> list.

This is a case where my precision-versus-accuracy comment applies.  My
fear is that, by comparing test results, people have come to standardize
on things outside of the spec when it comes to matters like IDNA and
Unicode.  And in the places where they have done so, they may not be in
compliance with those other standards.

Some of these choices may be defensible.  Perhaps Perl can't make
assumptions about character encoding when faced with %-encoded bytes.
But then perhaps URI::eq shouldn't be providing boolean answers to
questions of equivalence.  It isn't that difficult to come up with
scenarios where such differences have security implications.
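A minimal sketch of what a less naive equivalence check could look like
(Python, with helper names of my own; RFC 3986 section 6 describes a
full ladder of comparison methods this only gestures at):

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def equiv(a, b):
    """Rough equivalence: lowercase the scheme and host and drop a
    default port before comparing.  A sketch, not a full normalizer;
    percent-encoding and path segments are left untouched."""
    def norm(u):
        p = urlsplit(u)  # urlsplit already lowercases the scheme
        port = p.port
        if port == DEFAULT_PORTS.get(p.scheme):
            port = None
        host = p.hostname or ""  # .hostname is already lowercased
        netloc = host if port is None else "%s:%d" % (host, port)
        return (p.scheme, netloc, p.path or "/", p.query, p.fragment)
    return norm(a) == norm(b)

equiv("HTTP://Example.COM:80/x", "http://example.com/x")  # True
equiv("http://example.com/x", "http://example.com/y")     # False
```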

If you believe that this should be discussed on one of those lists,
please do so.  Feel free to copy me, point to this email (it is publicly
archived), or even forward some or all of it to those lists.

- Sam Ruby


Re: respecting IETF customs?

Sam Ruby-2
On 12/06/2014 01:40 PM, Sam Ruby wrote:
>
> If you take a survey of implementations, you will find that in addition
> to the outliers, there are two families of implementations.  One that
> collect around RFC 3986 are precise (in that they tend to produce the
> same results) but not necessary accurate in the face of IDNA and Unicode
> considerations.  And another that collect around browser results.  The
> latter is less precise (in that there are variations), but tend overall
> to be more accurate with respect to other applicable standards.

I've added Perl to my test results using the following program:

https://github.com/webspecs/url/blob/develop/evaluate/testuri.pl

It has been a while since I've programmed in Perl.  If there are things
I missed, bugs in general, or even simply better ways of doing things,
please let me know.

   - - -

I then took a look at the results, and I now believe that the existence
of two families of implementations is more a matter of conventional
wisdom; the reality isn't quite so clean.

Here's an example:

https://url.spec.whatwg.org/interop/urltest-results/683ac9869d

Looking at this, it doesn't look like addressable or rust do IDNA
processing.  Rust at least fesses up to this. :-)

Node.js and Perl perform fewer IDNA processing steps than the other
implementations.  In particular, they skip step 1, but do steps 2 and 3
of the following page:

http://www.unicode.org/reports/tr46/#ToASCII

Everybody else does all three steps.  Note: this isn't necessarily
because Node.js and Perl skipped a step; it may very well be that they
implement an entirely different version of IDNA than everybody else
does[1].
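The version differences are easy to poke at.  Python's built-in "idna"
codec, for instance, implements IDNA 2003 (nameprep plus punycode),
just one of the several versions in play here:

```python
# Python's built-in "idna" codec is IDNA 2003; UTS 46 and IDNA 2008
# differ in details, which is exactly the divergence described above.
ace = "コーヒー".encode("idna")

ace.startswith(b"xn--")   # True: non-ASCII labels get the ACE prefix
ace.decode("idna")        # round-trips back to 'コーヒー'
```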

Chrome goes a step further: it recognizes that the result is an IPv4
address, albeit one expressed in an uncommon way, and canonicalizes it.
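To make that concrete, here is a sketch (in Python, with function names
of my own invention) of the kind of legacy IPv4 parsing and
canonicalization involved; the actual URL Standard algorithm has more
cases:

```python
def _part(p):
    # Hex (0x...), octal (leading 0), or decimal, as legacy inet_aton allows
    if p[:2].lower() == "0x":
        return int(p[2:] or "0", 16)
    if len(p) > 1 and p.startswith("0"):
        return int(p, 8)
    return int(p, 10)

def parse_ipv4(host):
    """Sketch of URL-spec-style IPv4 host parsing: the last dotted part
    spans the remaining bytes, and the result is canonicalized to plain
    dotted decimal.  Returns None if the host isn't IPv4-like."""
    parts = host.split(".")
    if parts and parts[-1] == "":  # tolerate a trailing dot
        parts = parts[:-1]
    if not 1 <= len(parts) <= 4:
        return None
    try:
        nums = [_part(p) for p in parts]
    except ValueError:
        return None
    if any(n < 0 for n in nums):
        return None
    value = nums[-1]
    if value >= 256 ** (5 - len(nums)):  # last part overflows its bytes
        return None
    for i, n in enumerate(nums[:-1]):
        if n > 255:
            return None
        value += n * 256 ** (3 - i)
    return ".".join(str((value >> s) & 0xFF) for s in (24, 16, 8, 0))

parse_ipv4("0x7f.0.0.1")   # '127.0.0.1'
parse_ipv4("192.168.257")  # '192.168.1.1'
```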

On the theory that canonical URIs should round-trip, the current draft
of the WebPlatform URL Specification aligns with Chrome on this, even
though Chrome is the only browser that exhibits this behavior.

   - - -

This is an example of the type of issue I'd like to explore with those
interested in the topic of interoperable parsing behavior.

- Sam Ruby

[1] https://annevankesteren.nl/2012/11/idna-hell


Re: respecting IETF customs?

Sam Ruby-2
In reply to this post by Bjoern Hoehrmann
On 12/06/2014 08:21 AM, Bjoern Hoehrmann wrote:

> * Sam Ruby wrote:
>
>> An example where help would be very much appreciated: would it be
>> possible for somebody who not only is familiar with RFC 3986 but also
>> has a sense for what parts might be changeable and what parts can't
>> change to review the following:
>>
>> https://url.spec.whatwg.org/interop/urltest-results/
>
> This page is rather difficult to digest. One problem is that there is no
> indication of expected results, and the colour coding does not indicate,
> for instance, where test results diverge from the relevant RFCs.

I hope that you find the following page to be easier to digest:

   https://url.spec.whatwg.org/interop/test-results/

With this page, you can do more than simply compare user agents against
the reference implementation of the URL Standard.  You can compare one
browser against other browsers.  You can compare Perl against Python.
If you feel that there is an RFC 3986-compliant application in the set,
you can compare it against the reference implementation.

If you select a collection to compare against a baseline, then yellow
will mean that fewer than two implementations pass.  If you select an
individual user agent, yellow will show differences.

Clicking on a row will take you to more details on the individual
results for that test.  Yellow will show individual differences against
the baseline you selected.

Clicking on either the input or the base on such a page will take you
to a page where you can interactively explore differences between the
browser you are using to view the page and what the reference
implementation provides.

Exceptions show up as hot pink.

- Sam Ruby



Re: respecting IETF customs?

Roy T. Fielding-3
On Dec 12, 2014, at 9:18 AM, Sam Ruby wrote:
> I hope that you find the following page to be easier to digest:
>
>  https://url.spec.whatwg.org/interop/test-results/
>
> With this page, you can do more than simply compare user agents against the reference implementation of the URL Standard.  You can compare one browser against other browsers.  You can compare Perl against Python. If you feel that there is a RFC 3986 compliant application in the set, you can compare it against the reference implementation.

Nice, but it would be a lot better if abnormal URL references were grouped
separately from normal references.  Many of the "test failures" are decisions
by one or more of the implementations to reject a reference due to potential
security problems (e.g., TCP well-known ports [0-53] that might be explicitly
forbidden regardless of parsing) or syntax that is specifically forbidden
by the scheme.  Those should not be considered parser differences.

What are you using to extract the result? Beware that some implementations
will parse and provide one URL in a javascript API, but will actually
fix or reject that URL before using it via HTTP. RFC3986 only defines
what would be sent.

Also, please feel free to include my RFC test cases, located at

  https://svn.apache.org/repos/asf/labs/webarch/trunk/uri/test/

....Roy





Re: respecting IETF customs?

Sam Ruby-2
On 12/12/2014 03:26 PM, Roy T. Fielding wrote:

> On Dec 12, 2014, at 9:18 AM, Sam Ruby wrote:
>> I hope that you find the following page to be easier to digest:
>>
>> https://url.spec.whatwg.org/interop/test-results/
>>
>> With this page, you can do more than simply compare user agents
>> against the reference implementation of the URL Standard.  You can
>> compare one browser against other browsers.  You can compare Perl
>> against Python. If you feel that there is a RFC 3986 compliant
>> application in the set, you can compare it against the reference
>> implementation.
>
> Nice, but it would be a lot better if abnormal URL references were
> grouped separately from normal references.  Many of the "test
> failures" are decisions by one or more of the implementations to
> reject a reference due to potential security problems (e.g., TCP
> well-known ports [0-53] that might be explicitly forbidden regardless
> of parsing) or syntax that is specifically forbidden by the scheme.
> Those should not be considered parser differences.

Here is the master set of test data:

https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt

If reverse engineering undocumented JavaScript isn't your thing (I know
it wasn't something I was happy with), here is the data that the parser
produces:

https://url.spec.whatwg.org/interop/urltestdata.json

Seeing that data expanded helped me "grok" the original format, which
isn't too bad.  Just be aware that two spaces after the first (i.e.,
input) field means to reuse the base from above.
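Based only on that rule, a minimal reader for the input/base portion of
the format might look like this (Python; the handling of the remaining
fields is my own simplification):

```python
def read_cases(lines):
    """Yield (input, base) pairs.  The first space-separated field is
    the input; a non-empty second field sets a new base; an empty second
    field (i.e., two spaces) reuses the base from above.  Comment and
    blank lines are skipped; expected-result fields are ignored."""
    base = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(" ")
        if len(fields) > 1 and fields[1]:
            base = fields[1]
        yield fields[0], base

list(read_cases([
    "/a http://example.org/",
    "/b  s:http",              # two spaces after "/b": reuse the base
]))
# [('/a', 'http://example.org/'), ('/b', 'http://example.org/')]
```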

I encourage you to submit a pull request that sorts or splits the data
to your taste.  Additions are also welcome, and even encouraged!

As to whether or not "forbidden" syntaxes should be considered parser
differences, that's a subject of honest debate, the key being how likely
the input is to be encountered in practice.  If it is common enough (and
user-facing tools are more subject to this issue than back-end servers),
then the differences are an issue even if the input may be considered
forbidden.

The current draft of the URL standard is intentionally very unforgiving
when presented with, for example, a malformed IPv6 address, but is very
tolerant of backslashes.  I would be very amenable to rules that make
well-known port numbers explicitly disallowed for security reasons.
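If such a rule were adopted, the check itself is trivial.  Here is a
hypothetical sketch (the [0, 53] range is Roy's example, not anything a
spec currently requires):

```python
from urllib.parse import urlsplit

def flags_well_known_port(url, limit=53):
    """Hypothetical policy sketch: flag URLs whose explicit port falls
    in the well-known range [0, limit].  Both the limit and the policy
    itself are illustrative only."""
    port = urlsplit(url).port
    return port is not None and port <= limit

flags_well_known_port("http://example.com:25/")    # True (SMTP port)
flags_well_known_port("http://example.com:8080/")  # False
```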

> What are you using to extract the result? Beware that some
> implementations will parse and provide one URL in a javascript API,
> but will actually fix or reject that URL before using it via HTTP.
> RFC3986 only defines what would be sent.

The code used to extract the results can be found here:

https://github.com/webspecs/url/tree/develop/evaluate

The actual data collected can be found here:

https://github.com/webspecs/url/tree/develop/evaluate/useragent-results

Again pull requests are welcome!

> Also, please feel free to include my RFC test cases, located at
>
> https://svn.apache.org/repos/asf/labs/webarch/trunk/uri/test/

If you are willing to do a pull request, consider adding some or all of
those tests there.

If you do, I'll capture results for all of the user agents that I've
done to date (and any others that people might suggest -- preferably in
the form of a pull request :-)), and update my results page to include a
suitable separation.

> ....Roy

- Sam Ruby


Re: respecting IETF customs?

Sam Ruby-2


On 12/12/2014 04:10 PM, Sam Ruby wrote:

> On 12/12/2014 03:26 PM, Roy T. Fielding wrote:
>> On Dec 12, 2014, at 9:18 AM, Sam Ruby wrote:
>>> I hope that you find the following page to be easier to digest:
>>>
>>> https://url.spec.whatwg.org/interop/test-results/
>>>
>>> With this page, you can do more than simply compare user agents
>>> against the reference implementation of the URL Standard.  You can
>>> compare one browser against other browsers.  You can compare Perl
>>> against Python. If you feel that there is a RFC 3986 compliant
>>> application in the set, you can compare it against the reference
>>> implementation.
>>
>> Nice, but it would be a lot better if abnormal URL references were
>> grouped separately from normal references.  Many of the "test
>> failures" are decisions by one or more of the implementations to
>> reject a reference due to potential security problems (e.g., TCP
>> well-known ports [0-53] that might be explicitly forbidden regardless
>> of parsing) or syntax that is specifically forbidden by the scheme.
>> Those should not be considered parser differences.
>
> Here is the master set of test data:
>
> https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.txt
>
> If reverse engineering undocumented JavaScript isn't your thing (I know
> it wasn't was what happy with), here is the data that the parser produces:
>
> https://url.spec.whatwg.org/interop/urltestdata.json
>
> Seeing that data expanded helped me "grok" the original format, which
> isn't too bad.  Just be aware that two spaces after the first (i.e.
> input) field means to reuse the base from above.
>
> I encourage you to submit a pull request that sorts or splits the data
> to your taste.  Additions are also welcome, and even encouraged!

Meanwhile, I've added a filter so that you can see only URLs that the
URL standard considers to be valid or invalid.

   https://url.spec.whatwg.org/interop/test-results/?filter=valid
   https://url.spec.whatwg.org/interop/test-results/?filter=invalid

Feedback (in the form of comments, bugs, issues, or pull requests) is
of course welcome as to which of these categories each input should be
placed in.

- Sam Ruby