Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Tim Bray-3
On Mon, Oct 22, 2012 at 3:35 PM, Ian Hickson <[hidden email]> wrote:
>> The notion that curl, or an HTTP cache manager, or an XML namespace
>> processor, is going to be routing around errors, strikes me on the face
>> of it as being wrong.  One of the main uses I put curl to is making sure
>> I have the URL exactly right before I drop it into chat or whatever.
>
>    $ wget 'http://example.com/a b'
>    --2012-10-23 00:27:43--  http://example.com/a%20b
>
>    # test.cgi returns a 301 with "Location: a b"
>    $ curl -L http://damowmow.com/playground/demos/url/in-http-headers/test.cgi
>    This file is: http://damowmow.com/playground/demos/url/in-http-headers/a%20b
>
> Software does this stuff already. We need to make sure it does it
> interoperably.
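
For readers who want to reproduce the redirect case above without a network, here is a minimal Python sketch of one way a client could arrive at `a%20b`. It is not how curl or wget are implemented; the function name `resolve_location` and the chosen set of characters left unescaped are assumptions made for illustration, and only standard-library calls (`urllib.parse.quote`, `urllib.parse.urljoin`) are used.

```python
from urllib.parse import quote, urljoin

# Characters left untouched: URL delimiters plus "%" so that escapes
# already present in the header value are not encoded a second time.
# This set is an assumption for the sketch, not taken from any spec.
KEEP = "%/:?#[]@!$&'()*+,;=~"

def resolve_location(request_url: str, location: str) -> str:
    """Escape a raw Location value, then resolve it against the request URL."""
    escaped = quote(location, safe=KEEP)   # "a b" becomes "a%20b"
    return urljoin(request_url, escaped)   # relative reference resolution

base = "http://damowmow.com/playground/demos/url/in-http-headers/test.cgi"
print(resolve_location(base, "a b"))
```

Under these assumptions a header value of `a b` and one of `a%20b` resolve to the same URL, which is the behaviour the curl output above appears to show.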

Hmm.  I went to tbray.org and made a file at '$ROOT_DIR/tmp/a b' - note the space.

Then I did

curl -I 'http://www.tbray.org/tmp/a%20b'
curl -I 'http://www.tbray.org/tmp/a b'

Curl, quite properly, doesn’t fuck with what I ask it, and revealed a very interesting fact: That my Apache httpd returns 200 for both of these, but, with, uh, interesting variations, amounting to what I think is quite possibly a bug.  I also pasted the version with the space into the nearest Web browser, and it quite properly auto-corrected to a%20b. 

I think it’s a bug that curl is claiming the 301 pointed at "a%20b" not "a b".  Because suppose it had pointed at "a%20b" - I don’t want middleware lying to me.

It seems like a good idea to document the steps by which "a b" pasted in becomes "a%20b" in the address bar. But I don’t see the relevance outside human-authored strings.  -T
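
As a rough illustration of what "documenting the steps" might look like for the pasted-in case, the following Python sketch splits the string into components and escapes only the path. The function name `escape_typed_url` and the set of characters treated as safe are hypothetical; real browsers apply more rules than this, and nothing here is taken from a published specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def escape_typed_url(text: str) -> str:
    """Percent-encode unsafe characters in the path of a human-typed URL."""
    parts = urlsplit(text.strip())               # drop surrounding whitespace
    # Keep path delimiters and "%" so existing escapes survive unchanged.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=~")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )

print(escape_typed_url("http://www.tbray.org/tmp/a b"))
print(escape_typed_url("http://www.tbray.org/tmp/a%20b"))
```

Both calls yield the same string, so after this step a consumer can no longer tell which form was originally supplied.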
 


On Tue, 23 Oct 2012, Julian Reschke wrote:
>
> This always was about venue, not people. If people want to "fix" or
> "augment" URIs/IRIs, they should come over to the IETF.

I think the person doing the work has the prerogative to do it wherever he
or she wants to do it. Maybe the IETF should consider why Anne isn't doing
it in the IETF.


> > The specs don't define everything that implementations have to do to
> > be interoperable. If the IETF doesn't think that's a problem, then
> > that's fine, but then y'all shouldn't be surprised when people who
> > _do_ think that's a problem try and fix it.
>
> Yes, please fix *that*, but *just* that without messing with the basics
> without consensus/review.

Consensus isn't a value I hold highly, but review of Anne's work is
welcome.

If the IETF community didn't want Anne to do this work, then the IETF
community should have done it. Having not done it, having not even
understood that the problem exists, means the IETF has lost the
credibility it needs to claim that this is in the IETF's domain.

You don't get to claim authority over an area while at the same time
telling someone else "please fix that" for the hard work that comes with
that area. The reality is, he who does the hard work, gets the authority.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Mark Nottingham-2
In reply to this post by Ian Hickson

On 23/10/2012, at 10:16 AM, Ian Hickson <[hidden email]> wrote:

> I can't speak for Anne, but having experienced the IETF via the hybi work,
> my own opinion is that the main reason I wouldn't work with the IETF is
> that the community these days values consensus over technical value and
> running code, and the culture in the IETF doesn't value the kind of
> specification style that IMHO leads to better interop. For example, this
> very thread -- we're having to argue to convince people that defining
> error handling is even a valuable thing to do.

Wait - who's making that argument? References, please.

> I have no interest in
> attempting to get anything done in an environment where that's the level
> at which the conversation starts.


--
Mark Nottingham   http://www.mnot.net/




Ian Hickson
In reply to this post by Tim Bray-3
On Tue, 23 Oct 2012, Mark Nottingham wrote:

> On 23/10/2012, at 9:35 AM, Ian Hickson <[hidden email]> wrote:
> >
> > Consensus isn't a value I hold highly, but review of Anne's work is
> > welcome.
> >
> > If the IETF community didn't want Anne to do this work, then the IETF
> > community should have done it. Having not done it, having not even
> > understood that the problem exists, means the IETF has lost the
> > credibility it needs to claim that this is in the IETF's domain.
> >
> > You don't get to claim authority over an area while at the same time
> > telling someone else "please fix that" for the hard work that comes
> > with that area. The reality is, he who does the hard work, gets the
> > authority.
>
> All very interesting, but please address the point that's now been made
> repeatedly -- why is it necessary for you to redefine URIs, rather than
> doing as we suggest?

What exactly do you suggest?

Doing the work but at the IETF? See my reply to James.

Waiting for the IETF to do the work? We did, and timed out.

Not doing the work? That doesn't lead to interop.

Doing the work as a diff spec? That's what we did for a while, but it
doesn't work. Having to reference three specs (pre-parse, IRI, URI) just
to parse and resolve a URL is not what leads to implementors having a good
time and thus not what leads to interop.

What else do you suggest?


On Mon, 22 Oct 2012, Tim Bray wrote:

> >
> >    $ wget 'http://example.com/a b'
> >    --2012-10-23 00:27:43--  http://example.com/a%20b
> >
> >    # test.cgi returns a 301 with "Location: a b"
> >    $ curl -L http://damowmow.com/playground/demos/url/in-http-headers/test.cgi
> >    This file is: http://damowmow.com/playground/demos/url/in-http-headers/a%20b
>
> Hmm.  I went to tbray.org and made a file at '$ROOT_DIR/tmp/a b' - note
> the space.
>
> Then I did
>
> curl -I 'http://www.tbray.org/tmp/a%20b'
> curl -I 'http://www.tbray.org/tmp/a b'
>
> Curl, quite properly, doesn't fuck with what I ask it

Instead it makes an invalid HTTP request. Your offensive language
notwithstanding, that means wget and curl don't interoperate. This is bad.
This is what we want to fix.
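
To make the "invalid HTTP request" point concrete: an HTTP/1.1 request line consists of a method, a request target, and a version, separated by single spaces, so an unescaped space in the target produces a line with four fields. The Python sketch below uses two hypothetical helpers, `request_line` and `parse_request_line`, to show how a strict parser would see it. Whether a particular server rejects or tolerates such a line is implementation-specific.

```python
def request_line(method: str, target: str) -> str:
    """Build an HTTP/1.1 request line without altering the target."""
    return f"{method} {target} HTTP/1.1"

def parse_request_line(line: str) -> tuple[str, str, str]:
    """Strictly split a request line into method, target and version."""
    parts = line.split(" ")
    if len(parts) != 3:
        # A raw space inside the target yields an extra field.
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version

print(parse_request_line(request_line("HEAD", "/tmp/a%20b")))
try:
    parse_request_line(request_line("HEAD", "/tmp/a b"))
except ValueError as err:
    print(err)
```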


> and revealed a very interesting fact: That my Apache httpd returns 200
> for both of these, but, with, uh, interesting variations, amounting to
> what I think is quite possibly a bug.

How could it be a bug, since there's no spec that says how to handle a URL
with spaces in it?


> I also pasted the version with the space into the nearest Web browser,
> and it quite properly auto-corrected to a%20b.

Quite properly according to whom? There's no spec that defines this.


> I think it's a bug that curl is claiming the 301 pointed at "a%20b" not
> "a b".

You're wrong, but only because the de facto standard of "most software
does it that way" says so. No IETF spec does. That's the problem.


> Because suppose it had pointed at "a%20b" - I don't want middleware
> lying to me.

What you want isn't really the issue. Compatibility with deployed code is
the issue.


> It seems like a good idea to document the steps by which "a b" pasted in
> becomes "a%20b" in the address bar. But I don't see the relevance
> outside human-authored strings.

All the strings in question are human-authored.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Ian Hickson
In reply to this post by Mark Nottingham-2
On Tue, 23 Oct 2012, Mark Nottingham wrote:

> On 23/10/2012, at 10:16 AM, Ian Hickson <[hidden email]> wrote:
> >
> > I can't speak for Anne, but having experienced the IETF via the hybi
> > work, my own opinion is that the main reason I wouldn't work with the
> > IETF is that the community these days values consensus over technical
> > value and running code, and the culture in the IETF doesn't value the
> > kind of specification style that IMHO leads to better interop. For
> > example, this very thread -- we're having to argue to convince people
> > that defining error handling is even a valuable thing to do.
>
> Wait - who's making that argument?

Me.


> References, please.

This very thread is evidence enough, but see also the complete disinterest
in fixing the URL specs, or the reaction abarth got from MIME sniffing, or
the disaster that was hybi, or this complete disinterest in fixing the
problem with encodings:

   http://mail.apps.ietf.org/ietf/charsets/threads.html#01830
   http://mail.apps.ietf.org/ietf/charsets/threads.html#02027
   http://mail.apps.ietf.org/ietf/charsets/threads.html#02034

...or the way IANA registrations for MIME types get handled, or HTTP bis'
reaction to browser feedback, or the way process is put ahead of progress
(there's no way to fix an RFC once it's published, even errata are often
rejected), or the lack of any testing culture...

I understand that you disagree that most of those were a problem. But the
original question was "why don't you work at IETF", and that's the answer.
It may be that you conclude that it's a good thing, therefore, that I and
others don't work at the IETF, but in that case you shouldn't complain
when we go and do stuff outside the IETF.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Mark Nottingham-2
In reply to this post by Ian Hickson

On 23/10/2012, at 10:25 AM, Ian Hickson <[hidden email]> wrote:
>
> What exactly do you suggest?
>
> Doing the work but at the IETF? See my reply to James.

Don't much care about the venue, as long as there's *some* coordination / communication.

> Waiting for the IETF to do the work? We did, and timed out.

Understood, and unfortunate. Arguably, you waited longer than the timeout.

> Not doing the work? That doesn't lead to interop.

Absolutely - again, I don't see anyone suggesting that. Do I smell straw?

> Doing the work as a diff spec? That's what we did for a while, but it
> doesn't work. Having to reference three specs (pre-parse, IRI, URI) just
> to parse and resolve a URL is not what leads to implementors having a good
> time and thus not what leads to interop.

Really? You're comfortable with the current weight and depth of the HTML5 spec, but balk at a pre-processing step for URIs? Seriously?

The underlying point that people seem to be making is that there's legitimate need for URIs to be a separate concept from "strings that will become URIs." By collapsing them into one thing, you're doing those folks a disservice. Browser implementers may not care, but it's pretty obvious that lots of other people do.

BTW, it doesn't have to be a separate spec, although it probably would benefit from being one. Browser implementers already have to reference TCP, IP, DNS, and likely tens to hundreds of other specs to get what they want done -- unless you have bigger plans?

Regards,


--
Mark Nottingham   http://www.mnot.net/




Ian Hickson
On Tue, 23 Oct 2012, Mark Nottingham wrote:
>
> Don't much care about the venue, as long as there's *some* coordination
> / communication.

Everyone is welcome to participate in the WHATWG list.


> > Doing the work as a diff spec? That's what we did for a while, but it
> > doesn't work. Having to reference three specs (pre-parse, IRI, URI)
> > just to parse and resolve a URL is not what leads to implementors
> > having a good time and thus not what leads to interop.
>
> Really? You're comfortable with the current weight and depth of the
> HTML5 spec, but balk at a pre-processing step for URIs? Seriously?

Good lord, no. Who's comfortable with the HTML spec's size?

Unfortunately, the size of the HTML spec is dictated by the complexity of
the platform it is describing.

There's no reason to have three specs when one suffices.


> The underlying point that people seem to be making is that there's
> legitimate need for URIs to be a separate concept from "strings that
> will become URIs."

Anne's spec will define "valid URL", which addressed that need.


> By collapsing them into one thing, you're doing those folks a
> disservice.

They are not collapsed into one thing.


> Browser implementers may not care, but it's pretty obvious that lots of
> other people do.

Browser implementors aren't particularly special here.


> BTW, it doesn't have to be a separate spec, although it probably would
> benefit from being one. Browser implementers already have to reference
> TCP, IP, DNS, and likely tens to hundreds of other specs to get what
> they want done -- unless you have bigger plans?

The difference is that the DNS implementor doesn't need to implement TCP,
he uses TCP (and UDP) and builds on it. And so on. Whereas here we're
talking about one thing, URLs, being specified in one place vs three.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Mark Nottingham-2
In reply to this post by Ian Hickson

On 23/10/2012, at 10:31 AM, Ian Hickson <[hidden email]> wrote:

> On Tue, 23 Oct 2012, Mark Nottingham wrote:
>> On 23/10/2012, at 10:16 AM, Ian Hickson <[hidden email]> wrote:
>>>
>>> I can't speak for Anne, but having experienced the IETF via the hybi
>>> work, my own opinion is that the main reason I wouldn't work with the
>>> IETF is that the community these days values consensus over technical
>>> value and running code, and the culture in the IETF doesn't value the
>>> kind of specification style that IMHO leads to better interop. For
>>> example, this very thread -- we're having to argue to convince people
>>> that defining error handling is even a valuable thing to do.
>>
>> Wait - who's making that argument?
>
> Me.

So, you're saying that you can't work in this environment (*fans self*) because of the arguments you're making?

OK.


>> References, please.
>
> This very thread is evidence enough, but see also the complete disinterest
> in fixing the URL specs

Also? I thought that was what we were talking about...

> or the reaction abarth got from MIME sniffing

AIUI Adam walked away from it because two people expressed individual concerns about it. Had he stuck with it, I'm personally convinced it would have gotten through pretty easily.

> or
> the disaster that was hybi

I personally think websockets was a bad idea from the start, so I'll refrain from further comment.

> or this complete disinterest in fixing the
> problem with encodings:
>
>   http://mail.apps.ietf.org/ietf/charsets/threads.html#01830
>   http://mail.apps.ietf.org/ietf/charsets/threads.html#02027
>   http://mail.apps.ietf.org/ietf/charsets/threads.html#02034

No comment, would have to look into it.

> ...or the way IANA registrations for MIME types get handled

... the process for which was recently revised, based partially on those experiences.

> or HTTP bis' reaction to browser feedback

As far as I know, we addressed all of that feedback to the satisfaction of those who brought it. If you believe otherwise, we're currently in WGLC.

> or the way process is put ahead of progress
> (there's no way to fix an RFC once it's published, even errata are often
> rejected), or the lack of any testing culture...
>
> I understand that you disagree that most of those were a problem.

Oh, no, I agree that there are some pretty serious problems here.

> But the
> original question was "why don't you work at IETF", and that's the answer.
> It may be that you conclude that it's a good thing, therefore, that I and
> others don't work at the IETF, but in that case you shouldn't complain
> when we go and do stuff outside the IETF.


Again, I'm not stuffed about the venue, and you can do what you like. However, when the *W3C* does things that interact with IETF technologies, we coordinate to make sure that there aren't overlaps, conflicts, etc.

Regards,


--
Mark Nottingham   http://www.mnot.net/




Mark Nottingham-2
In reply to this post by Ian Hickson

On 23/10/2012, at 10:40 AM, Ian Hickson <[hidden email]> wrote:

> On Tue, 23 Oct 2012, Mark Nottingham wrote:
>>
>> Don't much care about the venue, as long as there's *some* coordination
>> / communication.
>
> Everyone is welcome to participate in the WHATWG list.

As they are on the IETF list. The difference is that the WHATWG is run by an unelected board of "members" - <http://www.whatwg.org/charter>.


> There's no reason to have three specs when one suffices.

OK.


>> The underlying point that people seem to be making is that there's
>> legitimate need for URIs to be a separate concept from "strings that
>> will become URIs."
>
> Anne's spec will define "valid URL", which addressed that need.

Why not define (or reuse) a separate term for the input stream, and leave "URL" alone?


>> By collapsing them into one thing, you're doing those folks a
>> disservice.
>
> They are not collapsed into one thing.

OK, good.


>> Browser implementers may not care, but it's pretty obvious that lots of
>> other people do.
>
> Browser implementors aren't particularly special here.

No, but your arguments are often coloured by your perspective -- just as everyone else's are.


>> BTW, it doesn't have to be a separate spec, although it probably would
>> benefit from being one. Browser implementers already have to reference
>> TCP, IP, DNS, and likely tens to hundreds of other specs to get what
>> they want done -- unless you have bigger plans?
>
> The difference is that the DNS implementor doesn't need to implement TCP,
> he uses TCP (and UDP) and builds on it. And so on. Whereas here we're
> talking about one thing, URLs, being specified in one place vs three.


OK.

If I believed that Anne was willing to and capable of re-specifying RFC3986 in such a way that the definition, syntax and semantics of URLs (or whatever they end up being called) doesn't change at all, I'd be less concerned.

However, that doesn't seem very likely, especially when he isn't engaging with the folks that wrote that spec (especially, Roy).

RFC3986 is referenced by a LOT of technologies, not just Web browsers, not just HTML. Replacing it unilaterally with input from the browser / HTML community from an implementer perspective is very likely to break most of them.

As such, they won't use your new spec, and we'll be living in a world where there will be two definitions of "URL" -- the IETF one and the WHATWG one (or perhaps the W3C one, I can never remember the relationship there).

That seems a pretty bad tradeoff for the benefits you're getting -- a slightly easier-to-read spec for browser implementers (a relatively tiny audience).

Regards,

--
Mark Nottingham   http://www.mnot.net/




Ian Hickson
In reply to this post by Mark Nottingham-2
On Tue, 23 Oct 2012, Mark Nottingham wrote:
>
> So, you're saying that you can't work in this environment (*fans self*)
> because of the arguments you're making?

I'm saying this is why I don't want to work here, yes.


> >> References, please.
> >
> > This very thread is evidence enough, but see also the complete
> > disinterest in fixing the URL specs
>
> Also? I thought that was what we were talking about...

The disinterest spans many years. This thread demonstrates not so much the
disinterest as the lack of understanding that there's a problem to
fix, which is a bit of a different issue, though equally frustrating and
equally discouraging.


> > or the reaction abarth got from MIME sniffing
>
> AIUI Adam walked away from it because two people expressed individual
> concerns about it. Had he stuck with it, I'm personally convinced it
> would have gotten through pretty easily.

The question here is "why do people walk away from the IETF or not
participate in the IETF". Adam walked away. It doesn't matter whether he
would have been able to get his stuff done if he'd stayed -- the fact is
he walked away, so if you're looking for reasons why people walked away,
it's a relevant data point.

Imagine you're trying to sell me a car, and I think the car is ugly, so I
don't buy it. Then you're trying to figure out why people like me don't
buy the car, and you ask me, and I say it's ugly. It doesn't matter if you
think it's the prettiest thing in the world; if your goal is to sell me
the car, then I can't think it's ugly.

Now if your goal _isn't_ to sell me that car, then that's fine.


> > or the disaster that was hybi
>
> I personally think websockets was a bad idea from the start, so I'll
> refrain from further comment.

Whether WebSockets is a good idea or not is beside the point. The point
is that the hybi group was not a pleasant experience for me. If I were to
be in a position to do Web Sockets again, I would decline the opportunity
to do it through the IETF. Doing it through the IETF made the work take a
year longer than it would have, made the protocol less secure (the WG
removed a number of defense-in-depth features), and made the spec a mess
(it's a mishmash of different editing styles). Plus, the group _still_
hasn't done multiplexing, which some of the vendors said was a prereq to
implementation, something which, prior to the IETF getting involved, was
only 3 to 6 months out on the roadmap.


> > ...or the way IANA registrations for MIME types get handled
>
> ... the process for which was recently revised, based partially on those
> experiences.

If it was revised more than about 2 weeks ago, the problems aren't solved,
based on what I've seen in the past 2 weeks.


> > or HTTP bis' reaction to browser feedback
>
> As far as I know, we addressed all of that feedback to the satisfaction
> of those who brought it. If you believe otherwise, we're currently in
> WGLC.

There's a number of people who raised feedback who gave up trying to get
it addressed, but I haven't been following this closely enough to tell you
what those are, because I gave up too.


> > But the original question was "why don't you work at IETF", and that's
> > the answer. It may be that you conclude that it's a good thing,
> > therefore, that I and others don't work at the IETF, but in that case
> > you shouldn't complain when we go and do stuff outside the IETF.
>
> Again, I'm not stuffed about the venue, and you can do what you like.
> However, when the *W3C* does things that interact with IETF
> technologies, we coordinate to make sure that there aren't overlaps,
> conflicts, etc.

Yeah, well, I've mostly given up on the W3C, too. :-)

Anyway, this URL work isn't happening at the W3C.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Ian Hickson
In reply to this post by Mark Nottingham-2
On Tue, 23 Oct 2012, Mark Nottingham wrote:

> On 23/10/2012, at 10:40 AM, Ian Hickson <[hidden email]> wrote:
> > On Tue, 23 Oct 2012, Mark Nottingham wrote:
> >>
> >> Don't much care about the venue, as long as there's *some*
> >> coordination / communication.
> >
> > Everyone is welcome to participate in the WHATWG list.
>
> As they are on the IETF list. The difference is that the WHATWG is run
> by an unelected board of "members" - <http://www.whatwg.org/charter>.

"Run" is a bit of a strong word. There's basically no non-public activity
from the charter members.


> > Anne's spec will define "valid URL", which addressed that need.
>
> Why not define (or reuse) a separate term for the input stream, and
> leave "URL" alone?

Because everyone calls these things URLs (except STD 66).


> >> Browser implementers may not care, but it's pretty obvious that lots
> >> of other people do.
> >
> > Browser implementors aren't particularly special here.
>
> No, but your arguments are often coloured by your perspective -- just as
> everyone else's are.

Which arguments in particular are we talking about here? I've mostly been
talking about curl, wget, GoogleBot, Perl libraries, etc.


> If I believed that Anne was willing to and capable of re-specifying
> RFC3986 in such a way that the definition, syntax and semantics of URLs
> (or whatever they end up being called) doesn't change at all, I'd be
> less concerned.
>
> However, that doesn't seem very likely, especially when he isn't
> engaging with the folks that wrote that spec (especially, Roy).
>
> RFC3986 is referenced by a LOT of technologies, not just Web browsers,
> not just HTML. Replacing it unilaterally with input from the browser /
> HTML community from an implementer perspective is very likely to break
> most of them.

I suspect it will break nothing, but I guess we'll find out.

I don't really understand how it _could_ break anything, so long as the
processing of IRI and URIs as defined by IETF is the same in the WHATWG
spec, except where software already differs with the IETF specs.

Do you have a concrete example I could study?


> As such, they won't use your new spec, and we'll be living in a world
> where there will be two definitions of "URL" -- the IETF one and the
> WHATWG one [...].
>
> That seems a pretty bad tradeoff for the benefits you're getting -- a
> slightly easier-to-read spec for browser implementers (a relatively tiny
> audience).

If you have any concrete concerns, please don't hesitate to e-mail the
WHATWG list, showing the specific examples you're worried about. Browsers
are but one of many implementation classes that are relevant.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Tim Bray-3
One more data point... I work on Web software all the time and have for many years; in recent years mostly at the REST (app-to-app HTTP conversations) rather than browser-wrangling level.  I’d have to say that URI interoperability problems haven’t come near getting into the list of top-20 pain points.   So either my experience is wildly untypical, or maybe it’s a combination of being a little bit lucky, and that the pain which exists is highly concentrated in the browser space.  -T

mamund
<snip>
I’d have to say that URI interoperability problems haven’t come near getting into the list of top-20 pain points. 
</snip>
I can't recall the last time I experienced "URI interoperability problems" across various user agents/implementations on the public Internet. My problems with browser implementations are another matter.

In this particular case (URIs), I applaud Anne's attempt to fix the broken way browsers handle these strings within their own executable code (i.e., "browsers (apart from Chrome) do not unescape URL escapes."[1]).

However, I find the announcement that he plans to change the names of things along the way (i.e. "And yes, the plan is to do away with IRI/URI and just call them all URLs"[1]) a waste of time.

Fixing the code can happen regardless of naming. Do that and do it now.

Thanks.



mca
+1.859.757.1449
skype: mca.amundsen
http://amundsen.com/blog/
http://twitter.com/mamund
https://github.com/mamund
http://www.linkedin.com/in/mikeamundsen


On Mon, Oct 22, 2012 at 8:15 PM, Tim Bray <[hidden email]> wrote:
One more data point... I work on Web software all the time and have for many years; in recent years mostly at the REST (app-to-app HTTP conversations) rather than browser-wrangling level.  I’d have to say that URI interoperability problems haven’t come near getting into the list of top-20 pain points.   So either my experience is wildly untypical, or maybe it’s a combination of being a little bit lucky, and that the pain which exists is highly concentrated in the browser space.  -T

On Mon, Oct 22, 2012 at 5:05 PM, Ian Hickson <[hidden email]> wrote:
On Tue, 23 Oct 2012, Mark Nottingham wrote:
> On 23/10/2012, at 10:40 AM, Ian Hickson <[hidden email]> wrote:
> > On Tue, 23 Oct 2012, Mark Nottingham wrote:
> >>
> >> Don't much care about the venue, as long as there's *some*
> >> coordination / communication.
> >
> > Everyone is welcome to participate in the WHATWG list.
>
> As they are on the IETF list. The difference is that the WHATWG is run
> by an unelected board of "members" - <http://www.whatwg.org/charter>.

"Run" is a bit of a strong word. There's basically no non-public activity
from the charter members.


> > Anne's spec will define "valid URL", which addressed that need.
>
> Why not define (or reuse) a separate term for the input stream, and
> leave "URL" alone?

Because everyone calls these things URLs (except STD 66).


> >> Browser implementers may not care, but it's pretty obvious that lots
> >> of other people do.
> >
> > Browser implementors aren't particularly special here.
>
> No, but your arguments are often coloured by your perspective -- just as
> everyone else's are.

Which arguments in particular are we talking about here? I've mostly been
talking about curl, wget, GoogleBot, Perl libraries, etc.


> If I believed that Anne was willing to and capable of re-specifying
> RFC3986 in such a way that the definition, syntax and semantics of URLs
> (or whatever they end up being called) don't change at all, I'd be
> less concerned.
>
> However, that doesn't seem very likely, especially when he isn't
> engaging with the folks that wrote that spec (especially, Roy).
>
> RFC3986 is referenced by a LOT of technologies, not just Web browsers,
> not just HTML. Replacing it unilaterally with input from the browser /
> HTML community from an implementer perspective is very likely to break
> most of them.

I suspect it will break nothing, but I guess we'll find out.

I don't really understand how it _could_ break anything, so long as the
processing of IRI and URIs as defined by IETF is the same in the WHATWG
spec, except where software already differs with the IETF specs.

Do you have a concrete example I could study?


> As such, they won't use your new spec, and we'll be living in a world
> where there will be two definitions of "URL" -- the IETF one and the
> WHATWG one [...].
>
> That seems a pretty bad tradeoff for the benefits you're getting -- a
> slightly easier-to-read spec for browser implementers (a relatively tiny
> audience).

If you have any concrete concerns, please don't hesitate to e-mail the
WHATWG list, showing the specific examples you're worried about. Browsers
are but one of many implementation classes that are relevant.


--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Ian Hickson
In reply to this post by Tim Bray-3
On Mon, 22 Oct 2012, Tim Bray wrote:
>
> One more data point... I work on Web software all the time and have for
> many years; in recent years mostly at the REST (app-to-app HTTP
> conversations) rather than browser-wrangling level.  I’d have to say
> that URI interoperability problems haven’t come near getting into the
> list of top-20 pain points.  So either my experience is wildly
> untypical, or maybe it’s a combination of being a little bit lucky, and
> that the pain which exists is highly concentrated in the browser space.  

The importance of error handling goes up dramatically as the number of
participants in a space increases. When one is writing private software
where you write the server and the client, one is unlikely to run into any
problems that one would attribute to the specs. When one is writing
software with a few hundred participants, these kinds of errors occur, but
it's trivial to deal with them by telling the offending people to follow
the spec. When one is dealing with trillions of items, it becomes
impossible to fix the problems, and error-handling becomes necessary.

The Web is a classic example of the latter, so browser vendors and authors
of software that interacts with the Web, e.g. Web search engine software
(GoogleBot), Web mirroring software (wget), etc, often run into it.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Julian Reschke
In reply to this post by Ian Hickson
On 2012-10-23 02:05, Ian Hickson wrote:
> ...
> I suspect it will break nothing, but I guess we'll find out.
>
> I don't really understand how it _could_ break anything, so long as the
> processing of IRI and URIs as defined by IETF is the same in the WHATWG
> spec, except where software already differs with the IETF specs.

Define "software". *All* software? How do you test that?

> Do you have a concrete example I could study?

Do you?

This brings me back to something I've asked for many times: a
*concrete* list of things that are "broken" in RFC 3986 (as opposed to
being "undefined").

Best regards, Julian


Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Graham Klyne-4
In reply to this post by Ian Hickson
On 22/10/2012 23:35, Ian Hickson wrote:
> Consensus isn't a value I hold highly,

!

#g
--


Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Anne van Kesteren-4
In reply to this post by Julian Reschke
I'm not subscribed to this list, but I guess there's a few points I can make.

* This is not about the address bar. The address bar is UI.
Standardizing UI does not pass the test of time.

* If you think the URL Standard fragments, you have cause and effect reversed.

* Building on top of STD 66 is not practical. You want a single
algorithm that deals with parsing, resolving, and canonicalizing.

* That the URL standard will call the input a URL matches common
usage. Given that a relative URL can be empty string, a URL can indeed
be the empty string. Roy thinks this is absurd, I think it's quite
logical. The object model I will probably call "parsed URL", but I'm
open to suggestions.

* For Julian, an example of a URL that would be invalid per STD 66 yet
is transmitted over the wire just fine: <http://www.w3.org/%> or
<http://www.w3.org/?%>. Also, fragments such as #™ do not undergo any
transformation. Fragments are pretty much parsed as literals except
for thirty or so code points.

* And that complex technology creates large standards, I'm not even
sure what to say to that, Noah. Yes, the world is uglier than we
thought, but at least now it's written down and should hopefully serve
as a lesson for new things we invent. If there is one thing the WHATWG
has been doing in this space, it is reducing the actual complexity by
actively converging implementations. (Actual complexity is increased
by poor standards that do not even come close to defining the technology
they are about, such as HTML4.)

* As for contributing to the IETF: I have tried many times (Atom,
HTTP, ietf-charsets, httpstate, IRI, websec, HyBi). There's a lot of
Stop Energy, so rather than solving problems I end up explaining the
problem many times over. There's not much interest in doing the hard
work, such as writing tests and figuring out how implementations have
actually implemented the standard. Whenever I bring up a couple of
issues in one specification or another, I have to be exhaustive, rather
than someone taking them as a sign of a larger problem with their body
of work. And if I manage to get the standard changed, it's almost always
some watered-down wording that does not guide implementors in any way
(e.g. redirects, Content-Location, invalid relative URLs in HTTP).
You'd have to write another standard that actually takes a stance on
the issue. This is why I would now rather focus on just doing the work. I
can talk and email all I want, but attempting to prove I'm right by
example seems a more worthwhile use of my time.
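
The `%` example in the list above can be sketched in a few lines. This is only an illustration, assuming Python's standard `urllib.parse`; the helper name `has_stray_percent` is hypothetical and the check covers just one rule of the RFC 3986 grammar (`pct-encoded = "%" HEXDIG HEXDIG`), not full validation.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 only allows "%" as the start of a pct-encoded triplet:
#   pct-encoded = "%" HEXDIG HEXDIG
# This pattern finds any "%" that is NOT followed by two hex digits.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_stray_percent(url: str) -> bool:
    """Return True if the string contains a '%' that is not a valid escape."""
    return _STRAY_PERCENT.search(url) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.w3.org/%",
        "http://www.w3.org/?%",
        "http://www.w3.org/a%20b",
    ):
        # urlsplit() is a lenient splitter: it does not validate escapes,
        # so it returns components even when the grammar check fails.
        parts = urlsplit(candidate)
        print(candidate, has_stray_percent(candidate), parts.path, parts.query)
```

The point of the sketch is the mismatch: a strict grammar check rejects the first two strings, while a lenient splitter still hands back a usable path and query for them.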


--
http://annevankesteren.nl/

Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Julian Reschke
On 2012-10-23 10:36, Anne van Kesteren wrote:
> I'm not subscribed to this list, but I guess there's a few points I can make.
>
> * This is not about the address bar. The address bar is UI.
> Standardizing UI does not pass the test of time.
>
> * If you think the URL Standard fragments, you have cause and effect reversed.
>
> * Building on top of STD 66 is not practical. You want a single
> algorithm that deals with parsing, resolving, and canonicalizing.

Sounds like three algorithms with well-defined interfaces to me.
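
A minimal sketch of what such a three-step decomposition could look like, assuming Python's standard `urllib.parse`. The function names `parse`, `resolve`, and `canonicalize` are hypothetical, chosen only for illustration; neither STD 66 nor the URL Standard defines them, and the canonicalization here covers only case normalization of scheme and authority plus an empty path.

```python
from urllib.parse import SplitResult, urljoin, urlsplit, urlunsplit


def parse(url: str) -> SplitResult:
    """Step 1: split a string into scheme, authority, path, query, fragment."""
    return urlsplit(url)


def resolve(base: str, reference: str) -> str:
    """Step 2: resolve a (possibly relative, possibly empty) reference."""
    return urljoin(base, reference)


def canonicalize(parts: SplitResult) -> str:
    """Step 3: a deliberately small normalization.

    Lowercases the scheme and the authority and replaces an empty path
    with "/". Lowercasing the whole authority is only safe when it carries
    no userinfo, which this sketch assumes.
    """
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.query,
        parts.fragment,
    ))


if __name__ == "__main__":
    absolute = resolve("http://Example.COM/a/b", "../c")
    print(canonicalize(parse(absolute)))
    # An empty reference resolves to the base itself.
    print(resolve("http://example.com/a", ""))
```

Whether the three steps are better kept separate like this or folded into one algorithm is exactly the design question under discussion; the sketch takes no side on it.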

> * That the URL standard will call the input a URL matches common
> usage. Given that a relative URL can be empty string, a URL can indeed
> be the empty string. Roy thinks this is absurd, I think it's quite
> logical. The object model I will probably call "parsed URL", but I'm
> open to suggestions.

What's "common usage" may depend on context. It may be true for the
browser world.

> * For Julian, an example of a URL that would be invalid per STD 66 yet
> is transmitted over the wire just fine: <http://www.w3.org/%> or
> <http://www.w3.org/?%>. Also, fragments such as #™ do not undergo any
> transformation. Fragments are pretty much parsed as literals except
> for thirty or so code points.

Again, I'm mainly interested in *valid* URIs where you think RFC 3986
needs fixing.

> ...

Best regards, Julian


Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Anne van Kesteren-4
On Tue, Oct 23, 2012 at 11:45 AM, Julian Reschke <[hidden email]> wrote:
> On 2012-10-23 10:36, Anne van Kesteren wrote:
>> * Building on top of STD 66 is not practical. You want a single
>> algorithm that deals with parsing, resolving, and canonicalizing.
>
> Sounds like three algorithms with well-defined interfaces to me.

Feel free to take mine, do that, and convince people it's better. It's
in the public domain. I have not done it as it just results in
overhead and no benefit.


> What's "common usage" may depend on context. It may be true for the browser
> world.

http://www.googlefight.com/index.php?word1=url&word2=uri


>> * For Julian, an example of a URL that would be invalid per STD 66 yet
>> is transmitted over the wire just fine: <http://www.w3.org/%> or
>> <http://www.w3.org/?%>. Also, fragments such as #™ do not undergo any
>> transformation. Fragments are pretty much parsed as literals except
>> for thirty or so code points.
>
> Again, I'm mainly interested in *valid* URIs where you think RFC 3986 needs
> fixing.

This was about demonstrating that STD 66 is not a suitable interface.
(I thought you suggested that. If not, sorry, hopefully it helps
someone else.)


--
http://annevankesteren.nl/

Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Julian Reschke
On 2012-10-23 11:09, Anne van Kesteren wrote:

> On Tue, Oct 23, 2012 at 11:45 AM, Julian Reschke <[hidden email]> wrote:
>> On 2012-10-23 10:36, Anne van Kesteren wrote:
>>> * Building on top of STD 66 is not practical. You want a single
>>> algorithm that deals with parsing, resolving, and canonicalizing.
>>
>> Sounds like three algorithms with well-defined interfaces to me.
>
> Feel free to take mine, do that, and convince people it's better. It's
> in the public domain. I have not done it as it just results in
> overhead and no benefit.
>
>
>> What's "common usage" may depend on context. It may be true for the browser
>> world.
>
> http://www.googlefight.com/index.php?word1=url&word2=uri

I was referring to "whatever you find in @href" as opposed to "what RFC
3986 says it is".

>>> * For Julian, an example of a URL that would be invalid per STD 66 yet
>>> is transmitted over the wire just fine: <http://www.w3.org/%> or
>>> <http://www.w3.org/?%>. Also, fragments such as #™ do not undergo any
>>> transformation. Fragments are pretty much parsed as literals except
>>> for thirty or so code points.
>>
>> Again, I'm mainly interested in *valid* URIs where you think RFC 3986 needs
>> fixing.
>
> This was about demonstrating that STD 66 is not a suitable interface.
> (I thought you suggested that. If not, sorry, hopefully it helps
> someone else.)

OK, so if browsers put /% on the wire *and* servers rely on that, that
would be an issue. However, I'm not convinced the latter is the case.

Best regards, Julian


Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

Anne van Kesteren-4
On Tue, Oct 23, 2012 at 11:19 AM, Julian Reschke <[hidden email]> wrote:
> On 2012-10-23 11:09, Anne van Kesteren wrote:
>> http://www.googlefight.com/index.php?word1=url&word2=uri
>
> I was referring to "whatever you find in @href" as opposed to "what RFC 3986
> says it is".

Ah okay, well if you have relative URLs and absolute URLs, it makes
sense to call them URLs together.


>> This was about demonstrating that STD 66 is not a suitable interface.
>> (I thought you suggested that. If not, sorry, hopefully it helps
>> someone else.)
>
> OK, so if browsers put /% on the wire *and* servers rely on that, that would
> be an issue. However, I'm not convinced the latter is the case.

I had not really expected otherwise.


--
http://annevankesteren.nl/
