Percent encoded dots in . and .. path elements

Mike Samuel
Apologies if this is not the right forum for RFC 3986 related questions.


Dot ('.') is in the unreserved set, and 3986 says

"""
    URIs that differ in the replacement of an unreserved character with
    its corresponding percent-encoded US-ASCII octet are equivalent: they
    identify the same resource.
"""

which leads me to believe that %2E which encodes dot should be
normalized before interpreting "." and ".." path elements when doing
path resolution.


If so, then resolving
  Base URI:  /x/y/z/
against
  Relative URI: .%2E
should yield
  /x/y/
and not
  /x/y/z/.%2E

The existing libraries that I tested (Java's java.net.URI, Python's
urlparse.urljoin) yield /x/y/z/.%2E and Java's normalize() method does
not recognize the last path element as special.
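
The Python observation can be reproduced with a short sketch. It assumes Python 3, where the function lives in urllib.parse rather than the older urlparse module; the variable names are only illustrative.

```python
from urllib.parse import urljoin

base = "/x/y/z/"

# A literal ".." is recognized as a dot-segment and removed together
# with the preceding segment.
literal = urljoin(base, "..")

# The percent-encoded form is treated as an ordinary path segment and
# is appended unchanged.
encoded = urljoin(base, ".%2E")

print(literal)
print(encoded)
```

Here the literal reference resolves to /x/y/ while the encoded one stays as /x/y/z/.%2E, matching the behavior described above.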


Browsers seem to differ.  Chrome and Safari seem to normalize ".%2E" early.
Firefox seems to be leaving it up to the protocol handler.
"https://www.google.com/webhp/.%2E" beGETs "www.google.com/."
"file:///Users/msamuel/work/.%2E" fetches the right resource but ".."
shows up as a path element in the URL bar.
"http://urlecho.appspot.com/echo/z/.%2E" beGETs "urlecho.appspot.com/echo/z/.."


Should resolution/normalization treat the path element ".%2E" as special?

cheers,
mike


Re: Percent encoded dots in . and .. path elements

Roy T. Fielding
On Jun 27, 2014, at 8:51 AM, Mike Samuel wrote:

> Apologies if this is not the right forum for RFC 3986 related questions.
>
>
> Dot ('.') is in the unreserved set, and 3986 says
>
> """
>    URIs that differ in the replacement of an unreserved character with
>    its corresponding percent-encoded US-ASCII octet are equivalent: they
>    identify the same resource.
> """
>
> which leads me to believe that %2E which encodes dot should be
> normalized before interpreting "." and ".." path elements when doing
> path resolution.

It can be normalized, yes.  It could also be rejected for security reasons,
or simply processed as is if the resource wants to do so.

> If so, then resolving
>  Base URI:  /x/y/z/
> against
>  Relative URI: .%2E
> should yield
>  /x/y/
> and not
>  /x/y/z/.%2E
>
> The existing libraries that I tested (Java's java.net.URI, Python's
> urlparse.urljoin) yield /x/y/z/.%2E and Java's normalize() method does
> not recognize the last path element as special.
>
>
> Browser's seem to differ.  Chrome and Safari seem to normalize ".%2E" early.
> Firefox seems to be leaving it up to the protocol handler.
> "https://www.google.com/webhp/.%2E" beGETs "www.google.com/."
> "file:///Users/msamuel/work/.%2E" fetches the right resource but ".."
> shows up as a path element in the URL bar.
> "http://urlecho.appspot.com/echo/z/.%2E" beGETs "urlecho.appspot.com/echo/z/.."
>
>
> Should resolution/normalization treat the path element ".%2E" as special?

This depends on when the %2E is processed.  Usually, references
are normalized after resolution to absolute form because the scheme
impacts normalization.  Since normalization is optional, various implementations
will differ regarding when it is done (if at all).  Likewise, ".." is
only special during the relative->absolute conversion, so normalizing the
%2E after relative parsing is going to result in a ".." segment.
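
The effect of ordering can be illustrated with a minimal sketch. It is not taken from any particular implementation: decode_unreserved is a hypothetical helper written for this example, and Python's urljoin stands in for the relative->absolute conversion.

```python
import re
from urllib.parse import urljoin

# The RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def decode_unreserved(uri: str) -> str:
    """Hypothetical helper: decode only escapes of unreserved characters."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Leave every other escape in place, with uppercase hex digits.
        return ch if ch in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)

base, ref = "/x/y/z/", ".%2E"

# Order A: normalize the reference first, then resolve it.
early = urljoin(base, decode_unreserved(ref))

# Order B: resolve first, then normalize the absolute result.
late = decode_unreserved(urljoin(base, ref))

print(early)
print(late)
```

Order A removes the parent segment and gives /x/y/. Order B leaves a literal ".." segment in the result, /x/y/z/.., because the dot-segment step has already run by the time the %2E is decoded.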

What the spec says is that ".%2E" and ".." are equivalent, meaning that
a server is likely to decode it to ".." and either reject the request for
security reasons or redirect it to a URI without the corresponding "/parent/..".
A redirect is necessary to avoid security bypass on the server path.
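
A rough sketch of that server-side choice follows. Everything in it is an assumption made for illustration: the handle function, its reject flag, and the action strings are hypothetical, and real servers differ in status codes and in where this check runs.

```python
import re

UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def decode_unreserved(path: str) -> str:
    """Decode only percent-escapes that encode unreserved characters."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)

def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments of an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            # Never pop the leading empty segment that marks the root.
            if len(out) > 1:
                out.pop()
        elif seg != ".":
            out.append(seg)
    # A trailing dot-segment leaves a trailing slash behind.
    if segments[-1] in (".", ".."):
        out.append("")
    return "/".join(out) or "/"

def handle(raw_path: str, reject: bool = False):
    """Hypothetical policy: serve, reject, or redirect a request path."""
    decoded = decode_unreserved(raw_path)
    has_dots = any(seg in (".", "..") for seg in decoded.split("/"))
    if not has_dots:
        return ("serve", decoded)
    if reject:
        return ("reject", raw_path)
    return ("redirect", remove_dot_segments(decoded))

print(handle("/echo/z/.%2E"))
print(handle("/echo/z/.%2E", reject=True))
```

With the default policy the encoded request is redirected to /echo/, so the path the server finally acts on contains no dot-segments.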

What the same thing means for locally processed file URIs is currently
not standardized due to lack of consensus among user agents, though
I would expect a browser to do the same processing as a server.

Note that you have to be careful in testing to see whether the browser
is normalizing the URI before the request or if it is being normalized
and redirected by the server after the (initial) request.

....Roy


Re: Percent encoded dots in . and .. path elements

Mike Samuel
On Fri, Jun 27, 2014 at 12:55 PM, Roy T. Fielding wrote:
> will differ regarding when it is done (if at all).  Likewise, ".." is
> only special during the relative->absolute conversion, so normalizing the
> %2E after relative parsing is going to result in a ".." segment.

I think this is the part I was missing: that resolution is a separate
operation from absolution (?).

In that case, among the libraries, only Java's normalize() is broken.
It advertises
"""
If a ".." segment is preceded by a non-".." segment then both of these
segments are removed. This step is repeated until it is no longer
applicable.
"""

Browsers sending non-absolute URIs for HTTP/HTTPS seems problematic.
RFC 7230 says:
"""
request-target = origin-form / absolute-form / authority-form /
    asterisk-form
"""
and those non-terminals are defined thus:
"""
absolute-form = absolute-URI
...
asterisk-form = "*"
...
authority-form = authority
...
origin-form = absolute-path [ "?" query ]
"""

Conflating absolute and non-absolute paths could have some security
consequences, since the semantics of cookies depend on them, but
cookie path restrictions are not widely used.  I don't recall whether
similarly named cookies on different paths mask one another, but fewer
cookies rarely mean greater privileges.