Re: data URIs - filename and content-disposition

Alessandro Angeli
I will revive this old thread with a proposal, in case this is ever
going to be implemented.

The original proposal was to add support for the FILENAME and
CONTENT-DISPOSITION params in the MEDIATYPE part of a "data:" URI.

It evolved into more generic support for a HEADERS param.

The former has the benefit of simplicity, the latter of flexibility.
But, at least judging from the discussion of the related proposal in the
Firefox bug-tracking system
(https://bugzilla.mozilla.org/show_bug.cgi?id=532230), neither is easily
implemented because of both parsing and handling limitations.

Moreover, both proposals require the definition of a new param (either
CONTENT-DISPOSITION or HEADERS) that can be applied to all MEDIATYPEs and
the repurposing of the FILENAME param, which originally applies only to
the Content-Disposition header field and not to the MEDIATYPE.

However, as far as I can tell, it is possible to achieve an even more
generic and flexible result than what would be accomplished by the
HEADERS param in a completely standard-compliant way by using the
message/* MEDIATYPE, so that the payload (DATA part) of the "data:" URI
would be a complete message/*, including its header fields.

For example, using message/http, one would have (all in one line):

{data:message/http,HTTP 200 OK|Content-Type:text/plain;charset=utf-8
|Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
WORLD}

I used {} to delimit the URI and I used spaces and | for readability,
but they are supposed to be escaped as %20 and %0D%0A (that is, I used |
to represent a new line). I also used unescaped reserved chars because
of the consideration at the end of this message.

Using message/rfc822:

{data:message/rfc822,Content-Type:text/plain;charset=utf-8
|Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
WORLD}
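To make the escaping concrete, here is a minimal Python sketch that builds the two URIs above from a list of header fields and a body. The function name, its parameters and the SAFE set are my own assumptions for illustration; the set of characters left unescaped follows the RFC2396 argument at the end of this message and is not something RFC2397 spells out.

```python
from urllib.parse import quote

# Reserved chars left unescaped, on the assumption that RFC2396 allows
# them inside an opaque_part (see the end of this message).
SAFE = ":;=/,@&+$?"

def build_message_data_uri(headers, body, mediatype="message/rfc822",
                           status_line=None):
    """Hypothetical helper: serialize header fields and a body as a
    message/* payload and wrap it in a "data:" URI."""
    lines = []
    if status_line is not None:
        # Only meaningful for message/http.
        lines.append(status_line)
    lines += [f"{name}:{value}" for name, value in headers]
    # Header lines end with CRLF; an empty line separates them from the body.
    payload = "\r\n".join(lines) + "\r\n\r\n" + body
    # quote() escapes spaces as %20, CRLF as %0D%0A and quotes as %22.
    return f"data:{mediatype}," + quote(payload, safe=SAFE)

headers = [
    ("Content-Type", "text/plain;charset=utf-8"),
    ("Content-Disposition", 'attachment;filename="hello world.txt"'),
]
rfc822_uri = build_message_data_uri(headers, "HELLO WORLD")
http_uri = build_message_data_uri(headers, "HELLO WORLD",
                                  mediatype="message/http",
                                  status_line="HTTP 200 OK")
```

The status line is copied as written in the example above; a real HTTP status line would also carry the protocol version (for example HTTP/1.1 200 OK).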

The benefits over the HEADERS param would be:

1) no need to define a new param

2) more flexible (you can even specify the HTTP response line)

3) to implement it, I believe it should be possible to simply unescape
the whole payload and pipe the result as an octet-stream into the
browser's HTTP response handler (if using message/http; if using
message/rfc822, a fake response line could be prefixed to the payload to
turn it into a message/http)

4) base64 encoding can be specified for the whole payload or only for
the message/* body, using the usual Content-Transfer-Encoding header
field

5) it is possible to use quoted-printable, which may be more compact
(after all, "=" does not need to be URI-escaped)

6) it is even possible to use gzip compression, which may mitigate the
bloating caused by base64
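As a rough sketch of what 3) and 4) could look like for the message/rfc822 variant, the Python below unescapes the payload and hands it to the standard library's email parser, which here only stands in for the browser's response handler. The function name is hypothetical, and a real implementation would also have to handle the ";base64" form of the URI, which this sketch ignores.

```python
from email import message_from_bytes
from urllib.parse import unquote_to_bytes

def parse_message_data_uri(uri):
    """Hypothetical helper: return (content type, filename, body bytes)."""
    prefix = "data:message/rfc822,"
    if not uri.startswith(prefix):
        raise ValueError("not a data:message/rfc822 URI")
    # Step 3: unescape the whole payload into an octet stream.
    raw = unquote_to_bytes(uri[len(prefix):])
    # Parse the header fields and the body like any other message.
    msg = message_from_bytes(raw)
    # decode=True undoes the Content-Transfer-Encoding (step 4), if any.
    return (msg.get_content_type(), msg.get_filename(),
            msg.get_payload(decode=True))

plain = parse_message_data_uri(
    "data:message/rfc822,Content-Type:text/plain;charset=utf-8%0D%0A"
    "Content-Disposition:attachment;filename=%22hello%20world.txt%22"
    "%0D%0A%0D%0AHELLO%20WORLD"
)

# Same resource, with only the message body base64-encoded.
encoded = parse_message_data_uri(
    "data:message/rfc822,Content-Type:text/plain;charset=utf-8%0D%0A"
    "Content-Transfer-Encoding:base64%0D%0A"
    "Content-Disposition:attachment;filename=%22hello%20world.txt%22"
    "%0D%0A%0D%0ASEVMTE8gV09STEQ="
)
```

Both calls yield the same content type, filename and body, which is the point of 4): the encoding of the body is described by the message itself, not by the URI.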

The implementation suggested in 3) would be the full embodiment of the
stated purpose of the "data:" URI, which is an inline representation of
an external resource: the header metadata of an HTTP resource is part of
the resource, but the current widespread usage of the "data:" URI can
only represent a subset of the Content-Type header field.

Its performance should also be no worse than fetching the resource
externally (assuming that unescaping the payload is not slower than
transferring it over a network).

About the unescaped chars, RFC2397:3 claims that URLCHAR is imported
from RFC2396. However, RFC2396 does not have a definition for URLCHAR.
Instead, it defines the following 3 char classes (the definitions are
equivalent to the ones in RFC2396:A, but rewritten in a more
human-understandable way):

pchar         = escaped | alphanum | mark
              | ":" | "@" | "&" | "=" | "+" | "$" | ","
uric_no_slash = pchar | ";" | "?"
uric          = pchar | ";" | "?" | "/"

They are used in the following URI parts (again, partially rewritten and
keeping only the ABSOLUTEURI form of the URI-REFERENCE):

URI-reference = scheme ":" (opaque_part | hier_part) ["#" fragment]
opaque_part   = uric_no_slash *uric
hier_part     = ( ["//" authority] [abs_path] ) ["?" query]
abs_path      = "/"  segment *( "/" segment )
segment       = *pchar *( ";" *pchar )
query         = *uric
fragment      = *uric

I would think that a "data:" URI uses the OPAQUE_PART syntax, in which
case the unescaped chars are allowed. But they would also be allowed if
using the HIER_PART one (except maybe in some parts of the AUTHORITY,
which is not used in "data:" anyway).
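The OPAQUE_PART reading can be checked mechanically. The Python sketch below transcribes the char classes above into a regular expression and tests a string (the part of the URI after "data:") against opaque_part. The names are my own, and the MARK set is taken from RFC2396 (- _ . ! ~ * ' ( )).

```python
import re

ALPHANUM = "A-Za-z0-9"
MARK = r"\-_.!~*'()"            # RFC2396 "mark" characters
PCHAR_EXTRA = ":@&=+$,"         # the extra chars listed under pchar
ESCAPED = "%[0-9A-Fa-f]{2}"     # "%" followed by two hex digits

# uric_no_slash = pchar | ";" | "?"
URIC_NO_SLASH = f"(?:{ESCAPED}|[{ALPHANUM}{MARK}{PCHAR_EXTRA};?])"
# uric          = pchar | ";" | "?" | "/"
URIC = f"(?:{ESCAPED}|[{ALPHANUM}{MARK}{PCHAR_EXTRA};?/])"
# opaque_part   = uric_no_slash *uric
OPAQUE_PART = re.compile(f"{URIC_NO_SLASH}{URIC}*")

def is_valid_opaque_part(text):
    """Hypothetical helper: True if text matches opaque_part as a whole."""
    return OPAQUE_PART.fullmatch(text) is not None

escaped_ok = is_valid_opaque_part(
    "message/rfc822,Content-Type:text/plain;charset=utf-8"
    "%0D%0A%0D%0AHELLO%20WORLD"
)
raw_space = is_valid_opaque_part("text/plain,HELLO WORLD")
raw_quote = is_valid_opaque_part('text/plain,filename="a.txt"')
```

Under these definitions the reserved chars such as ":", ";", "=" and "/" pass unescaped, while a raw space or double quote does not, which matches the escaping used in the examples.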

--
Alessandro







Re: data URIs - filename and content-disposition

Michael A. Puls II
On 3/1/2012 12:48 PM, Alessandro Angeli wrote:

> However, as far as I can tell, it is possible to achieve an even more
> generic and flexible result than what would be accomplished by the
> HEADERS param in a completely standard-compliant way by using the
> message/* MEDIATYPE, so that the payload (DATA part) of the "data:" URI
> would be a complete message/*, including its header fields.
>
> For example, using message/http, one would have (all in one line):
>
> {data:message/http,HTTP 200 OK|Content-Type:text/plain;charset=utf-8
> |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
> WORLD}
>
> I used {} to delimit the URI and I used spaces and | for readability,
> but they are supposed to be escaped as %20 and %0D%0A (that is, I used |
> to represent a new line). I also used unescaped reserved chars because
> of the consideration at the end of this message.
>
> Using message/rfc822:
>
> {data:message/rfc822,Content-Type:text/plain;charset=utf-8
> |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
> WORLD}

Both of those sound cool. I like the latter better as there's no need to
specify http stuff. The message/rfc822 format is fine too.

"message/rfc822" for the mime type in the data URI though is not. I
might want to create a data URI for a real message/rfc822 file and I
wouldn't want that being interpreted as something that has an embedded
attachment that the browser needs to extract.

For example, Opera can already render the content of
<data:message/rfc822,Content-Type%3A%20text%2Fplain%3B%20charset%3D%22utf-8%22%3B%20name%3D%22test.txt%22%0D%0AContent-Disposition%3A%20attachment%3B%20filename%3D%22test.txt%22%0D%0AContent-Transfer-Encoding%3A%208bit%0D%0A%0D%0Atest%0D%0A>
as an email message in a browser tab.

So, the mime type needs to be something not currently used. message/http
or whatever.

> The benefits over the HEADERS param would be:
>
> 1) no need to define a new param
>
> 2) more flexible (you can even specify the HTTP response line)

I personally can't think of a use for that, or a need for it.

> 3) to implement it, I believe it should be possible to simply unescape
> the whole payload and pipe the result as an octet-stream into the
> browser's HTTP response handler (if using message/http; if using
> message/rfc822, a fake response line could be prefixed to the payload to
> turn it into a message/http)

Indeed. That sounds like that'd be the case.

> 4) base64 encoding can be specified for the whole payload or only for
> the message/* body, using the usual Content-Transfer-Encoding header
> field

Yeah, if the data URI content is base64-encoded in the URI, the
person/thing creating the attachment's content in the message/ format
might want to use a Content-Transfer-Encoding of 8bit (for example)
instead of base64 to save space, so that the file data isn't
base64-encoded twice.
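To put a rough number on the cost of encoding twice, here is a small Python sketch (the function name is just for illustration). It only measures the body and ignores the header fields, which would also be encoded at the URI level.

```python
import base64

def encoded_sizes(data: bytes):
    """Return the sizes of the raw data, base64 once, and base64 twice."""
    once = base64.b64encode(data)     # e.g. base64 at the URI level only
    twice = base64.b64encode(once)    # base64 body, base64-encoded again
    return len(data), len(once), len(twice)

# 300 bytes of arbitrary binary data as a stand-in for an attachment.
sizes = encoded_sizes(bytes(300))
```

Each pass inflates the data by about a third, so two passes come to roughly 1.8 times the original size.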

Might even be cool to support Content-Transfer-Encoding of "Binary" for
the format too for binary attachments. Opera can already handle that.
Load <http://shadow2531.com/opera/testcases/mht/000.mht> in Opera. See
the source of it (wget it for example) to see the binary png data for
the attachment (it breaks Opera's view source often).

> 5) it is possible to use quoted-printable, which may be more compact
> (after all, "=" does not need to be URI-escaped)

Indeed.

> 6) it is even possible to use gzip compression, which may mitigate the
> bloating caused by base64

Indeed. I think browsers already support all kinds of encodings.

One thing with the header format though. Although you can specify
multiple attachments in the message/rfc822 (or eml-like) format (with
multipart/mixed and then each attachment section), the format for use
with data URIs should be limited to a single attachment.

--
Michael