Re: data URIs - filename and content-disposition

Julian Reschke
On 25.02.2010 17:06, Michael A. Puls II wrote:

> ...
> What about this?
>
> Say you have this file:
>
> "with spaces.txt"
> ---------
> √
> ---------
>
> and want that as data URI that's treated as an attachment with a
> filename hint of "with spaces.txt".
>
> Well, you might want headers like this:
>
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: attachment; filename="with spaces.txt"
> Content-Language: en
>
> So, how bout doing it like the following?:
>
> data:text/plain;charset=utf-8;headers=Content-Disposition%3A%20attachment%3B%20filename%3D%22with%20spaces.txt%22%0D%0AContent-Language%3A%20en,%E2%88%9A
>
>
> That way, 'text/plain;charset=utf-8' would be the full Content-Type
> header and the rest of the headers can be specified as \r\n-separated
> lines like in HTTP. It's tagged with "headers=" so it can be found in
> the string easily, and the value is percent-encoded so it doesn't
> interfere with existing UA handling of data URIs.
>
> The restriction would be that the headers value can't contain a
> Content-Type header (since it's already implied. And, perhaps, it should
> be specified exactly what headers are allowed in the headers value.
>
> It even works (as in, doesn't cause a problem) with ;base64 at the end
> like this (in Opera at least):
> ...

The advantage of this is that it's flexible.

The disadvantage is that it may be too flexible :-). For instance, you'd
either need to restrict the micro syntax for embedded headers, or
recipients will need to run this through a full-blown HTTP header parser
(which may be hard to do).
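
To make the trade-off concrete, here is a minimal JavaScript sketch of the "restricted micro syntax" route. The helper names `buildHeadersParam` and `parseHeadersParam` are hypothetical and not part of any spec or browser. The sketch assumes each embedded header is a single "Name: value" line, separated by CRLF, with no line folding.

```javascript
// Hypothetical helpers illustrating the proposed "headers=" parameter.
// Assumption: one "Name: value" header per CRLF-separated line, no folding.
// A real HTTP header parser has to handle far more than this.

function buildHeadersParam(headers) {
  // headers: array of [name, value] pairs
  const block = headers.map(([name, value]) => `${name}: ${value}`).join("\r\n");
  return "headers=" + encodeURIComponent(block);
}

function parseHeadersParam(param) {
  const prefix = "headers=";
  if (!param.startsWith(prefix)) return null;
  const block = decodeURIComponent(param.slice(prefix.length));
  const result = new Map();
  for (const line of block.split("\r\n")) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip lines that are not "Name: value"
    const name = line.slice(0, colon).trim().toLowerCase();
    if (name === "content-type") continue; // already given by the media type
    result.set(name, line.slice(colon + 1).trim());
  }
  return result;
}

const param = buildHeadersParam([
  ["Content-Disposition", 'attachment; filename="with spaces.txt"'],
  ["Content-Language", "en"],
]);
const uri =
  "data:text/plain;charset=utf-8;" + param + "," + encodeURIComponent("√");
const parsed = parseHeadersParam(param);
```

Even in this restricted form, the Content-Disposition value that comes back (`attachment; filename="with spaces.txt"`) still needs its own parameter and quoted-string parsing, which is where the complexity creeps back in.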

So I have a slight preference to keep things simple, and to focus on the
specific use case.

Best regards, Julian


Re: data URIs - filename and content-disposition

Michael A. Puls II
On Fri, 26 Feb 2010 08:20:39 -0500, Julian Reschke <[hidden email]>  
wrote:

> On 25.02.2010 17:06, Michael A. Puls II wrote:
>> ...
>> What about this?
>>
>> Say you have this file:
>>
>> "with spaces.txt"
>> ---------
>> √
>> ---------
>>
>> and want that as data URI that's treated as an attachment with a
>> filename hint of "with spaces.txt".
>>
>> Well, you might want headers like this:
>>
>> Content-Type: text/plain; charset=utf-8
>> Content-Disposition: attachment; filename="with spaces.txt"
>> Content-Language: en
>>
>> So, how bout doing it like the following?:
>>
>> data:text/plain;charset=utf-8;headers=Content-Disposition%3A%20attachment%3B%20filename%3D%22with%20spaces.txt%22%0D%0AContent-Language%3A%20en,%E2%88%9A
>>
>>
>> That way, 'text/plain;charset=utf-8' would be the full Content-Type
>> header and the rest of the headers can be specified as \r\n-separated
>> lines like in HTTP. It's tagged with "headers=" so it can be found in
>> the string easily, and the value is percent-encoded so it doesn't
>> interfere with existing UA handling of data URIs.
>>
>> The restriction would be that the headers value can't contain a
>> Content-Type header (since it's already implied. And, perhaps, it should
>> be specified exactly what headers are allowed in the headers value.
>>
>> It even works (as in, doesn't cause a problem) with ;base64 at the end
>> like this (in Opera at least):
>> ...
>
> The advantage of this is that it's flexible.
>
> The disadvantage is that it may be too flexible :-). For instance, you'd  
> either need to restrict the micro syntax for embedded headers, or  
> recipients will need to run this through a full-blown HTTP header parser  
> (which may be hard to do).

Yeah, I was thinking of that too. For browsers, percent-decoding it and
running it through a full-blown HTTP header parser probably wouldn't be a
problem. For others, though, it might. And, with the way the filename can
be encoded, even if you split up the headers and their different parts
properly, you'd still have to write a decoder for the filename.

I'm also not sure about the performance cost of using a full-blown HTTP
header parser every time a data URI is invoked, which could be a lot if
you load a page full of data-URI-based img elements. But that'd be the
author's fault for including the extra, unneeded headers in that case.

> So I have a slight preference to keep things simple, and to focus on the  
> specific use case.

Well, I'm personally happy with just:

data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,

(that could even be shortened to just disposition=attachment)

I just suggested the more flexible way as I figured that's what most  
people would want.

Now, if we do it just the simple way, how should the filename value be  
encoded? Just percent-encoded UTF-8? That'd be fine by me because I could  
just use encodeURIComponent() to produce the value.
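
As a rough sketch of the simple form, assuming the parameter names floated in this thread (`content-disposition` and `filename`, neither of which is in any published spec), building such a URI from script could look like this. `buildDataUri` is a hypothetical helper name.

```javascript
// Hypothetical builder for the simple form discussed here:
//   data:<type>;charset=utf-8;content-disposition=<d>;filename=<name>,<data>
// The parameter names come from the proposal, not from a published spec.
function buildDataUri({ type, disposition, filename, text }) {
  const params = [type, "charset=utf-8"];
  if (disposition) params.push("content-disposition=" + disposition);
  // Percent-encoded UTF-8, produced by encodeURIComponent.
  if (filename) params.push("filename=" + encodeURIComponent(filename));
  return "data:" + params.join(";") + "," + encodeURIComponent(text);
}

const simpleUri = buildDataUri({
  type: "text/plain",
  disposition: "attachment",
  filename: "with spaces.txt",
  text: "√",
});
// simpleUri is:
// data:text/plain;charset=utf-8;content-disposition=attachment;filename=with%20spaces.txt,%E2%88%9A
```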

For the disposition value, it'd be handled like this:

If the disposition value is present and is "inline" or "attachment", the
browser tries to treat it as such. Otherwise, it does what browsers do now.

--
Michael

Re: data URIs - filename and content-disposition

Julian Reschke
On 26.02.2010 14:57, Michael A. Puls II wrote:

> ...
>> The advantage of this is that it's flexible.
>>
>> The disadvantage is that it may be too flexible :-). For instance,
>> you'd either need to restrict the micro syntax for embedded headers,
>> or recipients will need to run this through a full-blown HTTP header
>> parser (which may be hard to do).
>
> Yeh, I was thinking of that too. For browsers, percent-decoding it and
> running it through a full-blown http header parser probably wouldn't be
> a problem. For others though, it might. And, with the way the filename
> can be encoded, even if you were to split up the headers and their
> different parts properly, you'd have to write a decoder for the filename.
>
> I'm also not sure about the performance concerns with using a full-blown
> http header parser every time a data URI is invoked (which could be a
> lot if you load a page full of data URIs-based img elements. But, that'd
> be the author's fault for including the extra, non-needed headers in
> that case).

+1

>> So I have a slight preference to keep things simple, and to focus on
>> the specific use case.
>
> Well, I'm personally happy with just:
>
> data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,
>
> (that could even be shortened to just disposition=attachment)
>
> I just suggested the more flexible way as I figured that's what most
> people would want.
>
> Now, if we do it just the simple way, how should the filename value be
> encoded? Just percent-encoded UTF-8? That'd be fine by me because I
> could just use encodeURIComponent() to produce the value.

We'll need to define which characters need to be percent-escaped,
though. Obviously all non-URI characters, but also those needed to parse
the parameters, so minimally ";".

> For the disposition value, it'd be handled like this:
>
> If the disposition value is present and is "inline" or "attachment", the
> browser tries to treat it as such. Else, do like browsers do now.

Maybe just refer to
<http://greenbytes.de/tech/webdav/rfc2183.html#rfc.section.2.8>? This
makes unknown extension types equivalent to "attachment".
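
The two candidate rules differ only in how an unrecognized value is treated. A small sketch, with hypothetical function names and with case-insensitive matching as an assumption on my part:

```javascript
// Two candidate policies for a hypothetical "content-disposition" parameter.
// Both return "inline", "attachment", or null, where null means
// "do whatever the browser does today". Lowercasing is an assumption.

// Policy A: only the two known values have any effect.
function dispositionStrict(value) {
  if (value == null || value.trim() === "") return null;
  const v = value.trim().toLowerCase();
  return v === "inline" || v === "attachment" ? v : null;
}

// Policy B: follow RFC 2183 section 2.8, where unrecognized
// disposition types are treated as "attachment".
function dispositionRfc2183(value) {
  if (value == null || value.trim() === "") return null;
  const v = value.trim().toLowerCase();
  return v === "inline" ? "inline" : "attachment";
}
```

With policy A, a value such as `form-data` falls back to current behaviour; with policy B, it triggers a download.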

Best regards, Julian


Re: data URIs - filename and content-disposition

Michael A. Puls II
On Fri, 26 Feb 2010 09:15:56 -0500, Julian Reschke <[hidden email]>  
wrote:

> On 26.02.2010 14:57, Michael A. Puls II wrote:
>> ...
>>> The advantage of this is that it's flexible.
>>>
>>> The disadvantage is that it may be too flexible :-). For instance,
>>> you'd either need to restrict the micro syntax for embedded headers,
>>> or recipients will need to run this through a full-blown HTTP header
>>> parser (which may be hard to do).
>>
>> Yeh, I was thinking of that too. For browsers, percent-decoding it and
>> running it through a full-blown http header parser probably wouldn't be
>> a problem. For others though, it might. And, with the way the filename
>> can be encoded, even if you were to split up the headers and their
>> different parts properly, you'd have to write a decoder for the  
>> filename.
>>
>> I'm also not sure about the performance concerns with using a full-blown
>> http header parser every time a data URI is invoked (which could be a
>> lot if you load a page full of data URIs-based img elements. But, that'd
>> be the author's fault for including the extra, non-needed headers in
>> that case).
>
> +1
>
>>> So I have a slight preference to keep things simple, and to focus on
>>> the specific use case.
>>
>> Well, I'm personally happy with just:
>>
>> data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,
>>
>> (that could even be shortened to just disposition=attachment)
>>
>> I just suggested the more flexible way as I figured that's what most
>> people would want.
>>
>> Now, if we do it just the simple way, how should the filename value be
>> encoded? Just percent-encoded UTF-8? That'd be fine by me because I
>> could just use encodeURIComponent() to produce the value.
>
> We'll need to define which characters need to be percent-escaped,  
> though. Obviously all non-URI characters, but also those needed to parse  
> the parameters, so minimally ";".

Well, encodeURIComponent basically percent-encodes anything not in
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()",
which is great for me. That covers encoding ";" too.
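
This is easy to check from script. The snippet below walks the printable ASCII range and collects every character that encodeURIComponent returns unchanged, then compares that set with the list given here. Variable names are only for illustration.

```javascript
// Characters claimed to be left alone by encodeURIComponent.
const listed =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";

// Collect every printable ASCII character that comes back unchanged.
const unescaped = [];
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURIComponent(ch) === ch) unescaped.push(ch);
}

// Compare as sorted strings so that order does not matter.
const sameSet =
  [...listed].sort().join("") === unescaped.slice().sort().join("");

// The parameter separator is escaped.
const semicolon = encodeURIComponent(";"); // "%3B"
```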

>> For the disposition value, it'd be handled like this:
>>
>> If the disposition value is present and is "inline" or "attachment", the
>> browser tries to treat it as such. Else, do like browsers do now.
>
> Maybe just refer to  
> <http://greenbytes.de/tech/webdav/rfc2183.html#rfc.section.2.8>? This  
> makes unknown extension types equivalent to "attachment".

I'm happy either way :)

--
Michael

Re: data URIs - filename and content-disposition

Julian Reschke
On 26.02.2010 15:33, Michael A. Puls II wrote:

> ...
>>>> So I have a slight preference to keep things simple, and to focus on
>>>> the specific use case.
>>>
>>> Well, I'm personally happy with just:
>>>
>>> data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,
>>>
>>>
>>> (that could even be shortened to just disposition=attachment)
>>>
>>> I just suggested the more flexible way as I figured that's what most
>>> people would want.
>>>
>>> Now, if we do it just the simple way, how should the filename value be
>>> encoded? Just percent-encoded UTF-8? That'd be fine by me because I
>>> could just use encodeURIComponent() to produce the value.
>>
>> We'll need to define which characters need to be percent-escaped,
>> though. Obviously all non-URI characters, but also those needed to
>> parse the parameters, so minimally ";".
>
> Well, encodeURIComponent basically percent-encodes anything not in
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()"
>
> , which is great to me. That covers encoding ; too.
> ...

Let's see. The data URI scheme RFC (RFC2397) uses token/attribute/value
from RFC 2045, which has
(<http://greenbytes.de/tech/webdav/rfc2045.html#rfc.section.5.1>):

      value := token / quoted-string

      token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                  or tspecials>

      tspecials :=  "(" / ")" / "<" / ">" / "@" /
                    "," / ";" / ":" / "\" / <">
                    "/" / "[" / "]" / "?" / "="
                    ; Must be in quoted-string,
                    ; to use within parameter values

so any new data URI parameter should accept both tokens and quoted strings.

Also, tspecials appears to include a few things encodeURIComponent doesn't.
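
To see exactly which tspecials slip through, one can filter the list against encodeURIComponent. The sketch below also shows a stricter encoder, `encodeAsToken`, which is a hypothetical name and only one possible way to close the gap.

```javascript
// tspecials from RFC 2045 section 5.1, as quoted above.
const TSPECIALS = '()<>@,;:\\"/[]?=';

// Which tspecials does encodeURIComponent leave unescaped?
const notEscaped = [...TSPECIALS].filter(
  (ch) => encodeURIComponent(ch) === ch
);
// notEscaped is ["(", ")"]

// Hypothetical stricter encoder: additionally escape "(" and ")" so that
// any non-empty result is a valid RFC 2045 token.
function encodeAsToken(value) {
  return encodeURIComponent(value).replace(
    /[()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const example = encodeAsToken("report (final);v2.txt");
// "report%20%28final%29%3Bv2.txt"
```

A producer using something like this would never need quoted strings, but recipients would still have to decide what to do when they receive one.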

The devil is in the details. This will need examples and test cases.

Best regards, Julian

Re: data URIs - filename and content-disposition

Michael A. Puls II
On Fri, 26 Feb 2010 09:52:14 -0500, Julian Reschke <[hidden email]>  
wrote:

> On 26.02.2010 15:33, Michael A. Puls II wrote:
>> ...
>>>>> So I have a slight preference to keep things simple, and to focus on
>>>>> the specific use case.
>>>>
>>>> Well, I'm personally happy with just:
>>>>
>>>> data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,
>>>>
>>>>
>>>> (that could even be shortened to just disposition=attachment)
>>>>
>>>> I just suggested the more flexible way as I figured that's what most
>>>> people would want.
>>>>
>>>> Now, if we do it just the simple way, how should the filename value be
>>>> encoded? Just percent-encoded UTF-8? That'd be fine by me because I
>>>> could just use encodeURIComponent() to produce the value.
>>>
>>> We'll need to define which characters need to be percent-escaped,
>>> though. Obviously all non-URI characters, but also those needed to
>>> parse the parameters, so minimally ";".
>>
>> Well, encodeURIComponent basically percent-encodes anything not in
>> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()"
>>
>> , which is great to me. That covers encoding ; too.
>> ...
>
> Let's see. The data URI scheme RFC (RFC2397) uses token/attribute/value  
> from RFC 2045, which has  
> (<http://greenbytes.de/tech/webdav/rfc2045.html#rfc.section.5.1>):
>
>       value := token / quoted-string
>
>       token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
>                   or tspecials>
>
>       tspecials :=  "(" / ")" / "<" / ">" / "@" /
>                     "," / ";" / ":" / "\" / <">
>                     "/" / "[" / "]" / "?" / "="
>                     ; Must be in quoted-string,
>                     ; to use within parameter values
>
> so any new data URI parameter should accept both tokens and quoted  
> strings.
>
> Also, tspecials appears to include a few things encodeURIComponent  
> doesn't.

O.K. I'll get back to you on this. I know that browsers choke on " already,
so it looks like quoted strings might be out.

--
Michael

Re: data URIs - filename and content-disposition

Michael A. Puls II
In reply to this post by Julian Reschke
On Fri, 26 Feb 2010 09:52:14 -0500, Julian Reschke <[hidden email]>  
wrote:

> On 26.02.2010 15:33, Michael A. Puls II wrote:
>> ...
>>>>> So I have a slight preference to keep things simple, and to focus on
>>>>> the specific use case.
>>>>
>>>> Well, I'm personally happy with just:
>>>>
>>>> data:text/plain;charset=utf-8;content-disposition=attachment;filename=name,
>>>>
>>>>
>>>> (that could even be shortened to just disposition=attachment)
>>>>
>>>> I just suggested the more flexible way as I figured that's what most
>>>> people would want.
>>>>
>>>> Now, if we do it just the simple way, how should the filename value be
>>>> encoded? Just percent-encoded UTF-8? That'd be fine by me because I
>>>> could just use encodeURIComponent() to produce the value.
>>>
>>> We'll need to define which characters need to be percent-escaped,
>>> though. Obviously all non-URI characters, but also those needed to
>>> parse the parameters, so minimally ";".
>>
>> Well, encodeURIComponent basically percent-encodes anything not in
>> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()"
>>
>> , which is great to me. That covers encoding ; too.
>> ...
>
> Let's see. The data URI scheme RFC (RFC2397) uses token/attribute/value  
> from RFC 2045, which has  
> (<http://greenbytes.de/tech/webdav/rfc2045.html#rfc.section.5.1>):
>
>       value := token / quoted-string
>
>       token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
>                   or tspecials>
>
>       tspecials :=  "(" / ")" / "<" / ">" / "@" /
>                     "," / ";" / ":" / "\" / <">
>                     "/" / "[" / "]" / "?" / "="
>                     ; Must be in quoted-string,
>                     ; to use within parameter values
>
> so any new data URI parameter should accept both tokens and quoted  
> strings.
>
> Also, tspecials appears to include a few things encodeURIComponent  
> doesn't.
>
> The devil is in the details. This will need examples and test cases.

See <http://shadow2531.com/opera/testcases/datauri/data_uri_rules.html>  
and the example link at the bottom of that page. That's how I'd parse data  
URIs and how I'd support filename and content-disposition.

--
Michael

Re: data URIs - filename and content-disposition

Michael A. Puls II
In reply to this post by Julian Reschke
On Fri, 26 Feb 2010 08:20:39 -0500, Julian Reschke <[hidden email]>  
wrote:

> On 25.02.2010 17:06, Michael A. Puls II wrote:
>> ...
>> What about this?
>>
>> Say you have this file:
>>
>> "with spaces.txt"
>> ---------
>> √
>> ---------
>>
>> and want that as data URI that's treated as an attachment with a
>> filename hint of "with spaces.txt".
>>
>> Well, you might want headers like this:
>>
>> Content-Type: text/plain; charset=utf-8
>> Content-Disposition: attachment; filename="with spaces.txt"
>> Content-Language: en
>>
>> So, how bout doing it like the following?:
>>
>> data:text/plain;charset=utf-8;headers=Content-Disposition%3A%20attachment%3B%20filename%3D%22with%20spaces.txt%22%0D%0AContent-Language%3A%20en,%E2%88%9A
>>
>>
>> That way, 'text/plain;charset=utf-8' would be the full Content-Type
>> header and the rest of the headers can be specified as \r\n-separated
>> lines like in HTTP. It's tagged with "headers=" so it can be found in
>> the string easily, and the value is percent-encoded so it doesn't
>> interfere with existing UA handling of data URIs.
>>
>> The restriction would be that the headers value can't contain a
>> Content-Type header (since it's already implied. And, perhaps, it should
>> be specified exactly what headers are allowed in the headers value.
>>
>> It even works (as in, doesn't cause a problem) with ;base64 at the end
>> like this (in Opera at least):
>> ...
>
> The advantage of this is that it's flexible.
>
> The disadvantage is that it may be too flexible :-). For instance, you'd  
> either need to restrict the micro syntax for embedded headers, or  
> recipients will need to run this through a full-blown HTTP header parser  
> (which may be hard to do).
>
> So I have a slight preference to keep things simple, and to focus on the  
> specific use case.

Judging by Opera and Safari's handling of data URIs, the part between  
"data:" and "," is considered a single, percent-encoded value that is  
percent-decoded before being parsed.

For example: <data:text%2Fplain%3Bcharset%3Dutf-8%3Bbase64,4oia>
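
A minimal sketch of that observed behaviour, in JavaScript. `parseDataUri` is a hypothetical helper, not a browser API. It assumes the payload is UTF-8 text and splits parameters on ";" without any quoted-string handling.

```javascript
// Sketch: percent-decode everything between "data:" and the first ","
// and only then split it into parameters.
// Assumptions: UTF-8 text payload, no quoted-string parameter values.
function parseDataUri(uri) {
  if (!uri.startsWith("data:")) throw new Error("not a data URI");
  const comma = uri.indexOf(",");
  if (comma === -1) throw new Error("missing ','");
  const header = decodeURIComponent(uri.slice(5, comma));
  const [mediaType, ...rest] = header.split(";").map((part) => part.trim());
  const params = {};
  let base64 = false;
  for (const part of rest) {
    if (part.toLowerCase() === "base64") {
      base64 = true;
      continue;
    }
    const eq = part.indexOf("=");
    if (eq > 0) params[part.slice(0, eq).toLowerCase()] = part.slice(eq + 1);
  }
  const data = uri.slice(comma + 1);
  let text;
  if (base64) {
    // Decode base64 to bytes, then interpret the bytes as UTF-8.
    const bytes = Uint8Array.from(atob(data), (c) => c.charCodeAt(0));
    text = new TextDecoder("utf-8").decode(bytes);
  } else {
    text = decodeURIComponent(data);
  }
  return { mediaType, params, base64, text };
}

const result = parseDataUri(
  "data:text%2Fplain%3Bcharset%3Dutf-8%3Bbase64,4oia"
);
// result.mediaType is "text/plain", result.params.charset is "utf-8",
// result.base64 is true, and result.text is "√"
```

Because each part is trimmed after decoding, this sketch tolerates whitespace around parameters, including around base64.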

What this means is that if you have:

-----------------
Content-Type: text/plain; charset=utf-8; name*=UTF-8''%E2%88%9A.txt
Content-Disposition: attachment; filename*=UTF-8''%E2%88%9A.txt
Content-Transfer-Encoding: base64

4oia
-----------------

then you could represent the headers as a single header named "data", like so:

data: text/plain; charset=utf-8; base64; filename*=UTF-8''%E2%88%9A.txt; content-disposition=attachment

Then, to make a URI out of it, you percent-encode the whole header value  
like this:

<data:%20text%2Fplain%3B%20charset%3Dutf-8%3B%20base64%3B%20filename*%3DUTF-8''%25E2%2588%259A.txt%3B%20content-disposition%3Dattachment,4oia>

This works in Opera, and it will work in Safari *if* they fix their code
to trim whitespace around base64. In fact, here's a version with that
whitespace already trimmed, for Safari:

<data:%20text%2Fplain%3B%20charset%3Dutf-8%3Bbase64%3B%20filename*%3DUTF-8''%25E2%2588%259A.txt%3B%20content-disposition%3Dattachment,4oia>
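
Both URIs can be reproduced with encodeURIComponent. In this sketch `headerValueToDataUri` is a hypothetical helper, and the leading space in the header value is kept on purpose, since it is what produces the initial %20.

```javascript
// Hypothetical helper: turn a "data" header value plus an already-encoded
// body into a data URI by percent-encoding the whole header value.
function headerValueToDataUri(headerValue, encodedBody) {
  return "data:" + encodeURIComponent(headerValue) + "," + encodedBody;
}

// Leading space kept, as in "data: text/plain; ...".
const headerValue =
  " text/plain; charset=utf-8; base64; " +
  "filename*=UTF-8''%E2%88%9A.txt; content-disposition=attachment";

// Version with whitespace around base64 (works in Opera).
const built = headerValueToDataUri(headerValue, "4oia");

// Version with the whitespace around base64 removed (for Safari).
const trimmedValue = headerValue.replace(/;\s*base64\s*;/i, ";base64;");
const builtTrimmed = headerValueToDataUri(trimmedValue, "4oia");
```

Note that the "%" characters already present in the filename* value get encoded a second time (%25E2...), so the recipient has to percent-decode once to recover the header value and once more to recover the filename.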

So, going by how Opera and Safari do it, it almost looks like they
already use a MIME header parser (after percent-decoding the whole value
between "data:" and ","). I'm not saying they do, but it kind of looks
like it.

With that said, even if requiring a full-blown header parser would suck
for some, it'd work out great for browsers, as they already have code to
parse MIME headers. And it would allow specifying non-ASCII filenames
according to an existing spec.
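
For those without a MIME parser to hand, decoding a value like UTF-8''%E2%88%9A.txt is not much code. The sketch below handles the charset'language'value form (RFC 2231 style). `decodeExtValue` is a hypothetical name, and supporting only UTF-8 is an assumption made to keep it short.

```javascript
// Hypothetical decoder for an extended parameter value of the form
//   charset'language'percent-encoded-octets
// Assumption: only UTF-8 is supported; anything else returns null.
function decodeExtValue(raw) {
  const match = /^([^']*)'([^']*)'(.*)$/.exec(raw);
  if (!match) return null;
  const [, charset, language, encoded] = match;
  if (charset.toLowerCase() !== "utf-8") return null;
  try {
    return { language, value: decodeURIComponent(encoded) };
  } catch {
    return null; // malformed percent-encoding
  }
}

const decoded = decodeExtValue("UTF-8''%E2%88%9A.txt");
// decoded.value is "√.txt" and decoded.language is ""
```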

The only thing to really specify then would be what param name to use for  
filename and content-disposition. Since Content-Type already has 'name',  
maybe it should be preferred over 'filename'. As for content-disposition,  
that could be as-is or just 'disposition' (as a new param for Content-Type  
even).

As for handling duplicate name/value pairs, that could be handled as the
MIME specs say.

Given this new information, what do you think? And, what do others think?

--
Michael