[widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

I had a discussion with Henri Sivonen and a few other people in the
HTML-WG about using HTML5's content-type sniffing as a way of deriving
the MIME type of files inside a widget package. Henri suggested that
we should primarily rely on file extensions as a way of mapping files
to MIME types. Although relying on extensions can be potentially
unreliable, it seems like a simple solution to a complicated problem.
For the spec, I guess it would mean including a table of file
extension to MIME type mappings for common IANA-registered types
(MIME type registrations list file extensions). As a second line of
defense, if there is no file extension, or the file extension is not
in the extension-to-MIME table, then HTML content-type sniffing
heuristics can be used.
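
A minimal sketch of that two-stage lookup, in Python, might look like the
following. The table entries, the function names, and the tiny `sniff`
stand-in (which only checks two magic numbers) are illustrative
assumptions, not the HTML5 sniffing algorithm and not anything the spec
defines yet.

```python
import os

# Hypothetical extension table; the real entries are still to be agreed on.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".css": "text/css",
    ".png": "image/png",
    ".gif": "image/gif",
}


def sniff(data: bytes) -> str:
    """Stand-in for content-type sniffing: checks two magic numbers only."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    return "application/octet-stream"


def derive_mime(filename: str, data: bytes) -> str:
    """First line of defense: the extension table. Second: sniffing."""
    ext = os.path.splitext(filename)[1].lower()
    mime = EXTENSION_TO_MIME.get(ext)
    if mime is not None:
        return mime
    # No extension, or an extension missing from the table.
    return sniff(data)
```

The point of the ordering is that a file named `logo.png` never reaches
the sniffer; only extensionless or unrecognised files do.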

Comments? questions? suggestions? tomatoes?

Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

bilcorry

Marcos Caceres wrote on 11/29/2008 9:39 AM:
> I had a discussion with Henri Sivonen and a few other people in the
> HTML-WG about using HTML5's content-type sniffing as a way of deriving
> the MIME type of files inside a widget package. Henri suggested that
> we should primarily rely on file extensions as a way of mapping files
> to MIME types. Although relying on extensions can be potentially
> unreliable, it seems like a simple solution to a complicated problem.

Content-sniffing can pose its own problems; here's one example:

        http://www.gnucitizen.org/blog/backdooring-images/


> For the spec, I guess  it would mean including a table of file
> extension to MIME type mappings into the spec for common IANA
> registered types (MIME type registrations list file extensions).

The Apache (httpd) project includes a file called "mime.types" that maps file extensions to MIME types.  I haven't seen anything more extensive than Apache's.


> As a
> second line of defense, if there is no file extension, or the file
> extension does not map to the file extension to MIME table, then HTML
> content-type sniffing heuristics can be used.

This paper describes how the major browsers do it:

        http://www.leviathansecurity.com/pdf/Flirting%20with%20MIME%20Types.pdf

Firefox specifically appears to do it the way you're proposing here.


- Bil


Re: [widgets] Content-type sniffing and file extension to MIME mapping

Ian Hickson

On Sat, 29 Nov 2008, Bil Corry wrote:

> Marcos Caceres wrote on 11/29/2008 9:39 AM:
> > I had a discussion with Henri Sivonen and a few other people in the
> > HTML-WG about using HTML5's content-type sniffing as a way of deriving
> > the MIME type of files inside a widget package. Henri suggested that
> > we should primarily rely on file extensions as a way of mapping files
> > to MIME types. Although relying on extensions can be potentially
> > unreliable, it seems like a simple solution to a complicated problem.
>
> Content-sniffing can pose it's own problems, here's one example:
>
> http://www.gnucitizen.org/blog/backdooring-images/

Content-sniffing providing privilege escalation is a problem, as is
non-interoperable content-sniffing. However, assuming you define the
content-sniffing to not have any privilege escalations, and assuming that
all implementations implement the same thing, there's no problem.

Note also that none of this applies to widgets, since the user has already
given them as full a set of privileges as would be possible to obtain
through content-sniffing.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

Ok, hearing no objections, I propose we bake the following
file extensions into the spec (we can debate which MIME types to use
after we settle on the extensions!):

.html
.htm
.css
.gif
.jpeg
.png
.js
.json
.xml
.txt

The following we should probably bake in too:
.mp3
.swf
.wav
.svg
.ico

We may bake in the following:
.xhtml

Please suggest more you would like to see!

Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

timeless-3

On Tue, Dec 2, 2008 at 1:42 AM, Marcos Caceres <[hidden email]> wrote:
> Ok, hearing no objections, then I propose we bake in the following
> file extensions into the spec (we can debate which MIME types to use
> after we settle on the extensions!):

> .jpeg

you're missing .jpg which is fairly odd.

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Jonas Sicking-2
In reply to this post by Marcos Caceres-2

Marcos Caceres wrote:

> Ok, hearing no objections, then I propose we bake in the following
> file extensions into the spec (we can debate which MIME types to use
> after we settle on the extensions!):
>
> .html
> .htm
> .css
> .gif
> .jpeg
> .png
> .js
> .json
> .xml
> .txt
>
> The following we should probably bake in too:
> .mp3
> .swf
> .wav
> .svg
> .ico

I'm not a big fan of "endorsing" mp3 given that it's a patented format
that can't be implemented by FOSS software.

/ Jonas

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

On Mon, Dec 1, 2008 at 11:48 PM, Jonas Sicking <[hidden email]> wrote:

> Marcos Caceres wrote:
>>
>> Ok, hearing no objections, then I propose we bake in the following
>> file extensions into the spec (we can debate which MIME types to use
>> after we settle on the extensions!):
>>
>> .html
>> .htm
>> .css
>> .gif
>> .jpeg
>> .png
>> .js
>> .json
>> .xml
>> .txt
>>
>> The following we should probably bake in too:
>> .mp3
>> .swf
>> .wav
>> .svg
>> .ico
>
> I'm not a big fan of "endorsing" mp3 given that it's a patented format that
> can't be implemented by FOSS software.
>
> / Jonas
>

Ok, no MP3. Maybe you can suggest some other suitable audio format?


--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Jonas Sicking-2

Marcos Caceres wrote:

> On Mon, Dec 1, 2008 at 11:48 PM, Jonas Sicking <[hidden email]> wrote:
>> Marcos Caceres wrote:
>>> Ok, hearing no objections, then I propose we bake in the following
>>> file extensions into the spec (we can debate which MIME types to use
>>> after we settle on the extensions!):
>>>
>>> .html
>>> .htm
>>> .css
>>> .gif
>>> .jpeg
>>> .png
>>> .js
>>> .json
>>> .xml
>>> .txt
>>>
>>> The following we should probably bake in too:
>>> .mp3
>>> .swf
>>> .wav
>>> .svg
>>> .ico
>> I'm not a big fan of "endorsing" mp3 given that it's a patented format that
>> can't be implemented by FOSS software.
>>
>> / Jonas
>>
>
> Ok, no MP3. maybe you can suggest some other suitable audio format?

I'd stick with wav for now. Possibly also FLAC.

/ Jonas

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

On Tue, Dec 2, 2008 at 12:22 AM, Jonas Sicking <[hidden email]> wrote:

> Marcos Caceres wrote:
>>
>> On Mon, Dec 1, 2008 at 11:48 PM, Jonas Sicking <[hidden email]> wrote:
>>>
>>> Marcos Caceres wrote:
>>>>
>>>> Ok, hearing no objections, then I propose we bake in the following
>>>> file extensions into the spec (we can debate which MIME types to use
>>>> after we settle on the extensions!):
>>>>
>>>> .html
>>>> .htm
>>>> .css
>>>> .gif
>>>> .jpeg
>>>> .png
>>>> .js
>>>> .json
>>>> .xml
>>>> .txt
>>>>
>>>> The following we should probably bake in too:
>>>> .mp3
>>>> .swf
>>>> .wav
>>>> .svg
>>>> .ico
>>>
>>> I'm not a big fan of "endorsing" mp3 given that it's a patented format
>>> that
>>> can't be implemented by FOSS software.
>>>
>>> / Jonas
>>>
>>
>> Ok, no MP3. maybe you can suggest some other suitable audio format?
>
> I'd stick with wav for now. Possibly also FLAC.

Ok, sounds good. I suppose flac has a .flac extension? (sorry, apart
from mp3 and wav, I don't know audio formats at all!)

--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Jonas Sicking-2

Marcos Caceres wrote:

> On Tue, Dec 2, 2008 at 12:22 AM, Jonas Sicking <[hidden email]> wrote:
>> Marcos Caceres wrote:
>>> On Mon, Dec 1, 2008 at 11:48 PM, Jonas Sicking <[hidden email]> wrote:
>>>> Marcos Caceres wrote:
>>>>> Ok, hearing no objections, then I propose we bake in the following
>>>>> file extensions into the spec (we can debate which MIME types to use
>>>>> after we settle on the extensions!):
>>>>>
>>>>> .html
>>>>> .htm
>>>>> .css
>>>>> .gif
>>>>> .jpeg
>>>>> .png
>>>>> .js
>>>>> .json
>>>>> .xml
>>>>> .txt
>>>>>
>>>>> The following we should probably bake in too:
>>>>> .mp3
>>>>> .swf
>>>>> .wav
>>>>> .svg
>>>>> .ico
>>>> I'm not a big fan of "endorsing" mp3 given that it's a patented format
>>>> that
>>>> can't be implemented by FOSS software.
>>>>
>>>> / Jonas
>>>>
>>> Ok, no MP3. maybe you can suggest some other suitable audio format?
>> I'd stick with wav for now. Possibly also FLAC.
>
> Ok, sounds good. I suppose flac has a .flac extension? (sorry, apart
> from mp3 and wav, I don't know audio formats at all!)

I'm mostly in the same boat :)

But it does look like .flac is the preferred extension.

/ Jonas

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Jere.Kapyaho


On 2.12.2008 3.05, "ext Jonas Sicking" <[hidden email]> wrote:

>
> Marcos Caceres wrote:
>> On Tue, Dec 2, 2008 at 12:22 AM, Jonas Sicking <[hidden email]> wrote:
>>> Marcos Caceres wrote:
>>>> On Mon, Dec 1, 2008 at 11:48 PM, Jonas Sicking <[hidden email]> wrote:
>>>>> Marcos Caceres wrote:
>>>>>> Ok, hearing no objections, then I propose we bake in the following
>>>>>> file extensions into the spec (we can debate which MIME types to use
>>>>>> after we settle on the extensions!):
>>>>>>
>>>>>> .html
>>>>>> .htm
>>>>>> .css
>>>>>> .gif
>>>>>> .jpeg
>>>>>> .png
>>>>>> .js
>>>>>> .json
>>>>>> .xml
>>>>>> .txt
>>>>>>
>>>>>> The following we should probably bake in too:
>>>>>> .mp3
>>>>>> .swf
>>>>>> .wav
>>>>>> .svg
>>>>>> .ico
>>>>> I'm not a big fan of "endorsing" mp3 given that it's a patented format
>>>>> that
>>>>> can't be implemented by FOSS software.
>>>>>
>>>>> / Jonas
>>>>>
>>>> Ok, no MP3. maybe you can suggest some other suitable audio format?
>>> I'd stick with wav for now. Possibly also FLAC.
>>
>> Ok, sounds good. I suppose flac has a .flac extension? (sorry, apart
>> from mp3 and wav, I don't know audio formats at all!)
>
> I'm mostly in the same boat :)
>
> But it does look like .flac is the preferred extension.
>
> / Jonas
>

Yes, it's .flac (or .fla in a pinch) for FLAC.

I was going to suggest adding .aac and .mp4, but if patented formats are
out, then I won't. However, isn't .swf equally patented, or has it been
liberated recently? (Or GIF? PDF?) It may not be feasible to discriminate
some file formats on that basis, especially if they do have a registered
MIME type.

I will nevertheless suggest the audio formats .ogg (open) and also .aiff.
And if patents shouldn't matter, also .mov, .wmv and .mp2.

But... maybe relying on file extensions is not the way to do it after all.
Extensions are not reliable, and not even mandatory on some systems. Since
the widget package is in a sense "sealed", a metadata list that connects
each filename inside a package to a MIME type would work with or without
file extensions. Or has this been discussed and dismissed already?

Example metadata list (format is simply <filename> WSP <mimetype>):

images/splashScreen.png image/png
music/themesong.flac audio/flac
favicon.ico image/vnd.microsoft.icon
main application/xhtml+xml

Note especially the last item, which has no file extension at all.
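
As a rough illustration of how an engine could read such a list, here is a
small Python sketch. The function name is made up, and splitting on the
last run of whitespace is my own assumption about how to handle file
names; nothing here is a defined format.

```python
def parse_metadata_list(text: str) -> dict:
    """Parse lines of '<filename> WSP <mimetype>' into {filename: mimetype}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace: the MIME type is the final token.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            raise ValueError("expected '<filename> <mimetype>': %r" % line)
        filename, mime = parts
        mapping[filename] = mime
    return mapping


SAMPLE = """\
images/splashScreen.png image/png
music/themesong.flac audio/flac
favicon.ico image/vnd.microsoft.icon
main application/xhtml+xml
"""

types = parse_metadata_list(SAMPLE)
print(types["main"])
```

The lookup for `main` works exactly like the others, since the file name
itself is the key and no extension is consulted.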

--Jere


Re: [widgets] Content-type sniffing and file extension to MIME mapping

timeless-3

i'd like to take this moment to vote against .fla as mapped to flac.

http://www.google.com/search?hl=en&q=filetype%3Afla

[FLASH]
<p align="left"><font face="Verdana" size="13" color="#000000 ...
File Format: Shockwave Flash
<p align="left"><font face="Verdana" size="13" color="#000000">Central
America</ font></p> <p align="left"><font face="Century Gothic"
size="13" ...
www.babewithabackpack.com/world_flash.fla - Similar pages - Note this

I'm told this is a flash creator format instead of the .swf format
we're used to, but it's clearly flash and not flac.

If someone decides to make a widget which does some form of flash
authoring (and hopefully adobe would move their platform to our
standard), then i don't want to make life more difficult for them.

note that flac isn't supported in the browsers i have today, so
standardizing it is like standardizing a mime type for mng.

if we must support a mime type for audio, let it be .wav and .wave,
those are fairly unambiguous (although codec support should be clearly
marked as not guaranteed)

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2
In reply to this post by Jere.Kapyaho

Hi Jere,

On Tue, Dec 2, 2008 at 3:19 PM, Jere Kapyaho <[hidden email]> wrote:
<snip>
> Yes, it's .flac (or .fla in a pinch) for FLAC.
>

Oh oh! .fla clashes with Adobe Flash files :(

> I was going to suggest to add .aac and .mp4, but if patented formats are
> out, then I won't. However, isn't .swf equally patented, or has it been
> liberated recently? (Or GIF? PDF?) It may not be feasible to discriminate
> some file formats on that basis, especially if they do have a registered
> MIME type.
>

I don't know if it should be based on patented formats being out. I
think it should be based on what we can show to be the core
technologies that make widgets usable and interoperable. If, for
instance, implementers are happy to foot the bill for technology foo
because it provides something that developers want and commonly use,
then we should probably include it.

> I will nevertheless suggest the audio formats .ogg (open) and also .aiff.
> And if patents shouldn't matter, also .mov, .wmv and .mp2.
>
> But... maybe relying on file extensions is not the way to do it after all.
> Extensions are not reliable, and not even mandatory on some systems. Since
> the widget package is in a sense "sealed", a metadata list that connects
> each filename inside a package to a MIME type would work with or without
> file extensions. Or has this been discussed and dismissed already?
>

We've talked about this informally for a while (mostly at F2Fs or in
IRC, but I don't think we have ever really discussed it via the public
list).

> Example metadata list (format is simply <filename> WSP <mimetype>):
>
> images/splashScreen.png image/png
> music/themesong.flac audio/flac
> favicon.ico image/vnd.microsoft.icon
> main application/xhtml+xml
>
> Note especially the last item, which has no file extension at all.

Personally, I don't think we should define a format that forces
developers to list every file just to cover the use case where a
file extension is missing. Also, I assume that some tool would be
needed to generate this metadata list, as I don't see any developer
ever doing this by hand because:

   1. a widget could contain hundreds of files.
   2. file names with spaces, and possibly other characters, would
need to be URL encoded.
   3. it would be tremendously error prone and hard to maintain.
   4. developers would wonder why this is not done automatically by
the widget engine, when they've never had to do it with any other
widget engine before.

If software must be created to derive the MIME types and generate the
metadata file, either through sniffing or through looking at the file
extension, then I think such a tool should just be part of the widget
engine. Note that such software has been created (see Linux's "file"
util), and Apache can map file extensions to MIME types [1] and, in
some cases, look at a file's contents to derive its MIME type [2].

If we were going to add a mimetype override file, I would argue we
should only do it based on file extensions.

I still believe the spec should:

  1. define the mappings from file extension to MIME type, which all
engines must use.
  2. if there is strong support from working group members for adding
a mimetype override format, include a default override file that all
widget engines are expected to use.

In the case of 2 above, I would _not_ want us to define yet another
XML format. I think we should just have a very simple text-based
format that looks like this (based loosely on Apache's AddType
directive):

text/html .php

The file could be called "mimetypes" or "mime.types" and sit at the
root of a widget package.
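
To make the idea concrete, here is a hedged Python sketch of how an engine
might read such a file and layer it over built-in mappings. The default
table, the function name, the support for several extensions per line, and
the "#" comment syntax are all assumptions of mine, borrowed loosely from
how Apache-style files look.

```python
# Hypothetical built-in mappings that every engine would ship with.
DEFAULT_MAPPINGS = {".html": "text/html", ".css": "text/css"}


def load_overrides(text, defaults=DEFAULT_MAPPINGS):
    """Return defaults plus the 'mimetype .ext [.ext ...]' lines in text."""
    mappings = dict(defaults)  # copy, so the defaults stay untouched
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # '#' comments are an assumption
        if not line:
            continue
        mime, *extensions = line.split()
        for ext in extensions:
            if not ext.startswith("."):
                ext = "." + ext
            # Later lines win, and any line wins over a built-in mapping.
            mappings[ext.lower()] = mime
    return mappings


print(load_overrides("text/html .php\n")[".php"])
```

Whether an override should be allowed to replace a built-in mapping (as
this sketch permits) or only add new extensions is an open design choice.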

Kind regards,
Marcos

[1] http://httpd.apache.org/docs/1.3/mod/mod_mime.html
[2] http://httpd.apache.org/docs/1.3/mod/mod_mime_magic.html
--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Boris Zbarsky
In reply to this post by Marcos Caceres-2

Marcos Caceres wrote:
> Ok, hearing no objections, then I propose we bake in the following
> file extensions into the spec (we can debate which MIME types to use
> after we settle on the extensions!):
...
> .css

That extension is used for both the text/css and application/x-pointplus
MIME types.  In fact it's mapped to the latter by default in some web
servers.

Then again, this is likely to be a common problem with any
extension-sniffing approach.

-Boris

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

On Tue, Dec 2, 2008 at 5:42 PM, Boris Zbarsky <[hidden email]> wrote:

> Marcos Caceres wrote:
>>
>> Ok, hearing no objections, then I propose we bake in the following
>> file extensions into the spec (we can debate which MIME types to use
>> after we settle on the extensions!):
>
> ...
>>
>> .css
>
> That extension is used for both the text/css and application/x-pointplus
> MIME types.  In fact it's mapped to the latter by default in some web
> servers.
>

That wouldn't be a problem in widgets, as we would say .css is always text/css.


--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Boris Zbarsky

Marcos Caceres wrote:
> That wouldn't be a problem in widgets, as we would say .css is always text/css.

My point is that this doesn't seem like a reasonable requirement,
necessarily.

-Boris


Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2

On Tue, Dec 2, 2008 at 6:09 PM, Boris Zbarsky <[hidden email]> wrote:
> Marcos Caceres wrote:
>>
>> That wouldn't be a problem in widgets, as we would say .css is always
>> text/css.
>
> My point is that this doesn't seem like a reasonable requirement,
> necessarily.
>

Do you have any suggestions as to how we might move forward? Or a
different approach to solving the problem?

--
Marcos Caceres
http://datadriven.com.au

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Boris Zbarsky

Marcos Caceres wrote:
> Do you have any suggestions as to how we might move forward? Or a
> different approach to solving the problem?

The problem being that a ZIP file doesn't know anything about the types
of files in it?

What Gecko does right now for jar: URIs is somewhat similar to what it
does for file: URIs.  Specifically:

1)  If the caller expects a particular type and indicates that, use
     that type.  For example, any jar: URI loaded from a
     <link rel="stylesheet" type="text/css"> will be treated as
     text/css.
2)  If the caller has no type expectation, look up type based on
     extension.  This doesn't use a particular hardcoded list but
     does a best-effort lookup based on a hardcoded override list
     in Gecko, the type+extension combinations the user's profile
     has seen and acted on before, the OS extension-to-type mappings,
     and another hardcoded list of extension-to-type hints (which
     differ from the overrides in not overriding the OS mappings).
3)  If step 2 did not find a type, use our standard unknown content
     sniffer (which sniffs based on various stuff including the URI,
     the data, etc).

None of this is great for interoperability, of course, but it's no worse
than anything that happens with file:// URIs.
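
For readers who prefer code, the three steps above can be paraphrased in
Python roughly as follows. This is not Gecko's code: the function and
parameter names are invented, the extension sources are plain dicts, and
consulting them in list order is my assumption about precedence.

```python
import os


def resolve_type(filename, data, expected_type=None,
                 extension_sources=(), sniffer=None):
    """Paraphrase of the three steps: caller hint, extension, sniffing."""
    # Step 1: the caller's declared expectation wins outright.
    if expected_type:
        return expected_type

    # Step 2: best-effort extension lookup across several sources,
    # e.g. override list, user profile, OS mappings, fallback hints.
    ext = os.path.splitext(filename)[1].lower()
    for source in extension_sources:
        mime = source.get(ext)
        if mime:
            return mime

    # Step 3: fall back to an unknown-content sniffer, if one is supplied.
    if sniffer is not None:
        return sniffer(filename, data)
    return "application/octet-stream"


overrides = {".css": "text/css"}
os_mappings = {".css": "application/x-pointplus", ".txt": "text/plain"}
print(resolve_type("style.css", b"", extension_sources=[overrides, os_mappings]))
```

Because the OS mappings differ between machines, step 2 is where
interoperability is lost, which is the concern raised above.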

I suppose you could in fact define a small set of extensions that would
interoperably be mapped to particular types in the context of widgets.
You could also define a particular file in the package that contains
extension-to-type mappings (either without the predefined mappings, or
able to add to and override the predefined mappings).  You'd need
something like this for sane extensibility anyway, in my opinion.

-Boris

Re: [widgets] Content-type sniffing and file extension to MIME mapping

Jere.Kapyaho
In reply to this post by Marcos Caceres-2

On 2.12.2008 18.29, "ext Marcos Caceres" <[hidden email]> wrote:
> On Tue, Dec 2, 2008 at 3:19 PM, Jere Kapyaho <[hidden email]> wrote:
> <snip>
>> Yes, it's .flac (or .fla in a pinch) for FLAC.
> Oh oh! .fla is a clash with Adobe flash files:(

Well, that would have been for truly legacy systems only. :) However, when
multiple file extensions map to the same MIME type, there could be other
conflicts like this.

> I personally, don't think we should define a format that forces
> developers to include every file just to cover the use case where a
> file extension is missing.

If the extension is missing, it could be on purpose. Or it could be there,
but it could be just plain wrong, or ambiguous (think .jpg vs. .jpeg, or
.htm vs. .html). The concept I envisioned is somewhat similar to the index
of a JAR file [1].

> Also, I assume that some tool would be
> needed to generate this metadata list, as I don't see any developer
> ever doing this by hand because:
>
>    1. a widget could contain hundreds of files.
>    2. file names with spaces, and possibly other characters, would
> need to be URL encoded.
>    3. it would be tremendously error prone and hard to maintain.
>    4. developers would wonder why this is not done automatically by
> the widget engine, when they've never had to do it with any other
> widget engine before.

If a widget has hundreds of files, nobody would try to do it by hand anyway
(point #1). If the filename is UTF-8 and defined as a relative URI inside
the package, it will have to be UTF-8-ified and URL-encoded (point #2). I
guess I envisioned a tool doing the assembly anyway (point #3).
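
As a small illustration of what such a tool would do for point #2, Python's
standard `urllib.parse.quote` already performs the UTF-8 encoding and
percent-escaping in one step. The helper names and the example paths are
made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_package_path(path: str) -> str:
    """Percent-encode a package-relative path, keeping '/' as a separator."""
    # quote() encodes non-ASCII characters as UTF-8 bytes by default.
    return quote(path, safe="/")


def decode_package_path(encoded: str) -> str:
    """Reverse of encode_package_path."""
    return unquote(encoded)


print(encode_package_path("images/splash screen.png"))
# images/splash%20screen.png
```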

> If we were going to add a mimetype override file, I would argue we
> should only do it based on file extensions.

Note that I'm not pushing the method I described as *the* solution, but to
me only point #4 of those above is critical. File extensions are by nature
unreliable and ambiguous, but very commonly used as a way (or even the only
way) of recognizing content. A more immediate problem in terms of the spec
is that you will need to come up with all the 'important' file extensions up
front, and the list will need to be updated later, perhaps frequently,
depending on how exhaustive the initial list was.

But the Apache style extension to MIME type mapping probably works
adequately in this context also.

[1] http://java.sun.com/j2se/1.3/docs/guide/jar/jar.html#JAR%20Index

--Jere


Re: [widgets] Content-type sniffing and file extension to MIME mapping

Marcos Caceres-2
In reply to this post by bilcorry

Hi Bil,
Sorry, I accidentally skipped over your email.

On Sun, Nov 30, 2008 at 5:44 AM, Bil Corry <[hidden email]> wrote:

>
> Marcos Caceres wrote on 11/29/2008 9:39 AM:
>> I had a discussion with Henri Sivonen and a few other people in the
>> HTML-WG about using HTML5's content-type sniffing as a way of deriving
>> the MIME type of files inside a widget package. Henri suggested that
>> we should primarily rely on file extensions as a way of mapping files
>> to MIME types. Although relying on extensions can be potentially
>> unreliable, it seems like a simple solution to a complicated problem.
>
> Content-sniffing can pose it's own problems, here's one example:
>
>        http://www.gnucitizen.org/blog/backdooring-images/
>

I see.

>
>> For the spec, I guess  it would mean including a table of file
>> extension to MIME type mappings into the spec for common IANA
>> registered types (MIME type registrations list file extensions).
>
> The Apache (httpd) project includes a file called "mime.types" that maps file extensions to MIME types.  I haven't seen anything more extensive than Apache's.
>
>
>> As a
>> second line of defense, if there is no file extension, or the file
>> extension does not map to the file extension to MIME table, then HTML
>> content-type sniffing heuristics can be used.
>
> This paper describes how the major browsers do it:
>
>        http://www.leviathansecurity.com/pdf/Flirting%20with%20MIME%20Types.pdf
>
> Firefox specifically appears to do it the way you're proposing here.

Thanks for this resource, it was quite useful!


--
Marcos Caceres
http://datadriven.com.au
