The <iframe> element and sandboxing ideas

Ian Hickson


Summary:

 * I've added a sandbox="" attribute to <iframe>, which by default
   disables a number of features and takes a space-separated list of
   features to re-enable:

     - by default, content in sandboxed browsing contexts, and any
       browsing contexts nested in them, have a unique origin
       (independent of the origin of their URI); this can be overridden
       using the "allow-same-origin" keyword

     - by default, all form controls in those browsing contexts are
       disabled; this can be overridden using the "allow-forms"
       keyword

     - by default, script in those browsing contexts cannot run; this can
       be overridden using the "allow-scripts" keyword

     - content in those browsing contexts cannot navigate other
       browsing contexts outside of the sandbox (seamless="" below
       overrides this)

     - content in those browsing contexts cannot create new browsing
       contexts or open modal dialogs or alerts

     - all plugins in those browsing contexts are disabled

 * I've added a seamless="" boolean attribute to <iframe>, which, if
   the content's active document's URI has the same origin as the
   container, causes the iframe to size vertically to the bounding box
   of the contents, and horizontally to the width of the container,
   and which causes the initial containing block of the contents to be
   treated as zero height. In addition, styles on the root element of
   the content must inherit from the <iframe> instead of being the
   initial values, and the style sheets that apply to the <iframe>
   must also apply to the contents. In addition, any time the browsing
   context navigates itself, the parent browsing context gets
   navigated instead.
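
As a rough illustration of how the sandbox="" value described above could be read, here is a minimal sketch. The function name parseSandbox and the shape of the returned object are hypothetical, and the handling of unknown tokens (ignored) and of letter case (exact match) are assumptions rather than anything stated in the proposal.

```javascript
// Hypothetical helper: turn a sandbox="" attribute value into feature flags.
// Pass null when the attribute is absent, meaning nothing is restricted.
function parseSandbox(value) {
  if (value === null) {
    return { sandboxed: false, sameOrigin: true, forms: true, scripts: true };
  }
  // The value is a space-separated list; empty strings from leading,
  // trailing, or repeated whitespace are dropped. Unknown tokens are
  // simply ignored in this sketch (an assumption).
  const tokens = new Set(value.split(/\s+/).filter((token) => token !== ""));
  return {
    sandboxed: true,
    // Everything is disabled by default; each keyword re-enables one feature.
    sameOrigin: tokens.has("allow-same-origin"),
    forms: tokens.has("allow-forms"),
    scripts: tokens.has("allow-scripts"),
  };
}
```

With this sketch, sandbox="" (the empty value) yields the most restrictive state, and sandbox="allow-scripts allow-forms" re-enables scripts and forms while keeping the unique origin.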

This is all HIGHLY EXPERIMENTAL. I am looking for feedback on the general
approaches taken.

There are various things that this doesn't address yet; e.g. there's no
way to force (or even allow) a non-seamless iframe to open links in the
parent window.


On Thu, 9 Mar 2006, Alexey Feldgendler wrote:

>
> Let's imagine a blogging website that allows anybody to create a blog
> which is available as http://www.example.com/blogs/username/. Many such
> sites allow various user customization, so imagine this site lets the
> blog owner supply custom HTML to display on top of the blog page.
> This is primarily used by blog authors to design stylish navigation. To
> make such navigation menus more attractive, the authors wish to use
> JavaScript and Flash, but unrestricted JavaScript would make it possible
> for the blog owner to steal visitors' session cookies.
>
> The blog author logs in and opens some kind of customization screen:
>
> HTML to display on top of your blog: [TEXTAREA]
> [SUBMIT]
>
> So, imagine the blog author enters into the textarea:
>
> Welcome to my blog!</sandbox><a href="#"
> onclick="alert(document.cookie)">Click here</a>
>
> After submission, this code is fed to the HTML cleaner. At present, HTML
> cleaners are usually complicated scripts which try to catch known quirks
> of the user agents, and still they usually have security holes found one
> after another. See for example
> http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl.
> With HTML 5 parsing spec, there will be one single algorithm for parsing
> HTML code with well-defined error recovery. So, the HTML cleaner at the
> server side runs the HTML 5 parser on the user-supplied text, which
> produces the following DOM:
>
> * Welcome to my blog!
> * A
>     href="#"
>     onclick="alert(document.cookie)"
>   * Click here
>
> The </sandbox> tag is ignored as an easy parse error because there is no
> matching <sandbox> tag in the user-supplied text. After parsing, the
> HTML cleaner iterates through the tree, renaming potentially unsafe
> elements and attributes, producing the following:
>
> * Welcome to my blog!
> * A
>     href="#"
>     safe-onclick="alert(document.cookie)"
>   * Click here
>
> At the final stage, the HTML cleaner re-serializes the DOM into the
> following code, which is saved into the database:
>
> Welcome to my blog!<a href="#"
> safe-onclick="alert(document.cookie)">Click here</a>
>
> When the site renders the blog page, it puts the "HTML for page top"
> inside a sandbox:
>
> <body>
> <sandbox>
> Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click
> here</a>
> </sandbox>
> ...
> </body>
>
> Each blog entry is probably also contained in its own sandbox. This is
> even more important on the so-called friends pages, where entries by
> different authors are displayed on the same page.
>
> When the page is rendered in a modern user agent which supports
> sandboxing, the safe-onclick attribute is interpreted exactly the same
> as onclick. When the user clicks the link, the event handler is
> executed. Because the code is inside the sandbox, it operates on a fake
> document object, so it doesn't retrieve the cookies (I think
> document.cookie should just return an empty string). The visitor's
> session cookies are safe.
>
> When the page is rendered in an older user agent which doesn't support
> sandboxing, the safe-onclick attribute is ignored because it is unknown.
> When the user clicks the link, no event handler is executed, and the
> cookies are safe again.

You can do this now (though it's far uglier) by taking the author's markup
and converting it to base64, and then stuffing it into an iframe something
like this:

   <iframe seamless sandbox="allow-scripts allow-forms"
           src="data:text/html;base64,PCFET0NUWVBFIEhUTUw%2BPHRpdGxlPjwvdGl0bGU%2BV2VsY29tZSB0byBteSBibG9nITwvc2FuZGJveD48YSBocmVmPSIjIiBvbmNsaWNrPSJhbGVydChkb2N1bWVudC5jb29raWUpIj5DbGljayBoZXJlPC9hPg0K">
   </iframe>

This isn't very readable, I'll grant you. I'm thinking of introducing a
new attribute. I haven't worked out what to call it yet, but definitely
not "src", "source", "src2", "content", "value", or "data" -- maybe
"html" or "doc", though neither of those is great. This attribute would
take a string which would then be interpreted as the source document
markup of an HTML document, much like the above; it would override src=""
if it was present, allowing src="" to be used for legacy UAs:

   <iframe seamless sandbox="allow-scripts allow-forms" doc="
     <!DOCTYPE HTML>
     <title></title>
     Welcome to my blog!
     </sandbox>
     <a href='#' onclick='alert(document.cookie)'>Click here</a>
   "></iframe>

(There are things we can do to make this better, e.g. make the <!DOCTYPE
HTML> and <title></title> bits implicit, maybe introducing type="" to say
whether it's HTML or XML instead of only supporting HTML, maybe saying
that if src="" and doc="" are both specified they must have identical
data, etc.)

Comments and suggestions on this are welcome. I haven't added it to the
spec yet. I do agree that without this or something equivalent we
don't have a solution for sandboxing embedded blog comments yet.


On Mon, 23 Apr 2007, Jonas Sicking wrote:
>
> The idea is basically an element like <iframe> but that renders the
> linked page, instead of inside a square area, in flow with the main
> page. This idea is really rough still, but I hope to try to implement it
> in a not too distant future to solidify it a bit. One thing very much up
> in the air is what the element would be called. Suggestions welcome, but
> I'm using the name <include> below.

I've basically added this to <iframe> using the seamless="" attribute.


> Should the stylesheets of the outer or the inner document be used?

I went with "yes".


> When a fragment identifier is specified, should we render that element,
> or its children?

I went with making that work the same as with normal <iframe>s (so likely
no effect if the default shrink-wrapping-to-bounding-box behaviour is in
effect).


> Should style be inherited from the parent of the <include>, or from the
> DOM parent in the inner document?

I've made inheritance happen from <iframe> to root element.


> Should the inner DOM be rendered inside of, or in place of the <include>?

I've made this happen as with <iframe>.


On Mon, 23 Apr 2007, Gervase Markham wrote:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=80713

I've taken the notes there into account.


On Mon, 23 Apr 2007, Jonas Sicking wrote:
>
> There's a big difference between that and what I'm proposing. With what's
> in bug 80713 you're still limited to a box that basically doesn't take
> part in the outer page at all. For example, in the table example in my
> original post the headers of the table would not resize to fit the
> column sizes in the <include>ed table.

Woah. That's far more radical. I have no idea how to do that. How would
you make the parser not generate the implied elements and switch straight
to the "in table" mode? How would you make the CSS model work with this?
How would you define conformance for the document fragments?


On Thu, 26 Apr 2007, Martin Atkins wrote:
>
> Would documents included via <include> run in the security context of
> the including page, as with the script technique, or would they run in
> the context of the included document, as with iframes?

The sandbox="" attribute can be specified to change it from the former to
the latter (and in fact, from the former to an isolated origin regardless
of the true origin of the document).


On Fri, 27 Apr 2007, Jonas Sicking wrote:
>
> They would run in the context of the included page, just like an iframe.
> The processing of <include> is exactly that of <iframe> the only
> difference is in the rendering.

It may be worth bringing this up with the CSSWG if it really is just a
rendering issue.


On Tue, 8 May 2007, Dean Edwards wrote:
>
> XBL has an attribute to cover inherited styles, so you're right.
> Realistically, I can't see Microsoft ever implementing XBL (I hope I'm
> wrong). So adding it to HTML might be the only way to achieve this
> functionality.

Inventing a new technology that does the same as another on the basis that
the UAs will implement one but not the other seems dubious at best.


> Kind of like an <iframe> but without an external source.

Would the doc="" proposal above be enough?


On Tue, 8 May 2007, Henri Sivonen wrote:
>
> I wonder if this issue could be solved on the layout/CSS level by
> providing a way to make the height of an iframe depend on the actual
> height of the root element of the document loaded in the iframe. That
> is, would it be feasible to make the iframe contents have the layout/UI
> feel of a part of the parent page while keeping the DOMs and script
> security contexts separate?

That's pretty much what seamless="" does, yes.


On Tue, 8 May 2007, Jon Barnett wrote:

>
> http://www.w3.org/TR/css3-box/#intrinsic0 (and also CSS2 10.6)
> Since CSS doesn't attempt to specify the intrinsic width of a document in an
> iframe, maybe HTML5 should specify that the intrinsic width of a document
> is:
> - if the CSS width property is specified on the html element, the margin-box
> of the page at that width (which may have overflow)
> - else, if the CSS min-width property is specified on the html element, the
> margin-box of the page at that width (which may have overflow)
> - else, the smallest width the page can have without horizontal scrolling
> and the intrinsic height of the document is:
> - if the CSS height or min-height property are set, similar to above,
> - else, the smallest height the page can have at the intrinsic width of the
> document without vertical scrolling

That seems overly complicated, but the spec says something similar in
fewer words.
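
The quoted fallback rules can be read as a short decision procedure. The sketch below is only an illustration of that reading: the function names, the style object, and the measureHeightAt callback are hypothetical, sizes are plain numbers in CSS pixels, and the margin-box detail from the quoted text is left out for brevity.

```javascript
// Hypothetical inputs: `style` holds the root element's specified sizes
// (null when a property is not specified). `minNoScrollWidth` is the
// smallest width that avoids horizontal scrolling, which only a layout
// engine could actually measure.
function intrinsicWidth(style, minNoScrollWidth) {
  if (style.width !== null) return style.width;       // rule 1: width is set
  if (style.minWidth !== null) return style.minWidth; // rule 2: min-width is set
  return minNoScrollWidth;                            // rule 3: narrowest fit
}

// `measureHeightAt(width)` stands in for the layout engine: it returns the
// smallest height that avoids vertical scrolling at the given width.
function intrinsicHeight(style, width, measureHeightAt) {
  if (style.height !== null) return style.height;
  if (style.minHeight !== null) return style.minHeight;
  return measureHeightAt(width);
}
```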


On Thu, 10 May 2007, Magnus Gasslander wrote:

>
> I see you have done some work to prevent reflow loops with percentage
> root heights > 100%, but how does your patch handle an iframe document
> that looks like this? (I can think of nastier testcases also, where
> "bottom"  is embedded further down in the document)
>
> <html>
> <head>
> </head>
> <body>
> <div style="position:absolute;bottom:-5px;">This will force a scrollbar on the
> document</div>
> </body>
> </html>

As far as I can tell, the spec handles this fine.


On Mon, 14 May 2007, Michel Fortin wrote:
>
> What about encoding the content of each comment iframe in a "data:" URI?

That unfortunately isn't compatible with IE, and has rather unfortunate
non-trivial escaping requirements.
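
To make the escaping requirement concrete, here is a minimal sketch of building such a data: URI on the server, assuming Node.js (for Buffer). The helper names toDataUri and sandboxedIframe are hypothetical. It follows the form of the earlier example in this message, where the base64 text is percent-encoded so that a "+" appears as "%2B".

```javascript
// Hypothetical helper: wrap author-supplied markup in a base64 data: URI.
function toDataUri(markup) {
  // Encode the markup as UTF-8 bytes, then as base64. Note that without a
  // charset parameter in the URI, non-ASCII content may be misread.
  const base64 = Buffer.from(markup, "utf8").toString("base64");
  // base64 output can contain "+", "/" and "=", so percent-encode it
  // before placing it in the URI.
  return "data:text/html;base64," + encodeURIComponent(base64);
}

// Hypothetical helper: emit the iframe markup around that URI. The encoded
// URI contains only letters, digits and "%", so it cannot close the
// attribute early.
function sandboxedIframe(markup) {
  return '<iframe seamless sandbox="allow-scripts allow-forms" src="' +
    toDataUri(markup) + '"></iframe>';
}
```

The two encoding layers (base64, then percent-encoding) are what make the result unreadable by humans, but they also mean no character of the author's markup survives in a form that the outer page's parser could act on.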


On Mon, 14 May 2007, Jon Barnett wrote:
>
> The contents of an iframe with a data: URI source should be trusted,
> unlike an iframe with an http: URI source from another domain.  A script
> in an iframe with a data: URI source should, by default, be able to
> communicate with the parent window.  So, that alone doesn't solve the
> problem.

Adding sandbox="" solves this (at least for new UAs).


On Mon, 14 May 2007, Alexey Feldgendler wrote:
>
> Not to mention that data: URIs are ugly, wasteful (because of the BASE64
> encoding), cannot be read and written by humans directly, and have
> maximum length problems in some implementations.

Right.


On Mon, 14 May 2007, Alexey Feldgendler wrote:
>
> Yes, I want the sandbox to degrade securely, as does any webmaster who
> might be going to allow some user-supplied scripting while relying on
> sandboxing for security. To cover its use cases, this feature must
> degrade securely.

Degrade securely _and usefully_, or just securely (and maybe to nothing)?

The latter is handled by the doc="" proposal. The former may be impossible
without server-side filtering.


> This does degrade securely, doesn't require separate HTTP requests, and
> maintains human readability.
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-December/005301.html

This still requires server-side filtering, though.


On Mon, 14 May 2007, Michel Fortin wrote:
>
> People are already struggling to remove all scripts from HTML snippets.
> I don't think finding all these occurrences and replacing them is going
> to be much better. Also, you'd need safe-style="" and <safe-style> too,
> since IE can embed JavaScript expressions into style rules. (And now
> let's hope IE does not allow expressions elsewhere.)

Indeed.
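
For reference, the renaming pass under discussion (unsafe names get a safe- prefix) can be sketched as a walk over an already-parsed tree. Everything here is an assumption made for illustration: the node shape ({ name, attributes, children }, with text nodes as plain strings), the function names, and especially the list of what counts as unsafe, which covers only on* attributes, style, and the script and style elements. The point made above is precisely that such a list is hard to get complete.

```javascript
// Attributes treated as unsafe in this sketch: event handlers (on*) and
// style. This list is illustrative, not complete.
function isUnsafeAttribute(name) {
  const lower = name.toLowerCase();
  return lower.startsWith("on") || lower === "style";
}

// Elements treated as unsafe in this sketch (also illustrative only).
const UNSAFE_ELEMENTS = ["script", "style"];

// Hypothetical cleaner pass: returns a renamed copy of the tree and leaves
// the input untouched.
function renameUnsafe(node) {
  if (typeof node === "string") return node; // text nodes pass through
  const attributes = {};
  for (const [key, value] of Object.entries(node.attributes || {})) {
    attributes[isUnsafeAttribute(key) ? "safe-" + key : key] = value;
  }
  const unsafe = UNSAFE_ELEMENTS.includes(node.name.toLowerCase());
  return {
    name: unsafe ? "safe-" + node.name : node.name,
    attributes,
    children: (node.children || []).map(renameUnsafe),
  };
}
```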


> This principle could be transposed to <sandbox>, where it could be
> defined as taking the unsafe HTML content from an attribute. And the
> best part: you don't need anything else like the safe-* substitutions as
> suggested earlier for <sandbox>:
>
>     <sandbox type="text/html" content="
>       &lt;p&gt;&quot;Unsafe&quot; content here:&lt;/p&gt;
>       &lt;script&gt;
>         document.write(window.parent.location)
>       &lt;/script&gt;
>     ">
>        Alternative, possibly degraded but safe content for older browsers.
>     </sandbox>

I think we'd want to use <iframe> for this, but otherwise, yes.


On Tue, 15 May 2007, Gervase Markham wrote:
>
> Would you really want separate security contexts for each comment?

On Tue, 15 May 2007, Alexey Feldgendler wrote:
>
> I wouldn't want to allow people to screw up others' comments, making it
> look that other users wrote what they didn't write. So, yes, it's
> important that any code within a comment cannot change anything but
> itself. This also means that the comment should be unable to change the
> header/footer around it to pretend that someone else wrote it.

Documents per comment are expensive, but they do seem to be what we need
(or maybe want) here.


On Tue, 15 May 2007, Kristof Zelechovski wrote:
>
> The OP probably meant that maintaining so many contexts would cause a
> comparable deterioration in performance.  All user comments should be
> put in one security context.
>
> With all comments grouped together in such a manner, you could even use
> an inline frame.

While simple, this wouldn't let you do things like have trusted content
interleaved with comments (e.g. "edit" and "reply" links), which is
common.


On Tue, 15 May 2007, Jon Barnett wrote:
>
> I really think comments are a bad use case.  Why would someone allow
> scripts in comments in any context, much less a sandboxed one?

You wouldn't, but you would want to prevent scripts from running
altogether.


> The best use case I have thought of so far is MySpace et al., a site
> where users have their own page with limited permission in the context
> of the overall site.  MySpace solves this by not allowing scripts at
> all, as most such web sites do.  If possible, such sites might allow a
> user to insert widget scripts with limited permissions.  For this use
> case, iframe isn't ideal, either, but limited scripting and styling are
> desired.

Would the spec's current proposals work?


On Wed, 9 May 2007, Alexey Feldgendler wrote:

> On Tue, 08 May 2007 05:50:38 +0200, Ian Hickson <[hidden email]> wrote:
> >
> > This probably depends on the use cases in question. For some use
> > cases, the status quo is in fact the script running with full
> > privileges, so while not being ideal, it is indeed acceptable; in
> > other cases, you wouldn't want scripts to run at all if they weren't
> > limited in some way.
>
> A security feature, by definition, protects the users from a certain
> class of attacks. An attack needs to be only successful in one browser
> to do harm. For example, a malicious advertising script which actually
> steals passwords entered by users on the host page is dangerous enough
> even if the attacker only succeeds in stealing passwords of just a
> fraction of the users.
>
> I can't really imagine a scenario in which sandbox restrictions could be
> somehow considered optional. Wherever there is need for such
> restrictions, it's unacceptable to run the script without them
> implemented.

In some cases the sandbox would be "defence in depth" -- for example, in
all cases where user-generated content is embedded today.


> The key differences from <iframe> are:
>
> 1. Doesn't require loading of a separate document via a separate HTTP
> request, and without the ugliness of data: URIs. If there was some
> "inline" version of <iframe>, such as <iframe>content</iframe>, that
> would be just fine.

doc="" would handle this, then...


> 2. Implements the security barrier even though the inner content doesn't
> come from a different domain. <iframe> would require a separate domain
> for that.

sandbox="" does this now.


> 3. The security barrier is asymmetric, i.e. the outer scripts have
> access to the inner content, but not the other way round.

What's the use case for this?


> All attempts to treat user-submitted HTML as a string are doomed to
> having such vulnerabilities. <sandbox> alone doesn't add much to this
> problem. Just look at how complex is the HTML sanitizer in LiveJournal
> which allows some user-submitted markup but not all.

That's one advantage of the doc="" idea; it makes sanitising mostly
trivial compared to all other ideas for this.
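
A minimal sketch of what that sanitising step might look like, assuming the attribute is written with double quotes. The helper names are hypothetical, and doc is only the placeholder attribute name from the proposal above; it is not in the spec.

```javascript
// Hypothetical helper: escape text for a double-quoted HTML attribute value.
function escapeAttribute(text) {
  // "&" must be replaced first so the entities produced by the second
  // replacement are not themselves escaped again.
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: emit an iframe carrying the author's markup in the
// proposed doc="" attribute.
function docIframe(markup) {
  return '<iframe seamless sandbox="allow-scripts allow-forms" doc="' +
    escapeAttribute(markup) + '"></iframe>';
}
```

Only two characters need attention because, inside a double-quoted attribute value, only a double quote can end the value and only an ampersand can start a character reference.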


On Thu, 10 May 2007, Gervase Markham wrote:
>
> If attributes on closing tags were allowed, you could do:
>
> <sandbox secret="09f9...">Hello World</sandbox secret="09F9...">
>
> In other words, make them match. So any inserted </sandbox> tags
> wouldn't close the sandbox unless they knew the secret - which they
> couldn't do, because they have the chicken-and-egg problem of having to
> be able to read the page first.

This relies on the author being able to reliably produce unpredictable
content, which is a very dubious responsibility to put on many authors.

Also, it would make the XML guys have a fit. Then again, maybe that goes
in the "pro" column and not the "con" column...


> http://www.gerv.net/security/content-restrictions/ , specifically the
> "hierarchy" restriction, allows the <iframe> content to be isolated from
> the parent.

It's not entirely clear what the proposal here is; as far as I can tell
it's an HTTP header. Is that right? Self-describing the security
restrictions on content works for same-site serving, but not really for
third-party content.


> IE has the proprietary "security" attribute on <iframe> which restricts
> script in various ways:
> http://msdn2.microsoft.com/en-us/library/ms534622.aspx

I tried using this, but it was tied too closely to IE's own security
concepts to really make use of it, sadly.


On Thu, 15 May 2008, Henri Sivonen wrote:
> >
> > Documents don't have intrinsic dimensions, and the user's default font
> > size is likely to vary from user to user. How would you know what
> > height and width to give?
>
> You give it the dimensions of an industry-standard ad banner size.

On Fri, 16 May 2008, James Justin Harrell wrote:
>
> The same way you would know what height and width to give to a
> non-replaced element. Why should an embedded document not be able to
> render as if the contents of the document were present inline in the
> parent document? Backwards compatibility should probably trump better
> behavior here, but why is it not possible to specify this through CSS?
>
> I've heard of this problem multiple times. For example,
> http://weblogs.mozillazine.org/gerv/archives/2005/02/autosizing_ifra.html

I've added height/width back.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: The <iframe> element and sandboxing ideas

Andrew Fedoniouk

Ian Hickson wrote:
>
> Summary:
>
>  * I've added a sandbox="" attribute to <iframe>, which by default
>    disables a number of features and takes a space-separated list of
>    features to re-enable:
>
...

Makes sense, Ian.

In addition to this, what about adding a <meta> tag that disables or
limits features of the page if it is running inside <frame> or <iframe>?

Say something like this:

<html>
   <head>
     <meta name="allowed-context" value="standalone-only" />
   </head>
   ...
</html>

That may prevent some types of malicious uses.

--
Andrew Fedoniouk.

http://terrainformatica.com


Re: The <iframe> element and sandboxing ideas

Martin Atkins-2
In reply to this post by Ian Hickson

Ian Hickson wrote:
> Summary:
>
>  * I've added a sandbox="" attribute to <iframe>, which by default
>    disables a number of features and takes a space-separated list of
>    features to re-enable:
>
[snip list]

Unless I'm missing something, this attribute is useless in practice
because legacy browsers will not impose the restrictions. This means
that as long as legacy browsers exist (i.e. forever) server-side
filtering must still be employed to duplicate the effects of the sandbox.

One alternative would be to use a different element name so that
fallback content can be provided for legacy browsers. In the short term,
this is likely to be something like this:

<sandbox src="/comments/blah">
<iframe src="/comments/blah?do-security-filtering=1"></iframe>
</sandbox>

Once a large percentage of browsers support <sandbox> authors can start
to be less accommodating with their fallback content, either by
filtering out HTML tags entirely (which I'd assume is easier than just
filtering out script) or at the extreme just setting the fallback
content to be "Your browser is not supported".

This comment does not address "seamless", which seems to be orthogonal
and can thus be equally applied to both sandbox and iframe as currently
specified.



Re: The <iframe> element and sandboxing ideas

Jon Ferraiolo-2

FYI - We have had some discussion in and around the topic of better iframes at OpenAjax Alliance:

http://www.openajax.org/runtime/wiki/Better_IFrames_Better_Sandboxing

However, people see shortcomings with all proposals listed on that page. Our hope was that the HTML5 leaders would figure out a good approach, so I am glad to see that Ian has started discussion on this topic.

Regarding Martin's comments, I think it is a correct objective to find a bridge between what exists with today's browsers and what we hope will exist in future browsers. The Ajax community usually needs to get the desired result in both legacy browsers and new browsers.

If you need to sandbox in today's browsers, the community tends to use one of two approaches: (1) require that sandboxed components be expressed in a restricted subset of HTML and/or JavaScript, such as Caja or ADSafe or the markup restrictions for portlets, or (2) place the sandboxed components into an IFRAME whose domain or subdomain differs from everything else on the page (i.e., leveraging the browser same-domain policy to achieve sandboxing). The problem with approach #1 is that some functionality (potentially critical) is disabled and developers have to in effect learn a new language. The problem with approach #2 is that isolation is an all-or-nothing proposition and there are shortcomings with IFRAMEs, such as lack of automatic content sizing. Ian's proposal addresses these IFRAME shortcomings directly, which is great.

If I had time to think extensively about this issue (which I don't), I would attempt to craft a proposal that used an approach where an Ajax library performed the mapping between what exists today (i.e., IFRAME) and what would exist in the future, where Ajax libraries could be eliminated once older browsers were put out of commission. My initial thought would be to put a 'sandbox' attribute on a DIV rather than on an IFRAME. That way, you end up with more powerful sandboxing, along the lines of what Doug Crockford proposed with his <module> tag. Newer browsers would deliver the sandboxing features that Ian is proposing. For older browsers, someone could author an Ajax library that looked for DIV elements with a 'sandbox' attribute, and under the hood transformed the DOM such that it achieved sandboxing via IFRAMEs and implemented the flexibility that Ian describes in his proposal via typical ugly Ajax hacks, such as passing messages via postMessage (or the even uglier fragment identifier approach).

Jon





RE: [whatwg] The <iframe> element and sandboxing ideas

Bugzilla from giecrilj@stegny.2a.pl
In reply to this post by Martin Atkins-2

Legacy browsers will use @SRC which must be filtered.  They will ignore the
new content (whatever the attribute name will be) altogether so it need not
be filtered. Fallback @SRC can contain a URL to an error page saying "Sorry,
not in your browser".
Chris

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Martin Atkins
Sent: Thursday, May 22, 2008 2:21 PM
To: Ian Hickson
Cc: [hidden email]; whatwg; HTMLWG
Subject: Re: [whatwg] The <iframe> element and sandboxing ideas

Ian Hickson wrote:
> Summary:
>
>  * I've added a sandbox="" attribute to <iframe>, which by default
>    disables a number of features and takes a space-separated list of
>    features to re-enable:
>
[snip list]

Unless I'm missing something, this attribute is useless in practice
because legacy browsers will not impose the restrictions. This means
that as long as legacy browsers exist (i.e. forever) server-side
filtering must still be employed to duplicate the effects of the sandbox.






Re: The <iframe> element and sandboxing ideas

Boris Zbarsky
In reply to this post by Ian Hickson

Ian Hickson wrote:
>      - by default, content in sandboxed browsing contexts, and any
>        browsing contexts nested in them

How do those nested browsing contexts come about, given that later you say:

 >     - content in those browsing contexts cannot create new browsing
 >       contexts or open modal dialogs or alerts

?

>        have a unique origin
>        (independent of the origin of their URI); this can be overridden
>        using the "allow-same-origin" keyword

So the parent page cannot script the contents of the iframe by default, right?

>      - by default, script in those browsing contexts cannot run; this can
>        be overridden using the "allow-scripts" keyword

What happens if the parent page sets window.location to a javascript: URI on the
sandbox iframe?  Does the script run?  If so, in which browsing context?

>    causes the iframe to size vertically to the bounding box
>    of the contents, and horizontally to the width of the container

I assume that the bounding box is computed after setting the width?

By "the width of the container" do you mean that the iframe computed width
should be equal to its containing block's computed width?  Or that the
display:block non-replaced width algorithm from CSS should be used?

>    and which causes the initial containing block of the contents to be
>    treated as zero height.

So percentage heights would end up being 0, while the iframe would be whatever
height is needed if one assumes they're auto?

>    and the style sheets that apply to the <iframe>
>    must also apply to the contents.

But the ' ' and '>' combinators don't cross the iframe boundary, right?

> This is all HIGHLY EXPERIMENTAL. I am looking for feedback on the general
> approaches taken.

As someone else pointed out, this doesn't seem like it would be usable without
some UA sniffing or something, as things stand.

> There are various things that this doesn't address yet; e.g. there's no
> way to force (or even allow) a non-seamless iframe to open links in the
> parent window.

This could be an @sandbox keyword value.

> This attribute would
> take a string which would then be interpreted as the source document
> markup of an HTML document, much like the above

This seems very prone to security issues (injection of the closing quote in the
content) to me...  The base64 approach is nice in that you can't shoot yourself
in the foot with it.

-Boris


RE: [whatwg] The <iframe> element and sandboxing ideas

Bugzilla from giecrilj@stegny.2a.pl

1. Nested browsing contexts in a sandboxed frame cannot be created
dynamically but they can be defined by the inner markup.
2. If the frame is not allowed to execute scripts, setting location to
script should have no effect.
3. There is a potential discrepancy between applying the parent's width, which is
characteristic of block-level elements, and the declared element level, in
that the level of a frame would depend on an attribute.  This is unprecedented:
the elements in HTML have a fixed level by design.  Introducing a new
element should be reconsidered in view of that IMHO.
4. Percentage in height scales to the container's height, not to the initial
dimensions of the current element.  It is an error if the container's height
is left implicit or if the sum of percentages exceeds 100%.
5. The argument against SANDBOX is that the user could inject /SANDBOX.  The
argument against the code attribute is that the user could inject a quote.
Aren't these similar enough to reconsider SANDBOX?
It seems it is easier to sanitize quotes because the burden of quoting is on
the user.
Compare '<SANDBOX ><SANDBOX ></SANDBOX ></SANDBOX >' to '<SPAN
TITLE="&quot;" ></SPAN >' that must be converted to '"<SPAN
TITLE=&quot;&amp;quot;&quot; ></SPAN >'.  The quoting required seems
straightforward.  I agree that using a data URL is simpler and cannot be
viewed as an obstacle to productivity since the author's text must be
processed anyway, so why not just encode it?  And it is more consistent with
contemporary technology.
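
For what it is worth, the quoting step in point 5 can be sketched with Python's standard html.escape, assuming the markup ends up inside a double-quoted attribute value. The attribute name "code" is only a placeholder borrowed from the wording above, and quote_for_attribute is a hypothetical helper.

```python
import html


def quote_for_attribute(markup: str) -> str:
    """Escape markup so it can sit inside a double-quoted attribute value."""
    # html.escape replaces & first, then < and >, then (quote=True) " and '.
    return html.escape(markup, quote=True)


inner = '<SPAN TITLE="&quot;" ></SPAN >'
escaped = quote_for_attribute(inner)
print(escaped)

# "code" is only a placeholder attribute name, not a confirmed one.
print('<iframe sandbox="" code="' + escaped + '"></iframe>')
```

The escaping is mechanical, but it only protects the page if it is applied on every code path, which is the foot-gun concern raised earlier in the thread.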
HTH,
Chris


Re: [whatwg] The <iframe> element and sandboxing ideas

Boris Zbarsky

Kristof Zelechovski wrote:
> 1. Nested browsing contexts in a sandboxed frame cannot be created
> dynamically but they can be defined by the inner markup.

There was no mention of "dynamically" in Ian's proposal.  My assumption
was that "cannot create browsing contexts" meant just that.  If it
doesn't, the wording needs some changes.

> 2. If the frame is not allowed to execute scripts, setting location to
> script should have no effect.

OK.  Again, that was not clear in the original proposal.
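
To make point 2 concrete, here is a small model of the behaviour Kristof suggests, using the three re-enabling keywords from Ian's summary. The function names are hypothetical, and the case-sensitive keyword matching is an assumption; the proposal does not say either way.

```python
KNOWN_KEYWORDS = {"allow-same-origin", "allow-forms", "allow-scripts"}


def parse_sandbox(value: str) -> set:
    """Split a sandbox="" value into the recognised re-enabling keywords."""
    # Assumption: keywords are matched case-sensitively; unknown ones are ignored.
    return {token for token in value.split() if token in KNOWN_KEYWORDS}


def navigation_has_effect(sandbox_value: str, target_uri: str) -> bool:
    """Model point 2: a javascript: URI does nothing unless scripts are allowed."""
    is_script_uri = target_uri.strip().lower().startswith("javascript:")
    if not is_script_uri:
        return True
    return "allow-scripts" in parse_sandbox(sandbox_value)


print(navigation_has_effect("", "javascript:alert(1)"))
print(navigation_has_effect("allow-scripts", "javascript:alert(1)"))
```

This says nothing about which browsing context the script would run in when it is allowed; that part of the question is still open.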

> 4. Percentage in height scales to the container's height, not to the initial
> dimensions of the current element.  It is an error if the container's height
> is left implicit

It's not an error in CSS.  Or are you suggesting a different algorithm?

> or if the sum of percentages exceeds 100%.

Again, not a problem in CSS.  Percentages of auto just get treated as
auto.  If you're suggesting a totally different algorithm, it needs a
lot of fleshing out.
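
A minimal sketch of the CSS 2.1 behaviour described here, with None standing in for an auto-height containing block. The function name is made up. Whether the zero-height initial containing block of a seamless iframe counts as a definite height is an assumption in the second call, not something the proposal states.

```python
from typing import Optional, Union

AUTO = "auto"


def resolve_percentage_height(
    percent: float, containing_block_height: Optional[float]
) -> Union[float, str]:
    """Resolve a percentage height; None means the container's height is auto."""
    if containing_block_height is None:
        # CSS 2.1: against an auto-height containing block, the percentage
        # computes to 'auto' (absolutely positioned boxes aside).
        return AUTO
    return containing_block_height * percent / 100.0


print(resolve_percentage_height(50, None))   # implicit container height
print(resolve_percentage_height(50, 0))      # assumed zero-height case
print(resolve_percentage_height(150, 200))   # more than 100% is not an error
```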

> 5. The argument against SANDBOX is that the user could inject /SANDBOX.  The
> argument against code attribute is that the user could inject a quote.
> Aren't these similar enough to reconsider SANDBOX?  

SANDBOX and the non-base64 attribute thing seem pretty similar in a lot
of ways to me, except that the iframe (having a separate Window and
such) might be easier to secure in existing implementations.

-Boris


Re: [whatwg] The <iframe> element and sandboxing ideas

Ojan Vafai
In reply to this post by Ian Hickson
On Wed, May 21, 2008 at 3:30 PM, Ian Hickson <[hidden email]> wrote:
>  * I've added a seamless="" boolean attribute to <iframe>, which, if
>    the content's active document's URI has the same origin as the
>    container, causes the iframe to size vertically to the bounding box
>    of the contents, and horizontally to the width of the container,
>    and which causes the initial containing block of the contents to be
>    treated as zero height. In addition, styles on the root element of
>    the content must inherit from the <iframe> instead of being the
>    initial values, and the style sheets that apply to the <iframe>
>    must also apply to the contents. In addition, any time the browsing
>    context navigates itself, the parent browsing context gets
>    navigated instead.

This looks awesome. 

So, the whole point of these is defining elements that are isolated from their surrounding context on different axes. Same-origin iframes currently just give you CSS isolation. sandbox affords script isolation. seamless affords the ability to turn off the CSS isolation.

Seems to me that we need a third property which controls event isolation. Currently events don't propagate in/out of iframes and event coordinates are all relative to the iframe's viewport (e.g. on mouse events). 

My first intuition was that seamless should also just propagate events and have mouse coordinates be relative to the parent browsing context. But I can think of cases where you would want to control the two separately. For example, if you are especially concerned about performance and don't want events in the parent browsing context to be handled by the iframe's contents.
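
To illustrate the coordinate part: making a mouse event relative to the parent browsing context amounts to adding the iframe's position in the parent viewport. This is only a sketch with made-up names, and it assumes no borders, padding, scrolling or transforms on the iframe.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def to_parent_coordinates(event_in_frame: Point, frame_origin_in_parent: Point) -> Point:
    """Translate a point from the iframe's viewport to the parent's viewport."""
    return Point(
        event_in_frame.x + frame_origin_in_parent.x,
        event_in_frame.y + frame_origin_in_parent.y,
    )


# A click at (10, 20) inside an iframe whose top-left corner sits at (100, 300).
print(to_parent_coordinates(Point(10, 20), Point(100, 300)))
```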

Ojan

Re: The <iframe> element and sandboxing ideas

Jon Ferraiolo-2
In reply to this post by Jon Ferraiolo-2

Further comments after attending a talk at an IEEE security workshop (where Ian's proposal was presented to various security experts):

1) I take back my suggestion about considering <div sandbox="..."> versus Ian's original <iframe sandbox="..." />. Ian's original approach, although more restrictive, does start off from a foundation based on security concerns and then attempts to find ways to loosen them. The problem with <div> is that if the content is not well-formed and inserts an extra </div>, then the content after the </div> would not be sandboxed.

2) Ojan suggested that there might be another attribute to propagate events. This is definitely highly desirable in some scenarios. Note that the CDF WG has done some work that relates at least partially, although I wouldn't be surprised if Ian isn't all that positive on CDF. Nevertheless, here is the spec: http://www.w3.org/TR/WICD/. The WICD spec focuses on various aspects of not just event propagation, but also hyperlink propagation and focus management. All of these topics seem worthy of consideration in terms of bridging between the host web page and any of the iframes it embeds.

3) It seems to me that for some of the propagation areas (e.g., CSS propagation, event propagation, focus-model integration) you would want both the container and the component to opt-in before the propagation occurred. For example, with CSS propagation, there may be cases where the component only wants certain of its own characteristics to be stylable by the parent. If you look at typical Ajax widgets, which use CSS for controlling the visual rendering, there are some aspects which are meant to be stylable by the developer, some aspects that are meant to be "themable" (i.e., stylable via a shared theme), and other aspects which the widget needs to control exclusively and should not be overridden. I would assume that there are also security issues with allowing the parent to override the styling of an embedded iframe because conceivably someone could invoke a bank website within an iframe and it wouldn't be good if the parent could override some of the CSS for the bank's website. Similarly, you probably wouldn't want the parent frame to be able to listen to keystrokes that happen within the child iframe (e.g., your password). For some of the information passing between parent and child, it might be best to somehow use a publish/subscribe mechanism like how postMessage() works, where both parent and child have to opt-in before the propagation occurs.
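
A toy model of the two-sided opt-in, just to pin down the idea. It is not postMessage() and not any proposed API; the class, method and topic names are all hypothetical.

```python
from typing import Callable, Dict, Set


class OptInChannel:
    """Hypothetical model: a topic crosses the frame boundary only if the
    child has agreed to expose it and the parent has subscribed to it."""

    def __init__(self) -> None:
        self._exposed_by_child: Set[str] = set()
        self._parent_handlers: Dict[str, Callable[[str], None]] = {}

    def child_expose(self, topic: str) -> None:
        self._exposed_by_child.add(topic)

    def parent_subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._parent_handlers[topic] = handler

    def child_emit(self, topic: str, payload: str) -> bool:
        """Deliver payload to the parent; return False if either side opted out."""
        handler = self._parent_handlers.get(topic)
        if topic not in self._exposed_by_child or handler is None:
            return False
        handler(payload)
        return True


received = []
channel = OptInChannel()
channel.parent_subscribe("theme-change", received.append)
channel.parent_subscribe("keydown", received.append)
channel.child_expose("theme-change")  # the child never exposes "keydown"

channel.child_emit("theme-change", "dark")
channel.child_emit("keydown", "p")
print(received)
```

The parent asked for keystrokes, but since the child never exposed them, nothing is delivered; only the theme change gets through.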

Jon


[hidden email] wrote on 05/22/2008 07:58:06 AM:

> FYI - We have had some discussion in and around the topic of better
> iframes at OpenAjax Alliance:
>
> http://www.openajax.org/runtime/wiki/Better_IFrames_Better_Sandboxing
>
> However, people see shortcomings with all proposals listed on that
> page. Our hope was that the HTML5 leaders would figure out a good
> approach, so I am glad to see that Ian has started discussion on this topic.
>
> Regarding Martin's comments, I think it is a correct objective to
> find a bridge between what exists with today's browsers and what we
> hope will exist in future browsers. The Ajax community usually needs
> to get the desired result in both legacy browsers and new browsers.
>
> If you need to sandbox in today's browsers, the community tends
> to use one of two approaches: (1) require that sandboxed components
> be expressed in a restricted subset of HTML and/or JavaScript, such
> as Caja or ADSafe or the markup restrictions for portlets, (2) place
> the sandboxed components into an IFRAME whose domain or subdomain
> differs from everything else on the page (ie, leveraging the browser
> same-domain policy to achieve sandboxing). The problem with approach
> #1 is that some functionality (potentially critical) is disabled and
> developers have to in effect learn a new language. The problem with
> approach #2 is that isolation is an all-or-nothing proposition and
> there are shortcomings with IFRAMEs, such as lack of automatic
> content sizing. Ian's proposal below addresses these IFRAME
> shortcomings directly, which is great.
>
> If I had time to think extensively about this issue (which I don't),
> I would attempt to craft a proposal that used an approach where an
> Ajax library performed the mapping between what exists today (i.e.,
> IFRAME) and what would exist in the future, where Ajax libraries
> could be eliminated once older browsers were put out of commission.
> My initial thought would be to put a 'sandbox' attribute on a DIV
> rather than on an IFRAME. That way, you end up with more powerful
> sandboxing, along the lines of what Doug Crockford proposed with his
> <module> tag. Newer browsers would deliver the sandboxing features
> that Ian is proposing below. For older browsers, someone could
> author an Ajax library that looked for DIV elements with a 'sandbox'
> attribute, and under the hood transformed the DOM such that it
> achieved sandboxing via IFRAMEs and implements the flexibility that
> Ian describes in his proposal via typical ugly Ajax hacks, such as
> passing messages via postMessage (or the even uglier fragment
> identifier approach).
>
> Jon
>
>
>
>
> Martin Atkins <[hidden email]>
> Sent by: [hidden email]
> 05/22/08 05:20 AM
>
> To: Ian Hickson <[hidden email]>
> cc: whatwg <[hidden email]>, HTMLWG <[hidden email]>, [hidden email]
> Subject: Re: The <iframe> element and sandboxing ideas
>
>
>
> Ian Hickson wrote:
> > Summary:
> >
> >  * I've added a sandbox="" attribute to <iframe>, which by default
> >    disables a number of features and takes a space-separated list of
> >    features to re-enable:
> >
> [snip list]
>
> Unless I'm missing something, this attribute is useless in practice
> because legacy browsers will not impose the restrictions. This means
> that as long as legacy browsers exist (i.e. forever) server-side
> filtering must still be employed to duplicate the effects of the sandbox.
>
> One alternative would be to use a different element name so that
> fallback content can be provided for legacy browsers. In the short term,
> this is likely to be something like this:
>
> <sandbox src="/comments/blah">
> <iframe src="/comments/blah?do-security-filtering=1"></iframe>
> </sandbox>
>
> Once a large percentage of browsers support <sandbox> authors can start
> to be less accommodating with their fallback content, either by
> filtering out HTML tags entirely (which I'd assume is easier than just
> filtering out script) or at the extreme just setting the fallback
> content to be "Your browser is not supported".
>
> This comment does not address "seamless", which seems to be orthogonal
> and can thus be equally applied to both sandbox and iframe as currently
> specified.
>
>


Re: [whatwg] The <iframe> element and sandboxing ideas

Ojan Vafai
In reply to this post by Ojan Vafai
A couple more thoughts.

1. When seamless is set, the compatMode of the iframe should be the same as that of the parent browsing context, even if the doctype of the iframe would put it in a different compatMode than its parent.
2. The corollary to this is that when seamless is not set, the compatMode of iframes created from JS should be backcompat unless a doctype is document.write'ed in. This is what every browser does currently AFAIK.
3. The behavior of seamless in the face of different overflow values needs to be spec'ed as well. I think the current spec deals well with overflow:visible. But if overflow is scroll or auto, then it should behave the way a div does with overflow scroll or auto (i.e. not size its height to its contents).
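
A sketch of point 3, assuming the only inputs are the overflow value, the content's bounding-box height and an optional author-specified height. The function name is made up. Values other than visible, scroll and auto are deliberately left unhandled because nothing has been suggested for them.

```python
from typing import Optional


def seamless_height(overflow: str, content_height: float,
                    specified_height: Optional[float]) -> float:
    """Pick the used height of a seamless iframe under the suggestion above."""
    if overflow == "visible":
        # Behaviour described in the proposal: grow to the bounding box.
        return content_height
    if overflow in ("scroll", "auto"):
        # Suggested: behave like a div, i.e. keep the specified height and
        # scroll; with no specified height a div still fits its content.
        return content_height if specified_height is None else specified_height
    raise ValueError("overflow value not covered by the suggestion: " + overflow)


print(seamless_height("visible", 500.0, 200.0))
print(seamless_height("auto", 500.0, 200.0))
```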

Ojan


Re: [whatwg] The <iframe> element and sandboxing ideas

Ojan Vafai
Revising a comment I made yesterday. A couple other things came to mind. 

What happens if an iframe is loaded with sandbox set and then the attribute is unset? What security origin is it in? Similarly, what happens when seamless is set/removed on an iframe already in the page? Does it start inheriting CSS and resize to fit its content? I don't feel strongly about what should happen in these cases, but it seems worth being explicit.

> 1. When seamless is set, the compatMode of the iframe should be the same as that of the parent browsing context, even if the doctype of the iframe would put it in a different compatMode than its parent.

I thought about this some more and this seems like a bad idea. If you actually link to a page that expects to be in quirks mode from a standards-mode parent, then this could break things. I'll modify this to the following:

Iframes with an empty src (or no src property) should inherit their parent's compatmode iff seamless is set, otherwise they should be in backcompat unless a standards doctype is document.write'ed in.

Again the latter part of that is for compatibility with current browsers.
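
The revised rule is small enough to write down as a function. This is only a sketch: the function name is made up, it covers just iframes with an empty or missing src, and the strings are the values document.compatMode reports today.

```python
def srcless_iframe_compat_mode(seamless: bool, parent_mode: str,
                               wrote_standards_doctype: bool) -> str:
    """Compat mode for an iframe with an empty or missing src, per the rule above."""
    if seamless:
        # Inherit whatever mode the parent browsing context is in.
        return parent_mode
    # Otherwise match current browsers: quirks unless a standards doctype
    # was document.write'ed in.
    return "CSS1Compat" if wrote_standards_doctype else "BackCompat"


print(srcless_iframe_compat_mode(True, "CSS1Compat", False))
print(srcless_iframe_compat_mode(False, "CSS1Compat", False))
```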

2. The corollary to this is that when seamless is not set, the compatMode of iframes created from JS should be backcompat unless a doctype is document.write'ed in. This is what every browser does currently AFAIK.
3. The behavior of seamless in the face of different overflow values needs to be spec'ed as well. I think the current spec deals well with overflow:visible. But if overflow is scroll or auto, then it should behave the way a div does with overflow scroll or auto (i.e. not size its height to its contents).

Ojan



Re: The <iframe> element and sandboxing ideas

Collin Jackson-4
In reply to this post by Jon Ferraiolo-2

On Sun, May 25, 2008 at 12:02 PM, Jon Ferraiolo <[hidden email]> wrote:
> I would assume that there are also
> security issues with allowing the parent to override the styling of an
> embedded iframe because conceivably someone could invoke a bank website
> within an iframe and it wouldn't be good if the parent could override some
> of the CSS for the bank's website. Similarly, you probably wouldn't want the
> parent frame to be able to listen to keystrokes that happen within the child
> iframe (e.g., your password).

Since the parent can already overlay password fields on top of the
sandboxed frame or replace it with a spoofed version, I don't think we
should encourage widgets to solicit passwords inside their sandboxed
frame if they don't trust their parent.

Collin Jackson



Re: [whatwg] The <iframe> element and sandboxing ideas

Jonas Sicking-2
In reply to this post by Ian Hickson

> On Mon, 23 Apr 2007, Jonas Sicking wrote:
>> There's a big difference to that and to what I'm proposing. With what's
>> in bug 80713 you're still limited to a box that basically doesn't take
>> part of the outer page at all. For example in the table example in my
>> original post the headers of the table would not resize to fit the
>> column sizes in the <include>ed table.
>
> Woah. That's far more radical. I have no idea how to do that. How would
> you make the parser not generate the implied elements and switch straight
> to the "in table" mode? How would you make the CSS model work with this?
> How would you define conformance for the document fragments?

The parser questions here are interesting for sure, but I believe they
could be solved.

One way to solve the "don't make the parser switch into mode X when it
hits the iframe" would be to teach the parser about <include> (or
<iframe seamless>, or <iframe include>, or whatever it'll be called).
That is pretty ugly though.

One way to solve the fragment issue would be to say that the inner
document always has to be a full document, and then use a fragment
identifier to point to the contents of a table.
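
As a very rough illustration of the fragment-identifier idea, the sketch below pulls out the markup inside the element a fragment points at, using Python's tokenizing HTMLParser. The URL, the id and the class name are hypothetical, and it assumes explicit end tags, no void elements and no character references inside the target. A real parser integration would be far more involved.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentExtractor(HTMLParser):
    """Collect the markup inside the element whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 means we are outside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth:  # skip the target element's own end tag
                self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


document = (
    '<!DOCTYPE html><html><body><table id="data">'
    "<tr><td>1</td><td>2</td></tr>"
    "</table></body></html>"
)

url, fragment = urldefrag("table.htm#data")
extractor = FragmentExtractor(fragment)
extractor.feed(document)
extractor.close()
rows = "".join(extractor.parts)
print(rows)
```

The inner document stays a full, conforming document, and only the rows under the identified table would be spliced into the outer table.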

The CSS model is simpler. XBL deals with exactly the same problem of
combining multiple DOMs into a single flattened tree on which CSS is
applied.

I'm still intending to do some testing with this idea once I get more
time. A lot of the implementation details have to be solved for XBL anyway.

/ Jonas


Re: [whatwg] The <iframe> element and sandboxing ideas

Andrew Fedoniouk

Jonas Sicking wrote:

>
>> On Mon, 23 Apr 2007, Jonas Sicking wrote:
>>> There's a big difference to that and to what I'm proposing. With
>>> what's in bug 80713 you're still limited to a box that basically
>>> doesn't take part of the outer page at all. For example in the table
>>> example in my original post the headers of the table would not resize
>>> to fit the column sizes in the <include>ed table.
>>
>> Woah. That's far more radical. I have no idea how to do that. How
>> would you make the parser not generate the implied elements and switch
>> straight to the "in table" mode? How would you make the CSS model work
>> with this? How would you define conformance for the document fragments?
>
> The parser questions here are interesting for sure, but I believe they
> could be solved.
>
> One way to solve the "don't make the parser switch into mode X when it
> hits the iframe" would be to teach the parser about <include> (or
> <iframe seamless>, or <iframe include>, or whatever it'll be called).
> That is pretty ugly though.
>
> One way to solve the fragment issue would be to say that the inner
> document always has to be a full document, and then use a fragment
> identifier to point to the contents of a table.
>
> The CSS model is simpler. XBL deals with exactly the same problem of
> combining multiple DOMs into a single flattened tree on which CSS is
> applied.
>
> I'm still intending to do some testing with this idea once I get more
> time. A lot of the implementation details have to be solved for XBL anyway.
>
> / Jonas
>

That is known as "client side include"

<include src="data.partial.htm">
   Ooops, "data.partial.htm" is not available
</include>

After data.partial.htm has loaded, the <include> node is replaced by
the content of data.partial.htm.

Simple and straightforward.
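
The replace-or-fall-back behaviour can be sketched as below. The loader is a stand-in (a dictionary lookup), the function name is hypothetical, and the sketch assumes a failed load is reported as None; a user agent would fetch the resource instead.

```python
from typing import Callable, Optional


def resolve_include(src: str, fallback: str,
                    load: Callable[[str], Optional[str]]) -> str:
    """Return the loaded markup, or the element's own content if loading fails."""
    loaded = load(src)
    return fallback if loaded is None else loaded


# A stand-in loader: a real user agent would fetch the resource instead.
available = {"data.partial.htm": "<tr><td>1</td></tr>"}

print(resolve_include("data.partial.htm", "Ooops, not available", available.get))
print(resolve_include("missing.htm", "Ooops, not available", available.get))
```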

--
Andrew Fedoniouk.

http://terrainformatica.com