Limited DOM in Web Workers

12 messages

Limited DOM in Web Workers

Jack Coulter-3
Hi,

I have a proposal of sorts, regarding Workers. As we all know, there's no
access to the DOM from within a Web Worker. While this is ideal for security
purposes, I can't help but think a restricted subset of the available DOM
manipulation methods would be incredibly useful.

I'm not talking about allowing Workers to manipulate the main DOM tree of
the page, but rather, exposing DOMParser, and XMLHttpRequest.responseXML,
and a few other objects to workers, to allow the manipulation of DOM trees
which are never actually rendered to the page.

This would allow developers to parse and manipulate XML in workers, freeing
the main thread of a page to perform other tasks.

A possible counterargument may be "Why not just use JSON instead?" While I
agree that JSON is the easiest method of serialising and parsing objects,
sometimes developers must work with data from a source which only provides
XML.


As an example use case, I'd like to hack on the Strophe.js XMPP
implementation so that it can run in a worker thread. Currently this is
impossible without writing my own XML parser, which would undoubtedly
be slower than the native DOMParser.

Any thoughts?


Regards,
Jack Coulter



Re: Limited DOM in Web Workers

Boris Zbarsky
On 1/7/11 2:29 PM, Jack Coulter wrote:
> I'm not talking about allowing Workers to manipulate the main DOM tree of
> the page, but rather, exposing DOMParser, and XMLHttpRequest.responseXML,
> and a few other objects to workers, to allow the manipulation of DOM trees
> which are never actually rendered to the page.

Whether they're rendered doesn't necessarily matter if the DOM
implementation is not threadsafe (which it's not, in today's UAs).  That
said...

> This would allow developers to parse and manipulate XML in workers, freeing
> the main thread of a page to perform other tasks.
...

> As an example use case, I'd like to hack on the Strophe.js XMPP
> implementation so that it can run in a worker thread. Currently this is
> impossible without writing my own XML parser, which would undoubtedly
> be slower than the native DOMParser.

If you think you could do this with your own XML parser, is there a
reason you can't do it with e4x (I never thought I'd say that, but this
seems like an actually good use case for something like e4x)?  That
should work fine in workers in Gecko-based browsers that support it, and
doesn't drag in the entire DOM implementation.

That leaves the problem of convincing developers of those ECMAScript
implementations that don't support e4x to support it, of course; while
things like http://code.google.com/p/v8/issues/detail?id=235#c42 don't
necessarily fill me with hope in that regard it may still be simpler
than convincing all browsers to rewrite their DOMs to be threadsafe in
the way that would be needed to support exposing an actual DOM in workers.

-Boris


Re: Limited DOM in Web Workers

Jack Coulter-3
Excerpts from Boris Zbarsky's message of Sat Jan 08 14:34:14 +1100 2011:

> On 1/7/11 2:29 PM, Jack Coulter wrote:
> > I'm not talking about allowing Workers to manipulate the main DOM tree of
> > the page, but rather, exposing DOMParser, and XMLHttpRequest.responseXML,
> > and a few other objects to workers, to allow the manipulation of DOM trees
> > which are never actually rendered to the page.
>
> Whether they're rendered doesn't necessarily matter if the DOM
> implementation is not threadsafe (which it's not, in today's UAs).  That
> said...
>

Sorry, I wasn't really clear. What I meant was a private DOM hierarchy. You
still wouldn't be able to access it in multiple places simultaneously, and
you'd still have to serialise it to a string to use it in postMessage. Forgive
my ignorance, but if this were the case, then isn't the thread-safety issue
effectively sidestepped?
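To make the "serialise it to a string for postMessage" step concrete, here is a minimal sketch. It assumes the private tree is held as plain objects of a hypothetical shape (`{ name, attrs, children }`, with text nodes as plain strings) rather than as real DOM nodes; `escapeXml` and `serialize` are illustrative names, not part of any existing API.

```javascript
// Hypothetical node shape for a "private" tree: elements are
// { name, attrs, children }, text nodes are plain strings.

// Escape the five characters that are special in XML text and attributes.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Recursively serialise a node to an XML string.
function serialize(node) {
  if (typeof node === "string") return escapeXml(node);
  const attrs = Object.entries(node.attrs || {})
    .map(([k, v]) => ` ${k}="${escapeXml(v)}"`)
    .join("");
  const children = node.children || [];
  if (children.length === 0) return `<${node.name}${attrs}/>`;
  return `<${node.name}${attrs}>${children.map(serialize).join("")}</${node.name}>`;
}

// Usage: build a small XMPP-style stanza and serialise it.
const stanza = {
  name: "message",
  attrs: { to: "juliet@example.com", type: "chat" },
  children: [{ name: "body", attrs: {}, children: ["Hi & bye"] }],
};
const xml = serialize(stanza);
// Inside a worker, the string could then be sent with postMessage(xml).
```

In current browsers postMessage can also take plain objects like `stanza` directly via structured cloning; the string step is mainly needed when the tree is made of real DOM nodes, which cannot be cloned.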

> > This would allow developers to parse and manipulate XML in workers, freeing
> > the main thread of a page to perform other tasks.
> ...
>

Why '...'? Did I say something in error here?

> > As an example use case, I'd like to hack on the Strophe.js XMPP
> > implementation so that it can run in a worker thread. Currently this is
> > impossible without writing my own XML parser, which would undoubtedly
> > be slower than the native DOMParser.
>
> If you think you could do this with your own XML parser, is there a
> reason you can't do it with e4x (I never thought I'd say that, but this
> seems like an actually good use case for something like e4x)?  That
> should work fine in workers in Gecko-based browsers that support it, and
> doesn't drag in the entire DOM implementation.

I know of E4X, and while I think it's a really nice language feature, the lack
of non-Gecko support makes it substantially less useful.

> That leaves the problem of convincing developers of those ECMAScript
> implementations that don't support e4x to support it, of course; while
> things like http://code.google.com/p/v8/issues/detail?id=235#c42 don't
> necessarily fill me with hope in that regard it may still be simpler
> than convincing all browsers to rewrite their DOMs to be threadsafe in
> the way that would be needed to support exposing an actual DOM in workers.

Heh, what a coincidence: I was actually reading through that very thread
earlier today. After thinking about it, I'd say that E4X would be the best
solution for XML in Workers, but it would need to be supported more widely.


Re: Limited DOM in Web Workers

Jonas Sicking-2
In reply to this post by Boris Zbarsky
On Fri, Jan 7, 2011 at 7:34 PM, Boris Zbarsky <[hidden email]> wrote:

> On 1/7/11 2:29 PM, Jack Coulter wrote:
>>
>> I'm not talking about allowing Workers to manipulate the main DOM tree of
>> the page, but rather, exposing DOMParser, and XMLHttpRequest.responseXML,
>> and a few other objects to workers, to allow the manipulation of DOM trees
>> which are never actually rendered to the page.
>
> Whether they're rendered doesn't necessarily matter if the DOM
> implementation is not threadsafe (which it's not, in today's UAs).  That
> said...
>
>> This would allow developers to parse and manipulate XML in workers,
>> freeing
>> the main thread of a page to perform other tasks.
>
> ...
>
>> As an example use case, I'd like to hack on the Strophe.js XMPP
>> implementation so that it can run in a worker thread. Currently this is
>> impossible without writing my own XML parser, which would undoubtedly
>> be slower than the native DOMParser.
>
> If you think you could do this with your own XML parser, is there a reason
> you can't do it with e4x (I never thought I'd say that, but this seems like
> an actually good use case for something like e4x)?  That should work fine in
> workers in Gecko-based browsers that support it, and doesn't drag in the
> entire DOM implementation.
>
> That leaves the problem of convincing developers of those ECMAScript
> implementations that don't support e4x to support it, of course; while
> things like http://code.google.com/p/v8/issues/detail?id=235#c42 don't
> necessarily fill me with hope in that regard it may still be simpler than
> convincing all browsers to rewrite their DOMs to be threadsafe in the way
> that would be needed to support exposing an actual DOM in workers.

I would strongly advise using e4x. It seems unlikely to be picked up
by other browsers, and I'm still hoping that we'll remove support from
gecko before long.

My question is instead, what part of the DOM is it that you want? One
of the most important features of the DOM is modifying what is being
displayed to the user. Obviously that isn't the feature being requested
here. Another important feature is simply holding a tree structure.
However, plain JavaScript objects do that very well (better than the
DOM in many ways).
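As a rough illustration of plain objects holding a tree, here is a sketch of a small document as nested objects, with a depth-first search standing in for getElementsByTagName. The `{ name, attrs, children }` shape and the `byTagName`/`textOf` helpers are assumptions for illustration only.

```javascript
// A plain-object stand-in for a DOM tree (hypothetical shape):
// elements are { name, attrs, children }, text nodes are strings.
const doc = {
  name: "feed",
  attrs: {},
  children: [
    { name: "entry", attrs: { id: "1" }, children: [{ name: "title", attrs: {}, children: ["First"] }] },
    { name: "entry", attrs: { id: "2" }, children: [{ name: "title", attrs: {}, children: ["Second"] }] },
  ],
};

// Depth-first search collecting every element with the given name,
// roughly what getElementsByTagName does on a real DOM.
function byTagName(node, name, out = []) {
  if (typeof node === "string") return out;
  if (node.name === name) out.push(node);
  for (const child of node.children) byTagName(child, name, out);
  return out;
}

// Concatenate all descendant text, similar to textContent.
function textOf(node) {
  return typeof node === "string" ? node : node.children.map(textOf).join("");
}

const titles = byTagName(doc, "title").map(textOf);
```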

Other features of the DOM include form handling, parsing attribute
values in the form of integers, floats, comma-separated lists, etc.,
URL resolving and more. Much of this doesn't seem very interesting to
do in workers, or at least not important enough to have the browser
provide an implementation of it in workers.

Hence I'm asking, why specifically would you like to access a DOM from workers?

/ Jonas


Re: Limited DOM in Web Workers

Keean Schupke-2
Hi, sorry for this small aside, but it's (slightly) relevant.

What do you suggest people use instead of e4x in general? For example:

var x = <table><tr><td>something</td></tr></table>;

Is a lot more elegant than:

var x2 = document.createTextNode('something');
var x1 = document.createElement('td');
x1.appendChild(x2);
var x0 = document.createElement('tr');
x0.appendChild(x1);
var x = document.createElement('table');
x.appendChild(x0);
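One common way to recover much of that conciseness without E4X is a small builder function. A minimal sketch; `h` is a hypothetical helper, and it relies only on the standard `createElement`, `createTextNode` and `appendChild` methods, so `doc` would normally be `document` on the main thread.

```javascript
// Hypothetical helper: build an element tree in one expression.
// `doc` is any object with the standard createElement/createTextNode
// methods (e.g. `document` on the main thread).
function h(doc, tag, ...children) {
  const el = doc.createElement(tag);
  for (const child of children) {
    // Strings become text nodes; anything else is assumed to be a node.
    el.appendChild(typeof child === "string" ? doc.createTextNode(child) : child);
  }
  return el;
}

// Equivalent to the createElement/appendChild version above:
// const x = h(document, "table", h(document, "tr", h(document, "td", "something")));
```

Current DOM implementations also provide `Element.append()`, which accepts a mix of nodes and strings, so the loop could be replaced by `el.append(...children)`.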

The only thing I can think of is having the table attached to the document but hidden, and then cloning the HTML fragment:

var x = document.getElementById('hiddentable').cloneNode(true);

But how do you ensure that the renderer and DOM traversal ignore the hidden node? In an HTML5 app with multiple UI elements that need to be on screen at different times, it could slow things down a lot.


Cheers,
Keean.


In reply to Jonas Sicking's message of 8 January 2011 09:09.



Re: Limited DOM in Web Workers

Jack Coulter-3
In reply to this post by Jonas Sicking-2
> I would strongly advise using e4x. It seems unlikely to be picked up
> by other browsers, and I'm still hoping that we'll remove support from
> gecko before long.

I assume you meant to say "advise *against*"?

> My question is instead, what part of the DOM is it that you want? One
> of the most important features of the DOM is modifying what is being
> displayed to the user. Obviously that isn't the feature being requested
> here. Another important feature is simply holding a tree structure.
> However, plain JavaScript objects do that very well (better than the
> DOM in many ways).
>
> Other features of the DOM include form handling, parsing attribute
> values in the form of integers, floats, comma-separated lists, etc.,
> URL resolving and more. Much of this doesn't seem very interesting to
> do in workers, or at least not important enough to have the browser
> provide an implementation of it in workers.
>
> Hence I'm asking, why specifically would you like to access a DOM from workers?

Really, only two parts: DOMParser, and holding and manipulating the
tree (appendChild/removeChild/createElement/createTextNode, etc.). The
goal here is to allow workers to parse/serialise/manipulate XML with
the same power and flexibility we have with the native JSON parser.
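A minimal sketch of what that "tree only" subset might look like if written in script: element and text classes exposing a few DOM-like names (`appendChild`, `removeChild`, `textContent`, `parentNode`), with no rendering and no resource loading. `MiniElement`, `MiniText` and `miniDoc` are hypothetical names, and the behaviour only approximates the real DOM (no attributes, namespaces or events).

```javascript
// Hypothetical worker-safe subset of the DOM tree API: a tree and nothing
// else. This is not a real DOM implementation.
class MiniText {
  constructor(data) { this.data = data; this.parentNode = null; }
  get textContent() { return this.data; }
}

class MiniElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.childNodes = [];
    this.parentNode = null;
  }
  appendChild(child) {
    // As in the DOM, appending a node detaches it from its old parent first.
    if (child.parentNode) child.parentNode.removeChild(child);
    child.parentNode = this;
    this.childNodes.push(child);
    return child;
  }
  removeChild(child) {
    const i = this.childNodes.indexOf(child);
    if (i === -1) throw new Error("not a child of this node");
    this.childNodes.splice(i, 1);
    child.parentNode = null;
    return child;
  }
  get textContent() {
    return this.childNodes.map((c) => c.textContent).join("");
  }
}

// Factory functions mirroring document.createElement/createTextNode.
const miniDoc = {
  createElement: (tag) => new MiniElement(tag),
  createTextNode: (data) => new MiniText(data),
};

const body = miniDoc.createElement("body");
const hello = body.appendChild(miniDoc.createTextNode("hello"));
body.appendChild(miniDoc.createTextNode(" world"));
body.removeChild(hello);
```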


Re: Limited DOM in Web Workers

Nathan Rixham-2
In reply to this post by Jonas Sicking-2
Jonas Sicking wrote:
> My question is instead, what part of the DOM is it that you want?

I actually need DOM support in Web Workers as well, and have for quite
some time: typically just the ability to convert XML and HTML documents
into DOM trees and traverse them, essentially DOM Level 3. Indeed, even
an XML/HTML-to-JavaScript object structure replicating the tree would
certainly be sufficient for all of my own use cases.

The general use case is simple: make XHR requests with XML or text/html
responses, extract some information from the response, and use it. This
is typical usage on the web when working with APIs which return XML, when
taking a "my website is my API" approach, and when working with microdata
and RDFa.
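For the microdata case, here is a sketch of the "extract some information" step, assuming the response has already been turned into plain objects of a hypothetical `{ name, attrs, children }` shape by some script-level parser. `itemprops` is an illustrative helper that deliberately simplifies the microdata rules: it ignores nested `itemscope` items, space-separated property names and repeated properties (the last value wins), and only reads the `content` attribute on `<meta>`.

```javascript
// Hypothetical plain-object tree (elements: { name, attrs, children },
// text: strings), e.g. produced by a JS parser running in a worker.
function textOf(node) {
  return typeof node === "string" ? node : node.children.map(textOf).join("");
}

// Collect itemprop name -> value pairs. Simplification: values come from
// trimmed text content, or from the `content` attribute of <meta>; the
// full microdata rules also use href/src and other attributes.
function itemprops(node, out = {}) {
  if (typeof node === "string") return out;
  const prop = node.attrs && node.attrs.itemprop;
  if (prop) {
    out[prop] = node.name === "meta" ? node.attrs.content : textOf(node).trim();
  }
  for (const child of node.children) itemprops(child, out);
  return out;
}

const page = {
  name: "div",
  attrs: { itemscope: "" },
  children: [
    { name: "span", attrs: { itemprop: "name" }, children: [" Jane Doe "] },
    { name: "meta", attrs: { itemprop: "birthDate", content: "1970-01-01" }, children: [] },
  ],
};
const props = itemprops(page);
```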

Best,

Nathan


Re: Limited DOM in Web Workers

Boris Zbarsky
In reply to this post by Jack Coulter-3
On 1/8/11 4:07 AM, Jack Coulter wrote:
> Sorry, I wasn't really clear. What I meant was a private DOM hierarchy. You
> still wouldn't be able to access it in multiple places simultaneously, and
> you'd still have to serialise it to a string to use it in postMessage. Forgive
> my ignorance, but if this were the case, then isn't the thread-safety issue
> effectively sidestepped?

You're assuming that none of the DOM implementation code uses any sort
of non-DOM objects, ever, or that if it does those objects are fully
threadsafe.  That's just not the case, at least in Gecko.

The issue in this case is not the same DOM object being touched on
multiple threads.  The issue is two DOM objects on different threads
both touching some global third object.

For example, the XML parser has to do some things that in Gecko can only
be done on the main thread (DTD loading, offhand; there are a few others
that I've seen before but don't recall offhand).

>>> This would allow developers to parse and manipulate XML in workers, freeing
>>> the main thread of a page to perform other tasks.
>> ...
>>
> Why '...'? Did I say something in error here?

No, I was just indicating that I'd snipped some text there.

> I know of E4X, and while I think it's a really nice language feature, the lack
> of non-Gecko support makes it substantially less useful.

Well... so we're comparing a feature that's supported in Gecko but not
other UAs to a feature that's not supported in any UA, right?  ;)

(Fwiw, I think the way E4X was actually done is insane; heck it
redefines what the |x.y()| syntax means! But perhaps some other API
along those lines that doesn't actually create DOM nodes with all their
weird behaviors (e.g. if you create an <img> it tries to load things off
the network) and instead just parses XML into objects exposed to JS
would be a better fit for workers.)
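Something along the lines of an API that "just parses XML into objects exposed to JS" can be approximated in script. A minimal sketch, assuming a restricted subset of XML: no DTD internal subsets, no namespace handling, only the predefined entities and numeric character references, and whitespace-only text kept as-is. `parseXml` and the `{ name, attrs, children }` shape are hypothetical, not an existing or proposed API.

```javascript
// Hypothetical minimal parser: XML text -> plain objects.
// Elements become { name, attrs, children }; text becomes strings.
// Comments, processing instructions and simple DOCTYPEs are skipped;
// no DTDs or other resources are ever loaded.
const ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
]);

// Decode the predefined entities and numeric character references.
function decode(s) {
  return s.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (whole, ref) => {
    if (ref.startsWith("#x")) return String.fromCodePoint(parseInt(ref.slice(2), 16));
    if (ref.startsWith("#")) return String.fromCodePoint(parseInt(ref.slice(1), 10));
    return ENTITIES.has(ref) ? ENTITIES.get(ref) : whole; // unknown: leave as-is
  });
}

// Parse name="value" / name='value' pairs from a start tag.
function parseAttrs(text) {
  const attrs = {};
  const re = /([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    attrs[m[1]] = decode(m[2] !== undefined ? m[2] : m[3]);
  }
  return attrs;
}

function parseXml(src) {
  // One sticky regex; every alternative consumes at least one character.
  // Groups: 1 CDATA text, 2 end-tag name, 3 start-tag name,
  //         4 attribute text, 5 "/" if self-closing, 6 character data.
  const token = /<!--[\s\S]*?-->|<\?[\s\S]*?\?>|<!DOCTYPE[^>[]*>|<!\[CDATA\[([\s\S]*?)\]\]>|<\/([^\s>]+)\s*>|<([A-Za-z_:][^\s\/>]*)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>|([^<]+)/y;
  const root = { name: "#document", attrs: {}, children: [] };
  const stack = [root];
  while (token.lastIndex < src.length) {
    const at = token.lastIndex;
    const m = token.exec(src);
    if (m === null) throw new SyntaxError(`Unexpected markup at offset ${at}`);
    const [, cdata, endName, startName, attrText, selfClose, text] = m;
    const top = stack[stack.length - 1];
    if (cdata !== undefined) {
      top.children.push(cdata); // CDATA content is taken literally
    } else if (endName !== undefined) {
      if (stack.length === 1 || top.name !== endName) {
        throw new SyntaxError(`Mismatched </${endName}> at offset ${at}`);
      }
      stack.pop();
    } else if (startName !== undefined) {
      const el = { name: startName, attrs: parseAttrs(attrText), children: [] };
      top.children.push(el);
      if (selfClose !== "/") stack.push(el);
    } else if (text !== undefined) {
      top.children.push(decode(text));
    } // otherwise: a comment, PI or DOCTYPE, which is skipped
  }
  if (stack.length !== 1) throw new SyntaxError(`Unclosed <${stack[stack.length - 1].name}>`);
  return root;
}
```

Because nothing here touches DOM objects or shared browser state, a function like this could run inside a worker; a real implementation would need much more (namespaces, better error reporting, full well-formedness checks).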

-Boris


Re: Limited DOM in Web Workers

Jack Coulter-3
Excerpts from Boris Zbarsky's message of Sun Jan 09 10:42:46 +1100 2011:

> On 1/8/11 4:07 AM, Jack Coulter wrote:
> You're assuming that none of the DOM implementation code uses any sort
> of non-DOM objects, ever, or that if it does those objects are fully
> threadsafe.  That's just not the case, at least in Gecko.
>
> The issue in this case is not the same DOM object being touched on
> multiple threads.  The issue is two DOM objects on different threads
> both touching some global third object.
>
> For example, the XML parser has to do some things that in Gecko can only
> be done on the main thread (DTD loading, offhand; there are a few others
> that I've seen before but don't recall offhand).
>

Ah, I didn't understand this before, thanks for the clarification.

> > I know of E4X, and while I think it's a really nice language feature, the lack
> > of non-Gecko support makes it substantially less useful.
>
> Well... so we're comparing a feature that's supported in Gecko but not
> other UAs to a feature that's not supported in any UA, right?  ;)
>
> (Fwiw, I think the way E4X was actually done is insane; heck it
> redefines what the |x.y()| syntax means! But perhaps some other API
> along those lines that doesn't actually create DOM nodes with all their
> weird behaviors (e.g. if you create an <img> it tries to load things off
> the network) and instead just parses XML into objects exposed to JS
> would be a better fit for workers.)

I agree this would probably be the best approach. We need to find or create
some API for *purely* manipulating/parsing/serialising XML documents, with no
loading of resources as the DOM does. This is preferable to a JavaScript-based
parser for reasons of both developer ease (a single native implementation,
rather than a whole bunch of different JavaScript libraries) and speed.


The real question is: Do we want to create something new? Perhaps at least
superficially resembling the DOM api, for developer familiarity. Or do we
simply want to have E4X universally supported, both in workers, and in
the main thread?


Re: Limited DOM in Web Workers

ATSUSHI TAKAYAMA
Hi,

On Sun, Jan 9, 2011 at 2:29 AM, Jack Coulter <[hidden email]> wrote:
> The real question is: Do we want to create something new? Perhaps at least
> superficially resembling the DOM api, for developer familiarity. Or do we
> simply want to have E4X universally supported, both in workers, and in
> the main thread?

Why don't you use a JavaScript implementation of DOM for now? If enough
developers take that approach, browser vendors can consider implementing
it natively. (But the reality is that XML is hardly ever used on the
client side.)
I know that jsdom https://github.com/tmpvar/jsdom already has some use
cases with Node.js, and some people run YUI or jQuery on top of it.

A.TAKAYAMA


Re: Limited DOM in Web Workers

Keean Schupke-2
Surely the idea is to do more on the client? I for one am using XML web service APIs on the servers. The client JavaScript/HTML is downloaded once into the AppCache. We have 3 or 4 different web-service servers performing different functions (say a registration service, an upload service, and a geo-location correction service). Some of the services are provided by other people (the geo-location correction service, for example, is provided directly by NOAA).

The 'client' runs the complete application written in JavaScript, and queries each service in XML. This uses the 'client' CPU to do the work. It's like having an N-machine cluster supercomputer where N equals the number of users (so CPU scales perfectly 1 to 1).

IMHO this is the future of the internet/web, and this is what the WebApps Working Group should be trying to encourage and make easier.

On this basis, client-side XML is central to what a web app is. Although I would prefer to use JSON in our own services, we have to be able to use web services provided by other people too.


Cheers,
Keean.


In reply to ATSUSHI TAKAYAMA's message of 9 January 2011 08:32.



Re: Limited DOM in Web Workers

Jonas Sicking-2
In reply to this post by Jack Coulter-3
On Sat, Jan 8, 2011 at 11:29 PM, Jack Coulter <[hidden email]> wrote:

> Excerpts from Boris Zbarsky's message of Sun Jan 09 10:42:46 +1100 2011:
>> On 1/8/11 4:07 AM, Jack Coulter wrote:
>> You're assuming that none of the DOM implementation code uses any sort
>> of non-DOM objects, ever, or that if it does those objects are fully
>> threadsafe.  That's just not the case, at least in Gecko.
>>
>> The issue in this case is not the same DOM object being touched on
>> multiple threads.  The issue is two DOM objects on different threads
>> both touching some global third object.
>>
>> For example, the XML parser has to do some things that in Gecko can only
>> be done on the main thread (DTD loading, offhand; there are a few others
>> that I've seen before but don't recall offhand).
>>
>
> Ah, I didn't understand this before, thanks for the clarification.
>
>> > I know of E4X, and while I think it's a really nice language feature, the lack
>> > of non-Gecko support makes it substantially less useful.
>>
>> Well... so we're comparing a feature that's supported in Gecko but not
>> other UAs to a feature that's not supported in any UA, right?  ;)
>>
>> (Fwiw, I think the way E4X was actually done is insane; heck it
>> redefines what the |x.y()| syntax means! But perhaps some other API
>> along those lines that doesn't actually create DOM nodes with all their
>> weird behaviors (e.g. if you create an <img> it tries to load things off
>> the network) and instead just parses XML into objects exposed to JS
>> would be a better fit for workers.)
>
> I agree this would probably be the best approach. We need to find or create
> some API for *purely* manipulating/parsing/serialising XML documents, with no
> loading of resources as the DOM does. This is preferable to a JavaScript-based
> parser for reasons of both developer ease (a single native implementation,
> rather than a whole bunch of different JavaScript libraries) and speed.
>
> The real question is: Do we want to create something new? Perhaps at least
> superficially resembling the DOM api, for developer familiarity. Or do we
> simply want to have E4X universally supported, both in workers, and in
> the main thread?

My recommendation is that people experiment with this by writing JS
libraries to fulfill these use cases (there are already both XML and
HTML parsers written purely in JavaScript). That will provide feedback
to spec writers and browser implementers as to what solutions work and
what advantages/disadvantages they have.

/ Jonas