Last Comment Bug 649537 - (new-web-workers) Workers: Make one OS thread and JS runtime per worker, and lose XPConnect
(new-web-workers)
: Workers: Make one OS thread and JS runtime per worker, and lose XPConnect
Status: RESOLVED FIXED
: addon-compat, dev-doc-complete
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: All All
: -- normal with 2 votes (vote)
: mozilla8
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
Mentors:
: 621205 627550 664650 (view as bug list)
Depends on: 671015 672204 672229 672313 672370 672382 674541 675578 677194 678057 682183 687216 687220 687335 687929 700166 713415 718311
Blocks: 617569 650411 654265 669545 673304 690418 714253
  Show dependency treegraph
 
Reported: 2011-04-12 17:45 PDT by Ben Turner (not reading bugmail, use the needinfo flag!)
Modified: 2012-04-19 23:10 PDT (History)
48 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Snapshot 1 (221.56 KB, patch)
2011-04-12 17:47 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Snapshot 2 (249.71 KB, patch)
2011-04-28 18:12 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Snapshot 3 (370.11 KB, patch)
2011-05-11 14:19 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Snapshot 4 (384.48 KB, patch)
2011-05-24 16:16 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Patch, v1 (489.82 KB, patch)
2011-06-05 11:44 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jonas: review+
mrbkap: review+
Details | Diff | Splinter Review
Error propagation changes, v1 (40.95 KB, patch)
2011-06-24 14:00 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jonas: review+
Details | Diff | Splinter Review
Close handler changes, v1 (50.92 KB, patch)
2011-07-06 14:28 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jonas: review+
Details | Diff | Splinter Review
Follow up fix (1.96 KB, patch)
2011-07-17 23:46 PDT, Hiroyuki Ikezoe (:hiro)
gal: review+
Details | Diff | Splinter Review
Final patch (920.70 KB, patch)
2011-07-25 21:55 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
bent.mozilla: review+
Details | Diff | Splinter Review
Crash fix (4.24 KB, patch)
2011-07-25 21:57 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review

Description Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-12 17:45:50 PDT
We're reimplementing workers in JSAPI only with one OS thread and one JSRuntime per worker. We're also losing XPConnect.

One runtime per thread is apparently important for other JS work as well.
Comment 1 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-12 17:47:21 PDT
Created attachment 525576 [details] [diff] [review]
Snapshot 1

This implements the new model. PostMessage and error reporting are handled, as well as some close handlers. Next up are timeouts and XHR.
Comment 2 Luke Wagner [:luke] 2011-04-12 21:42:24 PDT
\o/ \o/ \o/ \o/ \o/ \o/
Comment 3 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-12 23:10:29 PDT
I still don't quite understand this all - we really should be
able to use the same interfaces in main thread and in workers.
Writing everything manually using JS API is extremely error prone.
Could we start writing some tool which generates
JS API -friendly code from .idl interfaces? That way we could 
guarantee that when the interfaces change, the implementation will be
changed both in workers and in main thread.
(Although even that wouldn't make sure that the behavior is the same.)


Though, actually, I don't even know if and when this bug should be fixed, or
is this all prototyping-like code?

(Sorry, if I sound too pessimistic.)
Comment 4 Andreas Gal :gal 2011-04-12 23:22:32 PDT
That tool is coming. This bug does stuff by hand to see how everything fits together and as part of the dom bindings work we will create the tool to automate this.
Comment 5 Luke Wagner [:luke] 2011-04-12 23:50:13 PDT
(I should have probably asked this before all the cheering, but, just to be clear) will this let us make JSRuntimes officially single-threaded (except of course for the multi-threading we choose to do inside the JS engine)?
Comment 6 Andreas Gal :gal 2011-04-13 00:11:49 PDT
That's the plan.
Comment 7 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-13 00:16:15 PDT
(In reply to comment #3)
> Though, actually, I don't even know if and when this bug should be fixed, or
> is this all prototyping-like code?

This is prototyping, yes. We thought workers would be a nice self-contained world that we could experiment with to see what the IDL->JSAPI conversion would look like. Automation is most definitely the next step once that is all agreed upon.
Comment 8 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-13 00:20:15 PDT
(In reply to comment #5)
> (I should have probably asked this before all the cheering, but, just to be
> clear) will this let us make JSRuntimes officially single-threaded (except of
> course for the multi-threading we choose to do inside the JS engine)?

There's another use in IndexedDB that I just remembered today... Basically we need the ability to modify a structured clone byte stream to add a property rather than the deserialize+modify+reserialize procedure that we do right now for autoIncrement objectStore data.
Comment 9 neil@parkwaycc.co.uk 2011-04-13 02:18:36 PDT
(In reply to comment #1)
> XHR.
How are you doing events/callbacks without XPConnect?
Comment 10 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-13 03:34:51 PDT
Oh, right. That is a good question.
Does the patch support both
addEventListener("foo", function(evt) {}, false)
and
addEventListener("foo", { handleEvent: function(evt) {}}, false)
Comment 11 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-13 08:08:53 PDT
(In reply to comment #9)
> (In reply to comment #1)
> > XHR.
> How are you doing events/callbacks without XPConnect?

Events and listeners are done in JSAPI (see Events.[h|cpp] and ListenerManager.[h|cpp] in the patch).

We may have to use XPConnect on the main thread for security or to make other DOM classes happy but I haven't cared too much at this point. It will need some security review.
Comment 12 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-13 08:09:38 PDT
(In reply to comment #10)
> Oh, right. That is a good question.
> Does the patch support both
> addEventListener("foo", function(evt) {}, false)
> and
> addEventListener("foo", { handleEvent: function(evt) {}}, false)

No, currently it only supports the first. I can add the second one easily though.
Comment 13 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-13 13:32:18 PDT
Oh, the mainthread part really should use nsDOMEventTargetHelper (if it doesn't use it already)
Comment 14 Luke Wagner [:luke] 2011-04-15 11:43:03 PDT
Ben: do you have any plans to land the JSRuntime work early?  If so, as witnessed by the two threads of discussion, perhaps this bug should be cloven?
Comment 15 Luke Wagner [:luke] 2011-04-15 11:51:20 PDT
*** Bug 621205 has been marked as a duplicate of this bug. ***
Comment 16 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-15 12:16:46 PDT
(In reply to comment #14)
> Ben: do you have any plans to land the JSRuntime work early?  If so, as
> witnessed by the two threads of discussion, perhaps this bug should be cloven?

I don't think I can really split this... The new runtime/thread per worker is too tightly integrated into the new lifetime model.
Comment 17 Mike Moening 2011-04-15 14:38:13 PDT
Does this prevent embedders from using multiple threads & contexts per JSRuntime?
I sure hope not.
Comment 18 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-15 14:47:10 PDT
(In reply to comment #17)
> Does this prevent embedders from using multiple threads & contexts per
> JSRuntime?

This bug is only about changing the way that DOM workers are implemented. Changes to the JS engine may happen elsewhere.
Comment 19 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-04-28 18:12:11 PDT
Created attachment 529009 [details] [diff] [review]
Snapshot 2

Lots of fixes, adds timeouts. XHR is next.

Oh, tried lots of different approaches but couldn't reuse smaug's nsDOMEventTargetHelper. We figured out that we don't really need it.
Comment 20 Boris Zbarsky [:bz] 2011-04-28 18:28:08 PDT
Why the "// Fudge a little..." bit when firing timers?

Also, should there not be a 4ms nested timeout clamp here?
Comment 21 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-05-11 14:19:04 PDT
Created attachment 531744 [details] [diff] [review]
Snapshot 3

This adds XHR to the mix. Mostly done now, just tying up some loose ends.
Comment 22 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-05-11 14:22:57 PDT
Comment on attachment 531744 [details] [diff] [review]
Snapshot 3

Blake, how does the JSAPI usage look?
Comment 23 Andreas Gal :gal 2011-05-11 14:25:34 PDT
++bent
Comment 24 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-05-16 14:16:12 PDT
So it is still unclear to me do we want to land this or do we use the
patch as a example what kind of code the becoming tool should generate?
And what is the timeframe for all this?

We should be adding new APIs workers now, but if all that code will be
rewritten very soon, it may not make sense to implement anything new
using the current workers code.
Comment 25 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-05-24 16:16:22 PDT
Created attachment 534930 [details] [diff] [review]
Snapshot 4

This is almost done, just working through a few test failures.
Comment 26 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-05-30 23:39:51 PDT
Anyone who wants to follow along or play with the new workers can clone https://hg.mozilla.org/users/bturner_mozilla.com/new-web-workers
Comment 27 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-05 11:44:47 PDT
Created attachment 537467 [details] [diff] [review]
Patch, v1

This is ready for review, all tests pass.
Comment 28 Brendan Eich [:brendan] 2011-06-05 12:27:13 PDT
(In reply to comment #11)
> We may have to use XPConnect on the main thread for security or to make
> other DOM classes happy but I haven't cared too much at this point. It will
> need some security review.

Do not kick the security can down the road. Security properties are not modular, they cross-cut and you cannot "add them back in later".

What's being skipped right now, that comment 11 alludes to here without saying enough?

/be
Comment 29 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-05 20:49:59 PDT
*** Bug 627550 has been marked as a duplicate of this bug. ***
Comment 30 Luke Wagner [:luke] 2011-06-09 17:10:43 PDT
Comment on attachment 537467 [details] [diff] [review]
Patch, v1

Review of attachment 537467 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsvector.h
@@ +299,5 @@
> +    Vector &operator=(const Vector &other) {
> +      resizeUninitialized(other.length());
> +    }
> +
> +    Vector(const Vector &);

Was this change intentional?  operator= looks wrong and the cctor is still undefined.  Could you revert these changes and, if you need to clone a vector, make an external "bool Clone(const Vector &src, Vector *dst)" function?
Comment 31 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-09 19:33:38 PDT
(In reply to comment #30)
> ::: js/src/jsvector.h

Weird, I have no idea where that came from. Totally unnecessary, consider it nuked from orbit!
Comment 32 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-09 19:41:15 PDT
(In reply to comment #28)
> What's being skipped right now, that comment 11 alludes to here without
> saying enough?

Nothing is being skipped now, to my knowledge.

Back when I wrote comment 11 there was some concern about firing DOM events on the main thread without using smaug's nsDOMEventTargetHelper. It's extremely difficult to use that helper for workers since it involves the cycle collector and some rather confusing private-ish IDL DOM interfaces. After trying several different ways to incorporate it I gave up.

We've talked about it a bunch since then and we're pretty convinced that the workaround here (specifically, always pushing the correct JSContext on the main thread's stack and checking for navigated windows) is safe.
Comment 33 Blake Kaplan (:mrbkap) 2011-06-13 15:43:19 PDT
As an update: I'm about halfway through reviewing the JSAPI stuff. I've only found nits so far... There's a few things I'm going to re-check, but overall, this looks pretty good.
Comment 34 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-06-18 09:48:02 PDT
Question, if we're going to have one OS thread per worker, how are we 
going to handle the case when web pages create lots of workers.
IIRC there was a message about the problem long ago in whatwg, and 
Chrome couldn't handle such case because it mapped
worker<->process or worker<->thread.
Comment 35 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-18 10:08:52 PDT
(In reply to comment #34)

The solution implemented in this patch is to provide up to 20 workers per domain (tld+0). This limit is tweakable via a preference. After that workers are queued and will not run until another finishes. This limit includes sub-workers.

Chrome is exempted and always gets a real thread, regardless of whether it is creating a Worker or ChromeWorker.
Comment 36 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-19 15:32:49 PDT
That is unfortunate. I can easily imagine people doing audio processing wanting to have one worker per processing node (small independent heaps keeps GC pause times low). Is there any hope of getting back to our current thread-pool model?
Comment 37 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-20 07:45:56 PDT
(In reply to comment #36)
> Is there any hope of getting back to our current thread-pool model?

It all depends on how what the JS folks think. The current model causes problems because the JIT and compartment structures are supposed to be per-thread, so moving workers back and forth between different threads was killing caches. I think the code needed to support such a model was also getting complex. I'm not entirely sure that my description here is accurate, luke would probably need to weigh in, but I know they have all been quite excited to have this stuff move to be single threaded.

This patch splits things so much that each thread now has its own runtime. I'm not sure if that makes things harder or easier. But if the JS engine can handle the threadpool approach then I think we could move back to that over the dedicated threads that we have now.
Comment 38 Mike Moening 2011-06-20 07:55:37 PDT
Does my question in #17 and the answer in #18 still hold true? This talk has me worried for those of use who really use multiple JS threads and contexts in a pooled setup...

And another stupid question, why do you need to have 1 runtime per thread? That just seems wasteful.
Comment 39 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-20 08:19:01 PDT
(In reply to comment #38)
> Does my question in #17 and the answer in #18 still hold true? 

Yes. I don't know what the JS folks are planning.

> And another stupid question, why do you need to have 1 runtime per thread?

Shared runtimes means shared GC. This way all workers are as independent as the OS can make them.
Comment 40 Mike Moening 2011-06-20 08:21:51 PDT
>> Shared runtimes means shared GC.

Not necessarily.  I thought compartment level GC was coming very soon.
Comment 41 Andrew McCreight [:mccr8] 2011-06-20 08:26:37 PDT
Per-compartment GC is already in, but global GCs are still occasionally required, because compartments can contain references to other compartments.
Comment 42 Luke Wagner [:luke] 2011-06-20 10:21:52 PDT
What is important to JS that a compartment/runtime is only accessed by a single thread *at a time*.  Thus, it should be possible to multiplex N workers' runtimes across M threads as long as you do something analogous to what is currently done with JS_SetContextThread.

MikeM: The bug for the JS changes to make JSRuntime single-threaded is bug 650411.  Note that the title/locking code for concurrent access to objects has been removed as of (FF4.0) so if you are using a single object from multiple threads without some extended form of a cross-compartment wrapper that handles synchronization you are breaking a JS engine invariant and should expect bad things to happen.
Comment 43 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 11:45:59 PDT
(In reply to comment #36)
> Is there any hope of getting back to our current thread-pool model?

Can we please not do that?  In the current model, if a page creates a Worker, there's no guarantee the Worker ever runs because there's a limit on the number of OS threads allocated to all Workers across all tabs, and there's no preemption mechanism.  I'm not a fan of the magical limit of 20 OS threads per domain, but it's definitely a step in the right direction IMHO.  Any model in which we run workers using an nsThreadPool-style mechanism instead of handing them an OS thread is a model in which infinite DoS is possible, because of lack of preemption.  I agree that per-domain pools would be superior to a global pool, but OS-thread-per-worker is superior to per-domain pools.

What are the goals of the audio-processing code you're imagining?  (Limiting GC pause times is a mechanism to achieve some other goal.)
Comment 44 Igor Bukanov 2011-06-20 13:10:46 PDT
(In reply to comment #43)
> In the current model, if a page creates a
> Worker, there's no guarantee the Worker ever runs because there's a limit on
> the number of OS threads allocated to all Workers across all tabs, and
> there's no preemption mechanism.  I'm not a fan of the magical limit of 20
> OS threads per domain, but it's definitely a step in the right direction
> IMHO.  Any model in which we run workers using an nsThreadPool-style
> mechanism instead of handing them an OS thread is a model in which infinite
> DoS is possible, because of lack of preemption.

Well, we can implement own scheduling. But that would mean that we completely ignore the services provided by the OS. So one worker per native thread sounds right.

On the other hand with one JS runtime  per worker we would use at least 1MB per worker as 1MB is the minimal allocation unit for the GC.
Comment 45 Igor Bukanov 2011-06-20 13:13:42 PDT
(In reply to comment #41)
> Per-compartment GC is already in, but global GCs are still occasionally
> required, because compartments can contain references to other compartments.

Note that even per-compartment GC still stops all the threads. It just runs faster than a full GC it does not need to mark live objects from all the compartments.
Comment 46 Andreas Gal :gal 2011-06-20 13:21:15 PDT
This is all virtual memory. On a machine very many workers make sense, 500 workers or so should be no big deal. Lets not complicate matters until we have proof we actually need thread sharing.
Comment 47 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 16:17:22 PDT
(In reply to comment #44)
> On the other hand with one JS runtime  per worker we would use at least 1MB
> per worker as 1MB is the minimal allocation unit for the GC.

Urgh.

(In reply to comment #45)
> Note that even per-compartment GC still stops all the threads. It just runs
> faster than a full GC it does not need to mark live objects from all the
> compartments.

I assume it only stops all the threads participating in the main JSRuntime? I sure hope it doesn't stop my audio-processing Worker threads (with Ben's patches). Or do we need bug 650411 as well?

(In reply to comment #46)
> This is all virtual memory. On a machine very many workers make sense, 500
> workers or so should be no big deal. Lets not complicate matters until we
> have proof we actually need thread sharing.

Currently we use three threads per media element, and that definitely causes severe problems on some pages. We are doing a bunch of work to reduce that thread usage, e.g. in bug 592833 (problems with 350 threads in that bug). I would like to avoid hitting the same problem again when Web authors start doing audio processing in Workers.

(In reply to comment #43)
> Can we please not do that?  In the current model, if a page creates a
> Worker, there's no guarantee the Worker ever runs because there's a limit on
> the number of OS threads allocated to all Workers across all tabs, and
> there's no preemption mechanism.  I'm not a fan of the magical limit of 20
> OS threads per domain, but it's definitely a step in the right direction
> IMHO.  Any model in which we run workers using an nsThreadPool-style
> mechanism instead of handing them an OS thread is a model in which infinite
> DoS is possible, because of lack of preemption.

Safety limits on the number of OS threads is an orthogonal issue, and I actually don't care about it. We could even have a single threadpool with per-domain limits on the number of active OS threads. My main concern right now is VM exhaustion by well-behaved apps that simply want to create a few hundred Workers (each of which is idle most of the time, so the peak number of OS threads needed is much smaller). Once we have more data I may also get concerned about per-JSRuntime memory overhead.

> What are the goals of the audio-processing code you're imagining?  (Limiting
> GC pause times is a mechanism to achieve some other goal.)

Basically we've got data flowing through the audio graph, and at each non-input node we have a filter that repeatedly reads an audio buffer from each input and computes an output buffer. (State is often retained between invocations.) Some applications can tolerate high latency (large buffers and significant pause times during processing), but many applications require low latency (small buffers, frequent invocations and minimal pause times).

Low latency means we need to be off the main thread, which means in one or more Workers. The mapping of filters onto Workers is author-visible because the JS code might access Worker-global state, so we have to specify what that mapping is or let authors choose. In my proposal, authors can choose, but the most natural, simplest and best mapping from filters to Workers is one-to-one: filters usually have independent state, so minimizing the heap usage in each Worker means smaller pause times, and independent Workers means more flexible scheduling --- lower latency and better usage of multiple cores.
Comment 48 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 16:19:17 PDT
Having said all that, if we know how to do Workers in a threadpool and we just aren't doing it until we have concrete data justifying the complexity, I'm fine with that for now. I expect to have that data sooner than you expect, that's all :-). I'm just checking that we're not backing ourselves into a corner that will be hard to get out of.
Comment 49 Luke Wagner [:luke] 2011-06-20 16:33:51 PDT
(In reply to comment #44)
> On the other hand with one JS runtime  per worker we would use at least 1MB
> per worker as 1MB is the minimal allocation unit for the GC.

If this becomes an issue, we've discussed, as a GC internal implementation detail, doing something to share memory between separate runtimes on the same HW thread (I know we're talking about not multiplexing HW threads, I'm just saying if this became a problem).
Comment 50 Igor Bukanov 2011-06-20 16:35:08 PDT
(In reply to comment #47)
> I assume it only stops all the threads participating in the main JSRuntime?
> I sure hope it doesn't stop my audio-processing Worker threads (with Ben's
> patches). Or do we need bug 650411 as well?

The GC is per JSRuntime, so the GC from the main JSRuntime does not affect other threads if those have separated runtimes.
Comment 51 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 16:47:18 PDT
> (In reply to comment #43)
> My main concern right
> now is VM exhaustion by well-behaved apps that simply want to create a few
> hundred Workers (each of which is idle most of the time, so the peak number
> of OS threads needed is much smaller).

We can easily take away OS threads from idle workers.  That's another orthogonal problem.   Ben and I discussed that in another bug.

> 
> > What are the goals of the audio-processing code you're imagining?  (Limiting
> > GC pause times is a mechanism to achieve some other goal.)
> 
> Basically we've got data flowing through the audio graph, and at each
> non-input node we have a filter that repeatedly reads an audio buffer from
> each input and computes an output buffer. (State is often retained between
> invocations.) Some applications can tolerate high latency (large buffers and
> significant pause times during processing), but many applications require
> low latency (small buffers, frequent invocations and minimal pause times).
> 
> Low latency means we need to be off the main thread, which means in one or
> more Workers. The mapping of filters onto Workers is author-visible because
> the JS code might access Worker-global state, so we have to specify what
> that mapping is or let authors choose. In my proposal, authors can choose,
> but the most natural, simplest and best mapping from filters to Workers is
> one-to-one: filters usually have independent state, so minimizing the heap
> usage in each Worker means smaller pause times, and independent Workers
> means more flexible scheduling --- lower latency and better usage of
> multiple cores.

How do authors specify filters?  It's not clear to me why the API needs to be intertwined with Workers specifically, per se.  But I'd like to know more details.
Comment 52 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 16:54:54 PDT
If you take OS threads away from idle workers and make them available to other workers, how is that different from using a thread pool?

(In reply to comment #51)
> How do authors specify filters?  It's not clear to me why the API needs to
> be intertwined with Workers specifically, per se.  But I'd like to know more
> details.

Right now I'm proposing they specify filters by writing an "onprocessstream" event handler in a worker. If filters didn't retain state between invocations, and never needed to communicate with each other or the main thread, and we were sure filters would never need to use XHR or any other Worker features, we could do something different, like use a JS function with a clean global state every time. But none of those things are true. (E.g. it would make total sense to have a zero-input "filter" that streams audio via XHR and decodes it; people are already wanting to do this.)
Comment 53 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 17:17:24 PDT
(In reply to comment #52)
> If you take OS threads away from idle workers and make them available to
> other workers, how is that different from using a thread pool?

That is a kind of thread pool.  But I was careful to say *nsThreadPool*, which is our current implementation that you wanted to return to.  My personal reason for wanting worker-per-OS-thread is bug 530886, which is a plan to more intelligently schedule workers.  That kind of scheduling would be good for your a/v processing too, IIUC.  The goal is to intelligently distribute CPU time to possibly-infinitely-running OS threads, not to distribute dispatch of nsIRunnable as nsThreadPool abstracts over.  We could build that kind of scheduling on top of nsThreadPool, but nsThreadPool would only get in the way.

> (In reply to comment #51)
> > How do authors specify filters?  It's not clear to me why the API needs to
> > be intertwined with Workers specifically, per se.  But I'd like to know more
> > details.
> 
> Right now I'm proposing they specify filters by writing an "onprocessstream"
> event handler in a worker. If filters didn't retain state between
> invocations, and never needed to communicate with each other or the main
> thread, and we were sure filters would never need to use XHR or any other
> Worker features, we could do something different, like use a JS function
> with a clean global state every time. But none of those things are true.
> (E.g. it would make total sense to have a zero-input "filter" that streams
> audio via XHR and decodes it; people are already wanting to do this.)

OK.  Do you have a link to a WIP spec or examples?  Does the JS that sets up the processing graph need to explicitly pass references to Workers when building the nodes, or can they be created implicitly?  After the graph is created, can script ask for handles to the Workers backing particular nodes?  Maybe this is silly (I don't know yet), but if full Worker power isn't needed in the common case, I'd rather use a simpler mechanism that gave us more flexibility in implementation.
Comment 54 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 17:31:44 PDT
(In reply to comment #53)
> That is a kind of thread pool.  But I was careful to say *nsThreadPool*,
> which is our current implementation that you wanted to return to.

Great! I didn't want to imply we need our current implementation, but our current *model*, which is that workers can share OS threads. I don't care how that is done.

> > (In reply to comment #51)
> > Right now I'm proposing they specify filters by writing an "onprocessstream"
> > event handler in a worker. If filters didn't retain state between
> > invocations, and never needed to communicate with each other or the main
> > thread, and we were sure filters would never need to use XHR or any other
> > Worker features, we could do something different, like use a JS function
> > with a clean global state every time. But none of those things are true.
> > (E.g. it would make total sense to have a zero-input "filter" that streams
> > audio via XHR and decodes it; people are already wanting to do this.)
> 
> OK.  Do you have a link to a WIP spec or examples?  Does the JS that sets up
> the processing graph need to explicitly pass references to Workers when
> building the nodes, or can they be created implicitly?

It's explicit right now.

> After the graph is
> created, can script ask for handles to the Workers backing particular nodes?
> Maybe this is silly (I don't know yet), but if full Worker power isn't
> needed in the common case, I'd rather use a simpler mechanism that gave us
> more flexibility in implementation.

I wouldn't. Web authors and spec writers would legitimately ask "why aren't you just using the existing Workers instead of creating your own new, different thing that duplicates some of Workers?" And where would you draw the line among the features I outlined above for this new feature? For sure, wherever you draw the line is going to be unsatisfactory over time.

If it's at all possible to make Workers perform adequately for this task, we should do so.
Comment 55 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 17:40:46 PDT
(In reply to comment #54)
> (In reply to comment #53)
> > After the graph is
> > created, can script ask for handles to the Workers backing particular nodes?
> > Maybe this is silly (I don't know yet), but if full Worker power isn't
> > needed in the common case, I'd rather use a simpler mechanism that gave us
> > more flexibility in implementation.
> 
> I wouldn't. Web authors and spec writers would legitimately ask "why aren't
> you just using the existing Workers instead of creating your own new,
> different thing that duplicates some of Workers?" And where would you draw
> the line among the features I outlined above for this new feature? For sure,
> wherever you draw the line is going to be unsatisfactory over time.

If you have an API that takes an explicit Worker argument, having an overload that takes a filter script specifier, say, like the Worker ctor would, doesn't seem unreasonable, and might even be a nice author convenience.  We could implement that filter script by handing it to a Worker or some other, cleverer thing.  And if script tried to ask for the Worker backing that filter node, we could maintain that illusion somehow.

> If it's at all possible to make Workers perform adequately for this task, we
> should do so.

Sure, without significantly hurting other use cases.
Comment 56 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 17:45:17 PDT
(In reply to comment #55)
> If you have an API that takes an explicit Worker argument, having an
> overload that takes a filter script specifier, say, like the Worker ctor
> would, doesn't seem unreasonable, and might even be a nice author
> convenience.

Sure, I can add that in the hope that we can exploit it later, and for the convenience even if we don't.

> We could implement that filter script by handing it to a
> Worker or some other, cleverer thing.  And if script tried to ask for the
> Worker backing that filter node, we could maintain that illusion somehow.

We wouldn't need do that; if authors want the Worker reference, they can create the worker themselves.

It sounds incredibly hard to actually exploit this, though. You're thinking of static analysis on the script to determine if you need a full-fledged worker or can get away with something lighter-weight? I'd rather put the effort into making Workers lighter-weight :-).
Comment 57 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 18:00:49 PDT
(In reply to comment #56)
> It sounds incredibly hard to actually exploit this, though. You're thinking
> of static analysis on the script to determine if you need a full-fledged
> worker or can get away with something lighter-weight? I'd rather put the
> effort into making Workers lighter-weight :-).

Not necessarily.  The #1 goal I want to is to let the a/v effects impl use its own Worker backend if it wanted/needed to, and not regress "normal" Workers :).  Beyond that, there are lots of ways we could analyze (dynamically) and possibly optimize your (explicit!) processing graphs.  I'm not going to claim anything there is easy.
Comment 58 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 18:08:43 PDT
In what way would allowing Workers to share OS threads be "regressing" normal Workers?
Comment 59 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-06-20 18:15:56 PDT
That's not what I said, sorry if that seemed implied.  What I mean is, for example the old nsThreadPool-style Worker implementation might be fantastically great for a/v processing nodes, but it would be a step backwards from the work in this bug for generic Workers.  IMHO it should be possible have both worlds.
Comment 60 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 18:18:22 PDT
Ah yeah, sure.
Comment 61 Boris Zbarsky [:bz] 2011-06-20 20:24:41 PDT
Just in case it wasn't clear from roc's comments, and because I'm not sure what comment 46 was trying to say...

Virtual memory pressure is being a serious problem for us on all platforms except OS X 10.6+ and Linux64.  Several hundred megabytes of address space is a huge deal for us as long as we're stuck in 32-bit land.  :(
Comment 62 Andreas Gal :gal 2011-06-20 20:27:10 PDT
You are asking a machine to run 500+ simultaneous workers doing stuff. That will not be free in terms of physical memory and virtual address space. Maybe thats not a workload that will ever work in 32-bit machines.
Comment 63 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-06-20 20:36:16 PDT
There are scenarios, like A/V processing and others, where you can have a lot of workers but they're not all active simultaneously. Maybe you have a lot of simple filters that only need to run for a small fraction of each second, or you have a lot of video and audio elements with filters attached to them but only a subset of the elements are playing simultaneously.
Comment 64 Boris Zbarsky [:bz] 2011-06-20 20:46:04 PDT
Also, keep in mind that in our fatal-OOM world virtual memory exhaustion means automatic process crash.  So "not a workload that will ever work" means "a workload that will crash the browser".....

Further note that we're not just talking 32-bit machines, since we're not shipping 64-bit Windows builds yet.
Comment 65 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-06-23 16:18:26 PDT
RuntimeService::GetOrCreateService()
   459     if (!nsContentUtils::GetBoolPref(PREF_MAX_WORKERS_ENABLED)) {
   460       return nsnull;
   461     }

Move this to ResolveWorkerClasses and make it so that chrome-created workers are always allowed.


   463     nsRefPtr<RuntimeService> service = new RuntimeService();
   464     NS_ENSURE_TRUE(service, nsnull);

you can count on infallible malloc


Could we rename "origin" to "domain". It scares me to have something called "origin" which isn't the real full origin.

Also, when calculating the domain, add a explicit check that ensures that we don't end up with an empty-string domain. I.e. turn this assertion into a runtime check: NS_ASSERTION(isChrome || !origin.IsEmpty(), ...)


Please add a comment to RuntimeService::Cleanup() warning that it spins the event loop.


  1121       contextStack = nsContentUtils::ThreadJSContextStack();
  1122       if (contextStack) {
  1123         if (NS_FAILED(contextStack->Push(cx))) {
  1124           NS_WARNING("Failed to push context!");
  1125           contextStack = nsnull;
  1126         }
  1127       }
  1128       else {
  1129         NS_WARNING("Failed to get context stack!");
  1130       }

just assert that you have a contextstack instead of nullchecking.

For the methods on WorkerPrivate that never return null, lose the "Get" prefix on the name. Such as GetPrincipal and GetLocationInfo

please comment mRunningExpiredTimeouts

Add a comment above all the *Internal methods stating the relationship between Cancel/CancelInternal etc.


Rename UnregisterWorkerRunnable to TopLevelWorkerFinishedRunnable

Kill CloseEventRunnable::mFromTerminate. It's not used.

Change this comment
   526       // If the worker has been finalized nothing would be able to handle this
   527       // event. Bail out.
to
   526       // We don't want to fire message events on a worker that has been
   527       // finalized. Bail out.
Same in ErrorEventRunnable


Fire an error event on the worker global from the same runnable as outer-worker object.


  1153     // Call operation callback manually here before we start running JS. Don't
  1154     // let this block, though.

Change to say that we're doing this to allow control runnables run before running WorkerRun. Or remove the whole operationscallback call if possible.


  1519       if (!RootJSObject(aCx, aIncrease)) {
Change that to true/false as appropriate

in WorkerPrivateParent<Derived>::ModifyBusyCount
  1534       if (mParentStatus == Terminating) {
You should also cancel if we're in the closing state. Call Cancel rather than dispatching the runnable manually.


  2078       mCondVar.Wait(PR_MillisecondsToInterval(RemainingRunTimeMS()));
 Always infinite.
 
  2084     JS_ReportPendingException(aCx);
 JS_ClearPendingException(aCx); ?
 
Fix close handler timeout issues for each status.

  2369 WorkerPrivate::AddChildWorker
Don't do this from the close handler.

File a bug on making errors from error handlers report to the console.


  2951     TimeoutInfo*& info = expiredTimeouts[index];
  2952     if (info->mCanceled || !info->mIsInterval) {
  2953       mTimeouts.RemoveElementSorted(info, comparator);
  2954       continue;
  2955     }
Just do RemoveElement without comparator

  2957     PRUint32 timeoutIndex;
  2958     if (!mTimeouts.GreatestIndexLtEq(info, comparator, &timeoutIndex) ||
  2959         timeoutIndex == PRUint32(-1)) {
  2960       NS_ERROR("Should still be in the other list!");
  2961     }
Just use IndexOf as 0 is the far most likely index.


   143   if (!JS_ConvertArguments(aCx, aArgc, JS_ARGV(aCx, aVp), "So/bb", &type,
   144                            &listener, &capturing)) {
   145     return false;
   146   }
Missing wantsUntrusted.


ListenerManager.h
    60     Target,
Change this to "Onfoo" or "Bubblingonfoo". Might even want to rename "Phase" to "Type" if you want to be really correct.

Events being dispatched cannot be re-dispatched. Should throw.


ScriptLoader.cpp
   250     // Figure out which principal to use.
   251     nsIPrincipal* principal = mWorkerPrivate->GetPrincipal();
   252     if (!principal) {
   253       if (!parentWorker) {
   254         NS_ERROR("Must have a principal if this is not a subworker!");
   255       }
Change this to
NS_ASSERTION(parentWorker, "Must have a principal...");
NS_ASSERTION(mIsWorkerScript, "Must have a principal when doing importScripts");


   513     for (PRUint32 index = 0; index < mLoadInfos.Length(); index++) {
   514       ScriptLoadInfo& loadInfo = mLoadInfos[index];
   515 
   516       // Skip scripts that have already executed.
   517       if (loadInfo.mExecutionScheduled) {
   518         continue;
   519       }
   520 
   521       // If we still have a channel then the load is not complete.
   522       if (loadInfo.mChannel) {
   523         break;
   524       }
Seems simpler to me to write this as two loops. One that loops while mExecutionScheduled is true and then sets firstIndex. And one that loops while mChannel is null and then sets lastIndex.


   182     if (NS_FAILED(indexSupports->GetData(&index)) ||
   183         index >= mLoadInfos.Length()) {
   184       NS_ERROR("Bad index!");
   185       CancelMainThread();
   186       return NS_ERROR_FAILURE;
   187     }
Just assert that GetData always succeeds and never returns a bogus value.


   384     aLoadInfo.mChannel = nsnull;
If mChannel is already null, return NS_BINDING_ABORTED


Need readystate constants on XHR prototype


in XHRPrivate.cpp
  1090   nsresult (NS_STDCALL nsIDOMEventTarget::*function)
  1091     (const nsAString&, nsIDOMEventListener*, PRBool) =
  1092       aAdd ? &nsIDOMEventTarget::AddEventListener :
  1093              &nsIDOMEventTarget::RemoveEventListener;
Could you use the NS_STDCALL_FUNCPROTO macro so that once we remove stdcall things will keep working.


   453   EventRunnable(Proxy* aProxy, bool aUploadEvent, bool aSnapshotXHRState,
   454                 const nsString& aType, bool aLengthComputable, PRUint64 aLoaded,
   455                 PRUint64 aTotal)
Remove the aSnapshotXHRState parameter from both ctors.


  1366   nsRefPtr<AbortRunnable> runnable = new AbortRunnable (mWorkerPrivate, mProxy);
holy hell, what crap is this!?! We have certain standards here you know!


File a bug on making sync xhr throw if there's a network error


  1658     if (aFromOpen) {
  1659       if (!DispatchPrematureAbortEvent(aCx, target, STRING_abort, false) ||
  1660           !DispatchPrematureAbortEvent(aCx, target, STRING_loadend, false)) {
  1661         return false;
  1662       }
  1663     }

Remove the aFromOpen test (and the whole argument)
Comment 66 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-06-23 16:19:48 PDT
Comment on attachment 537467 [details] [diff] [review]
Patch, v1

r=me with those changes on the non JS-objects
Comment 67 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-24 14:00:30 PDT
Created attachment 541787 [details] [diff] [review]
Error propagation changes, v1

Error changes we discussed.
Comment 68 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-27 21:47:33 PDT
*** Bug 664650 has been marked as a duplicate of this bug. ***
Comment 69 Blake Kaplan (:mrbkap) 2011-06-28 15:03:41 PDT
Comment on attachment 537467 [details] [diff] [review]
Patch, v1

Review of attachment 537467 [details] [diff] [review]:
-----------------------------------------------------------------

Please fix the readonly setter behavior that we talked about.

::: dom/workers/EventTarget.cpp
@@ +199,5 @@
> +                                            &preventDefaultCalled)) {
> +    return false;
> +  }
> +
> +  JS_SET_RVAL(aCx, aVp, preventDefaultCalled ? JSVAL_TRUE : JSVAL_FALSE);

BOOLEAN_TO_JSVAL(preventDefaultCalled)

::: dom/workers/Events.cpp
@@ +120,5 @@
> +      JS_ASSERT(false);
> +      return false;
> +    }
> +
> +    return canceled == JSVAL_TRUE ? true : false;

Can you assume that canceled is a boolean here? If so, you could use JSVAL_TO_BOOLEAN.

@@ +240,5 @@
> +    JS_ASSERT(JSID_IS_INT(aIdval));
> +
> +    int32 slot = JSID_TO_INT(aIdval);
> +
> +    const char*& name = sProperties[slot - SLOT_FIRST].name;

Why the reference to a pointer here?

@@ +291,5 @@
> +    }
> +
> +    JS_ASSERT(cancelableVal == JSVAL_TRUE || cancelableVal == JSVAL_FALSE);
> +
> +    return cancelableVal == JSVAL_TRUE ?

Just use JSVAL_TO_BOOLEAN here. You can nuke the assert at the same time (since JSVAL_TO_BOOLEAN does that for you).

::: dom/workers/Exceptions.cpp
@@ +110,5 @@
> +    int32 slot = JSID_TO_INT(aIdval);
> +
> +    JSClass* classPtr = JS_GET_CLASS(aCx, aObj);
> +
> +    if (classPtr != &sClass ||

So, this means that if I have an exception on my prototype chain, I won't be able to get properties off of it? That's different behavior from what XPConnect does, but we've been talking recently of changing that. So I'd leave it this way.

::: dom/workers/ListenerManager.cpp
@@ +265,5 @@
> +ListenerManager::SetEventListener(JSContext* aCx, JSString* aType,
> +                                  jsval aListener)
> +{
> +  bool isNull = JSVAL_IS_VOID(aListener) || JSVAL_IS_NULL(aListener);
> +  if (isNull) {

isNull is single use, and it isn't really clear what's happening here. Can you comment at least and explain what the difference between null and undefined is (or is that a bug)?

::: dom/workers/WorkerPrivate.cpp
@@ +676,5 @@
> +      }
> +    }
> +    else {
> +      NS_ASSERTION(aWorkerPrivate == GetWorkerPrivateFromContext(aCx),
> +             "Badness!");

Nit: bad indentation.

@@ +680,5 @@
> +             "Badness!");
> +      target = JS_GetGlobalObject(aCx);
> +    }
> +
> +    NS_ASSERTION(target, "This should never be nulL!");

nulL :)
Comment 70 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-02 20:00:41 PDT
Comment on attachment 541787 [details] [diff] [review]
Error propagation changes, v1

Jonas! Review pretty please!
Comment 71 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-06 14:28:48 PDT
Created attachment 544343 [details] [diff] [review]
Close handler changes, v1

Here are the changes to the close handlers. Now everything works the way we expect.
Comment 72 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-07-07 19:15:15 PDT
Comment on attachment 541787 [details] [diff] [review]
Error propagation changes, v1

Review of attachment 541787 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +734,5 @@
> +    }
> +
> +    // Now fire an event at the global object, but don't do that if the error
> +    // code is too much recursion and this is the same script threw the error.
> +    if (aFireAtScope && (aTarget || aErrorNumber != JSMSG_OVER_RECURSED)) {

I don't understand the JSMSG_OVER_RECURSED exception here. Isn't it that we want to avoid firing errors that originate from the error handler?

Or are you just letting those recurse until we reach JSMSG_OVER_RECURSED?

(Sorry, i forget what strategy we went for here).
Comment 73 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-07 19:35:37 PDT
(In reply to comment #72)
> I don't understand the JSMSG_OVER_RECURSED exception here. Isn't it that we
> want to avoid firing errors that originate from the error handler?

No, this is totally separate. This is for when the JS engine itself runs out of internal stack-ish space (like 'function foo() { foo(); } foo();'). In that case we don't want to run the scope's onerror handler, we just want to notify the parent thread.

Final strategy was that errors from within the scope's onerror handler get posted to the parent thread immediately and we abort the onerror handler.
Comment 74 Luke Wagner [:luke] 2011-07-07 19:46:11 PDT
On a side note, with http://hg.mozilla.org/mozilla-central/rev/28be8df0deb7, content has less stack than chrome so, even after it hits the end-of-stack, chrome still has 16K left.
Comment 75 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-07-07 20:05:02 PDT
Comment on attachment 541787 [details] [diff] [review]
Error propagation changes, v1

Review of attachment 541787 [details] [diff] [review]:
-----------------------------------------------------------------

Cool
Comment 76 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-07-11 22:21:33 PDT
Comment on attachment 544343 [details] [diff] [review]
Close handler changes, v1

Review of attachment 544343 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerFeature.h
@@ +59,5 @@
> + * | Terminating |     yes     |       yes       |         none          |
> + * +-------------+-------------+-----------------+-----------------------+
> + * |  Canceling  |     yes     |       yes       |    short duration     |
> + * +-------------+-------------+-----------------+-----------------------+
> + * |   Killing   |     yes     |       yes       |        instant        |

Maybe change "close handler timeout" to "close handler", change "none" to "no timeout" and "instant" to "doesn't run".

Up to you

@@ +90,5 @@
> +  Canceling,
> +
> +  // The application is shutting down. Setting this status causes the worker to
> +  // abort immediately and the close handler is never scheduled.
> +  Killing,

Maybe change "Killing" to "Shutdown"
Comment 77 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-12 09:43:26 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/7c1a923bea13
Comment 78 :Ehsan Akhgari 2011-07-12 11:27:19 PDT
I backed it out of inbound because of mochitest-3 and mochitest-oth oranges.
Comment 79 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-12 12:14:36 PDT
(In reply to comment #78)
> I backed it out of inbound because of mochitest-3 and mochitest-oth oranges.

This is bug 671015, multiple threads GCing somehow tripped it.
Comment 80 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-15 20:27:34 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/e4b7f7fce83a

And needed this to fix test timeouts:

http://hg.mozilla.org/integration/mozilla-inbound/rev/8bf078002768

I filed bug 672015 to fix that.
Comment 81 Joe Drew (not getting mail) 2011-07-16 18:43:49 PDT
This (along with most things committed on Friday afternoon) was backed out of mozilla-inbound in order to clear up orange.
Comment 82 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-16 18:59:23 PDT
(In reply to comment #81)
> This (along with most things committed on Friday afternoon) was backed out
> of mozilla-inbound in order to clear up orange.

Joe, really? I don't see an entry in the pushlog, and the files are still there as far as I can tell... Am I missing something?
Comment 83 Joe Drew (not getting mail) 2011-07-16 19:08:17 PDT
My push failed, but I hadn't noticed. It's pushed now!
Comment 84 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-07-16 19:23:53 PDT
fwiw, this appeared to be causing permaorange on Linux Moth

++DOMWINDOW == 559 (0xc17c8f0) [serial = 1196] [outer = 0x9c537e0]
JS API usage error: found live context at 0xcbbb328
JS API usage error: 1 context left in runtime upon JS_DestroyRuntime.
Assertion failure: backgroundFinalizeState == BFS_DONE, at ../../../js/src/jsgc.h:757
WARNING: shutting down early because of crash!: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 696
WARNING: plugin process _exit()ing: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 688
NEXT ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/components/viewsource/test/test_428653.xul | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:09:19.196219
INFO | automation.py | Reading PID log: /tmp/tmpDdp0Q9pidlog
==> process 2078 launched child process 2111
==> process 2078 launched child process 2118
==> process 2078 launched child process 2120
==> process 2078 launched child process 2122
==> process 2078 launched child process 2124
INFO | automation.py | Checking for orphan process with PID: 2111
INFO | automation.py | Checking for orphan process with PID: 2118
INFO | automation.py | Checking for orphan process with PID: 2120
INFO | automation.py | Checking for orphan process with PID: 2122
INFO | automation.py | Checking for orphan process with PID: 2124
PROCESS-CRASH | chrome://mochitests/content/chrome/toolkit/components/viewsource/test/test_428653.xul | application crashed (minidump found)
Crash dump filename: /tmp/tmp_PaGj9/minidumps/0cbd9e93-eb1c-7fff-32109a22-63423ed1.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGABRT
Crash address: 0x81e

Thread 28 (crashed)
 0  linux-gate.so + 0x424
    eip = 0x00eea424   esp = 0xa38ff090   ebp = 0xa38ff0a8   ebx = 0x0000081e
    esi = 0xa38ff288   edi = 0x00c7cff4   eax = 0x00000000   ecx = 0x00000864
    edx = 0x00000006   efl = 0x00200206
    Found by: given as instruction pointer in context
 1  libxul.so!JS_Assert [jsutil.cpp : 89 + 0xb]
    eip = 0x02d5ef95   esp = 0xa38ff0b0   ebp = 0xa38ff0d8
    Found by: previous frame's frame pointer
 2  libxul.so!js::gc::ArenaList::releaseAll [jsgc.h : 757 + 0x29]
    eip = 0x02c80b7e   esp = 0xa38ff0e0   ebp = 0xa38ff108   ebx = 0x03591544
    Found by: call frame info
 3  libxul.so!JSCompartment::finishArenaLists [jsgc.cpp : 315 + 0x20]
    eip = 0x02c7b0da   esp = 0xa38ff110   ebp = 0xa38ff138   ebx = 0x03591544
    Found by: call frame info
 4  libxul.so!js_FinishGC [jsgc.cpp : 947 + 0xa]
    eip = 0x02c7bd4a   esp = 0xa38ff140   ebp = 0xa38ff178   ebx = 0x03591544
    Found by: call frame info
 5  libxul.so!JSRuntime::~JSRuntime [jsapi.cpp : 729 + 0xa]
    eip = 0x02be8da7   esp = 0xa38ff180   ebp = 0xa38ff1a8   ebx = 0x03591544
    Found by: call frame info
 6  libxul.so!js::Foreground::delete_<JSRuntime> [jsutil.h : 496 + 0x10]
    eip = 0x02c10943   esp = 0xa38ff1b0   ebp = 0xa38ff1c8   ebx = 0x03591544
    Found by: call frame info
 7  libxul.so!JS_Finish [jsapi.cpp : 799 + 0xa]
    eip = 0x02bf571f   esp = 0xa38ff1d0   ebp = 0xa38ff1e8   ebx = 0x03591544
    Found by: call frame info
 8  libxul.so!WorkerThreadRunnable::Run [RuntimeService.cpp : 327 + 0xa]
    eip = 0x01c8cc5e   esp = 0xa38ff1f0   ebp = 0xa38ff238   ebx = 0x03591544
    Found by: call frame info
 9  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 617 + 0x16]
    eip = 0x0281f824   esp = 0xa38ff240   ebp = 0xa38ff2c8   ebx = 0x03591544
    Found by: call frame info
10  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 245 + 0x1f]
    eip = 0x027b8dc7   esp = 0xa38ff2d0   ebp = 0xa38ff308   ebx = 0x03591544
    esi = 0x00000000   edi = 0x003d0f00
    Found by: call frame info
11  libxul.so!nsThread::ThreadFunc [nsThread.cpp : 272 + 0x12]
    eip = 0x0281e6c0   esp = 0xa38ff310   ebp = 0xa38ff358   ebx = 0x03591544
    esi = 0x00000000   edi = 0x003d0f00
    Found by: call frame info
12  libnspr4.so!_pt_root [ptthread.c : 187 + 0x10]
    eip = 0x00161b0b   esp = 0xa38ff360   ebp = 0xa38ff388   ebx = 0x0016f460
    esi = 0xa38ffb70   edi = 0x003d0f00
    Found by: call frame info
13  libpthread-2.11.so + 0x5ab4
    eip = 0x00c6bab5   esp = 0xa38ff390   ebp = 0xa38ff488   ebx = 0x00c7cff4
    esi = 0xa38ffb70   edi = 0x003d0f00
    Found by: call frame info
14  libc-2.11.so + 0xda83d
    eip = 0x00bc283e   esp = 0xa38ff490   ebp = 0x00000000
    Found by: previous frame's frame pointer

http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1310847060.1310850338.28464.gz&fulltext=1#err0
Comment 85 Andreas Gal :gal 2011-07-16 22:20:55 PDT
gregor, do you recognize that assert? sounds like your code
Comment 86 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-17 10:00:13 PDT
(In reply to comment #84)
> fwiw, this appeared to be causing permaorange on Linux Moth

This is bug 666963. My workaround wasn't sufficient it seems. I'll reland with a better workaround.
Comment 87 Gregor Wagner [:gwagner] 2011-07-17 10:11:13 PDT
(In reply to comment #85)
> gregor, do you recognize that assert? sounds like your code

This assertion says that the background thread is still finalizing arenas and we want to shut-down the runtime and free all arenas. Thats bad. There is probably a waitBackgroundSweepEnd missing somewhere.

Usually we don't finalize in the background for the last-context GC. Did something change there? Is the last GC not a Last-context GC any more?
Comment 88 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-17 11:38:58 PDT
(In reply to comment #87)
> Usually we don't finalize in the background for the last-context GC. Did
> something change there? Is the last GC not a Last-context GC any more?

It's not really the last context, ctypes stashes one context in a reserved slot on its prototype object. Bug 666963. It's lame.
Comment 89 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-17 12:13:58 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/fdddabd345b9

Better workaround:

http://hg.mozilla.org/integration/mozilla-inbound/rev/3857a4309fc3
Comment 90 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-17 12:31:56 PDT
(In reply to comment #87)
> This assertion says that the background thread is still finalizing arenas
> and we want to shut-down the runtime and free all arenas. Thats bad. There
> is probably a waitBackgroundSweepEnd missing somewhere.

Yeah, that sounds bugworthy for sure.
Comment 92 Hiroyuki Ikezoe (:hiro) 2011-07-17 23:46:17 PDT
Created attachment 546469 [details] [diff] [review]
Follow up fix
Comment 93 Andreas Gal :gal 2011-07-17 23:57:16 PDT
Comment on attachment 546469 [details] [diff] [review]
Follow up fix

Stealing.
Comment 94 :Ehsan Akhgari 2011-07-18 15:17:33 PDT
So, this has caused at least three new regressions showing up as intermittent mochitest-3 oranges on Windows.  Of these three, bug 672229 and bug 672370 look pretty serious to me.

Kyle tells me that Ben is on vacation today.  I know that it's mean to back somebody out when they're on vacation, but I really don't feel comfortable for this code to live in the tree because of the possible implications of these bugs, so I've backed it out of m-c.  Sorry bent.  :(

http://hg.mozilla.org/mozilla-central/rev/70ac1a0bc4ab
Comment 95 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-25 21:39:13 PDT
Landed on m-i with a fix. I'll post the interdiff in a sec.

http://hg.mozilla.org/integration/mozilla-inbound/rev/f631e1b0e296
Comment 96 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-25 21:55:41 PDT
Created attachment 548366 [details] [diff] [review]
Final patch
Comment 97 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-25 21:57:42 PDT
Created attachment 548367 [details] [diff] [review]
Crash fix
Comment 98 Marco Bonardo [::mak] 2011-07-26 04:05:02 PDT
http://hg.mozilla.org/mozilla-central/rev/f631e1b0e296
Comment 99 Gabriel Levy 2011-08-30 12:03:07 PDT
If I am reading this correctly, creating ChromeWorkers using nsiWorkerFactory is no longer supported. The ChromeWorker docs on MDN should probably be updated to reflect this change. In the meantime, can anyone offer advice on how to create ChromeWorkers in a bootstrapped extension now? Simply calling "new ChromeWorker" as I see in the tests gives:

Warning: WARN addons.manager: Exception calling callback: ReferenceError: ChromeWorker is not defined
Comment 100 Stefan Sitter 2011-09-02 00:15:11 PDT
FYI: This change broke the Lightning calendaring extension (Bug 678343) too.
Comment 101 Eric Shepherd [:sheppy] 2011-09-29 07:29:38 PDT
Can anyone please comment on comment #99 and let me know what the docs need to say about this issue?
Comment 102 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-09-29 07:32:08 PDT
Bootstrapped extensions presumably aren't running in a DOM scope, so they don't see the ChromeWorker ctor.  We should probably put that back ...

bent?
Comment 103 Jorge Villalobos [:jorgev] 2011-09-29 10:27:42 PDT
This was brought up a while ago in the blog. After some delays on my part (sorry) and some testing, we have confirmed that this is a real bug. I filed bug 690418.
Comment 104 Eric Shepherd [:sheppy] 2011-10-18 07:13:16 PDT
I'm still unclear as to what the documentation impact here is. This seems to be a refactoring of a functionality that already existed rather than anything new, per se. Can anyone clarify for me? Thanks.
Comment 105 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-10-18 08:36:34 PDT
(In reply to Eric Shepherd [:sheppy] from comment #104)
> I'm still unclear as to what the documentation impact here is. This seems to
> be a refactoring of a functionality that already existed rather than
> anything new, per se. Can anyone clarify for me? Thanks.

I think the only thing that needs to be documented is that nsIWorkerFactory is now gone. If you want to create a worker you use the normal DOM syntax and call 'new Worker()' or 'new ChromeWorker()'.
Comment 106 Eric Shepherd [:sheppy] 2011-10-18 08:45:10 PDT
Documentation updated:

https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWorkerFactory

Other bugs have already covered that you can do new ChromeWorker and new Worker, so this is done.

Firefox 8 for developers is also updated.
Comment 108 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2011-11-16 05:33:26 PST
(In reply to Eric Shepherd [:sheppy] from comment #104)
> I'm still unclear as to what the documentation impact here is. This seems to
> be a refactoring of a functionality that already existed rather than
> anything new, per se. Can anyone clarify for me? Thanks.

Actually, the real impact is for people who have just lost the ability to XPConnect from a chrome thread. If the tool mentioned by Andreas in comment #4 exists, or is coming, or if there is a way to interact with the platform from a chrome worker, it really should be explained somewhere.

Otherwise we need to, er, apologize to people who have just lost the only tool they had to remove blocking IO from the main thread.
Comment 109 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-11-16 16:45:39 PST
That tool does not exist yet, and probably won't exist quickly enough that it's something we should tell people to hinge any hope on.

Note that you can still do blocking IO on worker threads by simply using the standardized FileReaderSync API.

That seems worth mentioning in docs.

If you are chrome code, you can simply call:

f = new File("/path/to/file");  // I'm not 100% sure of the syntax for the path here
worker.postMessage(f);

and then inside the worker do something like:

onmessage = function(e) {
  var fr = new FileReaderSync;
  fileContents = fr.readAsText(e.data);
}
Comment 110 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2011-11-16 23:12:16 PST
(In reply to Jonas Sicking (:sicking) from comment #109)
> That tool does not exist yet, and probably won't exist quickly enough that
> it's something we should tell people to hinge any hope on.
> 
> Note that you can still do blocking IO on worker threads by simply using the
> standardized FileReaderSync API.

This will help for some tasks. I will also try to get my JSAPI-based File API in a usable shape for, well, most other IO tasks.
Comment 111 rsjtdrjgfuzkfg (Dirk Steinmetz) 2012-04-05 19:24:53 PDT
Is there a way to have something like the old dispatch with JS-nsIRunnables or are there plans to implement it again? As I personally liked that concept, and dislike to use native code.

It sounds to me like it is not possible to access XPCOM from threads anymore without using native code, is that correct? Note that I don't speak of any XPCOM sharing, I'd just like to use XPCOM interfaces from within a JS-thread, to communicate with a local application via TCP.

I have to agree that even for my usecase there are alternatives (using a StreamPump), but they're worse than a multithreaded solution imho (as there are synchronous requests, too, which rely on equal functions which would need to get re-written for the one case if no synchronous access is possible without blocking the main thread).
Comment 112 David Rajchenbach-Teller [:Yoric] (please use "needinfo") 2012-04-06 01:25:46 PDT
(In reply to rsjtdrjgfuzkfg from comment #111)
> Is there a way to have something like the old dispatch with JS-nsIRunnables
> or are there plans to implement it again? As I personally liked that
> concept, and dislike to use native code.
> 
> It sounds to me like it is not possible to access XPCOM from threads anymore
> without using native code, is that correct? Note that I don't speak of any
> XPCOM sharing, I'd just like to use XPCOM interfaces from within a
> JS-thread, to communicate with a local application via TCP.
> 
> I have to agree that even for my usecase there are alternatives (using a
> StreamPump), but they're worse than a multithreaded solution imho (as there
> are synchronous requests, too, which rely on equal functions which would
> need to get re-written for the one case if no synchronous access is possible
> without blocking the main thread).

That will certainly require a new library on top of js-ctypes, possibly with a design comparable to that of OS.File.

Note You need to log in before you can comment on or make changes to this bug.