Last Comment Bug 437152 - (workerthreads) implement worker threads
(workerthreads)
: implement worker threads
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86 All
: P1 normal with 2 votes (vote)
: ---
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
: Andrew Overholt [:overholt]
Mentors:
http://wiki.mozilla.org/DOMWorkerThreads
Depends on: 447669 448007 468065 419091 442708 443869 443870 443871 443874 443877 443878 444880 445828 447711 459503 468045 483633
Blocks: workerthreadstracker
  Show dependency treegraph
 
Reported: 2008-06-03 17:09 PDT by Vladimir Vukicevic [:vlad] [:vladv]
Modified: 2013-04-04 13:53 PDT (History)
34 users (show)
dsicore: wanted1.9.1+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP (110.89 KB, patch)
2008-06-20 02:02 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Sample html that uses threads (807 bytes, text/html)
2008-06-20 02:05 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details
WIP (116.53 KB, patch)
2008-06-20 15:30 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Patch, v1 (135.20 KB, patch)
2008-07-07 00:09 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Patch, v1.1 (143.83 KB, patch)
2008-07-13 20:56 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jst: review-
Details | Diff | Splinter Review
Patch, v1.2 (9.84 KB, patch)
2008-07-21 11:59 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jst: review+
jst: superreview+
Details | Diff | Splinter Review
Additional mochitest for timeouts (4.01 KB, patch)
2008-07-23 12:40 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jst: review+
jst: superreview+
Details | Diff | Splinter Review
Patch, v1.3 (1.04 KB, patch)
2008-08-11 14:50 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jst: review+
jst: superreview+
Details | Diff | Splinter Review
Additional mochitest for many long-running wokers (3.20 KB, patch)
2008-08-11 14:53 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
jst: review+
jst: superreview+
Details | Diff | Splinter Review
Don't set gczeal in tests on debug builds (5.02 KB, patch)
2008-08-19 12:41 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
no flags Details | Diff | Splinter Review
Don't set gczeal in tests on debug builds (5.72 KB, patch)
2008-08-19 12:48 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
bugs: review-
Details | Diff | Splinter Review

Description Vladimir Vukicevic [:vlad] [:vladv] 2008-06-03 17:09:55 PDT
We need to implement a JS worker threads API accessible to content.  More discussion and links coming soon.
Comment 1 Damon Sicore (:damons) 2008-06-04 16:48:08 PDT
wanted1.9.1+.  If this should be blocking, please mark, vlad.
Comment 2 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-20 02:02:45 PDT
Created attachment 325946 [details] [diff] [review]
WIP

Here's my latest progress. Most of the basics work great, like making multiple threads and sending messages all over the place. Workers are canceled when navigating away from the page. Errors in worker scripts are properly reported.

I did make some destabilizing changes yesterday (one context per worker on any thread changed to one context per thread for any worker) but i think I've fixed that all up now.

I'm not currently being bitten by JSEng/XPConnect bugs but I do have a bunch of patches included here that I need to separate out and file bugs on (hangs and crashes alike).

Things missing:
1. Suspend and Resume on page navigation.
2. Loading script files instead of just strings.
3. Fun stuff like XHR and Timers.
Comment 3 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-20 02:05:39 PDT
Created attachment 325948 [details]
Sample html that uses threads

Here's an example of how you use the threads currently. The API exposed to content JS is absolutely not frozen or even slushy because we haven't had the necessary discussions with others (Gears, HTML5 folks) yet. This is just an idea of what it could look like.
Comment 4 Damon Sicore (:damons) 2008-06-20 14:16:40 PDT
Thanks for the update ben.  Good stuff!
Comment 5 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-20 15:30:58 PDT
Created attachment 326046 [details] [diff] [review]
WIP

Now has suspend/resume/cancel on navigation and cache drop.
Comment 6 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-20 18:10:40 PDT
http://wiki.mozilla.org/DOMWorkerThreads is the beginning of the API proposal. It's nowhere near done yet, I'll post to mozilla.dev.platform when it's ready to be debated by a larger audience.
Comment 7 Brian Crowder 2008-06-24 14:04:05 PDT
Is there a "make XHR threadsafe" bug somewhere?
Comment 8 Olli Pettay [:smaug] 2008-06-25 03:38:10 PDT
Hmm, threadsafe XHR...
Currently XHR is pretty heavily bound to DOM.
Does cycle collector work well with threads?
Comment 9 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-06-25 11:47:29 PDT
Sorry, the plan is not to make a "threadsafe XHR". We're going to implement a subset of XHR similar to this: http://code.google.com/apis/gears/api_httprequest.html
Comment 10 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-07 00:09:24 PDT
Created attachment 328303 [details] [diff] [review]
Patch, v1

I think this is ready for review. String-only message passing, pause/resume/cancel, timeouts, and intervals all should work. Basic tests included.

Note that I've stripped out all the miscellaneous JS/XPConnect/XPCOM changes that I needed into blocking bugs so this patch by itself won't work!

I've also got some more elaborate tests on the way that aren't quite ready for review yet.
Comment 11 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-07 00:11:14 PDT
Comment on attachment 328303 [details] [diff] [review]
Patch, v1

My review plan was to get bsmedberg to review the threading arch and to get jst to review the js-y stuff. Sound adequate? Anyone else I need to loop in here?
Comment 12 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-07 00:11:54 PDT
Comment on attachment 328303 [details] [diff] [review]
Patch, v1

Vlad, do you want to take a look too?
Comment 13 John Resig 2008-07-07 05:20:04 PDT
(In reply to comment #10)

Is the API still being discussed at all? I noticed a couple, little, strange bits that would be good to discuss.

Also - are there more tests coming along? I only saw the one Mochitest - wasn't sure if there was supposed to be more, or not.
Comment 14 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-08 11:06:54 PDT
Comment on attachment 328303 [details] [diff] [review]
Patch, v1

I've found one additional thread safety issue with timeouts and will post a new patch shortly.
Comment 15 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-13 20:56:28 PDT
Created attachment 329381 [details] [diff] [review]
Patch, v1.1

Fixes the timeout threadsafety issue (the linked list of timeouts in nsDOMWorkerThread wasn't protected). Also moved from a linked list of runnables in the thread service to a deque to simplify the code.

I'm still seeing a bizarre slowdown of previously-suspended timeouts (intervals keep coming back later and later even though cpu usage remains flat)... Maybe a reviewer can spot the problem? It's a (relatively) small bug so I wouldn't necessarily say it should block the alpha.
Comment 16 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-13 22:22:38 PDT
(In reply to comment #13)
> Is the API still being discussed at all? I noticed a couple, little, strange
> bits that would be good to discuss.

The API is still being discussed, yes. Whatever we include in the alpha will almost certainly not be final.

> Also - are there more tests coming along?

I've included another mochitest (for error propagation) with the latest patch and I'm still working on others.

Comment 17 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-16 17:17:10 PDT
Comment on attachment 329381 [details] [diff] [review]
Patch, v1.1

- In nsNavigator::NewWorkerPool():

+  nsCOMPtr<nsIDOMThreadService> threadService =
+    do_GetService("@mozilla.org/domthreads/service;1", &rv);

This service is in the same library, right? No need to go through the service manager then. And would this even need to be a registered service?

+  rv = threadService->CreatePool(getter_AddRefs(newPool));
+  NS_ENSURE_SUCCESS(rv, rv);
+
+  newPool.forget(_retval);

If rv is an error code, we'll return w/o setting *_retval to null here. You could just pass _retval to CreatePool() here and not use the nsCOMPtr at all.

+/**
+ * Simple class to automatically destroy a JSContext to make error handling
+ * easier.
+ */
+class JSAutoContext

Since this only automatically destroys a JSContext, maybe call it JSAutoContextDestroyer?

+class nsDOMWorkerScriptError : public nsIClassInfo
...
+  // Lame, nsIScriptError and nsIClassInfo both have 'readonly attribute
+  // unsigned long flags' so we have to use an inner class to do this the
+  // right way...
+  class InnerScriptError : public nsIScriptError

Since nsDOMWorkerScriptError is primarily a nsIScriptError, and nsIClassInfo really is secondary and an implementation detail here, maybe you should flip the inner/outer inheritance here, make nsDOMWorkerScriptError inherit nsIScriptError, and deal with the classinfo fu in the inner class (or a separate one, whatever is easier).

- In nsDOMWorkerBase::PostMessageInternal():

+    if (GetScriptContextFromJSContext(cx)) {
+      // Must be a normal DOM context, use the pool as the source.
+      source = Pool();
+    }
+    else {
+      // Must be a worker context, get the worker from the context private.
+      nsRefPtr<nsDOMWorkerThread> worker =
+        (nsDOMWorkerThread*)JS_GetContextPrivate(cx);

The if case is true, but the else case is not. You can't assume that if it's not a normal DOM context it'll be one of yours, you'll need to check by checking the context private flags and QI:ing to nsIDOMWorkerThread or whatever. And even in the first case, even if the context isn't a DOM context it's still possible that the running JS code that got you here is from the webpage. Maybe the check in that case should be whether we're on the main thread or not, rather than looking at what type of context got us here?

- In nsDOMWorkerPool::SetErrorListener():

+    // This must exist in a DOM context, not one of our worker contexts.
+    nsIXPConnect* xpc = nsContentUtils::XPConnect();
+  
+    nsAXPCNativeCallContext* ncc;
+    nsresult rv = xpc->GetCurrentNativeCallContext(&ncc);
+    NS_ENSURE_SUCCESS(rv, rv);
+  
+    JSContext* cx;
+    rv = ncc->GetJSContext(&cx);
+    NS_ENSURE_SUCCESS(rv, rv);
+
+    nsIScriptContext* scriptObj = GetScriptContextFromJSContext(cx);
+    NS_ENSURE_TRUE(scriptObj, NS_ERROR_INVALID_ARG);

What exactly is this guarding against? Seems wrong to rely on the context being a cetrain type of context...

- In nsDOMWorkerFunctions::Dump():

Don't we want to pref whether this actually prints to the console (mmm, threadsafe prefs?), like we do for window.dump()? If not, we'll end up in the same situation where we were with window.dump() where we'd just spew crap on the console all the time... And should we include the address of the thread and/or worker in the output to make it more useful?

- In nsDOMWorkerThread::CompileGlobalObject():

+  JSObject* global = JS_NewObject(aCx, nsnull, nsnull, nsnull);
+  NS_ENSURE_TRUE(global, PR_FALSE);
+
+  PRBool success = JS_InitStandardClasses(aCx, global);
+  NS_ENSURE_TRUE(success, PR_FALSE);
...
+  mGlobal = global;
+  success = JS_AddNamedRootRT(rt, &mGlobal, "nsDOMWorkerThread Global Object");

You'll need to root global way before here, it needs to be rooted right after it's been created as there's nothing here holding it alive.

- In nsDOMWorkerTimeout::Notify():

+  if (type == nsITimer::TYPE_ONE_SHOT) {
[...]
+        rv = aTimer->InitWithCallback(this, mInterval,
+                                      nsITimer::TYPE_REPEATING_SLACK);

aTimer is the timer that's in the middle of firing when we do this. As this timer is re-initialized as a repeating slack timer, it'll get added to the timer thread, which is all nice n' good. But once we're done here and return from Notify() the timer code checks if mType is repeating slack, and if it is it calls SetDelayInternal() to adjust when to fire the timeout, and then adds it to the timer thread again. So what you'll end up with, unless I missed something obvious, is the same timer added to the timer thread twice, and that's not good I would think. I think in this case you'll probably be better off creating a new timer rather than re-initializing the one that we're in the middle of firing here.

Other than that this looks really good. It's a big patch, lots to read, but I don't see anything that looks really wrong.

r- until you've gone through the above. Make the next patch be one that applies on top of this one for easier reviewing :)
Comment 18 Blake Kaplan (:mrbkap) (PTO until Jan. 2, 2017) 2008-07-17 02:30:07 PDT
(In reply to comment #17)
> - In nsDOMWorkerThread::CompileGlobalObject():
> 
> +  JSObject* global = JS_NewObject(aCx, nsnull, nsnull, nsnull);
> +  NS_ENSURE_TRUE(global, PR_FALSE);
> +
> +  PRBool success = JS_InitStandardClasses(aCx, global);
> +  NS_ENSURE_TRUE(success, PR_FALSE);

Note that if aCx's globalObject member is null, then the call to JS_InitStandardClasses will set aCx->globalObject, which is a root.
Comment 19 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-17 07:02:26 PDT
(In reply to comment #17)
> This service is in the same library, right? 

I'll fix that.

> If rv is an error code, we'll return w/o setting *_retval to null here.

Doesn't bsmedberg's static analysis enforce the XPCOM rule that you shouldn't set *_retval if you return failure?

> Since this only automatically destroys a JSContext, maybe call it
> JSAutoContextDestroyer?

Sure.

> Since nsDOMWorkerScriptError is primarily a nsIScriptError, and nsIClassInfo
> really is secondary and an implementation detail here, maybe you should flip
> the inner/outer inheritance here, make nsDOMWorkerScriptError inherit
> nsIScriptError, and deal with the classinfo fu in the inner class (or a
> separate one, whatever is easier).

I tried that first but iirc the nsIClassInfo macros didn't work properly with the inner class... I could do that and not use the macros, I guess, but this seemed easier. Given that choice do you think it is worth it to make those changes?
> You can't assume that if it's not a normal DOM context it'll be one of yours

Hm, I'm confused on this point. How is it possible (on the main thread) to be running in a context that is not a DOM context? I agree that it might be simpler to simply check to see if we're running on the main thread.

And then, on the worker threads, the only context they can ever see is one that I create (and set/clear explicitly), so I don't see why I should do extra verification there... 

> What exactly is this guarding against? Seems wrong to rely on the context being
> a cetrain type of context...

Yeah, that's more debug-ish code... I'll remove it since I no longer expose any way for another thread (or context) to set the error listener on the pool.

> Don't we want to pref whether this actually prints to the console (mmm,
> threadsafe prefs?), like we do for window.dump()?

Yeah, I had planned on it, just hadn't gotten to it (also have to worry about JS options - version, GC level, strict, etc.). I was going to do this by 1) having all these settings per-worker and set to the current state on creation, and then 2) adding a pref callback on the thread service that posted 'pref changed' runnables to worker threads. Soudn ok? Do we care about this for alpha, or is this followup material?

> And should we include the address of the thread and/or worker in the output
> to make it more useful?

Hm, I would lean against this and make the behavior identical to the existing window.dump. If consumers want to print that info they can, it's exposed in the 'thisThread' object.

> You'll need to root global way before here, it needs to be rooted right after
> it's been created as there's nothing here holding it alive.

Given what mrbkap said is it worth changing this? Or should I just add an explanatory comment?


> aTimer is the timer that's in the middle of firing when we do this. As this
> timer is re-initialized as a repeating slack timer, it'll get added to the
> timer thread, which is all nice n' good. But once we're done here and return
> from Notify() the timer code checks if mType is repeating slack, and if it is
> it calls SetDelayInternal() to adjust when to fire the timeout, and then adds
> it to the timer thread again. So what you'll end up with, unless I missed
> something obvious, is the same timer added to the timer thread twice, and
> that's not good I would think. I think in this case you'll probably be better
> off creating a new timer rather than re-initializing the one that we're in the
> middle of firing here.

Hm... Yeah. This is a bug in the timer code, don't you think? Looks like the timer code should remove the re-init'd timer if mArmed is true before adding that second time (as it does in InitCommon). I'll test this today to see if that's what's causing my mysterious slowdowns.

> Make the next patch be one that applies on top of this one for easier
> reviewing :)

You got it.
Comment 20 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-17 16:11:59 PDT
(In reply to comment #18)
> Note that if aCx's globalObject member is null, then the call to
> JS_InitStandardClasses will set aCx->globalObject, which is a root.

Ah, good point, I didn't realize aCx->globalObject was a root (that it is makes a lot of sense thoug) since in the DOM we explicitly have to root the globals. Inner/outer windows are so much fun I can hardly believe it! :)
Comment 21 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-17 16:56:56 PDT
(In reply to comment #19)
> Doesn't bsmedberg's static analysis enforce the XPCOM rule that you shouldn't
> set *_retval if you return failure?

Um, if anything it should enforce the opposite I would imagine. bsmedberg?

> I tried that first but iirc the nsIClassInfo macros didn't work properly with
> the inner class... I could do that and not use the macros, I guess, but this
> seemed easier. Given that choice do you think it is worth it to make those
> changes?

Probably not if it ends up being trickier than the alternative. 

> > You can't assume that if it's not a normal DOM context it'll be one of yours
> 
> Hm, I'm confused on this point. How is it possible (on the main thread) to be
> running in a context that is not a DOM context? I agree that it might be
> simpler to simply check to see if we're running on the main thread.

evalInContext() is one example that creates its own contexts in XPConnect and content code will run on the main thread on those contexts, and they're not DOM contexts. I believe there's other examples of the same as well, but can't think of them right now.

> And then, on the worker threads, the only context they can ever see is one that
> I create (and set/clear explicitly), so I don't see why I should do extra
> verification there... 

As long as the if() case guarantees that we're *not* running on a DOM worker, then you don't need any more checking in the else case. There's no way for other threads (non-main and non-dom-worker threads) to ever call in here, right? If it would be, then you'd need more than an is-main-thread check in the if() case as well.

> > Don't we want to pref whether this actually prints to the console (mmm,
> > threadsafe prefs?), like we do for window.dump()?
> 
> Yeah, I had planned on it, just hadn't gotten to it (also have to worry about
> JS options - version, GC level, strict, etc.). I was going to do this by 1)
> having all these settings per-worker and set to the current state on creation,
> and then 2) adding a pref callback on the thread service that posted 'pref
> changed' runnables to worker threads. Soudn ok? Do we care about this for
> alpha, or is this followup material?

I don't think this is required for alpha.

> > And should we include the address of the thread and/or worker in the output
> > to make it more useful?
> 
> Hm, I would lean against this and make the behavior identical to the existing
> window.dump. If consumers want to print that info they can, it's exposed in the
> 'thisThread' object.

But the thisThread property is an nsIDOMWorkerThread, which exposes nothing unique about itself. IOW, if you toString() it you'll only get [object DOMWorkerThread] or something, same thing on all threads (unless you're running this in a debug build which will include the address of the object in the string, but most people won't be running in debug builds). I don't know if this is critical to actually worry about, I guess scripts can always make up their own unique names and include that in the dump output if it's important... Unless of course it's important to track whether the JS executes on the same thread repeatedly or is bounced from thread to thread. I guess that only really matters to us, and we can figure that out in other ways if there's ever a need.

> > You'll need to root global way before here, it needs to be rooted right after
> > it's been created as there's nothing here holding it alive.
> 
> Given what mrbkap said is it worth changing this? Or should I just add an
> explanatory comment?

Nope, no change needed. Comment would be good.

[...]
> Hm... Yeah. This is a bug in the timer code, don't you think? Looks like the

Could be, not sure if this was ever intended use of timers, changing one from a one-shot timer to a repeating timer while it's firing. I could go either way there.

> timer code should remove the re-init'd timer if mArmed is true before adding
> that second time (as it does in InitCommon). I'll test this today to see if
> that's what's causing my mysterious slowdowns.

It could, but wouldn't that penalize us every time we fire a repeating slack timeout? I.e. we'd always need to attempt to remove the timeout before adding it, or something, as we won't easily know ahead of time whether the timer has already been added etc?
Comment 22 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-17 17:00:49 PDT
s/timeout/timer/g :)
Comment 23 Benjamin Smedberg [:bsmedberg] 2008-07-17 18:25:19 PDT
The "rules" of MS-COM state that when a method returns a failure code, it shouldn't touch the outparam at all. In many cases we set it to null, which is currently considered a minor warning in the outparams analysis. If you set it to a non-null param and return a failure code, that's a major warning and will become an error once the analysis and annotations are better.
Comment 24 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-21 11:59:42 PDT
Created attachment 330617 [details] [diff] [review]
Patch, v1.2

This patch applies on top of attachment 329381 [details] [diff] [review] although interdiff gave me a little trouble... Should address those comments above.
Comment 25 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-21 12:00:55 PDT
Oh, yeah, I filed bug 445828 on the timer re-init problem and... get this... it completely solves the slowdown problem. Yay!
Comment 26 Johnny Stenback (:jst, jst@mozilla.com) 2008-07-21 17:26:02 PDT
Comment on attachment 330617 [details] [diff] [review]
Patch, v1.2

Looks good!
Comment 27 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-07-23 12:40:11 PDT
Created attachment 330981 [details] [diff] [review]
Additional mochitest for timeouts
Comment 28 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-11 14:50:58 PDT
Created attachment 333289 [details] [diff] [review]
Patch, v1.3

This applies on top of attachment 330617 [details] [diff] [review] and fixes one hazard where a worker could see its last release on a background thread (which isn't supposed to happen).
Comment 29 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-11 14:53:49 PDT
Created attachment 333291 [details] [diff] [review]
Additional mochitest for many long-running wokers
Comment 30 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-13 11:21:02 PDT
Note the new tracker for worker threads features, bug 450448.
Comment 31 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-16 21:18:20 PDT
Pushed changeset 1ebacbd09ad9 to mozilla-central.

Also had to fix an uninitialized nsresult bug that caused failing test boxes, changeset c1e9838bb4f4.

FIXED.
Comment 32 Ted Mielczarek [:ted.mielczarek] 2008-08-19 05:25:40 PDT
bent: could you reformat the output of test_threadErrors.html slightly so that the normal output doesn't get picked up by the tinderbox's error parser?

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1219140344.1219143241.13982.gz

The regex it uses is here:
http://mxr.mozilla.org/mozilla/source/webtools/tinderbox/ep_unittest.pl#31
Comment 33 John Resig 2008-08-19 05:44:00 PDT
I notice that this implementation utilizes the concept of pools, which aren't defined in the WHATWG Web Worker spec. Is there a follow-up bug that's working to match the API? or is the spec out of date?
Comment 34 Olli Pettay [:smaug] 2008-08-19 07:35:26 PDT
When I run mochitest locally, workerthread tests take occasionally lots of time 
(10+ mins or so). I tried to attach gdb to see what is happening but so far I 
couldn't get any reasonable stack. The problem seems to happen only when I run 
all the mochitests, not only workerthread tests. If I run only workerthread tests,
they all pass in less than 2 minutes.
This is on 32 bit Linux.
Comment 35 Olli Pettay [:smaug] 2008-08-19 08:12:01 PDT
I think the problem has something to do with gczeal.
Comment 36 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-19 10:02:12 PDT
(In reply to comment #32)
> bent: could you reformat the output of test_threadErrors.html

I'm looking into it, but I'm not manually outputting those messages (that's the JS engine throwing the errors to the console), so I'm not sure how quick the fix will be.
Comment 37 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-19 10:03:20 PDT
(In reply to comment #33)
Yeah, we're still working on getting some spec stabilized, but until then we're just going to go with what we have. I'll file bugs just as soon as we know what changes we need to make.
Comment 38 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-19 10:05:51 PDT
(In reply to comment #34)
> When I run mochitest locally, workerthread tests take occasionally lots of time 

(In reply to comment #35)
> I think the problem has something to do with gczeal.
> 

Indeed. I'm a little paranoid about GC causing random problems since I'm passing weak refs across threads so I have gczeal turned up for all the workerthreads tests. Think this is a problem?

Comment 39 Brian Crowder 2008-08-19 10:09:27 PDT
My guess is that using gczeal here will make things run very slowly, especially for threaded tests, which I'm guessing will probably end up running effectively serially as a result of using the feature.  That may or may not be a bad thing, but you might instead want to run much shorter tests with gczeal tuned up high, than you would otherwise.
Comment 40 Olli Pettay [:smaug] 2008-08-19 11:06:50 PDT
Tested without gczeal==2, and now I can run mochitest in reasonable time.
WorkerThread tests don't take too much time.
So would be great to fix the tests somehow. Maybe shorter tests with gczeal==2, like crowder suggested.
Comment 41 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-19 12:41:51 PDT
Created attachment 334540 [details] [diff] [review]
Don't set gczeal in tests on debug builds
Comment 42 Ben Turner (not reading bugmail, use the needinfo flag!) 2008-08-19 12:48:23 PDT
Created attachment 334541 [details] [diff] [review]
Don't set gczeal in tests on debug builds

Now with comments
Comment 43 Olli Pettay [:smaug] 2008-08-19 13:01:05 PDT
Comment on attachment 334541 [details] [diff] [review]
Don't set gczeal in tests on debug builds

Just add a #ifdef DEBUG pref, rather than relying on a pref which is also #ifdef DEBUGged 
http://mxr.mozilla.org/mozilla-central/source/browser/app/profile/firefox.js#357

Note You need to log in before you can comment on or make changes to this bug.