Last Comment Bug 678623 - "too much recursion" with physics engine in worker thread
: "too much recursion" with physics engine in worker thread
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla8
Assigned To: Alon Zakai (:azakai)
:
: Jason Orendorff [:jorendorff]
Mentors:
http://www.syntensity.com/static/stag...
Depends on: 653967
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-12 14:16 PDT by Alon Zakai (:azakai)
Modified: 2011-08-16 10:53 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
stack trace (17.25 KB, text/plain)
2011-08-12 14:16 PDT, Alon Zakai (:azakai)
no flags Details
patch (1005 bytes, patch)
2011-08-12 15:54 PDT, Alon Zakai (:azakai)
no flags Details | Diff | Splinter Review
patch v2 (1.94 KB, patch)
2011-08-15 16:48 PDT, Alon Zakai (:azakai)
bent.mozilla: review+
Details | Diff | Splinter Review

Description Alon Zakai (:azakai) 2011-08-12 14:16:50 PDT
Created attachment 552763 [details]
stack trace

The worker version of the physics engine demo in bug 653967,

http://www.syntensity.com/static/staging/demo_worker.html

fails with "too much recursion" happening in the worker thread. However the non-worker version of that demo, which loads the same physics engine (just not in a worker) works ok,

http://www.syntensity.com/static/staging/demo_simple.html

This problem is sensitive to something, not sure what: I get it on Nightly on linux and when I build with default settings, but I do not get it when I build with --disable-optimize for example. bent can't see it on os x.

Attaching a stack trace for this error. js_EmitTree occurs dozens of times calling itself recursively, eventually leading to js_ReportOverRecursed being run.
Comment 1 Alon Zakai (:azakai) 2011-08-12 15:54:45 PDT
Created attachment 552785 [details] [diff] [review]
patch

This patch sets the stack quota to 128*sizeof(size_t)*1024, which is the same as elsewhere (TabChild, nsInProcessTabChildGlobal). With this the demo works fine.
Comment 2 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-12 16:24:06 PDT
I don't know if this is safe... Does the mochitest dom/workers/test/test_recursion.html pass without crashing? I tuned this to 32 on OS X because that was the largest value I could use without that test crashing...
Comment 3 Alon Zakai (:azakai) 2011-08-12 16:47:28 PDT
I sent to try half an hour ago. I guess I'll see it crash on OS X there...

Do you know if the code in the worker in that test crashes OS X when run normally (not in a worker)? If so, then the other places we currently set the stack size to the value in the patch are dangerous. If not, then I wonder what is different between the worker and non-worker case.
Comment 4 Alon Zakai (:azakai) 2011-08-12 18:22:42 PDT
As far as I can see, we never pick a stack size - all our thread creations use the default platform values. And I believe those values are 8MB for Linux, 1MB for Windows, 512K for OS X. So I guess that can explain why OS X would be most sensitive to this problem - the JS stack size limit we set on a 32-bit platform is 512K.

If that is the situation, perhaps a solution is to set the stack size to 1MB on OS X? I worry about us doing such a change though.
Comment 5 Emanuel Hoogeveen [:ehoogeveen] 2011-08-13 06:20:46 PDT
Bug 664341 reduced the stack size for media threads on Linux.
Comment 6 Alon Zakai (:azakai) 2011-08-15 16:48:35 PDT
Created attachment 553309 [details] [diff] [review]
patch v2

This patch sets the stack size to 1MB and the JS stack limit to 512K (on 32-bit systems), and passes try.

The patch uses the same stack size on all OSes instead of using the defaults. This seems safer since we are now able to be sure that the JS stack limit will be smaller (the crashes before were apparently due to the JS stack limit being equal to the native stack size, and we overflowed).
Comment 7 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-15 16:58:16 PDT
Comment on attachment 553309 [details] [diff] [review]
patch v2

Review of attachment 553309 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good! Thanks.

::: dom/workers/RuntimeService.cpp
@@ +78,5 @@
>  #define WORKER_RUNTIME_HEAPSIZE 32 * 1024 * 1024
>  
> +// The C stack size. We use the same stack size on all platforms for
> +// consistency.
> +#define WORKER_STACKSIZE 256 * sizeof(size_t) * 1024

Nit: WORKER_STACK_SIZE

@@ +804,5 @@
>  
>    if (!thread) {
> +    if (NS_FAILED(NS_NewThread(getter_AddRefs(thread),
> +                               nsnull,
> +                               WORKER_STACKSIZE))) {

Nit: keep nsnull on the first line, only wrap what you have to.
Comment 9 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-16 04:18:07 PDT
http://hg.mozilla.org/mozilla-central/rev/f9a0435281d3

Could we check in a test here?
Comment 10 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-16 07:57:48 PDT
The test is "Does dom/workers/test/test_recursion.html crash the browser?"
Comment 11 Alon Zakai (:azakai) 2011-08-16 10:03:39 PDT
We can also add a test for the other direction, that is, to test if we can run huge scripts in workers without hitting "too much recursion". I don't see how to do that though except to add a test with the demo in this bug (2MB), and I assumed people would object to that. Or not?

Note You need to log in before you can comment on or make changes to this bug.