Closed Bug 942984 Opened 6 years ago Closed 6 years ago

Emscripten web page hard-crashes the browser.

Categories

(Core :: JavaScript Engine, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox26 - wontfix
firefox27 + verified
firefox28 + verified

People

(Reporter: jujjyl, Assigned: bhackett)

References

Details

(Keywords: regression)

Attachments

(1 file)

This one occurs on nightly build 2013-11-25 on OSX 10.8.4.

1. Visit https://dl.dropboxusercontent.com/u/40949268/emcc/bugs/nehe-emscripten/lesson7.html

(to download a locally executable version, get https://dl.dropboxusercontent.com/u/40949268/emcc/bugs/nehe-emscripten/lesson7.zip )

The browser will crash hard to desktop with a dialog "Firefox had a problem and crashed. We'll try to restore ..." message.
Current Aurora as of today did not crash.
Hmm.  All of those crash reports are empty.  :(

That said, this is quite reproducible:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000114a00ff8
[Switching to process 46321 thread 0x7d13]
js::gc::BarrieredCell<JSString>::zoneFromAnyThread (this=0x125bb18a0) at Barrier.h:155
155         JS_ALWAYS_INLINE JS::Zone *zoneFromAnyThread() const { return tenuredZoneFromAnyThread(); }
(gdb) 

At this point, my stack has 3234 frames on it, most of them like so:

#3076 0x00000001016e607a in js::frontend::Parser<js::frontend::SyntaxParseHandler>::statement (this=0x114a805b0, canHaveDirectives=false) at Parser.cpp:5205
#3077 0x00000001016e6fc1 in js::frontend::Parser<js::frontend::SyntaxParseHandler>::ifStatement (this=0x114a805b0) at Parser.cpp:3958
#3078 0x00000001016e607a in js::frontend::Parser<js::frontend::SyntaxParseHandler>::statement (this=0x114a805b0, canHaveDirectives=false) at Parser.cpp:5205
#3079 0x00000001016e6fc1 in js::frontend::Parser<js::frontend::SyntaxParseHandler>::ifStatement (this=0x114a805b0) at Parser.cpp:3958

(in fact, all the frames from frame 25 to frame 3220 kinda look like that).  I bet we're running out of stack space.

Looking at the lesson7.js script, the function __ZN5Regal5Token14GLenumToStringEj has a long if/else cascade with 3313 else clauses.

So it's entirely possible that this used to fit in the available stack space by accident and now just doesn't....
Component: General → JavaScript Engine
The regression range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a4e9c9c9dbf9&tochange=b9029b1de410 and the crash is for an async script, while parsing on a worker thread.  So presumably a regression from bug 906371.  But then why is it not an issue on Aurora?  Are we sure the crash doesn't happen there?

It looks like the JS compile helper threads are started via PR_CreateThread with a default thread stack size (over in jsworkers.cpp).

If https://developer.apple.com/library/mac/qa/qa1419/_index.html is not out of date, that default stack size is 512KB for pthreads on Mac, whereas the main thread uses an 8MB stack.  So I can definitely see how we might have fit in the latter but not in the former.

It looks like the worker GC threads use a 1MB stack.  But all that would do is buy us a few more thousand if nesting levels...  The real right fix probably involves doing this parse non-recursively or something, if emscripten outputs long if/else chains like this.

Brian, does it make sense to just bump up the thread stack size or these worker threads for now?  On Linux, looks like pthreads default to a 2MB stack.  On Windows, I would assume we end up with the default 1MB thread stacks.  So the Mac's 512KB is the lowest of the lot.
Blocks: 906371
Flags: needinfo?(bhackett1024)
I would recommend not increasing the thread stack size, since that would potentially hide the bug, waiting for someone else to write a new application that consumes the bigger stack space and crash again.

It's ok (for now) if that page doesn't run but throws a "stack overflow" exception or similar, to start with.
(In reply to Boris Zbarsky [:bz] from comment #4)
> It looks like the worker GC threads use a 1MB stack.  But all that would do
> is buy us a few more thousand if nesting levels...  The real right fix
> probably involves doing this parse non-recursively or something, if
> emscripten outputs long if/else chains like this.

The compiled code for the symbol __ZN5Regal5Token14GLenumToStringEj is from here: https://github.com/p3/regal/blob/53fc58e83f68a6f10e164b631f9e628bc411ed3b/src/regal/RegalToken.cpp#L712

It might be possible to improve emscripten output, azakai knows more, but we should not change emscripten codegen behavior before this crash is fixed, to not hide the bug.
I agree with Jukka, we have a mechanism to detect stack overflows during parsing and that should be doing the right thing here.
So the parser does have JS_CHECK_RECURSION bits.  In my case, these seem to macro-expand to:

  do {
    int stackDummy_;
    if (!((uintptr_t)(&stackDummy_) > (js::GetNativeStackLimit(cx))-(0))) {
      js_ReportOverRecursed(cx); 
      return false; 
    }
  } while (0);

js::GetNativeStackLimit is over in jscntxtinlines in this case and ends up returning cx->perThreadData->nativeStackLimit[kind].

On this thread, cx->perThreadData->nativeStackLimit is an array of three 0 entries.  So the RHS of that compare is always 0, the &stackDummy is always greater than it, and we never stop the recursion.
We seem to only set the nativeStackLimit stuff to anything nonzero in js::RecomputeStackLimit, which only recomputes it for rt->mainThread.
It's not clear to me whether NSPR has a way to ask what the thread stack size is of a thread you created.

In any case, we'd presumably want the same stack limit across architectures....
Attached patch patchSplinter Review
Yeah, we're not setting the native stack limit anywhere.  This patch just uses 512K as trying to figure out what the 'default' setting should be is super baroque, see the XPCJSRuntime constructor.  This patch also propagates the overrecursed bit to the parse task (the worker thread can't report exceptions directly) so that a message appears in the console after the parse fails.
Attachment #8338626 - Flags: review?(wmccloskey)
Flags: needinfo?(bhackett1024)
Assignee: nobody → bhackett1024
Attachment #8338626 - Flags: review?(wmccloskey) → review+
https://hg.mozilla.org/mozilla-central/rev/c5714ab2828b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Unless this is something we expect a lot of users to hit, the example looks like an edge case specific to the link and not a widely impacting site so we should only take this up to FF27 given that we're about to build 26 RC on Monday.  Brian or bz - ni? me if you disagree strongly with this assessment, if I'm missing something here.
One concern I have about the different stack size on main thread and workers: a script will compile fine until it's made async, and then it suddenly won't...

I'm not sure I have a proposed solution, unfortunately.
Comment on attachment 8338626 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 906371
User impact if declined: stack overflow crashes on unusual web scripts
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): none, except potential but nonsensical windows tp5 regressions (Bug 943924)
Attachment #8338626 - Flags: approval-mozilla-aurora?
Attachment #8338626 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This has non-trivial conflicts on Aurora.
Flags: needinfo?(bhackett1024)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:27.0) Gecko/20100101 Firefox/27.0

Verified as fixed on latest Aurora 28.0a2 (buildID: 20131218004001) and 27 Beta 2 (buildID: 20131216183647).
Status: RESOLVED → VERIFIED
Keywords: verifyme
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0

Verified as fixed on Firefox 28 beta 2.
You need to log in before you can comment on or make changes to this bug.