Last Comment Bug 667915 - Assertion failure: pc_ >= script->code && pc_ < script->code + script->length | Assertion failure: compartment mismatched | Crash [@ moz_free | NS_Free_P nsMemory::Free(void*) XPCJSStackFrame::~XPCJSStackFrame() ]
: Assertion failure: pc_ >= script->code && pc_ < script->code + script->length...
: assertion, crash, reproducible
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86 All
-- critical (vote)
: mozilla8
Assigned To: Luke Wagner [:luke]
: Jason Orendorff [:jorendorff]
: 691784 (view as bug list)
Depends on: 669890
Blocks: 532972 669123
  Show dependency treegraph
Reported: 2011-06-28 08:40 PDT by Bob Clary [:bc:]
Modified: 2011-10-16 21:00 PDT (History)
9 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Fix AutoCompartment::enter (981 bytes, patch)
2011-06-28 14:20 PDT, Luke Wagner [:luke]
mrbkap: review+
Details | Diff | Splinter Review
Fix FixupArity (8.53 KB, patch)
2011-06-28 14:55 PDT, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
don't let content oom chrome (WIP) (46.46 KB, patch)
2011-06-29 18:01 PDT, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
don't let content oom chrome (77.75 KB, patch)
2011-06-30 16:10 PDT, Luke Wagner [:luke]
jwalden+bmo: review+
mrbkap: review+
Details | Diff | Splinter Review

Description User image Bob Clary [:bc:] 2011-06-28 08:40:14 PDT

2. stop or continue slow script.

3. Crash with multiple signatures or assertions.

Mac OS X 10.5

Assertion failure: pc_ >= script->code && pc_ < script->code + script->length, at /work/mozilla/builds/nightly/mozilla/js/src/vm/Stack.cpp:978

Linux 64 bit

Assertion failure: compartment mismatched

Windows XP and 7

moz_free | NS_Free_P nsMemory::Free(void*) XPCJSStackFrame::~XPCJSStackFrame() XPCJSStackFrame::`vector deleting destructor'(unsigned int) + 0xe XPCJSStackFrame::Release()
Comment 1 User image Bob Clary [:bc:] 2011-06-28 08:41:18 PDT
stack for Mac:
Comment 2 User image Luke Wagner [:luke] 2011-06-28 13:44:30 PDT
So check this out: the code chews up the whole stack (causing an exception to be thrown), when its time to report the exception, we try to push a dummy frame from AutoEnterCompartment, THAT also hits the end of the stack, tries to throw, but now the exception's compartment doesn't match the cx's, hence the assert.
Comment 3 User image Luke Wagner [:luke] 2011-06-28 14:20:12 PDT
Created attachment 542601 [details] [diff] [review]
Fix AutoCompartment::enter

This fixes the compartment assert (which I can repro on linux64 by disabling all jits): AutoCompartment::enter switches the compartment before pushing the dummy frame (hence the mismatch when the CompartmentChecker asserts that GetGlobalForScopeChain (fp->scopeChain() in this case) is in cx->compartment).  I'm a bit confused why it was written this way in the first place; perhaps there is some hidden constraint here?
Comment 4 User image Blake Kaplan (:mrbkap) 2011-06-28 14:32:06 PDT
Comment on attachment 542601 [details] [diff] [review]
Fix AutoCompartment::enter

My only thought would be a fear that we'd have to read cx->compartment in order to allocate stack space for the dummy frame, but that doesn't appear to be the case, so r=mrbkap.
Comment 5 User image Luke Wagner [:luke] 2011-06-28 14:55:58 PDT
Created attachment 542615 [details] [diff] [review]
Fix FixupArity

FixupArity calls getFixupFrame which reports despite a trashed regs.pc.

With these patches, I don't get any asserts.  But this does expose a problem I've been a'fearin', namely, that content can use up the stack and OOM unsuspecting chrome JS.  I'm going to think about this.
Comment 6 User image Luke Wagner [:luke] 2011-06-29 18:01:22 PDT
Created attachment 543023 [details] [diff] [review]
don't let content oom chrome (WIP)

So its actually pretty easy for content to oom chrome.  For example, this page will use up all the stack space such that the slow-script dialog will fail, eventually locking up the entire browser while running at 100% CPU:

  function foo() {
    try {
    } catch(e) {
      while (true) ;

The attached patch fixes this test case and also causes the page in comment 0 to have a normal slow-script dialog.  The approach is to report a smaller stack size to any script that doesn't have system principals.  That way, content ooms early and, when chrome runs, it still has plenty of space.  Still a bit of work to do.
Comment 7 User image Luke Wagner [:luke] 2011-06-30 16:10:36 PDT
Created attachment 543291 [details] [diff] [review]
don't let content oom chrome

Mostly same patch but with build fixes for windows and a JSAPI test of the new mechanism.  This patch subsumes the other two and includes shell test-cases to trigger the specific asserts in the previous two patches.

All in all, I'm pretty happy to have this patch: its relatively simple and it fixes a long-standing concern since the introduction of the contiguous stack.  Incidentally, the other three browsers all show issues with the testcase above:
 - IE hangs and needs to be force-quit
 - Safari hangs and then shows what is probably supposed to be their slow-script dialog, but its a tiny empty rectangle. Fortunately, their default-close behavior kills the script.
 - Chrome UI remains responsive (process isolation ftw), but the tab is unresponsive and doesn't display the slow-script dialog that normally is shown; it has to be killed by clicking the tab's 'x'.

Blake: could you check out the xpconnect/caps changes and see if that all look sane?
Waldo: can you look at the rest?
Comment 8 User image Luke Wagner [:luke] 2011-07-01 14:08:18 PDT
Green on try (once "if (getMaxArgs)" is added to regress-350256-02.js).
Comment 9 User image Jeff Walden [:Waldo] (remove +bmo to email) 2011-07-05 11:30:27 PDT
Comment on attachment 543291 [details] [diff] [review]
don't let content oom chrome

Review of attachment 543291 [details] [diff] [review]:

::: js/src/vm/Stack.h
@@ +1357,5 @@
>       * LeaveTree requires stack allocation to rebuild the stack. There is no
>       * good way to handle an OOM for these allocations, so this function checks
>       * that OOM cannot occur using the size of the TraceNativeStorage as a
> +     * conservative upper bound. Despite taking a 'cx', this function does not
> +     * report an error if it returns 'false'.

I'd pull the last sentence into its own paragraph; it blends in a bit here.

@@ +1362,2 @@
>       */
> +    inline bool ensureEnoughSpaceToEnterTrace(JSContext *cx, MaybeReportError report);

This method's always called with report == NO_REPORT_ERROR.  Is that for documentation purposes?  I'm not sure I see anything clearly better, but maybe there is something.
Comment 11 User image Luke Wagner [:luke] 2011-07-06 10:58:14 PDT
(In reply to comment #10)
Thanks for the test-case.  With the patch, I can repeatedly click 'Continue' and eventually 'Stop' and the page is responsive with no asserts.
Comment 14 User image IU 2011-07-08 09:07:38 PDT
The patch aside, why is firefox even hanging on those pages?  For me, neither chrome 13 nor ie 9 hang on those pages, and opera 11.5 only momentarily pauses on the comment 10 page.  So while this patch helps firefox recover, it would be nice if there was a follow-up to address the root problem.
Comment 15 User image Luke Wagner [:luke] 2011-07-08 09:31:07 PDT
The root problem is we don't have content/chrome process separation.  Rest assured, work is underway:

A secondary root problem is that, if we have an error while running the operation callback, apparently we either (1) lose it or (2) choose not to halt the running script.  That sounds like a nice bug to file, if you'd like.
Comment 16 User image IU 2011-07-08 10:03:24 PDT
(In reply to comment #15)

Thanks, Luke.

> A secondary root problem is that, if we have an error while running the
> operation callback, apparently we either (1) lose it or (2) choose not to
> halt the running script.  That sounds like a nice bug to file, if you'd like.

Don't understand enough to file that one.  Any takers?
Comment 17 User image Luke Wagner [:luke] 2011-07-08 10:49:59 PDT
Filed bug 670183.

To be clear, I think the problem fixed by this bug is as "root" of a problem as the other two things; it fixes a class of DOSes.
Comment 18 User image Marius Strumyla 2011-10-07 06:18:17 PDT
*** Bug 691784 has been marked as a duplicate of this bug. ***
Comment 19 User image Bob Clary [:bc:] 2011-10-16 20:47:47 PDT
Luke, is this really fixed on Beta/Firefox 8? I see a very similar crash (stack overflow) with oodles of 


on the stack when visiting I don't see it in the automation for Aurora/9, or Nightly/10.
Comment 20 User image Luke Wagner [:luke] 2011-10-16 21:00:54 PDT
That stack trace would almost certainly have been fixed by bug 676934.  (Basically, we increased the scripted stack depth max (to something comparable to other engines) and it looks like XPCJSStackFrame was doing a recursive destructor call for each such frame which apparently is enough to overflow the C stack.  Bug 676934 just bounded the number of XPCJSStackFrames we create.)

Note You need to log in before you can comment on or make changes to this bug.