Closed Bug 667915 Opened 13 years ago Closed 13 years ago

Assertion failure: pc_ >= script->code && pc_ < script->code + script->length | Assertion failure: compartment mismatched | Crash [@ moz_free | NS_Free_P nsMemory::Free(void*) XPCJSStackFrame::~XPCJSStackFrame() ]

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: bc, Assigned: luke)

References

()

Details

(Keywords: assertion, crash, reproducible, Whiteboard: [inbound])

Attachments

(3 files, 1 obsolete file)

1. http://www.autoteile-immler.com/THULE+973+BackPac+Fahrradtr%e4ger+inkl.+Montagekit+973-15/a5730277_u161/

2. stop or continue slow script.

3. Crash with multiple signatures or assertions.

Mac OS X 10.5

Assertion failure: pc_ >= script->code && pc_ < script->code + script->length, at /work/mozilla/builds/nightly/mozilla/js/src/vm/Stack.cpp:978

Linux 64 bit

Assertion failure: compartment mismatched

Windows XP and 7

moz_free | NS_Free_P nsMemory::Free(void*) XPCJSStackFrame::~XPCJSStackFrame() XPCJSStackFrame::`vector deleting destructor'(unsigned int) + 0xe XPCJSStackFrame::Release()
So check this out: the code chews up the whole stack (causing an exception to be thrown), when its time to report the exception, we try to push a dummy frame from AutoEnterCompartment, THAT also hits the end of the stack, tries to throw, but now the exception's compartment doesn't match the cx's, hence the assert.
This fixes the compartment assert (which I can repro on linux64 by disabling all jits): AutoCompartment::enter switches the compartment before pushing the dummy frame (hence the mismatch when the CompartmentChecker asserts that GetGlobalForScopeChain (fp->scopeChain() in this case) is in cx->compartment).  I'm a bit confused why it was written this way in the first place; perhaps there is some hidden constraint here?
Assignee: general → luke
Status: NEW → ASSIGNED
Attachment #542601 - Flags: review?(mrbkap)
Comment on attachment 542601 [details] [diff] [review]
Fix AutoCompartment::enter

My only thought would be a fear that we'd have to read cx->compartment in order to allocate stack space for the dummy frame, but that doesn't appear to be the case, so r=mrbkap.
Attachment #542601 - Flags: review?(mrbkap) → review+
Attached patch Fix FixupAritySplinter Review
FixupArity calls getFixupFrame which reports despite a trashed regs.pc.

With these patches, I don't get any asserts.  But this does expose a problem I've been a'fearin', namely, that content can use up the stack and OOM unsuspecting chrome JS.  I'm going to think about this.
Attachment #542615 - Flags: review?(dvander)
So its actually pretty easy for content to oom chrome.  For example, this page will use up all the stack space such that the slow-script dialog will fail, eventually locking up the entire browser while running at 100% CPU:

  <script>
  function foo() {
    try {
      foo()
    } catch(e) {
      while (true) ;
    }
  </script>

The attached patch fixes this test case and also causes the page in comment 0 to have a normal slow-script dialog.  The approach is to report a smaller stack size to any script that doesn't have system principals.  That way, content ooms early and, when chrome runs, it still has plenty of space.  Still a bit of work to do.
Mostly same patch but with build fixes for windows and a JSAPI test of the new mechanism.  This patch subsumes the other two and includes shell test-cases to trigger the specific asserts in the previous two patches.

All in all, I'm pretty happy to have this patch: its relatively simple and it fixes a long-standing concern since the introduction of the contiguous stack.  Incidentally, the other three browsers all show issues with the testcase above:
 - IE hangs and needs to be force-quit
 - Safari hangs and then shows what is probably supposed to be their slow-script dialog, but its a tiny empty rectangle. Fortunately, their default-close behavior kills the script.
 - Chrome UI remains responsive (process isolation ftw), but the tab is unresponsive and doesn't display the slow-script dialog that normally is shown; it has to be killed by clicking the tab's 'x'.

Blake: could you check out the xpconnect/caps changes and see if that all look sane?
Waldo: can you look at the rest?
Attachment #543023 - Attachment is obsolete: true
Attachment #543291 - Flags: review?(mrbkap)
Attachment #543291 - Flags: review?(jwalden+bmo)
Attachment #542615 - Flags: review?(dvander)
Green on try (once "if (getMaxArgs)" is added to regress-350256-02.js).
Comment on attachment 543291 [details] [diff] [review]
don't let content oom chrome

Review of attachment 543291 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Stack.h
@@ +1357,5 @@
>       * LeaveTree requires stack allocation to rebuild the stack. There is no
>       * good way to handle an OOM for these allocations, so this function checks
>       * that OOM cannot occur using the size of the TraceNativeStorage as a
> +     * conservative upper bound. Despite taking a 'cx', this function does not
> +     * report an error if it returns 'false'.

I'd pull the last sentence into its own paragraph; it blends in a bit here.

@@ +1362,2 @@
>       */
> +    inline bool ensureEnoughSpaceToEnterTrace(JSContext *cx, MaybeReportError report);

This method's always called with report == NO_REPORT_ERROR.  Is that for documentation purposes?  I'm not sure I see anything clearly better, but maybe there is something.
Attachment #543291 - Flags: review?(jwalden+bmo) → review+
Attachment #543291 - Flags: review?(mrbkap) → review+
(In reply to comment #10)
Thanks for the test-case.  With the patch, I can repeatedly click 'Continue' and eventually 'Stop' and the page is responsive with no asserts.
http://hg.mozilla.org/mozilla-central/rev/28be8df0deb7
http://hg.mozilla.org/mozilla-central/rev/0661e7077dce
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
The patch aside, why is firefox even hanging on those pages?  For me, neither chrome 13 nor ie 9 hang on those pages, and opera 11.5 only momentarily pauses on the comment 10 page.  So while this patch helps firefox recover, it would be nice if there was a follow-up to address the root problem.
The root problem is we don't have content/chrome process separation.  Rest assured, work is underway: https://wiki.mozilla.org/Electrolysis.

A secondary root problem is that, if we have an error while running the operation callback, apparently we either (1) lose it or (2) choose not to halt the running script.  That sounds like a nice bug to file, if you'd like.
(In reply to comment #15)

Thanks, Luke.

> A secondary root problem is that, if we have an error while running the
> operation callback, apparently we either (1) lose it or (2) choose not to
> halt the running script.  That sounds like a nice bug to file, if you'd like.

Don't understand enough to file that one.  Any takers?
Filed bug 670183.

To be clear, I think the problem fixed by this bug is as "root" of a problem as the other two things; it fixes a class of DOSes.
Luke, is this really fixed on Beta/Firefox 8? I see a very similar crash (stack overflow) with oodles of 

XPCJSStackFrame::Release()
nsCOMPtr<nsIStackFrame>::~nsCOMPtr<nsIStackFrame>()
XPCJSStackFrame::~XPCJSStackFrame()

on the stack when visiting http://www.in-town.nl/web/ I don't see it in the automation for Aurora/9, or Nightly/10.
That stack trace would almost certainly have been fixed by bug 676934.  (Basically, we increased the scripted stack depth max (to something comparable to other engines) and it looks like XPCJSStackFrame was doing a recursive destructor call for each such frame which apparently is enough to overflow the C stack.  Bug 676934 just bounded the number of XPCJSStackFrames we create.)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: