Closed Bug 1139604 Opened 10 years ago Closed 10 years ago

Intermittent 817219.html | application crashed [@ mozilla::dom::ProtoAndIfaceCache::EntrySlotMustExist(unsigned int)] after "Assertion failure: (*this)[i], at BindingUtils.h:352"

Categories

(Core :: DOM: Core & HTML, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 --- unaffected
firefox38 --- unaffected
firefox39 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: bzbarsky)

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(3 files)

Attached file test log
See attached log and comment 1 for details. Amazing assertion stack if I might say so.
Yeah, so... I was getting totally garbage stacks on Linux on try as well, last time I pushed something that crashed. Did someone break the stack-munging that's supposed to happen on test machines? Who'd know about that stuff?
Flags: needinfo?(ryanvm)
Past that, it looks like we have the following callers of EntrySlotMustExist: 1) GetUnforgeableHolder. This is actually dead code now that bug 1133760 landed. 2) The code that CGGetPerInterfaceObject spits out. This second could be called if the interface object (or interface prototype object) is actually null, on various failures, and has comments saying so. But ArrayCache::EntrySlotMustExist asserts that the value in the slot is non-null. I think that's a bogus assertion: we can assert that the slot actually exists (as we do in the PageTableCache::EntrySlotMustExist, but of course for the ArrayCache the slot always exists... But we can't assert we have a non-null value in the slot. Nathan, am I just totally missing something here, or is this assertion a wrong as it looks like to me it is?
Flags: needinfo?(nfroyd)
Oh, so I just noticed the other fun bit from that stack: it's quite large. And we have: 09:35:18 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion 09:35:18 INFO - JavaScript error: , line 0: too much recursion 09:35:18 INFO - Assertion failure: (*this)[i], at ../../dist/include/mozilla/dom/BindingUtils.h:352 A green run has this instead: 19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion 19:08:05 INFO - [6494] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/base/nsINode.cpp, line 1446 19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion 19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 8: TypeError: f is not a function and this repeats a bunch of times. So anyway, I expect the issue is that we do the first too much recursion thing, then end up throwing _again_ while reporting it or something (stack sure would help there). This prevents some DOM class from being fully set up, but then we bogusly assert it is. Which DOM class is involved would be helpful to know, of course.
ni ted to look at the b0rked stack.
Flags: needinfo?(ted)
I noticed today that (some? all?) Linux try builds happen to be stripped of their symbols (I'm not talking about debug info). That surely would explain the log.
And looking slightly further, this could be related to the fact that opt builds are built with --enable-profiling, which sets STRIP_FLAGS to --strip-debug, which means "keep symbols", but debug builds aren't (because looking back at one of the builds I saw stripped, it was a debug build, and the corresponding opt build on the same push is not stripped).
So maybe we just want something like this: diff --git a/configure.in b/configure.in index f3b3365..3afee57 100644 --- a/configure.in +++ b/configure.in @@ -1734,17 +1734,17 @@ MOZ_ARG_ENABLE_BOOL(systrace, [ --enable-systrace Set compile flags necessary for using sampling profilers (e.g. shark, perf)], MOZ_USE_SYSTRACE=1, MOZ_USE_SYSTRACE= ) if test -n "$MOZ_USE_SYSTRACE"; then AC_DEFINE(MOZ_USE_SYSTRACE) fi # For profiling builds keep the symbol information -if test "$MOZ_PROFILING" -a -z "$STRIP_FLAGS"; then +if test "$MOZ_PROFILING" -a -z "$STRIP_FLAGS" || test "$MOZ_DEBUG"; then case "$OS_TARGET" in Linux|DragonFly|FreeBSD|NetBSD|OpenBSD) STRIP_FLAGS="--strip-debug" ;; esac fi dnl ========================================================
We know we strip debug builds, which is why the test harnesses run their output through the stack fixer scripts: https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#452 This should be effectively getting piped through fix_stack_using_bpsyms.
Flags: needinfo?(ted)
I don't think I have anything useful to add over what Mike and Ted have already said.
Flags: needinfo?(ryanvm)
I don't know why this didn't work in automation, I'll file a separate bug to follow up on that, but I downloaded the symbols for this build and piped the log through fix_stack_using_bpsyms.py locally and this is the result it produced, it looks sane.
Ted, thank you for the stack! OK, so the stack shows us firing NodeRemoved events in an infinite recursion, fine. Then one of those NodeRemoved listeners makes a reload() call, which calls stop(), which tries to fire a pageshow event. We apparently have a scripted event listener for pageshow (maybe in the browser chrome?) so we try to JS-wrap the PageTransitionEvent and this fails (presumably this is the "too much recursion, line 0" error; sure would be nice to have a stack to _that_. And then we hit the bogus assert. It's a little disturbing to me that we can end up with too much recursion while creating the PageTransitionEvent prototype object like that. :( I wonder what exactly is doing that recursion check that makes it fail..
OK, I pushed some logging to try that instruments all the recursion-checking in the JS engine, and it looks like our second recursion failure is in CallJSPropertyOp. So I think I know what's going on here, even without a stack to the CallJSPropertyOp invocation in question. This is kinda sorta triggered by bug 928336. When we go to set up the prototype for the PageTransitionEvent, we try to create the unforgeable holder for that binding and define the isTrusted property on it. The unforgeable holder has the PageTransitionEvent JSClass, which has an addProperty hook, which we invoke via CallJSPropertyOp from CallAddPropertyHook. This hits the recursion check and fails prototype creation, which then hits the bogus assert. It _used_ to be the case that we'd create the prototype fine but then presumably fail to create the actual event object, since _that_ would have invoked the addProperty hook. I'm not sure I see a sane way to avoid failing the prototype creation here, but given that the actual event firing failed anyway before as far as I can guess, I'm no longer worried about that part.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Flags: needinfo?(nfroyd)
Attachment #8573909 - Flags: review?(peterv) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: