Closed
Bug 1139604
Opened 10 years ago
Closed 10 years ago
Intermittent 817219.html | application crashed [@ mozilla::dom::ProtoAndIfaceCache::EntrySlotMustExist(unsigned int)] after "Assertion failure: (*this)[i], at BindingUtils.h:352"
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox37 | --- | unaffected |
firefox38 | --- | unaffected |
firefox39 | --- | fixed |
firefox-esr31 | --- | unaffected |
People
(Reporter: RyanVM, Assigned: bzbarsky)
Details
(Keywords: assertion, crash, intermittent-failure)
Attachments
(3 files)
See attached log and comment 1 for details. Amazing assertion stack if I might say so.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
![]() |
Assignee | |
Comment 3•10 years ago
|
||
Yeah, so... I was getting totally garbage stacks on Linux on try as well, last time I pushed something that crashed. Did someone break the stack-munging that's supposed to happen on test machines? Who'd know about that stuff?
Flags: needinfo?(ryanvm)
![]() |
Assignee | |
Comment 4•10 years ago
|
||
Past that, it looks like we have the following callers of EntrySlotMustExist:
1) GetUnforgeableHolder. This is actually dead code now that bug 1133760 landed.
2) The code that CGGetPerInterfaceObject spits out.
This second could be called if the interface object (or interface prototype object) is actually null, on various failures, and has comments saying so.
But ArrayCache::EntrySlotMustExist asserts that the value in the slot is non-null. I think that's a bogus assertion: we can assert that the slot actually exists (as we do in the PageTableCache::EntrySlotMustExist, but of course for the ArrayCache the slot always exists... But we can't assert we have a non-null value in the slot.
Nathan, am I just totally missing something here, or is this assertion a wrong as it looks like to me it is?
Flags: needinfo?(nfroyd)
![]() |
Assignee | |
Comment 5•10 years ago
|
||
Oh, so I just noticed the other fun bit from that stack: it's quite large. And we have:
09:35:18 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion
09:35:18 INFO - JavaScript error: , line 0: too much recursion
09:35:18 INFO - Assertion failure: (*this)[i], at ../../dist/include/mozilla/dom/BindingUtils.h:352
A green run has this instead:
19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion
19:08:05 INFO - [6494] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/dom/base/nsINode.cpp, line 1446
19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 26: too much recursion
19:08:05 INFO - JavaScript error: file:///builds/slave/test/build/tests/reftest/tests/layout/base/crashtests/817219-iframe.html, line 8: TypeError: f is not a function
and this repeats a bunch of times.
So anyway, I expect the issue is that we do the first too much recursion thing, then end up throwing _again_ while reporting it or something (stack sure would help there). This prevents some DOM class from being fully set up, but then we bogusly assert it is. Which DOM class is involved would be helpful to know, of course.
ni ted to look at the b0rked stack.
Flags: needinfo?(ted)
Comment 7•10 years ago
|
||
I noticed today that (some? all?) Linux try builds happen to be stripped of their symbols (I'm not talking about debug info). That surely would explain the log.
Comment 8•10 years ago
|
||
And looking slightly further, this could be related to the fact that opt builds are built with --enable-profiling, which sets STRIP_FLAGS to --strip-debug, which means "keep symbols", but debug builds aren't (because looking back at one of the builds I saw stripped, it was a debug build, and the corresponding opt build on the same push is not stripped).
Comment 9•10 years ago
|
||
So maybe we just want something like this:
diff --git a/configure.in b/configure.in
index f3b3365..3afee57 100644
--- a/configure.in
+++ b/configure.in
@@ -1734,17 +1734,17 @@ MOZ_ARG_ENABLE_BOOL(systrace,
[ --enable-systrace Set compile flags necessary for using sampling profilers (e.g. shark, perf)],
MOZ_USE_SYSTRACE=1,
MOZ_USE_SYSTRACE= )
if test -n "$MOZ_USE_SYSTRACE"; then
AC_DEFINE(MOZ_USE_SYSTRACE)
fi
# For profiling builds keep the symbol information
-if test "$MOZ_PROFILING" -a -z "$STRIP_FLAGS"; then
+if test "$MOZ_PROFILING" -a -z "$STRIP_FLAGS" || test "$MOZ_DEBUG"; then
case "$OS_TARGET" in
Linux|DragonFly|FreeBSD|NetBSD|OpenBSD)
STRIP_FLAGS="--strip-debug"
;;
esac
fi
dnl ========================================================
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 11•10 years ago
|
||
We know we strip debug builds, which is why the test harnesses run their output through the stack fixer scripts:
https://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#452
This should be effectively getting piped through fix_stack_using_bpsyms.
Flags: needinfo?(ted)
Reporter | ||
Comment 12•10 years ago
|
||
I don't think I have anything useful to add over what Mike and Ted have already said.
Flags: needinfo?(ryanvm)
Comment 13•10 years ago
|
||
I don't know why this didn't work in automation, I'll file a separate bug to follow up on that, but I downloaded the symbols for this build and piped the log through fix_stack_using_bpsyms.py locally and this is the result it produced, it looks sane.
Comment 14•10 years ago
|
||
Filed bug 1139922 on that.
Comment hidden (Legacy TBPL/Treeherder Robot) |
![]() |
Assignee | |
Comment 16•10 years ago
|
||
Ted, thank you for the stack!
OK, so the stack shows us firing NodeRemoved events in an infinite recursion, fine. Then one of those NodeRemoved listeners makes a reload() call, which calls stop(), which tries to fire a pageshow event. We apparently have a scripted event listener for pageshow (maybe in the browser chrome?) so we try to JS-wrap the PageTransitionEvent and this fails (presumably this is the "too much recursion, line 0" error; sure would be nice to have a stack to _that_. And then we hit the bogus assert.
It's a little disturbing to me that we can end up with too much recursion while creating the PageTransitionEvent prototype object like that. :( I wonder what exactly is doing that recursion check that makes it fail..
![]() |
Assignee | |
Comment 17•10 years ago
|
||
OK, I pushed some logging to try that instruments all the recursion-checking in the JS engine, and it looks like our second recursion failure is in CallJSPropertyOp.
So I think I know what's going on here, even without a stack to the CallJSPropertyOp invocation in question. This is kinda sorta triggered by bug 928336. When we go to set up the prototype for the PageTransitionEvent, we try to create the unforgeable holder for that binding and define the isTrusted property on it. The unforgeable holder has the PageTransitionEvent JSClass, which has an addProperty hook, which we invoke via CallJSPropertyOp from CallAddPropertyHook. This hits the recursion check and fails prototype creation, which then hits the bogus assert.
It _used_ to be the case that we'd create the prototype fine but then presumably fail to create the actual event object, since _that_ would have invoked the addProperty hook.
I'm not sure I see a sane way to avoid failing the prototype creation here, but given that the actual event firing failed anyway before as far as I can guess, I'm no longer worried about that part.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
![]() |
Assignee | |
Comment 20•10 years ago
|
||
Attachment #8573909 -
Flags: review?(peterv)
![]() |
Assignee | |
Updated•10 years ago
|
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
![]() |
Assignee | |
Updated•10 years ago
|
Flags: needinfo?(nfroyd)
Updated•10 years ago
|
Attachment #8573909 -
Flags: review?(peterv) → review+
![]() |
Assignee | |
Comment 21•10 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 24•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Reporter | ||
Updated•10 years ago
|
status-firefox37:
--- → unaffected
status-firefox38:
--- → unaffected
status-firefox-esr31:
--- → unaffected
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•