Closed
Bug 708825
Opened 13 years ago
Closed 13 years ago
"ASSERTION: Fault in cycle collector: traversed refs exceed refcount" closing window nearScriptStackLimit
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
People
(Reporter: jruderman, Assigned: mccr8)
Details
(Keywords: assertion, testcase, Whiteboard: [sg:critical][qa!])
Attachments
(3 files)
624 bytes,
text/html
|
Details | |
1.25 KB,
patch
|
Details | Diff | Splinter Review | |
1.32 KB,
patch
|
bholley
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta+
akeybl
:
approval-mozilla-esr10+
|
Details | Diff | Splinter Review |
1. Load the testcase
2. Click the button
3. Close the testcase
4. Force a CC (e.g. by quitting Firefox or using about:memory)
Result:
###!!! ASSERTION: Fault in cycle collector: traversed refs exceed refcount (ptr: 127319dc0)
: 'Not Reached', file /Users/jruderman/trees/mozilla-central/xpcom/base/nsCycleCollector.cpp, line 1235
In addition, everything leaks.
I can reproduce with both Tinderbox debug builds and a local debug build (Mac OS X 10.6, 64-bit Firefox).
Some variants crash instead, in exploitable-looking ways.
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → continuation
Assignee | ||
Comment 1•13 years ago
|
||
When the cycle collector hits a fault like this, it turns itself off, which would explain the leaks.
Assignee | ||
Comment 2•13 years ago
|
||
The easiest way for me to reproduce this was to load the test case, click the button, then just exit the browser. The first shutdown CC will hit the assertion, I believe.
The node in question is an <html> element. It has a refcount of 3, with 4 references to it. Those references are its 3 children, plus the document it is in. The children are a <head>, <body> and a nsGenericDOMDataNode. The nsGenericDOMDataNode seems like it has the most possibility for being what is wrong, but I'm not sure.
The DOM isn't traversed until it isn't part of a currently displayed window, which is probably why you have to close the testcase before the assertion triggers.
Reporter | ||
Comment 3•13 years ago
|
||
I'm curious which <html> element. The testcase, about:blank, or data:text/html,1 ?
Assignee | ||
Comment 4•13 years ago
|
||
The test case. It looks like there's no data:test/html document by the time the CC fires off, but I could be wrong.
Reporter | ||
Comment 5•13 years ago
|
||
Oh, that explains why I had to leave in the line
void ((new XMLSerializer).serializeToString(document.documentElement));
Comment 6•13 years ago
|
||
Is this a recent regression or has it been around for a while (e.g. does it affect Fx9 or earlier)?
Updated•13 years ago
|
status-firefox10:
--- → affected
status-firefox11:
--- → affected
status-firefox8:
--- → wontfix
status-firefox9:
--- → wontfix
tracking-firefox10:
--- → +
tracking-firefox11:
--- → +
tracking-firefox8:
--- → -
tracking-firefox9:
--- → -
Updated•13 years ago
|
status-firefox12:
--- → affected
tracking-firefox12:
--- → +
Assignee | ||
Comment 7•13 years ago
|
||
I'll start looking at this again. Though I worry the recent CC changes will make the test case succeed, but the problem still remains. I can always go back to a slightly older build.
I'm curious about what is causing it, but I wonder if in general we should just quit the browser if we hit this fault. It seems to me that if you have more references to something than the refcount thinks there are, that's just a dangling pointer waiting to happen. This is the first time I've ever heard of this happening, so hopefully it isn't a common problem. And by turning them into crashes, at least we'd have an idea if it was.
Assignee | ||
Comment 8•13 years ago
|
||
John glanced at the serializer code, and he said it looked like it was manually patching up some ref count related stuff, and had some scary comments, so that sounds promising.
Updated•13 years ago
|
status-firefox-esr10:
--- → affected
status-firefox13:
--- → affected
Comment 9•13 years ago
|
||
Andrew, ping?
Assignee | ||
Comment 10•13 years ago
|
||
Oh right this. I'll look at it after bug 723971.
Assignee | ||
Comment 11•13 years ago
|
||
I can't reproduce this on a recent Nightly debug build. I even tried making all CCs WANT_ALL_TRACES which should disable most of the recent CC optimizations, but that didn't seem to matter.
On a debug Aurora build, it crashes immediately:
Assertion failure: !cx->isExceptionPending(), at /builds/slave/m-aurora-osx-dbg/build/js/src/jscntxtinlines.h:314
I'm not sure what that means.
On a recent 11 debug nightly, it doesn't seem to reproduce either.
Assignee | ||
Comment 12•13 years ago
|
||
Jesse, can you still reproduce this? Do you happen to have some similar test cases that do reproduce it?
If not, I think I'll do a bisection on 11 to try to figure out where it stopped hitting the assertion, as I'm not really convinced this has been fixed magically.
Assignee | ||
Comment 13•13 years ago
|
||
Okay, now I can't reproduce this on a 12-08 nightly from here:
ftp://ftp.mozilla.org/pub/firefox/nightly/2011/12/2011-12-08-mozilla-central-debug/
or on a couple of other December nightlies. I'm not sure what I'm doing wrong here.
Updated•13 years ago
|
Whiteboard: [sg:critical] → [sg:critical] [need answer to comment 12 from jruderman]
Comment 14•13 years ago
|
||
(In reply to Jesse Ruderman from comment #0)
> Some variants crash instead, in exploitable-looking ways.
Can we get one or two of those attached here? Might help mccr8 find the problem.
Assignee | ||
Comment 15•13 years ago
|
||
Okay, "fortunately" I can reproduce this issue on a local build of beta. I'll go from there.
Whiteboard: [sg:critical] [need answer to comment 12 from jruderman] → [sg:critical]
Updated•13 years ago
|
tracking-firefox13:
--- → +
Assignee | ||
Comment 16•13 years ago
|
||
Thanks a lot to jst for helping me work through this.
When we're morphing a slim wrapper into a regular wrapper, we create a new XPCWrappedNative, and set it to point at the native. We do not addref to do this, because the plan is to make the JS Object wrapper point to the XPCWN, so the ref count will stay the same. But we're in an inconsistent state until we make the update.
We then attempt to enter the compartment of the existing JS object (the wrapper). This fails due to stack overflow. We then drop the last reference to the XPCWrappedNative, causing it to call its destructor. When the destructor runs, it sees that the XPCWN points to a native, so it calls release on this object, because it thinks it owns it, but it doesn't actually. This causes the ref count to be too low, causing the cycle collector assertion.
The fix is to null out the pointer to the native in the XPCWN before we release it.
Assignee | ||
Comment 17•13 years ago
|
||
It looks like this was broken by brain transplants (bug 580128):
http://hg.mozilla.org/mozilla-central/diff/9dc0c11e8eb8/js/src/xpconnect/src/xpcwrappednative.cpp
That added the compartment enter check.
Assignee | ||
Updated•13 years ago
|
Attachment #604576 -
Attachment description: fix an error path, patch against Fx11 → fix an error path, patch against Fx11 and 12
Assignee | ||
Comment 18•13 years ago
|
||
Basically the same.
Assignee | ||
Updated•13 years ago
|
Attachment #604590 -
Flags: review?(bobbyholley+bmo)
Reporter | ||
Comment 19•13 years ago
|
||
\o/
Think we'll be able to fix this on the Firefox 10 branch? The bug will be fairly easy to discover once the nearScriptStackLimit gadget becomes public. It might become public as the result of other bugs being fixed, or as the result of the planned fuzzer release in April.
Assignee | ||
Comment 20•13 years ago
|
||
Isn't 11 releasing within a week?
Pretty cool fuzzer tool, though! jst suggested that having a enter compartment thing that could be triggered to fail might help find weird error paths like this.
Comment 21•13 years ago
|
||
This is certainly fx10 esr material, but I think the fx11 ship has sailed at this point. If we do end up doing a 11.0.1 or what not this one could maybe ride along, but I wouldn't re-spin for this at this late of a time.
Reliably exploiting this seems significantly less than trivial, even if people figure out how to reliably trigger this particular error, you still need to predict when the object in question gets prematurely destroyed, then write to the memory that held this object, and then wait for the existing references to this object to be touched, either by the CC, or some other code.
Assignee | ||
Updated•13 years ago
|
Comment 22•13 years ago
|
||
Comment on attachment 604590 [details] [diff] [review]
fix for 13
I'd love a comment here, but I can understand if we want to keep the commit low-profile. And slim wrappers are going away in the near future, so this will probably stay in my memory until then.
Sorry for the delay. r=bholley
Attachment #604590 -
Flags: review?(bobbyholley+bmo) → review+
Comment 23•13 years ago
|
||
We should land this toward the middle-end of the Fx12 cycle, and then land everywhere.
Assignee | ||
Comment 24•13 years ago
|
||
You mean the current Firefox 14 cycle? Sounds good to me.
For my own reference, that's the very end of March.
Comment 25•13 years ago
|
||
[Triage Comment]
Removing the tracking on this, leaving status-firefox-esr10 affected, we'll track it for the right esr release once it's fixed.
tracking-firefox-esr10:
12+ → ---
Assignee | ||
Comment 26•13 years ago
|
||
Comment on attachment 604576 [details] [diff] [review]
fix an error path, patch against Fx11 and 12
[Approval Request Comment]
Regression caused by (bug #): 580128
User impact if declined: users of 12 and 13 may be exposed to vulnerability once this lands
Testing completed (on m-c, etc.): Passed a try run. I can land this on m-c a few days before I land it elsewhere to make sure it doesn't go crazy, but I'd prefer to have some kind of conditional approval before I do that, if possible.
Risk to taking this patch (and alternatives if risky): Low. This patch only alters the behavior of a very rarely taken path that will probably crash anyways.
String changes made by this patch: none
Attachment #604576 -
Flags: approval-mozilla-beta?
Attachment #604576 -
Flags: approval-mozilla-aurora?
Comment 27•13 years ago
|
||
Comment on attachment 604576 [details] [diff] [review]
fix an error path, patch against Fx11 and 12
[Triage Comment]
The process for landing sg:crit bugs hasn't changed since the FF11 security post-mortem, although other landing strategies were discussed. Unless this bug has special circumstances, let's land on m-c and then nominate once it's had a couple of days to bake for stability's sake.
Attachment #604576 -
Flags: approval-mozilla-beta?
Attachment #604576 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 28•13 years ago
|
||
Target Milestone: --- → mozilla14
Assignee | ||
Comment 29•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Updated•13 years ago
|
tracking-firefox-esr10:
--- → 14+
Assignee | ||
Comment 30•13 years ago
|
||
Comment on attachment 604590 [details] [diff] [review]
fix for 13
[Approval Request Comment]
Regression caused by (bug #): bug 580128
User impact if declined: security vulnerability. Apparently this may become easy to find with fuzzer tool release in April (see comment 19).
Testing completed (on m-c, etc.): it has been on m-c for a few days
Risk to taking this patch (and alternatives if risky): should be low. This only alters the behavior of a rarely taken control path that is currently broken to the point where it will probably always crash it we hit it.
String changes made by this patch: none
Attachment #604590 -
Flags: approval-mozilla-esr10?
Attachment #604590 -
Flags: approval-mozilla-beta?
Attachment #604590 -
Flags: approval-mozilla-aurora?
Comment 31•13 years ago
|
||
Comment on attachment 604590 [details] [diff] [review]
fix for 13
[Triage Comment]
Low risk sg:crit fix - approved for all branches.
Attachment #604590 -
Flags: approval-mozilla-esr10?
Attachment #604590 -
Flags: approval-mozilla-esr10+
Attachment #604590 -
Flags: approval-mozilla-beta?
Attachment #604590 -
Flags: approval-mozilla-beta+
Attachment #604590 -
Flags: approval-mozilla-aurora?
Attachment #604590 -
Flags: approval-mozilla-aurora+
Updated•13 years ago
|
Assignee | ||
Comment 32•13 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/8ea9983dc063
https://hg.mozilla.org/releases/mozilla-beta/rev/92b07ee2bfc2
https://hg.mozilla.org/releases/mozilla-esr10/rev/8772f3d527ac
Target Milestone: mozilla14 → mozilla12
Updated•13 years ago
|
Updated•13 years ago
|
Group: core-security
Comment 33•13 years ago
|
||
(In reply to Jesse Ruderman from comment #0)
> Created attachment 580173 [details]
> testcase
>
> 1. Load the testcase
> 2. Click the button
> 3. Close the testcase
> 4. Force a CC (e.g. by quitting Firefox or using about:memory)
>
> Result:
>
> ###!!! ASSERTION: Fault in cycle collector: traversed refs exceed refcount
> (ptr: 127319dc0)
> : 'Not Reached', file
> /Users/jruderman/trees/mozilla-central/xpcom/base/nsCycleCollector.cpp, line
> 1235
Crashes after step 2 on ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-18-mozilla-beta-debug on Win 7, Ubuntu 12.04 and Mac OS X 10.6:
Assertion failure: !cx->isExceptionPending(), at e:\builds\moz2_slave\m-beta-w32
-dbg\build\js\src\jscntxtinlines.h:317
https://crash-stats.mozilla.com/report/pending/6a95f8f1-1b4c-483b-b125-999632120521
Reporter | ||
Comment 34•13 years ago
|
||
Paul, can you file a new bug report for that? Do you know if it's a regression that happened after this bug's patch landed?
Comment 35•13 years ago
|
||
Sure Jesse. Bug 757004 submitted. I'll look for a regression range and post there the results
Comment 36•13 years ago
|
||
Based on comment 34 marking this as verified. No "ASSERTION: Fault in cycle collector: traversed refs exceed refcount" occurs in FF 13, 14 and ESR 10.0.5.
Status: RESOLVED → VERIFIED
Whiteboard: [sg:critical][qa+] → [sg:critical][qa!]
Comment 37•13 years ago
|
||
Executing the steps in comment 0 and closing Firefox after --> bug 404828
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-18-mozilla-central-debug
You need to log in
before you can comment on or make changes to this bug.
Description
•