Last Comment Bug 738769 - "ASSERTION: must be purple" with setUserData
: "ASSERTION: must be purple" with setUserData
Status: VERIFIED FIXED
[sg:critical][qa+][advisory-tracking+]
: assertion, crash, regression, testcase
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla13
Assigned To: Andrew McCreight [:mccr8]
:
Mentors:
Depends on:
Blocks: 326633 728460
  Show dependency treegraph
 
Reported: 2012-03-23 13:03 PDT by Jesse Ruderman
Modified: 2012-06-08 00:23 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
verified
verified
unaffected


Attachments
testcase (crashes after about 10 seconds) (155 bytes, text/html)
2012-03-23 13:03 PDT, Jesse Ruderman
no flags Details
stack trace (8.35 KB, text/plain)
2012-03-23 13:03 PDT, Jesse Ruderman
no flags Details
ensure that participants are purple before unpurpling them (10.00 KB, patch)
2012-03-23 17:19 PDT, Andrew McCreight [:mccr8]
bugs: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review

Description Jesse Ruderman 2012-03-23 13:03:00 PDT
Created attachment 608837 [details]
testcase (crashes after about 10 seconds)
Comment 1 Jesse Ruderman 2012-03-23 13:03:16 PDT
Created attachment 608838 [details]
stack trace
Comment 2 Andrew McCreight [:mccr8] 2012-03-23 13:23:09 PDT
Thanks!
Comment 3 Jesse Ruderman 2012-03-23 15:02:28 PDT
Also crashes in opt nightly: bp-8d859490-dc5a-497e-b3d8-60cd52120323
Comment 4 Andrew McCreight [:mccr8] 2012-03-23 15:19:20 PDT
Yeah, not surprising.  You end up trying to deref a tagged refcount.

This is a regression from bug 728460.  ChildFinder::NoteXPCOMChild calls CanSkip on its children.  CanSkip can end up removing its children from the purple buffer.  An XPCVariant can being a child of its child, and gets removed from the purple buffer during a CanSkip.

Here's the specific way this happens in this case (the purple node we're examining is the XPCVariant):

0x124e02680 [rc=1] XPCVariant
> 0x118032430 mJSVal
> 0x1250da530 mData

0x1250da530 [rc=5] nsGenericElement (xhtml) html file:///Users/amccreight/mz/tests/purple.html
> 0x120cf1670 mNodeInfo
> 0x124e02680 [user data (or handler)]
> 0x1250da5e0 mAttrsAndChildren[i]
> 0x12507c790 mAttrsAndChildren[i]

XPCVariant does a CanSkip on the nsGenericElement, which is in a currently visible document, so it calls MarkNodeChildren on the XPCVariant, which unpurples it.

The immediate fix is to change that CanSkip to a CanSkipThis or CanSkipInCC, but I think really we should consider changing UnmarkPurple to UnmarkIfPurple or something.  That extra branch won't kill us in CC cleanup code and will add a layer of defense.  We can probably also then tear out some of the weird special casing code.

I think this was hard to reproduce because we only run the ChildFinder some of the time, in the last cleanup slice before a CC.  I filed bug 738813 to allow control of this in fuzzing.  Always-on is probably more useful, but I guess there could be errors that are covered up by running the child finder before the CC, so we might as well support always-off as well.
Comment 5 Al Billings [:abillings] 2012-03-23 15:38:50 PDT
Also crashes in opt nightly: bp-8d859490-dc5a-497e-b3d8-60cd52120323

Takes about 10 seconds. Seems reliable.
Comment 6 Andrew McCreight [:mccr8] 2012-03-23 16:19:50 PDT
In terms of vulnerability, what will happen here is something like this:

x = r.mFoo->mBar
r.mFoo = 2 * x + 1

r.mFoo is double a ref count, plus 1.  This is going to be an unaligned load of a 32-bit value (the low bit is always 1), but I don't know offhand if that will always crash or work on x86.

In general, an attacker can control the ref count of an object, but I'm not sure how easy it would be to make the ref count large enough that (times two plus one) it is an address that can actually be read from without crashing.

In this particular case that Jesse found, the object in question is an XPCVariant.  I think that an XPCVariant can only be owned by a single object, capping the ref count at 1.  If that is the case, then the only thing an attacker can do is force a read at address 3.  But I'm not sure, and maybe they can get a higher ref count under some circumstances.  It is also possible that there are other cases that we have not found.
Comment 7 Andrew McCreight [:mccr8] 2012-03-23 17:15:11 PDT
I audited all calls to nsCycleCollectingAutoRefCnt::RemovePurple and nsCycleCollectingAutoRefCnt::unmarkPurple and with my patch they should be okay.
Comment 8 Andrew McCreight [:mccr8] 2012-03-23 17:16:07 PDT
By "okay" I mean they are all tested for being purple, and there's no place in between the check and the unmarking that could unpurple them, even assuming the CanSkip* callbacks unpurple arbitrary things.
Comment 9 Andrew McCreight [:mccr8] 2012-03-23 17:19:05 PDT
Created attachment 608926 [details] [diff] [review]
ensure that participants are purple before unpurpling them
Comment 10 Andrew McCreight [:mccr8] 2012-03-23 17:22:40 PDT
Very mechanical patch.  Most of it is just renaming UnmarkPurple to UnmarkIfPurple.  The only interesting part is the very end in nsISupportsImpl.h, where I add a purple check.
Comment 11 Andrew McCreight [:mccr8] 2012-03-24 08:18:24 PDT
Comment on attachment 608926 [details] [diff] [review]
ensure that participants are purple before unpurpling them

Try run looked good.
Comment 12 Andrew McCreight [:mccr8] 2012-03-26 09:59:36 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f01aac728984
Comment 13 Ed Morley [:emorley] 2012-03-27 05:38:45 PDT
https://hg.mozilla.org/mozilla-central/rev/f01aac728984
Comment 14 Andrew McCreight [:mccr8] 2012-03-28 12:01:33 PDT
Comment on attachment 608926 [details] [diff] [review]
ensure that participants are purple before unpurpling them

[Approval Request Comment]
Regression caused by (bug #): 728460
User impact if declined: crashes in some situations, potentially security problems
Testing completed (on m-c, etc.): it has been on m-c for a day or two
Risk to taking this patch (and alternatives if risky): should be low. All it does it turn a debug-only assertion into a dynamic check, plus a bunch of renaming.
String changes made by this patch: none
Comment 15 Alex Keybl [:akeybl] 2012-03-28 14:58:02 PDT
Comment on attachment 608926 [details] [diff] [review]
ensure that participants are purple before unpurpling them

[Triage Comment]
sg:crit and low risk, approved for Beta 12 and Aurora 13.
Comment 16 Andrew McCreight [:mccr8] 2012-03-30 09:42:38 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/28722c954913
Comment 17 Paul Silaghi, QA [:pauly] 2012-05-21 02:28:20 PDT
No crashes loading the testcase on FF 13b4 and FF 2012-05-18-mozilla-beta-debug on Win 7, Ubuntu 12.04 and Mac OS X 10.6. Marking verified.
Comment 18 Paul Silaghi, QA [:pauly] 2012-06-08 00:23:15 PDT
No crashes loading the testcase on FF 14 2012-06-07-mozilla-beta-debug on Win 7, Ubuntu 12.04 and Mac OS X 10.6. Marking verified.

I was able to see the crash on Nightly 2012-03-23-mozilla-central-debug.

Note You need to log in before you can comment on or make changes to this bug.