Closed Bug 1277908 Opened 9 years ago Closed 9 years ago

Use-after-poison in NS_NewStyleContext (with animation)

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- unaffected
firefox48 --- unaffected
firefox49 + fixed
firefox50 + fixed

People

(Reporter: truber, Assigned: birtles)

References

Details

(5 keywords, Whiteboard: [post-critsmash-triage])

Attachments

(6 files, 2 obsolete files)

Attached file q4-reduce-truber.html
Attached testcase causes: ==30545==ERROR: AddressSanitizer: use-after-poison on address 0x6250005f1a98 at pc 0x7f571bccde75 bp 0x7fffad981770 sp 0x7fffad981768 READ of size 4 at 0x6250005f1a98 thread T0
Attached file output.txt
Group: core-security → dom-core-security
An ASAN "use-after-poison" is not "frame-poisoning", it's a use-after-free. Because we have long-lived arenas where we manage allocations ourselves we added ASAN poisoning hooks to help detect misuse. "frame-poisoning" is, we think, just a safe crash (DoS). It's a mitigation specific to the nsFrame tree and we tag these separately just in case someone does find a bypass so we know how much trouble we're in. http://robert.ocallahan.org/2010/10/mitigating-dangling-pointer-bugs-using_15.html
Group: dom-core-security → layout-core-security
We destroy nsRuleNode which is allocated in the presshell arena and the presshell is alive when we use it later, so this actually is mitigated by frame poisoning.
Attached file stack
Here's the stack when we destroy the 'aVisitedRuleNode' here: http://mxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp?rev=8f803cb0eac1#918 and then use it later in that function (nsStyleSet::GetContext). It's a EffectCompositor::UpdateEffectProperties call in a nested nsStyleSet::GetContext that seems to be destructive.
Sprinkling a few kungfuDeathGrips in nsStyleSet::GetContext seems to fix it, fwiw.
Component: Layout → CSS Parsing and Computation
Blocks: 1281164
Sure. I don't quite understand what nsRuleWalker does in nsStyleSet::ResolveStyleByAddingRules, but ForwardOnPossiblyCSSRule seems unnecessary in case of calculating the change hints for optimization of paint-only animations. I will look into further. Leaving ni to me.
Assignee: nobody → hiikezoe
Status: NEW → ASSIGNED
attachment 8759744 [details] caused a crash on linux64 debug locally, and the crash can be stopped by skipping call of CalcStyleDifference in case both of two nsStyleContext are the same in BuildSegmentsFromValueEntries. I pushed a try with this fix now. https://treeherder.mozilla.org/#/jobs?repo=try&revision=96ba26e91a75
Flags: needinfo?(hiikezoe)
I made a mistake when I checked the fix. It does not fix the crash at all.
Flags: needinfo?(hiikezoe)
There is a fundamental problem lying in our animation styling code. In fact with attachment 8759744 [details] I can cause another crash by reducing kGCInterval in nsStyleSet.h to 10 even if a key changeset[1] which causes this bug is backed out. I think the problem is bug 1279819. [1] https://hg.mozilla.org/integration/mozilla-inbound/rev/2a3e5b026e23 That's being said, for this particular test case we can move calculation of style difference between each keyframes after checking animation properties at [2] to avoid this use-after-poison caused by bug 1166500. We don't actually need to update change hints if there are no properties change. [2] https://hg.mozilla.org/mozilla-central/file/c2da34d96746/dom/animation/KeyframeEffect.cpp#l522 A try with the change. The change also fixes the test case in bug 1281164. https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f3ebf00b30e
Flags: needinfo?(hiikezoe)
Brian, I think you've been investigating bug 1279819, which is highly related to this bug and bug 1281164, and you've noticed that a key factor is the number of nested calls of nsStyleSet::GetContext and kGCInterval. I think a proper solution will be introduced in bug 1279819, but it's worth to reduce the number of nested calls of nsStyleSet::GetContext. This patch moves calculation of style difference, which leads to the nested calls, after checking mProperties of KeyframeEffectReadOnly equivalence, as a result test cases of this bug and bug 1281164 no longer crash because the number of the nested calls reduced. Brian, what do you think? Or should we make this bug depend on bug 1279819?
Attachment #8765302 - Flags: review?(bbirtles)
Comment on attachment 8765302 [details] [diff] [review] Don't calculate style difference if there are no properties change Review of attachment 8765302 [details] [diff] [review]: ----------------------------------------------------------------- I think this won't be necessary once we fix bug 1279819 but it seems like an improvement none-the-less. It avoids doing so unnecessary work when nothing has changed and avoids storing the change hint on each segment. I think we should wait until bug 1279819 has landed and then land this without the tests (the tests should possibly be included with bug 1279819).
Attachment #8765302 - Flags: review?(bbirtles) → review+
Also, I'd like to know if we're actually freeing something earlier that we ought to and whether we should add the kungFuDeathGrip mentioned in comment 5.
Can someone CC me on bug 1279819 please? It's nice that we can avoid doing the calls that makes this testcase crash, but it seems rather fragile to depend on that. I think we need to fix this crash as if the current patch didn't exist, to make sure we don't crash in the future when some other path leads to nested GetContext calls.
(In reply to Mats Palmgren (:mats) from comment #14) > Can someone CC me on bug 1279819 please? > > It's nice that we can avoid doing the calls that makes this testcase crash, > but it seems rather fragile to depend on that. I think we need to fix this > crash > as if the current patch didn't exist, to make sure we don't crash in the > future > when some other path leads to nested GetContext calls. Yes, exactly. I think hiro was originally expecting we'd find some very generic method for catching this sort of situation in bug 1279819. It turns out that's not the case, however. That's why I think we need to understand exactly what is going on and add the necessary kungFuDeathGrip(s) if that's the correct solution to prevent this bug for other cases of nested GetContext calls.
mats, could you please post the kungFuDeathGrip fix here? We'd like to take the fix here in this bug.
Flags: needinfo?(mats)
It was merely an observation and I didn't suggest it to be a real fix so I didn't keep the patch, sorry. I just converted a few raw pointers to nsRefPtr though, nothing fancy. The hard part to fix this bug for real is to analyze all the callers of nsStyleSet::GetContext, and their callers and so on recursively, to make sure they don't use any style data (and other data too?) after calling anything that reach nsStyleSet::GetContext, because it may have been deleted. And to do that we need to first determine precisely which kind of data structures GetContext might destroy. Only after we have done all that can we decide whether sprinkling kungFuDeathGrips in GetContext (and possibly other methods) is a good enough fix.
Flags: needinfo?(mats)
Or rather, we should analyze all calls to EffectCompositor::UpdateEffectProperties, or whatever it calls that is destructive. So it may be a larger set than just callers of nsStyleSet::GetContext.
I did some analysis of this locally and have a rough idea of what's going on. Unfortunately I was using an opt build and, althought it has debugging symbols enabled, ASAN doesn't seem to like them--and rr seemed to get stuck when trying to go backwards so there are still quite a few bits I'm unclear about. Basically, we first create the rule node in question in ReplaceAnimationRule where we call n->Transition(aNewAnimRule, SheetType::Animation, false); This creates a new nsRuleNode with ref count zero. We call RuleNodeUnused(this, aMayGC = false) to indicate that there is a currently unused nsRuleNode in existance. Shortly after we pass this to a (shallow-nested) call to GetContext() where we create a style context for the visited style and add-ref the nsRuleNode and mark it as being used. So far, so good. In what appears to be a subsequent tick, we go through and destroy a bunch of frames. (Perhaps due to this part of the test: "while(root.firstChild) { root.removeChild(root.firstChild); }"). At that point we drop the ref count of the nsRuleNode to zero (since the corresponding nsStyleContext is destroyed) and mark is as no longer being used. However, at that point we decide not to run GC (I think we were failing the gc-interval check but all those variables were optimized out so I couldn't really tell). However, when we subsequently go to process restyles we end up with a nested call to GetContext. For the outer call to GetContext, we call NS_NewStyleContext to create the style context for aRuleNode.[1] However, at this point we have already stored aVisitedRuleNode (I've yet to work out where this comes from--I'll look into that next). The call to NS_NewStyleContext generates a subsequent call to GetContext which then triggers a call to UpdateProperties.[2] Within UpdateProperties we end up calling BuildSegmentsFromValueEntries which creates temporary style contexts that are destroyed when each iteration of the loop finishes. The destructor for these style contexts in turn triggers calls to nsRuleNode::Release which, this time, decides to actually perform GC and cleans up the unused nsRuleNodes, including the rule node we have referenced in a local variable further up the stack. Later on when we go to call NS_NewStyleContext from the outer call to GetContext for aVisitedRuleNode we end up referencing the poisoned memory. [1] http://searchfox.org/mozilla-central/rev/ffcc65db736dd64dc3cbc49592f9edac77bf65ad/layout/style/nsStyleSet.cpp#911 [2] http://searchfox.org/mozilla-central/rev/ffcc65db736dd64dc3cbc49592f9edac77bf65ad/layout/style/nsStyleSet.cpp#944
I didn't get too far with this today. I spent most of the time trying to get rr to behave but the rr tests reset the VM so I lost the few notes I wrote up. I got some help from roc but I suspect it's something incompatible in my VM environment. Basically though I can see that the visited rule node that gets destroyed is passed into GetContext from nsStyleSet::ResolveStyleFor[1] where it appears to be set by calling GetCurrentNode from the RuleWalker set up there. I don't know that code so maybe this is normal but I'm surprised that we can reached the destroyed nsRuleNode from mRuleTree. [1] https://dxr.mozilla.org/mozilla-central/source/layout/style/nsStyleSet.cpp#1321
Spent most of the day setting up Linux and rr on a separate laptop (i.e. not under a VM) so I didn't get far with this but I confirmed that we're getting the expired (but not yet GC-ed) rule node here: #0 nsRuleNode::Transition (this=0x7f0ec8d3c848, aRule=0x7f0ec5320100, aLevel=mozilla::SheetType::Animation, aIsImportantRule=false) at /home/brian/src/layout/style/nsRuleNode.cpp:1595 #1 0x00007f0ee7900c68 in nsRuleWalker::DoForward (this=0x7ffcdba5edb0, aRule=0x7f0ec5320100) at /home/brian/src/layout/style/nsRuleWalker.h:31 #2 0x00007f0ee7900d50 in nsRuleWalker::Forward (this=0x7ffcdba5edb0, aRule=0x7f0ec5320100) at /home/brian/src/layout/style/nsRuleWalker.h:39 #3 0x00007f0ee78eed86 in mozilla::EffectCompositor::AnimationStyleRuleProcessor::RulesMatching (this=0x7f0ec60da2b0, aData=0x7ffcdba5edd0) at /home/brian/src/dom/animation/EffectCompositor.cpp:803 #4 0x00007f0ee9eec367 in EnumRulesMatching<ElementRuleProcessorData> (aProcessor=0x7f0ec60da2b0, aData=0x7ffcdba5edd0) at /home/brian/src/layout/style/nsStyleSet.cpp:779 #5 0x00007f0ee9ece6d9 in nsStyleSet::FileRules (this=0x7f0ec8eee280, aCollectorFunc=0x7f0ee9eec331 <EnumRulesMatching<ElementRuleProcessorData>(nsIStyleRuleProcessor*, void*)>, aData=0x7ffcdba5edd0, aElement=0x7f0ec6ed3340, aRuleWalker=0x7ffcdba5edb0) at /home/brian/src/layout/style/nsStyleSet.cpp:1163 #6 0x00007f0ee9ecf2a8 in nsStyleSet::ResolveStyleFor (this=0x7f0ec8eee280, aElement=0x7f0ec6ed3340, aParentContext=0x7f0ec534dc10, aTreeMatchContext=...) at /home/brian/src/layout/style/nsStyleSet.cpp:1343 #7 0x00007f0ee9fe3a9c in mozilla::StyleSetHandle::Ptr::ResolveStyleFor (this=0x7ffcdba5eeb0, aElement=0x7f0ec6ed3340, aParentContext=0x7f0ec534dc10, aTreeMatchContext=...) at /home/brian/src/obj-debug/dist/include/mozilla/StyleSetHandleInlines.h:93 #8 0x00007f0ee9fa3396 in nsCSSFrameConstructor::ResolveStyleContext (this=0x7f0ec71c24e0, aParentStyleContext=0x7f0ec534dc10, aContent=0x7f0ec6ed3340, aState=0x7ffcdba5f630) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:4977 #9 0x00007f0ee9fa31c5 in nsCSSFrameConstructor::ResolveStyleContext (this=0x7f0ec71c24e0, aParentFrame=0x7f0ec5334570, aContainer=0x7f0ec60f7740, aChild=0x7f0ec6ed3340, aState=0x7ffcdba5f630) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:4946 #10 0x00007f0ee9fa32b2 in nsCSSFrameConstructor::ResolveStyleContext (this=0x7f0ec71c24e0, aInsertion=..., aChild=0x7f0ec6ed3340, aState=0x7ffcdba5f630) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:4963 #11 0x00007f0ee9fa4622 in nsCSSFrameConstructor::AddFrameConstructionItems (this=0x7f0ec71c24e0, aState=..., aContent=0x7f0ec6ed3340, aSuppressWhiteSpaceOptimizations=false, aInsertion=..., aItems=...) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:5592 #12 0x00007f0ee9fb2987 in nsCSSFrameConstructor::ProcessChildren (this=0x7f0ec71c24e0, aState=..., aContent=0x7f0ec60f7740, aStyleContext=0x7f0ec534dc10, aFrame=0x7f0ec5334570, aCanHaveGeneratedContent=true, aFrameItems=..., aAllowBlockStyles=false, aPendingBinding=0x0, aPossiblyLeafFrame=0x7f0ec5334570) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:10704 #13 0x00007f0ee9f9b18d in nsCSSFrameConstructor::ConstructTable (this=0x7f0ec71c24e0, aState=..., aItem=..., aParentFrame=0x7f0ec60f1118, aDisplay=0x7f0ec534e988, aFrameItems=...) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:2131 #14 0x00007f0ee9f9cb58 in nsCSSFrameConstructor::ConstructDocElementFrame (this=0x7f0ec71c24e0, aDocElement=0x7f0ec60f7740, aFrameState=0x7f0ec60fa640) at /home/brian/src/layout/base/nsCSSFrameConstructor.cpp:2610 #15 0x00007f0ee9faa980 in nsCSSFrameConstructor::ContentRangeInserted (this=0x7f0ec71c24e0, aContainer=0x0, So it looks like the current node of the rule walker has an mChildren.asList member pointing to the expired rule node. I'm not sure if that member failed to get cleared or if the original current node itself also is expired. Will investigate further tomorrow.
So it appears that we have something like this: 1. At some point we flush pending restyles and recreate frames. I haven't looked into why but it doesn't really matter. There are a bunch of reasons we might decide to flush pending restyles. 2. In the process we destroy nsStyleContexts kept alive by the corresponding frames (each nsIFrame has an mStyleContext member it keeps an owning reference to. Furthermore, style contexts keep an owning reference to their parents so we reach a point where we have released the references from frames and then we unwind from children up). 3. That calls Release on the nsRuleNode used for the style context's data (specifically the OwningStyleContextSource mSource member) 4. The nsRuleNode has mRefCnt == 0 so we call: mPresContext->StyleSet()->AsGecko()->RuleNodeUnused(this, aMayGC = true) 5. However, inside RuleNodeUnused mUnusedRuleNodeCount = 109 but kGCInterval is 300 so we don't do GC yet 6. As a result nsRuleNode's dtor is not called. Normally nsRuleNode's dtor would call mParent->RemoveChild(this) and remove itself from the parent but that doesn't happen yet. As a result the no-longer-referenced nsRuleNode remains in the rule tree. 7. Having destroyed the frames, we go to recreate them and call nsCSSFrameConstructor::ContentInserted and resolve a new style context which brings us to nsStyleSet::ResolveStyleFor. 8. Initially we do the regular rule matching to get the rule node for the non-visited style 9. We're dealing with an anchor element (or at least I think that's why) so we also do rule matching to get the rule node for the visited style 10. That brings us to the stack in comment 21 where we get back the the rule node for the newly appended (anchor) element 11. We have an animation rule for this element already because although we destroyed the element's frame, the animation rules are attached to *elements* and hence survive the re-framing process (which seems fairly reasonable, I guess). 12. When we call Forward on the rule walker and pass in the animation style rule it looks up the children on its current node and finds out that the first (only) child matches the passed-in rule 13. This is not so bad except that what happens next is we call GetContext and pass the two rule nodes (one for non-visited style, one for visited style). 14. Inside GetContext we create the two new style contexts. 15. In the process of creating the first style context we get into this situation where we make a nested call to GetContext (due to a quirk, possibly a bug, in the way we handle rem units) and that triggers some animation code that calls UpdateProperties that triggers creation of a couple of temporary style contexts. 16. When we destroy those temporary style contexts we happen to get another call to RuleNodeUnused and this time mUnusedRuleNodeCount is greater than kGCInterval and so we do a sweep of the rule node tree and clean up the visited rule node we passed in 13. 17. Back up the stack we go to create the second style context, the one for visited style, and discover that the rule node we have a hold on has been destroyed.
(In reply to Brian Birtles (:birtles) from comment #22) > 15. In the process of creating the first style context we get into this > situation where we make a nested call to GetContext (due to a quirk, > possibly a bug, in the way we handle rem units) and that triggers some You may already know, the trigger is ResolveStyleFor() for docElement in CalcLengthWith. http://hg.mozilla.org/mozilla-central/file/09221c72fcb0/layout/style/nsRuleNode.cpp#l517
There are probably about a dozen ways to fix this (including avoiding the nested calls to GetContext -- which is something I think we should eventually do in a separate bug) but probably the simplest and most correct thing is just to hold a reference to aVisitedRuleNode between when we create the first style context and the second. I'll have a look at doing that tomorrow.
Requesting review from dholbert since mats' Bugzilla status says he's on vacation. Hi Daniel, for some background comment 22 should explain what's happening here. A note about the {} syntax which isn't common in our codebase yet. Our coding style currently says: "In general, initialize variables with nsFoo aFoo = bFoo and not nsFoo aFoo(bFoo)."[1] [1] https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#CC_practices I believe this is based on avoiding C++'s most vexing parse. One of the key benefits of C++11's uniform initialization is avoiding this ambiguity and appears to be best practice for this situation.[2] [2] https://herbsutter.com/2013/05/09/gotw-1-solution/ As for compiler support, I believe this comes under the "Initializer lists" item on our "Using C++ in Mozilla code" chart[2] which is marked as ok to use (and I believe is the case all the way up to beta which we'll need to uplift this to). [3] https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code
Attachment #8769013 - Flags: review?(dholbert)
Comment on attachment 8769013 [details] [diff] [review] Keep visited rule node alive while creating non-visited style context Review of attachment 8769013 [details] [diff] [review]: ----------------------------------------------------------------- This seems fine; r=me. Thanks for the explanation of the {} syntax.
Attachment #8769013 - Flags: review?(dholbert) → review+
Oops, need to rebase this on central (I previously made the patch on top of beta). Carrying r=dholbert.
Attachment #8769013 - Attachment is obsolete: true
Attachment #8769037 - Flags: review+
Approval Request Comment [Feature/regressing bug #]: bug 1166500 [User impact if declined]: Possible instability due to accessing poisoned memory. [Describe test coverage new/current, TreeHerder]: This has only just landed on m-i [1] but should get put through its paces over the next few hours. [Risks and why]: Minimal. It's a one-liner that ensures that an object remains alive until the point where it is used. [String/UUID change made/needed]: None [1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a91607dcab11acc0cda76066b995b4dada139d2d
Attachment #8769055 - Flags: approval-mozilla-aurora?
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Group: layout-core-security → core-security-release
Tracking 49/50+ for this crash regression fix.
Comment on attachment 8769055 [details] [diff] [review] Keep visited rule node alive while creating non-visited style context (backport for Aurora) Crash fix, new regression from 49, let's uplift to aurora.
Attachment #8769055 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Hiro, did you already file a separate bug for landing attachment 8765302 [details] [diff] [review]?
Flags: needinfo?(hiikezoe)
Not yet. Though this bug is not yet in public, can we file the bug now?
Flags: needinfo?(hiikezoe)
As discussed, I think this is fine since this bug has been fixed and shipped on all affected branches. This bug can be made public but I don't have the privileges to do so. I assume someone from the security team will do that in time.
Thanks, filed bug 1288586.
Attachment #8765302 - Attachment is obsolete: true
Assignee: hiikezoe → bbirtles
Attached patch Crash testsSplinter Review
Including two test cases in comment 0 and bug1281164 comment 0.
Attachment #8774899 - Flags: review?(bbirtles)
Comment on attachment 8774899 [details] [diff] [review] Crash tests Review of attachment 8774899 [details] [diff] [review]: ----------------------------------------------------------------- r=me assuming these fail without the fix from this bug
Attachment #8774899 - Flags: review?(bbirtles) → review+
(In reply to Brian Birtles (:birtles) from comment #40) > Comment on attachment 8774899 [details] [diff] [review] > Crash tests > > Review of attachment 8774899 [details] [diff] [review]: > ----------------------------------------------------------------- > > r=me assuming these fail without the fix from this bug Note that I confirmed now 1277908-2.html fails without below fixes, but 1277908-1.html does not fail at all on an inbound[1]. https://hg.mozilla.org/mozilla-central/rev/a91607dcab11 https://hg.mozilla.org/mozilla-central/rev/6cd25af34d5c [1] http://hg.mozilla.org/integration/mozilla-inbound/rev/6a5a741a2391 I guess some other changes fixed this crash.
Whiteboard: [post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: