Last Comment Bug 367243 - Crash [@ nsCounterList::SetScope] messing with counter-increment
: Crash [@ nsCounterList::SetScope] messing with counter-increment
Status: RESOLVED FIXED
[sg:critical?]
: crash, testcase, verified1.8.0.10, verified1.8.1.2
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- critical (vote)
: ---
Assigned To: Boris Zbarsky [:bz] (Out June 25-July 6)
: Hixie (not reading bugmail)
Mentors:
Depends on: 367906
Blocks: randomclasses
  Show dependency treegraph
 
Reported: 2007-01-17 06:04 PST by Jesse Ruderman
Modified: 2007-12-17 22:18 PST (History)
4 users (show)
jruderman: blocking1.9?
dveditz: blocking1.8.1.2+
dveditz: blocking1.8.0.10+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (usually crashes Firefox within 10 reloads) (552 bytes, text/html)
2007-01-17 06:04 PST, Jesse Ruderman
no flags Details
Patch rev. 1 (2.71 KB, patch)
2007-01-17 18:36 PST, Mats Palmgren (:mats)
no flags Details | Diff | Review
Patch rev. 2 (43.70 KB, patch)
2007-01-18 21:04 PST, Mats Palmgren (:mats)
bzbarsky: review-
Details | Diff | Review
More like this (4.84 KB, patch)
2007-01-18 21:36 PST, Boris Zbarsky [:bz] (Out June 25-July 6)
mats: review+
dbaron: superreview+
jaymoz: approval1.8.1.2+
jaymoz: approval1.8.0.10+
Details | Diff | Review

Description Jesse Ruderman 2007-01-17 06:04:54 PST
Created attachment 251784 [details]
testcase (usually crashes Firefox within 10 reloads)

Firefox crashes due to touching memory with a random address, so I'm filing this bug as security-sensitive.  Here's the top of a stack trace from an opt build:

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0x48000084 <-- random

Thread 0 Crashed:
0  nsCounterList::SetScope
1  nsCounterManager::AddResetOrIncrement
2  nsCounterManager::AddCounterResetsAndIncrements
3  nsCSSFrameConstructor::InitAndRestoreFrame
4  nsCSSFrameConstructor::ConstructBlock
Comment 1 Mats Palmgren (:mats) 2007-01-17 18:34:08 PST
While processing restyles we batch changes to counter lists:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/base/nsCSSFrameConstructor.cpp&rev=1.1296&root=/cvsroot&mark=12914,12920,12927#12885
When a frame is destroyed it's notified to nsCounterManager::DestroyNodesFor
which removes all nodes referring to the frame, but it leaves all the
mScope* pointers of the nodes intact and marks the list "dirty".
(I think this is a regression from bug 317948.)

The crash occurs when accesing a dangling mScope* pointer (in SetScope
in this case). We need to call RecalcAll() for the counter list
before that happens. 
Comment 2 Mats Palmgren (:mats) 2007-01-17 18:36:11 PST
Created attachment 251857 [details] [diff] [review]
Patch rev. 1

This patch delayes the RecalcAll() for as long as possible.
These methods depends on the mScope* pointers:
nsCounterList::SetScope
nsCounterList::ValueBefore (only called from nsCounterNode::Calc()
    which should only be used when the list is not dirty)
nsCounterUseNode::GetText - which should only be called after it's been
    inserted into the counter list, which indirectly calls SetScope().

So, doing a RecalcAll() in SetScope() if the list is dirty should
be enough. A bit fragile perhaps.

An alternative is to do it directly for each frame destroyed:
  nsCounterList::DestroyNodesFor(nsIFrame* aFrame) {
   if (nsGenConList::DestroyNodesFor(aFrame))
     RecalcAll();
  }
Comment 3 Mats Palmgren (:mats) 2007-01-17 18:44:05 PST
The relevant code in nsCSSFrameConstructor/nsCounterManager is almost exactly
the same on the 1.8 branch and the bug should occur there too, but I haven't
been able to crash a branch build so far (and I assume Jesse didn't either).
Odd.
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-17 20:46:29 PST
Wouldn't a better fix be to not call SetScope() in nsCounterList::Insert() and not to call Calc() in nsCounterManager::AddResetOrIncrement if the list is dirty?    That would ensure that we don't end up doing the frametree-touching that RecalcAll does while we're reconstructing frames.
Comment 5 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-17 20:49:57 PST
For what it's worth, I'd be happy to write up that patch, but I can't reproduce the original crash, which would make it kinda hard to test the patch...

Oh, and SetScope() and Calc() could assert that the list is not dirty...
Comment 6 Mats Palmgren (:mats) 2007-01-18 21:02:19 PST
(In reply to comment #4)
> Wouldn't a better fix be to not call SetScope() in nsCounterList::Insert()
> not to call Calc() in nsCounterManager::AddResetOrIncrement if the list is
> dirty?  That would ensure that we don't end up doing the frametree-touching
> that RecalcAll does while we're reconstructing frames.

Yes, but we still need to guarantee that Calc() is called before
someone tries to get the value with GetText().
I don't think this solvable in general the way nsCSSFrameConstructor is written:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/base/nsCSSFrameConstructor.cpp&rev=1.1296&root=/cvsroot&mark=2026,2035#2011
Imagine the counter list is already dirty before we Insert the node...
We could make it work if could divide "frame construction" into a
"destroy phase" (after which we do ReCalcAll) and a "build phase",
and then make sure we only reach CreateGeneratedFrameFor() from the
"build phase". This seems hard though.

It's much easier to make the counter list resilient to frame destruction.
This code needs to be cleaned up anyway IMHO.
I don't like the way it's organized at the moment where all methods+data
are public and the knowledge about which order to call stuff, which methods
can be called while dirty etc has to be known by the caller. This is
better handled by the objects themselves.
Comment 7 Mats Palmgren (:mats) 2007-01-18 21:04:25 PST
Created attachment 252011 [details] [diff] [review]
Patch rev. 2

 * make "dirty list" an internal state that the caller does not
   need to worry about
 * when a node is Inserted, if the list is non-dirty and the node
   is last, then recalculate the node data directly and leave the list
   non-dirty, as before. Otherwise mark it dirty.
 * ReCalcAll only when we need a node value (and the list is dirty)
 * make nsCounterList::ReCalcAll deal with the fact that some pointers
   could be dangling
 * make all internal methods+data non-public
 * remove code in nsCSSFrameConstructor/nsPresShell that was only needed
   to track counter/quote dirtiness
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-18 21:21:23 PST
Comment on attachment 252011 [details] [diff] [review]
Patch rev. 2

> I don't think this solvable in general the way nsCSSFrameConstructor is
> written

That code is bogus anyway.  Again, it could be solved by making that chunk of code more like:

        counterList->Insert(node);
        PRBool dirty = counterList->IsDirty();
        if (!dirty) {
          if (counterList->IsLast(node)) {
            node->Calc(counterList);
            node->GetText(contentString);
          } else {
            counterList->SetDirty();
            CountersDirty();
            dirty = PR_TRUE;
          }
        }

        textPtr = &node->mText; // text node assigned below

This patch won't work for several reasons:

1)  When counter stuff changes we actually need to change it.  Right now that happens because either we Calc() nodes as we add them and then set the right text on the generated content up front, or because we go through and RecalcAll() as soon as the update is over; that sets new text on all the generated content and triggers the corresponding reflows and paints.  With this patch , this doesn't happen anymore, unless someone calls GetTextFor()... which generally no one will if we, say, remove a counter node from the list.  I do wonder how much you tested this with actual document mutations...

2)  The whole point of the dirty thing is to NOT recalc lots of times on batched changes.  Doing recalc on misplaced GetTextFor() calls instead of eliminating said calls is not the way to achieve that.
Comment 9 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-18 21:36:07 PST
Created attachment 252013 [details] [diff] [review]
More like this

I'm thinking something more like this.  We could also make Calc() bail if the list is dirty, but the frame constructor code needs to check anyway, as you noticed, because it wants to call GetText().

I do agree that it might make sense to replace the frame constructor code in question with an nsCounterManager method that takes a PRUnichar* and outputs an nsCOMPtr<nsIDOMCharacterData>* and an nsString, returning true/false the way AddResetOrIncrement() does to indicate to the caller whether RecalcAll() should be called.  Or rather, return nsresult so we can still handle OOM, and have three out params.

I can do that if you want.

Oh, and the fourth bullet from comment 7 is not an issue, since recalculating the whole list sets the mScope pointers on the first node directly, and then only uses the information from previous nodes to set the mScope pointers on a given node.  So it's safe to call no matter what the pointers look like.
Comment 10 Mats Palmgren (:mats) 2007-01-18 22:59:00 PST
(In reply to comment #8)
Oh, I didn't quite get that it was RecalcAll() that flushed out the
new text values... so, if I had kept the Begin/EndUpdate() stuff
the patch would be mostly ok I think ;)
Comment 11 Mats Palmgren (:mats) 2007-01-18 22:59:43 PST
(In reply to comment #9)
> I can do that if you want.

Not necessary, I think your patch is fine if we just want to fix
the crash with minimal changes.

> So it's safe to call no matter what the pointers look like.

Ok, I see that now.
Comment 12 Mats Palmgren (:mats) 2007-01-18 23:00:08 PST
Comment on attachment 252013 [details] [diff] [review]
More like this

r=mats
Comment 13 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-18 23:01:30 PST
For branch we definitely want minimal changes.  ;)

I do think we should write some mutation correctness  tests for this stuff...
Comment 14 Daniel Veditz [:dveditz] 2007-01-19 14:34:38 PST
This is coming in late for 1.8.1.2, but the patch looks small enough to hope for quick reviews and trunk testing that we'll try to take it.
Comment 15 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2007-01-19 15:33:02 PST
Comment on attachment 252013 [details] [diff] [review]
More like this

sr=dbaron
Comment 16 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-19 17:20:25 PST
Fixed.
Comment 17 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-19 17:20:51 PST
Comment on attachment 252013 [details] [diff] [review]
More like this

This makes sure we don't work with possibly-dangling pointers.  I think it should be pretty safe...
Comment 18 Jay Patel [:jay] 2007-01-22 14:59:30 PST
Comment on attachment 252013 [details] [diff] [review]
More like this

Approved for both branches, a=jay for drivers.
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-01-22 21:32:59 PST
Fixed on branches.
Comment 20 Jay Patel [:jay] 2007-02-09 17:42:10 PST
v.fixed on both branches with 1.5.0.10 and 2.0.0.2 RC1 builds.  No crash with Jesse's testcase in comment #0, reloaded about 30+ times with no hang or crash.
Comment 21 Jesse Ruderman 2007-12-17 22:18:00 PST
Crashtest checked in.

Note You need to log in before you can comment on or make changes to this bug.