Last Comment Bug 571995 - Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*)] with crazy -moz-column testcase
: Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned in...
Status: RESOLVED FIXED
[sg:critical][critsmash:investigating]
: crash, testcase
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla2.0b7
Assigned To: Mats Palmgren (vacation)
:
Mentors:
Depends on: 603490 603510 604843 605340
Blocks: randomstyles 485003
  Show dependency treegraph
 
Reported: 2010-06-14 15:22 PDT by Jesse Ruderman
Modified: 2013-05-18 18:14 PDT (History)
18 users (show)
mats: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
final+
.13+
.13-fixed
.16+
.16-fixed


Attachments
testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded) (451 bytes, application/xhtml+xml)
2010-06-14 15:22 PDT, Jesse Ruderman
no flags Details
crash report (39.08 KB, text/plain)
2010-06-14 15:22 PDT, Jesse Ruderman
no flags Details
to illustrate (1.11 KB, patch)
2010-06-15 09:12 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
Frame dump + stacks (29.21 KB, text/html)
2010-06-29 17:37 PDT, Mats Palmgren (vacation)
no flags Details
wip (3.47 KB, patch)
2010-06-30 21:22 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
fix v2 (3.07 KB, patch)
2010-07-01 20:14 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
Frame dumps + stack (40.42 KB, text/html)
2010-07-03 16:24 PDT, Mats Palmgren (vacation)
no flags Details
fix v3 (1.00 KB, patch)
2010-07-03 17:30 PDT, Mats Palmgren (vacation)
dveditz: review-
Details | Diff | Splinter Review
crashtest.diff (for trunk and branches) (1.05 KB, patch)
2010-08-04 18:31 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
fix v4 (3.27 KB, patch)
2010-08-04 19:10 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
fix v5 (1.90 KB, patch)
2010-10-04 15:30 PDT, Mats Palmgren (vacation)
roc: review+
Details | Diff | Splinter Review
Trace for bug 603490 (12.00 KB, text/html)
2010-10-12 14:14 PDT, Mats Palmgren (vacation)
no flags Details
Trace for bug 603510 (10.86 KB, text/html)
2010-10-12 14:14 PDT, Mats Palmgren (vacation)
no flags Details
fix v6 (12.34 KB, patch)
2010-10-13 11:06 PDT, Mats Palmgren (vacation)
roc: review+
Details | Diff | Splinter Review
fix v7 (22.02 KB, patch)
2010-10-14 20:24 PDT, Mats Palmgren (vacation)
no flags Details | Diff | Splinter Review
Patch for branches (27.27 KB, patch)
2010-11-09 10:46 PST, Mats Palmgren (vacation)
roc: review+
dveditz: approval1.9.2.13+
dveditz: approval1.9.1.16+
Details | Diff | Splinter Review

Description Jesse Ruderman 2010-06-14 15:22:43 PDT
Created attachment 451142 [details]
testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded)
Comment 1 Jesse Ruderman 2010-06-14 15:22:56 PDT
Created attachment 451143 [details]
crash report
Comment 2 Mats Palmgren (vacation) 2010-06-15 09:04:33 PDT
We're calling a virtual method on a deleted gfxTextRun.
Comment 3 Mats Palmgren (vacation) 2010-06-15 09:12:06 PDT
Created attachment 451301 [details] [diff] [review]
to illustrate

This is not a suggested fix/wallpaper - just to illustrate where the problem
starts.  In BuildTextRunsScanner::AssignTextRun we call f->ClearTextRun()
which deletes mTextRun if it doesn't have the TEXT_IN_CACHE bit:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#3759
The patch illustrates the deleted text run is still in use by one of the
mBreakSinks.
Comment 4 Mats Palmgren (vacation) 2010-06-15 09:25:33 PDT
Same crash occurs on 1.9.1 and 1.9.2
Comment 5 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-15 13:11:42 PDT
Mats can probably fix this.
Comment 6 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-28 19:36:59 PDT
Mats, do you know when you'll be able to work on this?
Comment 7 Mats Palmgren (vacation) 2010-06-29 17:37:02 PDT
Created attachment 455029 [details]
Frame dump + stacks

The first stack dump is from when the first "REASSIGNING MULTIFLOW ..."
warning occurs, note that aTextRun=0x163b680 (red) here.
The next stack is when we hit the warning again and this is the same
call where we have 0x163b680 in one of the BreakSinks.
Also note the "UnhookTextRunFromFrames 0x163b680 (red)" here,
I guess from "f->ClearTextRun();"
Comment 8 Mats Palmgren (vacation) 2010-06-30 08:08:31 PDT
roc, can you tell me what the MULTIFLOW warning means:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#1953
Comment 9 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-30 16:43:59 PDT
It means that we had to wipe out the textrun for a frame, deleting the entire textrun, and the textrun had text accumulated from multiple frames. This is not wrong, it can happen, but I added that warning for myself because if it happens too much we have serious performance problems.
Comment 10 Mats Palmgren (vacation) 2010-06-30 21:22:23 PDT
Created attachment 455365 [details] [diff] [review]
wip

Removing the BreakSink that refers to the old textrun and all BreakSinks
that follows it, then for each of those BreakSinks, remove it from
mLineBreaker and all TextItems that follows.

This fixes the crash for me locally (Linux) and pass reftests+crashtests.
I've submitted the patch to TryServer.
Comment 11 Mats Palmgren (vacation) 2010-06-30 21:44:00 PDT
Comment on attachment 455365 [details] [diff] [review]
wip

This doesn't fix the crash on 1.9.2/1.9.1 though... just moves it to some other place.  Removing *all* BreakSinks seems to fix it though... I'll investigate some more...
Comment 12 Mats Palmgren (vacation) 2010-07-01 20:14:50 PDT
Created attachment 455617 [details] [diff] [review]
fix v2

This fixes the crash also on 1.9.2 and 1.9.1.
It passed TryServer (m-c) unit tests and reftests/crashtests
in local 1.9.2 debug builds on Linux and OSX.
Comment 13 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-07-01 20:27:51 PDT
Hmm, it doesn't seem right just to reset the linebreaker. We could be in the middle of finding line breaks.

Do you know why a textrun whose text was added to the linebreaker is being replaced?
Comment 14 Mats Palmgren (vacation) 2010-07-03 16:24:50 PDT
Created attachment 455900 [details]
Frame dumps + stack

This a detailed trace starting in BuildTextRuns (first frame dump),
to the error condition in AssignTextRun (last frame dump).

While looping in BuildTextRuns, calling ScanFrame for each frame,
we eventually reach 0x19243f0(pink) where we hit
1388       if (!ContinueTextRunAcrossFrames(mLastFrame, frame)) {
1389         FlushFrames(PR_FALSE, PR_FALSE);
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#1388
(mLastFrame=0x1924158(lime) "\n",  frame=0x19243f0(pink) " ")
so we call FlushFrames, which creates a new text run (0x181be80)
and assign it to the frames in the mapped flow (0x14d71b8 0x1924158).

I don't see anything wrong in this per se, other than the old text run
still being referenced in the BreakSinks.
Comment 15 Mats Palmgren (vacation) 2010-07-03 17:30:43 PDT
Created attachment 455905 [details] [diff] [review]
fix v3

Don't use ClearTextRun() since it might delete the text run while
it's used in BreakSinks/LineBreaker.  Delete it later when we
flush line breaks instead (in FlushLineBreaks).
Comment 16 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-07-06 10:29:18 PDT
This patch is nice and safe, but I'm still worried that we could lose line breaks. The line breaker might record line break opportunities in this textrun that we are just going to throw away.

In your log, it looks like the frame 0x14d71b8 where we replace the old textrun with the new textrun is actually zero-length? Maybe that's an underlying problem we should fix.

The reason we wipe out the old textrun when a new textrun overlaps it is that we have to keep the invariant that the text ranges mapped by textruns don't overlap. But if the frame has no text then there is no overlap to worry about so there shouldn't be a problem just changing its textrun from one to another.
Comment 17 Robert Sayre 2010-07-13 13:36:45 PDT
ETA is July 24th, since mats is out for a bit
Comment 18 Daniel Veditz [:dveditz] 2010-07-13 13:37:05 PDT
Comment on attachment 455905 [details] [diff] [review]
fix v3

per comment 16 and offline, roc says we need a new patch.
Comment 19 Damon Sicore (:damons) 2010-07-20 13:30:04 PDT
Blocking Final 2.0+
Comment 20 Wayne Mery (:wsmwk, NI for questions) 2010-07-31 10:34:25 PDT
#114 crash for thunderbird 3.1.1, approx same for v3.0.6

a firefox crash mentions "accidentally rsized the font larger with the mouse pad somehow, then tried to resize the window to full screen and crash"  bp-269e3e50-3730-42e3-b9e4-379322100709

summary sig change to match BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*) to crash-stats

unrelated? ... free | nsAutoPtr<BuildTextRunsScanner::BreakSink>::`scalar deleting destructor''(unsigned int)
Comment 21 Mats Palmgren (vacation) 2010-08-04 04:06:34 PDT
Comment on attachment 451142 [details]
testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded)

This testcase does not trigger the bug on trunk anymore due to:
http://hg.mozilla.org/mozilla-central/rev/c2fa4377a799
It still crashes with that change reverted so replacing the
newlines in the testcase with some other character should
still crash on trunk...
Comment 22 Mats Palmgren (vacation) 2010-08-04 18:31:28 PDT
Created attachment 463024 [details] [diff] [review]
crashtest.diff  (for trunk and branches)
Comment 23 Mats Palmgren (vacation) 2010-08-04 19:07:44 PDT
(In reply to comment #16)
> ... it looks like the frame 0x14d71b8 where we replace the old textrun
> with the new textrun is actually zero-length? Maybe that's an underlying
> problem we should fix.

I don't think that's a bug in this case, it can happen when a text frame
grows to include text from its next continuation during reflow, leaving
the continuation temporarily empty until its reflowed.
SetLength mentions this case explicitly:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#6083
Comment 24 Mats Palmgren (vacation) 2010-08-04 19:10:10 PDT
Created attachment 463029 [details] [diff] [review]
fix v4

ClearTextRun + update break sinks to use the new text run --
but only if the old text run is NOT in use by the previous continuation.
Comment 25 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-08-10 12:56:48 PDT
As discussed offline, Mats is going to make a new patch here.
Comment 26 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-08-17 13:24:48 PDT
Mats, got an update?
Comment 27 Jesse Ruderman 2010-08-19 12:01:23 PDT
dbaron is on a -moz-column crash-killing rampage.
Comment 28 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-09-12 15:16:37 PDT
Is that new patch coming soon?  (roc, do you remember what was wrong with the old one?)
Comment 29 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-13 00:28:30 PDT
Mats has been working on the swapDocShells frame-tree-swapping bug.

This patch was not the right approach. Basically, the situation in this bug is that we have a text continuation frame C that is empty (maps zero characters) but is associated with textrun A. Textruns get recreated and C is assigned a new textrun B. This causes us to delete textrun A (which is also being used by C's prev-in-flows), but it shouldn't need to; because C has no actual text associated with it at this point, we can just flip it to point to textrun B and no textruns need to be updated. (We do need to ensure that C does not appear in the userdata for textrun A.)
Comment 30 Johnny Stenback (:jst, jst@mozilla.com) 2010-09-21 13:53:39 PDT
Any updates on a new approach here?
Comment 31 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-09-21 14:59:01 PDT
Mats has finally finished the presentation-swapping work and will work on this this week.
Comment 32 Mats Palmgren (vacation) 2010-10-04 15:30:57 PDT
Created attachment 480764 [details] [diff] [review]
fix v5
Comment 33 Mats Palmgren (vacation) 2010-10-10 18:39:30 PDT
Intentionally didn't land the tests yet, until we fix branches.
http://hg.mozilla.org/mozilla-central/rev/cc8777e97c21
Comment 34 chris hofmann 2010-10-12 07:56:36 PDT
could this landing be connected to the crash volume regression described in
https://bugzilla.mozilla.org/show_bug.cgi?id=485003#c11 and below?
Comment 35 Mats Palmgren (vacation) 2010-10-12 09:26:10 PDT
I think so.  I'm looking at bug 603490 and bug 603510 which
shows that the assumptions we made in this patch are wrong.
So I'm backing out and reopening this bug instead.
Comment 36 Mats Palmgren (vacation) 2010-10-12 09:41:49 PDT
Backout changeset:
http://hg.mozilla.org/mozilla-central/rev/9af1a5813a7b
Comment 37 Mats Palmgren (vacation) 2010-10-12 14:14:01 PDT
Created attachment 482664 [details]
Trace for bug 603490
Comment 38 Mats Palmgren (vacation) 2010-10-12 14:14:36 PDT
Created attachment 482665 [details]
Trace for bug 603510
Comment 39 Mats Palmgren (vacation) 2010-10-12 14:52:18 PDT
roc, how about if we always do a limited ClearTextRun() -- clearing only
the tail of continuations starting at the frame we're looking at (f).
The crash in this bug comes from UnhookTextRunFromFrames
starting at the "first-in-userdata" (or first-in-flow for SIMPLE_FLOW).
That would mean a full ClearTextRun() for bug 603490 and bug 603510
(which I think is right) and for this bug we would start at the
"first-in-userdata" which is the first frame using the oldTextRun
(0x1920b58 in attachment 455900 [details]), then loop until we find 
0x14d71b8 (f) and clear from there, the net result being a simple
SetTextRun(nsnull) in this case.  What do you think?
Comment 40 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-10-12 15:35:38 PDT
Sounds good.
Comment 41 Mats Palmgren (vacation) 2010-10-13 11:06:12 PDT
Created attachment 482907 [details] [diff] [review]
fix v6

Add 'aStartContinuation' parameter to the relevant methods;
in case it's null these methods should work as before,
in case it's non-null they should traverse frames as before but
not do anything until we get to 'aStartContinuation', then they should
start doing their thing.  The only non-null caller is AssignTextRun.
If 'aStartContinuation' is non-null it's an error if it's not found
in the user data.

Still waiting for Try results, but an earlier version with only
a minor difference worked fine...
Comment 42 Mats Palmgren (vacation) 2010-10-13 14:39:18 PDT
(TryServer unit tests were successful with a couple of known [orange] bugs)
Comment 43 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-10-13 14:51:08 PDT
Comment on attachment 482907 [details] [diff] [review]
fix v6

Probably best to not use default parameters here.
Comment 44 Mats Palmgren (vacation) 2010-10-14 20:24:32 PDT
Created attachment 483383 [details] [diff] [review]
fix v7

Ok, I dropped the default param and added 'nsnull' to callers as needed.
I don't think this needs a new review so I'm pushing this in a bit....
Comment 45 Mats Palmgren (vacation) 2010-10-15 01:00:26 PDT
http://hg.mozilla.org/mozilla-central/rev/9a1c991eddb0
Comment 46 Mats Palmgren (vacation) 2010-11-09 10:46:02 PST
Created attachment 489213 [details] [diff] [review]
Patch for branches

Accumulated patch for 1.9.2/1.9.1 branches.  It contains the fix for this
bug and its regressions, bug 605340 and bug 604843, plus crashtests for
bug 603510 and bug 603490. (I'm holding the crashtest on this bug)
Comment 47 Daniel Veditz [:dveditz] 2010-11-10 10:57:17 PST
Comment on attachment 489213 [details] [diff] [review]
Patch for branches

Approved for 1.9.2.13 and 1.9.1.16, a=dveditz for release-drivers
Comment 49 Mike Hommey [:glandium] 2010-12-08 06:47:23 PST
It looks like none of the crashtests make 1.9.0 crash.
Comment 50 Mats Palmgren (vacation) 2013-05-18 09:43:31 PDT
Crash test:
https://hg.mozilla.org/integration/mozilla-inbound/rev/22e8618b6646
Comment 51 Phil Ringnalda (:philor, back in August) 2013-05-18 18:14:53 PDT
https://hg.mozilla.org/mozilla-central/rev/22e8618b6646

Note You need to log in before you can comment on or make changes to this bug.