Closed Bug 571995 Opened 15 years ago Closed 14 years ago

Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*)] with crazy -moz-column testcase

Categories

(Core :: Layout, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0b7
Tracking Status
blocking2.0 --- final+
blocking1.9.2 --- .13+
status1.9.2 --- .13-fixed
blocking1.9.1 --- .16+
status1.9.1 --- .16-fixed

People

(Reporter: jruderman, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: crash, testcase, Whiteboard: [sg:critical][critsmash:investigating])

Crash Data

Attachments

(10 files, 6 obsolete files)

451 bytes, application/xhtml+xml
Details
39.08 KB, text/plain
Details
1.11 KB, patch
Details | Diff | Splinter Review
40.42 KB, text/html
Details
1.05 KB, patch
Details | Diff | Splinter Review
12.00 KB, text/html
Details
10.86 KB, text/html
Details
12.34 KB, patch
roc
: review+
Details | Diff | Splinter Review
22.02 KB, patch
Details | Diff | Splinter Review
27.27 KB, patch
roc
: review+
Details | Diff | Splinter Review
No description provided.
Attached file crash report
We're calling a virtual method on a deleted gfxTextRun.
Group: core-security
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [sg:critical]
Attached patch to illustrateSplinter Review
This is not a suggested fix/wallpaper - just to illustrate where the problem starts. In BuildTextRunsScanner::AssignTextRun we call f->ClearTextRun() which deletes mTextRun if it doesn't have the TEXT_IN_CACHE bit: http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#3759 The patch illustrates the deleted text run is still in use by one of the mBreakSinks.
Same crash occurs on 1.9.1 and 1.9.2
status1.9.1: --- → ?
status1.9.2: --- → ?
Mats can probably fix this.
Assignee: nobody → matspal
Whiteboard: [sg:critical] → [sg:critical][critsmash:investigating]
Mats, do you know when you'll be able to work on this?
Attached file Frame dump + stacks (obsolete) —
The first stack dump is from when the first "REASSIGNING MULTIFLOW ..." warning occurs, note that aTextRun=0x163b680 (red) here. The next stack is when we hit the warning again and this is the same call where we have 0x163b680 in one of the BreakSinks. Also note the "UnhookTextRunFromFrames 0x163b680 (red)" here, I guess from "f->ClearTextRun();"
It means that we had to wipe out the textrun for a frame, deleting the entire textrun, and the textrun had text accumulated from multiple frames. This is not wrong, it can happen, but I added that warning for myself because if it happens too much we have serious performance problems.
Attached patch wip (obsolete) — Splinter Review
Removing the BreakSink that refers to the old textrun and all BreakSinks that follows it, then for each of those BreakSinks, remove it from mLineBreaker and all TextItems that follows. This fixes the crash for me locally (Linux) and pass reftests+crashtests. I've submitted the patch to TryServer.
Comment on attachment 455365 [details] [diff] [review] wip This doesn't fix the crash on 1.9.2/1.9.1 though... just moves it to some other place. Removing *all* BreakSinks seems to fix it though... I'll investigate some more...
Attachment #455365 - Attachment is obsolete: true
Attached patch fix v2 (obsolete) — Splinter Review
This fixes the crash also on 1.9.2 and 1.9.1. It passed TryServer (m-c) unit tests and reftests/crashtests in local 1.9.2 debug builds on Linux and OSX.
Attachment #455617 - Flags: review?(roc)
Hmm, it doesn't seem right just to reset the linebreaker. We could be in the middle of finding line breaks. Do you know why a textrun whose text was added to the linebreaker is being replaced?
Attached file Frame dumps + stack
This a detailed trace starting in BuildTextRuns (first frame dump), to the error condition in AssignTextRun (last frame dump). While looping in BuildTextRuns, calling ScanFrame for each frame, we eventually reach 0x19243f0(pink) where we hit 1388 if (!ContinueTextRunAcrossFrames(mLastFrame, frame)) { 1389 FlushFrames(PR_FALSE, PR_FALSE); http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#1388 (mLastFrame=0x1924158(lime) "\n", frame=0x19243f0(pink) " ") so we call FlushFrames, which creates a new text run (0x181be80) and assign it to the frames in the mapped flow (0x14d71b8 0x1924158). I don't see anything wrong in this per se, other than the old text run still being referenced in the BreakSinks.
Attachment #455029 - Attachment is obsolete: true
Attached patch fix v3 (obsolete) — Splinter Review
Don't use ClearTextRun() since it might delete the text run while it's used in BreakSinks/LineBreaker. Delete it later when we flush line breaks instead (in FlushLineBreaks).
Attachment #455617 - Attachment is obsolete: true
Attachment #455905 - Flags: review?(roc)
Attachment #455617 - Flags: review?(roc)
This patch is nice and safe, but I'm still worried that we could lose line breaks. The line breaker might record line break opportunities in this textrun that we are just going to throw away. In your log, it looks like the frame 0x14d71b8 where we replace the old textrun with the new textrun is actually zero-length? Maybe that's an underlying problem we should fix. The reason we wipe out the old textrun when a new textrun overlaps it is that we have to keep the invariant that the text ranges mapped by textruns don't overlap. But if the frame has no text then there is no overlap to worry about so there shouldn't be a problem just changing its textrun from one to another.
ETA is July 24th, since mats is out for a bit
Comment on attachment 455905 [details] [diff] [review] fix v3 per comment 16 and offline, roc says we need a new patch.
Attachment #455905 - Flags: review?(roc) → review-
Blocking Final 2.0+
blocking2.0: --- → final+
#114 crash for thunderbird 3.1.1, approx same for v3.0.6 a firefox crash mentions "accidentally rsized the font larger with the mouse pad somehow, then tried to resize the window to full screen and crash" bp-269e3e50-3730-42e3-b9e4-379322100709 summary sig change to match BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*) to crash-stats unrelated? ... free | nsAutoPtr<BuildTextRunsScanner::BreakSink>::`scalar deleting destructor''(unsigned int)
Summary: Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks] with crazy -moz-column testcase → Crash [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*)] with crazy -moz-column testcase
Comment on attachment 451142 [details] testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded) This testcase does not trigger the bug on trunk anymore due to: http://hg.mozilla.org/mozilla-central/rev/c2fa4377a799 It still crashes with that change reverted so replacing the newlines in the testcase with some other character should still crash on trunk...
Attachment #451142 - Attachment description: testcase (crashes Firefox when loaded) → testcase (crashes 1.9.1 and 1.9.2 Firefox when loaded)
(In reply to comment #16) > ... it looks like the frame 0x14d71b8 where we replace the old textrun > with the new textrun is actually zero-length? Maybe that's an underlying > problem we should fix. I don't think that's a bug in this case, it can happen when a text frame grows to include text from its next continuation during reflow, leaving the continuation temporarily empty until its reflowed. SetLength mentions this case explicitly: http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrameThebes.cpp#6083
Attached patch fix v4 (obsolete) — Splinter Review
ClearTextRun + update break sinks to use the new text run -- but only if the old text run is NOT in use by the previous continuation.
Attachment #455905 - Attachment is obsolete: true
Attachment #463029 - Flags: review?(roc)
As discussed offline, Mats is going to make a new patch here.
blocking1.9.1: --- → needed
blocking1.9.2: --- → needed
dbaron is on a -moz-column crash-killing rampage.
Whiteboard: [sg:critical][critsmash:investigating] → [sg:critical][critsmash:patch]
Attachment #463029 - Attachment is obsolete: true
Attachment #463029 - Flags: review?(roc)
Whiteboard: [sg:critical][critsmash:patch] → [sg:critical]
Is that new patch coming soon? (roc, do you remember what was wrong with the old one?)
Mats has been working on the swapDocShells frame-tree-swapping bug. This patch was not the right approach. Basically, the situation in this bug is that we have a text continuation frame C that is empty (maps zero characters) but is associated with textrun A. Textruns get recreated and C is assigned a new textrun B. This causes us to delete textrun A (which is also being used by C's prev-in-flows), but it shouldn't need to; because C has no actual text associated with it at this point, we can just flip it to point to textrun B and no textruns need to be updated. (We do need to ensure that C does not appear in the userdata for textrun A.)
Any updates on a new approach here?
Mats has finally finished the presentation-swapping work and will work on this this week.
Whiteboard: [sg:critical] → [sg:critical][critsmash:investigating]
Attached patch fix v5 (obsolete) — Splinter Review
Attachment #480764 - Flags: review?(roc)
Intentionally didn't land the tests yet, until we fix branches. http://hg.mozilla.org/mozilla-central/rev/cc8777e97c21
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
could this landing be connected to the crash volume regression described in https://bugzilla.mozilla.org/show_bug.cgi?id=485003#c11 and below?
I think so. I'm looking at bug 603490 and bug 603510 which shows that the assumptions we made in this patch are wrong. So I'm backing out and reopening this bug instead.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 603490
Depends on: 603510
Attached file Trace for bug 603490
Attachment #480764 - Attachment is obsolete: true
Attached file Trace for bug 603510
roc, how about if we always do a limited ClearTextRun() -- clearing only the tail of continuations starting at the frame we're looking at (f). The crash in this bug comes from UnhookTextRunFromFrames starting at the "first-in-userdata" (or first-in-flow for SIMPLE_FLOW). That would mean a full ClearTextRun() for bug 603490 and bug 603510 (which I think is right) and for this bug we would start at the "first-in-userdata" which is the first frame using the oldTextRun (0x1920b58 in attachment 455900 [details]), then loop until we find 0x14d71b8 (f) and clear from there, the net result being a simple SetTextRun(nsnull) in this case. What do you think?
Attached patch fix v6Splinter Review
Add 'aStartContinuation' parameter to the relevant methods; in case it's null these methods should work as before, in case it's non-null they should traverse frames as before but not do anything until we get to 'aStartContinuation', then they should start doing their thing. The only non-null caller is AssignTextRun. If 'aStartContinuation' is non-null it's an error if it's not found in the user data. Still waiting for Try results, but an earlier version with only a minor difference worked fine...
Attachment #482907 - Flags: review?(roc)
(TryServer unit tests were successful with a couple of known [orange] bugs)
Comment on attachment 482907 [details] [diff] [review] fix v6 Probably best to not use default parameters here.
Attachment #482907 - Flags: review?(roc) → review+
Attached patch fix v7Splinter Review
Ok, I dropped the default param and added 'nsnull' to callers as needed. I don't think this needs a new review so I'm pushing this in a bit....
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Blocks: 485003
Depends on: 604843
blocking1.9.1: needed → .15+
blocking1.9.2: needed → .12+
Depends on: 605340
Target Milestone: mozilla2.0b8 → mozilla2.0b7
Accumulated patch for 1.9.2/1.9.1 branches. It contains the fix for this bug and its regressions, bug 605340 and bug 604843, plus crashtests for bug 603510 and bug 603490. (I'm holding the crashtest on this bug)
Attachment #489213 - Flags: review?(roc)
Attachment #489213 - Flags: approval1.9.2.13?
Attachment #489213 - Flags: approval1.9.1.16?
Comment on attachment 489213 [details] [diff] [review] Patch for branches Approved for 1.9.2.13 and 1.9.1.16, a=dveditz for release-drivers
Attachment #489213 - Flags: approval1.9.2.13?
Attachment #489213 - Flags: approval1.9.2.13+
Attachment #489213 - Flags: approval1.9.1.16?
Attachment #489213 - Flags: approval1.9.1.16+
It looks like none of the crashtests make 1.9.0 crash.
Group: core-security
Crash Signature: [@ BuildTextRunsScanner::BreakSink::SetBreaks(unsigned int, unsigned int, unsigned char*)]
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: