Closed Bug 472776 Opened 16 years ago Closed 15 years ago

Crash [@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences] with bidi

Categories

(Core :: Layout: Text and Fonts, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: jruderman, Assigned: roc)

References

Details

(5 keywords, Whiteboard: [sg:critical] post 1.8-branch)

Crash Data

Attachments

(3 files)

Steps to reproduce:
1. Load the testcase.
2. Quit Firefox.

Result: crash [@ UnhookTextRunFromFrames] or [@ ClearAllTextRunReferences].  If MallocScribble is enabled, the crash involves dereferencing 5555567d, but without MallocScribble, a random address gets called.
Whiteboard: [sg:critical]
I can reproduce the ClearAllTextRunReferences shutdown-crash in my Linux mozilla-central debug build from this morning.  OS --> All.

The crash is at line 328 in nsTextFrameThebes.cpp, on the call to aFrame->GetType(), quoted here:
  328     NS_ASSERTION(aFrame->GetType() == nsGkAtoms::textFrame,
  329                  "Bad frame");

If I run "print *aFrame" in gdb after the crash, I see that all of aFrame's member-data pointers are bad (0x5a5a5a5a).  firebot tells me that that pattern signifies "jemalloc allocated but unused memory (MALLOC_FILL) - or possibly dead land (deleted memory)".

FWIW, I can't reproduce the shutdown-crash in today's optimized nightly build.  (Or at least I can't tell that it's crashing -- Firefox closes normally, and I get no errors printed to stderr or stdout.)
OS: Mac OS X → All
That assertion was added specifically to catch stale text frames here. Without the assertion, the code in that method will usually silently exit without causing any problems.
"Usually" not causing problems isn't very reassuring! Actually, it looks like the crash can happen after merely closing the window, it's not just a shutdown thing. And in a release build, without the assertion, there's a risk that we'll be dereferencing something stale elsewhere in the function, so we're still liable to crash.

I'm suspicious that this may be a result of the fix to bug 470418. That patch redefined the TEXT_INCOMING_ARABICCHAR flag to avoid conflicting with TEXT_IN_CACHE, which was most definitely a Bad Thing. However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA. :( This means that it may affect the behavior of nsContinuingTextFrame::Destroy(), at least: it looks like it could cause ClearTextRun() to be called in cases where otherwise it wouldn't have been. Could that be damaging things?

I haven't yet tested enough to confirm whether this is really the root of the problem, but it looks bad. Part of the trouble is that these flag bits are defined in a variety of places, so it's not immediately obvious if we're inadvertently reusing bits for multiple purposes.
"Usually not causing problems" is actually worse than "always causes problems", since it's harder to debug, which is why I added the assertion to ensure it "always causes problems" :-).

In gfxFont.h are the rules that are supposed to govern who owns what flags:
        CACHE_TEXT_FLAGS    = 0xF0000000,
        USER_TEXT_FLAGS     = 0x0FFF0000,
        PLATFORM_TEXT_FLAGS = 0x0000F000,
        TEXTRUN_TEXT_FLAGS  = 0x00000FFF

> However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA.

No, because TEXT_IN_TEXTRUN_USER_DATA is a frame state bit, and TEXT_INCOMING_ARABICCHAR is a textrun flag bit.
Ah, ok... sorry for my confusion!
Nominating for blocking1.9.1 -- this looks especially easy to exploit, and it's a regression with a fairly simple testcase.
Flags: blocking1.9.1?
looks like we get about 50-60 crash reports a day that look something like the stack below and a few other variations:

0  	xul.dll  	UnhookTextRunFromFrames  	layout/generic/nsTextFrameThebes.cpp:341
1 	xul.dll 	FrameTextRunCache::NotifyExpired 	layout/generic/nsTextFrameThebes.cpp:381
2 	xul.dll 	nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration 	obj-firefox/dist/include/xpcom/nsExpirationTracker.h:210
3 	xul.dll 	nsExpirationTracker<gfxTextRun,3>::TimerCallback 	obj-firefox/dist/include/xpcom/nsExpirationTracker.h:299
4 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:420
5 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:512
6 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:510
7 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:170
8 	nspr4.dll 	PR_GetEnv 	
9 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:87
10 	firefox.exe 	firefox.exe@0x2197 	
11 	kernel32.dll 	BaseProcessStart

no comments, but I'm wondering if people just don't know what to say since they might be in the middle of a shutdown.
Assignee: nobody → roc
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Attached patch fixSplinter Review
The problem was when we constructed a textrun for a frame that already has one. We set TEXT_IN_TEXTRUN_USER_DATA on to frame and then call ClearTextRun on it before we set its mTextRun. Because it already has a textrun, we clear out that textrun and also clear the TEXT_IN_TEXTRUN_USER_DATA flag we just set... The fix is simple, just set TEXT_IN_TEXTRUN_USER_DATA after the ClearTextRun call.
Attachment #360649 - Flags: review?(smontagu)
Whiteboard: [sg:critical] → [sg:critical][needs review]
Blocks: 468491
Attachment #360649 - Flags: review?(smontagu) → review+
Whiteboard: [sg:critical][needs review] → [sg:critical][needs landing]
Is there still something to do here, or could the patch land?
It's ready to land. I was going to land it myself, but you can if you like.
Keywords: checkin-needed
Pushed http://hg.mozilla.org/mozilla-central/rev/ef2d14abf75e
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [sg:critical][needs landing] → [sg:critical][needs 191 landing]
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/9bc41efd9298
Keywords: fixed1.9.1
Whiteboard: [sg:critical][needs 191 landing] → [sg:critical]
Verified fixed on trunk and 1.9.1 with builds on OS X and Windows:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090423 Minefield/3.6a1pre ID:20090423040020

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090423 Shiretoko/3.5b4pre ID:20090423040926
Status: RESOLVED → VERIFIED
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
It look like bug 490410 may be a 1.9.0.x version of this bug.
This bug appears to have originally regressed between these two mozilla-central nightlies:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081216 Minefield/3.2a1pre
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081217 Minefield/3.2a1pre

--> looks like it was a regression from Bug 467150
Blocks: 467150
Keywords: regression
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.11?
Attachment #360649 - Flags: approval1.9.0.11?
Blocks: 490410
Regarding the approval1.9.0.11 request on this bug's patch -- I've confirmed that the patch fixes bug 490410, a 1.9.0.x crash-regression (as noted in bug 490410 comment 10).
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.11?
Flags: blocking1.9.0.11+
Comment on attachment 360649 [details] [diff] [review]
fix

Approved for 1.9.0.11, a=dveditz for release-drivers
Attachment #360649 - Flags: approval1.9.0.11? → approval1.9.0.11+
Patch landed on CVS for 1.9.0.11:

Checking in layout/generic/nsTextFrameThebes.cpp;
/cvsroot/mozilla/layout/generic/nsTextFrameThebes.cpp,v  <--  nsTextFrameThebes.cpp
new revision: 3.190; previous revision: 3.189
done
RCS file: /cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v
done
Checking in layout/generic/crashtests/472776-1.html;
/cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v  <--  472776-1.html
initial revision: 1.1
done
Checking in layout/generic/crashtests/crashtests.list;
/cvsroot/mozilla/layout/generic/crashtests/crashtests.list,v  <--  crashtests.list
new revision: 1.124; previous revision: 1.123
done
Keywords: fixed1.9.0.11
Here's the fix as landed on 1.9.0.x -- I just fixed the bitrot in crashtests.list.
Is this a debug only crash? I cannot crash with 1.9.0.10 on Ubuntu or OS X.
Yes, I believe it was debug-only on mozilla-central.

However, I'm not sure anyone's seen this crash in 1.9.0.10 debug builds.  (I haven't -- see bug 490410 comment 11.)  This bug's testcase may require some additional not-currently-backported-to-1.9.0.x code in order to crash.

In any case, though, we took this bug's patch to fix bug 490410's testcase, not to fix this bug's testcase.
I've verified that bug 490410 is fixed in 1.9.0.11. I'm going to mark this as verified for 1.9.0.11.
Flags: wanted1.8.1.x-
Whiteboard: [sg:critical] → [sg:critical] post 1.8-branch
Group: core-security
No longer blocks: 468491
Still occurring, sample crash report from 3.5.5:

http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123

ClearAllTextRunReferences layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|331|0x0
UnhookTextRunFromFrames layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|355|0xd
FrameTextRunCache::NotifyExpired(gfxTextRun *) layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|389|0xd
nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration()|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|210|0xa
nsExpirationTracker<gfxTextRun,3>::TimerCallback(nsITimer *,void *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|299|0x8
nsTimerImpl::Fire()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|420|0x6
nsTimerEvent::Run()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|512|0x4
nsThread::ProcessNextEvent(int,int *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsThread.cpp:57f71400f4cf|521|0x13
(In reply to comment #23)
> Still occurring, sample crash report from 3.5.5:
> 
> http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123

We have closed this bug a long time ago. I think it would be better to handle it in a new bug.
Crash Signature: [@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: