Steps to reproduce: 1. Load the testcase. 2. Quit Firefox. Result: crash [@ UnhookTextRunFromFrames] or [@ ClearAllTextRunReferences]. If MallocScribble is enabled, the crash involves dereferencing 5555567d, but without MallocScribble, a random address gets called.
I can reproduce the ClearAllTextRunReferences shutdown-crash in my Linux mozilla-central debug build from this morning. OS --> All. The crash is at line 328 in nsTextFrameThebes.cpp, on the call to aFrame->GetType(), quoted here: 328 NS_ASSERTION(aFrame->GetType() == nsGkAtoms::textFrame, 329 "Bad frame"); If I run "print *aFrame" in gdb after the crash, I see that all of aFrame's member-data pointers are bad (0x5a5a5a5a). firebot tells me that that pattern signifies "jemalloc allocated but unused memory (MALLOC_FILL) - or possibly dead land (deleted memory)". FWIW, I can't reproduce the shutdown-crash in today's optimized nightly build. (Or at least I can't tell that it's crashing -- Firefox closes normally, and I get no errors printed to stderr or stdout.)
OS: Mac OS X → All
That assertion was added specifically to catch stale text frames here. Without the assertion, the code in that method will usually silently exit without causing any problems.
"Usually" not causing problems isn't very reassuring! Actually, it looks like the crash can happen after merely closing the window, it's not just a shutdown thing. And in a release build, without the assertion, there's a risk that we'll be dereferencing something stale elsewhere in the function, so we're still liable to crash. I'm suspicious that this may be a result of the fix to bug 470418. That patch redefined the TEXT_INCOMING_ARABICCHAR flag to avoid conflicting with TEXT_IN_CACHE, which was most definitely a Bad Thing. However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA. :( This means that it may affect the behavior of nsContinuingTextFrame::Destroy(), at least: it looks like it could cause ClearTextRun() to be called in cases where otherwise it wouldn't have been. Could that be damaging things? I haven't yet tested enough to confirm whether this is really the root of the problem, but it looks bad. Part of the trouble is that these flag bits are defined in a variety of places, so it's not immediately obvious if we're inadvertently reusing bits for multiple purposes.
"Usually not causing problems" is actually worse than "always causes problems", since it's harder to debug, which is why I added the assertion to ensure it "always causes problems" :-). In gfxFont.h are the rules that are supposed to govern who owns what flags: CACHE_TEXT_FLAGS = 0xF0000000, USER_TEXT_FLAGS = 0x0FFF0000, PLATFORM_TEXT_FLAGS = 0x0000F000, TEXTRUN_TEXT_FLAGS = 0x00000FFF > However, it now conflicts with TEXT_IN_TEXTRUN_USER_DATA. No, because TEXT_IN_TEXTRUN_USER_DATA is a frame state bit, and TEXT_INCOMING_ARABICCHAR is a textrun flag bit.
Ah, ok... sorry for my confusion!
Nominating for blocking1.9.1 -- this looks especially easy to exploit, and it's a regression with a fairly simple testcase.
looks like we get about 50-60 crash reports a day that look something like the stack below and a few other variations: 0 xul.dll UnhookTextRunFromFrames layout/generic/nsTextFrameThebes.cpp:341 1 xul.dll FrameTextRunCache::NotifyExpired layout/generic/nsTextFrameThebes.cpp:381 2 xul.dll nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration obj-firefox/dist/include/xpcom/nsExpirationTracker.h:210 3 xul.dll nsExpirationTracker<gfxTextRun,3>::TimerCallback obj-firefox/dist/include/xpcom/nsExpirationTracker.h:299 4 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:420 5 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:512 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510 7 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170 8 nspr4.dll PR_GetEnv 9 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87 10 firefox.exe firefox.exe@0x2197 11 kernel32.dll BaseProcessStart no comments, but I'm wondering if people just don't know what to say since they might be in the middle of a shutdown.
Assignee: nobody → roc
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
The problem was when we constructed a textrun for a frame that already has one. We set TEXT_IN_TEXTRUN_USER_DATA on to frame and then call ClearTextRun on it before we set its mTextRun. Because it already has a textrun, we clear out that textrun and also clear the TEXT_IN_TEXTRUN_USER_DATA flag we just set... The fix is simple, just set TEXT_IN_TEXTRUN_USER_DATA after the ClearTextRun call.
Attachment #360649 - Flags: review?(smontagu)
Whiteboard: [sg:critical] → [sg:critical][needs review]
Attachment #360649 - Flags: review?(smontagu) → review+
Whiteboard: [sg:critical][needs review] → [sg:critical][needs landing]
Is there still something to do here, or could the patch land?
It's ready to land. I was going to land it myself, but you can if you like.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical][needs landing] → [sg:critical][needs 191 landing]
Whiteboard: [sg:critical][needs 191 landing] → [sg:critical]
Verified fixed on trunk and 1.9.1 with builds on OS X and Windows: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090423 Minefield/3.6a1pre ID:20090423040020 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090423 Shiretoko/3.5b4pre ID:20090423040926
It look like bug 490410 may be a 1.9.0.x version of this bug.
This bug appears to have originally regressed between these two mozilla-central nightlies: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081216 Minefield/3.2a1pre Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20081217 Minefield/3.2a1pre --> looks like it was a regression from Bug 467150
Attachment #360649 - Flags: approval22.214.171.124?
Regarding the approval126.96.36.199 request on this bug's patch -- I've confirmed that the patch fixes bug 490410, a 1.9.0.x crash-regression (as noted in bug 490410 comment 10).
Comment on attachment 360649 [details] [diff] [review] fix Approved for 188.8.131.52, a=dveditz for release-drivers
Attachment #360649 - Flags: approval184.108.40.206? → approval220.127.116.11+
Patch landed on CVS for 18.104.22.168: Checking in layout/generic/nsTextFrameThebes.cpp; /cvsroot/mozilla/layout/generic/nsTextFrameThebes.cpp,v <-- nsTextFrameThebes.cpp new revision: 3.190; previous revision: 3.189 done RCS file: /cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v done Checking in layout/generic/crashtests/472776-1.html; /cvsroot/mozilla/layout/generic/crashtests/472776-1.html,v <-- 472776-1.html initial revision: 1.1 done Checking in layout/generic/crashtests/crashtests.list; /cvsroot/mozilla/layout/generic/crashtests/crashtests.list,v <-- crashtests.list new revision: 1.124; previous revision: 1.123 done
Here's the fix as landed on 1.9.0.x -- I just fixed the bitrot in crashtests.list.
Is this a debug only crash? I cannot crash with 22.214.171.124 on Ubuntu or OS X.
Yes, I believe it was debug-only on mozilla-central. However, I'm not sure anyone's seen this crash in 126.96.36.199 debug builds. (I haven't -- see bug 490410 comment 11.) This bug's testcase may require some additional not-currently-backported-to-1.9.0.x code in order to crash. In any case, though, we took this bug's patch to fix bug 490410's testcase, not to fix this bug's testcase.
I've verified that bug 490410 is fixed in 188.8.131.52. I'm going to mark this as verified for 184.108.40.206.
Whiteboard: [sg:critical] → [sg:critical] post 1.8-branch
Still occurring, sample crash report from 3.5.5: http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123 ClearAllTextRunReferences layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|331|0x0 UnhookTextRunFromFrames layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|355|0xd FrameTextRunCache::NotifyExpired(gfxTextRun *) layout/generic/nsTextFrameThebes.cpp:57f71400f4cf|389|0xd nsExpirationTracker<gfxTextRun,3>::AgeOneGeneration()|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|210|0xa nsExpirationTracker<gfxTextRun,3>::TimerCallback(nsITimer *,void *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:obj-firefox/dist/include/xpcom/nsExpirationTracker.h:57f71400f4cf|299|0x8 nsTimerImpl::Fire()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|420|0x6 nsTimerEvent::Run()|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsTimerImpl.cpp:57f71400f4cf|512|0x4 nsThread::ProcessNextEvent(int,int *)|hg:hg.mozilla.org/releases/mozilla-1.9.1:xpcom/threads/nsThread.cpp:57f71400f4cf|521|0x13
(In reply to comment #23) > Still occurring, sample crash report from 3.5.5: > > http://crash-stats.mozilla.com/report/index/0b603ce5-63b2-4545-b5a9-537ac2091123 We have closed this bug a long time ago. I think it would be better to handle it in a new bug.
Crash Signature: [@ UnhookTextRunFromFrames] [@ ClearAllTextRunReferences]
You need to log in before you can comment on or make changes to this bug.