Closed Bug 894798 Opened 6 years ago Closed 3 years ago

crash in gfxFontCache::AgeCachedWordsForFont

Categories

(Core :: Graphics: Text, defect, critical)

25 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox24 --- unaffected
firefox25 --- disabled
firefox26 --- disabled
firefox27 --- disabled
firefox28 --- disabled

People

(Reporter: scoobidiver, Assigned: jfkthame)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(3 files)

It's #19 browser crasher in 25.0a1.
It first showed up in 25.0a1/20130704. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b48e06621dc9&tochange=dcbbfcdf7bb4

Signature 	PL_DHashTableEnumerate | gfxFontCache::AgeCachedWordsForFont(gfxFontCache::HashEntry*, void*) More Reports Search
UUID 	ea978877-6fb9-4e63-bf2c-269ef2130716
Date Processed	2013-07-16 20:12:28.783041
Uptime	541
Last Crash	543 seconds before submission
Install Age 	1818 since version was first installed.
Install Time 	2013-07-16 19:41:49
Product 	Firefox
Version 	25.0a1
Build ID 	20130716030202
Release Channel 	nightly
OS 	Windows NT
OS Version 	6.2.9200
Build Architecture 	x86
Build Architecture Info 	GenuineIntel family 6 model 23 stepping 10 | 2
Crash Reason 	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 	0xa
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x68d8, AdapterSubsysID: 22971787, AdapterDriverVersion: 9.12.0.0
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers+ 

Frame 	Module 	Signature 	Source
0 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.cpp
1 	xul.dll 	gfxFontCache::AgeCachedWordsForFont(gfxFontCache::HashEntry *,void *) 	gfx/thebes/gfxFont.cpp
2 	xul.dll 	nsTHashtable<nsCertOverrideEntry>::s_EnumStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *) 	obj-firefox/dist/include/nsTHashtable.h
3 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.cpp
4 	xul.dll 	nsTHashtable<gfxFontCache::HashEntry>::EnumerateEntries(PLDHashOperator (*)(gfxFontCache::HashEntry *,void *),void *) 	obj-firefox/dist/include/nsTHashtable.h
5 	xul.dll 	gfxFontCache::WordCacheExpirationTimerCallback(nsITimer *,void *) 	gfx/thebes/gfxFont.cpp
6 	xul.dll 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
7 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
8 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	obj-firefox/xpcom/build/nsThreadUtils.cpp
9 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
12 	xul.dll 	nsBaseAppShell::Run() 	widget/xpwidgets/nsBaseAppShell.cpp
13 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
14 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
15 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
16 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
17 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
18 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
19 	firefox.exe 	NS_internal_main(int,char * *) 	browser/app/nsBrowserApp.cpp
20 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
21 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt_bld/self_x86/crt/src/crtexe.c
22 	kernel32.dll 	BaseThreadInitThunk 	
23 	ntdll.dll 	__RtlUserThreadStart 	
24 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsTHashtable%3CgfxFont%3A%3ACacheHashEntry%3E%3A%3As_EnumStub%28PLDHashTable*%2C+PLDHashEntryHdr*%2C+unsigned+int%2C+void*%29
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=PL_DHashTableEnumerate+|+gfxFontCache%3A%3AAgeCachedWordsForFont%28gfxFontCache%3A%3AHashEntry*%2C+void*%29
Most likely caused by re-enabling word cache expiration timers, bug 889365.
Blocks: 889365
Clearly a result of re-enabling the expiration timers in bug 889365. (Interesting that we're seeing a slightly different signature than bug 717175.)

I'll try to re-examine the relevant code yet again, but the obvious fix (if we don't figure out the actual root cause) will be to revert bug 889365, sadly.
Given that we still don't know how this arises, I think we should revert bug 889365 on Aurora for FF25; we know that will eliminate the possibility of this crash.

It's possible that bug 910376 (just landed on central) will be relevant, but we'll need to watch Nightly crash-stats for a while to see whether there's any change here.
NB: patch intended for aurora only at this point.
Attachment #800135 - Flags: review?(roc)
Assignee: nobody → jfkthame
Comment on attachment 800135 [details] [diff] [review]
re-disable timer-based expiration of cached words (revert bug 889365) due to crashes.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 889365
User impact if declined: crashiness
Testing completed (on m-c, etc.): this simply reverts to the situation we had on m-c (and branches) prior to (speculatively) landing bug 889365
Risk to taking this patch (and alternatives if risky): minimal, this is a backout
String or IDL/UUID changes made by this patch: none

I'd like to leave bug 889365 enabled on m-c for the time being, so we can watch Nightly crash-stats and see whether bug 910376 or other changes may help; but we should revert it on Aurora before the next uplift, as the unexplained crashes indicate it's clearly not ready for release.
Attachment #800135 - Flags: approval-mozilla-aurora?
(In reply to Jonathan Kew (:jfkthame) from comment #6)
> I'd like to leave bug 889365 enabled on m-c for the time being, so we can
> watch Nightly crash-stats and see whether bug 910376 or other changes may
> help; but we should revert it on Aurora before the next uplift, as the
> unexplained crashes indicate it's clearly not ready for release.

Tracking nom for 26 based upon this.
Whiteboard: [before uplift to Beta 26]
Attachment #800135 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Pushed to aurora, marking this as firefox25:disabled.

https://hg.mozilla.org/releases/mozilla-aurora/rev/74efb2b26f7c

Bug remains open to track the issue on mozilla-central.
Thanks!
We're pushing this to Aurora (26) as well since we're about to merge to Beta.  I've marked for 27/28 - what's the plan for this issue?
Specifically, can this be put behind the pref that keeps it on Nightly so we don't have to keep tracking this?
Flags: needinfo?(jfkthame)
From current crash-stats, it looks like the frequency of this signature is substantially reduced compared to comment #0, but it's still around. We can avoid shipping it to release channels using the RELEASE_BUILD condition, as I understand it.
Attachment #823477 - Flags: review?(roc)
This should prevent the problem reaching release channels:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a1700486388

We should uplift this to Aurora, so that it automatically takes care of Beta/Release for FF27 and up.

Let's keep this bug open, as the problem still exists on mozilla-central, although it does seem to be much less severe than at first.
Flags: needinfo?(jfkthame)
Whiteboard: [before uplift to Beta 26] → [leave open]
Comment on attachment 823477 [details] [diff] [review]
disable word-cache expiration timer on release builds

[Approval Request Comment]

As per comment #6, for Firefox 27 and up. This just puts the problematic feature behind an #ifndef RELEASE_BUILD, so we don't have to keep disabling it manually until the crashes on central are fully resolved.
Attachment #823477 - Flags: approval-mozilla-aurora?
Attachment #823477 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
It's still early days for Firefox 28.0a1 but this is currently at #83 (0.1% or 3/2753 crashes).
This is now at #95 (.08% or 17 of 21602 crashes).
(In reply to Wes Kocher (:KWierso) from comment #18)
> https://hg.mozilla.org/mozilla-central/rev/7a1700486388

Setting the FF28 flag to disabled since Wes's checkin was after mozilla-central became Firefox 28. The crash-stats keep dropping: 40 positions in the last week, it's now #108 and 0.07% on crash-stats for the 7 day report.
Some more observations.

1. I can't reproduce this on Linux, only Windows (7).

2. It only seems to happen at shutdown, at no other time.

3. It is the same thread -- the main thread -- that both frees the
   memory and later reads it.  So this is not a threading error.

4. Here are some more complete stacks, first for the free and
   second for the access after the free.  It is clear from those
   that, first there is a call to gfxFontCache::NotifyExpired,
   and later there is a call to gfxFontCache::WordCacheExpirationTimerCallback
   which somehow visits the stuff that ::NotifyExpired already freed.

   I suspect these stacks lack inlined-out frames, unfortunately.  I can
   try to get stacks without inlining if that would help.

https://bugzilla.mozilla.org/show_bug.cgi?id=894798#c21
gfxFontCache::WordCacheExpirationTimerCallbac

First the free:

Note: @0:06:38.916 in thread 1924
Note: 0x2eaace9c-0x2eaacea0 overlaps memory 0x2eaace88-0x2eaacf38 that was freed here:
Note: # 0 replace_free    [e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c:2507] (0x73832080 <drmemorylib.dll+0x32080>) modid:3
Note: # 1 xul.dll!gfxGDIFont::`scalar deleting destructor' (0x6e67fee1 <xul.dll+0x56fee1>) modid:54
Note: # 2 xul.dll!gfxFontCache::NotifyExpired [c:\users\sewardj\moz\mc-curr\gfx\thebes\gfxfont.cpp:280] (0x6e69d554 <xul.dll+0x58d554>) modid:54
Note: # 3 xul.dll!nsExpirationTracker<>::AgeOneGeneration [c:\users\sewardj\moz\mc-curr\ff-o1-win32\dist\include\nsexpirationtracker.h:205] (0x6f104cc1 <xul.dll+0xff4cc1>) modid:54
Note: # 4 xul.dll!nsExpirationTracker<>::TimerCallback [c:\users\sewardj\moz\mc-curr\ff-o1-win32\dist\include\nsexpirationtracker.h:344] (0x6f298689 <xul.dll+0x1188689>) modid:54
Note: # 5 xul.dll!nsTimerImpl::Fire [c:\users\sewardj\moz\mc-curr\xpcom\threads\nstimerimpl.cpp:613] (0x6e1b9ebd <xul.dll+0xa9ebd>) modid:54
Note: # 6 xul.dll!nsTimerEvent::Run [c:\users\sewardj\moz\mc-curr\xpcom\threads\nstimerimpl.cpp:700] (0x6e1bd30e <xul.dll+0xad30e>) modid:54
Note: # 7 xul.dll!nsThread::ProcessNextEvent [c:\users\sewardj\moz\mc-curr\xpcom\threads\nsthread.cpp:848] (0x6e1bbe9b <xul.dll+0xabe9b>) modid:54
Note: # 8 xul.dll!NS_ProcessNextEvent [c:\users\sewardj\moz\mc-curr\xpcom\glue\nsthreadutils.cpp:265] (0x6e1d125f <xul.dll+0xc125f>) modid:54
Note: # 9 xul.dll!mozilla::layers::CompositorParent::ShutDown [c:\users\sewardj\moz\mc-curr\gfx\layers\ipc\compositorparent.cpp:602] (0x6e66ee46 <xul.dll+0x55ee46>) modid:54
Note: #10 xul.dll!mozilla::DefaultDelete<>::operator() [c:\users\sewardj\moz\mc-curr\ff-o1-win32\dist\include\mozilla\uniqueptr.h:489] (0x6f3c5d69 <xul.dll+0x12b5d69>) modid:54
Note: #11 xul.dll!XREMain::XRE_main [c:\users\sewardj\moz\mc-curr\toolkit\xre\nsapprunner.cpp:4385] (0x6f3c9ef6 <xul.dll+0x12b9ef6>) modid:54
Note: #12 xul.dll!XRE_main [c:\users\sewardj\moz\mc-curr\toolkit\xre\nsapprunner.cpp:4448] (0x6f3cbc8a <xul.dll+0x12bbc8a>) modid:54
Note: #13 do_main     [c:\users\sewardj\moz\mc-curr\browser\app\nsbrowserapp.cpp:214] (0x01081ecc <firefox.exe+0x1ecc>) modid:1
Note: #14 NS_internal_main [c:\users\sewardj\moz\mc-curr\browser\app\nsbrowserapp.cpp:478] (0x010819ea <firefox.exe+0x19ea>) modid:1
Note: #15 wmain       [c:\users\sewardj\moz\mc-curr\toolkit\xre\nswindowswmain.cpp:131] (0x01082192 <firefox.exe+0x2192>) modid:1

And later the access:

[in] thread 1924
Error #1: UNADDRESSABLE ACCESS of freed memory: reading 0x2eaace9c-0x2eaacea0 4 byte(s)
# 0 xul.dll!gfxFont::AgeCachedWords [c:\users\sewardj\moz\mc-curr\gfx\thebes\gfxfont.h:1718] (0x6e691a42 <xul.dll+0x581a42>) modid:54
# 1 xul.dll!gfxFontCache::AgeCachedWordsForFont [c:\users\sewardj\moz\mc-curr\gfx\thebes\gfxfont.cpp:301] (0x6e691a63 <xul.dll+0x581a63>) modid:54
# 2 xul.dll!nsTHashtable<>::EnumerateEntries [c:\users\sewardj\moz\mc-curr\ff-o1-win32\dist\include\nsthashtable.h:205] (0x6efca89e <xul.dll+0xeba89e>) modid:54
# 3 xul.dll!gfxFontCache::WordCacheExpirationTimerCallback [c:\users\sewardj\moz\mc-curr\gfx\thebes\gfxfont.cpp:310] (0x6e6a3628 <xul.dll+0x593628>) modid:54
# 4 xul.dll!nsTimerImpl::Fire [c:\users\sewardj\moz\mc-curr\xpcom\threads\nstimerimpl.cpp:613] (0x6e1b9ebd <xul.dll+0xa9ebd>) modid:54
# 5 xul.dll!nsTimerEvent::Run [c:\users\sewardj\moz\mc-curr\xpcom\threads\nstimerimpl.cpp:700] (0x6e1bd30e <xul.dll+0xad30e>) modid:54
# 6 xul.dll!nsThread::ProcessNextEvent [c:\users\sewardj\moz\mc-curr\xpcom\threads\nsthread.cpp:848] (0x6e1bbe9b <xul.dll+0xabe9b>) modid:54
# 7 xul.dll!NS_ProcessNextEvent [c:\users\sewardj\moz\mc-curr\xpcom\glue\nsthreadutils.cpp:265] (0x6e1d125f <xul.dll+0xc125f>) modid:54
# 8 xul.dll!mozilla::layers::CompositorParent::ShutDown [c:\users\sewardj\moz\mc-curr\gfx\layers\ipc\compositorparent.cpp:602] (0x6e66ee46 <xul.dll+0x55ee46>) modid:54
# 9 xul.dll!mozilla::DefaultDelete<>::operator() [c:\users\sewardj\moz\mc-curr\ff-o1-win32\dist\include\mozilla\uniqueptr.h:489] (0x6f3c5d69 <xul.dll+0x12b5d69>) modid:54
#10 xul.dll!XREMain::XRE_main [c:\users\sewardj\moz\mc-curr\toolkit\xre\nsapprunner.cpp:4385] (0x6f3c9ef6 <xul.dll+0x12b9ef6>) modid:54
#11 xul.dll!XRE_main [c:\users\sewardj\moz\mc-curr\toolkit\xre\nsapprunner.cpp:4448] (0x6f3cbc8a <xul.dll+0x12bbc8a>) modid:54
#12 do_main     [c:\users\sewardj\moz\mc-curr\browser\app\nsbrowserapp.cpp:214] (0x01081ecc <firefox.exe+0x1ecc>) modid:1
#13 NS_internal_main [c:\users\sewardj\moz\mc-curr\browser\app\nsbrowserapp.cpp:478] (0x010819ea <firefox.exe+0x19ea>) modid:1
#14 wmain       [c:\users\sewardj\moz\mc-curr\toolkit\xre\nswindowswmain.cpp:131] (0x01082192 <firefox.exe+0x2192>) modid:1
:jseward, the system you're testing on is clearly using GDI fonts, perhaps because hardware acceleration is unsupported/blocked. Could you try forcing the use of DirectWrite fonts (set gfx.font_rendering.directwrite.enabled to true) and see if the problem still reproduces, or if it's specific to the GDIFonts backend? Thanks!
Flags: needinfo?(jseward)
I enabled 2D and 3D hardware acceleration in the VM running Windows (VirtualBox)
and indeed I can no longer reproduce this.
Flags: needinfo?(jseward)
Crash Signature: [@ nsTHashtable<gfxFont::CacheHashEntry>::s_EnumStub(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*)] [@ PL_DHashTableEnumerate | gfxFontCache::AgeCachedWordsForFont(gfxFontCache::HashEntry*, void*)] → [@ nsTHashtable<gfxFont::CacheHashEntry>::s_EnumStub(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*)] [@ PL_DHashTableEnumerate | gfxFontCache::AgeCachedWordsForFont(gfxFontCache::HashEntry*, void*)] [@ nsTHashtable<T>::s_EnumStub] [@ PL_DHashTab…
Is there anything left to do here? It's been several months with no progress.
This is still disabled on release builds:

  http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.cpp#179

So yes, there's still a bug here that we'd like to fix, so that we can enable the cache expiration timer and potentially improve memory use, but so far we haven't pinned down exactly what's causing the error. There are some good clues from jseward above, for if/when someone has the cycles to dig in further.
Whiteboard: [leave open] → [leave open][gfx-noted]
Hi reporter,

All provided crash signatures are no longer present in the latest Firefox release(45.0.1) nor in the latest Nightly(48.0a1). Can you please provide some steps to reproduce this issue?

Also, is this still reproducible on your end ? If yes, can you please retest this using latest FF release and latest Nightly build (https://nightly.mozilla.org/) and report back the results ? When doing this, please use a new clean Firefox profile, maybe even safe mode, to eliminate custom settings as a possible cause (https://goo.gl/PNe90E).

Thanks,
Paul.
Flags: needinfo?(scoobidiver)
Remember that this code is currently #ifdef'd out for beta/release builds, so it can't possibly reproduce on those channels; it can only show up on nightly/aurora.

From the comments above, it looks like this was/is specific to the GDI font backend, so reproducing it may also be dependent on disabling hardware acceleration.

The signatures are likely have changed, as IIRC there have been hashtable implementation changes that would affect this, but the problem (or something like it) apparently still exists:

https://crash-stats.mozilla.com/report/index/3d201d1b-7cdb-40b6-b4a2-131eb2160328
Crash Signature: , void*)] [@ nsTHashtable<T>::s_EnumStub] [@ PL_DHashTableEnumerate | gfxFontCache::AgeCachedWordsForFont] → , void*)] [@ nsTHashtable<T>::s_EnumStub] [@ PL_DHashTableEnumerate | gfxFontCache::AgeCachedWordsForFont] [@ gfxFontCache::WordCacheExpirationTimerCallback]
Scoobidiver is not active any more.
Flags: needinfo?(scoobidiver)
We are still getting reports but I fail to find any with current versions of Firefox. In fact, the most recent version of Firefox to report this crash over the last month is Firefox 41. I suspect it's possible this moved to a different signature but I can't find evidence that this is true. For now I am closing this bug as incomplete. Please reopen it if you can either reproduce this crash in current Firefox versions or if you can find reports meeting the same criteria.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [leave open][gfx-noted] → [gfx-noted]
You need to log in before you can comment on or make changes to this bug.