Closed Bug 838105 Opened 12 years ago Closed 11 years ago

crash in gfxUserFontSet::UserFontCache::Entry::KeyEquals

Categories

(Core :: Graphics: Text, defect)

20 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox19 --- unaffected
firefox20 --- wontfix
firefox21 + fixed
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- verified
firefox26 --- verified
firefox27 --- verified
firefox-esr24 25+ verified
b2g-v1.2 --- fixed

People

(Reporter: scoobidiver, Assigned: jtd)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

It first appeared in 20.0a1/20121211 but has been discontinuous across builds since that.

Signature 	gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const) More Reports Search
UUID	48743183-0d59-46fb-8156-4b5822130202
Date Processed	2013-02-02 14:37:38
Uptime	70
Last Crash	4.8 weeks before submission
Install Age	3.0 days since version was first installed.
Install Time	2013-01-30 13:53:10
Product	Firefox
Version	21.0a1
Build ID	20130126030941
Release Channel	nightly
OS	Windows NT
OS Version	6.2.9200
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x126f8254
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x1189, AdapterSubsysID: 26783842, AdapterDriverVersion: 9.18.13.1090
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
Processor Notes 	sp-processor05.phx1.mozilla.com_27139:2008
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x1189
Total Virtual Memory	4294836224
Available Virtual Memory	3734798336
System Memory Use Percentage	14
Available Page File	36659245056
Available Physical Memory	14604042240

Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxUserFontSet::UserFontCache::Entry::KeyEquals 	gfx/thebes/gfxUserFontSet.cpp:767
1 	xul.dll 	nsTHashtable<gfxUserFontSet::UserFontCache::Entry>::s_MatchEntry 	obj-firefox/dist/include/nsTHashtable.h:441
2 	xul.dll 	SearchTable 	obj-firefox/xpcom/build/pldhash.cpp:435
3 	xul.dll 	PL_DHashTableOperate 	obj-firefox/xpcom/build/pldhash.cpp:647
4 	xul.dll 	gfxUserFontSet::UserFontCache::ForgetFont 	gfx/thebes/gfxUserFontSet.cpp:796
5 	xul.dll 	gfxFontEntry::~gfxFontEntry 	gfx/thebes/gfxFont.cpp:92
6 	xul.dll 	gfxDWriteFontEntry::`vector deleting destructor' 	
7 	xul.dll 	gfxFontEntry::Release 	obj-firefox/dist/include/gfxFont.h:206
8 	xul.dll 	gfxFont::~gfxFont 	gfx/thebes/gfxFont.cpp:1495
9 	xul.dll 	gfxDWriteFont::~gfxDWriteFont 	gfx/thebes/gfxDWriteFonts.cpp:131
10 	xul.dll 	gfxDWriteFont::`vector deleting destructor' 	
11 	xul.dll 	gfxFontCache::DestroyFont 	gfx/thebes/gfxFont.cpp:1354
12 	xul.dll 	gfxFontCache::NotifyExpired 	gfx/thebes/gfxFont.cpp:1341
13 	xul.dll 	nsExpirationTracker<gfxFont,3>::AgeOneGeneration 	obj-firefox/dist/include/nsExpirationTracker.h:188
14 	xul.dll 	nsExpirationTracker<gfxFont,3>::TimerCallback 	obj-firefox/dist/include/nsExpirationTracker.h:311
15 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:482
16 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:565
17 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:627
18 	xul.dll 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:238
19 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:117
20 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:208
21 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:182
22 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
23 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:154
24 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:288
25 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3823
26 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3890
27 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4093
28 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp:185
29 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:105
30 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
31 	kernel32.dll 	BaseThreadInitThunk 	
32 	ntdll.dll 	__RtlUserThreadStart 	
33 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=gfxUserFontSet%3A%3AUserFontCache%3A%3AEntry%3A%3AKeyEquals%28gfxUserFontSet%3A%3AUserFontCache%3A%3AKey+const*+const%29
It's #26 top browser crasher in 21.0a2, #19 in 22.0a1 and a low volume crash in 20.0b4.
It's now #20 top browser crasher in 21.0a2 and #28 in 22.0a1.
It seems to have spiked in 21.0a1/20130218. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c4de6de47382&tochange=0acbd06d48a9
I suspect bug 833169.
I suspect this will be fixed by bug 856784, the stack here is in the same UserFontCache::ForgetFont spot within the gfxFont destructor.  That landed on mozcentral around the 20130403 timeframe, so builds after that time should see the number of crashes fall to 0.
There are two crashes in the trunk after the fix of bug 856784:
bp-d8ab4414-438f-4366-93ca-953f22130409
bp-5687e9c2-fa1d-4738-98a6-6d1132130406
Depends on: 856784
(In reply to Scoobidiver from comment #4)
> There are two crashes in the trunk after the fix of bug 856784:
> bp-d8ab4414-438f-4366-93ca-953f22130409
> bp-5687e9c2-fa1d-4738-98a6-6d1132130406

Drat...
Fiddling with different testcases to try and cause the crash.

Method 1: Using a window with an iframe

Steps:

1. Create several new tabs
2. Load the URL below into each tab:

http://people.mozilla.org/~jdaggett/memtesting/iteratepagesdownloadablefonts.html

Method 2: Using window.open():

Steps:

1. Options > Tabs > disable "Don't load tabs until clicked"
2. Click on one of the testcases below, then allow popups.  Reload.

http://people.mozilla.org/~jdaggett/tests/bug838105-loadtests.html
http://people.mozilla.org/~jdaggett/tests/bug838105-loadtests-try2.html

I was able to get the crash once with method (1) after 20 minutes but I can't seem to reproduce it now, argh...

https://crash-stats.mozilla.com/report/index/bp-bae80ecd-16fd-4e75-a8ea-a274a2130411

Windows 7 JA, DirectWrite

Crashed on:
http://tvg.globo.com/novelas/guerra-dos-sexos/capitulo/2013/4/19/nando-ve-roberta-e-felipe-se-beijando.html
Keywords: testcase-wanted
Reproduced under Win7, using the first URL of method 2 above.  Crash occurs after two hours (!?!):

https://crash-stats.mozilla.com/report/index/bp-3100ef17-3461-46a1-b0e4-53a7e2130411
(In reply to John Daggett (:jtd) from comment #7)
> Crash occurs after two hours (!?!):
It's consistent with the uptime range:
Uptime Range 	Percentage 	Number Of Crashes
1 hour  	56.659 %	485
15-60 min 	28.738 %	246
5-15 min 	11.215 %	96
1-5 min 	3.271 %	        28
< 1 min 	0.117 %	        1
It's #217 crasher in 20.0.1, #19 in 21.0b3 (no duplicates), and #298 in 22.0a2.

If bug 833169 was the cause (see comment 2), it would have been fixed in 22.0 and above by the patch of bug 847272, which should be uplifted to Beta.
Keywords: topcrash
John, comment #9 suggests Bug 847272 may be the right fix here. Given we only have two beta's left how risky is the patch in case we decided to uplift or what are our alternatives here ?

My read is this has been a high volume crasher throughout the Fx21 cycle and so I am worried about causing a new regression so late in the cycle due to uplift.
Added approval-beta request to the patch on bug 847272.
At this point in FF21 we need to have assignees for tracked bugs, given this might be fixed by John's bug 847272, assigning to him at least for now, until we know more.
Assignee: nobody → jdaggett
Keywords: verifyme
I've been running all three test URLs (comment 6) in Firefox 20.0, 21.0b4, 22.0a2 (latest), and 23.0a1 (latest) on Windows 7 for almost four hours now and have not seen any crashes. Unfortunately it doesn't look like the changes in bug 847272 made this any more reproducible, at least internally.
Keywords: verifyme
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #13)
> I've been running all three test URLs (comment 6) in Firefox 20.0, 21.0b4,
> 22.0a2 (latest), and 23.0a1 (latest) on Windows 7 for almost four hours now
> and have not seen any crashes. Unfortunately it doesn't look like the
> changes in bug 847272 made this any more reproducible, at least internally.

Uplift of bug 847272 will infact make the crashes go away or atleast take longer to crash which is why we uplifted that patch on mozilla-beta.

Based on https://bugzilla.mozilla.org/show_bug.cgi?id=847272#c13, i think it will be good to give 21.0b3/b4 a shot to see if we crash else we could check with John & just fallback to crash-stats to hope the volume reduces in beta 6.

note : You may only be able to reproduce this Fx21 <= beta 5 as bug 847272 is already fixed on aurora/nightly hence the low volume on those channels..
(In reply to bhavana bajaj [:bajaj] from comment #14)
> You may only be able to reproduce this Fx21 <= beta 5 as bug 847272
> is already fixed on aurora/nightly hence the low volume on those channels..

One of the points I was trying to make in comment 13 was that I did test this for four hours with "Fx21 <= beta 5" (ie. 21.0b4). To somewhat verify this is fixed QA needs to be able to do an A-B comparison and if we can't reproduce we can't do that comparison. I can run those scripts again and for longer but I'm not sure that really proves anything.
It's a low volume crash in 21.0b6 (8 crashes so far) after the uplift of bug 847272.
Depends on: 847272
Keywords: topcrash
I think we should keep this bug open for now, as I think there's still some underlying bug in the code that just happened to be tickled more after the family name field was added to gfxFontEntry and before fixing bug 847272.  It's still occurring and it occurred *before* v21 so I still think we need to figure out the scenario which induces the problem.
Crash Signature: [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] → [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const*) const ]
OS: Windows 7 → All
Crash Signature: [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const*) const ] → [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const*) const]
Crash Signature: [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const*) const] → [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const* const)] [@ gfxUserFontSet::UserFontCache::Entry::KeyEquals(gfxUserFontSet::UserFontCache::Key const*) const] [@ nsTArray_Impl<gfxFontFeature, nsTArrayInfallible…
Crash Signature: , nsTArrayInfallibleAllocator>::operator==<nsTArrayInfallibleAllocator>(nsTArray_Impl<gfxFontFeature, nsTArrayInfallibleAllocator> const&)] → , nsTArrayInfallibleAllocator>::operator==<nsTArrayInfallibleAllocator>(nsTArray_Impl<gfxFontFeature, nsTArrayInfallibleAllocator> const&)] [@ nsAString_internal::Equals(nsAString_internal const&)]
Don’t remember seeing this before (certainly not in June/July) but I have seen it twice in Nightly in the past 36 hours.
bp-4b43904c-b556-4b90-8724-32e0c2130731
bp-f805014e-69d3-40b9-befd-070642130801

Assume it’s irrelevant, but I have Jonathan Kew’s fontinfo extension installed (and always have done).
https://addons.mozilla.org/en-US/firefox/addon/fontinfo/
Ioana, can you or someone on your team please see if installing the Font Info add-on makes this any more reproducible?
QA Contact: ioana.budnar
I've installed the Font Info add-on and tried with both methods from comment 6 for more than 3 hours. Also I've browsed and used the addon in the meantime but could not crash FF. 
I've only obtained a small hang(for about 2-3 seconds) when 12 websites were opened.

Seems like the Font Info add-on doesn't make this more reproducible.

Tested with Latest Nightly (build ID:20130801030223) and with the 23 RC build (build ID: 20130730113002) on Windows 7.

If there is anything else we can help with, please let us know.
QA Contact: ioana.budnar
http://arstechnica.com/ often triggers the crash for me. I don't have that addon installed. The crash is delayed by a few minutes.
(In reply to Greg Edwards from comment #21)
> http://arstechnica.com/ often triggers the crash for me. I don't have that
> addon installed. The crash is delayed by a few minutes.

Greg, thanks for reporting this.  How often do you see this crash?  Once per day? Or more often than that?  Could you attach a list of crash report id's?  Just copy 'about:crashes' and add it as a comment.  After the crash happens it would be useful to see the list of sites visited, if you don't mind providing that.  To do this, bring up the full history and copy that in here also.  Thanks very much for this, it would be great to figure out a repeatable way to reproduce this!!
bp-c91b38a5-0f7b-43aa-a8b6-ff3332130724	24/07/2013	10:23 AM
bp-66d8a7ea-bc67-4553-89b6-9dc632130723	23/07/2013	3:12 PM

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130731 Firefox/24.0

I can provide about:support if you want it. However I've had Ars open for a while now and no crash, so this may not be reproducing for me that often after all.
bp-f080c686-94ad-4781-8b15-1bc112130807

I just hit this crash on arstechnica, reading http://arstechnica.com/tech-policy/2013/08/federal-judge-bitcoin-a-currency-can-be-regulated-under-american-law/

I usually read AT at least twice a day, and haven't hit this crash before.  Then again, I did just upgrade to FF23.
If it means anything, I had about 5 Ars tabs open at the time.
I had one very similar to Greg today (I'm on Firefox 22.0):

bp-5f4abcbf-0e5f-43ad-9b0d-a2dec2130813 08/13/13 13:15

I probably had 5-10 ArsTechnica tabs open at the time.  I check up on Ars almost every day at lunch, but this was the first time Firefox has crashed while doing so.
I tried playing around with several different arstechnica tabs loaded and wasn't able to crash. John Daggett, is there any information any of the previous commenters can provide which might help you diagnose this?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #28)
> I tried playing around with several different arstechnica tabs loaded and
> wasn't able to crash. John Daggett, is there any information any of the
> previous commenters can provide which might help you diagnose this?

No, this is good for now, this gives me enough to fiddle a bit more with loading ArsTechnica pages to try and understand the timing of font loads that seems to cause the crashes.  Jonathan Kew has also been tweaking the font load code so that may also reduce the chance of this crash occurring.
I had ArsTechnica load in a background tab and was using another application on a different monitor when all the sudden Firefox 23 crashed on my Lenova T430 laptop (x64) running Windows 7 SP1.  I wasn't even interacting with Firefox let alone the ArsTechnica tab.

bp-79e2e854-2a75-4d83-a8ec-9036a2130815  8/15/2013  9:49 AM
I hit this crash again, this time on my home computer.  I had one background AT page open, and I was using my mouse's scroll wheel to scroll down on the main page.
I saw this crash today for the first time. It's funny: I read an ArsTechnica article some seconds (or minutes?) ago... the ArsTechnica tab was closed... I was doing something else (not within Firefox) when the crash came up:

bp-442bab23-93da-409a-bc22-5eb8c2130819
Got this crash again today, and some more info for you guys.

bp-5c8b5d28-a87e-4383-9947-7d8952130822 08/22/13 15:57

I had the following AT tabs open:
http://arstechnica.com/
http://arstechnica.com/tech-policy/2013/08/deciphering-the-tricks-of-the-twitter-spammers/

I'm not sure if this next part is significant, but when I restarted and tried to access AT again, the CSS for the site wouldn't load.  I'll attach the Net tab log from Firebug just in case.
Steps to reproduce:

1. Open pages in comment 35.
2. Go to the first tab (http://arstechnica.com), Then press the right button and choose "Close Other Tabs".
3. Open a new tab, then go to "about:compartments".
4. Close the first tab (http://arstechnica.com).

See http://s449.photobucket.com/user/movh/media/mh8sg96_zpsb6108172.mp4.html

Crash Report: bp-99e910d7-5a38-464f-91f0-a25972130910
This happen since the version 20.0a1 (2012-12-11).

http://hg.mozilla.org/mozilla-central/rev/4dfe323a663d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20121211 Firefox/20.0
Build ID: 20121211030855


Regressed by: bug 816483
(In reply to blinky from comment #36)

Confirmed, can reproduce at will.  Nice work, I am well impressed.
Fantastic work guys!  I also can reproduce the crash now on Windows with the testcase steps as described above in the nightly build.

Fiddling around with some logging on OSX, I can now reproduce it there.  It's clearly a use-after-deletion problem:

MacOSFontEntry const: 0x175803200
UserFontCache::Entry::KeyEquals mFontEntry: 0x175803200 fe: 0x16ef14900
UserFontCache::Entry::KeyEquals mFontEntry: 0x175803200 fe: 0x1754f1e20
UserFontCache::Entry::KeyEquals mFontEntry: 0x175803200 fe: 0x17c204da0
gfxFontEntry destr: 0x175803200
UserFontCache::ForgetFont aFontEntry: 0x175803200
UserFontCache::Entry::KeyEquals mFontEntry: 0x114d051c0 fe: 0x175803200
UserFontCache::Entry::KeyEquals mFontEntry: 0x175803200 fe: 0x114d051c0
*blam*

Now to figure out why the UserFontCache entry doesn't get cleared out within ~gfxFontEntry...
The immediate cause of the use-after-deletion is that within the gfxFontEntry destructor, the call to ForgetFont relies on being able to look up the font entry in the user font cache using the key (i.e. url, principal, weight/width/style props).  The problem is that the principal changes state between the time it's initially inserted and when it's looked up with ForgetFont).

Sorted list of dumped cache contents over time:

fontEntry: 0x1125e8c00 fonturihash: 681c65d7 family: NoticiaBold domainset: false principal: [http://arstechnica.com/gaming/2013/08/nintendo-lowering-price-of-32gb-deluxe-wii-u-to-299-on-september-20/] 
fontEntry: 0x1125e8c00 fonturihash: 681c65d7 family: NoticiaBold domainset: true principal: [http://arstechnica.com/gaming/2013/08/nintendo-lowering-price-of-32gb-deluxe-wii-u-to-299-on-september-20/] 
fontEntry: 0x16e15ba40 fonturihash: 317a78be family: BebasNeue domainset: false principal: [http://arstechnica.com/gaming/2013/08/microsoft-refutes-conspiracies-about-xbox-one-drm-bait-and-switch/] 
fontEntry: 0x16e15ba40 fonturihash: 317a78be family: BebasNeue domainset: true principal: [http://arstechnica.com/gaming/2013/08/microsoft-refutes-conspiracies-about-xbox-one-drm-bait-and-switch/] 
fontEntry: 0x16e3cab80 fonturihash: 317a78be family: BebasNeue domainset: false principal: [http://arstechnica.com/gaming/2013/08/nintendo-lowering-price-of-32gb-deluxe-wii-u-to-299-on-september-20/] 
fontEntry: 0x16e3cab80 fonturihash: 317a78be family: BebasNeue domainset: true principal: [http://arstechnica.com/gaming/2013/08/nintendo-lowering-price-of-32gb-deluxe-wii-u-to-299-on-september-20/] 
fontEntry: 0x16ebdf4c0 fonturihash: 3a31f5ca family: NoticiaBoldItalic domainset: true principal: [http://arstechnica.com/gaming/2013/08/nintendo-lowering-price-of-32gb-deluxe-wii-u-to-299-on-september-20/] 
fontEntry: 0x16efaa3c0 fonturihash: 681c65d7 family: NoticiaBold domainset: false principal: [http://arstechnica.com/gaming/2013/08/microsoft-refutes-conspiracies-about-xbox-one-drm-bait-and-switch/] 
fontEntry: 0x16efaa3c0 fonturihash: 681c65d7 family: NoticiaBold domainset: true principal: [http://arstechnica.com/gaming/2013/08/microsoft-refutes-conspiracies-about-xbox-one-drm-bait-and-switch/] 
fontEntry: 0x16efaa780 fonturihash: 317a78be family: BebasNeue domainset: false principal: [http://arstechnica.com/] 
fontEntry: 0x16efaa780 fonturihash: 317a78be family: BebasNeue domainset: true principal: [http://arstechnica.com/] 
fontEntry: 0x16efaa900 fonturihash: 3a31f5ca family: NoticiaBoldItalic domainset: false principal: [http://arstechnica.com/] 
fontEntry: 0x16efaa900 fonturihash: 3a31f5ca family: NoticiaBoldItalic domainset: true principal: [http://arstechnica.com/] 
fontEntry: 0x16efaab40 fonturihash: 681c65d7 family: NoticiaBold domainset: true principal: [http://arstechnica.com/] 
fontEntry: 0x16f460840 fonturihash: 3a31f5ca family: NoticiaBoldItalic domainset: true principal: [http://arstechnica.com/gaming/2013/08/microsoft-refutes-conspiracies-about-xbox-one-drm-bait-and-switch/] 
fontEntry: 0x173fd78c0 fonturihash: 317a78be family: BebasNeue domainset: false principal: [http://arstechnica.com/security/2013/08/twitter-and-new-york-times-clash-with-hackers-for-control-of-their-sites/] 
fontEntry: 0x173fd8340 fonturihash: 681c65d7 family: NoticiaBold domainset: false principal: [http://arstechnica.com/security/2013/08/twitter-and-new-york-times-clash-with-hackers-for-control-of-their-sites/] 

Note how the domainset is different for the same font entry over time.  This will cause the key lookups to not find the right font entry.  So ForgetFont won't remove the right font entry and hash table lookups later on will crash when comparing entries that reference the deleted font entry.

Jonas, wondering if you have insight on what the "lifecycle" of principals are.  Under what conditions do they change state such that the result of the Equals operation will change?
As I understand it, the principal will be modified if the document explicitly sets document.domain at some point. This could happen at any time, before or after font loading, so it means we can't trust the document's principal to remain constant.

So a possible fix here would be to rewrite ForgetFont such that instead of looking up the entry by its key, it enumerates the entries in the hashtable looking for the given font entry. That would be slightly less performant than key lookup, I suppose, but in practice the number of cached font entries isn't likely to be huge.
John, if you're able to reproduce this crash consistently, would you like to try this patch and confirm whether it resolves the problem?
I also created tryserver builds with this patch; if those who can reliably reproduce the problem would like to try these and report whether the crash is fixed, that would be really helpful, thanks.

For builds, see
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jkew@mozilla.com-e03474dc3f9a/
(In reply to Jonathan Kew (:jfkthame) from comment #43)
> I also created tryserver builds with this patch; if those who can reliably
> reproduce the problem would like to try these and report whether the crash
> is fixed, that would be really helpful, thanks.
> 
> For builds, see
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jkew@mozilla.com-
> e03474dc3f9a/

I can not reproduce the problem with this build.
Add debug code for dumping out the contents of the userfont cache at insertion/deletion.
Attachment #807010 - Flags: review?(birtles)
Comment on attachment 807010 [details] [diff] [review]
patch, add debug logging for userfont cache

Review of attachment 807010 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with those comments addressed.

::: gfx/thebes/gfxUserFontSet.cpp
@@ +968,5 @@
> +    nsCOMPtr<nsIURI> prinURI;
> +    rv = aEntry->mPrincipal->GetURI(getter_AddRefs(prinURI));
> +    if (NS_SUCCEEDED(rv)) {
> +        prinURI->GetSpec(principalURI);
> +    }

This naming is hard to follow. 'prinURI' is the nsIURI but 'principalURI' is the string and defined in a different place? 'principalURI', 'principalURISpec' ? While you're at it could you make fontURI follow whatever pattern you decide for this.

@@ +984,5 @@
> +            nsURIHashKey::HashKey(aEntry->mURI),
> +            NS_ConvertUTF16toUTF8(aEntry->mFontEntry->FamilyName()).get(),
> +            (setDomain ? "true" : "false"),
> +            principalURI.get()
> +           );

Can mURI or mFontEntry ever be null? nsURIHashKey::HashKey appears to dereference its argument without checking.
Attachment #807010 - Flags: review?(birtles) → review+
Blocks: 918171
Comment on attachment 806560 [details] [diff] [review]
don't rely on key lookup to remove items from the user font cache, as the principal could have changed

Yeah, I confirmed that this fixes the problem.  I think the better solution is to remove the principal from the key and check the cache *after* the font data load completes but this patch is a good, small fix to the problem.  We can land this on aurora/beta also.

So I've opened another bug, bug 918171, to implement the efficiency improvement.
Attachment #806560 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/2d4f2df53563
https://hg.mozilla.org/mozilla-central/rev/aaa20750f69d
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Comment on attachment 806560 [details] [diff] [review]
don't rely on key lookup to remove items from the user font cache, as the principal could have changed

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 816483

User impact if declined: sporadic crashes (due to possible use-after-free) after visiting sites with webfonts, if the page subsequently sets document.domain

Testing completed (on m-c, etc.): currently on m-c; crash fix confirmed on tryserver build (comment 44, 47)

Risk to taking this patch (and alternatives if risky): minimal risk, simple and well-understood change to eliminate reliance on unchanging principal

String or IDL/UUID changes made by this patch: none
Attachment #806560 - Flags: approval-mozilla-beta?
Attachment #806560 - Flags: approval-mozilla-aurora?
Attachment #806560 - Flags: approval-mozilla-beta?
Attachment #806560 - Flags: approval-mozilla-beta+
Attachment #806560 - Flags: approval-mozilla-aurora?
Attachment #806560 - Flags: approval-mozilla-aurora+
We'll keep an eye on crash stats to backup comments that this fixes the issue.
Keywords: verifyme
Mozilla/5.0 (Windows NT 6.2; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (X11; Linux i686; rv:25.0) Gecko/20100101 Firefox/25.0

This did not reproduced anymore on Firefox 25 beta 2 (build ID: 20130923194050) using the steps from comment 35 and comment 36, for several attempts.
Will monitor the crash stats for a few days before setting the flag for firefox 25 to verified.
This appears fixed. There are no longer crashes with builds after 20130919030202 in 26 or 27. Based on that and comment #53 testing results in 25b2,  marking verified.
Comment on attachment 806560 [details] [diff] [review]
don't rely on key lookup to remove items from the user font cache, as the principal could have changed

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:

This fixes a sporadic crash (not reliably reproducible, but frequent enough to be noticed in crash-stats) related to use of webfonts on sites that also manipulate document.domain with JS.

Although I suspect the flaw is likely too unpredictable to be much use as a specific attack vector, it's clearly a browser stability issue.

User impact if declined: sporadic, hard-to-predict crashiness

Fix Landed on Version: landed for FF27, backported to 25 and 26

Risk to taking this patch (and alternatives if risky): minimal risk, localized simplification of code to avoid relying on the unsafe assumption that the principal is immutable

String or UUID changes made by this patch: none
Attachment #806560 - Flags: approval-mozilla-esr24?
Comment on attachment 806560 [details] [diff] [review]
don't rely on key lookup to remove items from the user font cache, as the principal could have changed

Approving on esr, given this is stability fix and low risk
Attachment #806560 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
Hi Cornel, can you verify this bug fixed on 24esr? Thank you.
Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Windows NT 6.2; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0

I have tried several times on 24 ESR (build ID: 20131021230807) using the STR from comment 35 and comment 36 - issue did not reproduce.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: