Closed Bug 1399847 Opened 5 years ago Closed 3 years ago

Crash in IPCError-browser | ShutDownKill with FontCollectionBuilder

Categories

(Core :: Graphics, defect, P2)

55 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox55 --- wontfix
firefox56 + wontfix
firefox57 + wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix

People

(Reporter: philipp, Assigned: aosmond)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-a21d3006-4d1e-4b15-a9f8-f070e0170914.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	dwrite.dll 	FontFileAnalyzer::IsPfm(unsigned __int64) 	
1 	dwrite.dll 	FontFileAnalyzer::Analyze() 	
2 	dwrite.dll 	FontFileAnalyzer::FontFileAnalyzer(FontFileReference const&) 	
3 	dwrite.dll 	FontFileReference::GetLastWriteTime() 	
4 	dwrite.dll 	FontCollectionBuilder::FontCollectionBuilder(IDWriteFactory*, void const*, unsigned int, unsigned __int64, FontLoaderManagers const&, FontCollection const&, CountedPtr<AccessToken> const&) 	
5 	dwrite.dll 	FontCollectionElement::AddToCacheImpl(FontLoaderManagers const&, CacheWriter&, void const**, unsigned int*) 	
6 	dwrite.dll 	CacheWriter::AddElement(FontLoaderManagers const&, IFontCacheElement&, unsigned int, unsigned int, void const**, unsigned int*, bool*) 	
7 	dwrite.dll 	ClientSideCacheContext::ClientLookup(IFontCacheElement&, unsigned int, unsigned int) 	
8 	dwrite.dll 	ClientSideCacheContext::InitializeElementImpl(IFontCacheElement&, unsigned int, unsigned int) 	
9 	dwrite.dll 	FontCollectionElement::FontCollectionElement(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&) 	
10 	dwrite.dll 	DWriteFontCollection::DWriteFontCollection(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&) 	
11 	dwrite.dll 	ComObject<DWriteFontCollection>::ComObject<DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection) 	
12 	dwrite.dll 	InnerComObject<DWriteFactory, DWriteFontCollection>::InnerComObject<DWriteFactory, DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection) 	
13 	dwrite.dll 	DWriteFactory::GetSystemFontCollectionInternal(bool) 	
14 	dwrite.dll 	DWriteFactory::GetSystemFontCollection(IDWriteFontCollection**, int) 	
15 	xul.dll 	gfxDWriteFontList::InitFontListForPlatform() 	gfx/thebes/gfxDWriteFontList.cpp:899
16 	xul.dll 	gfxPlatformFontList::InitFontList() 	gfx/thebes/gfxPlatformFontList.cpp:288
17 	xul.dll 	gfxWindowsPlatform::CreatePlatformFontList() 	gfx/thebes/gfxWindowsPlatform.cpp:499
18 	xul.dll 	gfxPlatformFontList::Init() 	gfx/thebes/gfxPlatformFontList.h:102
19 	xul.dll 	gfxPlatform::Init() 	gfx/thebes/gfxPlatform.cpp:761
20 	xul.dll 	gfxPlatform::InitChild(mozilla::gfx::ContentDeviceData const&) 	gfx/thebes/gfxPlatform.cpp:565
21 	xul.dll 	mozilla::dom::ContentChild::RecvSetXPCOMProcessAttributes(mozilla::dom::XPCOMInitData const&, mozilla::dom::ipc::StructuredCloneData const&, nsTArray<LookAndFeelInt>&&) 	dom/ipc/ContentChild.cpp:531
22 	xul.dll 	mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PContentChild.cpp:6920
23 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:2092
24 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:2018
25 	xul.dll 	mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) 	ipc/glue/MessageChannel.cpp:1887
26 	xul.dll 	mozilla::ipc::MessageChannel::MessageTask::Run() 	ipc/glue/MessageChannel.cpp:1920
27 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1446
28 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:97
29 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:302
30 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
31 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
32 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
33 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:278
34 	xul.dll 	XRE_RunAppShell() 	toolkit/xre/nsEmbedFunctions.cpp:882
35 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:270
36 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
37 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
38 	xul.dll 	XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/nsEmbedFunctions.cpp:699
39 	xul.dll 	mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/Bootstrap.cpp:65
40 	firefox.exe 	content_process_main(mozilla::Bootstrap*, int, char** const) 	ipc/contentproc/plugin-container.cpp:64
41 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
42 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
43 	kernel32.dll 	BaseThreadInitThunk 	
44 	ntdll.dll 	__RtlUserThreadStart 	
45 	ntdll.dll 	_RtlUserThreadStart

content process shutdownkill reports happening in FontCollectionBuilder are regressing in firefox 56 on windows 7, mainly on 32bit builds:
https://crash-stats.mozilla.com/signature/?product=Firefox&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs
on 56.0b they account for just a bit under 10% of the [@ IPCError-browser | ShutDownKill] signature last week...

the first occurrence on nightly was in 56.0a1 build 20170714030205 which contained two apparent font related changes: bug 1376026 & bug 1376964
whole pushlog to the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-07-13&tochange=67cd1ee26f2661fa5efe3d952485ab3c89af4271
Flags: needinfo?(milan)
Lee, Jonathan, can you take a look?  There seems to be a lot of these...
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jfkthame)
Priority: -- → P1
Whiteboard: [gfx-noted]
It looks like the first patch in bug 1376026 may cause us to use DWrite on older Win7 systems where previously we'd have skipped it -- perhaps that means we're trying to use it on flakier, older systems and hence hitting more crashes?

Bug 1376964 was to do with webfont loading, so shouldn't be involved here; the stack here shows a crash happening during platform initialization, way before we've gotten to the point of fetching and instantiating webfonts.
Flags: needinfo?(jfkthame)
I good first step might be to add telemetry that measures the time that gfxDWriteFontList::InitFontListForPlatform takes.
As I understand it this is not a crash but instead just the content process not shutting down in time. It's conceivable that this could happen if you had a lot of fonts with a slow disk.
We should find out why we have all these content processes getting created during shutdown...
Flags: needinfo?(lsalzman)
Priority: P1 → P2
(In reply to Milan Sreckovic [:milan] from comment #6)
> We should find out why we have all these content processes getting created
> during shutdown...

The crash reports generally seem to show a pretty short uptime (less than a minute, in most cases). Are we in the middle of restoring a session (or something like that) when the user changes their mind and closes the browser? So perhaps it's not so much that we're creating content processes during shutdown, and more like the browser is shutting down while content processes are still being created...?
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> (In reply to Milan Sreckovic [:milan] from comment #6)
> > We should find out why we have all these content processes getting created
> > during shutdown...
> 
> The crash reports generally seem to show a pretty short uptime (less than a
> minute, in most cases). Are we in the middle of restoring a session (or
> something like that) when the user changes their mind and closes the
> browser? So perhaps it's not so much that we're creating content processes
> during shutdown, and more like the browser is shutting down while content
> processes are still being created...?

A conceivable course of events: Jeff noted that sometimes pauses on the order of seconds up to 30s can occur during the GetSystemFontCollection call, according to telemetry. I might guess that this occurs perhaps due to hitting the pagefile? Maybe the user is seeing the hang, panic ensues, and trigger shutdown and/or a frenzy of trying to kill locked tabs and create new ones?
Andrew, is this related to the shutdown bugs you're working on at the moment?
Flags: needinfo?(aosmond)
The timing sort of matches.  It was also pointed out to me that we may be spinning up a content process on shutdown to deal with page snapshorts for activity stream or something related.
No crashes since 20170917031738 ?
See Also: → 1397214
(In reply to Milan Sreckovic [:milan] from comment #11)
> No crashes since 20170917031738 ?
if i narrow down the graph from comment #0 to the nightly channel this doesn't really seem to stop:
https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs
and there are also still reports from current builds like bp-c6edd743-17de-4b6a-ada4-233fb0170920
I thought this was fixed with our latest RC, but this is still getting a large amount of crash reports in 56 beta 12 and 56.0RC builds. It also still sounds like a content process issue and not something that causes a startup crash.     

Milan, Andrew, any thoughts here?  The difference between 55.0.3 and 56 seems so huge, do we know how this might affect users?
Flags: needinfo?(milan)
Hmmm, about 12% of the signatures from the 56 RC contain gfxPlatform::Init in the stack trace. There does not appear to be any single code path which dominates the reports (I identified several which have similar numbers). The similar crash I had investigated signature wise was bug 1400637 but that was 57+ only, so clearly this has a different root cause.
Andrew's currently the freshest on shutdown issues, so even though this is not quite related to work he'd have done, it makes sense to assign to him.
Assignee: nobody → aosmond
Flags: needinfo?(milan)
Priority: P2 → P1
I don't think this is any more prevalent than it was historically. I don't think we are getting stuck -- it is just a common place to be when setting up a content process if shutdown comes in soon after. (From what I can gather in the crash reports, it really only happens on x86 in significant numbers, amd64 has a much much smaller rate of occurrence.) As per bug 1375704 comment 20, we should see the number of reports go down to a more acceptable level now.
Flags: needinfo?(aosmond)
It's back...
Flags: needinfo?(aosmond)
Supposedly bug 1399796 was going to help, but this is not reflected in the crash stats. On 57.0b, gfxPlatform::Init related IPC ShutDownKill reports remain around 7%. On 58.0a, we are now less than 1%.

The dip observed in the graph appears to be luck. The one build that actually did see fewer crash reports was on a Saturday with an almost empty pushlog (and the one change was related to tests). The prediction in bug 1375704 comment 24 where we would see an uptick in reports in 57b.0 appears to be accurate...
It was suggested that bug 1385249 landing might reduce the crash rate in bug 1375704 comment 27, but it does not appear to have had any impact on nightly and it has been in for several days now.
Depends on: 1406442
Huge numbers here, but not a lot of user comments. On release 56, we're seeing 19791 crash reports here in the last week.    I don't think we are going to make any progress here for 56; we don't know how to improve it right now, and this probably isn't a good dot release candidate.
I'm guessing the same will happen for 57.
Priority: P1 → P2
Version: 56 Branch → 55 Branch
This still appears to have a pretty high crash rate :(

Reviewing the recent crashes, while we still see FontCollectionBuilder shutdown crashes, they are a drop in the bucket compared to the general IPC browser errors. None of the big hitters do not appear to be graphics related to me.

Flags: needinfo?(aosmond)

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.