Closed
Bug 1399847
Opened 7 years ago
Closed 5 years ago
Crash in IPCError-browser | ShutDownKill with FontCollectionBuilder
Categories
(Core :: Graphics, defect, P2)
Tracking
()
People
(Reporter: philipp, Assigned: aosmond)
References
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
This bug was filed from the Socorro interface and is
report bp-a21d3006-4d1e-4b15-a9f8-f070e0170914.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 dwrite.dll FontFileAnalyzer::IsPfm(unsigned __int64)
1 dwrite.dll FontFileAnalyzer::Analyze()
2 dwrite.dll FontFileAnalyzer::FontFileAnalyzer(FontFileReference const&)
3 dwrite.dll FontFileReference::GetLastWriteTime()
4 dwrite.dll FontCollectionBuilder::FontCollectionBuilder(IDWriteFactory*, void const*, unsigned int, unsigned __int64, FontLoaderManagers const&, FontCollection const&, CountedPtr<AccessToken> const&)
5 dwrite.dll FontCollectionElement::AddToCacheImpl(FontLoaderManagers const&, CacheWriter&, void const**, unsigned int*)
6 dwrite.dll CacheWriter::AddElement(FontLoaderManagers const&, IFontCacheElement&, unsigned int, unsigned int, void const**, unsigned int*, bool*)
7 dwrite.dll ClientSideCacheContext::ClientLookup(IFontCacheElement&, unsigned int, unsigned int)
8 dwrite.dll ClientSideCacheContext::InitializeElementImpl(IFontCacheElement&, unsigned int, unsigned int)
9 dwrite.dll FontCollectionElement::FontCollectionElement(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&)
10 dwrite.dll DWriteFontCollection::DWriteFontCollection(void const*, unsigned int, unsigned __int64, ClientSideCacheContext*, DWriteFactory*, FontCollection const&)
11 dwrite.dll ComObject<DWriteFontCollection>::ComObject<DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection)
12 dwrite.dll InnerComObject<DWriteFactory, DWriteFontCollection>::InnerComObject<DWriteFactory, DWriteFontCollection><unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection>(unsigned __int64*, unsigned int, unsigned __int64, IntrusivePtr<ClientSideCacheContext>, DWriteFactory*, FontCollection)
13 dwrite.dll DWriteFactory::GetSystemFontCollectionInternal(bool)
14 dwrite.dll DWriteFactory::GetSystemFontCollection(IDWriteFontCollection**, int)
15 xul.dll gfxDWriteFontList::InitFontListForPlatform() gfx/thebes/gfxDWriteFontList.cpp:899
16 xul.dll gfxPlatformFontList::InitFontList() gfx/thebes/gfxPlatformFontList.cpp:288
17 xul.dll gfxWindowsPlatform::CreatePlatformFontList() gfx/thebes/gfxWindowsPlatform.cpp:499
18 xul.dll gfxPlatformFontList::Init() gfx/thebes/gfxPlatformFontList.h:102
19 xul.dll gfxPlatform::Init() gfx/thebes/gfxPlatform.cpp:761
20 xul.dll gfxPlatform::InitChild(mozilla::gfx::ContentDeviceData const&) gfx/thebes/gfxPlatform.cpp:565
21 xul.dll mozilla::dom::ContentChild::RecvSetXPCOMProcessAttributes(mozilla::dom::XPCOMInitData const&, mozilla::dom::ipc::StructuredCloneData const&, nsTArray<LookAndFeelInt>&&) dom/ipc/ContentChild.cpp:531
22 xul.dll mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PContentChild.cpp:6920
23 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp:2092
24 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) ipc/glue/MessageChannel.cpp:2018
25 xul.dll mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) ipc/glue/MessageChannel.cpp:1887
26 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run() ipc/glue/MessageChannel.cpp:1920
27 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446
28 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:97
29 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:302
30 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319
31 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299
32 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156
33 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:278
34 xul.dll XRE_RunAppShell() toolkit/xre/nsEmbedFunctions.cpp:882
35 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:270
36 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319
37 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299
38 xul.dll XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/nsEmbedFunctions.cpp:699
39 xul.dll mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/Bootstrap.cpp:65
40 firefox.exe content_process_main(mozilla::Bootstrap*, int, char** const) ipc/contentproc/plugin-container.cpp:64
41 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115
42 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
43 kernel32.dll BaseThreadInitThunk
44 ntdll.dll __RtlUserThreadStart
45 ntdll.dll _RtlUserThreadStart
content process shutdownkill reports happening in FontCollectionBuilder are regressing in firefox 56 on windows 7, mainly on 32bit builds:
https://crash-stats.mozilla.com/signature/?product=Firefox&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs
on 56.0b they account for just a bit under 10% of the [@ IPCError-browser | ShutDownKill] signature last week...
the first occurrence on nightly was in 56.0a1 build 20170714030205 which contained two apparent font related changes: bug 1376026 & bug 1376964
whole pushlog to the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-07-13&tochange=67cd1ee26f2661fa5efe3d952485ab3c89af4271
Flags: needinfo?(milan)
Lee, Jonathan, can you take a look? There seems to be a lot of these...
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jfkthame)
Priority: -- → P1
Whiteboard: [gfx-noted]
Comment 2•7 years ago
It looks like the first patch in bug 1376026 may cause us to use DWrite on older Win7 systems where previously we'd have skipped it -- perhaps that means we're trying to use it on flakier, older systems and hence hitting more crashes?
Bug 1376964 was to do with webfont loading, so shouldn't be involved here; the stack here shows a crash happening during platform initialization, way before we've gotten to the point of fetching and instantiating webfonts.
Flags: needinfo?(jfkthame)
Comment 3•7 years ago
A good first step might be to add telemetry that measures the time that gfxDWriteFontList::InitFontListForPlatform takes.
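One way to picture the measurement suggested here is a scoped timer wrapped around the call. The following is a minimal sketch only: `ScopedTimer`, `FakeInitFontList`, and `MeasureFakeInit` are hypothetical names invented for illustration, the "font list init" is just a sleep, and the report callback stands in for whatever telemetry probe would actually be used. It does not use Firefox's telemetry API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper: reports the elapsed wall time, in milliseconds,
// to a callback when the timer goes out of scope.
class ScopedTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ScopedTimer(std::function<void(double)> aReport)
      : mReport(std::move(aReport)), mStart(Clock::now()) {
    assert(static_cast<bool>(mReport));  // a report callback is required
  }

  ~ScopedTimer() {
    std::chrono::duration<double, std::milli> elapsed = Clock::now() - mStart;
    mReport(elapsed.count());
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  std::function<void(double)> mReport;
  Clock::time_point mStart;
};

// Stand-in for the real font list initialization (assumption: it only sleeps).
inline void FakeInitFontList(std::chrono::milliseconds aCost) {
  std::this_thread::sleep_for(aCost);
}

// Times one call to the stand-in and returns the measured milliseconds.
inline double MeasureFakeInit(std::chrono::milliseconds aCost) {
  double measuredMs = -1.0;
  {
    ScopedTimer timer([&measuredMs](double aMs) { measuredMs = aMs; });
    FakeInitFontList(aCost);
  }  // the timer reports here, when it leaves scope
  return measuredMs;
}
```

In a real patch the callback would accumulate into a histogram rather than write to a local variable; the scoped form is chosen so that early returns inside the timed function are still measured.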
Comment 4•7 years ago
As I understand it this is not a crash but instead just the content process not shutting down in time. It's conceivable that this could happen if you had a lot of fonts with a slow disk.
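The scenario described here (a child that is still busy initializing when the parent's shutdown deadline expires) can be sketched as a small simulation. Everything in it is an assumption made for illustration: `SimulateShutdown` and `ShutdownOutcome` are hypothetical names, the sleep stands in for slow font enumeration, and the grace period is an arbitrary parameter, not the timeout Firefox actually uses.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

enum class ShutdownOutcome { CleanExit, Killed };

// Simulates a parent that asks a child to shut down and waits aGrace for it.
// The child is modeled as a task that is blocked in initialization for
// aInitCost and cannot respond until that work has finished.
inline ShutdownOutcome SimulateShutdown(std::chrono::milliseconds aInitCost,
                                        std::chrono::milliseconds aGrace) {
  assert(aInitCost.count() >= 0 && aGrace.count() >= 0);

  std::future<void> child = std::async(std::launch::async, [aInitCost] {
    // Stand-in for enumerating many fonts from a slow disk.
    std::this_thread::sleep_for(aInitCost);
  });

  // The parent only waits for the grace period. If the child is still busy,
  // a real parent would kill it and a ShutDownKill-style report would result.
  ShutdownOutcome outcome =
      child.wait_for(aGrace) == std::future_status::ready
          ? ShutdownOutcome::CleanExit
          : ShutdownOutcome::Killed;

  // Note: the future returned by std::async joins in its destructor, so this
  // function still returns only after the simulated work ends. Only the
  // reported outcome models the kill.
  return outcome;
}
```

With a short init and a long grace period the sketch reports a clean exit; with a long init and a short grace period it reports a kill even though nothing in the child is actually stuck, which is the distinction this comment draws.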
Comment 5•7 years ago
Here's some telemetry that's already there https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-09-11&keys=__none__!__none__!__none__&max_channel_version=aurora%252F56&measure=DWRITEFONT_DELAYEDINITFONTLIST_COLLECT&min_channel_version=null&os=Windows_NT%252C6.1&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-06-28&table=0&trim=1&use_submission_date=0
We should find out why we have all these content processes getting created during shutdown...
Flags: needinfo?(lsalzman)
Priority: P1 → P2
Comment 7•7 years ago
(In reply to Milan Sreckovic [:milan] from comment #6)
> We should find out why we have all these content processes getting created
> during shutdown...
The crash reports generally seem to show a pretty short uptime (less than a minute, in most cases). Are we in the middle of restoring a session (or something like that) when the user changes their mind and closes the browser? So perhaps it's not so much that we're creating content processes during shutdown, and more like the browser is shutting down while content processes are still being created...?
Comment 8•7 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> (In reply to Milan Sreckovic [:milan] from comment #6)
> > We should find out why we have all these content processes getting created
> > during shutdown...
>
> The crash reports generally seem to show a pretty short uptime (less than a
> minute, in most cases). Are we in the middle of restoring a session (or
> something like that) when the user changes their mind and closes the
> browser? So perhaps it's not so much that we're creating content processes
> during shutdown, and more like the browser is shutting down while content
> processes are still being created...?
A conceivable course of events: Jeff noted that, according to telemetry, the GetSystemFontCollection call sometimes pauses for anywhere from a few seconds up to 30s. I might guess that this is due to hitting the pagefile? Maybe the user sees the hang, panics, and triggers shutdown and/or a frenzy of trying to kill locked tabs and create new ones?
Comment 9•7 years ago
Andrew, is this related to the shutdown bugs you're working on at the moment?
Flags: needinfo?(aosmond)
The timing sort of matches. It was also pointed out to me that we may be spinning up a content process on shutdown to deal with page snapshots for activity stream or something related.
No crashes since 20170917031738 ?
Reporter
Comment 12•7 years ago
(In reply to Milan Sreckovic [:milan] from comment #11)
> No crashes since 20170917031738 ?
if i narrow down the graph from comment #0 to the nightly channel this doesn't really seem to stop:
https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&process_type=content&proto_signature=~FontCollectionBuilder%3A%3AFontCollectionBuilder&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-07-01T00%3A00%3A00.000Z#graphs
and there are also still reports from current builds like bp-c6edd743-17de-4b6a-ada4-233fb0170920
Comment 13•7 years ago
I thought this was fixed with our latest RC, but this is still getting a large number of crash reports in 56 beta 12 and 56.0RC builds. It also still sounds like a content process issue and not something that causes a startup crash.
Milan, Andrew, any thoughts here? The difference between 55.0.3 and 56 seems so huge, do we know how this might affect users?
Assignee
Comment 14•7 years ago
Hmmm, about 12% of the signatures from the 56 RC contain gfxPlatform::Init in the stack trace. There does not appear to be any single code path which dominates the reports (I identified several which have similar numbers). The crash with a similar signature that I had investigated was bug 1400637, but that was 57+ only, so clearly this has a different root cause.
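The kind of figure quoted here (the share of reports whose stack contains a given frame) can be computed with a simple substring count over proto signatures. This is a sketch under assumptions: `ShareContainingFrame` is a hypothetical helper, and the input is assumed to be a list of proto signature strings already exported from crash-stats. It does not query Socorro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the fraction (0.0 to 1.0) of proto signatures that mention aFrame.
// An empty input yields 0.0 rather than dividing by zero.
inline double ShareContainingFrame(
    const std::vector<std::string>& aProtoSignatures,
    const std::string& aFrame) {
  assert(!aFrame.empty());  // an empty needle would match every report
  if (aProtoSignatures.empty()) {
    return 0.0;
  }
  std::size_t hits = 0;
  for (const std::string& signature : aProtoSignatures) {
    if (signature.find(aFrame) != std::string::npos) {
      ++hits;
    }
  }
  return static_cast<double>(hits) /
         static_cast<double>(aProtoSignatures.size());
}
```

Multiplying the result by 100 gives a percentage comparable to the one cited in this comment.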
Andrew's currently the freshest on shutdown issues, so even though this is not quite related to work he'd have done, it makes sense to assign to him.
Assignee: nobody → aosmond
Flags: needinfo?(milan)
Priority: P2 → P1
Comment 16•7 years ago
Out of a crash signature that holds over 400K crash reports over 7 days, this crash represents about 7 percent of those.
https://crash-stats.mozilla.com/signature/?product=Firefox&process_type=content&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2017-09-20T21%3A16%3A49.000Z&date=%3C2017-09-27T21%3A16%3A49.000Z#summary
Assignee
Comment 17•7 years ago
I don't think this is any more prevalent than it was historically. I don't think we are getting stuck -- it is just a common place to be when setting up a content process if shutdown comes in soon after. (From what I can gather in the crash reports, it really only happens on x86 in significant numbers, amd64 has a much much smaller rate of occurrence.) As per bug 1375704 comment 20, we should see the number of reports go down to a more acceptable level now.
Flags: needinfo?(aosmond)
It's back...
Flags: needinfo?(aosmond)
Assignee
Comment 19•7 years ago
Supposedly bug 1399796 was going to help, but this is not reflected in the crash stats. On 57.0b, gfxPlatform::Init related IPC ShutDownKill reports remain around 7%. On 58.0a, we are now less than 1%.
The dip observed in the graph appears to be luck. The one build that actually did see fewer crash reports was on a Saturday with an almost empty pushlog (and the one change was related to tests). The prediction in bug 1375704 comment 24 that we would see an uptick in reports in 57.0b appears to be accurate...
Assignee
Comment 20•7 years ago
It was suggested that bug 1385249 landing might reduce the crash rate in bug 1375704 comment 27, but it does not appear to have had any impact on nightly and it has been in for several days now.
Comment 21•7 years ago
Huge numbers here, but not a lot of user comments. On release 56, we're seeing 19791 crash reports here in the last week. I don't think we are going to make any progress here for 56; we don't know how to improve it right now, and this probably isn't a good dot release candidate.
I'm guessing the same will happen for 57.
Comment 23•7 years ago
This still appears to have a pretty high crash rate :(
status-firefox59:
--- → wontfix
status-firefox60:
--- → affected
Updated•7 years ago
status-firefox61:
--- → wontfix
status-firefox62:
--- → fix-optional
status-firefox-esr60:
--- → affected
Assignee
Comment 24•5 years ago
Reviewing the recent crashes, while we still see FontCollectionBuilder shutdown crashes, they are a drop in the bucket compared to the general IPC browser errors. None of the big hitters appear to be graphics related to me.
Flags: needinfo?(aosmond)
Comment 25•5 years ago
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Updated•3 years ago