Closed
Bug 705287
Opened 13 years ago
Closed 6 years ago
Hang in gfxDWriteFontList::DelayedInitFontList
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: scoobidiver, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: hang, Whiteboard: [Snappy:P1])
Crash Data
It's a new crash signature and is #73 top crasher in today's build.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3c8147998124&tochange=de483d897af4
The likely culprit is bug 627842.
Signature chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS*, PhysicalFontFace const*)
UUID d666d9b5-6687-435c-b471-ec5662111125
Date Processed 2011-11-25 04:55:53.367164
Uptime 51
Install Age 20.6 hours since version was first installed.
Install Time 2011-11-24 16:16:16
Product Firefox
Version 11.0a1
Build ID 20111124031031
Release Channel nightly
OS Windows NT
OS Version 6.1.7601 Service Pack 1
Build Architecture x86
Build Architecture Info GenuineIntel family 6 model 23 stepping 6
Crash Reason EXCEPTION_BREAKPOINT
Crash Address 0x6e3e193d
User Comments
App Notes AdapterVendorID: 10de, AdapterDeviceID: 0640, AdapterSubsysID: 00000000, AdapterDriverVersion: 8.17.12.6099
D2D? D2D+
DWrite? DWrite+
xpcom_runtime_abort(###!!! ABORT: HangMonitor triggered: file e:/builds/moz2_slave/m-cen-w32-ntly/build/xpcom/threads/HangMonitor.cpp, line 111)
Processor Notes
EMCheckCompatibility True
Thread 0
Frame Module Signature [Expand] Source
0 DWrite.dll InitializeFontMetrics
1 DWrite.dll FontCollectionBuilder::Builder::AddFontInternal
2 DWrite.dll FontCollectionBuilder::Builder::AddFont
3 DWrite.dll FontCollectionBuilder::Builder::AddFile
4 DWrite.dll FontCollectionBuilder::FontCollectionBuilder
5 DWrite.dll FontCollectionElement::AddToCacheImpl
6 DWrite.dll CacheWriter::AddElement
7 DWrite.dll ClientSideCacheContext::ClientLookup
8 DWrite.dll ClientSideCacheContext::InitializeElementImpl
9 DWrite.dll FontCollectionElement::FontCollectionElement
10 DWrite.dll DWriteFontCollection::DWriteFontCollection
11 DWrite.dll ComObject<DWriteFontCollection>::ComObject<DWriteFontCollection><unsigned __int64*,unsigned int,unsigned __int64,IntrusivePtr<ClientSideCacheContext>,DWriteFactory*,FontCollection>
12 DWrite.dll InnerComObject<DWriteFactory,DWriteFontCollection>::InnerComObject<DWriteFactory,DWriteFontCollection><unsigned __int64*,unsigned int,unsigned __int64,IntrusivePtr<ClientSideCacheContext>,DWriteFactory*,FontCollection>
13 DWrite.dll DWriteFactory::GetSystemFontCollectionInternal
14 DWrite.dll DWriteFactory::GetSystemFontCollection
15 xul.dll gfxDWriteFontList::DelayedInitFontList gfx/thebes/gfxDWriteFontList.cpp:764
16 xul.dll gfxDWriteFontList::ResolveFontName gfx/thebes/gfxDWriteFontList.cpp:1145
17 xul.dll gfxWindowsPlatform::ResolveFontName gfx/thebes/gfxWindowsPlatform.cpp:591
18 xul.dll gfxFontGroup::ForEachFontInternal gfx/thebes/gfxFont.cpp:2373
19 @0xe
20 @0x65006c
More reports at:
https://crash-stats.mozilla.com/report/list?signature=chromehang%20|%20InitializeFontMetrics%28OpenTypeFace%20const%26%2C%20DWRITE_FONT_METRICS*%2C%20PhysicalFontFace%20const*%29
Reporter | ||
Updated•13 years ago
|
Crash Signature: [@ chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS*, PhysicalFontFace const*) ] → [@ chromehang | InitializeFontMetrics(OpenTypeFace const&, DWRITE_FONT_METRICS* PhysicalFontFace const*) ]
[@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ]
[@ chromehang | FontFileAnalyzer::IsPfm(unsigned…
Comment 1•13 years ago
|
||
This is probably exactly the same issue as bug 705258.
Summary: Crash in gfxDWriteFontList::DelayedInitFontList → Hang in gfxDWriteFontList::DelayedInitFontList
(In reply to Scoobidiver from comment #0)
> It's a new crash signature and is #73 top crasher in today's build.
>
> The regression range is:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=3c8147998124&tochange=de483d897af4
> The likely culprit is bug 627842.
To be clear: "chromehang | " is a new type of crash report, introduced by bug 429592, which landed in that range.
Comment 3•13 years ago
|
||
Some of the stacks in the dumps are incomplete but most of the stacks take place within the call to DWriteFactory::GetSystemFontCollection, the DirectWrite call that returns the object needed to look up names.
DirectWrite dll version:
Module|DWrite.dll|6.1.7600.16385
Module|DWrite.dll|6.1.7600.16385
Module|DWrite.dll|6.1.7600.16385
Module|DWrite.dll|6.1.7600.16385
Module|DWrite.dll|6.1.7600.16385
Module|DWrite.dll|6.1.7601.17514
Module|DWrite.dll|6.1.7601.17514
Module|DWrite.dll|6.1.7601.17563
Module|DWrite.dll|6.1.7601.17563
Module|DWrite.dll|6.1.7601.17563
Module|DWrite.dll|6.1.7601.17563
Module|DWrite.dll|6.1.7601.17563
Module|DWrite.dll|7.0.6002.18409
The delay problem in GetSystemFontCollection was fixed by Microsoft in KB2505438, which included DWrite.dll versions 6.1.7600.16763 and higher (non-SP1) and 6.1.7601.17563 and higher (SP1). Windows 8 should't have this problem at all.
This data seems to indicate that the problem is not completely fixed, 7/14 cases have supposedly fixed versions of the DWrite.dll.
Reporter | ||
Updated•13 years ago
|
![]() |
||
Comment 4•13 years ago
|
||
Scoobidiver, if we have a crash signature, we should leave the crash keyword on it or else Socorro will probably not find it any more.
Keywords: crash
Reporter | ||
Comment 5•13 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #4)
> Scoobidiver, if we have a crash signature, we should leave the crash keyword
> on it or else Socorro will probably not find it any more.
I don't think the crash keyword is required so that the bug shows up in Socorro. See bug 705761 for instance.
In addition, there are many crashes in Bugzilla with no crash keyword.
Comment 6•13 years ago
|
||
Yeah, I also don't think chromehang's should be tagged with 'crash' since that will directly affect crash-related metrics when they *aren't* crashes but hangs.
![]() |
||
Comment 7•13 years ago
|
||
(In reply to Scoobidiver from comment #5)
> I don't think the crash keyword is required so that the bug shows up in
> Socorro. See bug 705761 for instance.
Interesting, then I may be wrong. OK, I'll probably need to rewrite some other of my tooling so that "crash" and "hang" are counted in it as "crash bugs", I guess, or my analysis of issues to track in CrashKill is wrong... But thanks for making me realize that.
> In addition, there are many crashes in Bugzilla with no crash keyword.
Crashes without the crash keyword are wrong, that needs to be changed.
Keywords: crash
Updated•13 years ago
|
Assignee: nobody → jdaggett
Updated•13 years ago
|
Whiteboard: [Snappy:P1]
Comment 8•13 years ago
|
||
Need to track down the DirectWrite DLL version associated with the outliers here. There's a known MS bug with the timing of DirectWrite font cache startup than can cause this, but MS claims this is fixed.
Telemetry variable: DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
Comment 9•11 years ago
|
||
As mentioned in comment 3, this is fundamentally an OS-level bug that occurs because of a timing issue at *cold* startup between the system font cache and the dwrite system library. Microsoft has said they fixed this previously so hopefully as users update their OS the problem will disappear altogether. The only "solution" here is to basically notice when a long delay has occurred and effectively disable hardware acceleration. Given the *very* small percentage of users that are affected by this, I don't think that's the right way to go but it's certainly a possibility.
Bug 706340 added gfxinfo to telemetry data. So we should be able to use that to answer the question:
How many users of FF {23,24,25} hit a delay here of more that 2secs and what versions of DirectWrite are they using?
Vladan, could you look at the telemetry data for this? Or point me at the docs on how to pull this out?
Telemetry item: DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
The latest version of the telemetry metrics interface doesn't seem to have an export feature, that's kind of unfortunate.
http://telemetry.mozilla.org/#path=release/25/DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
Flags: needinfo?(vdjeric)
Comment 10•11 years ago
|
||
The chromehangs link I emailed you also contains info on hang stacks involving gfxDWriteFontList::DelayedInitFontList and the associated DWriteVersions.
I took a quick look at the top 5 hang signatures involving DelayedInitFontList in that link, and I found the following DWriteVersions (in various proportions):
6.1.7600.16385
6.1.7600.16699
6.1.7600.16763
6.1.7600.16961
6.1.7600.16972
6.1.7600.20655
6.1.7600.20710
6.1.7600.20781
6.1.7600.20830
6.1.7600.21162
6.1.7601.16562
6.1.7601.17105
6.1.7601.17514
6.1.7601.17563
6.1.7601.17776
6.1.7601.17789
6.1.7601.18126
6.1.7601.21935
6.2.9200.16433
6.2.9200.16492
6.2.9200.16571
6.2.9200.16581
7.0.6002.23097
Granted, the vast majority of the hangs reported in this sample are from DWrite versions before Microsoft's fix. Unfortunately, that means real users are still running into this issue.
If you'd like to work with the raw Telemetry data, you can do your analysis directly on our new Telemetry backend:
http://mreid-moz.github.io/blog/2013/11/06/current-state-of-telemetry-analysis/
http://jonasfj.dk/blog/2013/11/telemetry-rebooted-analysis-future/
Please come by the #telemetry channel for more details
Flags: needinfo?(vdjeric)
Comment 11•11 years ago
|
||
Latest results from Telemetry stats:
http://jsfiddle.net/69QNY/2/
For the past few months, using Firefox 24 release data:
The DirectWrite call GetSystemFontCollection() takes over 2.3secs
in roughly 2.1 million startups. This is out of a total of 137
million DirectWrite startups (for comparison, 209 million GDI
startups).
data: release/24/DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
gfxDWriteFontList::DelayedInitFontList GetSystemFontCollection (ms)
index start end count pct cum pct
6 647 2324 480054 0.35 98.45
7 2324 8350 1171404 0.85 99.31
8 8350 30000 790817 0.58 99.88
9 >30000 159719 0.12 100.00
total counts: 137304675
data: release/24/GDI_INITFONTLIST_TOTAL
gfxGDIFontList::InitFontList Total (ms)
index start end count pct cum pct
6 647 2324 1712605 0.82 99.56
7 2324 8350 696282 0.33 99.90
8 8350 30000 189563 0.09 99.99
9 >30000 28550 0.01 100.00
total counts: 209074700
I think we need to figure out two things -- are these cold
startups (i.e. does the startup occur soon after a system
restart) and, for the long delay cases, how many of these are due
to large numbers of fonts on the system?
One "solution" would be to shunt users affected by consistent
long DirectWrite startup times to GDI by effectively disabling
hardware acceleration. This is a bit of a "nuclear solution", the
user would lose the benefits of DirectWrite and related
acceleration.
Comment 12•11 years ago
|
||
Distribution of number of fonts at startup, only for DirectWrite users:
data: release/24/DWRITEFONT_DELAYEDINITFONTLIST_COUNT
gfxDWriteFontList::DelayedInitFontList Font Family Count
index start end count pct cum pct
0 0 1 0 0.00 0.00
1 1 3 24 0.00 0.00
2 3 10 60 0.00 0.00
3 10 32 3231 0.00 0.00
4 32 101 140744 0.10 0.10
5 101 319 123098407 89.65 89.76
6 319 1006 12889622 9.39 99.15
7 1006 3172 1076709 0.78 99.93
8 3172 10000 94342 0.07 100.00
9 >10000 1539 0.00 100.00
total counts: 137304678
So there's a *very* small percentage of users with more than 3000 fonts
on their system (roughly 0.07% of startups).
Comment 13•11 years ago
|
||
Updated telemetry stats don't show any significant change in the percentage of users affected by long times spent in DirectWrite system font collection initialization:
http://jsfiddle.net/eGvw5/4/
release/24 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
index start end count pct cum pct
6 647 2324 427213 0.35 98.46
7 2324 8350 1048207 0.85 99.32
8 8350 30000 701801 0.57 99.89
9 >30000 134983 0.11 100.00
total counts: 122697422
release/27 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
index start end count pct cum pct
6 647 2324 621027 0.37 98.44
7 2324 8350 1491087 0.88 99.32
8 8350 30000 963853 0.57 99.89
9 >30000 180538 0.11 100.00
total counts: 168563706
release/28 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
index start end count pct cum pct
6 647 2324 36820 0.32 98.77
7 2324 8350 81229 0.71 99.48
8 8350 30000 50133 0.44 99.92
9 >30000 9719 0.08 100.00
total counts: 11472782
nightly/31 DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
index start end count pct cum pct
6 647 2324 10084 0.42 98.44
7 2324 8350 21144 0.89 99.32
8 8350 30000 13668 0.57 99.90
9 >30000 2487 0.10 100.00
total counts: 2388461
Given that system startup time was added to telemetry data at the end of January, it would be interesting to know the following data:
For the set of telemetry data where (DWRITEFONT_DELAYEDINITFONTLIST_COLLECT > 2324), what's the distribution of time since system startup? Are these all within 60secs of startup?
Additionally, for the set of telemetry data where (DWRITEFONT_DELAYEDINITFONTLIST_COLLECT > 2324) what's the distribution of DirectWrite library versions? Are there recent versions included in that list? This would be useful information to report back to Microsoft. In the past they've assured me that this problem has been "fixed" in recent versions of the DirectWrite library.
Updated•9 years ago
|
Crash Signature: , PhysicalFontFace const*) ]
[@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ]
[@ chromehang | FontFileAnalyzer::IsPfm(unsigned __int64) ] → , PhysicalFontFace const*) ]
[@ chromehang | `anonymous namespace''::TrueTypeFontMetricsBuilder::GetBlackBox(unsigned short) ]
[@ chromehang | FontFileAnalyzer::IsPfm(unsigned __int64) ]
[@ chromehang | InitializeFontMetrics ]
[@ chromehang | `anonymo…
Updated•9 years ago
|
Assignee: jd.bugzilla → nobody
Comment 14•6 years ago
|
||
Closing because no crash reported since 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 15•6 years ago
|
||
Closing because no crash reported since 12 weeks.
You need to log in
before you can comment on or make changes to this bug.
Description
•