Crash in gfxFontGroup::BuildFontList

VERIFIED FIXED in Firefox -esr60

Status

()

defect
P3
critical
VERIFIED FIXED
8 months ago
5 months ago

People

(Reporter: marcia, Assigned: jfkthame)

Tracking

({crash, regression})

Trunk
mozilla65
Unspecified
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 fixed, firefox63 wontfix, firefox64 verified, firefox65 verified)

Details

(crash signature)

Attachments

(3 attachments)

This bug was filed from the Socorro interface and is
report bp-13e170e2-018f-4751-b0fd-2f0f80180924.
=============================================================

Seen while looking at crash stats. Small volume Mac crash seems to be only confined to 10.14 at the moment: https://bit.ly/2PVHsiE. Several of the users crashed more than once. 

There are several Windows crashes, but they are not on 64 nightly.

Sorry wasn't sure if this is Graphics |Text or Texts and Fonts...

Top 10 frames of crashing thread:

0 XUL gfxFontGroup::BuildFontList gfx/thebes/gfxTextRun.cpp:1889
1 XUL gfxFontGroup::gfxFontGroup gfx/thebes/gfxTextRun.cpp:1814
2 XUL gfxPlatformMac::CreateFontGroup gfx/thebes/gfxTextRun.cpp:1810
3 XUL nsFontMetrics::nsFontMetrics gfx/src/nsFontMetrics.cpp:143
4 XUL nsFontCache::GetMetricsFor gfx/src/nsFontMetrics.cpp:123
5 XUL nsLayoutUtils::GetFontMetricsForComputedStyle gfx/src/nsDeviceContext.cpp:245
6 XUL mozilla::ReflowInput::CalcLineHeight layout/generic/ReflowInput.cpp:2871
7 XUL mozilla::BlockReflowInput::BlockReflowInput layout/generic/ReflowInput.cpp:2883
8 XUL nsBlockFrame::Reflow layout/generic/BlockReflowInput.cpp:50
9 XUL nsContainerFrame::ReflowChild layout/generic/nsContainerFrame.cpp:951

=============================================================
Randell looked at the crashes in triage that weren't 64 and they seem to be a different issue. Let's keep this bug about the Mac crashes showing up in Mac nightly on 10.14.
Can you figure out who should look at this?  Thanks
Flags: needinfo?(jfkthame)
Assignee

Comment 3

8 months ago
I looked at this a bit to try and understand a scenario that could lead to a null-deref in here, but without much success.

There are a few Windows crashes which point to https://hg.mozilla.org/mozilla-central/annotate/bae2e21a9312/gfx/thebes/gfxTextRun.cpp#l1833, which suggests the name.mName atom may be null. That shouldn't be the case when name.IsNamed() is true, but if we somehow have a default-constructed FontFamilyName, then it could happen. I think we should change the default FontFamilyName constructor to avoid that possibility, even though I haven't figured out how it would happen.

The macOS crashes point to https://hg.mozilla.org/mozilla-central/annotate/856103837d4dda0e176706cc64927546459d510c/gfx/thebes/gfxTextRun.cpp#l1889, which looks more like we have a null aFamily. But I can't see where that would come from.

If someone has a machine running 10.14 (Lee, Markus -- either of you have 10.14 yet?) and can reproduce this, we'd probably be able to figure out where it's coming from, but without that it's hard to see a way forward.
Flags: needinfo?(mstange)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jfkthame)
Priority: -- → P3
Assignee

Comment 4

8 months ago
I think we should do this, to avoid the risk of an inconsistent state, though I don't know if it will actually help anything here.
Attachment #9014106 - Flags: review?(lsalzman)
I don't have 10.14 yet. Stephen, have you seen this crash before?
Flags: needinfo?(mstange) → needinfo?(spohl.mozilla.bugs)
Comment on attachment 9014106 [details] [diff] [review]
speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided

It's worth a try, at least.

I don't have 10.14 unfortunately. Only 10.13.
Flags: needinfo?(lsalzman)
Attachment #9014106 - Flags: review?(lsalzman) → review+

Comment 7

8 months ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f34f0818f82d
speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided. r=lsalzman
Assignee

Comment 8

8 months ago
Let's keep this open for now, until we see whether this makes any difference; I'm not at all confident it's the real or only problem.
Keywords: leave-open
(In reply to Markus Stange [:mstange] from comment #5)
> I don't have 10.14 yet. Stephen, have you seen this crash before?

No, I haven't. If we find steps to reproduce I'm happy to investigate or verify a fix.
Flags: needinfo?(spohl.mozilla.bugs)
Assignee

Updated

8 months ago
See Also: → 1497563
23 crashes/5 installations in the last week. Most of the crashes are on 10.14 Mac.
NI for another look
Flags: needinfo?(jfkthame)
Assignee

Comment 13

7 months ago
The stack in reports like https://crash-stats.mozilla.com/report/index/e3949730-6a4d-4f44-9b3b-7e1700181019 makes it appear that perhaps we're getting a null gfxFontFamily pointer passed to gfxFontGroup::AddFamilyToFontList. That isn't supposed to be possible, but perhaps there's some circumstance where it happens. (We're clearly not seeing it in automation with debug builds, as there's an assertion that would be firing.)

I suggest we revise the code here to be safer in the face of a null aFamily argument, and see if that helps eliminate these crashes. In the worst case, I guess this could mean we end up with an empty font list, but we've got other code that kills the process if there's no usable font available; more likely, there's some kind of broken font present, and if we skip that we'll just end up using the next in the list, or a generic fallback.
Flags: needinfo?(jfkthame)
Assignee

Comment 14

7 months ago
I suggest we try this and see if some/all of these crashes stop. While I haven't tracked down exactly how a null pointer could arise here, it's clear that nothing good can come of attempting to use it.
Attachment #9019760 - Flags: review?(lsalzman)
Attachment #9019760 - Flags: review?(lsalzman) → review+

Comment 15

7 months ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/eff78d899e01
Make AddFamilyToFontList return safely if passed a null font-family. r=lsalzman
Assignee

Comment 16

7 months ago
Let's keep the leave-open keyword for now, until we see if this makes any difference.
Assignee

Comment 18

7 months ago
https://crash-stats.mozilla.com/report/index/a30ac042-1631-4328-a346-058e20181027 says we still have some kind of problem here.

In this case, it looks like a null nsAtom pointer is being used; not sure how that arises, though.
Assignee

Comment 19

7 months ago
I'm guessing this is the atom dereference that's crashing, though I haven't figured out how we can end up with a null pointer in the list here, and no idea how to reproduce. Anyhow, skipping the bad FontFamilyName should be safe enough, we'll just end up using a default font if no valid names are found here.
Attachment #9020729 - Flags: review?(lsalzman)
Attachment #9020729 - Flags: review?(lsalzman) → review+

Comment 20

7 months ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f6bc2bf601b
Make BuildFontList robust against an invalid FontFamilyName record with no name atom. r=lsalzman
Assignee

Comment 22

7 months ago
I don't see any reports from trunk builds later than 20181026220839; I think we can call this fixed.

There's a steady trickle of crashes on beta, though, so we should probably uplift these patches.
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → FIXED
Assignee

Comment 23

7 months ago
Comment on attachment 9020729 [details] [diff] [review]
Make BuildFontList robust against an invalid FontFamilyName record with no name atom

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: unknown

User impact if declined: Crashes (mostly on macOS, also some Windows), exact cause unknown but may be related to a bad font of some kind

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: No

Needs manual test from QE?: No

If yes, steps to reproduce: no known STR, just crash reports from the wild

List of other uplifts needed: recommend taking all 3 patches on this bug

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): All 3 patches here just add safe handling of unexpectedly-null/invalid objects (missing name, broken font?) so that we'll end up falling back to a default font, as if the bad entry were not present, rather than crashing.

String changes made/needed: none
Attachment #9020729 - Flags: approval-mozilla-beta?
Comment hidden (obsolete)
Comment hidden (obsolete)
Assignee

Comment 26

7 months ago
Hmm, trying to set the approval-beta? flag on the other two patches without providing additional comments -- as comment 23 was intended to cover all three -- turns out to partially fill in the approval form with some unexpected defaults. Comments 24-25 should be ignored.
Assignee: nobody → jfkthame
Target Milestone: --- → mozilla65
Comment on attachment 9014106 [details] [diff] [review]
speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided

This landed on m-c when it was still tracking 64.
Attachment #9014106 - Flags: approval-mozilla-beta?
Comment on attachment 9019760 [details] [diff] [review]
Make AddFamilyToFontList return safely if passed a null font-family

[Triage Comment]
Fixes font-related crashes on OSX and Windows. Approved for 64.0b7.
Attachment #9019760 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9020729 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify+ in order to check for Socorro data Friday in beta 8, since there are no known str.
Flags: qe-verify+
Based on crash-stats (https://bit.ly/2JXnD97), this crash did not occur on builds with the fix: Beta 64.0b7 (11.05.2018) or after and Nightly 65.0a1 builds created after 11.05.2018.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Given that bug 1497563 is showing up on ESR60 with decent volume and this is a pretty small set of patches, I think this would is worth uplift consideration. Please nominate unless you feel strongly otherwise :)
Flags: needinfo?(jfkthame)
Assignee

Comment 33

5 months ago
Yes, I think it'd be worth uplifting these -- they're all totally safe, just hardening the code so it handles unexpected error conditions safely.

(FWIW, I notice there are still some crash reports from Fx64 and 65 builds after these patches landed, so the issue hasn't entirely gone away, though it does seem to be much improved.)

Uplift request is intended to refer to all three patches (though I think the third one turned out to be the most helpful).
Flags: needinfo?(jfkthame)
Assignee

Comment 34

5 months ago
Comment on attachment 9020729 [details] [diff] [review]
Make BuildFontList robust against an invalid FontFamilyName record with no name atom

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Potential crashes when building font lists (this signature or bug 1497563)

User impact if declined: continued crashes

Fix Landed on Version: 64

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Just simple changes to safely handle unexpected error conditions

String or UUID changes made by this patch: none
Attachment #9020729 - Flags: approval-mozilla-esr60?
Comment on attachment 9020729 [details] [diff] [review]
Make BuildFontList robust against an invalid FontFamilyName record with no name atom

Low-risk fix for a crash showing up on ESR60 at moderate volumes. Approved for 60.5.0esr.
Attachment #9020729 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Assignee

Comment 38

5 months ago

It looks like we should just omit the first patch here when landing on esr60, to avoid the risk of the Rust panic. I've pushed a try run to double-check that it's happier without that:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e25ea502a0fc732890cc778e06439abf12315cf

Flags: needinfo?(jfkthame)
You need to log in before you can comment on or make changes to this bug.