Closed Bug 1493743 Opened 7 years ago Closed 7 years ago

Crash in gfxFontGroup::BuildFontList

Categories

(Core :: Layout: Text and Fonts, defect, P3)

Unspecified
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla65
Tracking Status
firefox-esr60 --- fixed
firefox63 --- wontfix
firefox64 --- verified
firefox65 --- verified

People

(Reporter: marcia, Assigned: jfkthame)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is report bp-13e170e2-018f-4751-b0fd-2f0f80180924. ============================================================= Seen while looking at crash stats. Small volume Mac crash seems to be only confined to 10.14 at the moment: https://bit.ly/2PVHsiE. Several of the users crashed more than once. There are several Windows crashes, but they are not on 64 nightly. Sorry wasn't sure if this is Graphics |Text or Texts and Fonts... Top 10 frames of crashing thread: 0 XUL gfxFontGroup::BuildFontList gfx/thebes/gfxTextRun.cpp:1889 1 XUL gfxFontGroup::gfxFontGroup gfx/thebes/gfxTextRun.cpp:1814 2 XUL gfxPlatformMac::CreateFontGroup gfx/thebes/gfxTextRun.cpp:1810 3 XUL nsFontMetrics::nsFontMetrics gfx/src/nsFontMetrics.cpp:143 4 XUL nsFontCache::GetMetricsFor gfx/src/nsFontMetrics.cpp:123 5 XUL nsLayoutUtils::GetFontMetricsForComputedStyle gfx/src/nsDeviceContext.cpp:245 6 XUL mozilla::ReflowInput::CalcLineHeight layout/generic/ReflowInput.cpp:2871 7 XUL mozilla::BlockReflowInput::BlockReflowInput layout/generic/ReflowInput.cpp:2883 8 XUL nsBlockFrame::Reflow layout/generic/BlockReflowInput.cpp:50 9 XUL nsContainerFrame::ReflowChild layout/generic/nsContainerFrame.cpp:951 =============================================================
Randell looked at the crashes in triage that weren't 64 and they seem to be a different issue. Let's keep this bug about the Mac crashes showing up in Mac nightly on 10.14.
Can you figure out who should look at this? Thanks
Flags: needinfo?(jfkthame)
I looked at this a bit to try and understand a scenario that could lead to a null-deref in here, but without much success. There are a few Windows crashes which point to https://hg.mozilla.org/mozilla-central/annotate/bae2e21a9312/gfx/thebes/gfxTextRun.cpp#l1833, which suggests the name.mName atom may be null. That shouldn't be the case when name.IsNamed() is true, but if we somehow have a default-constructed FontFamilyName, then it could happen. I think we should change the default FontFamilyName constructor to avoid that possibility, even though I haven't figured out how it would happen. The macOS crashes point to https://hg.mozilla.org/mozilla-central/annotate/856103837d4dda0e176706cc64927546459d510c/gfx/thebes/gfxTextRun.cpp#l1889, which looks more like we have a null aFamily. But I can't see where that would come from. If someone has a machine running 10.14 (Lee, Markus -- either of you have 10.14 yet?) and can reproduce this, we'd probably be able to figure out where it's coming from, but without that it's hard to see a way forward.
Flags: needinfo?(mstange)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jfkthame)
Priority: -- → P3
I think we should do this, to avoid the risk of an inconsistent state, though I don't know if it will actually help anything here.
Attachment #9014106 - Flags: review?(lsalzman)
I don't have 10.14 yet. Stephen, have you seen this crash before?
Flags: needinfo?(mstange) → needinfo?(spohl.mozilla.bugs)
Comment on attachment 9014106 [details] [diff] [review] speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided It's worth a try, at least. I don't have 10.14 unfortunately. Only 10.13.
Flags: needinfo?(lsalzman)
Attachment #9014106 - Flags: review?(lsalzman) → review+
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f34f0818f82d speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided. r=lsalzman
Let's keep this open for now, until we see whether this makes any difference; I'm not at all confident it's the real or only problem.
Keywords: leave-open
(In reply to Markus Stange [:mstange] from comment #5) > I don't have 10.14 yet. Stephen, have you seen this crash before? No, I haven't. If we find steps to reproduce I'm happy to investigate or verify a fix.
Flags: needinfo?(spohl.mozilla.bugs)
See Also: → 1497563
23 crashes/5 installations in the last week. Most of the crashes are on 10.14 Mac.
NI for another look
Flags: needinfo?(jfkthame)
The stack in reports like https://crash-stats.mozilla.com/report/index/e3949730-6a4d-4f44-9b3b-7e1700181019 makes it appear that perhaps we're getting a null gfxFontFamily pointer passed to gfxFontGroup::AddFamilyToFontList. That isn't supposed to be possible, but perhaps there's some circumstance where it happens. (We're clearly not seeing it in automation with debug builds, as there's an assertion that would be firing.) I suggest we revise the code here to be safer in the face of a null aFamily argument, and see if that helps eliminate these crashes. In the worst case, I guess this could mean we end up with an empty font list, but we've got other code that kills the process if there's no usable font available; more likely, there's some kind of broken font present, and if we skip that we'll just end up using the next in the list, or a generic fallback.
Flags: needinfo?(jfkthame)
I suggest we try this and see if some/all of these crashes stop. While I haven't tracked down exactly how a null pointer could arise here, it's clear that nothing good can come of attempting to use it.
Attachment #9019760 - Flags: review?(lsalzman)
Attachment #9019760 - Flags: review?(lsalzman) → review+
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/eff78d899e01 Make AddFamilyToFontList return safely if passed a null font-family. r=lsalzman
Let's keep the leave-open keyword for now, until we see if this makes any difference.
https://crash-stats.mozilla.com/report/index/a30ac042-1631-4328-a346-058e20181027 says we still have some kind of problem here. In this case, it looks like a null nsAtom pointer is being used; not sure how that arises, though.
I'm guessing this is the atom dereference that's crashing, though I haven't figured out how we can end up with a null pointer in the list here, and no idea how to reproduce. Anyhow, skipping the bad FontFamilyName should be safe enough, we'll just end up using a default font if no valid names are found here.
Attachment #9020729 - Flags: review?(lsalzman)
Attachment #9020729 - Flags: review?(lsalzman) → review+
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3f6bc2bf601b Make BuildFontList robust against an invalid FontFamilyName record with no name atom. r=lsalzman
I don't see any reports from trunk builds later than 20181026220839; I think we can call this fixed. There's a steady trickle of crashes on beta, though, so we should probably uplift these patches.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment on attachment 9020729 [details] [diff] [review] Make BuildFontList robust against an invalid FontFamilyName record with no name atom [Beta/Release Uplift Approval Request] Feature/Bug causing the regression: unknown User impact if declined: Crashes (mostly on macOS, also some Windows), exact cause unknown but may be related to a bad font of some kind Is this code covered by automated tests?: Yes Has the fix been verified in Nightly?: No Needs manual test from QE?: No If yes, steps to reproduce: no known STR, just crash reports from the wild List of other uplifts needed: recommend taking all 3 patches on this bug Risk to taking this patch: Low Why is the change risky/not risky? (and alternatives if risky): All 3 patches here just add safe handling of unexpectedly-null/invalid objects (missing name, broken font?) so that we'll end up falling back to a default font, as if the bad entry were not present, rather than crashing. String changes made/needed: none
Attachment #9020729 - Flags: approval-mozilla-beta?
Hmm, trying to set the approval-beta? flag on the other two patches without providing additional comments -- as comment 23 was intended to cover all three -- turns out to partially fill in the approval form with some unexpected defaults. Comments 24-25 should be ignored.
Assignee: nobody → jfkthame
Target Milestone: --- → mozilla65
Comment on attachment 9014106 [details] [diff] [review] speculative patch - Set type to None rather than Named in default FontFamilyName constructor, as no name is provided This landed on m-c when it was still tracking 64.
Attachment #9014106 - Flags: approval-mozilla-beta?
Comment on attachment 9019760 [details] [diff] [review] Make AddFamilyToFontList return safely if passed a null font-family [Triage Comment] Fixes font-related crashes on OSX and Windows. Approved for 64.0b7.
Attachment #9019760 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9020729 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify+ in order to check for Socorro data Friday in beta 8, since there are no known str.
Flags: qe-verify+
Based on crash-stats (https://bit.ly/2JXnD97), this crash did not occur on builds with the fix: Beta 64.0b7 (11.05.2018) or after and Nightly 65.0a1 builds created after 11.05.2018.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Given that bug 1497563 is showing up on ESR60 with decent volume and this is a pretty small set of patches, I think this would is worth uplift consideration. Please nominate unless you feel strongly otherwise :)
Flags: needinfo?(jfkthame)
Yes, I think it'd be worth uplifting these -- they're all totally safe, just hardening the code so it handles unexpected error conditions safely. (FWIW, I notice there are still some crash reports from Fx64 and 65 builds after these patches landed, so the issue hasn't entirely gone away, though it does seem to be much improved.) Uplift request is intended to refer to all three patches (though I think the third one turned out to be the most helpful).
Flags: needinfo?(jfkthame)
Comment on attachment 9020729 [details] [diff] [review] Make BuildFontList robust against an invalid FontFamilyName record with no name atom [ESR Uplift Approval Request] If this is not a sec:{high,crit} bug, please state case for ESR consideration: Potential crashes when building font lists (this signature or bug 1497563) User impact if declined: continued crashes Fix Landed on Version: 64 Risk to taking this patch: Low Why is the change risky/not risky? (and alternatives if risky): Just simple changes to safely handle unexpected error conditions String or UUID changes made by this patch: none
Attachment #9020729 - Flags: approval-mozilla-esr60?
Comment on attachment 9020729 [details] [diff] [review] Make BuildFontList robust against an invalid FontFamilyName record with no name atom Low-risk fix for a crash showing up on ESR60 at moderate volumes. Approved for 60.5.0esr.
Attachment #9020729 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+

It looks like we should just omit the first patch here when landing on esr60, to avoid the risk of the Rust panic. I've pushed a try run to double-check that it's happier without that:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e25ea502a0fc732890cc778e06439abf12315cf

Flags: needinfo?(jfkthame)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: