Closed Bug 1256678 Opened 8 years ago Closed 8 years ago

crash in mozilla::gfx::DrawTargetCairo::FillGlyphs

Categories

(Core :: Graphics: Text, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 - unaffected
firefox48 + fixed

People

(Reporter: dbaron, Assigned: eflores)

References

Details

(4 keywords, Whiteboard: gfx-noted)

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-ce8c4849-c370-4455-9884-c92872160315.
=============================================================

After bug 1247700 there are still a decent number of crashes in mozilla::gfx::DrawTargetCairo::FillGlyphs on nightly.  Filing this bug to investigate.

Query for crashes on nightly since bug 1247700:
https://crash-stats.mozilla.com/signature/?build_id=%3E%3D20160225000000&product=Firefox&release_channel=nightly&date=%3E%3D2016-02-25&signature=mozilla%3A%3Agfx%3A%3ADrawTargetCairo%3A%3AFillGlyphs#aggregations
Flags: needinfo?(milan)
In bug 1247700, a change was made so that rather than crashing on null derefing a font, we now do an explicit gfxDevCrash on null fonts. That still leaves a crash without necessarily addressing the underlying cause.

Do we have any test-cases or an STR for this so we can do a more in-depth fix? From the stack trace, it looks like this may be related to printing, so perhaps that is a good place to try to reproduce?
Flags: needinfo?(anthony.s.hughes)
Keywords: testcase-wanted
Based on bug 1247700 comment 6 I expect the Nightly/Aurora crashes we're seeing are expected? I wouldn't even know where to begin trying to find STR for this as there are no strong correlations to any particular environment setup or URLs. We don't even have any comments or emails we can use for outreach.

For what it's worth this is actually the #1 issue in Aurora and #11 issue in Nightly so I feel like we should do something, I just don't know what we can do.
Flags: needinfo?(anthony.s.hughes)
[Tracking Requested - why for this release]: topcrash in Firefox 47+ with no actionable leads.
I looked more at the crash data and it looks like we're dealing with <30 users on Aurora hitting this (~0.03%), so it would appear to be one of those crashes where a small number of users are crashing multiple times. I've also not seen mention of printing (crash relate or otherwise) on our feedback channels. 

I cannot recommend root cause analysis for a crash that affects that small a population. My recommendation is to wait and see what this looks like in 47.0b1.
Whiteboard: gfx-noted
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #4)
> I looked more at the crash data and it looks like we're dealing with <30
> users on Aurora hitting this (~0.03%), so it would appear to be one of those
> crashes where a small number of users are crashing multiple times. I've also
> not seen mention of printing (crash relate or otherwise) on our feedback
> channels. 
> 
> I cannot recommend root cause analysis for a crash that affects that small a
> population. My recommendation is to wait and see what this looks like in
> 47.0b1.

It should also be noted for those not familiar with this type of crash that this will -not- be a crash in actual releases.
Edwin, it's probably worth spending some time on this to see if we can sort out the underlying issue, and either decide we shouldn't crash in night/aurora, because this is a *normal* thing, or figure out the fix.  See also (for a tiny bit more info) bug 1247700
Assignee: nobody → edwin
Flags: needinfo?(milan)
This looks like fallout from bug 1156742.

The null deref comes from PrintTranslator::LookupScaledFont() on the compositor process [1], which in turn comes from DrawTargetRecording on the content side failing to record the font data [2].

Looking at the code, this only happens when ScaledFont::GetFontFileData() failed. Particularly suspect is ScaledFontWin::GetFontFileData() as there are a few more failure paths there that are easier to hit than in the other implementations.

[1] https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/layout/printing/PrintTranslator.h#76
[2] https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/gfx/2d/DrawTargetRecording.cpp#384
[3] https://dxr.mozilla.org/mozilla-central/rev/55d557f4d73ee58664bdf2fa85aaab555224722e/gfx/2d/ScaledFontWin.cpp#30
Attached patch Crashy fun timesSplinter Review
Crash more things!

AFAICT this patch should crash all paths that could be causing this, and shouldn't be hit by any other code. Doing a try run to make sure.
Attachment #8738180 - Flags: review?(bobowen.code)
Comment on attachment 8738180 [details] [diff] [review]
Crashy fun times

I'm not a peer for this, sorry.
Attachment #8738180 - Flags: review?(bobowen.code)
Since the above is a diagnostic patch, leave open.
Keywords: leave-open
Depends on: 1263493
The crash reports show two sources of failure.

One is from GetFontData() [1] -- jfkthame gave a hint that this call doesn't work for non-sfnt data, but we still support old, non-sfnt font formats. Writing a patch for this case now to serialise the LOGFONTW structure and resolve the font from the system on the other side (we can assume any non-sfnt fonts are not web fonts, and are therefore provided by the system).

The other is from SFNTData::GetIndexForU16Name() [2]. This one is less clear, but it seems like the font should be present in the collection if we've made it this far. Two things I can see potentially going wrong are that: (a) we read the whole font name from the `name' table and compare it against LOGFONTW::lfFaceName, but the latter is truncated to 32 characters [3] so long font names might not match; and (b) if the full name doesn't exist we make one up by concatenating the partial names [4], which I'm not certain is kosher.

[1] https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/gfx/2d/ScaledFontWin.cpp#43
[2] https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/gfx/2d/ScaledFontWin.cpp#71
[3] https://msdn.microsoft.com/en-us/library/aa741231%28v=vs.85%29.aspx
[4] https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/gfx/2d/SFNTNameTable.cpp#206
Crash Signature: [@ mozilla::gfx::DrawTargetCairo::FillGlyphs] → [@ mozilla::gfx::DrawTargetCairo::FillGlyphs] [@ mozilla::gfx::ScaledFontWin::GetFontFileData]
(In reply to Edwin Flores [:eflores] [:edwin] from comment #14)
> The other is from SFNTData::GetIndexForU16Name() [2]. This one is less
> clear, but it seems like the font should be present in the collection if
> we've made it this far. Two things I can see potentially going wrong are
> that: (a) we read the whole font name from the `name' table and compare it
> against LOGFONTW::lfFaceName, but the latter is truncated to 32 characters
> [3] so long font names might not match; and (b) if the full name doesn't
> exist we make one up by concatenating the partial names [4], which I'm not
> certain is kosher.

Yes, both (a) and (b) here sound like plausible sources of failure. (If we made up a full-name by concatenation because one didn't exist, then we're unlikely to subsequently find it in the table!)
(In reply to Edwin Flores [:eflores] [:edwin] from comment #14)
> The crash reports show two sources of failure.
> 
> One is from GetFontData() [1] -- jfkthame gave a hint that this call doesn't
> work for non-sfnt data, but we still support old, non-sfnt font formats.
> Writing a patch for this case now to serialise the LOGFONTW structure and
> resolve the font from the system on the other side (we can assume any
> non-sfnt fonts are not web fonts, and are therefore provided by the system).

Ah right that makes sense, thanks for fixing this.

> The other is from SFNTData::GetIndexForU16Name() [2]. This one is less
> clear, but it seems like the font should be present in the collection if
> we've made it this far. Two things I can see potentially going wrong are
> that: (a) we read the whole font name from the `name' table and compare it
> against LOGFONTW::lfFaceName, but the latter is truncated to 32 characters
> [3] so long font names might not match; and (b) if the full name doesn't
> exist we make one up by concatenating the partial names [4], which I'm not
> certain is kosher.

(a) seems like a probable cause here. Perhaps we just need to add a max match length parameter or something.

for (b) this was added to match the code in thebes [1], which I had to port to Moz2D because apparently we're not allowed to call thebes code from there.
As this is only done when we can't find a proper full name, I don't see how it could cause extra failures for the call to SFNTData::GetIndexForU16Name. If the name in the LOGFONT was somehow created using the existing thebes code then it would match, but maybe that can't happen.

[1] https://dxr.mozilla.org/mozilla-central/rev/21bf1af375c1fa8565ae3bb2e89bd1a0809363d4/gfx/thebes/gfxFontUtils.cpp#1151
(In reply to Bob Owen (:bobowen) from comment #17)
> for (b) this was added to match the code in thebes [1], which I had to port
> to Moz2D because apparently we're not allowed to call thebes code from there.
> As this is only done when we can't find a proper full name, I don't see how
> it could cause extra failures for the call to SFNTData::GetIndexForU16Name.
> If the name in the LOGFONT was somehow created using the existing thebes
> code then it would match, but maybe that can't happen.

I think (IIRC) that the "full name" is only used in thebes to provide a name for the fontEntry object for debugging and display purposes (e.g. in the font inspector), so synthesizing a "full name" from family + style if it's not present in the name table is a reasonable thing to do. I'm not sure if such a synthetic name will ever show up in a LOGFONT -- I'd need to re-read the GDI font code carefully to see if that can happen, but it sounds fishy.
Will be backing out the original gfxDevCrash. This patch adds it back in for one specific case and adds the font name.
Attachment #8741856 - Flags: review?(jfkthame)
Comment on attachment 8741854 [details] [diff] [review]
1256678-logfont.patch

Review of attachment 8741854 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable to me (modulo a couple notes below), but I don't really know anything about the moz2d code, recording, etc, so I think someone like Bas should probably have a look.

::: gfx/2d/2D.h
@@ +644,5 @@
>    MOZ_DECLARE_REFCOUNTED_VIRTUAL_TYPENAME(ScaledFont)
>    virtual ~ScaledFont() {}
>  
>    typedef void (*FontFileDataOutput)(const uint8_t *aData, uint32_t aLength, uint32_t aIndex, Float aGlyphSize, void *aBaton);
> +  typedef void (*FontDescriptorOutput)(const uint8_t *aData, uint32_t aLength, Float aGlyphSize, void *aBaton);

I'd prefer this to be called aFontSize rather than aGlyphSize (here and throughout the patch); it's about the size at which the font as a whole is being used.

::: gfx/2d/DrawTargetRecording.cpp
@@ +407,5 @@
> +      // the system on the other side.
> +      RecordedFontDescriptor fontDesc(aFont);
> +      if (fontDesc.IsValid()) {
> +        mRecorder->RecordEvent(fontDesc);
> +      }

What if !fontDesc.IsValid()? Seems like in that case, nothing gets recorded, so presumably the results will be garbage. Should we warn/assert/curl up and die/...?
Attachment #8741854 - Flags: review?(jfkthame) → review?(bas)
Attachment #8741854 - Flags: review?(bas) → review+
Comment on attachment 8741855 [details] [diff] [review]
1256678-getindex-truncate-name.patch

Review of attachment 8741855 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/SFNTData.cpp
@@ +217,5 @@
> +    // then only compare the first 31 characters.
> +    size_t cmpLen = aU16FullName.length() == LF_FACESIZE - 1 ?
> +      LF_FACESIZE - 1 : std::max(aU16FullName.length(), name.length());
> +
> +    if (!name.compare(0, cmpLen, aU16FullName, 0, cmpLen)) {

Do you need the trailing 0 and cmpLen arguments? IIUC, you should be able to just do

   name.compare(0, cmpLen, aU16FullName)

here.

Though on second thoughts, instead of the rather confusing computation of |cmpLen|, would it be more transparent to simply truncate |name| to LF_FACESIZE-1 characters?
Attachment #8741856 - Flags: review?(jfkthame) → review+
Attachment #8741855 - Attachment is obsolete: true
Attachment #8741855 - Flags: review?(jfkthame)
Attachment #8743797 - Flags: review?(jfkthame)
Comment on attachment 8743797 [details] [diff] [review]
1256678-getindex-truncate-name.patch

Review of attachment 8743797 [details] [diff] [review]:
-----------------------------------------------------------------

Umm, just a minute... what if we're dealing with DirectWrite fonts, not GDI? In that case, will we find ourselves calling this method with names that don't come from a LOGFONT and hence aren't subject to the 31-char limit? Truncating the comparison will be wrong in that situation, and could result in using the wrong face, IIUC.
Yes, that would be bad. Fun, but bad.

This makes it more explicit.
Attachment #8743797 - Attachment is obsolete: true
Attachment #8743797 - Flags: review?(jfkthame)
Attachment #8743842 - Flags: review?(jfkthame)
Comment on attachment 8743842 [details] [diff] [review]
1256678-getindex-truncate-name.patch

Review of attachment 8743842 [details] [diff] [review]:
-----------------------------------------------------------------

There are moments when I really dislike some of the legacy APIs we have to deal with.

::: gfx/2d/SFNTData.h
@@ +72,5 @@
>     * @param aIndex out param for the index if found.
>     * @return true if the full name is successfully read.
>     */
> +  bool GetIndexForU16Name(const mozilla::u16string& aU16FullName, uint32_t* aIndex,
> +                          size_t aTruncatedLen = 0);

As we've got @param comments for this method (yay!), you should really add a mention of what the new param does, too.
Attachment #8743842 - Flags: review?(jfkthame) → review+
Hi Edwin, I did not see too many instances of this crash on 47. Do you think this fix is safe and stable to uplift to Beta47?
Flags: needinfo?(edwin)
Hi Ritu,

This bug is in a feature currently enabled only on Nightly. Uplift won't be necessary.
Flags: needinfo?(edwin)
(In reply to Edwin Flores [:eflores] [:edwin] from comment #31)
> Hi Ritu,
> 
> This bug is in a feature currently enabled only on Nightly. Uplift won't be
> necessary.

Got it. Thanks! Fx47 is unaffected that means.
You need to log in before you can comment on or make changes to this bug.