Closed Bug 1249600 Opened 4 years ago Closed 3 years ago

crash in get_style

Categories

(Core :: Canvas: 2D, defect, critical)

47 Branch
x86_64
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 + fixed
firefox47 + fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: jujjyl, Assigned: mchang)

References

Details

(Keywords: crash, Whiteboard: gfx-noted)

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-37b753a5-f081-4bf5-a889-b37432160219.
=============================================================

STR: Visit https://dl.dropboxusercontent.com/u/40949268/emcc/ShadowMap_novsync/ShadowMap_cpuprofiler.html

The page will crash on load. Tested on Win 10/64-bit, but not sure if specific to that.

More:

https://crash-stats.mozilla.com/report/index/e4f2260a-63da-492a-aff2-53b2c2160219
https://crash-stats.mozilla.com/report/index/d9831092-ffaa-4c48-a56f-aa3ee2160219
Bisected down to a regression with range

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9523cacef27678e8d3323824240240590f44c0d0&tochange=d8b44f0f253625e6428589508ea2821522e49bb8

referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1247775.

Bas, could you take a peek if you can reproduce the crash?
Flags: needinfo?(bas)
I'm basically perma-crashing with this. Just browsing the web, I'm crashing every couple of minutes. We're falling back to Skia canvas somehow even though my about:support says I should be using d2d 1.1.
Whiteboard: gfx-noted
Hmm that's also odd. My canvas backend pref was "skia" but my about:support said d2d 1.1. Resetting my preferences and will report back to see if I still crash.
Interestingly, I can only reproduce the crash on the test page from comment 0 with e10s enabled.
Seconded, also notice that if I run the STR test page in a non-e10s window, it does not crash. about:support Graphics section for me gives

Graphics
Adapter Description	NVIDIA GeForce GTX 980 Ti
Adapter Drivers	nvd3dumx,nvwgf2umx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um,nvwgf2um
Adapter RAM	4095
Asynchronous Pan/Zoom	wheel input enabled; touch input enabled
Device ID	0x17c8
Direct2D Enabled	true
DirectWrite Enabled	true (10.0.10586.0)
Driver Date	2-8-2016
Driver Version	10.18.13.6191
GPU #2 Active	false
GPU Accelerated Windows	1/1 Direct3D 11 (OMTC)
Subsys ID	00000000
Supports Hardware H264 Decoding	Yes
Vendor ID	0x10de
WebGL Renderer	Google Inc. -- ANGLE (NVIDIA GeForce GTX 980 Ti Direct3D11 vs_5_0 ps_5_0)
windowLayerManagerRemote	true
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
AzureSkiaAccelerated	0
(#0) 	CP+[GFX1-]: Failed to create a content image bridge device: 0x80004005
(#1) 	CP+[GFX1-]: Failed to create a D3D11 content device: 0x80004005
(#2) 	CP+[GFX1-]: Failed to create a content image bridge device: 0x80004005
(#3) 	CP+[GFX1-]: Failed to create a D3D11 content device: 0x80004005
We're actually crashing in Skia because by default, we have Cairo and Skia as the default canvas backends. When we're creating fonts, we are creating them with Cairo here [1] in the content process. The fonts are created without some of the data we expect with Skia. Then we create a Skia canvas backend and try to render the glyph, causing the crash. I'm still not sure why even with a d2d1.1 backend pref, we're creating a skia backend. 
 
[1] https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxDWriteFonts.cpp?case=true&from=gfxDWriteFonts.cpp#699
So even if I explicitly set the canvas backend to Skia, about:support still shows direct2d 1.1. I think deleting the d2d 1.0 backend might have messed up with the backend selection pref, especially with e10s. Can you please take a look at this Bas?
Depends on: 1245309
Did you manage to find out why the content device creation is failing Mason?
Flags: needinfo?(bas)
Flags: needinfo?(mchang)
Failing due to bug 1245309.
Flags: needinfo?(mchang)
Assignee: nobody → mchang
We'd get into this crash because we had different backup backends for content and canvas. Content without d2d would use cairo and canvas would use Skia. We'd create some fonts with Cairo, which only contains font face information. Then we'd use the same font with Skia when visiting a Canvas page, but Skia expects to have the font, font family, and font face, not only the font face which is what cairo expects.

This patch looks up the font and font family from the factory system font collection when requesting a SkTypeface if it doesn't already have the data.

Another option, is to unify how we create DWrite fonts to ALWAYS include the font, font family, and font face data in ScaledFontDWrite. This is what happens if we actually use Skia for both content/client today. I voted against this atm since all NativeFont data for the other platforms only use the font face, and we'd change [1] to be a bit different for only dwrite fonts. I'd be happy to do this once we get rid of all the other backends other than D2D 1.1 / Skia, or if you have a strong opinion to do it now.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/2d/Factory.cpp?from=Factory.cpp#500
Attachment #8722279 - Flags: review?(bas)
Attachment #8722279 - Attachment is obsolete: true
Attachment #8722279 - Flags: review?(bas)
Attachment #8722281 - Flags: review?(bas)
Is the site in comment 0 still crashing for you? It should be fixed now. Thanks!
Flags: needinfo?(jujjyl)
Currently traveling, so don't have access to the original hardware to reproduce. Testing on a Macbook Pro that I have, I get a crash at

https://crash-stats.mozilla.com/report/index/262eec6e-34eb-4c05-a2ef-2cc0a2160229

but it doesn't seem to reproduce deterministically, and after the initial crash, managed to load the site up three times without crashing.
Flags: needinfo?(jujjyl)
(In reply to Jukka Jylänki from comment #13)
> Currently traveling, so don't have access to the original hardware to
> reproduce. Testing on a Macbook Pro that I have, I get a crash at
> 
> https://crash-stats.mozilla.com/report/index/262eec6e-34eb-4c05-a2ef-
> 2cc0a2160229
> 
> but it doesn't seem to reproduce deterministically, and after the initial
> crash, managed to load the site up three times without crashing.

Thanks! That crash looks like something else, resolving this bug.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1250669
Looks like this can still happen if we correctly initialize D2d, get a TDR, already initialized DWRite fonts, and can't reinit the content device. Can we get this patch reviewed?
Status: RESOLVED → REOPENED
Flags: needinfo?(bas)
Resolution: DUPLICATE → ---
Marking affected for 46, because of the comments and uplift request in bug 1250669.
Blocks: 1253342
Comment on attachment 8722281 [details] [diff] [review]
Lookup font and font family from font face when requesting SkTypeface

Review of attachment 8722281 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/ScaledFontDWrite.cpp
@@ +120,5 @@
> +// but Skia canvas backend.
> +void
> +ScaledFontDWrite::GetFontDataFromSystemFonts(IDWriteFactory* aFactory)
> +{
> +  MOZ_ASSERT(mFontFace);

Please assign the results to an HRESULT here and log the error codes.
Attachment #8722281 - Flags: review?(bas) → review+
Flags: needinfo?(bas)
Updated with Bas' feedback.
Attachment #8722281 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/cd9b96b76188
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8727537 [details] [diff] [review]
Lookup font and font family from font face when requesting SkTypeface

Approval Request Comment
[Feature/regressing bug #]: Skia support for DWrite fonts
[User impact if declined]: We get this crash
[Describe test coverage new/current, TreeHerder]: Manual testing
[Risks and why]: Low, this only happens if a user initially has gpu acceleration working then fallsback to software.
[String/UUID change made/needed]: None

Since bug 1250669 didn't fully resolve the crash, this one seems to have on gecko 48 since it landed. Requesting uplift.
Attachment #8727537 - Flags: approval-mozilla-beta?
Attachment #8727537 - Flags: approval-mozilla-aurora?
Hi Mason, Milan: I was reviewing the crash reports for this one: https://crash-stats.mozilla.com/report/list?signature=get_style#tab-reports and noticed that there was a report on Fx48 on build 20160312030405. This was the only occurrence on 48. I still think uplifting this will improve the situation. Do you agree? Or should we investigate the crash report from 03-12 build to see if we missed something? Please let me know.
Flags: needinfo?(milan)
Flags: needinfo?(mchang)
(In reply to Ritu Kothari (:ritu) from comment #23)
> Hi Mason, Milan: I was reviewing the crash reports for this one:
> https://crash-stats.mozilla.com/report/list?signature=get_style#tab-reports
> and noticed that there was a report on Fx48 on build 20160312030405. This
> was the only occurrence on 48. I still think uplifting this will improve the
> situation. Do you agree? Or should we investigate the crash report from
> 03-12 build to see if we missed something? Please let me know.

I agree, I think uplifting this will fix many of the crashes in 46/47 we're seeing on crash-stats.
Flags: needinfo?(mchang)
Comment on attachment 8727537 [details] [diff] [review]
Lookup font and font family from font face when requesting SkTypeface

Crash fix, taking it in Aurora47 and Beta46.
Attachment #8727537 - Flags: approval-mozilla-beta?
Attachment #8727537 - Flags: approval-mozilla-beta+
Attachment #8727537 - Flags: approval-mozilla-aurora?
Attachment #8727537 - Flags: approval-mozilla-aurora+
has problems uplifting to beta:

Tomcats-MacBook-Pro-2:mozilla-central Tomcat$ hg graft --edit -r cd661d54ee73
grafting 334824:cd661d54ee73 "Bug 1249600 - Lookup font and font family from font face when requesting SkTypeface. r=bas r=ritu"
merging gfx/2d/ScaledFontDWrite.cpp
warning: conflicts while merging gfx/2d/ScaledFontDWrite.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)
Flags: needinfo?(mchang)
Attached patch Patch for betaSplinter Review
Rebase for beta.
Flags: needinfo?(milan)
Flags: needinfo?(mchang)
Flags: needinfo?(cbook)
Setting the flag for manual verification, instructions available in Comment 0.
Flags: qe-verify+
Unable to reproduce this crash with 47.0a1 (from 2016-02-19), under Windows 10 64-bit and Mac OS X 10.11.1; after changing the prefs via about:config, I still get d2d 1.1 for canvas:
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	skia
Any ideas how could I reproduce the crash in order to properly confirm the fix?

Although, via Socorro there are some crashes after this fix was pushed [1] and at a first glance, those doesn't seem related. Mason, what's your input on this matter? Thanks in advance!

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&signature=get_style&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
Flags: needinfo?(mchang)
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #31)
> Unable to reproduce this crash with 47.0a1 (from 2016-02-19), under Windows
> 10 64-bit and Mac OS X 10.11.1; after changing the prefs via about:config, I
> still get d2d 1.1 for canvas:
> AzureCanvasBackend	direct2d 1.1
> AzureContentBackend	skia
> Any ideas how could I reproduce the crash in order to properly confirm the
> fix?

You can also set the preference "gfx.direct2d.disabled" to true, set the "gfx.canvas.azure.backends" preference to "skia" and set the "gfx.content.azure.backends" preference to "cairo". 

> 
> Although, via Socorro there are some crashes after this fix was pushed [1]
> and at a first glance, those doesn't seem related. Mason, what's your input
> on this matter? Thanks in advance!

Some of those crash reports look like they've had quite a bit of uptime. A lot of them also have device resets or failure to allocate some textures. I wonder if we originally had d2d and dwrite fonts, then we had a device reset, and can no longer access dwrite fonts and have to use GDI. I'll take a look.
Status: RESOLVED → REOPENED
Flags: needinfo?(mchang)
Resolution: FIXED → ---
This crash is happening because we somehow can't get an IDwriteFont at [1]. From the crash reports, it mostly seems to happen after a TDR as there are already lots of gfx errors in the metadata as well as multiple device resets. Since we never fallback to GDI if we initially had D2D, we're still trying to use dwrite fonts but can't get the right font for some reason.

This patch is more aggressive in handling this error. If any part of the chain to get a font from the system font fails, we just return null and don't draw anything. We also try to use the default Arial font from the system fonts just in case we can't find a font with the given font face.

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/0b8492c110be/gfx/2d/ScaledFontDWrite.cpp#l154
Attachment #8754596 - Flags: review?(bas)
Comment on attachment 8754596 [details] [diff] [review]
Draw nothing if we can't get an SkTypeface

Review of attachment 8754596 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/ScaledFontDWrite.cpp
@@ +170,5 @@
>    }
>  
>    hr = mFont->GetFontFamily(getter_AddRefs(mFontFamily));
>    if (FAILED(hr)) {
> +    gfxCriticalNote << "Failed to get font family from font face. Code: " << hexa(hr);

It's not clear to me why we wouldn't default to Arial here as well?
Attachment #8754596 - Flags: review?(bas) → review+
Try looks good - https://treeherder.mozilla.org/#/jobs?repo=try&revision=5084eb63790e

(In reply to Bas Schouten (:bas.schouten) from comment #34)
> Comment on attachment 8754596 [details] [diff] [review]
> Draw nothing if we can't get an SkTypeface
> 
> Review of attachment 8754596 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/2d/ScaledFontDWrite.cpp
> @@ +170,5 @@
> >    }
> >  
> >    hr = mFont->GetFontFamily(getter_AddRefs(mFontFamily));
> >    if (FAILED(hr)) {
> > +    gfxCriticalNote << "Failed to get font family from font face. Code: " << hexa(hr);
> 
> It's not clear to me why we wouldn't default to Arial here as well?

Fixed to do that.
Comment on attachment 8754596 [details] [diff] [review]
Draw nothing if we can't get an SkTypeface

Approval Request Comment
[Feature/regressing bug #]: Bug 842894, dwrite font support in skia
[User impact if declined]: Users can crash after a TDR and after they fail to recover d2d.
[Describe test coverage new/current, TreeHerder]: Manual testing
[Risks and why]: Low, this code is only used after a device reset and the user cannot recover d2d, then visit a canvas page. 
[String/UUID change made/needed]: None
Attachment #8754596 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/be999dda2b73
Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → FIXED
Comment on attachment 8754596 [details] [diff] [review]
Draw nothing if we can't get an SkTypeface

Next time, please create a new bug to follow up. Not doing it increases the risk that sheriff or release management miss the uplift.

By the way, are you sure we don't want to uplift that in beta too?
Flags: needinfo?(mchang)
Attachment #8754596 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Target Milestone: mozilla48 → mozilla49
(In reply to Sylvestre Ledru [:sylvestre] from comment #39)
> Comment on attachment 8754596 [details] [diff] [review]
> Draw nothing if we can't get an SkTypeface
> 
> Next time, please create a new bug to follow up. Not doing it increases the
> risk that sheriff or release management miss the uplift.
> 
> By the way, are you sure we don't want to uplift that in beta too?

We can, I just want to make sure it fixes some of the crashes in 48/47 first.
See comment 38. Rebased for beta. I haven't seen any crashes on 48 or 49 since we originally landed this, so requesting uplift for 47.
Flags: needinfo?(mchang)
Attachment #8758770 - Flags: review+
Attachment #8758770 - Flags: approval-mozilla-beta?
Hi Milan, RC1 was pushed to Beta channel this morning. I have no drivers for RC2 at this point in time, may be one ride-along fix. Do you think uplifting this to Fx47 is a must at this stage without disrupting existing quality/schedule? Please let me know.
Flags: needinfo?(milan)
This is a tough one.  We've identified the regression range (see comment 1) down to this showing up in 47.  It means we'd ship it for the whole release before giving people the fix in 48.  From that point of view, I'd like to see it in 47.  No need to ship a crash we know about, and we have fixed.

On the other hand, I'm somewhat confused by the matching crashes in 46, which suggests we had the problem before, and just increased the frequency enough to notice it.

Either way, I'd feel better if we put together the RC2 with this fix in it.
Flags: needinfo?(milan)
Thanks Milan! I wish we had included this in RC1 (I think we just missed it) because the fix was ready last week. In any case, I will go with your judgement call here and plan for an RC2 gtb tomorrow.
Comment on attachment 8758770 [details] [diff] [review]
Draw Nothing if we can't get an SkTypeface for beta

Milan suggested I do an RC2 with this fix as this is a medium volume issue in Fx47 but high volume on Fx46. The fix was verified on Nightly49/Aurora48. The crash data with fix looks very promising. Release47+
Attachment #8758770 - Flags: approval-mozilla-beta? → approval-mozilla-release+
I'm gonna go ahead and remove the qe-verify+ flag from this bug, here's why:
- we were unable to reproduce the crash in the past (see Comment 31) and my own attempt today has also failed
- according to socorro, there were no recent crashes on 47.0 or 48.0b2 for that matter, which is very reassuring

If there's anyone that managed to reproduce the crash in the first place and is willing to confirm the patch on a fixed build [1], I'd really appreciate it!


[1] http://archive.mozilla.org/pub/firefox/candidates/48.0b3-candidates/build1/
Flags: qe-verify+ → qe-verify-
You need to log in before you can comment on or make changes to this bug.