crash in SkFontMgr_DirectWrite::SkFontMgr_DirectWrite(IDWriteFactory*, IDWriteFontCollection*, wchar_t*, int)

RESOLVED FIXED in Firefox 42

Status

()

Core
Graphics: Text
--
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: aklotz, Assigned: dmajor)

Tracking

({crash, topcrash-win})

Trunk
mozilla44
x86
Windows NT
crash, topcrash-win
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 wontfix, firefox42+ fixed, firefox43+ fixed, firefox44 fixed)

Details

(Whiteboard: gfx-noted, crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-2c594b93-4ca0-4c98-ba6b-e6baf2150330.
=============================================================
Mentioned by a reddit user. See
http://www.reddit.com/r/firefox/comments/30tewy/strange_crash_always_happens_when_visiting_a/
From the reddit post, it looks like it might be a driver issue. Let's see if this still comes up after this user updates their drivers.
Whiteboard: gfx-noted

Comment 2

3 years ago
Crash on http://www.mmoga.com/Diablo-2/Diablo-2-Keys/Diablo-2-and-Lord-of-Destruction-Key.html
Crash report https://crash-stats.mozilla.com/report/index/b0aa2d51-3745-496f-a009-975bf2150417

Comment 3

3 years ago
https://crash-stats.mozilla.com/report/index/ccadcc4a-46d6-49ff-9661-33e9a2150507

I have the latest nvidia drivers for my GPU and got this crash just now. My Intel GPU drivers are quite old however, and I believe Firefox was using the Intel GPU.

I will update my drivers after work and see if I get this crash again.

Comment 4

3 years ago
Have been seeing this on several different machines on Google Maps Street View.  I have HWA disabled (Firefox disables it automatically anyway), but hardware fonts enabled via gfx.font_rendering.directwrite.enabled.  Disabling hardware fonts prevents the crash but makes websites look terrible.

Crash 100% reproducable by visiting Google Maps and attempting to go into Street View.  On other browsers I notice that Street View now shows street names projected on the ground.  I do not recall seeing that feature at all prior to the crashing, so I suspect that's the cause.

https://crash-stats.mozilla.com/report/index/ae42bbdb-26fc-435a-97ae-61e0e2150519

Comment 5

2 years ago
I've been receiving reports from customers that text operations were not working on the website, so I decided to investigate. And to my surprise only 1 of the machines in the office was able to replicate the issue.

While investigating the issue I attempted to load https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Drawing_text and this caused the browser to crash. Upon submitting the crash report (https://crash-stats.mozilla.com/report/index/e284dc1a-dc6c-4002-91de-90e852150629) all canvas text operations seem to be working flawlessly.

Updated

2 years ago
Duplicate of this bug: 1185017

Updated

2 years ago
Assignee: nobody → jdaggett

Comment 7

2 years ago
I haven't experienced this issue for a long time and can visit all of the links in prior comments without Firefox crashing. I still have DirectWrite and HW accel on.

I updated my Intel GPU drivers to the latest version, which I believe fixed the problem in my case.

Comment 8

2 years ago
Breakdown of dwrite.dll versions for recent beta crashreports:

  15 6.1.7600.16385
   7 6.1.7600.16699
   3 6.1.7600.16972
   1 6.1.7600.20602
  11 6.1.7601.17514
   2 6.1.7601.18834
   1 6.2.9200.16384
  17 6.2.9200.16492
  11 6.2.9200.16571
  15 6.2.9200.17292
  68 6.2.9200.17461
  12 6.3.9600.16384
   1 6.3.9600.17111
   1 6.3.9600.17415
   2 6.3.9600.17795
   2 6.3.9600.17999

Whatever is going on here is basically spread evenly across all versions of DirectWrite.

Comment 9

2 years ago
This is canvas text related, based both on the codepath and on the comments here.

Examples:

https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Drawing_text
any Google street view

Comment 4 suggests this may occur in D2D disabled, DirectWrite manually enabled cases.

Updated

2 years ago
Keywords: testcase-wanted

Comment 10

2 years ago
Created attachment 8649641 [details]
graphics adapter info for beta crashreports

Counts of the AdapterVendorId info from beta crashreports (41.0b). Doesn't seem to be specific to a single given driver/vendor. 0x8086 is Intel?

Comment 11

2 years ago
For recent beta crashreports, this is #39.
Keywords: topcrash-win

Comment 12

2 years ago
(In reply to John Daggett (:jtd) from comment #10)
> 0x8086 is Intel?

Can confirm 0x8086 is Intel, checked my laptop's About:Support page.
(Assignee)

Comment 13

2 years ago
This is showing up in rather high volume on 42b1.

There's definitely some corruption going on; one of the parameters on the stack (I think |factory| but I'm not certain) has been clobbered and replaced by a value derived form the user's locale setting.

For example, en-US users tend to crash on address 0x006e0065 ('en'), and es-ES users tend to crash on 0x00730065 ('es'), etc.

The majority of crash reports have Intel driver 8.15.10.2697. John, is there anything we can do in terms of blocklisting etc. to avoid whatever codepath causes this corruption?
Flags: needinfo?(jdaggett)
(Assignee)

Comment 14

2 years ago
[Tracking Requested - why for this release]: This is near the top of the crash scores on 42b1
status-firefox41: --- → wontfix
status-firefox42: --- → affected
status-firefox43: --- → affected
tracking-firefox43: --- → ?
(Assignee)

Comment 15

2 years ago
Oh you know what - I didn't even realize that there's a |localeName| parameter involved in that constructor; that could be relevant!

Sounds like there's a level of indirection missing? The code was expecting a pointer to that string, but got the raw chars reinterpreted as a pointer?
(Assignee)

Comment 16

2 years ago
> one of the parameters on the
> stack (I think |factory| but I'm not certain) has been clobbered
I was off by one; the bogus pointer value is actually where |fontCollection| was supposed to be. 

Here's a stack from a crash dump, showing the call between SkFontMgr_New_DirectWrite and SkFontMgr_DirectWrite::SkFontMgr_DirectWrite:

002e68dc  601e1ecd xul!SkFontMgr_New_DirectWrite+0x82 <-- return address
002e68e0  001c8b98 <-- factory
002e68e4  006e0065 <-- fontCollection
002e68e8  002e6908 <-- localeName
002e68ec  00000006 <-- localeNameLen
002e68f0  002e6be8
002e68f4  00000000
002e68f8  0083f020
002e68fc  774c03f3 
002e6900  006e0065 <-- 'e' 'n'
002e6904  0055002d <-- '-' 'U'
002e6908  00000053 <-- 'S' '\0'

(Note, no 'this' pointer because it's passed in the ecx register)

The |localeName| parameter actually points to 2e6908, which is the string "S", not "en-US"! Looking at the disassembly, I want to say that 0x2e6900 was actually intended to be the storage space for the caller's |sysFontCollection| variable.

My theory is: |localeNameStorage| was always meant to be 0x2e6908, and we did getUserDefaultLocaleNameProc(0x2e6908), but the function actually wrote to 0x2e6900 instead, clobbering the |sysFontCollection| variable.
(Assignee)

Comment 17

2 years ago
Another possibility is that |localeNameStorage| was (still) meant to be 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900).

How can I make this function get executed? I'd like to step through it under a debugger.

Comment 18

2 years ago
(In reply to David Major [:dmajor] from comment #13)
> This is showing up in rather high volume on 42b1.
> 
> There's definitely some corruption going on; one of the parameters on the
> stack (I think |factory| but I'm not certain) has been clobbered and
> replaced by a value derived form the user's locale setting.
> 
> For example, en-US users tend to crash on address 0x006e0065 ('en'), and
> es-ES users tend to crash on 0x00730065 ('es'), etc.
> 
> The majority of crash reports have Intel driver 8.15.10.2697. John, is there
> anything we can do in terms of blocklisting etc. to avoid whatever codepath
> causes this corruption?

I don't really know. Since this is related to the use of Skia in 2D canvas this may be a better question for Bas and/or Jeff M. Last time I checked, the Skia trunk code in this area hadn't changed much but it would probably be a good idea to reach out to them to see if they are aware of this. Guessing Chrome may have hit this once they enabled the use of DirectWrite. Jeff/Bas, any ideas?
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jdaggett)
Flags: needinfo?(bas)

Comment 19

2 years ago
(In reply to David Major [:dmajor] from comment #17)
> Another possibility is that |localeNameStorage| was (still) meant to be
> 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900).
> 
> How can I make this function get executed? I'd like to step through it under
> a debugger.

Google maps possibly?
(Assignee)

Comment 20

2 years ago
(In reply to John Daggett (:jtd) from comment #19)
> (In reply to David Major [:dmajor] from comment #17)
> > Another possibility is that |localeNameStorage| was (still) meant to be
> > 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900).
> > 
> > How can I make this function get executed? I'd like to step through it under
> > a debugger.
> 
> Google maps possibly?

No luck on google maps, nor on agar.io which is the most commonly reported URL. Does the use of this codepath depend on my gfx hardware and/or prefs?
FWIW, I don't think we should be using Skia with DirectWrite. We're only supposed to be using dwrite with D2D (with fallback to cairo). So does anyone know why we're using dwrite with skia?
Flags: needinfo?(jmuizelaar)
I suspect the large increase here is caused by bug 1208833.
Depends on: 1208833
Flags: needinfo?(bas)
(Assignee)

Comment 23

2 years ago
I got tired of trying to execute this function "for real", so I executed it artificially by setting my instruction pointer to SkFontMgr_New_DirectWrite.

The stack pointer gets corrupted after the call to getUserDefaultLocaleNameProc, due to a bad calling convention (missing WINAPI) on:
  typedef int (*SkGetUserDefaultLocaleNameProc)(LPWSTR, int);

As far as I can tell, _any_ machine that executes SkFontMgr_New_DirectWrite will crash; for some reason though, only certain machines are coming down this codepath to begin with.
(Assignee)

Comment 24

2 years ago
The code is correct upstream: https://github.com/google/skia/blob/master/src/utils/win/SkDWrite.h#L52

How does one bring in the fix?
Flags: needinfo?(gwright)
(Assignee)

Comment 25

2 years ago
Apparently this upstream patch wasn't pulled in correctly?

https://github.com/google/skia/commit/8124bf072c40c2a2a6d58ae19f1951a59dc4757e
https://hg.mozilla.org/mozilla-central/rev/3a3820a17f0b
Blocks: 1017113
(Assignee)

Comment 26

2 years ago
[Tracking Requested - why for this release]: This long-standing crash is spiking on version 42. The fix should be trivial and extremely safe. I think we should fix on as many channels as possible, maybe even ESR?
tracking-firefox42: --- → ?
Should just be as simple as applying the upstream patch and landing it. As it's already upstream, there's no need to keep track of it on our end.
Flags: needinfo?(gwright) → needinfo?(lsalzman)
(In reply to George Wright (:gw280) from comment #27)
> Should just be as simple as applying the upstream patch and landing it. As
> it's already upstream, there's no need to keep track of it on our end.

If it's already upstream and it is part of resolving this issue, then feel free to just attach the patch for checkin here.
Flags: needinfo?(lsalzman)
(Assignee)

Comment 29

2 years ago
Created attachment 8668014 [details] [diff] [review]
Fix the calling convention on SkGetUserDefaultLocaleNameProc

Copied the fix from upstream
Assignee: jdaggett → dmajor
Attachment #8668014 - Flags: review?(lsalzman)
Created attachment 8668015 [details] [diff] [review]
use WINAPI on SkGetUserDefaultLocaleNameProc typedef

Make us match upstream with WINAPI usage on the typedef, as requested by :dmajor
Attachment #8668015 - Flags: review?(gwright)
(Assignee)

Comment 31

2 years ago
Oops. All yours! :)
Assignee: dmajor → lsalzman
Attachment #8668015 - Attachment is obsolete: true
Attachment #8668015 - Flags: review?(gwright)
Comment on attachment 8668014 [details] [diff] [review]
Fix the calling convention on SkGetUserDefaultLocaleNameProc

Already upstreamed, so no problem.
Attachment #8668014 - Flags: review?(lsalzman) → review+
Tagback! :) If you are the one investigating and verifying the issue, I'd rather you remain the assignee.
Assignee: lsalzman → dmajor
(Assignee)

Comment 34

2 years ago
I've confirmed with a debugger that this patch fixes the issue. The code now finds the correct values on the stack.

Comment 35

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d3f16a3d97a1

Comment 36

2 years ago
Given that this has been in upstream for a while and we can confirm in a debugger, let's get this nominated for uplift very soon, so that it's in the beta the goes to build tomorrow.
(Assignee)

Comment 37

2 years ago
Comment on attachment 8668014 [details] [diff] [review]
Fix the calling convention on SkGetUserDefaultLocaleNameProc

Approval Request Comment
[Feature/regressing bug #]: bug 1017113
[User impact if declined]: repeated crashes on certain machines
[Describe test coverage new/current, TreeHerder]: we can't test this in automation, but I have tested it under a debugger
[Risks and why]: Trivial patch, extremely safe, and necessary for correctness
[String/UUID change made/needed]: none
Attachment #8668014 - Flags: approval-mozilla-beta?
Attachment #8668014 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/d3f16a3d97a1
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox44: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Comment on attachment 8668014 [details] [diff] [review]
Fix the calling convention on SkGetUserDefaultLocaleNameProc

Fix a crash, Should be in 42b3.
Attachment #8668014 - Flags: approval-mozilla-beta?
Attachment #8668014 - Flags: approval-mozilla-beta+
Attachment #8668014 - Flags: approval-mozilla-aurora?
Attachment #8668014 - Flags: approval-mozilla-aurora+
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=3a47df4bf819

 https://hg.mozilla.org/releases/mozilla-beta/rev/33df25733a5a
status-firefox42: affected → fixed
status-firefox43: affected → fixed
(In reply to Carsten Book [:Tomcat] from comment #40)
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> aurora&revision=3a47df4bf819

erm https://hg.mozilla.org/releases/mozilla-aurora/rev/3a47df4bf819 of course :)

Comment 42

2 years ago
Thanks David for all your work tracking this down!!
tracking-firefox42: ? → +
tracking-firefox43: ? → +
Keywords: testcase-wanted
You need to log in before you can comment on or make changes to this bug.