Closed Bug 1149318 Opened 5 years ago Closed 4 years ago
crash in Sk
Font Mgr _Direct Write::Sk Font Mgr _Direct Write(IDWrite Factory*, IDWrite Font Collection*, wchar _t*, int)
This bug was filed from the Socorro interface and is report bp-2c594b93-4ca0-4c98-ba6b-e6baf2150330. ============================================================= Mentioned by a reddit user. See http://www.reddit.com/r/firefox/comments/30tewy/strange_crash_always_happens_when_visiting_a/
From the reddit post, it looks like it might be a driver issue. Let's see if this still comes up after this user updates their drivers.
https://crash-stats.mozilla.com/report/index/ccadcc4a-46d6-49ff-9661-33e9a2150507 I have the latest nvidia drivers for my GPU and got this crash just now. My Intel GPU drivers are quite old however, and I believe Firefox was using the Intel GPU. I will update my drivers after work and see if I get this crash again.
Have been seeing this on several different machines on Google Maps Street View. I have HWA disabled (Firefox disables it automatically anyway), but hardware fonts enabled via gfx.font_rendering.directwrite.enabled. Disabling hardware fonts prevents the crash but makes websites look terrible. Crash 100% reproducable by visiting Google Maps and attempting to go into Street View. On other browsers I notice that Street View now shows street names projected on the ground. I do not recall seeing that feature at all prior to the crashing, so I suspect that's the cause. https://crash-stats.mozilla.com/report/index/ae42bbdb-26fc-435a-97ae-61e0e2150519
I've been receiving reports from customers that text operations were not working on the website, so I decided to investigate. And to my surprise only 1 of the machines in the office was able to replicate the issue. While investigating the issue I attempted to load https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Drawing_text and this caused the browser to crash. Upon submitting the crash report (https://crash-stats.mozilla.com/report/index/e284dc1a-dc6c-4002-91de-90e852150629) all canvas text operations seem to be working flawlessly.
I haven't experienced this issue for a long time and can visit all of the links in prior comments without Firefox crashing. I still have DirectWrite and HW accel on. I updated my Intel GPU drivers to the latest version, which I believe fixed the problem in my case.
Breakdown of dwrite.dll versions for recent beta crashreports: 15 6.1.7600.16385 7 6.1.7600.16699 3 6.1.7600.16972 1 6.1.7600.20602 11 6.1.7601.17514 2 6.1.7601.18834 1 6.2.9200.16384 17 6.2.9200.16492 11 6.2.9200.16571 15 6.2.9200.17292 68 6.2.9200.17461 12 6.3.9600.16384 1 6.3.9600.17111 1 6.3.9600.17415 2 6.3.9600.17795 2 6.3.9600.17999 Whatever is going on here is basically spread evenly across all versions of DirectWrite.
This is canvas text related, based both on the codepath and on the comments here. Examples: https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Drawing_text any Google street view Comment 4 suggests this may occur in D2D disabled, DirectWrite manually enabled cases.
Counts of the AdapterVendorId info from beta crashreports (41.0b). Doesn't seem to be specific to a single given driver/vendor. 0x8086 is Intel?
For recent beta crashreports, this is #39.
(In reply to John Daggett (:jtd) from comment #10) > 0x8086 is Intel? Can confirm 0x8086 is Intel, checked my laptop's About:Support page.
This is showing up in rather high volume on 42b1. There's definitely some corruption going on; one of the parameters on the stack (I think |factory| but I'm not certain) has been clobbered and replaced by a value derived form the user's locale setting. For example, en-US users tend to crash on address 0x006e0065 ('en'), and es-ES users tend to crash on 0x00730065 ('es'), etc. The majority of crash reports have Intel driver 184.108.40.20697. John, is there anything we can do in terms of blocklisting etc. to avoid whatever codepath causes this corruption?
[Tracking Requested - why for this release]: This is near the top of the crash scores on 42b1
Oh you know what - I didn't even realize that there's a |localeName| parameter involved in that constructor; that could be relevant! Sounds like there's a level of indirection missing? The code was expecting a pointer to that string, but got the raw chars reinterpreted as a pointer?
> one of the parameters on the > stack (I think |factory| but I'm not certain) has been clobbered I was off by one; the bogus pointer value is actually where |fontCollection| was supposed to be. Here's a stack from a crash dump, showing the call between SkFontMgr_New_DirectWrite and SkFontMgr_DirectWrite::SkFontMgr_DirectWrite: 002e68dc 601e1ecd xul!SkFontMgr_New_DirectWrite+0x82 <-- return address 002e68e0 001c8b98 <-- factory 002e68e4 006e0065 <-- fontCollection 002e68e8 002e6908 <-- localeName 002e68ec 00000006 <-- localeNameLen 002e68f0 002e6be8 002e68f4 00000000 002e68f8 0083f020 002e68fc 774c03f3 002e6900 006e0065 <-- 'e' 'n' 002e6904 0055002d <-- '-' 'U' 002e6908 00000053 <-- 'S' '\0' (Note, no 'this' pointer because it's passed in the ecx register) The |localeName| parameter actually points to 2e6908, which is the string "S", not "en-US"! Looking at the disassembly, I want to say that 0x2e6900 was actually intended to be the storage space for the caller's |sysFontCollection| variable. My theory is: |localeNameStorage| was always meant to be 0x2e6908, and we did getUserDefaultLocaleNameProc(0x2e6908), but the function actually wrote to 0x2e6900 instead, clobbering the |sysFontCollection| variable.
Another possibility is that |localeNameStorage| was (still) meant to be 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900). How can I make this function get executed? I'd like to step through it under a debugger.
(In reply to David Major [:dmajor] from comment #13) > This is showing up in rather high volume on 42b1. > > There's definitely some corruption going on; one of the parameters on the > stack (I think |factory| but I'm not certain) has been clobbered and > replaced by a value derived form the user's locale setting. > > For example, en-US users tend to crash on address 0x006e0065 ('en'), and > es-ES users tend to crash on 0x00730065 ('es'), etc. > > The majority of crash reports have Intel driver 220.127.116.1197. John, is there > anything we can do in terms of blocklisting etc. to avoid whatever codepath > causes this corruption? I don't really know. Since this is related to the use of Skia in 2D canvas this may be a better question for Bas and/or Jeff M. Last time I checked, the Skia trunk code in this area hadn't changed much but it would probably be a good idea to reach out to them to see if they are aware of this. Guessing Chrome may have hit this once they enabled the use of DirectWrite. Jeff/Bas, any ideas?
(In reply to David Major [:dmajor] from comment #17) > Another possibility is that |localeNameStorage| was (still) meant to be > 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900). > > How can I make this function get executed? I'd like to step through it under > a debugger. Google maps possibly?
(In reply to John Daggett (:jtd) from comment #19) > (In reply to David Major [:dmajor] from comment #17) > > Another possibility is that |localeNameStorage| was (still) meant to be > > 0x2e6908, but we accidentally called getUserDefaultLocaleNameProc(0x2e6900). > > > > How can I make this function get executed? I'd like to step through it under > > a debugger. > > Google maps possibly? No luck on google maps, nor on agar.io which is the most commonly reported URL. Does the use of this codepath depend on my gfx hardware and/or prefs?
FWIW, I don't think we should be using Skia with DirectWrite. We're only supposed to be using dwrite with D2D (with fallback to cairo). So does anyone know why we're using dwrite with skia?
I got tired of trying to execute this function "for real", so I executed it artificially by setting my instruction pointer to SkFontMgr_New_DirectWrite. The stack pointer gets corrupted after the call to getUserDefaultLocaleNameProc, due to a bad calling convention (missing WINAPI) on: typedef int (*SkGetUserDefaultLocaleNameProc)(LPWSTR, int); As far as I can tell, _any_ machine that executes SkFontMgr_New_DirectWrite will crash; for some reason though, only certain machines are coming down this codepath to begin with.
The code is correct upstream: https://github.com/google/skia/blob/master/src/utils/win/SkDWrite.h#L52 How does one bring in the fix?
Apparently this upstream patch wasn't pulled in correctly? https://github.com/google/skia/commit/8124bf072c40c2a2a6d58ae19f1951a59dc4757e https://hg.mozilla.org/mozilla-central/rev/3a3820a17f0b
[Tracking Requested - why for this release]: This long-standing crash is spiking on version 42. The fix should be trivial and extremely safe. I think we should fix on as many channels as possible, maybe even ESR?
Should just be as simple as applying the upstream patch and landing it. As it's already upstream, there's no need to keep track of it on our end.
Flags: needinfo?(gwright) → needinfo?(lsalzman)
(In reply to George Wright (:gw280) from comment #27) > Should just be as simple as applying the upstream patch and landing it. As > it's already upstream, there's no need to keep track of it on our end. If it's already upstream and it is part of resolving this issue, then feel free to just attach the patch for checkin here.
Copied the fix from upstream
Assignee: jdaggett → dmajor
Attachment #8668014 - Flags: review?(lsalzman)
Make us match upstream with WINAPI usage on the typedef, as requested by :dmajor
Oops. All yours! :)
Assignee: dmajor → lsalzman
Comment on attachment 8668014 [details] [diff] [review] Fix the calling convention on SkGetUserDefaultLocaleNameProc Already upstreamed, so no problem.
Attachment #8668014 - Flags: review?(lsalzman) → review+
Tagback! :) If you are the one investigating and verifying the issue, I'd rather you remain the assignee.
Assignee: lsalzman → dmajor
I've confirmed with a debugger that this patch fixes the issue. The code now finds the correct values on the stack.
Given that this has been in upstream for a while and we can confirm in a debugger, let's get this nominated for uplift very soon, so that it's in the beta the goes to build tomorrow.
Comment on attachment 8668014 [details] [diff] [review] Fix the calling convention on SkGetUserDefaultLocaleNameProc Approval Request Comment [Feature/regressing bug #]: bug 1017113 [User impact if declined]: repeated crashes on certain machines [Describe test coverage new/current, TreeHerder]: we can't test this in automation, but I have tested it under a debugger [Risks and why]: Trivial patch, extremely safe, and necessary for correctness [String/UUID change made/needed]: none
Comment on attachment 8668014 [details] [diff] [review] Fix the calling convention on SkGetUserDefaultLocaleNameProc Fix a crash, Should be in 42b3.
(In reply to Carsten Book [:Tomcat] from comment #40) > https://treeherder.mozilla.org/#/jobs?repo=mozilla- > aurora&revision=3a47df4bf819 erm https://hg.mozilla.org/releases/mozilla-aurora/rev/3a47df4bf819 of course :)
Thanks David for all your work tracking this down!!
You need to log in before you can comment on or make changes to this bug.