Last Comment Bug 1218473 - Crash spike in CreateWindowEx with Firefox 42.0b9 on Optimus
: Crash spike in CreateWindowEx with Firefox 42.0b9 on Optimus
Status: RESOLVED FIXED
[44dll]
: crash
Product: Core
Classification: Components
Component: IPC (show other bugs)
: Trunk
: x86 Windows NT
-- critical with 3 votes (vote)
: mozilla52
Assigned To: Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
:
: Bill McCloskey (:billm)
Mentors:
: 1218128 1236338 1236718 1237240 (view as bug list)
Depends on: 1243098 1249849
Blocks:
  Show dependency treegraph
 
Reported: 2015-10-26 10:57 PDT by Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
Modified: 2016-10-17 13:17 PDT (History)
31 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
+
wontfix
-
wontfix
-
wontfix
-
wontfix
wontfix
wontfix
fixed


Attachments
Beta 42 backout (9.62 KB, patch)
2015-10-26 10:59 PDT, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
aklotz: review+
Details | Diff | Splinter Review
Beta 42 backout (9.62 KB, patch)
2015-10-26 11:02 PDT, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
aklotz: review+
lhenry: approval‑mozilla‑aurora+
lhenry: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Patch (5.87 KB, patch)
2015-11-27 13:06 PST, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
ehsan: review+
Details | Diff | Splinter Review
Patch (r2) (6.09 KB, patch)
2015-11-27 16:08 PST, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
aklotz: review+
Details | Diff | Splinter Review
Patch (r3) (6.97 KB, patch)
2016-01-07 09:31 PST, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
aklotz: review+
rkothari: approval‑mozilla‑aurora+
rkothari: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Patch (r4) (6.81 KB, patch)
2016-10-13 16:12 PDT, Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman)
aklotz: review+
Details | Diff | Splinter Review

Description User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 10:57:51 PDT
+++ This bug was initially created as a clone of Bug #1218128 +++

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-2fbc02f7-de63-439f-a485-f12992151024.
=============================================================


Top stack frames:
0 		@0x10f43280 	
1 	xul.dll 	js::CreateRegExpMatchResult(JSContext*, JS::Handle<JSString*>, js::MatchPairs const&, JS::MutableHandle<JS::Value>) 	js/src/builtin/RegExp.cpp
2 	xul.dll 	regexp_exec_impl 	js/src/builtin/RegExp.cpp
3 	xul.dll 	regexp_exec_impl 	js/src/builtin/RegExp.cpp
4 	xul.dll 	js::regexp_exec(JSContext*, unsigned int, JS::Value*) 	js/src/builtin/RegExp.cpp
5 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
6 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp
7 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
8 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
9 	xul.dll 	nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJSClass.cpp
10 	xul.dll 	nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJS.cpp
11 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/md/win32/xptcstubs.cpp
[...]

The three crash signatures attached to the bug are highly spiking in the last beta of this cycle, 42.0b9, and are ~40% of the early crashes for this version. Those crashes are practically all early in startup and the stacks for js::CreateRegExpMatchResult and js::AddTypePropertyId are similar (the latter has 2 more frames or so on top), the js::regexp_exec are heavily truncated.

The signatures did exist previously at much lower volume but exploded as startup crashes in this version. The regression range is https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_42_0b8_RELEASE&tochange=FIREFOX_42_0b9_RELEASE - I'm not sure what can cause this.

This is bad enough to block release and we are only a week before release with this, so we need urgent action here (there is a reason why I file this on a weekend).
Comment 1 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 10:58:15 PDT
As per https://bugzilla.mozilla.org/show_bug.cgi?id=1218128#c12 I am separating this bug out from the JS stack.
Comment 2 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 10:59:51 PDT
Created attachment 8678984 [details] [diff] [review]
Beta 42 backout

Approval Request Comment
[Feature/regressing bug #]: bug 1213567
[User impact if declined]: Crashes
[Describe test coverage new/current, TreeHerder]: plugin tests
[Risks and why]: Low, this was the state of beta prior to beta9
[String/UUID change made/needed]: None
Comment 3 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 11:02:27 PDT
Created attachment 8678986 [details] [diff] [review]
Beta 42 backout

Fixed bug number in commit message.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1218473#c2
Comment 4 User image Liz Henry (:lizzard) (needinfo? me) 2015-10-26 11:55:10 PDT
Comment on attachment 8678986 [details] [diff] [review]
Beta 42 backout

Approved for uplift to beta from discussion with Sylvestre. We need this to fix a startup crash in beta 9.
Comment 5 User image Liz Henry (:lizzard) (needinfo? me) 2015-10-26 11:55:41 PDT
I am guessing this also needs uplift to m-r.
Comment 6 User image Liz Henry (:lizzard) (needinfo? me) 2015-10-26 11:59:50 PDT
[Tracking Requested - why for this release]:

Setting tracking flags and marking each channel as affected. 
We are backing this out from beta but leaving it in 43/44 for the moment to get more information.
Comment 8 User image Arthur K. 2015-10-26 13:27:54 PDT
Could be just that another Display DLL (in this case, dlumd11.dll 9.18.7.9) needs blacklisting? Displaylink software has caused crashes before in bug 1195844 and bug 1160295. Displaylink released a new version of their software on the 22nd (http://downloads.displaylink.com/releasenotes/DisplayLink_7.9%20M3_release-notes.txt).
Comment 9 User image [:philipp] 2015-10-26 14:15:20 PDT
hi arthur, there is no indication that display link drivers would be involved here as well...
Comment 10 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 15:14:11 PDT
Reviewing the correlation data, there is 100% correlation with nVidia drivers. In particular, _etoured.dll is 100% correlation. This is significant because "Detours" is an API interception framework that is licensed by Microsoft Research. Depending on what nVidia drivers are doing with that, they could be modifying code in our address space in such a way that it is incompatible with our own code.
Comment 11 User image Bob Owen (:bobowen) 2015-10-26 16:12:30 PDT
Could https://bugzilla.mozilla.org/show_bug.cgi?id=1218395 also be related then.
Comment 12 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-26 18:00:51 PDT
Another thing I've noticed is that the crash reports are showing Intel Graphics.

My best hypothesis at the moment is that the crashing systems have both Intel Integrated Graphics and nVidia Discrete Graphics and the Intel GPU is set as the active GPU.

Unfortunately I don't have suitable hardware to test this.

In summary, we need:

* Hardware with both Intel Integrated Graphics and nVidia Discrete Graphics;
* Intel Integrated Graphics is the active GPU;
* Windows 8 or newer;
* Running 43 Dev Ed (for now at least, unless patch is backed out) or 42 Beta 9

Kairo, I think we need some QA for this. Who should be the point of contact?
Comment 13 User image Arthur K. 2015-10-26 19:09:12 PDT
(In reply to philipp from comment #9)
> hi arthur, there is no indication that display link drivers would be
> involved here as well...

When I was looking through the crash modules, I saw that DLL highlighted in red so I thought it might have been the culprit.
Comment 14 User image Robert Kaiser 2015-10-28 08:01:09 PDT
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #12)
> Kairo, I think we need some QA for this. Who should be the point of contact?

Usually, Florin is our contact for release QA, but I don't know if they have hardware that matches. The gfx team in Toronto (Milan) has a number of different machines esp. in terms of graphics configurations, so we may have a chance to dig up a machine like that there. I'm ni?ing both to see if we have a machine like that somewhere.
Comment 15 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-28 10:11:36 PDT
Comment on attachment 8678986 [details] [diff] [review]
Beta 42 backout

See comment 2 for request comment. A fix probably isn't going to be ready in the next day or two.
Comment 16 User image Liz Henry (:lizzard) (needinfo? me) 2015-10-28 10:38:35 PDT
Comment on attachment 8678986 [details] [diff] [review]
Beta 42 backout

Approved for uplift to aurora (43) so we avoid hitting this startup crash when 43 moves to beta.
Comment 17 User image Jeff Muizelaar [:jrmuizel] 2015-10-28 10:46:06 PDT
I have a W540 in the QA lab that may be similar enough
Comment 18 User image Jeff Muizelaar [:jrmuizel] 2015-10-28 10:58:05 PDT
I can't trivially reproduce on that machine
Comment 19 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-28 11:13:43 PDT
Thanks Jeff. Do we have any contacts for driver devs at nVidia? I'd love to discuss their use of detours with them...
Comment 20 User image Liz Henry (:lizzard) (needinfo? me) 2015-10-28 12:01:37 PDT
Aaron, should we also be backing this out on nightly (so it doesn't hold up the 44 aurora release?)  
It turns out are merging tomorrow instead of Monday. 
We could still uplift to 44 aurora after tomorrow though. Sorry to bug you about this but I wanted to check that you know about the early merge date.
Comment 21 User image Jeff Muizelaar [:jrmuizel] 2015-10-28 13:38:48 PDT
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #19)
> Thanks Jeff. Do we have any contacts for driver devs at nVidia? I'd love to
> discuss their use of detours with them...

I think Bas knows someone.
Comment 22 User image Robert Kaiser 2015-10-28 18:46:35 PDT
Removing the ni? to me, I already forwarded the QA request.
Comment 23 User image Carsten Book [:Tomcat] 2015-10-29 01:06:01 PDT
this has problems applying to aurora - 

grafting 311479:4d60b9516805 "Bug 1218473: Back out 45ab7cdffbb4 on suspicion of causing spike in CreateWindowEx crashes; r=backout a=lizzard CLOSED TREE"
merging ipc/glue/Neutering.h
warning: conflicts during merge.
merging ipc/glue/Neutering.h incomplete! (edit conflicts, then use 'hg resolve --mark')
merging ipc/glue/WindowsMessageLoop.cpp
merging widget/windows/nsWindow.cpp
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)

could you take a look, thanks!
Comment 24 User image Bob Owen (:bobowen) 2015-10-29 01:36:04 PDT
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/5e1587041f9a

(In reply to Carsten Book [:Tomcat] from comment #23)
> this has problems applying to aurora - 

This was because we had to use MOZ_STACK_CLASS and not MOZ_RAII on Beta.
We should be able to use the Aurora changeset if we decide to backout from 44 (or 45 if it comes to that.)
Comment 26 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-10-29 08:08:12 PDT
(In reply to Bob Owen (:bobowen) from comment #24)
> remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/5e1587041f9a
> 
> (In reply to Carsten Book [:Tomcat] from comment #23)
> > this has problems applying to aurora - 
> 
> This was because we had to use MOZ_STACK_CLASS and not MOZ_RAII on Beta.
> We should be able to use the Aurora changeset if we decide to backout from
> 44 (or 45 if it comes to that.)

Thanks Bob :-)
Comment 27 User image Andrei Vaida [:avaida], Desktop Release QA – please ni? me 2015-10-29 13:24:43 PDT
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #22)
> Removing the ni? to me, I already forwarded the QA request.

We're currently trying to setup a machine with hardware similar to what's been specified in Comment 12. Camelia will follow up here as soon possible.
Comment 28 User image Andrei Vaida [:avaida], Desktop Release QA – please ni? me 2015-11-02 07:20:52 PST
(In reply to Andrei Vaida, QA [:avaida] from comment #27)
> We're currently trying to setup a machine with hardware similar to what's
> been specified in Comment 12. Camelia will follow up here as soon possible.

Just a quick update here, we managed to setup a similar machine and we're currently trying to reproduce the bug. Camelia will follow up with test results on this matter, first thing tomorrow.
Comment 29 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-02 15:13:27 PST
It looks like these crashes went away when 44 went to Aurora (!), so I'd suggest that this be tested with 42 Beta 9.
Comment 30 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-02 15:14:00 PST
Looks like this isn't an issue in 44.
Comment 31 User image Camelia Badau, QA [:cbadau] 2015-11-03 04:40:02 PST
I've tested on Windows 8.1 32bit using Firefox 42 Beta 9 (buildID: 20151022152545) and I can't reproduce the issue (the crash). I've tried several times to reproduce this crash, but with no success. 

I've tested on a machine with the following characteristics: 
- Hardware with both Intel Integrated Graphics (Intel HD 2500) and nVidia Discrete Graphics (NVIDIA GeForce 210);
- Intel Integrated Graphics is the active GPU
Comment 32 User image Gian-Carlo Pascutto [:gcp] 2015-11-13 07:51:41 PST
I got a machine with a matching hardware configuration that kept hitting this repeatedly on startup with current Nightly. It stopped after a few times and restarting in safe mode once, so unfortunately there's no real steps to reproduce.
Comment 33 User image Jeff Muizelaar [:jrmuizel] 2015-11-13 08:21:07 PST
(In reply to Gian-Carlo Pascutto [:gcp] from comment #32)
> I got a machine with a matching hardware configuration that kept hitting
> this repeatedly on startup with current Nightly. It stopped after a few
> times and restarting in safe mode once, so unfortunately there's no real
> steps to reproduce.

Does it come back if you use a fresh profile?
Comment 34 User image Gian-Carlo Pascutto [:gcp] 2015-11-13 08:29:50 PST
(In reply to Jeff Muizelaar [:jrmuizel] from comment #33)
> Does it come back if you use a fresh profile?

Ding ding, you win!

(yes)
Comment 35 User image Milan Sreckovic [:milan] 2015-11-13 13:50:08 PST
Do you have a pointer (from about:crashes) to some of these crashes?
Comment 36 User image Gian-Carlo Pascutto [:gcp] 2015-11-14 01:18:05 PST
All similar to: https://crash-stats.mozilla.com/report/index/e884dc17-957f-4270-86ab-f59742151113
Comment 37 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-20 07:54:35 PST
The NVIDIA DLLs are injected via AppInit_DLLs:

ntdll!ZwMapViewOfSection+0x12
ntdll!LdrpMapViewOfSection+0xc7
ntdll!LdrpFindOrMapDll+0x345
ntdll!LdrpLoadDll+0x2b6
ntdll!LdrLoadDll+0xaa
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
KERNELBASE!LoadLibraryExW+0x1f7
KERNELBASE!LoadLibraryExA+0x26
kernel32!LoadLibraryA+0xba
nvinit+0x11cb
nvinit+0x5477
nvinit!nvCoprocThunk+0x6e94
nvinit!nvCoprocThunk+0x6e1a
ntdll!LdrpCallInitRoutine+0x14
ntdll!LdrpRunInitializeRoutines+0x26f
ntdll!LdrpLoadDll+0x453
ntdll!LdrLoadDll+0xaa
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
KERNELBASE!LoadLibraryExW+0x1f7
kernel32!BasepLoadAppInitDlls+0x167
kernel32!LoadAppInitDlls+0x82
USER32!ClientThreadSetup+0x1f9
USER32!__ClientThreadSetup+0x5
ntdll!KiUserCallbackDispatcher+0x2e
GDI32!GdiDllInitialize+0x1c
USER32!_UserClientDllInitialize+0x32f
ntdll!LdrpCallInitRoutine+0x14
ntdll!LdrpRunInitializeRoutines+0x26f
ntdll!LdrpLoadDll+0x453
ntdll!LdrLoadDll+0xaa
mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
KERNELBASE!LoadLibraryExW+0x1f7
firefox!XPCOMGlueLoad+0x23c
firefox!XPCOMGlueStartup+0x1d
firefox!InitXPCOMGlue+0xba
firefox!NS_internal_main+0x5c
firefox!wmain+0xbe
firefox!__tmainCRTStartup+0xfe
kernel32!BaseThreadInitThunk+0xe
ntdll!__RtlUserThreadStart+0x70
Comment 38 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-20 07:55:22 PST
CreateWindowExW is being patched by:
00000000 0031d458 00000001 nvd3d9wrap!setDeviceHandle+0x1c91
722b0000 00000001 00000000 nvd3d9wrap!initialise+0x373
722b0000 00000001 00000000 nvd3d9wrap!setDeviceHandle+0x467b
722b0000 00000001 00000000 nvd3d9wrap!setDeviceHandle+0x4602
722bb5d6 722b0000 00000001 ntdll!LdrpCallInitRoutine+0x14
00000000 778bdec7 00000000 ntdll!LdrpRunInitializeRoutines+0x26f
0031d750 0031d634 00000000 ntdll!LdrpLoadDll+0x453
0031d634 0031d768 0031d750 ntdll!LdrLoadDll+0xaa
0060d874 0031d768 0031d750 mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
00000000 00000000 0060d874 KERNELBASE!LoadLibraryExW+0x1f7
0031d7bc 00000000 00000000 KERNELBASE!LoadLibraryExA+0x26
0031d7bc 0031dd6c 752e485c kernel32!LoadLibraryA+0xba
0031de98 00000001 00000000 nvinit+0x11cb
752c0000 00000001 00000000 nvinit+0x5477
752c0000 00000001 00000000 nvinit!nvCoprocThunk+0x6e94
752c0000 00000001 00000000 nvinit!nvCoprocThunk+0x6e1a
752ccfd9 752c0000 00000001 ntdll!LdrpCallInitRoutine+0x14
00000000 778be887 00000000 ntdll!LdrpRunInitializeRoutines+0x26f
0031e190 0031e074 00000000 ntdll!LdrpLoadDll+0x453
0031e074 0031e1a8 0031e190 ntdll!LdrLoadDll+0xaa
005f6654 0031e1a8 0031e190 mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
00000000 00000000 005f6654 KERNELBASE!LoadLibraryExW+0x1f7
00000058 fffdb800 00000001 kernel32!BasepLoadAppInitDlls+0x167
00000000 77afe230 00000001 kernel32!LoadAppInitDlls+0x82
77ae010a 0031e578 00000000 USER32!ClientThreadSetup+0x1f9
0031e578 00000000 0031ed48 USER32!__ClientThreadSetup+0x5
75b70718 0031ec44 75b0b970 ntdll!KiUserCallbackDispatcher+0x2e
75af0000 00000001 0031ed14 GDI32!GdiDllInitialize+0x1c
75af0000 00000001 00000000 USER32!_UserClientDllInitialize+0x32f
75b0b6ed 75af0000 00000001 ntdll!LdrpCallInitRoutine+0x14
00000000 778be60b 00000000 ntdll!LdrpRunInitializeRoutines+0x26f
0031f00c 0031eef0 00000000 ntdll!LdrpLoadDll+0x453
0031eef0 0031f024 0031f00c ntdll!LdrLoadDll+0xaa
005f6654 0031f024 0031f00c mozglue!`anonymous namespace'::patched_LdrLoadDll+0x1b0
00000000 00000000 005f6654 KERNELBASE!LoadLibraryExW+0x1f7
0135141a 76c33656 76bb17e5 firefox!XPCOMGlueLoad+0x23c
76c33656 76bb17e5 00000001 firefox!XPCOMGlueStartup+0x1d
005f7f30 005f7e78 00000001 firefox!InitXPCOMGlue+0xba
00000000 00000000 00000000 firefox!NS_internal_main+0x5c
005f7e78 ffffefa0 005f6e90 firefox!wmain+0xbe
fffde000 0032f8e4 77b09882 firefox!__tmainCRTStartup+0xfe
fffde000 7788f02b 00000000 kernel32!BaseThreadInitThunk+0xe
0135257b fffde000 00000000 ntdll!__RtlUserThreadStart+0x70
0135257b fffde000 00000000 ntdll!_RtlUserThreadStart+0x1b
Comment 39 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-20 07:57:06 PST
Monitoring VirtualProtect calls made against addresses containing code, I see these functions are being hooked by these NVIDIA DLLs (this list maight not be complete): 

ole32!CoSetProxyBlanket
ole32!CoCreateInstance
GDI32!NtGdiDdDDIQueryAdapterInfo
GDI32!NtGdiDdDDIGetDisplayModeList
USER32!CreateWindowExW
USER32!ChangeDisplaySettingsExW
USER32!EnumDisplayDevicesW
USER32!EnumDisplayDevicesA
USER32!DisplayConfigGetDeviceInfo
kernel32!RegSetValueExW
kernel32!RegSetValueExA
kernel32!RegDeleteValueW
kernel32!RegQueryValueExW
kernel32!K32GetModuleInformation
kernel32!K32GetMappedFileNameW
kernel32!K32EnumProcessModulesEx
KERNELBASE!FreeLibrary
KERNELBASE!LoadLibraryExW
KERNELBASE!GetModuleHandleExW
KERNELBASE!GetModuleHandleW
Comment 40 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-20 08:08:43 PST
The NVIDIA stuff all happens during the loading of the NVINIT stuff and is early enough such that the Firefox process is still single-threaded. Our function hooking code *does* correctly patch the already-hooked CreateWindowExW.

I'm wondering, however, if there is some kind of edge case where the NVINIT dll could be injected off main thread. AppInit_DLLs depends on user32.dll being loaded, and we delayload user32.dll.
Comment 41 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-20 08:55:36 PST
We only delayload with mozglue.dll, so by the time we load xul.dll we should have brought in the Optimus stuff.
Comment 42 User image Ritu Kothari (:ritu) 2015-11-23 10:53:29 PST
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #29)
> It looks like these crashes went away when 44 went to Aurora (!), so I'd
> suggest that this be tested with 42 Beta 9.

Aaron, I was looking at the Aurora44 crash stats from the last 7 days and this is a top-crash at #5. Is there a fix that we might consider safe and worth uplifting to Aurora to address it? Thanks!
Comment 43 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-25 12:18:08 PST
gcp, could you please take a look at your registry in HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Windows and tell me what else (if anything) other than nvinit.dll is included in the AppInit_DLLs value?
Comment 44 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-25 15:18:53 PST
I just solved this mofo! There's a bug in Detours 2.x where if it sees the sequence of jmps from a NOP space patch, it incorrectly calculates the target address of the second jmp. It ends up copying bytes (and writing out a jmp to its trampoline) at an address that is 5 bytes prior to the intended target, corrupting everything on the way. That's why we were seeing CreateWindowExWHook show up in unrelated code like JS regex stuff.

I further confirmed this by examining gcp's crash dump above and comparing it to the bytes surrounding CreateWindowExWHook in the original binary. At CreateWindowExWHook-5, the instructions diverge from the binary, and 0xe9 (jmp opcode) is the first byte.

Corollaries:

1. Detours 2.x is fundamentally incompatible with NOP-space patching. Looking at the code, this bug was fixed in Detours 3.x;
2. We can't use NOP-space patching when Optimus Drivers are running because they use Detours 2.x;
3. There are probably a bunch of other random crashes that are caused by this that will be fixed as fallout from this;
4. The Optimus drivers may cause bad things to happen with other code that does NOP space patching, such as Windows hotfixes.

Solution:
I think that we can work around this by disabling NOP-space patching when detours 2.x is present. We could either make it conditional on the presence of the Detours DLL or on the presence of NVIDIA Optimus DLLs.
Comment 45 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-25 16:31:00 PST
*** Bug 1218128 has been marked as a duplicate of this bug. ***
Comment 46 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-27 13:06:47 PST
Created attachment 8693091 [details] [diff] [review]
Patch

This patch adds a compatibility check to WindowsNopSpacePatcher: It looks for evidence of Optimus DLLs having either already been injected or having the potential to be injected in the future. If the compat check fails, we either abort the initialization, or if we've already previously initialized, prevent any further NOP space patching.

I noticed that TestDllInterceptor should be moved to the same location as nsWindowsDllInterceptor.h but I'll file a follow-up bug for that.
Comment 47 User image :Ehsan Akhgari (needinfo please, extremely long backlog) 2015-11-27 15:00:13 PST
Comment on attachment 8693091 [details] [diff] [review]
Patch

Review of attachment 8693091 [details] [diff] [review]:
-----------------------------------------------------------------

Great find!

::: xpcom/build/nsWindowsDllInterceptor.h
@@ +146,5 @@
>  
>    void Init(const char* aModuleName)
>    {
> +    if (!IsCompatible()) {
> +      return;

Do you mind adding an NS_WARNING here, please?

@@ +164,5 @@
> +   * this NVIDIA code in our address space and disable NOP space patching if it
> +   * is. We also check AppInit_DLLs since this is the mechanism that the Optimus
> +   * drivers use to inject into our process.
> +   */
> +  bool IsCompatible()

Nit: please make this a static function.

@@ +189,5 @@
> +    // won't be loaded once user32 is initialized.
> +    HKEY hkey = NULL;
> +    if (!RegOpenKeyExW(HKEY_LOCAL_MACHINE,
> +          MOZ_UTF16("SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Windows"),
> +          0, KEY_READ, &hkey)) {

Since this is in HKLM, we should be asking for the minimum access rights that we need to maximize the chance of it working.  I think KEY_QUERY_VALUE should be all that you need here.

@@ +215,5 @@
> +          if (!_wsplitpath_s(token, nullptr, 0, nullptr, 0,
> +                             fname, mozilla::ArrayLength(fname),
> +                             nullptr, 0)) {
> +            // nvinit.dll is responsible for bootstrapping the DLL injection, so
> +            // so that's the library that we check for here

Nit: s/so //

@@ +218,5 @@
> +            // nvinit.dll is responsible for bootstrapping the DLL injection, so
> +            // so that's the library that we check for here
> +            const wchar_t kNvInitName[] = MOZ_UTF16("nvinit");
> +            if (!_wcslwr_s(fname, mozilla::ArrayLength(fname)) &&
> +                !wcsncmp(fname, kNvInitName, mozilla::ArrayLength(kNvInitName))) {

Why not just use _wcsnicmp() directly?
Comment 48 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-27 16:08:33 PST
Created attachment 8693129 [details] [diff] [review]
Patch (r2)

Carrying forward r+.

(In reply to :Ehsan Akhgari from comment #47)
> Comment on attachment 8693091 [details] [diff] [review]
> Patch
> 
> Review of attachment 8693091 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Great find!
> 
> ::: xpcom/build/nsWindowsDllInterceptor.h
> @@ +146,5 @@
> >  
> >    void Init(const char* aModuleName)
> >    {
> > +    if (!IsCompatible()) {
> > +      return;
> 
> Do you mind adding an NS_WARNING here, please?
Done.

> 
> @@ +164,5 @@
> > +   * this NVIDIA code in our address space and disable NOP space patching if it
> > +   * is. We also check AppInit_DLLs since this is the mechanism that the Optimus
> > +   * drivers use to inject into our process.
> > +   */
> > +  bool IsCompatible()
> 
> Nit: please make this a static function.
Done.

> 
> @@ +189,5 @@
> > +    // won't be loaded once user32 is initialized.
> > +    HKEY hkey = NULL;
> > +    if (!RegOpenKeyExW(HKEY_LOCAL_MACHINE,
> > +          MOZ_UTF16("SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Windows"),
> > +          0, KEY_READ, &hkey)) {
> 
> Since this is in HKLM, we should be asking for the minimum access rights
> that we need to maximize the chance of it working.  I think KEY_QUERY_VALUE
> should be all that you need here.
Done.

> 
> @@ +215,5 @@
> > +          if (!_wsplitpath_s(token, nullptr, 0, nullptr, 0,
> > +                             fname, mozilla::ArrayLength(fname),
> > +                             nullptr, 0)) {
> > +            // nvinit.dll is responsible for bootstrapping the DLL injection, so
> > +            // so that's the library that we check for here
> 
> Nit: s/so //
Fixed

> 
> @@ +218,5 @@
> > +            // nvinit.dll is responsible for bootstrapping the DLL injection, so
> > +            // so that's the library that we check for here
> > +            const wchar_t kNvInitName[] = MOZ_UTF16("nvinit");
> > +            if (!_wcslwr_s(fname, mozilla::ArrayLength(fname)) &&
> > +                !wcsncmp(fname, kNvInitName, mozilla::ArrayLength(kNvInitName))) {
> 
> Why not just use _wcsnicmp() directly?

Done. I'm not sure why I did that. I blame fatigue. :-)
Comment 50 User image zhoubcfan@163.com 2015-11-27 18:28:05 PST
With vs2015 I ran into errors.
nsWindowsDllInterceptor.h(179): error C2440: 'initializing': cannot convert from 'const char16_t [13]' to 'const wchar_t *'
Comment 51 User image Bob Owen (:bobowen) 2015-11-28 00:24:18 PST
Great piece of detective work, Aaron.
I wonder if this ever causes problems for the patching used by the sandbox.
I seem to remember that their patching was a bit different. 

(In reply to zhoubcfan from comment #50)
> With vs2015 I ran into errors.
> nsWindowsDllInterceptor.h(179): error C2440: 'initializing': cannot convert
> from 'const char16_t [13]' to 'const wchar_t *'

I think that all of those MOZ_UTF16s should just be L"...", as we're definitely dealing with wchar_t.
You only need to use MOZ_UTF16 for char16_t, because for VS2013 char16_t = wchar_t, but VS2015 correctly defines char16_t as a separate type.
So, they need a different literal specifier. 
(Not realising this caused me a fair amount of pain on the printing work I've just been doing.)
Comment 53 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-29 20:39:47 PST
(In reply to Bob Owen (:bobowen) from comment #51)
> Great piece of detective work, Aaron.
> I wonder if this ever causes problems for the patching used by the sandbox.
> I seem to remember that their patching was a bit different. 

It might be worth auditing the code to make sure. The Chromium sandbox uses just about every function interception technique under the sun (and them some), so it is possible ;-)
Comment 55 User image Phil Ringnalda (:philor)(back in August) 2015-11-30 10:04:39 PST
It's a shutdown crash, the reported "test" is garbage because it's just whatever was the last test to run in that suite (or in that chunk of the suite, for the suites that run a directory of tests, shut down, and start up again on the next directory).
Comment 56 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2015-11-30 13:37:37 PST
(In reply to Phil Ringnalda (:philor) from comment #55)
> It's a shutdown crash, the reported "test" is garbage because it's just
> whatever was the last test to run in that suite (or in that chunk of the
> suite, for the suites that run a directory of tests, shut down, and start up
> again on the next directory).

Were there any of these failures that you know of that were *not* on x64? The NOP space patcher is disabled on x64...
Comment 57 User image Phil Ringnalda (:philor)(back in August) 2015-11-30 13:53:47 PST
As you can see from the links to the logs, they were all on Win8, which has x64 in the jobname, which is the complete extent of my knowledge of whether or not it is actually x64, much less whether or not something that should be disabled is successfully disabled.

But if you need help retriggering lots of times after you push it to try if you think that the reason these crashes came to a sudden complete stop when I backed you out was coincidence, that's something I actually *do* have info about.
Comment 58 User image Ritu Kothari (:ritu) 2015-12-20 22:00:52 PST
Aaron, I am trying to look through 44+ tracked bug and it seems we have a fix ready but it was backed out due to test failures. Are those test failures still an issue? If not, do you think we should consider uplifting the fix to Beta44? Crash fixes are generally good uplift candidates. Please let me know.
Comment 59 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-07 09:28:48 PST
This should only be an issue on 45+.
Comment 60 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-07 09:30:28 PST
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #59)
> This should only be an issue on 45+.

Clarification: this should only be a *serious* issue on 45+. It will probably improve crash rates across the board, but given the problems we had the first time I checked this in, I'm going to want to let this bake a while before deciding on uplifting.
Comment 61 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-07 09:31:37 PST
Created attachment 8705206 [details] [diff] [review]
Patch  (r3)

Rebased, made some minor fixes to avoid conflicts. Carrying forward r+.
Comment 63 User image Ritu Kothari (:ritu) 2016-01-07 14:16:47 PST
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #60)
> (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #59)
> > This should only be an issue on 45+.
> 
> Clarification: this should only be a *serious* issue on 45+. It will
> probably improve crash rates across the board, but given the problems we had
> the first time I checked this in, I'm going to want to let this bake a while
> before deciding on uplifting.

I was leaning towards wontfix'ing it for FF44 until I looked at crash-stats and those don't like nice. We've had ~115 crashes on 44.0b6 with the first signature here. :( I will still keep this around and come back to in a week or so to see if it has baked enough.
Comment 64 User image Carsten Book [:Tomcat] 2016-01-08 05:51:32 PST
https://hg.mozilla.org/mozilla-central/rev/e135879cff29
Comment 65 User image [:philipp] 2016-01-09 03:56:33 PST
i think the "mozilla::UniquePtr<T>::reset" signature popping up since 44.0b6 is related to this as well, as it has a similar profile of intel/nvidia dual gpus on affected installations:
https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3AUniquePtr%3CT%3E%3A%3Areset&version=44.0b&_facets=signature&_facets=version&_facets=app_notes&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-app_notes

on early beta7 data this is making up 5% of all crashes
Comment 66 User image Andrew McCreight [:mccr8] 2016-01-09 08:24:50 PST
[@ nsDocument::MozExitPointerLock ] also may be related, as a user in bug 1236718 is experiencing both this signature and that signature with upgraded drivers.
Comment 67 User image Andrew McCreight [:mccr8] 2016-01-09 08:37:42 PST
*** Bug 1237240 has been marked as a duplicate of this bug. ***
Comment 68 User image Andrew McCreight [:mccr8] 2016-01-09 08:39:24 PST
*** Bug 1236338 has been marked as a duplicate of this bug. ***
Comment 69 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-09 14:39:04 PST
*** Bug 1236718 has been marked as a duplicate of this bug. ***
Comment 70 User image Wes Kocher (:KWierso) 2016-01-11 11:38:59 PST
This is showing up in my uplift request queue because of the approvals that were given out three months ago. Can we either remove those a+ flags if they no longer are valid or re-request approval for the current release branches if this needs uplifting?
Comment 71 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-11 12:30:21 PST
Comment on attachment 8705206 [details] [diff] [review]
Patch  (r3)

Approval Request Comment
[Feature/regressing bug #]: n/a
[User impact if declined]: Crashes for users with NVIDIA Optimus drivers
[Describe test coverage new/current, TreeHerder]: Covered by tests running on Windows, already on m-c, crash hasn't been seen on Nightly since Jan 9.
[Risks and why]: Low, disables some code in the presence of infringing drivers
[String/UUID change made/needed]: None
Comment 72 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-11 13:18:16 PST
(In reply to philipp from comment #65)
> i think the "mozilla::UniquePtr<T>::reset" signature popping up since 44.0b6
> is related to this as well, as it has a similar profile of intel/nvidia dual
> gpus on affected installations:
> https://crash-stats.mozilla.com/search/
> ?signature=%3Dmozilla%3A%3AUniquePtr%3CT%3E%3A%3Areset&version=44.
> 0b&_facets=signature&_facets=version&_facets=app_notes&_facets=build_id&_colu
> mns=date&_columns=signature&_columns=product&_columns=version&_columns=build_
> id&_columns=platform#facet-app_notes
> 
> on early beta7 data this is making up 5% of all crashes

From what I'm seeing, that sig is also a dup.
Comment 73 User image Ritu Kothari (:ritu) 2016-01-11 14:47:22 PST
Comment on attachment 8705206 [details] [diff] [review]
Patch  (r3)

A top crash fix, let's uplift to Aurora45 soon. Hopefully we can stabilize this fix for a day or two and then take this in 44.0b9.
Comment 75 User image [:philipp] 2016-01-13 10:58:05 PST
the fix seems to have its indented effect - there are no more crashes with the `anonymous namespace''::CreateWindowExWHook signature in recent nightly builds and there are also no new reports of this in yesterday's 45.0a2 build after the fix has landed (whereas there were around 10-30 per day before that).
Comment 76 User image Ritu Kothari (:ritu) 2016-01-13 12:14:50 PST
Comment on attachment 8705206 [details] [diff] [review]
Patch  (r3)

There are some indications on Nightly/Aurora that this fix is helping. Given that this was a top crash, it makes sense to uplift to Beta44.
Comment 77 User image Wes Kocher (:KWierso) 2016-01-13 15:11:22 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/160e194582c5
Comment 79 User image Tracy Walker [:tracy] 2016-01-14 05:34:24 PST
This looked good on Nightly for a few days with no reports after 20160109030208. However, something  is triggering this crash as of yesterdays build (20160113141333) for one user that submitted 9 reports.
Comment 80 User image [:philipp] 2016-01-20 01:15:33 PST
hi, bad news as the "mozilla::UniquePtr<T>::reset" signature is back in 44 rc1. it had been gone in 44.0b8 & 44.0b9 (the fix from comment #77 only landed in beta 9) - so it looks like this signature is appearing arbitrarily.

is it possible that this may be triggered by some compiler optimization?
if so there will probably be a rc2 anyway, so we should keep that under close inspection...
Comment 81 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-01-20 10:08:34 PST
(In reply to philipp from comment #80)
> hi, bad news as the "mozilla::UniquePtr<T>::reset" signature is back in 44
> rc1. it had been gone in 44.0b8 & 44.0b9 (the fix from comment #77 only
> landed in beta 9) - so it looks like this signature is appearing arbitrarily.
> 
> is it possible that this may be triggered by some compiler optimization?
> if so there will probably be a rc2 anyway, so we should keep that under
> close inspection...

No, this is probably due to a deficiency with the detection of the Optimus driver.

If we're going to do something for the next rc, I suggest that we take bug 1240607 which would eliminate this.
Comment 83 User image Ryan VanderMeulen [:RyanVM] 2016-02-10 13:35:28 PST
Backed out from 44 to hopefully address the issues being reported in bug 1243098, bug 1243914, and elsewhere. The backout should ship in Firefox 44.0.2.

https://hg.mozilla.org/releases/mozilla-release/rev/cde34eae03ba
Comment 84 User image Ryan VanderMeulen [:RyanVM] 2016-02-11 10:41:22 PST
Backed out from all other branches as well after IRC discussion with Aaron and Sylvestre.

https://hg.mozilla.org/mozilla-central/rev/576a6dcde5b6
https://hg.mozilla.org/releases/mozilla-aurora/rev/02156c26b928
https://hg.mozilla.org/releases/mozilla-beta/rev/fd105431c25d

I guess one unanswered question is whether we should also take the backout patch we'd been landing on previous releases or not for Fx45 or whether that ship has already sailed.
Comment 85 User image Sylvestre Ledru [:sylvestre] 2016-02-15 09:28:33 PST
Tracking as it is an important source of crash.
Aaron, ok with a new backout? thanks
Comment 86 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-02-15 12:56:00 PST
(In reply to Ryan VanderMeulen [:RyanVM] from comment #84)
> I guess one unanswered question is whether we should also take the backout
> patch we'd been landing on previous releases or not for Fx45 or whether that
> ship has already sailed.

Which backout patch are you referencing, exactly?
Comment 87 User image Ryan VanderMeulen [:RyanVM] 2016-02-15 15:36:19 PST
The one we'd landed on earlier releases.
Comment 88 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-02-17 10:39:21 PST
(In reply to Ryan VanderMeulen [:RyanVM] from comment #87)
> The one we'd landed on earlier releases.

I still don't know which patch I am being asked about here. Please be specific.

With respect to this bug, as long as bug 1240607 stays in the tree (and it is) then as far as I'm concerned we do not need to track this bug any longer. It would still be useful to learn why it has been triggering the problems that it has been, but that does not warrant the urgency of tracking.
Comment 89 User image Ryan VanderMeulen [:RyanVM] 2016-02-17 10:53:28 PST
Comment 7/25.
Comment 90 User image Liz Henry (:lizzard) (needinfo? me) 2016-02-26 08:48:44 PST
From the signatures listed this doesn't look like a topcrash anymore and aaron says in comment 88 that the main issues (startup crashes) are fixed in bug 1240607.  Untracking for 46+. Leaving the 45 tracking for sylvestre though.
Comment 91 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-10-13 15:54:24 PDT
This should be landed on 52 along with bug 1240977 and bug 1240848. The memory problems that were previously an issue in this patch should have been fixed by bug 1249849. Will verify.
Comment 92 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-10-13 16:08:52 PDT
(In reply to Aaron Klotz [:aklotz] from comment #91)
> This should be landed on 52 along with bug 1240977 and bug 1240848. The
> memory problems that were previously an issue in this patch should have been
> fixed by bug 1249849. Will verify.

Yes, the malloc situation is correct now.
Comment 93 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-10-13 16:12:28 PDT
Created attachment 8800905 [details] [diff] [review]
Patch (r4)

Rebased to 52. Carrying forward r+.
Comment 94 User image Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) 2016-10-13 16:16:09 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/fcbf4915f370db4034386ff9f4675ad93863daca
Bug 1218473: Add check for presence of NVIDIA Optimus drivers to WindowsNopSpacePatcher; r=ehsan
Comment 95 User image Carsten Book [:Tomcat] 2016-10-14 03:05:15 PDT
https://hg.mozilla.org/mozilla-central/rev/fcbf4915f370
Comment 96 User image Wes Kocher (:KWierso) 2016-10-17 12:56:09 PDT
This is showing up in my uplift queries because the old patches have beta/aurora approval and the status flags for 50 and 51 are not set. If we aren't uplifting this, can we set the status flags to unaffected or wontfix?

Note You need to log in before you can comment on or make changes to this bug.