Closed Bug 1318858 Opened 3 years ago Closed 2 months ago

e10s crash in dgapi.dll

Categories

(External Software Affecting Firefox :: Other, defect, P3, critical)

x86
Windows 7
defect

Tracking

(firefox50 wontfix, firefox51 wontfix, firefox52 wontfix, firefox-esr52 wontfix, firefox-esr60 wontfix, firefox-esr68 71+ fixed, firefox53 wontfix, firefox56 wontfix, firefox57 wontfix, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix, firefox69 wontfix, firefox70 wontfix, firefox71 fixed, firefox72 fixed)

RESOLVED FIXED
Tracking Status
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 71+ fixed
firefox53 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fixed
firefox72 --- fixed

People

(Reporter: philipp, Assigned: toshi, NeedInfo)

Details

(Keywords: crash, regression, Whiteboard: [AV:Digital Guardian])

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-cc510000-1cea-4f3d-9673-d313f2161119.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
Ø 0 	dgapi.dll 	dgapi.dll@0xa7e81 	
1 		@0x29aef51 	
2 	xul.dll 	XRE_InitEmbedding2 	toolkit/xre/nsEmbedFunctions.cpp:176
3 	xul.dll 	mozilla::ipc::ScopedXREEmbed::Start() 	ipc/glue/ScopedXREEmbed.cpp:106
4 	xul.dll 	mozilla::dom::ContentProcess::Init() 	dom/ipc/ContentProcess.cpp:123
5 	xul.dll 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp:647
6 	firefox.exe 	content_process_main(int, char** const) 	ipc/contentproc/plugin-container.cpp:224
7 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
8 	ucrtbase.dll 	remainderf 	
9 	ucrtbase.dll 	remainderf 	
Ø 10 	ntdll.dll 	ntdll.dll@0x36fe8 	
Ø 11 	kernelbase.dll 	kernelbase.dll@0x11ad4 	
Ø 12 	kernelbase.dll 	kernelbase.dll@0x11abd 	
Ø 13 	kernelbase.dll 	kernelbase.dll@0x3726f 	
14 	firefox.exe 	_SEH_epilog4 	
Ø 15 	ntdll.dll 	ntdll.dll@0x75844 	
Ø 16 	ntdll.dll 	ntdll.dll@0xc7513 	
17 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:286 

This is a crash in the content process of the browser that became more common after the Firefox 50 release. dgapi.dll is part of the Digital Guardian software by Verdasys, which hooks into Firefox. Currently this accounts for around 1% of content crashes in Firefox 50.

According to some user comments, their tabs crash as soon as they try to navigate to a website:
"since I upgraded Mozilla, I cannot open any of my favourites... and any other pages!!!!... I keep receiving "This tab has crashed" alert. What is happening???"
"can't open any websites, even tried google"
Flags: needinfo?(cloiselle)
A workaround for this particular crash is to set browser.tabs.remote.autostart.2 to false (in FF50 it is set to "true" by default).
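For reference, the same workaround can be applied without about:config by adding the preference to a user.js file in the Firefox profile folder, which Firefox reads at startup. This is a minimal sketch of that config line, assuming nothing else (an add-on, policy, or later Firefox default) forces e10s on:

```javascript
// user.js in the Firefox profile folder (read at every startup).
// Turns off the e10s rollout pref so pages are not rendered in a separate
// content process, sidestepping the dgapi.dll content-process crash.
user_pref("browser.tabs.remote.autostart.2", false);
```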
I am having this issue too.  The workaround was effective.  Thanks!
However, Flash doesn't work properly with the workaround.  It fails to display properly sometimes, especially the first Flash page to be displayed.
blocklist candidate
Flags: needinfo?(cloiselle)
Priority: -- → P3
Whiteboard: [AV:Digital Guardian]
Are you still able to reproduce this crash (without the workaround)?
Would you be willing to test a try build and see if the crash is fixed?
Flags: needinfo?(b.k.wieczorek)
Flags: needinfo?(WPWoodJr+Bugzilla)
BTW, this mostly affects ESR. In the past two weeks:
Rank  Version     Crashes  Share
1     52.4.0esr   389      50.65 %
2     52.3.0esr   226      29.43 %
3     56.0        29       3.78 %
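The "mostly affects ESR" point follows from adding up the two ESR rows. A small JavaScript sketch of that arithmetic, using only the three rows quoted above (other versions in the two-week window are not listed, so this covers the quoted rows only):

```javascript
// Crash counts and shares per Firefox version for dgapi.dll, as quoted above.
const rows = [
  { version: "52.4.0esr", crashes: 389, share: 50.65 },
  { version: "52.3.0esr", crashes: 226, share: 29.43 },
  { version: "56.0", crashes: 29, share: 3.78 },
];

// Sum crash counts and percentage shares over the ESR builds only.
function esrTotals(rows) {
  const esr = rows.filter((r) => r.version.endsWith("esr"));
  return {
    crashes: esr.reduce((sum, r) => sum + r.crashes, 0),
    share: esr.reduce((sum, r) => sum + r.share, 0),
  };
}

const t = esrTotals(rows);
console.log(t.crashes, t.share.toFixed(2)); // prints: 615 80.08
```

So the two ESR 52 builds alone account for roughly 80% of the dgapi.dll crashes in that window.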

Wontfixing for 57. If it is a safe blocklist, we could uplift to ESR.
Crash Signature: [@ dgapi.dll@0xa7e81] [@ dgapi.dll@0xa8161] → [@ dgapi.dll@0xa7e81] [@ dgapi.dll@0xa8161] [@ dgapi.dll@0x16474b] [@ dgapi.dll@0xa66c7] [@ dgapi.dll@0x121090]
I haven't had this crash for almost a year now.  I think maybe the Digital Guardian software was updated.
Flags: needinfo?(WPWoodJr+Bugzilla)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
(In reply to Bill Wood from comment #7)
> I haven't had this crash for almost a year now.  I think maybe the Digital
> Guardian software was updated.

However still plenty of people crashing.  https://crash-stats.mozilla.com/search/?signature=~dgapi.dll&product=Firefox&date=%3E%3D2018-01-26T04%3A52%3A26.000Z&date=%3C2018-04-26T05%3A52%3A26.000Z&_sort=-date&_facets=signature&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version

Perhaps on old versions?
Flags: needinfo?(b.k.wieczorek)
Maybe we should contact the developers of this software. This way we will know if they are aware of this problem and, if they have fixed it, in which versions they have fixed it.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Bill, would you be willing to test a special Firefox build which prevents this software from injecting? I know you're not affected by the crash anymore, but if we can verify the block works and doesn't cause issues, we can solve the problems for the users who are still affected.
Flags: needinfo?(WPWoodJr+Bugzilla)
(In reply to Marco Castelluccio [:marco] from comment #10)
> Bill, would you be willing to test a special Firefox build which prevents
> this software from injecting? I know you're not affected by the crash
> anymore, but if we can verify the block works and doesn't cause issues, we
> can solve the problems for the users who are still affected.

OK sure, how?
Flags: needinfo?(WPWoodJr+Bugzilla) → needinfo?(mcastelluccio)
You can download the build from here: https://queue.taskcluster.net/v1/task/USFw899tSjCJVzF0OD0wEQ/runs/0/artifacts/public/build/target.zip.

After extracting it and launching it, you can run the following script in the Browser Console:
// Find the tab that is currently selected in the most recent browser window.
let wm = Cc["@mozilla.org/appshell/window-mediator;1"].
         getService(Ci.nsIWindowMediator);
let win = wm.getMostRecentWindow("navigator:browser");
let browser = win.gBrowser.selectedBrowser;
// Only remote (e10s) tabs run in a content process. The data: frame script runs
// there and, if it is not in the parent process, uses js-ctypes to read from
// address 8, deliberately crashing that content process.
if (browser.isRemoteBrowser) {
  browser.messageManager.loadFrameScript('data:,let appInfo = Components.classes["@mozilla.org/xre/app-info;1"];if (appInfo && appInfo.getService(Components.interfaces.nsIXULRuntime).processType != Components.interfaces.nsIXULRuntime.PROCESS_TYPE_DEFAULT) {Components.utils.import("resource://gre/modules/ctypes.jsm");var zero = new ctypes.intptr_t(8);var badptr = ctypes.cast(zero, ctypes.PointerType(ctypes.int32_t));var crash = badptr.contents;}', true);
}

The tab you have selected should crash. You can restore it, then navigate to "about:crashes" and send us the link to the crash.
Flags: needinfo?(mcastelluccio) → needinfo?(WPWoodJr+Bugzilla)
Crash Signature: [@ dgapi.dll@0xa7e81] [@ dgapi.dll@0xa8161] [@ dgapi.dll@0x16474b] [@ dgapi.dll@0xa66c7] [@ dgapi.dll@0x121090] → [@ dgapi.dll@0xa7e81] [@ dgapi.dll@0xa8161] [@ dgapi.dll@0x16474b] [@ dgapi.dll@0xa66c7] [@ dgapi.dll@0x121090] [@ dgapi.dll@0x1864c1] [@ GetMBNoDefault]

Adding 69 and 70 as affected.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Is this something we could block at this point? Feel free to redirect that NI as needed too :)

Redirecting needinfo to Toshihito who hopefully can obtain a copy of Digital Guardian to test in a VM.

Flags: needinfo?(aklotz) → needinfo?(tkikuchi)

Unfortunately I cannot find a way to get the Digital Guardian product. From the dump, the version of the DG agent is 7.5.0.112. We can add dgapi64.dll to the block list, but I'm not sure it's safe because we don't have a local repro.

Flags: needinfo?(tkikuchi)

One thing we could do is examine the call stacks for this dll once third-party-modules is fully operational. Once we see the stacks, we can get a better idea of how dangerous a block might be.

We can see the call stacks in https://crash-stats.mozilla.org/report/index/66edca38-7688-40d7-821f-9aff60190911#tab-rawdump.

The crashing thread is "Socket Thread", where dgapi64.dll was calling WideCharToMultiByte, but the wide-string buffer it passed was inaccessible, resulting in a read access violation.

Given that dgapi64.dll was passing an invalid address, I don't think this is Firefox's issue, but I'm concerned that blocking dgapi64.dll may affect the behavior of Digital Guardian. I can see that Digital Guardian injected other modules such as COM_Sensor64, ame_smtpsensor64, ame_outlooksensor64, OS_Plugin64, and AE_MailSensor_Plugin64.

Allow me to clarify: the call stacks in third-party-modules show the call stack at the time of the DLL injection, so we can get a better idea of whether blocking that injection will have an adverse effect on anything.

Ah, I see what you mean. I quickly checked the current third-party-modules ping, but it seems that we don't have any loading events for dgapi yet.

Assignee: nobody → tkikuchi

We still don't have data about dgapi.dll in the new ping, but we have some in the old Untrusted Modules ping.

I created this report. It looks like all instances indicate dgapi/dgapi64 was loaded via user32!_ClientLoadLibrary, which means Digital Guardian uses SetWindowsHookEx to inject its DLL. Looking at the crash call stack carefully, dgapi64's code was invoked via xul!_imp_PR_Write.

I randomly picked about 10 reports crashing at GetMBNoDefault, and in all of them the version of dgapi64/dgapi was 7.5.0.xxx. The report above has some data from 7.5.1.65, but I cannot find that version in the crash reports. We could start by blocking the older (7.5.0) versions of dgapi.
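To illustrate what blocking only older versions means in practice: a version-capped block entry compares the module's four-part file version against a maximum and blocks anything at or below it. The sketch below is hypothetical (packVersion, isBlocked, and MAX_BLOCKED are made-up names, and packing four 16-bit fields into one 64-bit value is an assumed scheme for illustration), not Firefox's actual blocklist code.

```javascript
// Hypothetical helper: pack a four-part version "a.b.c.d" into one 64-bit
// BigInt, 16 bits per field, so versions compare as plain integers.
// Blank or malformed versions throw; deciding what to do with them is a
// separate policy question.
function packVersion(str) {
  if (!/^\d+\.\d+\.\d+\.\d+$/.test(str)) throw new Error(`bad version: "${str}"`);
  const [a, b, c, d] = str.split(".").map((p) => BigInt(p));
  for (const p of [a, b, c, d]) {
    if (p > 0xffffn) throw new Error(`field out of range: ${str}`);
  }
  return (a << 48n) | (b << 32n) | (c << 16n) | d;
}

// Hypothetical cutoff: block every 7.5.0.x build and anything older.
const MAX_BLOCKED = packVersion("7.5.0.65535");

function isBlocked(version) {
  return packVersion(version) <= MAX_BLOCKED;
}

console.log(isBlocked("7.5.0.112")); // true  (version seen in the dump)
console.log(isBlocked("7.5.1.65"));  // false (newer version seen in telemetry)
```

Note that under such a cutoff every 7.4.x build is blocked too, since those versions sort below 7.5.0.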

Here are a couple of callstack examples loading dgapi from the report:

0 mozglue!static long patched_LdrLoadDll(wchar_t *, unsigned long *, struct _UNICODE_STRING *, void * *)+0xe32
1 kernelbase!LoadLibraryExW+0x17b
2 user32!_ClientLoadLibrary+0xa6
3 ntdll!KiUserCallbackDispatcherContinue+0x0
4 win32u!NtUserPeekMessage+0x14
5 user32!_PeekMessage(tagMSG *,HWND__ *,unsigned int,unsigned int,unsigned int,int)+0x33
6 user32!PeekMessageW+0x105
7 msctf!CThreadInputMgr::PeekMessageW(tagMSG *,HWND__ *,unsigned int,unsigned int,unsigned int,int *)+0x96
8 xul!nsAppShell::ProcessNextNativeEvent(bool)+0xa5
9 xul!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal *,bool)+0xcc
...
0 mozglue!static long patched_LdrLoadDll(wchar_t *, unsigned long *, struct _UNICODE_STRING *, void * *)+0xd0b
1 kernelbase!LoadLibraryExW+0x17b
2 user32!_ClientLoadLibrary+0xa6
3 ntdll!KiUserCallbackDispatcherContinue+0x0
4 win32u!NtUserShowWindow+0x14
5 xul!nsWindow::Show(bool)+0x216
6 xul!nsXULWindow::SetVisibility(bool)+0xb1
7 xul!void nsXULWindow::OnChromeLoaded()+0x52
8 xul!nsresult nsWebShellWindow::OnStateChange(class nsIWebProgress *, class nsIRequest *, unsigned int, nsresult)+0x77
9 xul!void nsDocLoader::DoFireOnStateChange(class nsIWebProgress *, class nsIRequest *, int & const, const nsresult)+0x167
...
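The conclusion that Digital Guardian injects through a window hook rests on one frame in these stacks: user32!_ClientLoadLibrary, which the comment above reads as a sign of SetWindowsHookEx-based injection. A rough JavaScript sketch of that check over plain-text stacks like the two above; the function names and the "N module!symbol+offset" line format are assumptions, and a match is a heuristic, not proof.

```javascript
// Parse one frame line such as "2 user32!_ClientLoadLibrary+0xa6" into
// { module, symbol }. Lines that do not match (e.g. "...") yield null.
function parseFrame(line) {
  const m = /^\s*\d+\s+([^!\s]+)!(.+?)(?:\+0x[0-9a-f]+)?\s*$/i.exec(line);
  return m ? { module: m[1].toLowerCase(), symbol: m[2] } : null;
}

// True if any frame is user32!_ClientLoadLibrary, i.e. the DLL load was
// requested from a win32k user-mode callback rather than by Firefox itself.
function loadedViaWindowHook(stackText) {
  return stackText
    .split("\n")
    .map(parseFrame)
    .some((f) => f !== null && f.module === "user32" && f.symbol === "_ClientLoadLibrary");
}

const sample = `
0 mozglue!static long patched_LdrLoadDll(wchar_t *, unsigned long *, struct _UNICODE_STRING *, void * *)+0xe32
1 kernelbase!LoadLibraryExW+0x17b
2 user32!_ClientLoadLibrary+0xa6
3 ntdll!KiUserCallbackDispatcherContinue+0x0
4 win32u!NtUserPeekMessage+0x14
...`;

console.log(loadedViaWindowHook(sample)); // true
```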
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2dd803221c80
Block Digital Guardian's module which crashes Firefox.  r=aklotz
Status: REOPENED → RESOLVED
Closed: 2 years ago → 2 months ago
Resolution: --- → FIXED

Please nominate this for Beta and ESR68 approval when you get a chance.

Flags: needinfo?(tkikuchi)

Comment on attachment 9105410 [details]
Bug 1318858 - Block Digital Guardian's module which crashes Firefox. r=aklotz

Beta/Release Uplift Approval Request

  • User impact if declined: This is a crash caused by a third-party module (dgapi/dgapi64.dll of Digital Guardian) injected into the Firefox process. This patch blocks that particular module from being injected. Without this patch, Firefox crashes if Digital Guardian is installed.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change is to add a new entry to the DLL blocklist. No logic change.
  • String changes made/needed: None
Flags: needinfo?(tkikuchi)
Attachment #9105410 - Flags: approval-mozilla-beta?

The attached patch will not fit ESR68. I'll create a new patch.

Comment on attachment 9105410 [details]
Bug 1318858 - Block Digital Guardian's module which crashes Firefox. r=aklotz

Low risk crash fix by blocking a third-party module, uplift approved for 71 beta 8, thanks.

Attachment #9105410 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9107092 [details]
Bug 1318858 - Block Digital Guardian's module which crashes Firefox. r=aklotz

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is a crash caused by a third-party module (dgapi/dgapi64.dll of Digital Guardian) injected into the Firefox process. We can see a number of ESR68 users hitting this crash in crash reports.
  • User impact if declined: If enterprise customers use older versions of Digital Guardian, they will continue to encounter sudden Firefox crashes. Although the root cause is not in Firefox, we can stop the crash by blocking dgapi/dgapi64.dll.
  • Fix Landed on Version: 72.0a1, 71.0
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change is to add a new entry to the DLL blocklist. No logic change.
  • String or UUID changes made by this patch: None
Attachment #9107092 - Flags: approval-mozilla-esr68?

Comment on attachment 9107092 [details]
Bug 1318858 - Block Digital Guardian's module which crashes Firefox. r=aklotz

Blocklist to avoid a crash, OK for uplift for ESR 68.3.

Attachment #9107092 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

With the fix provided here, I have been unable to visit any sites with Firefox Nightly since 73.0a1 on my work laptop, which has Digital Guardian Agent 7.4.2 installed.
When I comment out the changes here, it works fine. Also, since FF Nightly 71.0a1 or 72.0a1, I have been getting an error dialog: "The application was unable to start correctly (0xc0000022)."
Since the DG Agent is installed by our IT team, I cannot upgrade or disable it.

Some of the history of what I tried is here:
https://www.reddit.com/r/firefox/comments/e86y34/730a1_doesnt_load_tabs_and_extensions/

I am not sure whether I should reopen this bug or file a new one.

(In reply to Phani Bhushan Tholeti from comment #36)

Thank you for narrowing down the problem and reporting it. Would you file a new bug to handle your case and add me to CC?

We also want to know the version of dgapi64.dll (or dgapi.dll) you have. When you open a new bug, can you attach a screenshot of the Details tab of dgapi64.dll's properties dialog? Based on telemetry data we have, dgapi64.dll (or dgapi.dll) will be found in %SystemRoot%\system32.

A report that Bug 1318858 has not been fixed has been posted to the mozillaZine.jp forum.
The crash report of the forum reporter is https://crash-stats.mozilla.com/report/index/7ee014ac-9697-4bf1-9590-859950191209

I saw https://hg.mozilla.org/releases/mozilla-beta/rev/238c9916f2c9 .
It seems that version 7.5.5.xxx of dgapi.dll and dgapi64.dll was added to the block list.

According to the signature report for GetMBNoDefault, there have been 160 crashes on Firefox 71.0.
Looking at those crash reports, the version of dgapi64.dll is often 7.5.0.xxx, as with the forum reporter.
In the other cases, the version is 7.4.1.xxx or blank.

I think this bug should be reopened.

See Also: → 1603974
See Also: → 1603335

Looking at the dumps, it seems that our blocklist feature was disabled. Technically this is a different issue. I filed Bug 1603974 to track this.

I also did a broad analysis of dgapi's module versions, linked below. We can conclude that the crashing versions are 7.5.0.x, 7.4.1.x, or blank.

https://sql.telemetry.mozilla.org/queries/67087
https://iodide.telemetry.mozilla.org/notebooks/391/?viewMode=report
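The grouping above can be expressed as a simple bucketing of reported module versions. A sketch, assuming versions arrive as dotted strings (empty or missing when the module has no version resource); the function names and bucket labels are illustrative, and the inputs in the example are made up.

```javascript
// Assign a reported dgapi/dgapi64 version string to one of the groups named
// in the analysis above: 7.5.0.x, 7.4.1.x, blank, or anything else.
function versionBucket(version) {
  const v = (version || "").trim();
  if (v === "") return "blank";
  if (v.startsWith("7.5.0.")) return "7.5.0.x";
  if (v.startsWith("7.4.1.")) return "7.4.1.x";
  return "other";
}

// Tally how many reports fall into each bucket.
function tally(versions) {
  const counts = {};
  for (const v of versions) {
    const b = versionBucket(v);
    counts[b] = (counts[b] || 0) + 1;
  }
  return counts;
}

// Illustrative inputs only: one version per bucket.
console.log(tally(["7.5.0.112", "7.4.1.20", "", "7.5.1.65"]));
```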
