Closed Bug 883134 Opened 12 years ago Closed 6 years ago

crash in nsPluginHost::StopPluginInstance mainly with Adicional de Seguranca CAIXA add-on

Categories

(Core Graveyard :: Plug-ins, defect, P3)

x86
Windows 7
defect

Tracking

(firefox30+ affected, firefox31- affected, firefox32 ?)

RESOLVED WORKSFORME
Tracking Status
firefox30 + affected
firefox31 - affected
firefox32 --- ?

People

(Reporter: scoobidiver, Unassigned)

References

Details

(5 keywords, Whiteboard: [startupcrash])

Crash Data

Attachments

(1 file)

It's #69 browser crasher in 21.0 and #60 in 22.0b4. It might be bug 840606.

Signature nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)
UUID 5f47501b-405e-45c4-bea4-7fedd2130606
Date Processed 2013-06-06 22:38:34
Uptime 6
Last Crash 3.0 days before submission
Install Age 3.0 hours since version was first installed.
Install Time 2013-06-06 19:39:34
Product Firefox
Version 24.0a1
Build ID 20130606031128
Release Channel nightly
OS Windows NT
OS Version 6.1.7601 Service Pack 1
Build Architecture x86
Build Architecture Info AuthenticAMD family 16 model 5 stepping 2
Crash Reason EXCEPTION_ACCESS_VIOLATION_EXEC
Crash Address 0x650064
User Comments
App Notes AdapterVendorID: 0x1002, AdapterDeviceID: 0x954f, AdapterSubsysID: 22711787, AdapterDriverVersion: 8.970.100.3000
D2D? D2D+ DWrite? DWrite+
Processor Notes sp-processor01_phx1_mozilla_com_17437:2012
EMCheckCompatibility True
Adapter Vendor ID 0x1002
Adapter Device ID 0x954f
Total Virtual Memory 4294836224
Available Virtual Memory 3920560128
System Memory Use Percentage 55
Available Page File 5858197504
Available Physical Memory 1922928640

Frame Module Signature Source
0 @0x650064
1 xul.dll nsPluginHost::StopPluginInstance dom/plugins/base/nsPluginHost.cpp:3127
2 xul.dll mozilla::dom::FragmentOrElement::QueryInterface content/base/src/FragmentOrElement.cpp:1708
3 xul.dll nsObjectLoadingContent::DoStopPlugin content/base/src/nsObjectLoadingContent.cpp:2580
4 xul.dll nsObjectLoadingContent::StopPluginInstance content/base/src/nsObjectLoadingContent.cpp:2632
5 xul.dll nsObjectLoadingContent::UnloadObject content/base/src/nsObjectLoadingContent.cpp:2191
6 xul.dll CheckPluginStopEvent::Run content/base/src/nsObjectLoadingContent.cpp:176
7 xul.dll nsBaseAppShell::RunSyncSectionsInternal widget/xpwidgets/nsBaseAppShell.cpp:352
8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:640
9 xul.dll NS_ProcessNextEvent obj-firefox/xpcom/build/nsThreadUtils.cpp:238
10 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:474
...

More reports at: https://crash-stats.mozilla.com/report/list?signature=nsPluginHost%3A%3AStopPluginInstance%28nsNPAPIPluginInstance*%29
Hm, how do you know it's a start-up crash?
Priority: -- → P3
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> Hm, how do you know it's a start-up crash?
85% of crashes happen within one minute. The startupcrash keyword is set when it's above 50%. It can be either a plugin-container.exe startup crash or a firefox.exe one.
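The rule described here can be sketched as a small calculation. This is only an illustration: the function names, the constants, and the sample uptimes are assumptions made for the example, not the actual crash-stats implementation.

```javascript
// Hypothetical sketch of the startupcrash rule described above:
// flag a signature when more than half of its crashes occur within one minute of uptime.
const STARTUP_UPTIME_SECONDS = 60; // "within one minute"
const STARTUP_SHARE_THRESHOLD = 0.5; // "set when it's above 50%"

// Fraction of crashes whose uptime is below the startup cutoff.
function startupShare(uptimesInSeconds) {
  if (uptimesInSeconds.length === 0) return 0;
  const early = uptimesInSeconds.filter((u) => u < STARTUP_UPTIME_SECONDS).length;
  return early / uptimesInSeconds.length;
}

function isStartupCrash(uptimesInSeconds) {
  return startupShare(uptimesInSeconds) > STARTUP_SHARE_THRESHOLD;
}

// Made-up sample: 17 of 20 crashes (85%) occur within the first minute.
const sampleUptimes = [...Array(17).fill(6), 300, 1200, 5400];
```

Note that the share only looks at process uptime, so it cannot tell a plugin-container.exe startup crash from a firefox.exe one.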
It's now the #32 browser crasher in 22.0 and #23 in 23.0b6. Almost all comments are in Portuguese. Here is the top correlation per extension on July 18:

* 22.0:
94% (204/218) vs. 0% (490/110146) {87F8774F-B485-47E2-A755-A40A8A5E886D} (no version because of bug 879349)

* 23.0 Beta:
95% (88/93) vs. 1% (205/39762) {87F8774F-B485-47E2-A755-A40A8A5E886D} (Adicional de Seguranca CAIXA)
    0% (0/93) vs. 0% (1/39762) 2.3.5.30
    0% (0/93) vs. 0% (3/39762) 2.6.3.10
    0% (0/93) vs. 0% (6/39762) 3.0.0
    95% (88/93) vs. 0% (195/39762) 3.2.0
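For readers unfamiliar with these correlation lines, here is a minimal sketch of how one can be read and recomputed. It assumes the first fraction counts crashes with this signature that had the extension installed and the second counts all crashes for that version; the function name `formatCorrelation` is hypothetical.

```javascript
// Rebuild one correlation line from its raw counts.
// Assumed reading: (sigWith/sigTotal) is the share among crashes with this signature,
// (allWith/allTotal) is the share among all crashes for the same version.
function formatCorrelation(sigWith, sigTotal, allWith, allTotal, label) {
  const pct = (n, d) => Math.round((n / d) * 100); // whole percent
  return `${pct(sigWith, sigTotal)}% (${sigWith}/${sigTotal}) vs. ` +
    `${pct(allWith, allTotal)}% (${allWith}/${allTotal}) ${label}`;
}

// Counts taken from the 23.0 Beta row above.
const betaRow = formatCorrelation(88, 93, 205, 39762, "Adicional de Seguranca CAIXA");
```

A large gap between the two percentages is what makes the extension stand out: it is present in almost every crash with this signature but in only a tiny share of crashes overall.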
Summary: crash in nsPluginHost::StopPluginInstance → crash in nsPluginHost::StopPluginInstance mainly with Adicional de Seguranca CAIXA add-on
Looks like this is an extension for Brazilian Internet banking.
So it seems like StopPluginInstance is getting a dead pointer - but it's being called by DoStopPlugin which is holding it as a ref pointer retrieved from InstanceOwner which holds mInstance in a ref pointer as well. If InstanceOwner were dead we should be seeing DoStopPlugin crashes calling into it as well. Something might be doing manual ref counting on nsNPAPIPluginInstance, or the plugin itself might be doing something stupid.
This extension is for the federal bank in Brazil and found here: http://www1.caixa.gov.br/download/asp/download.asp -> Internet Banking Caixa -> Utilitários I can't reproduce a start-up crash though. So, if they would be doing manual ref-counting all bets are off anyway. I wonder if it may be possible to trigger bad behavior when an extension tries to use plugin tags and plugins too early during startup - i'll do a scan of their JS tomorrow to see if anything stands out.
(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> This extension is for the federal bank in Brazil and found here:
> http://www1.caixa.gov.br/download/asp/download.asp -> Internet Banking Caixa
> -> Utilitários
>
> I can't reproduce a start-up crash though.
>
> So, if they would be doing manual ref-counting all bets are off anyway.
> I wonder if it may be possible to trigger bad behavior when an extension
> tries to use plugin tags and plugins too early during startup - i'll do a
> scan of their JS tomorrow to see if anything stands out.

These are main-process crashes, so the plugin would have to be doing something especially horrendous to cause this. The startup crash flag may just be caused by session restore or the user reloading the bad pages and causing repeat crashes, e.g. indicating that some page is causing a 100% crash rate.
Keywords: needURLs
Attached file xpi_excerpt.zip
Here are the extension's beautified JS files. On "toplevel-window-ready", this creates a hiddenDOMWindow via nsIAppShellService, in which it embeds its NPAPI plugin. There is nothing obviously weird standing out to me though.
Priority: P3 → P2
Tracy, can you handle the needURLs?
Flags: needinfo?(twalker)
Priority: P2 → P3
These are the crashes with more than one per page:

Total Count  URL
8            about:blank
4            https://www.facebook.com/
2            http://www.yandex.ru/

Beyond that are a bunch of one crash per page from various sites around the globe.
Flags: needinfo?(twalker)
Thanks Tracy. So the URLs are not helpful.

I *may* have hit something related:
* I still accidentally had that extension enabled
* started a youtube video
* suspended one of the plugin containers (probably the extension's plugin? not sure though)
* chose "stop" on the "unresponsive plugin" dialog
* I hit a browser crash: bp-7d36501a-d063-4bde-bef0-e39df2130725

I haven't had success reproducing this, but if this was related to this extension then the callstack looks to support the theory of lifetime/refcount issues.
Priority: P3 → P2
Does the addon specifically turn off IPC for its plugin using prefs? Note that many of these stacks look like we're in nested event loops, although there are stackwalking issues which truncate some of them.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> Does the addon specifically turn off IPC for its plugin using prefs?
No, it runs OOP.
Keywords: needURLs
This is still a significant issue: a browser process crash that happens within the first minute of uptime in >80% of cases. It now sits at #9 for 30.0b and is one of the largest regressions between beta 29 and beta 30, with these correlations:

Add-ons
56% (111/198) vs. 1% (239/32850) {87F8774F-B485-47E2-A755-A40A8A5E886C}
25% (50/198) vs. 0% (110/32850) {87F8774F-B485-47E2-A755-A40A8A5E8873}
20% (39/198) vs. 0% (99/32850) {87F8774F-B485-47E2-A755-A40A8A5E8874}

From what I find, those are all "Módulo de Segurança" from "GAS Tecnologia", in versions for "Banco do Brasil" (bb), "Banco Itau Unibanco" (uni) and "Banco Santander Brasil S.A." (abn). This points to the same module that we were already dealing with in bug 704125 and bug 838568, except that their binaries now don't even appear in the crashes. I guess we crash before loading them successfully or something like that.
Keywords: topcrash-win
Given that this is a top-10 crash and significantly worse in 30 beta than in 29 beta, I'm nominating for tracking.
Jorge, is it possible to check if those addons have recently updated and if we're hitting the crash on specific versions can we look at blocklisting?
Flags: needinfo?(jorge)
(In reply to Lukas Blakk [:lsblakk] from comment #16)
> Jorge, is it possible to check if those addons have recently updated and if
> we're hitting the crash on specific versions can we look at blocklisting?

According to the download page from comment #6, all installers were updated within a month or two. That doesn't necessarily mean they fixed the crash, though. It'd be good to get a version breakdown for those IDs, if there is one.
Flags: needinfo?(jorge)
Flags: needinfo?(benjamin)
There are apparently some issues with correlations, as I see 0-size files for some of the last days, but this is what I found in 20140518_Firefox_30.0-interesting-addons-with-versions.txt.gz:

53% (123/234) vs. 1% (276/36184) {87F8774F-B485-47E2-A755-A40A8A5E886C}
    0% (0/234) vs. 0% (1/36184) 3.4.0
    53% (123/234) vs. 1% (273/36184) 3.4.1
    0% (0/234) vs. 0% (2/36184) 3.7.1
27% (63/234) vs. 0% (153/36184) {87F8774F-B485-47E2-A755-A40A8A5E8873}
    1% (2/234) vs. 0% (4/36184) 3.7.0
    1% (2/234) vs. 0% (5/36184) 3.7.1
    25% (59/234) vs. 0% (144/36184) 3.8.0.1
26% (60/234) vs. 0% (143/36184) {87F8774F-B485-47E2-A755-A40A8A5E8874}
    1% (2/234) vs. 0% (4/36184) 3.2.0
    0% (0/234) vs. 0% (1/36184) 3.4.0
    1% (2/234) vs. 0% (6/36184) 3.7.0
    24% (56/234) vs. 0% (132/36184) 3.7.1

Looks to me like it just matches the versions in circulation and isn't only happening on one. That said, given that it regressed significantly from a low-volume into a top-10 crash from beta 29 to beta 30, I expect that some change in our plugin code aggravated this crash.
Flags: needinfo?(benjamin)
So, looking at the data in comment #19, I think it would make sense to block:
* {87F8774F-B485-47E2-A755-A40A8A5E886C}, v 3.4.1 and lower
* {87F8774F-B485-47E2-A755-A40A8A5E8873}, v 3.8.0.1 and lower
* {87F8774F-B485-47E2-A755-A40A8A5E8874}, v 3.7.1 and lower

Most likely this will block all versions for those add-ons, and block a bunch of people in Brazil from using their internet banking. Are we okay with this?
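To make the proposed ranges concrete, here is a minimal sketch of an "this version and lower" check. The IDs and maximum versions come from the comment above; the helper names and the simplified dotted-number comparison are assumptions and are not how the Firefox blocklist actually compares versions.

```javascript
// Proposed ranges from the comment above: block the listed version and lower.
const proposedBlocks = {
  "{87F8774F-B485-47E2-A755-A40A8A5E886C}": "3.4.1",
  "{87F8774F-B485-47E2-A755-A40A8A5E8873}": "3.8.0.1",
  "{87F8774F-B485-47E2-A755-A40A8A5E8874}": "3.7.1",
};

// Simplified comparison (assumption): dotted numeric parts only, missing parts count as 0.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// True when the add-on ID is listed and the version is at or below the listed maximum.
function wouldBeBlocked(addonId, version) {
  const maxVersion = proposedBlocks[addonId.toUpperCase()];
  return maxVersion !== undefined && compareVersions(version, maxVersion) <= 0;
}
```

For example, `wouldBeBlocked("{87F8774F-B485-47E2-A755-A40A8A5E8873}", "3.8.0.1")` returns `true`, while any ID outside the three listed returns `false`.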
Flags: needinfo?(lsblakk)
I do not think blocking is the right thing to do here. This would just break internet banking for the affected users. If we go that route, we at least need to talk to GAS Tecnologia first. This sounds a lot like an issue in our code, otherwise it would not have flared up with the switch from 29 to 30 on beta. To my eyes, the rise of bug 1013972 looks similar. Jon, does it look like that to you as well?
Flags: needinfo?(jcoppeard)
I think the cause of bug 1013972 is my changes in bug 934421, but these changes were present in version 29. This crash seems to have shot up in 30 so I think there may be something else at play here.
Flags: needinfo?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #23)
> I think the cause of bug 1013972 is my changes in bug 934421, but these
> changes were present in version 29. This crash seems to have shot up in 30
> so I think there may be something else at play here.

Actually, bug 1013972 only shot up in 30 as well.
If the patch in bug 1013972 reduces the crashes, then we will not need to block (and it doesn't seem like a good idea to do that anyway).
Flags: needinfo?(lsblakk)
So, bug 1013972 did not help this one and it looks like Internet banking still crashes for our Brazilian users. Release Managers, do we have a contact for GAS Tecnologia? We should try to work with them to reproduce and debug this issue, which is threatening Firefox 30 as I don't think we can afford to lose those Brazilian users that use Internet banking.
Flags: needinfo?(release-mgmt)
There is a plugin as part of this addon. It runs OOPP, which is why it doesn't show up in module correlations. The binary plugin is probably not directly responsible for the crash here, which I suspect has something to do with the JS code in the addon. I have a copy forwarded from a live install that Felipe did, if anyone else wants to see. Here's some de-obfuscated tidbits:

The primary function appears to be a content policy which is used to do greasemonkey-like injection and some other stuff.

It appears that the following script runs in a chrome context, and the "document" is the chrome hidden window.

var EXPORTED_SYMBOLS = ["sfbb"];

function sfbb(doc) {
  this._document = doc
}
sfbb.prototype = {
  myid: 'xyxyxyxy-xyxy-yxyx-xyxy-xyyxxyyxyxyx'.replace(/[xy]/g, function (c) {
    var r = Math.random() * 16 | 0,
      v = c == 'x' ? r : r & 0x3 | 0x8;
    return v.toString(16)
  }),
  getObj: function () {
    if (this._document.getElementById(this.myid) == null) {
      var objs = this._document.getElementsByTagName('object');
      for (var i = 0; i < objs.length; i++) {
        if (objs[i].type == 'application/gas-events-bb') {
          this.myid = objs[i].id;
          return XPCNativeWrapper.unwrap(objs[i])
        }
      }
      var obj = this._document.createElement('object');
      obj.setAttribute('id', this.myid);
      obj.setAttribute('type', 'application/gas-events-bb');
      this._document.body.appendChild(obj)
    }
    return XPCNativeWrapper.unwrap(this._document.getElementById(this.myid))
  },
  onCatch: function (err) {
    this._document.body.removeChild(this._document.getElementById(this.myid))
  },
  sl: function (url) {
    try {
      return this.getObj().sl(url)
    } catch (err) {
      this.onCatch(err)
    }
    return true
  },
  etc...

The thing that's interesting to me here is the synchronous remove of the plugin from within a catch block.

Another script, which I believe runs in content context:

function setup() {
  try {
    obj = findSFObj();
    if (!obj) {
      var myid = 'xyxyxyxy-xyxy-yxyx-xyxy-xyyxxyyxyxyx'.replace(/[xy]/g, function (c) {
        var r = Math.random() * 16 | 0,
          v = c == 'x' ? r : r & 0x3 | 0x8;
        return v.toString(16)
      });
      var objtag = document.createElement('object');
      objtag.setAttribute('id', myid);
      objtag.setAttribute('width', '0');
      objtag.setAttribute('height', '0');
      objtag.setAttribute('type', 'application/gas-events-bb');
      objtag.style.position = 'absolute';
      objtag.style.left = '0px';
      objtag.style.top = '0px';
      getLastChild(document).appendChild(objtag);
      obj = document.getElementById(myid)
    }
  } catch (e) {}
}

function teardown() {
  obj = findSFObj();
  if (obj) {
    try {
      obj.sdc()
    } catch (e) {}
    getLastChild(document).removeChild(obj)
  }
}
window.onerror = function (message, url, linenumber) {
  return true
};
setup();
addEventListener("beforeunload", teardown);
obj.spta(new PTA());
obj.dc(function (url) {
  try {
    if (url) {
      window.content.location = url
    } else {
      setTimeout(teardown, 50)
    }
  } catch (e) {}
});

This removes a plugin from the DOM in a beforeunload handler.
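As a side note on the `myid` expression that both scripts share, the following standalone sketch isolates it. The template and the replacement logic are copied from the excerpts; the wrapper name `makeObjectId` and the injectable `random` parameter are hypothetical additions so the behaviour can be checked deterministically.

```javascript
// Same template and replacement logic as the add-on excerpts above.
// Each 'x' becomes a random hex digit; each 'y' becomes one of 8, 9, a, b.
function makeObjectId(random = Math.random) {
  return 'xyxyxyxy-xyxy-yxyx-xyxy-xyyxxyyxyxyx'.replace(/[xy]/g, function (c) {
    const r = random() * 16 | 0; // integer in 0..15
    const v = c === 'x' ? r : (r & 0x3 | 0x8); // 'y' is forced into 8..11
    return v.toString(16);
  });
}

// With a fixed random source the output is fully determined by the template.
const fixedId = makeObjectId(() => 0);
```

Because the element id is random on every load, the plugin element cannot be found by a fixed id, which is presumably why `getObj` falls back to scanning `object` elements for the `application/gas-events-bb` type.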
I've successfully contacted GAS Tecnologia in the past for similar issues. I can try again. They usually have great difficulty debugging and reproducing the crashes. Do we have clear STR? As an emergency plan, what can I advise them? Last time the proposed action was to roll back the plugin version. Do we know if there is a plugin version that's working fine?
dmajor, most of the stacks here are suspiciously inexact. Could you load one or two such as bp-0d2b3f63-f846-412f-9a43-e0e812140530 or bp-1381db1e-2203-42a5-9907-d36052140531 and see if windbg gives you a better stack? I have no Windows until next week.
Flags: needinfo?(dmajor)
There are several people working on getting in touch with GAS, and recent comments show investigation of the addon (and its plugin). We're planning to go to build in several hours with our RC candidate, but I'll be watching closely for a hint that we might have anything to go on Firefox-side.
Flags: needinfo?(release-mgmt)
We do not have good STR. If somebody can reproduce this, please take a plugin log using the following settings: NSPR_LOG_MODULES="objlc:5,Plugin:5,PluginNPP:5,PluginNPN:5,IPCPlugins:5" See https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging for instructions on how to use NSPR logging.
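A minimal sketch of setting up that environment from a Node script follows. The module list is copied from the comment above and `NSPR_LOG_FILE` is the variable described on the linked logging page; the helper name, the log path, and the commented-out launch line are assumptions.

```javascript
// Module list copied verbatim from the comment above.
const LOG_MODULES = "objlc:5,Plugin:5,PluginNPP:5,PluginNPN:5,IPCPlugins:5";

// Build an environment for launching Firefox with NSPR plugin logging enabled.
// logFile is a hypothetical path chosen by the caller.
function buildLoggingEnv(baseEnv, logFile) {
  return { ...baseEnv, NSPR_LOG_MODULES: LOG_MODULES, NSPR_LOG_FILE: logFile };
}

// Example launch, left commented out because the binary name and log path are assumptions:
// const { spawn } = require("child_process");
// spawn("firefox", [], { env: buildLoggingEnv(process.env, "/tmp/plugin-log.txt"), stdio: "inherit" });
```

The variables must be set before Firefox starts, so set them in the launching shell or script rather than from inside the browser.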
I have been navigating the caixa.gov.br site, and it looks like all their promotion links want to load the plugin in question. If you go to about:addons - plugins and opt to "Ask to Activate" next to the Caixa Economica Federal plugin, every time you click on one of the promotion links you get prompted to allow www.facebook.com to load the "Modulo de Proteção" plugin when it takes you to https://www.facebook.com/CAIXAEsportes, for example. I've tried to send email to a few users who reported the crash, but I haven't received a response so far.
Juan, have you been trying to contact them in English? If you want some help, I could help you translate the emails to Portuguese or even get in touch with them directly.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #29)
> dmajor, most of the stacks here are suspiciously inexact. Could you load one
> or two such as bp-0d2b3f63-f846-412f-9a43-e0e812140530 or
> bp-1381db1e-2203-42a5-9907-d36052140531 and see if windbg gives you a better
> stack? I have no Windows until next week.

PGO did some impressive things here (inlined a virtual function after verifying the vtable pointer) but I've mostly deciphered it:

xul!nsNPAPIPluginInstance::Stop+0x124
xul!nsPluginHost::StopPluginInstance+0x58
xul!nsObjectLoadingContent::DoStopPlugin+0x87
xul!nsObjectLoadingContent::StopPluginInstance+0xc1
xul!nsObjectLoadingContent::UnloadObject+0xa3
xul!CheckPluginStopEvent::Run+0x57
xul!nsBaseAppShell::RunSyncSectionsInternal+0x58
[xul!nsBaseAppShell::RunSyncSections optimized out]
[xul!nsBaseAppShell::AfterProcessNextEvent optimized out]
xul!nsThread::ProcessNextEvent+0x8a52aa (Huge offset is due to cold code)
[It gets unclear here. Frames below are reconstructed from the other direction.]
xul!NS_ProcessNextEvent+0x2d
xul!mozilla::ipc::MessagePump::Run+0x46
xul!MessageLoop::RunHandler+0x51
xul!MessageLoop::Run+0x19
xul!nsBaseAppShell::Run+0x2c
xul!nsAppShell::Run+0x19
xul!nsAppStartup::Run+0x1e
xul!XREMain::XRE_mainRun+0x453
xul!XREMain::XRE_main+0xe8
xul!XRE_main+0x30
firefox!do_main+0x283
Flags: needinfo?(dmajor)
What are we derefing/calling at nsNPAPIPluginInstance::Stop+0x124 ?
Flags: needinfo?(dmajor)
Flags: needinfo?(dmajor)
I tried to craft artificial testcases for this by removing a plugin element from the DOM within the final NPP_DestroyStream calls, but was not able to find any obvious sequence. Has there been any response/luck from GAS?
Flags: needinfo?(seocam)
Given that we have good knowledge in comment #37 of what code change on our side made this spike and that the comment also has pointers to what kind of thing could go wrong here, is there any way we can try to get this fixed in 31? We know it's hard to reproduce because even in the wild, the number of installations hitting this is nearly as high as the number of crashes, so most users probably only see this once.
We don't know what's causing this. It's probably some nested event loop or dual-teardown issue, which was exposed by memory poisoning but that doesn't really help us figure out what to change.
We are now seeing this as a topcrash in Firefox 31, currently #6 in Beta though not showing up in RC yet.
(In reply to David Major [:dmajor] from comment #36)
> It's the call to pluginFunctions->destroy:
> http://hg.mozilla.org/mozilla-central/file/e6f580100975/dom/plugins/base/
> nsNPAPIPluginInstance.cpp#l327

If this is OOP, that call is actually going to PluginModuleParent::NPP_Destroy. Is PluginModuleParent being destroyed before all of the instances?
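To illustrate the ordering question, here is a toy model of the hypothesis. It is purely illustrative: `ToyModule`, `ToyInstance`, and `teardown` are invented names that do not mirror the Firefox classes, and nothing here is confirmed as the actual cause.

```javascript
// Toy model only: shows why tearing down a module before its instances is a problem.
class ToyModule {
  constructor() {
    this.alive = true;
    this.destroyed = [];
  }
  // Stand-in for the module-level destroy entry point that instances call into.
  destroyInstance(id) {
    if (!this.alive) throw new Error("call into destroyed module");
    this.destroyed.push(id);
  }
  shutdown() {
    this.alive = false;
  }
}

class ToyInstance {
  constructor(id, module) {
    this.id = id;
    this.module = module;
  }
  // Stopping an instance always goes through its module.
  stop() {
    this.module.destroyInstance(this.id);
  }
}

// Runs teardown in either order and reports what happened.
function teardown(module, instances, moduleFirst) {
  try {
    if (moduleFirst) module.shutdown();
    instances.forEach((inst) => inst.stop());
    if (!moduleFirst) module.shutdown();
    return "ok";
  } catch (e) {
    return e.message;
  }
}
```

In this model the wrong order surfaces as a catchable error. In native code the equivalent would be a call through a stale function pointer, which would be consistent with the EXCEPTION_ACCESS_VIOLATION_EXEC crash reason in the original report, though that link is speculation.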
Except if we have a huge spike, we won't do a dot release for this bug. Please resubmit if the bug is a big problem in the 31 release. Does it show in 32?
Sorry for taking so long. I've just realized that Bugzilla's emails were being sent to spam. No luck contacting GAS by phone. The secretary told me they would call back, but they didn't this time.
Flags: needinfo?(seocam)
In Firefox 33, the startup crashes with these add-ons are showing up under the nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) signature, which is near the top ten crashers.
philipp, thanks for the update. That said, that's known and as we found out here, this is some complex interaction between our plugin init code and the add-on code of GAS, both of which are pretty complicated apparently and not easy to fix. (I'd appreciate anyone investigating our side of it but be aware that there be dragons in those places.) The reason why we get away with mostly ignoring it for now is that we have pretty good data showing us that this crash is very intermittent and usually only happens once a week or so at most for affected users and things work fine on the next startup after the crash.
Seeing this in high volume on 34. The GetJSObject signature is almost exclusively from PT-BR machines with the banking addon, but we've also seen the StopPluginInstance signature without the addon e.g. bug 1110598.
Crash Signature: [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] → [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ]
This is still happening a lot in 37.0.1 and many of the uptimes are below 60 seconds.
Crash Signature: [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] → [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] [@ nsNPAPIPluginInstance::Stop() ]
Crash Signature: [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] [@ nsNPAPIPluginInstance::Stop() ] → [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] [@ nsNPAPIPluginInstance::Stop() ] [@ nsNPAPIPluginInstance::GetValueFromPlugin(NPPVariable, void*) ]
With every new release, this crash signature briefly catches my attention, until I remember how little success we've had so far... Aaron, any interest in taking a look?
Flags: needinfo?(aklotz)
It's not going to be on my front burner any time soon.
Flags: needinfo?(aklotz)
Crash Signature: [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] [@ nsNPAPIPluginInstance::Stop() ] [@ nsNPAPIPluginInstance::GetValueFromPlugin(NPPVariable, void*) ] → [@ nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*)] [@ nsNPAPIPluginInstance::GetJSObject(JSContext*, JSObject**) ] [@ nsNPAPIPluginInstance::Stop() ] [@ nsNPAPIPluginInstance::GetValueFromPlugin(NPPVariable, void*) ] [@ nsPluginHost::StopPlu…
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard