[OOPP] Crash when visiting quakelive.com using FF 3.6.3plugin1 [@ xul.dll@0xae2b28]

RESOLVED FIXED

Status

()

defect
--
critical
RESOLVED FIXED
10 years ago
8 years ago

People

(Reporter: shakal-hamburg, Assigned: benjamin)

Tracking

({crash, verified1.9.2})

Other Branch
x86_64
Windows 7
Points:
---

Firefox Tracking Flags

(blocking2.0 alpha4+, blocking1.9.2 .4+, status1.9.2 .4-fixed)

Details

(crash signature, )

Attachments

(2 attachments)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.3pre) Gecko/20100405 Firefox/3.6.3plugin1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.3pre) Gecko/20100405 Firefox/3.6.3plugin1

when Firefox is loading the Website, he crased totaly

Reproducible: Always

Steps to Reproduce:
1. start firefox 3.6.3plugin1
2. open www.quakelive.com
3. see the crash  :/
bp-d65ed0e7-d371-4bfa-991e-fba5a2100409 (safe mode and without addons/plugins)

bp-e850ae3a-0a45-4a58-94b1-397932100409 (normal)
0  	xul.dll  	xul.dll@0xae2b28  	
1 	xul.dll 	CreateNPAPIPlugin 	modules/plugin/base/src/nsPluginHost.cpp:3947
2 	xul.dll 	nsPluginHost::GetPlugin 	modules/plugin/base/src/nsPluginHost.cpp:4032
Component: General → Plug-ins
Keywords: crash
Product: Firefox → Core
QA Contact: general → plugins
Version: unspecified → 1.9.2 Branch
sorry, what does that mean?
and this crash is only in 3.6.3plugin1... not in 3.6.3!
Duplicate of this bug: 558901
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: browser crashed when i opening the URL → browser crashed when i opening the URL [@ xul.dll@0xae2b28]
Version: 1.9.2 Branch → Other Branch
Summary: browser crashed when i opening the URL [@ xul.dll@0xae2b28] → Crash when visiting quakelive.com using FF 3.6.3plugin1 [@ xul.dll@0xae2b28]
#3 crasher for 3.6.3plugin1. Most if not all of the reports have comments mentioning quakelive.
Summary: Crash when visiting quakelive.com using FF 3.6.3plugin1 [@ xul.dll@0xae2b28] → [OOPP] Crash when visiting quakelive.com using FF 3.6.3plugin1 [@ xul.dll@0xae2b28]
This is worrisome because it's apparently some change in the plugin host. Trying to reproduce.
blocking1.9.2: --- → ?
blocking2.0: --- → alpha4+
philip:
UUID	d65ed0e7-d371-4bfa-991e-fba5a2100409
npquakezero.dll  	0.1.0.277  	6C5BA596189A4DFDB530BE62B823B9041  	npquakezero.pdb

safe mode doesn't disable plugins.
I'm going to hold off on setting this as blocking until more information is known. Leaving the blocking request though so we can look at it in triage
I can reproduce this 100%. Something is being corrupted in the call to NP_Initialize. Still looking.
I can reproduce this 100% on Windows7 as well with the Quake Live plugin on Lorentz.
For additional info, when I view it on OS X with the Lorentz build, no crash.
I'm going to mark this as blocking while it is investigated more...seems pretty nasty at first blush.
blocking1.9.2: ? → .4+
This is nasty:

* We're calling NP_Initialize from quakelive using the "correct" WINAPI calling convention, which is callee-pops. This is required by Flash and other plugins, see bug 525605
* The NP_Initialize for quakelive is written using cdecl, which is caller-pops. This leaves ESP off by 4 bytes. This is definitely absolutely a bug in quakelive.

However, this happens to work in previous versions of Firefox, probably because in prior versions of our code the compiler used a frame pointer. The assembly probably then reset ESP before the final return using EBP... trying to verify against a 3.6.3 build.

The workaround here, if we want to try it, is to either force MSVC to de-optimize PluginPRLibrary::NP_Initialize so that it always uses a frame pointer, or to use custom assembly to thunk so that either calling convention continues to work. The latter solution is probably best. In either case, I'm definitely not going to be able to test/post a patch tonight.
Assignee: nobody → benjamin
(In reply to comment #14)
> in prior versions of our code the compiler used a frame pointer

What changed this?
Thanks for the analysis bsmedberg. I'm going to leave this as blocking...it seems to me this bug is worth blowing through code freeze. If others disagree tomorrow, we'll remove the blocking flag and start the beta build w/o a fix.
http://www.quakelive.com/forum/showthread.php?t=43147 is the upstream "bug report"
The reason we have a frame pointer in nsNPAPIPlugin::CreatePlugin in 3.6.3 is that modules/plugin/base/src is compiled with ENABLE_CXX_EXCEPTIONS, which requires a frame pointer. The function prolog is

// Creates the nsNPAPIPlugin object. One nsNPAPIPlugin object exists per plugin (not instance).
nsresult
nsNPAPIPlugin::CreatePlugin(const char* aFilePath, PRLibrary* aLibrary,
                            nsIPlugin** aResult)
{
6B23C1B2  push        8    
6B23C1B4  mov         eax,offset __ehhandler$?CreatePlugin@nsNPAPIPlugin@@SAIPBDPAUPRLibrary@@PAPAVnsIPlugin@@@Z (6B01C1F1h) 
6B23C1B9  call        _EH_prolog3 (6AFDC312h)
Comment on attachment 438759 [details] [diff] [review]
Force the function to use a frame pointer, so that stdcall/cdecl both work, rev. 1

>+// Some plugins on Windows, notably Quake Live, implement NP_Initialize using
>+// cdecl instead of the documented stdcall. In order to work around this,
>+// we force the caller to use a frame pointer.
>+#if defined(XP_WIN) && defined(_M_IX86)
>+#define ALLOCA_HACK void* foo = _alloca(gCompilerHack);
>+static int gCompilerHack;
>+#else
>+#define ALLOCA_HACK
>+#endif

Maybe we should call this CALLING_CONVENTION_HACK
instead of ALLOCA_HACK 'cause that's makes it more obvious that it's related to dealing with the calling convention instead of allocation. 

The gCompilerHack is a run-time zero to avoid optimization right? It could also use a better name or a comment. I'm also a little worried that the optimizer could still throw the alloca away. Can we add a test plugin that uses the wrong calling convention?
Attachment #438759 - Flags: review?(jmuizelaar) → review+
Adding a new test plugin is a bit complicated for this. I think we should rely on manual/litmus testing for this for the time being until they can fix it for real. I can certainly rename the variables: I've verified that the compiler does not in fact optimize away the alloca.
Comment on attachment 438759 [details] [diff] [review]
Force the function to use a frame pointer, so that stdcall/cdecl both work, rev. 1

We should limit use of this hack to functions declared with OSCALL - those are WINAPI explicitly. The functions in the NPAPI function table are default calling convention (NPP_New is default, for example).

#if defined(__OS2__)
NPError OSCALL NP_GetPluginData(NPPluginData * pPluginData);
#endif
NPError OSCALL NP_GetEntryPoints(NPPluginFuncs* pFuncs);
NPError OSCALL NP_Initialize(NPNetscapeFuncs* bFuncs);
NPError OSCALL NP_Shutdown();

There are only 4 functions declared OSCALL I think.
Posted patch FollowupSplinter Review
Attachment #438780 - Flags: review?(joshmoz)
Attachment #438759 - Flags: review?(joshmoz) → review-
Attachment #438780 - Flags: review?(joshmoz) → review+
Attachment #438780 - Flags: approval1.9.2.4+
Attachment #438759 - Flags: approval1.9.2.4+
a=LegNeato for 1.9.2.4. Please land as soon as possible
What was wrong with #pragma optimize( "y", off ) ?
In other bugs I've tried that, but it appears to work erratically or not at all in PGO builds.
We have indeed noticed a number of issues with our NPAPI implementation and
will be rolling out fixes with our next QUAKELIVE plugin release. 

Thanks for the detailed report.
Comment on attachment 438759 [details] [diff] [review]
Force the function to use a frame pointer, so that stdcall/cdecl both work, rev. 1

>+ * The Initial Developer of the Original Code is
>+ *   Josh Aas <josh@mozilla.com>

This is incorrect. Mozilla employees should use "Mozilla Foundation" as the initial developer in their code. See http://weblogs.mozillazine.org/gerv/archives/2010/02/mpl_initial_developer_for_mozilla_employ.html for more information.
This cleanly repros on my Win7 VM pre-fix. I'll get the official 3.6.4 build
later today or tomorrow to verify the fix.
No crash for me using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.4) Gecko/20100413 Firefox/3.6.4 (.NET CLR 3.5.30729). Will check using Windows 7 as well if Al does not get a chance.
Using the same machine and the 3.6.4 candidate build, I no longer crash. Verified for 1.9.2.

 Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100413 Firefox/3.6.4
Keywords: verified1.9.2
No longer depends on: 559703
Crash Signature: [@ xul.dll@0xae2b28]
You need to log in before you can comment on or make changes to this bug.