Closed Bug 638756 Opened 9 years ago Closed 9 years ago

Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled [@ nsWindow::HandleScrollingPlugins ][@ _SEH_prolog4 ][@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) ][@ FindNCHit ]

Categories

(Core :: Widget: Win32, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla5
Tracking Status
blocking2.0 --- Macaw+
status2.0 --- .1-fixed

People

(Reporter: wsmwk, Assigned: heycam)

References

Details

(Keywords: crash, Whiteboard: [tbird crash][tb33needed])

Crash Data

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #531341 +++

Thunderbird 3.3 trunk builds beginning with 20110213030010. all win7
I haven't checked firefox.

bp-3944e872-909f-40aa-9352-bb7a22110222
EXCEPTION_ACCESS_VIOLATION_READ
0x4400e8
2 finger swipe scrolling down leads to this crash. Elantech Smart-Pad 8.0.7.0 drivers
0	xul.dll	nsWindow::HandleScrollingPlugins	widget/src/windows/nsWindow.cpp:7936
1	xul.dll	nsWindow::OnMouseWheel	widget/src/windows/nsWindow.cpp:6700
2	xul.dll	nsWindow::ProcessMessage	widget/src/windows/nsWindow.cpp:5569
3	xul.dll	nsWindow::WindowProcInternal	widget/src/windows/nsWindow.cpp:4616
4	xul.dll	CallWindowProcCrashProtected	xpcom/base/nsCrashOnException.cpp:65
5	xul.dll	nsWindow::WindowProc	widget/src/windows/nsWindow.cpp:4563
6	user32.dll	user32.dll@0x162f9	
7	user32.dll	user32.dll@0x16d39	
8	user32.dll	user32.dll@0x177c3	
9	user32.dll	user32.dll@0x17889	
10	xul.dll	nsAppShell::ProcessNextNativeEvent	widget/src/windows/nsAppShell.cpp:339
11	xul.dll	nsBaseAppShell::DoProcessNextNativeEvent	widget/src/xpwidgets/nsBaseAppShell.cpp:173
bp-d09b635e-bd96-40f6-bd3d-7a4472110301 reporter (bv1578) comments ...

I use TB for testing on Lightning add on. Every time I get a
crash I usually work either on Lightning or on DOM inspector. In
particular the crash with Dom Inspector is systematic. Every time I
open that window I have to say to myself "OK now you are on DOM
inspector, remember, don't use the mouse wheel otherwise you get a
crash!"
So, we're crashing on a comparison check of two pointers?!
Ehsan, what revision of mozilla-central does this correspond to?  (I tried revisions around both 2011-02-13 and 2011-02-22 but couldn't get the line numbers in the stack trace to match up.)
(In reply to comment #3)
> Ehsan, what revision of mozilla-central does this correspond to?  (I tried
> revisions around both 2011-02-13 and 2011-02-22 but couldn't get the line
> numbers in the stack trace to match up.)

88204c53761b.  In order to get this information, you should click on one of the hg links in the crash report and get the revision from the URL there.
Thanks Ehsan.  I can reproduce this on a Thunderbird nightly (2011-02-23) with the Elantech hack pref set to 1.  With a fresh profile, the new mail account window pops up automatically on startup.  Using the mouse wheel while that window is open will trigger the crash.
Assignee: nobody → cam
Status: NEW → ASSIGNED
Summary: Stack overflow crash related to scrolling with trackpad and plugins [@ nsWindow::HandleScrollingPlugins] [@ _SEH_prolog4 ][@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*)][@ FindNCHit ] → Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled
heycam, Thanks for taking this.

signatures are still needed in bug summary a) as a search aid, b) so crash-stats correctly links crashes to this bug report
Summary: Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled → Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled [@ nsWindow::HandleScrollingPlugins] [@ _SEH_prolog4 ][@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*)][@ FindNCHit ]
Summary: Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled [@ nsWindow::HandleScrollingPlugins] [@ _SEH_prolog4 ][@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*)][@ FindNCHit ] → Crash in nsWindow::HandleScrollingPlugins when using the mouse wheel and with the Elantech hack enabled [@ nsWindow::HandleScrollingPlugins ][@ _SEH_prolog4 ][@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) ][@ FindNCHit ]
Something is going wrong in IsElantechHelperWindow.  Before the call to it from nsWindow::HandleScrollingPlugins, edi holds the this pointer.  In the prologue of IsElantechHelperWindow, edi is pushed onto the stack, but somewhere in the function the stack pointer ends up slightly off, so that by the end of the function when edi is popped it's from somewhere else in the stack.  The corrupted this pointer results in the crash in that comparison.
Missed a WINAPI on the typedef for a manually imported DLL function, which was causing the stack to get messed up inside IsElantechHelperWindow.
Attachment #519576 - Flags: review?(jmathies)
Attachment #519576 - Flags: review?(jmathies) → review+
Whiteboard: [tbird crash] → [tbird crash][tb33needs]
Ah, that's pretty bad.  But I wonder why we haven't seen this crash in Firefox?
(In reply to comment #9)
> Ah, that's pretty bad.  But I wonder why we haven't seen this crash in Firefox?

The two big things that I know Thunderbird does differently at the moment are not shipping the DirectX dlls (bug 632148) and not having IPC enabled (bug 617906). It looks like both of those could be at play in that file, though I'd kinda be surprised if one or other could actually cause this.
(In reply to comment #10)
> (In reply to comment #9)
> > Ah, that's pretty bad.  But I wonder why we haven't seen this crash in Firefox?
> 
> The two big things that I know Thunderbird does differently at the moment are
> not shipping the DirectX dlls (bug 632148) and not having IPC enabled (bug
> 617906). It looks like both of those could be at play in that file, though I'd
> kinda be surprised if one or other could actually cause this.

The DirectX DLLs can't be relevant.  And I really don't think that having IPC enabled should be a factor here too.  The only way that we might not get bitten by this in Firefox is if we passed -Gz to the compiler (which sets the default calling convention to stdcall) but we don't do that...  This is a mystery to me, but I think we'd want the patch on trunk too.
Haven't investigated why I've not been able to get Firefox to crash from this, but I agree it's odd.
blocking2.0: --- → ?
There is no way this blocks Firefox4/Gecko 2.0. You can ask for approval from drivers, though, if Thunderbird needs it. Not sure what repository Thunderbird will be building from, though!
blocking2.0: ? → -
(In reply to comment #12)
> Haven't investigated why I've not been able to get Firefox to crash from this,
> but I agree it's odd.

Can you investigate the same way you did in comment 7 to see if edi is corrupted in Firefox too?
Sorry, I meant to ask "should we take this for the next point release", but I guess just setting blocking2.0:? doesn't mean that!  There's no ".x?" value...

Ehsan, I'll look into it.
(In reply to comment #15)
> Sorry, I meant to ask "should we take this for the next point release", but I
> guess just setting blocking2.0:? doesn't mean that!  There's no ".x?" value...
> 
> Ehsan, I'll look into it.

I think depending on what we find out in answer to comment 14, this may be a .x blocker.  So, let's renom if this actually proves to be a problem for Firefox.
For Thunderbird I expect we'll need this on mozilla-central and mozilla-2.0 and whatever else is implied in that. At the moment we've got no rush for it, so I don't mind holding off for a few days whilst various release activities happen.
It seems in Firefox that the compiler has just done register allocation differently.  On the line that does the pointer comparison, the this pointer is fetched off the stack rather than already being in edi (or some other register).  So the messing up of the stack during IsElantechHelperWindow, which can cause esi and edi to be restored to incorrect values, doesn't cause the crash.

edi is overwritten with something else later on anyway, so its incorrectly restored value doesn't matter.

Towards the start of HandleScrollingPlugins, esi is set to 1.  esi is used in a couple of places after the IsElantechHelperWindow call, so some weirdo non-zero values (actually, the first four bytes of the `PRUnichar path[256]` local variable) get stored in places (and returned from HandleScrollingPlugins) instead of 1s.  It seems like all of these are using the assumed 1 value as booleans, which I guess is why it's not causing problems.

I would still be happier with the fix in, though. :-)
Oh, this is extremely scary.  So what you're saying is that we've basically been lucky that this has not bitten us yet.  I think we should really take this fix in a .x release, as soon as possible.  Nominating for blocking because of this.
blocking2.0: - → ?
blocking2.0: ? → .x+
http://hg.mozilla.org/mozilla-central/rev/a4e4d6ca41f3
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.2
Finally i know the reason to my problems with Firefox 4. Last half year i have had very strange problems with ping, flash and silverlight some specific sites. And yes those Thunderbird crashes almost all are mine, but i never thought that this also causes problems with Firefox 4 too. I've been beta tester for F-secure and always believe that antivirus/firewall was messing with Firefox 4. I have send apology mail to F-secure guys. F-secure and i have tried very long time to find the problem. Now with 4.2a1pre all problems are gone.

So this bugfix should land very soon to Firefox 4.X.

(In reply to comment #19)
> Oh, this is extremely scary.  So what you're saying is that we've basically
> been lucky that this has not bitten us yet.  I think we should really take this
> fix in a .x release, as soon as possible.  Nominating for blocking because of
> this.
Could you please add the fix to the Thunderbird 3.3 branch too?
Comment on attachment 519576 [details] [diff] [review]
Add missing calling convention keyword

Requesting approval for the 4.0.x branch - this is a significant number 1 crasher seen on the Thunderbird 3.3 alpha builds, and we need this fixing if we're going to ship off mozilla-2.0. It should be low risk as it is a correctness fix for the code.
Attachment #519576 - Flags: approval2.0?
And I think this needs to hardblock 2.0.1.
blocking2.0: .x+ → ?
(because a PGO build can possibly turn this into a constant crash for Firefox users who have an Elantech touchpad.)
blocking2.0: ? → Macaw
status2.0: --- → wanted
Comment on attachment 519576 [details] [diff] [review]
Add missing calling convention keyword

Approved for the mozilla2.0 repository, a=dveditz for release-drivers
Attachment #519576 - Flags: approval2.0? → approval2.0+
Whiteboard: [tbird crash][tb33needs] → [tbird crash][tb33needed]
Crash Signature: [@ nsWindow::HandleScrollingPlugins ] [@ _SEH_prolog4 ] [@ nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) ] [@ FindNCHit ]
You need to log in before you can comment on or make changes to this bug.