Closed Bug 530962 Opened 15 years ago Closed 14 years ago

Taskbar tab preview crashes [@ mozilla::widget::WindowHook::Lookup(unsigned int)] in Firefox 3.6 at TabPreview destruction

Categories

(Core :: Widget: Win32, defect, P1)

1.9.2 Branch
All
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a4
Tracking Status
blocking2.0 --- beta1+
blocking1.9.2 --- needed
status1.9.2 --- .4-fixed
status1.9.1 --- unaffected

People

(Reporter: jst, Assigned: mak)

References

()

Details

(Keywords: crash, regression, Whiteboard: [crashkill][crashkill-fix][Fx3.6.3plugin1])

Crash Data

Attachments

(1 file, 1 obsolete file)

There's a new crash in Firefox 3.6b3 with the signature "mozilla::widget::WindowHook::Lookup(unsigned int)" that hasn't been seen in any of the versions 3\.5.*. So far we've seen 43+ of these crashes in the wild.

Please see http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6b3&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=mozilla%3A%3Awidget%3A%3AWindowHook%3A%3ALookup%28unsigned%20int%29&do_query=1 for more crash info.
Flags: blocking1.9.2?
It looks like we are getting a NULL WindowHook. Since it lives in the nsWindow, that means that we have a NULL nsWindow which we're getting from mWnd so the window must have been destroyed by the time the GC happens.

However, I anticipated this ordering of events so something is mysterious here. The TaskbarPreview should have hooked mWnd's WM_DESTROY in its ctor. When the window was destroyed, it should have called DetachFromNSWindow which would have set mWnd to NULL and thus prevent the call stack in the crash report from appearing.

I suspect that the WM_DESTROY message must not be getting sent and this is due to the weird way we construct/destruct native widgets in nsWindow by creating them with the DefWindowProc and then subclassing them. Why do we do this? I dunno, it's code from the initial checkin.
New crash, looks potentially frequent, absolutely blocking.
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
This is currently ranked as topcrash #201 in 3.6b3 (0.07% of crashes), and is not in the top 300 for 3.6b4.

It seems good to fix, but is it really a blocker?
its interesting that many of the urls reported are chrome speedial

20091129-crashdata.csv 3.6b4 http://primbux.ru/members.php
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 chrome://speeddial/content/speeddial.xul
20091129-crashdata.csv 3.6b4 chrome://speeddial/content/speeddial.xul
20091129-crashdata.csv 3.6b4 chrome://speeddial/content/speeddial.xul
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 chrome://speeddial/content/speeddial.xul
20091129-crashdata.csv 3.6b4 about:blank
20091129-crashdata.csv 3.6b4 chrome://speeddial/content/speeddial.xul
20091129-crashdata.csv 3.6b5pre 
20091129-crashdata.csv 3.6b5pre 
20091129-crashdata.csv 3.6b5pre 

https://addons.mozilla.org/en-US/firefox/reviews/display/4810 says speeddial is marked compat with -3.7a1pre but one user comments that for some reason its marked not compat.

>>  Just upgraded to 3.6b2, even though I can see it says compatible, fiefox says it is not. I miss speeddial and can't install :( 

wonder if its speeddial related, and if it is, then are the reports we see the result of users overiding a compat check?
but "interesting addons from the 11/27 says its not around very much when crashing on this signature

  mozilla::widget::WindowHook::Lookup(unsigned int)|EXCEPTION_ACCESS_VIOLATION (23 crashes)
     52% (12/23) vs.   3% (383/11518) {CAFEEFAC-0016-0000-0016-ABCDEFFEDCBA} (Java Console, http://java.sun.com/javase/downloads/)
     39% (9/23) vs.   1% (126/11518) {d40f5e7b-d2cf-4856-b441-cc613eeffbe3} (BetterPrivacy, https://addons.mozilla.org/addon/6623)
     39% (9/23) vs.  12% (1374/11518) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
     30% (7/23) vs.   4% (490/11518) personas@christopher.beard (Personas, https://addons.mozilla.org/addon/10900)
     26% (6/23) vs.   5% (567/11518) {D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389} (Download Statusbar, https://addons.mozilla.org/addon/26)
     22% (5/23) vs.   2% (206/11518) {73a6fe31-595d-460b-a920-fcc0f8843232} (NoScript, https://addons.mozilla.org/addon/722)
     17% (4/23) vs.   1% (129/11518) {0538E3E3-7E9B-4d49-8831-A227C80A7AD3} (Forecastfox, https://addons.mozilla.org/addon/398)
     17% (4/23) vs.   2% (261/11518) foxmarks@kei.com (Xmarks (formerly Foxmarks), https://addons.mozilla.org/addon/2410)
     13% (3/23) vs.   1% (104/11518) en-GB@dictionaries.addons.mozilla.org (British English Dictionary, https://addons.mozilla.org/addon/3366)
     13% (3/23) vs.   1% (133/11518) {64161300-e22b-11db-8314-0800200c9a66} (Speed Dial, https://addons.mozilla.org/addon/4810)
     13% (3/23) vs.   2% (212/11518) {dc572301-7619-498c-a57d-39143191b318} (Tab Mix Plus, https://addons.mozilla.org/addon/1122)
     17% (4/23) vs.   6% (734/11518) {b9db16a4-6edc-47ec-a1f4-b86292ed211d} (Video DownloadHelper, https://addons.mozilla.org/addon/3006)
     13% (3/23) vs.   3% (381/11518) compatibility@addons.mozilla.org
      9% (2/23) vs.   0% (5/11518) stop-reload@design-noir.de (Smart Stop/Reload, https://addons.mozilla.org/addon/7401)
      9% (2/23) vs.   0% (25/11518) twitternotifier@naan.net (Echofon (Formerly TwitterFox), https://addons.mozilla.org/addon/5081)
      9% (2/23) vs.   0% (39/11518) {000a9d1c-beef-4f90-9363-039d445309b8} (Google Gears Portable, https://addons.mozilla.org/addon/13492)
      9% (2/23) vs.   1% (69/11518) {340c2bbc-ce74-4362-90b5-7c26312808ef} (Mozilla Labs - Weave Sync, https://addons.mozilla.org/addon/10868)
Should reconsider based on comment 5.
Flags: blocking1.9.2+ → blocking1.9.2?
Jim, you should probably look at this - it could turn into a blocker again, and Rob Arnold says "someone else besides me should learn how that code works." :)
Assignee: nobody → jmathies
Not blocker as per comment 5; didn't turn out to be as frequent as we'd thought.
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Whiteboard: [crashkill][crashkill-fix]
blocking2.0: --- → ?
Flags: wanted-next+
I have someone on bug 529403 you have seen this crash too. We will try to figure out if it is reproducible.
I'm not going to be able to get to this in the near term.
Assignee: jmathies → nobody
a few spike days in the 30 crashes per day range durning Nov., but mostly in the twenties for Dec.
Summary: New crash [@ mozilla::widget::WindowHook::Lookup(unsigned int)] in Firefox 3.6b3 → Taskbar tab preview crashes [@ mozilla::widget::WindowHook::Lookup(unsigned int)] in Firefox 3.6
Somebody sent 48 reports of this crash within a 7 minute timespan earlier today, so now it's high on the 3.6b6pre crash reports list.
mostly we have 15-30 reports a day, but we also had a big spike on 12/13

20091211-crashdata 26 mozilla::widget::WindowHook::Lookup
20091212-crashdata 17 mozilla::widget::WindowHook::Lookup
20091213-crashdata 158 mozilla::widget::WindowHook::Lookup
20091214-crashdata 16 mozilla::widget::WindowHook::Lookup
20091215-crashdata 18 mozilla::widget::WindowHook::Lookup

that was another high concentration in the same hour situation

hourly frequency of reports
   1 2009121301
   1 2009121302
   1 2009121307
   2 2009121309
   1 2009121310
   3 2009121311
   2 2009121314
 136 2009121315
   4 2009121316
   2 2009121318
   5 2009121323
crash urls don't say much
 147 about:blank
   5 chrome://speeddial/content/speeddial.xul
   3 \N
   1 http://www.ustream.tv/channel/chocotan
   1 http://www.google.co.uk/
   1 chrome://google-toolbar/content/new-tab.html
speeddial shows up consistantly..  this was yesterday

new mo betta urls for testing
  13 about:blank
   8 chrome://speeddial/content/speeddial.xul
   4 \N

also seems highly correlated to Windows 7.

  27 mozilla::widget::WindowHook::Lookup(unsigned int) Windows NT 6.1.7600
   1 mozilla::widget::WindowHook::Lookup(unsigned int) Windows NT 6.1.7100
(In reply to comment #17)
> also seems highly correlated to Windows 7.
> 
>   27 mozilla::widget::WindowHook::Lookup(unsigned int) Windows NT 6.1.7600
>    1 mozilla::widget::WindowHook::Lookup(unsigned int) Windows NT 6.1.7100

It's a crash in Win7 specific functionality, so that's to be expected.
Is this the same bug? https://bugzilla.mozilla.org/show_bug.cgi?id=553293

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a4pre) Gecko/20100319 Minefield/3.7a4pre - Build ID: 20100319040017
Reported crashes:

http://fwd4.me/IKo
Gary, can you reproduce this consistently?
Seems like FF has to be running for at least ten minutes or so before you crash when exiting.  It's pretty consistent.  look at all the crash reports.
I think we can fix Bug 553293 by reordering the way we do deletion, but that doesn't fix the underlying problem here.
(In reply to comment #24)
> I think we can fix Bug 553293 by reordering the way we do deletion, but that
> doesn't fix the underlying problem here.

But what about the other way around? I think that fixing this bug will fix bug 553293 (in fact, I think it's likely a dupe of this one).
Blocks: 545228
i'm going to do some testing on a shared idea discussed on IRC with Jim and Rob to fix the crash.
If this should not be fixed before next nightly i'll probably backout the leak fix, or invert the deletes just to bring back crash ratio to where it was before.
While trying to reproduce another bug today on the trunk, I created a new trunk profile and when I started exiting Firefox I am consistently getting this crash on that profile. I can provide more information about the profile if it will help.
(In reply to comment #28)
>  I can provide more information about the profile if it will
> help.

can you zip it up and attach it eventually? anything that can help to make this persistent would help.

btw, i tried to notify WM_DESTROY in this patch in the case we did not get it, unfortunatly with the generated build i was able to crash. In a debug build with the same code (and a breakpoint on it) i crashed and the breakpoint was not hit.
http://hg.mozilla.org/try/rev/0d91b1c5425b
just before a crash i got this assertion
###!!! ASSERTION: Cannot use taskbar previews in an embedded context
 file c:/mozilla/mozilla-central/obj-pymake-i686-pc-msys/browser/widows/../../../../../widget/src/windows/TaskbarPreview.cpp, line 300
that is from TaskbarPreview::GetWindowHook()
just in case anyone guesses if inverting the win and preview deletes in the module helps, it does not. Inverted them, cleared fastload, started browser, crashed.
The trunk crash I referenced in Comment 28 started after I downloaded and installed http://www.freedownloadmanager.org/download.htm which I think is on both of my Windows 7 machines on which I can reproduce this crash. Will investigate a bit more to see if can figure out what I added to my 2 machines to make this start happening (since previously they were not crashing on exit)
(In reply to comment #31)
> just in case anyone guesses if inverting the win and preview deletes in the
> module helps, it does not. Inverted them, cleared fastload, started browser,
> crashed.

Darn.
i backed out the leak fix, we won't be able to fix it till we figure out this crash.
I am not able to reproduce this. I'm also not sure I buy my "missing WM_DESTROY" theory in comment 1 anymore. For those that can reproduce, does the patch in bug 545078 fix this?
reproducing in debug builds is harder, but a good way is to use it for about 10 minutes for your usual navigation, and then close it. i've not been able to find other ways to cause the crash other than this. Be sure to not use a m-c changeset after the backout too.
and just FYI in my tests WM_DESTROY was always handled, and additional code to handle cases where it was missing was never hit.
So working through this a bit on irc with robarnold, the problem is not that the WM_DESTROY messages don't exist/aren't being handled, they are getting to WindowHook::Notify but there aren't any registered hooks at that point in time.

Transcript from my breakpoints:
nsWindow::Destroy() 0x00470b04 {unused=-1063372352 }
Received WM_DESTROY in WindowHook::Notify() for 0x00470b04 {unused=-1063372352 }
data == nsnull, return.
nsWindow::OnDestroy()
~TaskbarPreview for preview 0x08692d4c {mTaskbar={...} mController={...} mWnd=0x00470b04 ...}
TaskbarPreview::DetachFromNSWindow(TRUE) window: 0x00470b04 {unused=-1063372352 } on preview 0x08692d4c {mTaskbar={...} mController={...} mWnd=0x00470b04 ...} THREAD 0x1918
First-chance exception at 0x69c21926 (xul.dll) in firefox.exe: 0xC0000005: Access violation reading location 0x000000b8.
Unhandled exception at 0x69c21926 (xul.dll) in firefox.exe: 0xC0000005: Access violation reading location 0x000000b8.
(In reply to comment #35)
> I am not able to reproduce this. I'm also not sure I buy my "missing
> WM_DESTROY" theory in comment 1 anymore. For those that can reproduce, does the
> patch in bug 545078 fix this?

i created a new trybuild with that fix plus my leak patch, and i still got this crash.
i don't think this wants a regression window, it is most likely existent from the first AeroPeek implementation, and just varying hit ratio.
I crashed in that stack today while running Mozmill tests against the 3.6.3plugin1 build:

http://crash-stats.mozilla.com/report/index/786e2581-8d92-42e4-ba77-c68c72100326
Whiteboard: [crashkill][crashkill-fix] → [crashkill][crashkill-fix][Fx3.6.3plugin1]
(In reply to comment #41)
> I crashed in that stack today while running Mozmill tests against the
> 3.6.3plugin1 build:
> http://crash-stats.mozilla.com/report/index/786e2581-8d92-42e4-ba77-c68c72100326

Shouldn't taskbar tab previews be turned off in 3.6.3plugin1?
(In reply to comment #42)
> (In reply to comment #41)
> > I crashed in that stack today while running Mozmill tests against the
> > 3.6.3plugin1 build:
> > http://crash-stats.mozilla.com/report/index/786e2581-8d92-42e4-ba77-c68c72100326
> 
> Shouldn't taskbar tab previews be turned off in 3.6.3plugin1?

They are not turned off enough to avoid crashing. The pref merely affects the nsITaskbarTabPreview.visible boolean.
(In reply to comment #43) 
> They are not turned off enough to avoid crashing. The pref merely affects the
> nsITaskbarTabPreview.visible boolean.

Well, could we turn them off properly on 3.6?
I just installed Windows 7 last week and have got this crash several times
(using trunk builds).
(In reply to comment #44)
> I just installed Windows 7 last week and have got this crash several times
> (using trunk builds).

last week there was the leak patch in, that was making this crash FAR MORE visible. after the backout i've not been able to crash.
Btw, makes probably sense to disable on 3.6, in another bug probably, changing the ifdef in browser.js should be enough.
we hit a spike of 3052 crashes per day on the 19th, and that sounds due to the leak regression fix.  Looks like we are now headed back to the sustained rate fo 60-140 crashes per day.

20100310-crashdata 88 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100311-crashdata 87 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100312-crashdata 95 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100313-crashdata 89 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100314-crashdata 61 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100315-crashdata 139 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100316-crashdata 122 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100317-crashdata 126 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100318-crashdata 138 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100319-crashdata 3052 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100320-crashdata 3043 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100321-crashdata 757 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100322-crashdata 365 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100323-crashdata 243 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100324-crashdata 194 mozilla::widget::WindowHook::Lookup.unsigned.int.
20100325-crashdata 186 mozilla::widget::WindowHook::Lookup.unsigned.int.

> Shouldn't taskbar tab previews be turned off in 3.6.3plugin1?

did anyone figure this out?  looks like we don't want that leak fix adding noise to the plugin beta.
(In reply to comment #45)
> Btw, makes probably sense to disable on 3.6, in another bug probably, changing
> the ifdef in browser.js should be enough.

This is a more complete fix. In the original bug to disable, it was easiest just to flip the default pref. That said, I think a number of users have flipped it on, despite the buggy behavior that forced the default so we are bound to disappoint them since they cannot work around it.

I'd rather we just fix this crash though. It should be easy to avoid the crash (simply check for a null nsWindow in ~TaskbarPreview), but no one understands *why* it's happening. Marco has been trying for the last several weeks to get the crash caught in vmware's record&replay feature so that we can determine the cause but it's been evading his best efforts.

As a compromise, we could apply a branch-only patch that avoids the crash while continuing to debug the cause on trunk.
I'm not sure what this bug has to do with the lorentz beta. Did something land on 1.9.2 that got merged into the beta? Or did I unintentionally land a leak fix in lorentz that I'm not aware of? As far as I know the pref is off by default for Lorentz, just as in 1.9.2 in general.
(In reply to comment #48)
> I'm not sure what this bug has to do with the lorentz beta. Did something land
> on 1.9.2 that got merged into the beta? Or did I unintentionally land a leak
> fix in lorentz that I'm not aware of? As far as I know the pref is off by
> default for Lorentz, just as in 1.9.2 in general.

I think it was seen in the beta per comment 41.  There's no evidence in this bug that the crash frequency is any higher in the lorentz beta than it is on 3.6 proper.
we have a possible candidate for the crash:
http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/WindowHook.cpp#123

Felipe has found a way to reproduce this easier in a debug build:
1. apply my leak patch (bug 545228)
2. open the browser, open about 12 empty tabs
3. open one or two new windows and close them
4. close a couple tabs
5. wait for CC and GC to run (about 1 minute)
6. close browser (X button)

Rob helped me debug this thing following my progress in Record and Replay, and this morning he had concerns about deleteIfEmpty. I've been able to confirm in a recording that indeed the method is removing a still valid hook with about 11 live monitors.

Now producing an opt trybuild with my leak patch and this crash fix and testing it, usually i've always been able to crash pretty easily with such a build.
(In reply to comment #50)
> 5. wait for CC and GC to run (about 1 minute)
FYI, CC runs a lot faster if user stops interacting with browser.
So, don't move mouse over the browser, or don't type anything.
so, for everybody willing to play with me today, this is the build: http://build.mozilla.org/tryserver-builds/mak77@bonardo.net-try-459e5427eac8
Obviously use Windows build on Windows 7.

It is practically similar to today's nightly (just some changeset later), i'd really appreciate help in trying to see if this still crashes for you.

Thanks.
Attached patch patch v1.0 (obsolete) — Splinter Review
So far i've been unable to crash with this patch and my leak fix.
Due to this error, when later we remove WM_DESTROY monitor we fail, and this can finally bring to trying to ask a window that does not exist anymore.

I'd say to fix this, check crash-stats in the next couple days, if no crashes are reported, push the leak fix again.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Attachment #435971 - Flags: review?(tellrob)
Attachment #435971 - Flags: review?(tellrob) → review+
Comment on attachment 435971 [details] [diff] [review]
patch v1.0

Nit: capital M in the comment for MessageData
Attached patch patch v1.1Splinter Review
Thanks.
Attachment #435971 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/c0bac5309fe2

resolving, till someone tells we are wrong.
I'll look closely at crash-stats in the next days, and push the leak fix once i get confirmation the crash has disappeared.
Thanks to everybody helped so far.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a4
Great catch Marco!  Hopefully this does it.
status1.9.2: --- → ?
So, looks like there is just one single report from yesterday's nightly (http://crash-stats.mozilla.com/report/index/77e76c63-ead6-4a1e-a097-9a1ce2100331), that means this has greatly reduced the hit ratio, but it is still happening.
There must be another problem in WindowHook management
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
to be noticed the above crash is different, it does not hit at destruction, but on SetVisible. This makes me think that now pushing the leak patch should not anymore have an interesting hit on crash ratio.
Thoughts?
I agree.  I also think that pushing the leak fix is a great way to test to see that this is in fact fixed.
So, after a brief talk with Rob, let's consider this fixed as the crash that happens at destruction, and i'll file a new bug for the different stack trace.
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Summary: Taskbar tab preview crashes [@ mozilla::widget::WindowHook::Lookup(unsigned int)] in Firefox 3.6 → Taskbar tab preview crashes [@ mozilla::widget::WindowHook::Lookup(unsigned int)] in Firefox 3.6 at TabPreview destruction
filed Bug 556524 for the new stack.
blocking1.9.2: --- → ?
blocking2.0: ? → beta1+
Comment on attachment 435981 [details] [diff] [review]
patch v1.1

This is trivial enough to take on the branch and will let us take the leak fix as well.
Attachment #435981 - Flags: approval1.9.2.4?
blocking1.9.2: ? → needed
Comment on attachment 435981 [details] [diff] [review]
patch v1.1

a=beltzner for 1.9.2.4
Attachment #435981 - Flags: approval1.9.2.4? → approval1.9.2.4+
Al: to verify this you'll need to turn taskbar preview on. It's not a default option, so if verification is tricksy, I think you can skip it for now.
(In reply to comment #65)
> Al: to verify this you'll need to turn taskbar preview on. It's not a default
> option, so if verification is tricksy, I think you can skip it for now.

If he applies Marco's leak patch he should be able to hit it relatively consistently per comment 50, even without taskbar previews on.

I'll try to land this tonight or tomorrow morning.
Crash Signature: [@ mozilla::widget::WindowHook::Lookup(unsigned int)]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: