Closed Bug 594608 Opened 14 years ago Closed 14 years ago

Firefox 4.0b4/5 Crash Spike [@ KiUserCallbackDispatcher ]

Categories

(Core :: Widget, defect)

x86
Windows XP
defect
Not set
critical

Tracking


RESOLVED WONTFIX
Tracking Status
blocking2.0 --- -

People

(Reporter: chofmann, Unassigned)

Details

(Keywords: crash)

Crash Data

we had an explosion of crashes with this signature, going from 12-39 crashes per day to 1438 crashes on 09/07, with the big ramp-up possibly starting around 11am pacific.

crash
count  day/hour (YYYYMMDDHH)

   5 2010090707
   3 2010090708
   1 2010090709
   5 2010090710
  42 2010090711
  84 2010090712
  76 2010090713
  74 2010090714
 104 2010090715
  80 2010090716
 167 2010090717
 175 2010090718
 170 2010090719
 116 2010090720
  99 2010090721
  91 2010090722
 128 2010090723
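
To locate the ramp in the hourly counts above, one rough approach is to compare each hour with the average of the hours before it. The following is only a sketch: the function names and the 5x threshold are assumptions made for illustration, not anything the crash-stats tooling is known to use.

```python
from datetime import datetime

# Hourly counts copied from the table above: "<crash count> <YYYYMMDDHH>".
HOURLY = """\
5 2010090707
3 2010090708
1 2010090709
5 2010090710
42 2010090711
84 2010090712
76 2010090713
74 2010090714
104 2010090715
80 2010090716
167 2010090717
175 2010090718
170 2010090719
116 2010090720
99 2010090721
91 2010090722
128 2010090723
"""


def parse_hourly(text):
    """Turn '<count> <YYYYMMDDHH>' lines into (datetime, count) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, stamp = line.split()
        rows.append((datetime.strptime(stamp, "%Y%m%d%H"), int(count)))
    return rows


def first_spike(rows, factor=5):
    """Return the first hour whose count is at least `factor` times the
    mean of all earlier hours, or None if no hour qualifies.
    The factor of 5 is an arbitrary, assumed threshold."""
    for i in range(1, len(rows)):
        baseline = sum(count for _, count in rows[:i]) / i
        if rows[i][1] >= factor * baseline:
            return rows[i][0]
    return None


print(first_spike(parse_hourly(HOURLY)))
```

On the figures above this flags the 11:00 bucket, which is consistent with the ramp time estimated in the report.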

the crashes may have been present in previous releases and become active as the upgrade from beta 4 to beta 5 started, and possibly when some test pilot b4/b5 activity kicked in.

checking --- KiUserCallbackDispatcher 20100907-crashdata.csv
found in: 4.0b4 4.0b5 4.0b3 4.0b2 3.6.8 3.6 3.6.6 3.6.4 3.6.3
release  total-crashes  KiUserCallbackDispatcher-crashes  fraction-of-total
all      307049         1438                              0.00468329
4.0b4    24675          1172                              0.0474975
4.0b5    3345           105                               0.0313901
4.0b3    1493           68                                0.0455459
4.0b2    1175           65                                0.0553191
3.6.8    189674         23                                0.000121261
3.6      6069           2                                 0.000329544
3.6.6    7781           1                                 0.000128518
3.6.4    2999           1                                 0.000333444
3.6.3    11700          1                                 8.54701e-05
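
The last column of the table above is the number of KiUserCallbackDispatcher crashes divided by the total crashes for that release, so it is a fraction rather than a percentage. A small sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative assumptions):

```python
# (release, total crashes, KiUserCallbackDispatcher crashes), from the table above.
ROWS = [
    ("4.0b4", 24675, 1172),
    ("4.0b5", 3345, 105),
    ("4.0b3", 1493, 68),
    ("4.0b2", 1175, 65),
    ("3.6.8", 189674, 23),
    ("3.6", 6069, 2),
    ("3.6.6", 7781, 1),
    ("3.6.4", 2999, 1),
    ("3.6.3", 11700, 1),
]


def signature_share(rows):
    """Map each release to the fraction of its crashes with this signature."""
    return {release: hits / total for release, total, hits in rows}


shares = signature_share(ROWS)

# Print releases from most to least affected, formatted as percentages.
for release, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{release:6} {share:.4%}")
```

Read this way, the 4.0 betas sit at roughly 3-5.5% of their crashes with this signature, while the 3.6.x releases are around 0.01-0.03%.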

mostly happening on windows xp

KiUserCallbackDispatcherTotal 1436
Win5.1  0.94
Win6.0  0.03
Win6.1  0.03


The signature appears to cover several different stacks and might lead to several spin-off bugs.

Here is the first one, with a possible test case from a user:

http://crash-stats.mozilla.com/report/index/94db5063-308c-478f-ab04-880ad2100908
 
everytime i try to do the survey with beta 5 it crashes


Frame  	Module  	Signature  	Source
0 	ntdll.dll 	KiUserCallbackDispatcher 	
1 	mozsqlite3.dll 	pcache1Destroy 	db/sqlite3/src/sqlite3.c:33537
2 	msctfime.ime 	ImeSelect 	
3 	user32.dll 	TestWindowProcess 	
4 	user32.dll 	NtUserPeekMessage 	
5 	user32.dll 	_PeekMessage 	
6 	user32.dll 	GetShellWindow 	
7 		@0x5d021f 	
8 	nspr4.dll 	PR_IntervalToMilliseconds 	nsprpub/pr/src/misc/prinrval.c:136
9 	xul.dll 	nsBaseAppShell::OnProcessNextEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:300
10 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:517
11 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
12 	xul.dll 	xul.dll@0xb94e03 	
13 	xul.dll 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
14 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
15 	xul.dll 	_SEH_epilog4 	
16 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
17 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:175
18 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:243

the most frequent of the crashes for 09/07 appears to be a 4.0b4 crash with a stack that looks like

http://crash-stats.mozilla.com/report/index/225e2c32-e4a8-4e2a-9559-401542100907


Frame  	Module  	Signature  	Source
0 	ntdll.dll 	KiUserCallbackDispatcher 	
1 	xul.dll 	nsWindow::DealWithPopups 	widget/src/windows/nsWindow.cpp:8084
2 	user32.dll 	NtUserPeekMessage 	
3 	xul.dll 	nsBaseAppShell::OnProcessNextEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:294
4 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:517
5 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:118
6 	xul.dll 	xul.dll@0xb7a45b 	
7 	xul.dll 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
8 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
9 	xul.dll 	_SEH_epilog4 	
10 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
11 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:175
12 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:243


and there is another user with a comment about "clicked on 'test pilot' again."

http://crash-stats.mozilla.com/report/index/ff375646-3a45-43be-a97f-cac4f2100908

Frame  	Module  	Signature  	Source
0 	ntdll.dll 	KiUserCallbackDispatcher 	
1 	xul.dll 	nsWindow::DealWithPopups 	widget/src/windows/nsWindow.cpp:8170
2 	user32.dll 	NtUserPeekMessage 	
3 	xul.dll 	nsAppShell::ProcessNextNativeEvent 	widget/src/windows/nsAppShell.cpp:279
4 	xul.dll 	nsBaseAppShell::OnProcessNextEvent 	widget/src/xpwidgets/nsBaseAppShell.cpp:294
5 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:517
6 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/threads/combined/prulock.c:404
7 	xul.dll 	nsTArray<nsTimerImpl*>::RemoveElement<nsTimerImpl*,nsDefaultComparator<nsTimerImpl*,nsTimerImpl*> > 	obj-firefox/dist/include/nsTArray.h:739
8 	xul.dll 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
9 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
10 	xul.dll 	_SEH_epilog4 	
11 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
12 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:175
13 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:243

we may need some skip list magic to help sort these out.
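
As a rough illustration of what skip-list handling could do here: if KiUserCallbackDispatcher were treated as a "prefix" frame, the signature would continue to the next frame and the different stacks above would separate into distinct signatures. This is a minimal sketch under that assumption, not the crash-stats server's actual implementation; `PREFIX_FRAMES` and `make_signature` are hypothetical names.

```python
# Hypothetical prefix list: frames too generic to identify a crash by
# themselves, so the signature keeps going to the next frame.
PREFIX_FRAMES = {"KiUserCallbackDispatcher"}


def make_signature(frames, prefix_frames=PREFIX_FRAMES):
    """Join frames from the top of the stack, stopping after the first
    frame that is not in the prefix list."""
    parts = []
    for frame in frames:
        parts.append(frame)
        if frame not in prefix_frames:
            break
    return " | ".join(parts)


# Top frames of two of the stacks quoted above.
STACK_SURVEY = ["KiUserCallbackDispatcher", "pcache1Destroy", "ImeSelect"]
STACK_POPUPS = ["KiUserCallbackDispatcher", "nsWindow::DealWithPopups",
                "NtUserPeekMessage"]

print(make_signature(STACK_SURVEY))
print(make_signature(STACK_POPUPS))
```

With this kind of rule the nsWindow::DealWithPopups stacks would be counted under their own signature instead of being mixed in with the other KiUserCallbackDispatcher crashes.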
Keywords: crash
For the Firefox 3.6.x users do they have the Test Pilot Add-On installed?
there was a study launched at 11:00a pacific yesterday.  jono has turned it off at 17:39 to new users and we can see if that has an effect.
In addon info I'm seeing a lot of version info that looks like

                                    Version  	Current?
 testpilot@labs.mozilla.com      1.0.2 (or 3)       1.0.1
(In reply to comment #2)
> For the Firefox 3.6.x users do they have the Test Pilot Add-On installed?

checking a quick sample, I don't see any test pilot installs on the 3.6.x crashes.  I think that is a different (pre-existing & low volume) crash with a different stack that doesn't run through nsWindow::DealWithPopups
some additional comments that we can try and make a test case out of:

This crash happened right after the Test Pilot pop-up message which said a test is about to begin. This also happened in the earlier Beta version.

Test Pilot is probably at fault. I can click "More Info" just fine, but Firefox crashes as soon as the "Loading..." text shows up in the Test pilot window. http://i52.tinypic.com/4lloa1.png

I clicked learn more when test pilot screen

THIS time, I clicked on the 'x' to close the test pilot notification. 

Clicked on the pilot study - slow to load - clicked on another application while waiting (not the main Firefox application) -> crash

3rd crash - clicked 'more info' on the test pilot screen and this time waited without any other activity -> firefox crashed

beta 5 just updated crashed when pilot study screen came up and clicked on it

this user hit the crash 6 times in a row
http://crash-stats.mozilla.com/report/index/8add46bd-5e1d-4495-83ca-b87232100907
http://crash-stats.mozilla.com/report/index/d39476bf-6411-43ad-aee9-99e4b2100908
(In reply to comment #3)
> there was a study launched at 11:00a pacific yesterday.  jono has turned it off
> at 17:39 to new users and we can see if that has an effect.

crash volume went from about 60-70 per hour to 6 per hour since turning off the test pilot study.

It looks like we are back to just seeing the older crash with the stack
0  	ntdll.dll  	KiUserCallbackDispatcher  	
1 	MSCTF.dll 	SysShellProc 	
2 	user32.dll 	NtUserPeekMessage
...

and the higher volume stack

Frame      Module      Signature      Source
0     ntdll.dll     KiUserCallbackDispatcher     
1     xul.dll     nsWindow::DealWithPopups 

has disappeared.  probably should check back in a couple of hours to confirm.
Severity: normal → critical
blocking2.0: --- → beta6+
For testing purposes, you can re-enable the About: Firefox study on an individual machine to see if you can get it to cause the crash again.  Here's how:

1. Open the url chrome://testpilot/content/debug.html
2. From the menu in the upper left, pick "About Firefox"
(If the menu is empty, wait a few seconds and then reload the page)
3. Click the "Reset Task" button
4. Click the "Reload All Experiments" button
5. Click the "Notify Me" button.

You should now see the notification for the newly restarted About: Firefox study.

If you have the menu in the upper left but "About Firefox" is not in the menu (this may happen in a new profile), you need to subscribe to the development/testing study channel.  Where it says "Index file", choose "index-dev.json" from the dropdown menu, then click "Reload all experiments", then "Notify me".
Jono: Regarding Step #2, I don't see a study called "About Firefox" in the list. I am using a Mac and I reloaded the page. The only study that seems close is "How do you feel about your FF browser."


(In reply to comment #8)
> For testing purposes, you can re-enable the About: Firefox study on an
> individual machine to see if you can get it to cause the crash again.  Here's
> how:
Marcia, did you set index file to "index-dev.json" on the debug page?  You might need to make that change and then restart.
Unfortunately I don't see any place to select the index-dev.json file on the debug page. I am using an existing Mac Beta 5 profile.

(In reply to comment #10)
> Marcia, did you set index file to "index-dev.json" on the debug page?  You
> might need to make that change and then restart.
Can someone catch this in a debug build, in the MSVC debugger? The stack traces from crash-stats look unreliable/incomplete.
Moving this to betaN+; Jono, can you hop upstairs and see if someone can help you catch this in a debugger while Roc's around?
blocking2.0: beta6+ → betaN+
We cannot reproduce the bug anywhere.

Jono pointed out that the study uses JS-Ctypes to poke around USER32 for graphics card info. It is certainly possible that this triggered crashes in some way. We have not seen crash spikes on other studies. We are not going to rerun this study.

I suggest we assume this was a bug in the study itself, and move on.
blocking2.0: betaN+ → ?
I second Roc's suggestion.  From what chofmann says, the crash is happening at exactly the time when Test Pilot is trying to read the graphics card info out of User32, so that's almost certainly the culprit - it was study-specific code, not anything in Firefox or in the Feedback extension.
We are going to try again to gather graphics card info to ship with crash reports over in bug 586048.  I put a comment there to watch for the possibility of tickling the same problems that test pilot might have run into.
Status: NEW → RESOLVED
blocking2.0: ? → -
Closed: 14 years ago
Resolution: --- → WONTFIX
Crash Signature: [@ KiUserCallbackDispatcher ]