Closed Bug 594608 Opened 14 years ago Closed 14 years ago

Firefox 4.0b4/5 Crash Spike [@ KiUserCallbackDispatcher ]

Categories

(Core :: Widget, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
blocking2.0 --- -

People

(Reporter: chofmann, Unassigned)

Details

(Keywords: crash)

Crash Data

We had an explosion of crashes with this signature, going from 12-39 crashes per day to 1438 crashes on 09/07, with the big ramp possibly happening around 11am Pacific.

crash count  day/time
      5      2010090707
      3      2010090708
      1      2010090709
      5      2010090710
     42      2010090711
     84      2010090712
     76      2010090713
     74      2010090714
    104      2010090715
     80      2010090716
    167      2010090717
    175      2010090718
    170      2010090719
    116      2010090720
     99      2010090721
     91      2010090722
    128      2010090723

The crashes may have been there in previous releases and became active as the upgrade from beta 4 to 5 started, and maybe when some Test Pilot b4/b5 activity kicked in.

checking --- KiUserCallbackDispatcher 20100907-crashdata.csv
found in: 4.0b4 4.0b5 4.0b3 4.0b2 3.6.8 3.6 3.6.6 3.6.4 3.6.3

release  total-crashes  KiUserCallbackDispatcher crashes  fraction of total
all      307049         1438                              0.00468329
4.0b4    24675          1172                              0.0474975
4.0b5    3345           105                               0.0313901
4.0b3    1493           68                                0.0455459
4.0b2    1175           65                                0.0553191
3.6.8    189674         23                                0.000121261
3.6      6069           2                                 0.000329544
3.6.6    7781           1                                 0.000128518
3.6.4    2999           1                                 0.000333444
3.6.3    11700          1                                 8.54701e-05

Mostly happening on Windows XP:

KiUserCallbackDispatcherTotal 1436
Win5.1  0.94
Win6.0  0.03
Win6.1  0.03

The signature appears to cover several different stacks and might lead to many spin-off bugs. Here is the first one, with a possible test case from a user:

http://crash-stats.mozilla.com/report/index/94db5063-308c-478f-ab04-880ad2100908
User comment: "everytime i try to do the survey with beta 5 it crashes"

Frame  Module          Signature                           Source
 0     ntdll.dll       KiUserCallbackDispatcher
 1     mozsqlite3.dll  pcache1Destroy                      db/sqlite3/src/sqlite3.c:33537
 2     msctfime.ime    ImeSelect
 3     user32.dll      TestWindowProcess
 4     user32.dll      NtUserPeekMessage
 5     user32.dll      _PeekMessage
 6     user32.dll      GetShellWindow
 7                     @0x5d021f
 8     nspr4.dll       PR_IntervalToMilliseconds           nsprpub/pr/src/misc/prinrval.c:136
 9     xul.dll         nsBaseAppShell::OnProcessNextEvent  widget/src/xpwidgets/nsBaseAppShell.cpp:300
10     xul.dll         nsThread::ProcessNextEvent          xpcom/threads/nsThread.cpp:517
11     xul.dll         mozilla::ipc::MessagePump::Run      ipc/glue/MessagePump.cpp:110
12     xul.dll         xul.dll@0xb94e03
13     xul.dll         MessageLoop::RunInternal            ipc/chromium/src/base/message_loop.cc:219
14     xul.dll         MessageLoop::RunHandler             ipc/chromium/src/base/message_loop.cc:202
15     xul.dll         _SEH_epilog4
16     xul.dll         MessageLoop::Run                    ipc/chromium/src/base/message_loop.cc:176
17     xul.dll         nsBaseAppShell::Run                 widget/src/xpwidgets/nsBaseAppShell.cpp:175
18     xul.dll         nsAppShell::Run                     widget/src/windows/nsAppShell.cpp:243

The most frequent of the crashes for 09/07 appears to be a 4.0b4 crash with a stack that looks like:

http://crash-stats.mozilla.com/report/index/225e2c32-e4a8-4e2a-9559-401542100907

Frame  Module          Signature                           Source
 0     ntdll.dll       KiUserCallbackDispatcher
 1     xul.dll         nsWindow::DealWithPopups            widget/src/windows/nsWindow.cpp:8084
 2     user32.dll      NtUserPeekMessage
 3     xul.dll         nsBaseAppShell::OnProcessNextEvent  widget/src/xpwidgets/nsBaseAppShell.cpp:294
 4     xul.dll         nsThread::ProcessNextEvent          xpcom/threads/nsThread.cpp:517
 5     xul.dll         mozilla::ipc::MessagePump::Run      ipc/glue/MessagePump.cpp:118
 6     xul.dll         xul.dll@0xb7a45b
 7     xul.dll         MessageLoop::RunInternal            ipc/chromium/src/base/message_loop.cc:219
 8     xul.dll         MessageLoop::RunHandler             ipc/chromium/src/base/message_loop.cc:202
 9     xul.dll         _SEH_epilog4
10     xul.dll         MessageLoop::Run                    ipc/chromium/src/base/message_loop.cc:176
11     xul.dll         nsBaseAppShell::Run                 widget/src/xpwidgets/nsBaseAppShell.cpp:175
12     xul.dll         nsAppShell::Run                     widget/src/windows/nsAppShell.cpp:243

And there is another user with a comment about "clicked on 'test pilot' again.":

http://crash-stats.mozilla.com/report/index/ff375646-3a45-43be-a97f-cac4f2100908

Frame  Module          Signature                           Source
 0     ntdll.dll       KiUserCallbackDispatcher
 1     xul.dll         nsWindow::DealWithPopups            widget/src/windows/nsWindow.cpp:8170
 2     user32.dll      NtUserPeekMessage
 3     xul.dll         nsAppShell::ProcessNextNativeEvent  widget/src/windows/nsAppShell.cpp:279
 4     xul.dll         nsBaseAppShell::OnProcessNextEvent  widget/src/xpwidgets/nsBaseAppShell.cpp:294
 5     xul.dll         nsThread::ProcessNextEvent          xpcom/threads/nsThread.cpp:517
 6     nspr4.dll       _MD_CURRENT_THREAD                  nsprpub/pr/src/threads/combined/prulock.c:404
 7     xul.dll         nsTArray<nsTimerImpl*>::RemoveElement<nsTimerImpl*,nsDefaultComparator<nsTimerImpl*,nsTimerImpl*> >  obj-firefox/dist/include/nsTArray.h:739
 8     xul.dll         MessageLoop::RunInternal            ipc/chromium/src/base/message_loop.cc:219
 9     xul.dll         MessageLoop::RunHandler             ipc/chromium/src/base/message_loop.cc:202
10     xul.dll         _SEH_epilog4
11     xul.dll         MessageLoop::Run                    ipc/chromium/src/base/message_loop.cc:176
12     xul.dll         nsBaseAppShell::Run                 widget/src/xpwidgets/nsBaseAppShell.cpp:175
13     xul.dll         nsAppShell::Run                     widget/src/windows/nsAppShell.cpp:243

We may need some skip list magic to help sort these out.
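One reading of "skip list magic" is to ignore frames that are too generic to identify a crash and build the signature from the first frames that remain. The sketch below illustrates that idea only; it is not the crash-stats implementation, and the function name, the contents of the skip set, and the `depth` parameter are all assumptions made for illustration. The frame names are the top frames of the three stacks quoted above.

```python
from collections import Counter

# Frames assumed too generic to identify a crash on their own.
# This set is an illustrative choice, not the real crash-stats skip list.
SKIP_FRAMES = {"KiUserCallbackDispatcher", "NtUserPeekMessage"}


def make_signature(frames, skip=SKIP_FRAMES, depth=2):
    """Join the first `depth` frames that are not on the skip list."""
    kept = [frame for frame in frames if frame not in skip]
    # Fall back to the raw top frame if every frame was skipped.
    return " | ".join(kept[:depth]) if kept else frames[0]


# Top frames copied from the three stacks quoted in this comment,
# keyed by the first part of each crash report ID.
stacks = {
    "94db5063": ["KiUserCallbackDispatcher", "pcache1Destroy", "ImeSelect"],
    "225e2c32": ["KiUserCallbackDispatcher", "nsWindow::DealWithPopups",
                 "NtUserPeekMessage", "nsBaseAppShell::OnProcessNextEvent"],
    "ff375646": ["KiUserCallbackDispatcher", "nsWindow::DealWithPopups",
                 "NtUserPeekMessage", "nsAppShell::ProcessNextNativeEvent"],
}

# Bucket the reports by their first non-skipped frame.
buckets = Counter(make_signature(frames, depth=1) for frames in stacks.values())
for signature, count in buckets.items():
    print(count, signature)
```

With `depth=1` the two nsWindow::DealWithPopups reports fall into one bucket and the pcache1Destroy report into another, which is the kind of split that would let each stack get its own bug.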
Keywords: crash
For the Firefox 3.6.x users do they have the Test Pilot Add-On installed?
A study was launched at 11:00am Pacific yesterday. Jono turned it off for new users at 17:39, so we can see whether that has an effect.
In the add-on info I'm seeing a lot of version info that looks like:

                            Version       Current?
testpilot@labs.mozilla.com  1.0.2 (or 3)  1.0.1
(In reply to comment #2)
> For the Firefox 3.6.x users do they have the Test Pilot Add-On installed?

Checking a quick sample, I don't see any Test Pilot installs on the 3.6.x crashes. I think that is a different (pre-existing and low-volume) crash with a different stack that doesn't run through nsWindow::DealWithPopups.
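The "low volume" point can be checked against the per-release counts from the original report. The following is a minimal sketch that pools the 4.0 betas and the 3.6.x releases and compares the share of crashes carrying this signature; the counts are copied from the report, while the helper name `pooled_fraction` is made up for this example.

```python
# Counts copied from the per-release table in the original report:
# release -> (total crashes, KiUserCallbackDispatcher crashes)
counts = {
    "4.0b4": (24675, 1172),
    "4.0b5": (3345, 105),
    "4.0b3": (1493, 68),
    "4.0b2": (1175, 65),
    "3.6.8": (189674, 23),
    "3.6": (6069, 2),
    "3.6.6": (7781, 1),
    "3.6.4": (2999, 1),
    "3.6.3": (11700, 1),
}


def pooled_fraction(prefix):
    """Signature crashes / total crashes over releases starting with `prefix`."""
    total = sum(t for rel, (t, _) in counts.items() if rel.startswith(prefix))
    hits = sum(h for rel, (_, h) in counts.items() if rel.startswith(prefix))
    return hits / total


beta = pooled_fraction("4.0b")
stable = pooled_fraction("3.6")
print(f"4.0 betas: {beta:.4%}")
print(f"3.6.x:     {stable:.4%}")
print(f"ratio:     {beta / stable:.0f}x")
```

The pooled beta share is a few hundred times the 3.6.x share, which is consistent with treating the 3.6.x crashes as a separate, pre-existing issue.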
Some additional comments that we can try to make a test case out of:

This crash happened right after the Test Pilot pop-up message which said a test is about to begin. This also happened in the earlier Beta version. Test Pilot is probably at fault.

I can click "More Info" just fine, but Firefox crashes as soon as the "Loading..." text shows up in the Test pilot window. http://i52.tinypic.com/4lloa1.png

I clicked learn more when test pilot screen

THIS time, I clicked on the 'x' to close the test pilot notification.

Clicked on the pilot study - slow to load - clicked on another application while waiting (not the main Firefox application) -> crash

3rd crash - clicked 'more info' on the test pilot screen and this time waited without any other activity -> Firefox crashed

beta 5 just updated crashed when pilot study screen came up and clicked on it

This user hit the crash 6 times in a row:
http://crash-stats.mozilla.com/report/index/8add46bd-5e1d-4495-83ca-b87232100907
http://crash-stats.mozilla.com/report/index/d39476bf-6411-43ad-aee9-99e4b2100908
(In reply to comment #3)
> there was a study launched at 11:00a pacific yesterday. jono has turned it off
> at 17:39 to new users and we can see if that has an effect.

Crash volume went from about 60-70 per hour to 6 per hour since turning off the Test Pilot study. It looks like we are back to just seeing the older crash with the stack

0 ntdll.dll KiUserCallbackDispatcher
1 MSCTF.dll SysShellProc
2 user32.dll NtUserPeekMessage
...

and the higher-volume stack

0 ntdll.dll KiUserCallbackDispatcher
1 xul.dll nsWindow::DealWithPopups

has disappeared. We should probably check back in a couple of hours to confirm.
Severity: normal → critical
blocking2.0: --- → beta6+
For testing purposes, you can re-enable the About: Firefox study on an individual machine to see if you can get it to cause the crash again. Here's how:

1. Open the url chrome://testpilot/content/debug.html
2. From the menu in the upper left, pick "About Firefox" (If the menu is empty, wait a few seconds and then reload the page)
3. Click the "Reset Task" button
4. Click the "Reload All Experiments" button
5. Click the "Notify Me" button.

You should now see the notification for the newly restarted About: Firefox study.

If you have the menu in the upper left but "About Firefox" is not in the menu (this may happen in a new profile), you need to subscribe to the development/testing study channel. Where it says "Index file", choose "index-dev.json" from the dropdown menu, then click "Reload all experiments", then "Notify me".
Jono: Regarding Step #2, I don't see a study called "About Firefox" in the list. I am using a Mac and I reloaded the page. The only study that seems close is "How do you feel about your FF browser."

(In reply to comment #8)
> For testing purposes, you can re-enable the About: Firefox study on an
> individual machine to see if you can get it to cause the crash again. Here's
> how:
Marcia, did you set index file to "index-dev.json" on the debug page? You might need to make that change and then restart.
Unfortunately I don't see any place to select the index-dev.json file on the debug page. I am using an existing Mac Beta 5 profile.

(In reply to comment #10)
> Marcia, did you set index file to "index-dev.json" on the debug page? You
> might need to make that change and then restart.
Can someone catch this in a debug build, in the MSVC debugger? The stack traces from crash-stats look unreliable/incomplete.
Moving this to betaN+; Jono, can you hop upstairs and see if someone can help you catch this in a debugger while Roc's around?
blocking2.0: beta6+ → betaN+
We cannot reproduce the bug anywhere. Jono pointed out that the study uses JS-Ctypes to poke around USER32 for graphics card info. It is certainly possible that this triggered crashes in some way. We have not seen crash spikes on other studies. We are not going to rerun this study. I suggest we assume this was a bug in the study itself, and move on.
blocking2.0: betaN+ → ?
I second Roc's suggestion. From what chofmann says, the crash is happening at exactly the time when Test Pilot is trying to read the graphics card info out of User32, so that's almost certainly the culprit: it was study-specific code, not anything in Firefox or in the Feedback extension.
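For readers unfamiliar with this kind of call, here is a rough sketch of reading graphics adapter names out of user32.dll through a foreign-function interface. It uses Python's ctypes rather than js-ctypes, and the choice of `EnumDisplayDevicesW` is an assumption: the bug does not say which USER32 function the study actually called. The helper name `list_display_adapters` is made up for this example, and nothing here is expected to reproduce the crash.

```python
import ctypes
import sys


class DISPLAY_DEVICEW(ctypes.Structure):
    """Mirror of the Win32 DISPLAY_DEVICEW structure."""
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("DeviceName", ctypes.c_wchar * 32),
        ("DeviceString", ctypes.c_wchar * 128),
        ("StateFlags", ctypes.c_uint32),
        ("DeviceID", ctypes.c_wchar * 128),
        ("DeviceKey", ctypes.c_wchar * 128),
    ]


def list_display_adapters(max_devices=16):
    """Return adapter description strings, or [] when not running on Windows."""
    if sys.platform != "win32":
        return []
    user32 = ctypes.WinDLL("user32")
    enum_devices = user32.EnumDisplayDevicesW
    enum_devices.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32,
                             ctypes.POINTER(DISPLAY_DEVICEW), ctypes.c_uint32]
    enum_devices.restype = ctypes.c_int  # Win32 BOOL
    adapters = []
    for index in range(max_devices):
        device = DISPLAY_DEVICEW()
        # The API requires cb to hold the structure size before the call.
        device.cb = ctypes.sizeof(DISPLAY_DEVICEW)
        # Passing NULL as the device name asks for the adapter at this index.
        if not enum_devices(None, index, ctypes.byref(device), 0):
            break
        adapters.append(device.DeviceString)
    return adapters


if __name__ == "__main__":
    for name in list_display_adapters():
        print(name)
```

Declaring the argument types and filling in the structure size by hand is exactly the sort of detail that is easy to get wrong in FFI code, which is one plausible way study-specific code like this could misbehave.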
We are going to try again to gather graphics card info to ship with crash reports over in bug 586048. I put a comment there to watch for the possibility of tickling the same problems that Test Pilot might have run into.
Status: NEW → RESOLVED
blocking2.0: ? → -
Closed: 14 years ago
Resolution: --- → WONTFIX
Crash Signature: [@ KiUserCallbackDispatcher ]