Closed Bug 545195 Opened 14 years ago Closed 14 years ago

topcrash [@ @0x0 | nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned int) ]

Component: Core :: General (defect)
Platform: x86 Windows XP
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla1.9.3a2
People: (Reporter: dbaron, Assigned: bent.mozilla)
Keywords: crash, topcrash
Attachments: (1 file)

It's hard for me to see what might have caused this.  My top guesses would probably be bug 527659 / bug 535649, bug 542318, or bug 517553.
All crashes are on Windows, none on Linux, none on Mac.

Bug 542318 is a Windows-specific bug, which makes it more likely to be the cause.
If I download yesterday's trunk nightly, click Help > Check for Updates, close the Update window, close Firefox, and then repeat this one more time, I get this crash on close.
(In reply to comment #3)
> If I download yesterday's trunk nightly, click Help > Check for Updates,
> close the Update window, close Firefox, and then repeat this one more time,
> I get this crash on close.

Did you install any updates, or close before the update finished downloading?
I just opened and then closed it with the close button. I didn't download or install updates.
I can confirm Ria's info and have a reliable (at least for me) STR:

1. Download the Minefield 20100209 zip distribution and extract it: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010-02-09-07-mozilla-central/firefox-3.7a2pre.en-US.win32.zip
2. Create a new profile.
3. Launch Minefield build from Step 1 using the new profile from Step 2
4. Click Help --> Check for Updates...
5. Wait for the update to be listed, then close the "Software Update" dialog *by clicking "Ask Later"*
6. File --> Exit to close Minefield
7. Repeat steps 3 to 5 two more times.
8. Firefox crashes

http://crash-stats.mozilla.com/report/index/9c87296b-a12b-4f02-9477-789ab2100210

Crashing Thread
Frame  Module        Signature                                Source
0                    @0x0
1      xul.dll       nsBaseAppShell::OnProcessNextEvent       widget/src/xpwidgets/nsBaseAppShell.cpp:293
2      nspr4.dll     _MD_CURRENT_THREAD                       nsprpub/pr/src/md/windows/w95thred.c:308
3      nspr4.dll     nspr4.dll@0xcccf
4      xul.dll       NS_ProcessPendingEvents_P                obj-firefox/xpcom/build/nsThreadUtils.cpp:200
5      xul.dll       mozilla::ShutdownXPCOM                   xpcom/build/nsXPComInit.cpp:769
6      xul.dll       ScopedXPCOMStartup::~ScopedXPCOMStartup  toolkit/xre/nsAppRunner.cpp:1042
7      xul.dll       XRE_main                                 toolkit/xre/nsAppRunner.cpp:3521
8      firefox.exe   wmain                                    toolkit/xre/nsWindowsWMain.cpp:120
9      firefox.exe   __tmainCRTStartup                        obj-firefox/memory/jemalloc/crtsrc/crtexe.c:591
10     kernel32.dll  BaseProcessStart
Looks like DoProcessNextNativeEvent triggers an event that releases some reference that in turn releases the nsBaseAppShell instance. On the next loop iteration we crash.

I would not say bug 542318 is the culprit. I'm just updating the tree and building to check the provided STR.
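
To illustrate the release-during-event theory above, here is a minimal, self-contained sketch (all names here are invented; this is not the actual nsBaseAppShell code, just the suspected shape of the bug): an event dispatched from inside the native-event loop drops the last reference to the object running the loop, so the next iteration calls a method on freed memory, consistent with the @0x0 top frame.

// Minimal sketch of the suspected failure mode; not Mozilla code.
// Builds with any C++11 compiler.
#include <cstdio>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

struct Shell {
  std::queue<std::function<void()>> events;

  // Stands in for the DoProcessNextNativeEvent() call at line 293.
  bool ProcessNextNativeEvent() {
    if (events.empty())
      return false;
    std::function<void()> ev = std::move(events.front());
    events.pop();
    ev();         // the event handler may destroy |this|...
    return true;  // ...so the next iteration is a use-after-free
  }

  ~Shell() { std::puts("~Shell"); }
};

int main() {
  std::shared_ptr<Shell> shell = std::make_shared<Shell>();
  Shell* raw = shell.get();
  // The event releases the last reference to the shell.
  shell->events.push([&shell] { shell.reset(); });

  // Holding an extra strong reference on the stack for the duration of
  // the loop (the XPCOM "kungFuDeathGrip" idiom) would prevent this:
  //   std::shared_ptr<Shell> grip = shell;
  while (raw->ProcessNextNativeEvent()) {
    // the second iteration reads freed memory and crashes
  }
}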
Correction: STR step 7 should be: Repeat steps 3 to 6 two more times.
It's probably best to start testing each hourly build in the range to narrow it down.

Also, checking the stats, it looks like they are all definitely crashing at line 293 inside nsBaseAppShell::OnProcessNextEvent:

http://hg.mozilla.org/mozilla-central/annotate/19dbabe331ad/widget/src/xpwidgets/nsBaseAppShell.cpp#l293

--> keepGoing = DoProcessNextNativeEvent(PR_FALSE);


Also, some reports confirm the same: shutdown was involved in the crash.

FYI: The changeset for bug 517553 is quite large:
http://hg.mozilla.org/mozilla-central/rev/53308118abed 
checked in at: Sun Feb 07 10:52:43 2010 -0500
(In reply to comment #9)
> Its probably best to start testing each hourly in the range to narrow it down.

Do we have hourly builds archived somewhere I don't know about?
Another possibility:  this code is quite close to the DLL blocklisting code, so it could be related to http://hg.mozilla.org/mozilla-central/rev/0ddf975663a0
I can reproduce it in a debugger, UI, thanks for the STR.

However, it's not that simple to figure out at which moment the 'this' pointer dies (if that is what's happening).

If vksaver.dll is an installable plug-in, then I wouldn't say it's related to this bug.  I don't have this file on my machine.
Sorry, UI -> IU ;)
If you can reproduce it in a debugger (presumably in your own build?), can you bisect to figure out what changeset caused it?
I am using the nightly build. I don't think I can reproduce it in a debug build, as it seems to be somehow related to updates, but I can try to spoof the updater somehow; I did that once in the past.

BTW, it doesn't seem that nsBaseAppShell is released prior to the crash; I am not getting its destructor call before the crash. On a normal, non-crashing exit I do.
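
(For anyone repeating that check in a local build, a hypothetical bit of instrumentation, not something in the tree, that makes the destructor call visible without a debugger:)

// Hypothetical local instrumentation, not part of the tree: log the
// destructor so its presence or absence at shutdown shows in the console.
#include <cstdio>

nsBaseAppShell::~nsBaseAppShell()
{
  printf("*** ~nsBaseAppShell(%p)\n", (void*)this);
}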
Using the hourly archive at:
http://hourly-archive.localgho.st/hourly-archive2/mozilla-central-win32/
(which is not reflected on the hourly-archive homepage!), I reduced the range to:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=be94483da3b4&tochange=b21188a34531
which makes no sense at all.
One way I can make it make sense: the cause was actually an earlier change, but there was a dependency bug, and the change to widget/public/Makefile.in caused the affected code to actually get built for the first time.

It would be good if someone else could confirm that range, though.
[2010-02-10 16:39:13] <philor> along the lines of your dependency thought, another good question to ask would be whether your first bad build was a clobber, and if so, how many builds before that were not clobbers
[2010-02-10 16:58:09] <philor> dbaron: and if I'm right about telling the difference, yours was, and that would take your range back to 8c84037f3ad9
If I'm right about how to tell clobbers from depends ("does the compile step start off running configure or not?"), your first-bad was a clobber, and the two before it (plus the one that was red, since it doesn't count) were not, which opens your range, for something that didn't actually take effect until a clobber, up to http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=872dcf34dab3&tochange=b21188a34531
Except that's forgetting one thing:  there are multiple build slaves and they clobber at different times.

I pulled the build from the earlier changeset that you identified as a clobber:  872dcf34dab3 -- and it had the crash too.  So the range we want is earlier.


Any chance you could figure out which builds prior to that one were also clobbers?  (Beware that there could be more than one build for some changesets.)
Prior to that build, I tested builds from 943afcbad1ac, 2caefaaa7d77, ed857569fabf, b234c7370793, 21d980d9b3a4, fc3d32011d31, and 62ade428367b, and none of them crashed.
Given where this is crashing, that range makes me want to stick to my theory that this is somehow a regression from http://hg.mozilla.org/mozilla-central/rev/0ddf975663a0 .
Blocks: 540692
I backed bug 540692 out:
http://hg.mozilla.org/mozilla-central/rev/83adba230467
http://hg.mozilla.org/mozilla-central/rev/096332cd6d39
to test the theory that it's the cause of this bug.

If this bug doesn't go away in tomorrow's nightly, we should reland it.
As usual, my first thought was crap: while a clobber will certainly run configure, a dep will too, if it happens to feel like it. However, a dep not happening to feel like it when it should have would be another way of expanding the range, so I went back and looked up which dep builds did and didn't run configure.

The clobber information comes from the buildbot JSON that nthomas was kind enough to give me, so I think I'm not mixing up which builds are what (assuming, as seems to be the case, that the first number in the filename on localgho.st is the directory where it was on stage.m.o):

1265607751-20100207214231-b21188a34531 - dep, configure 
1265601266-20100207195426-be94483da3b4 - dep, no configure
1265600287-20100207193807-b76ad6cdd76e - dep, no configure
1265579422-20100207135022-872dcf34dab3 - forced clobber
1265562604-20100207091004-943afcbad1ac - dep, configure
1265562201-20100207090321-2caefaaa7d77 - dep, configure
1265558645-20100207080405-ed857569fabf - dep, no configure
1265549605-20100207053325-b234c7370793 - dep, no configure
1265549412-20100207053012-21d980d9b3a4 - dep, no configure
1265521871-20100206215111-fc3d32011d31 - dep, no configure
1265499628-20100206154028-62ade428367b - purged clobber
1265488028-20100206122708-16d4bba25a84 - dep, configure
1265485828-20100206115028-e544343970b4 - dep, configure
1265471973-20100206075933-bc6f2b598ff9 - forced clobber
1265463227-20100206053347-2e9d8868efc6 - forced clobber
1265461349-20100206050229-173248959f01 - nightly
1265454003-20100206030003-ada1992ccacb - forced clobber
1265453052-20100206024412-d72639947a60 - forced clobber
1265446534-20100206005534-2d873df39b6a - purged clobber 
1265428968-20100205200248-72d91445b838 - purged clobber

So for the "it takes a clobber" theory, not crashing in 62ade428367b puts the range at http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=62ade428367b&tochange=872dcf34dab3, and for the "configure" theory, not crashing in 943afcbad1ac puts the range at the same last-build-that-doesn't-crash-to-first-that-does: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=943afcbad1ac&tochange=872dcf34dab3
Yes, Phil is correct. And excuse me for being unclear in comment 3; I didn't realize there was room for misunderstanding. I was busy with it for at least 15 minutes before I realized where the crash came from (a crash in a new profile is rare, which is why I went on searching), and spending one more minute on a more detailed STR wouldn't have been too much effort to finish this properly. Next time, just ask for a detailed STR, because if I know how to reproduce it, no one needs to guess; that's a waste of time.
So it seems like backing out the vksaver.dll change didn't fix this.

My next most likely explanation is that something in the NSS upgrade has a bad interaction with our DLL blocklisting code.  Did NSS or PSM change anything about the loading or unloading of shared libraries?

That said, I think I have an idea about how to repro in a debug build, but that might not be practical over the Internet connection I have right now.
Bug 527659 asks to upgrade mozilla-central from a beta to a release candidate of NSS 3.12.6.

However, before I do that, I am considering backing out my recent NSS/PSM landings for a period of 2-3 hours.

I'd hope this gives us a sufficient window to get hourly builds that let us confirm whether NSS is the culprit of this bug.
I have not yet done what I proposed in comment 27.
Philor suggested on IRC that a 2-3 hour window during the European daytime might not give me what I want.

I decided to try to reproduce it myself using nightly builds. I used a fresh profile, but disabled the check-for-default-browser prompt, and to minimize activity I changed the startup page to about:blank.

I've used the build mentioned in comment 6 (20100209) and the most recent one I could find (20100211). I have two desktop shortcuts, one per build, both using the argument -P thatprofile.
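
For reference, the quiet-profile setup above corresponds to something like the following in the profile's user.js (pref names from memory of that era; treat them as assumptions):

user_pref("browser.shell.checkDefaultBrowser", false);  // no default-browser prompt
user_pref("browser.startup.homepage", "about:blank");   // minimize startup activity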

Here are my results:

- I can often reproduce using build 0209
- using 0209 I sometimes crash on exit, even without having checked for update
- even after running "check-updates" several times,
  I can never reproduce using build 0211

I conclude:
- either the bug has to do with status "updates are available"
- or the bug is gone in 0211
The number of crashes seems to have gone down dramatically in the builds of Feb. 10 and 11, and then disappeared in the builds of Feb. 12 (assuming there were any; not necessarily a reliable assumption).  The current histogram of this crash is:
  Feb. 8   703
  Feb. 9   618
  Feb. 10   30
  Feb. 11   16

It's not clear why this would have happened, though.
Also, the following STR produces the bug:

- start a new profile and permanently dismiss the "default browser" dialog
- once the start page http://www.mozilla.org/projects/minefield/ has loaded, wait until the processor is quiet, then close the browser (choose Quit in the "quit browser" dialog)
Repeat these steps. Between the second and the fourth close it will crash.

This is a typical "very clean profile" crash. My default profile does not crash. I accidentally discovered that if I put this pref in the profile, it will not crash: user_pref("browser.bookmarks.autoExportHTML", true);


There has been a temporary stop in these crashes. With the latest STR:
1eb1668ed9f6 crash
4ba8ccb0cadc no crash
Query: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1eb1668ed9f6&tochange=4ba8ccb0cadc
And also: with the latest hourly 92a84cecf4f1 and the STR from comment 31 it is still crashing.
The last build without crash (here): d43741a452c8
And the crash started again with: 92a84cecf4f1 
Query: 
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d43741a452c8&tochange=92a84cecf4f1
I found some tiny ranges in between! I hope someone can still follow this:

Not crashing: 46663814d764
Crashing: eafd8a60dfd8

Crashing: c492fb6295d1
Not crashing: 11006dbfb80e

I re-checked every non-crasher thoroughly a couple of times.
This appears to be back in today's nightly.
It would be cool to figure out which slave (VM) was making which build (non-crashing and crashing). Maybe it is a hardware failure or a config flaw on a particular build machine...?
Tryserver builds don't crash, at least not the ones I tried,
but all 7 of the latest hourlies are crashing.
Crashed with a fresh install, in a test profile with no add-ons except the ridiculous MS .NET assistant.
bp-cb0811a9-6df3-482a-bb93-c7e712100217
I have the impression that it is about speed. If Firefox can close very fast, it crashes. If it has more tasks to do while closing, it does not crash.
If I open and close, open and close, open and close, in quick succession, it does not crash, because the processor is busy with more tasks. It only crashes if the processor has little to do at the moment.
Can't reproduce it anymore with the latest nightly on Windows Vista.
Crash still exists. It seems this crash is related to more than just the updater. It just crashed today on a simple restart. I had installed an extension about an hour before and continued surfing. When I finally restarted, it crashed.

http://crash-stats.mozilla.com/report/index/8d8b4b55-4086-48b8-a15f-5e82b2100222
Crashed today too. bp-cb0811a9-6df3-482a-bb93-c7e712100217
FWIW, I have FF updates disabled, but not add-ons.
Ignore comment 44; the crash reporter helper tricked me with an old crash. My last one was actually MirrorWrappedNativeParent: bp-d0c54c42-53b4-4425-8f7b-cf9e42100222
If it helps at all, this appears to be only a shutdown crash. I'm trying to load a minidump to see if I get a better stack (we're skipping one frame).
bent, can you record with the steps in comment #31 and such?
Assignee: nobody → bent.mozilla
Attached patch Patch (Splinter Review)
Oy. Score another for record and replay!
Attachment #428587 - Flags: superreview?(jonas)
Attachment #428587 - Flags: review?(jonas)
Comment on attachment 428587 [details] [diff] [review]
Patch

Sorry!
Attachment #428587 - Flags: superreview?(jonas)
Attachment #428587 - Flags: superreview+
Attachment #428587 - Flags: review?(jonas)
Attachment #428587 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/e2fe146316cf
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a2
Crash Signature: [@ @0x0 | nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned int) ]