Closed Bug 347899 Opened 18 years ago Closed 18 years ago

[FIX]crash when restarting for update [@ nsCSSStyleSheet::SetComplete]

Categories

(Core :: CSS Parsing and Computation, defect, P1)

1.8 Branch
x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla1.8.1beta2

People

(Reporter: shaver, Assigned: bzbarsky)

References

Details

(Keywords: crash, verified1.8.0.7, verified1.8.1)

Crash Data

Attachments

(4 files)

Stack looks like memory corruption, a wise man told me:

Thread 0 Crashed:
0   org.mozilla.firefox 	0x0014781b nsCSSStyleSheet::SetComplete() + 55
1   org.mozilla.firefox 	0x00104695 CSSLoaderImpl::SheetComplete(SheetLoadData*, int) + 131
2   org.mozilla.firefox 	0x00105943 SheetLoadData::OnStreamComplete(nsIUnicharStreamLoader*, nsISupports*, unsigned, nsIUnicharInputStream*) + 493
3   org.mozilla.firefox 	0x0034d4ee nsUnicharStreamLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned) + 378
4   org.mozilla.firefox 	0x0038c251 nsJARChannel::OnStopRequest(nsIRequest*, nsISupports*, unsigned) + 67
5   org.mozilla.firefox 	0x00346cec nsInputStreamPump::OnStateStop() + 88
6   org.mozilla.firefox 	0x00347271 nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) + 133
7   libxpcom_core.dylib 	0x01872b49 nsInputStreamReadyEvent::EventHandler(PLEvent*) + 45
8   libxpcom_core.dylib 	0x01847579 PL_HandleEvent + 21
9   libxpcom_core.dylib 	0x01847832 PL_ProcessPendingEvents + 103
10  libxpcom_core.dylib 	0x01848950 nsEventQueueImpl::ProcessPendingEvents() + 54
11  libxpcom_core.dylib 	0x0180e4b9 NS_ShutdownXPCOM_P + 323
12  org.mozilla.firefox 	0x00002668 ScopedXPCOMStartup::~ScopedXPCOMStartup [in-charge]() + 46
[...]

Recent-ish Bon Echo nightly, don't know how to find the specific version that I was coming from, now that I have updated.
So either memory is completely hosed or we're somehow ending up with an mDocument pointing to a deleted document, I guess...  Not really sure how either one would happen.
I've seen this consistently through several 1.8branch updates as well. I figured it was just my profile or hoochy extensions or something; I should have known better.
The latter option, almost certainly. We're in the middle of shutting down XPCOM and processing final events.
Severity: normal → critical
Keywords: crash
Summary: crash when restarting for update → crash when restarting for update [@ nsCSSStyleSheet::SetComplete]
Is this covered by any of our common smoketests?
Flags: blocking1.8.1?
Yeah, beltzner's been seeing this as well, and I assure you that I have a strict no-hoochy-extensions policy.

Wonder how to find out what the document is that we think we're crashing on.  Maybe there's some logging I can turn on that will give me document object addresses as they're created and destroyed?
Severity: critical → normal
~nsDocument has the following code:

719   // Let the stylesheets know we're going away
720   indx = mStyleSheets.Count();
721   while (--indx >= 0) {
722     mStyleSheets[indx]->SetOwningDocument(nsnull);
723   }
724   indx = mCatalogSheets.Count();
725   while (--indx >= 0) {
726     mCatalogSheets[indx]->SetOwningDocument(nsnull);
727   }
728   if (mAttrStyleSheet)
729     mAttrStyleSheet->SetOwningDocument(nsnull);
730   if (mStyleAttrStyleSheet)
731     mStyleAttrStyleSheet->SetOwningDocument(nsnull);

So the only way to have a bogus document here is to have a stylesheet that the document doesn't know about...

If this is reproducible, what's the URI of the stylesheet?
refcount logging can do that, sure XPCOM_MEM_LOG_CLASSES=nsDocument... unfortunately it's dog-slow
XPCOM_MEM_LEAK_LOG=leak.log XPCOM_MEM_LOG_CLASSES=nsDocument XPCOM_MEM_ALLOC_LOG=alloc.log

shouldn't be as slow as logging refcounts (and then look at alloc.log).
Update needs to work without crashing. My personal "rough regression window" is "within the last week".
Flags: blocking1.8.1? → blocking1.8.1+
Keywords: qawanted
Target Milestone: --- → mozilla1.8.1beta2
Severity: normal → critical
An exact regression window might be helpful for figuring this out.
regression window:  I saw this crash when updating from 2006080506 to 2006080606, but not from 2006080408 to 2006080506, on my intel mac.  I've seen it each subsequent day, and only on update, and not when I try to reproduce it :-(

So my regression window is 2006080506-2006080606.

Attached file Apple crash report
Attaching the Apple Report in case it is useful. I just crashed on my PPC machine updating from the 7-26 build, which is one of the most recent crashes that showed up in the list of talkback reports. I will nail a definite date.
I have seen it not only on update, but when I restarted Firefox after an extension update. See http://talkback-public.mozilla.org/search/start.jsp?search=2&type=iid&id=21917493 for a crash that was generated from an extension install.
(In reply to comment #11)
> regression window:  I saw this crash when updating from 2006080506 to
> 2006080606, but not from 2006080408 to 2006080506, on my intel mac.  I've seen
> it each subsequent day, and only on update, and not when I try to reproduce it
> :-(
> 
> So my regression window is 2006080506-2006080606.

Is the crash the old build shutting down, or the new build shutting down the extra run for registration?
new build
And that's a regression window on the 1.8 branch, or on the trunk?
(In reply to comment #16)
> And that's a regression window on the 1.8 branch, or on the trunk?
> 

1.8 branch (I didn't specify since we're all talking about bon echo nightly updates.  We *are* all talking about 1.8 branch nightly updates, right?)
when did addons.mozilla.org get updated?
(In reply to comment #19)
> when did addons.mozilla.org get updated?

Saturday, from 8-12 PDT, according to http://weblogs.mozillazine.org/it/.
bug 347831 and bug 347921 look suspicious - ssl errors at addons
Happened again here, again after a restart for an update (Marcia's seeing it on restart for extension install makes me suspect the "restart" method).
Here's my sessionstore.js file, in case that's informative.
Dietrich should really be cc'd on this bug. He may even be interested in taking it!
Status: NEW → ASSIGNED
I can reproduce this every single time on the first startup after a rebuild, if I start from the command line. The stack is the same.

---------

...
*** Registering nsSoftwareUpdate components (all right -- a generic module!)
/Users/sayrer/firefox/mozilla/fb-debug/dist/MinefieldDebug.app/Contents/MacOS/components/nsUrlClassifierLib.js mtime changed, invalidating FastLoad file
*** Registering Apprunner components (all right -- a generic module!)
WARNING: NS_ENSURE_TRUE(NS_SUCCEEDED(rv)) failed: file /Users/sayrer/firefox/mozilla/chrome/src/nsChromeRegistry.cpp, line 1258
++WEBSHELL 0x16219900 == 1
/Users/sayrer/firefox/mozilla/fb-debug/dist/MinefieldDebug.app/Contents/MacOS/chrome/classic.jar mtime changed, invalidating FastLoad file
WARNING: NS_ENSURE_TRUE(cv) failed: file /Users/sayrer/firefox/mozilla/xpfe/appshell/src/nsXULWindow.cpp, line 1544
++DOMWINDOW == 1
++DOMWINDOW == 2
###!!! ASSERTION: not an nsIRDFRemoteDataSource: 'remote != nsnull', file /Users/sayrer/firefox/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 345
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /Users/sayrer/firefox/mozilla/xpcom/base/nsExceptionService.cpp, line 191
fb-debug/dist/MinefieldDebug.app/Contents/MacOS/run-mozilla.sh: line 451: 15682 Bus error               "$prog" ${1+"$@"}

Robert, can you answer the question in comment 6?
*** Bug 348458 has been marked as a duplicate of this bug. ***
(In reply to comment #26)
> Robert, can you answer the question in comment 6?
> 

Hmm, I'm having trouble getting to happen now :/ I did get it in a debugger a couple times, and I believe resource://gre/res/forms.css is where the crash occurs. At least, I would see nativescrollbars.css come through, and get a SIGBUS on the next entry into SetComplete. Here's what I see with some printfs in a working load:

---

*************** mSheetURI: resource://gre/res/html.css
*************** mSheetURI: chrome://global/content/xul.css
*************** mSheetURI: resource://gre/res/quirk.css
*************** mSheetURI: resource://gre/res/ua.css
++DOMWINDOW == 2
*************** mSheetURI: chrome://global/skin/nativescrollbars.css
*************** mSheetURI: resource://gre/res/forms.css
*************** mSheetURI: chrome://mozapps/content/plugins/missingPluginBinding.css
###!!! ASSERTION: not an nsIRDFRemoteDataSource: 'remote != nsnull', file /Users/sayrer/firefox/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 346
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /Users/sayrer/firefox/mozilla/xpcom/base/nsExceptionService.cpp, line 191

*************** mSheetURI: chrome://browser/content/places/places.css

###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744
###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744
###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744

[... many lines ...]

###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744
###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744
###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp, line 744

--WEBSHELL 0x16218640 == 0
WARNING: Leaking the RDF Service.: file /Users/sayrer/firefox/mozilla/rdf/build/nsRDFModule.cpp, line 236
nsMenuItemX 0
nsMenuX 0
nsStringStats
 => mAllocCount:           6792
 => mReallocCount:           30
 => mFreeCount:            6407  --  LEAKED 385 !!!
 => mShareCount:           1979
 => mAdoptCount:            894
 => mAdoptFreeCount:        892  --  LEAKED 2 !!!
~/firefox/mozilla> 

and then  restart.
> ###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file
> /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp,
> line 744

Please do make sure a bug is filed on that...  ;)

In any case, if someone can catch this in a debugger, please look up the sheet URI.  You can get if from SetComplete, since it's a member of the sheet.

resource://gre/res/forms.css shouldn't have a non-null mDocument.  Furthermore, it's not loaded via the unichar stream loader (see stack in comment 0).  So I doubt you're crashing on forms.css.

(In reply to comment #29)
> > ###!!! ASSERTION: tag in XUL doc epilog: 'mState != eInEpilog', file
> > /Users/sayrer/firefox/mozilla/content/xul/document/src/nsXULContentSink.cpp,
> > line 744
> 
> Please do make sure a bug is filed on that...  ;)

That's bug 348075.
No crash at this morning's update (->2006081503 mac).  I looked at checkins . . . Bug 348360 might have had an effect?
I'd been seeing this crash after updating each day, too, (for about a week) and like davel, I didn't see it when I updated this morning.
Adam G or DaveL - can you take yesterday's nightlies and see if you can debug this (sounds like it happens with extension install restart) and answer Bz's question.

I can't seem to catch this in a debugger. For anyone that's trying to reproduce, I've found that installing an extension and using the Restart Now button will reliably crash on restart. (This is marginally more convenient than doing an update.)
Can't you just attach the debugger after we crash?  Or do we only do the sleep thing on Linux?
We don't do the sleep() thing on OS X, but crash reporter should be able to let you attach gdb.

http://developer.apple.com/technotes/tn2004/tn2123.html#SECCRASHREPORTERPREFS

If that isn't working for whatever reason, check the ulimit settings for the shell you start from, and see if you can get a core dump to analyze post-mortem?

I have not yet been successful in causing the crash to happen again, using the 20060810 build as a base and the partial update to 20060811 as the update.  I suspect my test profile/extensions/setup is missing something required to crash, or contains something that masks the crash.

Does anyone have a recipe (or can someone suggest a recipe) to cause the browser to do the registration thing?  things like create .qutoreg, remove compreg.dat and xpti.dat, ...
Did Adam's install-extension-and-restart-now approach in comment #34 not work?  (I suspect you can do that with a profiles.ini tweak too, but I'm not sure what the tweak in question is.)
adam's recipe did not work for me.
If nobody's seen this crash since that checkin, are we confident enough to resolve this as fixed by that patch?
I'm not, since ispiked applied that patch to an Aug 10 build and is still seeing this.  We're going to try debugging some more when he gets back from lunch.
I'm blind...
Keywords: qawanted
Comment on attachment 234078 [details] [diff] [review]
Patch that should fix this

ispiked says this does fix it.

David, the issue here is that we were crashing on intl.css, imported from global.css, imported from something (profileSelection.css?  Not sure).  In any case, when we tear down the document without this patch, it notifies its child sheet, which notifies its kids, which don't notify their kids.  We need to notify all the way down...
Attachment #234078 - Flags: superreview?(dbaron)
Attachment #234078 - Flags: review?(dbaron)
We should take this on all branches and trunk...
Assignee: nobody → bzbarsky
Status: ASSIGNED → NEW
Component: General → Style System (CSS)
Flags: blocking1.8.0.7?
Priority: -- → P1
Summary: crash when restarting for update [@ nsCSSStyleSheet::SetComplete] → [FIX]crash when restarting for update [@ nsCSSStyleSheet::SetComplete]
here's a really dumb question - are there ever any cycles in the nsCSSStyleSheet graph?
Comment on attachment 234078 [details] [diff] [review]
Patch that should fix this

Import loop detection happens well before we actually construct these objects, right?  (Might be worth double-checking on Hixie's infinite @import tests in the importtest.)

Anyway, r+sr=dbaron

(Maybe it'd be nice to avoid stack overflow on deep style sheet trees at some point.)
Attachment #234078 - Flags: superreview?(dbaron)
Attachment #234078 - Flags: superreview+
Attachment #234078 - Flags: review?(dbaron)
Attachment #234078 - Flags: review+
Comment on attachment 234078 [details] [diff] [review]
Patch that should fix this

> are there ever any cycles in the nsCSSStyleSheet graph?

Only if something is very majorly screwed up.  There shouldn't be any.

> Import loop detection happens well before we actually construct these objects,

Yes.  We do loop detection before even creating the child nsCSSStyleSheet object in LoadChildSheet.  I did double-check with hixie's @import tests, and we pass as much as we used to.

Requesting branch approvals.  This is a _really_ safe patch that prevents a deleted document object reference.  In practice, I suspect a malicious website could trigger this crash at will just by controlling stylesheet loading speed...
Attachment #234078 - Flags: approval1.8.1?
Attachment #234078 - Flags: approval1.8.0.7?
Comment on attachment 234078 [details] [diff] [review]
Patch that should fix this

a=beltzner on behalf of drivers, for landing on the MOZILLA_1_8_BRANCH. Please mark fixed1.8.1 when it lands.
Attachment #234078 - Flags: approval1.8.1? → approval1.8.1+
Fix checked in to trunk and 1.8 branch on behalf of author.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Comment on attachment 234078 [details] [diff] [review]
Patch that should fix this

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #234078 - Flags: approval1.8.0.7? → approval1.8.0.7+
Flags: blocking1.8.0.7? → blocking1.8.0.7+
Fixed on the 1.8.0 branch.

For future reference, please don't check in my patches for me unless it's an absolute "if we don't check this in tonight it'll miss the release" emergency (which it wasn't in this case, since freeze got delayed).  I really like having checkin comments that actually explain what the patch is about (and point to the bug and all), so that a year from now I can tell why I wrote the code that way.
Keywords: fixed1.8.0.7
(In reply to comment #51)
> Fixed on the 1.8.0 branch.
> 
> For future reference, please don't check in my patches for me unless it's an
> absolute "if we don't check this in tonight it'll miss the release" emergency
> (which it wasn't in this case, since freeze got delayed).

Understood, I'll refrain from doing so in the future.  FWIW, at the time I checked it in, I thought the freeze was still happening that night (I thought it was merely delayed by a few hours).
v.fixed on both branches based on latest Talkback data and the lack of any recent comments from people (no one seems to be seeing this crash with the latest builds/updates).
Crash Signature: [@ nsCSSStyleSheet::SetComplete]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: