Bug 724129 - crash in nsXBLDocumentInfo::cycleCollection::Traverse (caused by addons?)
Status: RESOLVED FIXED
Keywords: crash, reproducible, topcrash
Product: Core
Classification: Components
Component: XBL
Version: 10 Branch
Platform: All All
Importance: -- critical with 1 vote
Target Milestone: mozilla10
Assigned To: Nobody; OK to take it and work on it
QA Contact:
Triage Owner: Andrew Overholt [:overholt]
Mentors:
Duplicates: 724267
Depends on:
Blocks: 469267

Reported: 2012-02-03 15:14 PST by Andrew McCreight [:mccr8]
Modified: 2012-02-13 05:05 PST
CC List: 14 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Some Toolbar Correlations (13.71 KB, text/html)
2012-02-04 08:41 PST, Marcia Knous [:marcia - use ni]

Description Andrew McCreight [:mccr8] 2012-02-03 15:14:28 PST
NoteXPCOMChild is showing up as the top crash in 10, after the giant pile of empty stacks.  If I am reading this right, it is about 18.7% of crashes in 10.  It was about 2% of crashes in 10b3, 5% in 10 beta5 and 12% in beta 6.

All the ones I've looked at look like this:

1 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1710
2 	xul.dll 	TraverseBinding 	content/xbl/src/nsXBLPrototypeBinding.cpp:389
3 	xul.dll 	hashEnumerate 	xpcom/ds/nsHashtable.cpp:130
4 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.cpp:755
5 	xul.dll 	TraverseProtos 	content/xbl/src/nsXBLDocumentInfo.cpp:435
6 	xul.dll 	hashEnumerate 	xpcom/ds/nsHashtable.cpp:130
7 	xul.dll 	PL_DHashTableEnumerate 	obj-firefox/xpcom/build/pldhash.cpp:755
8 	xul.dll 	nsXBLDocumentInfo::cycleCollection::Traverse 	content/xbl/src/nsXBLDocumentInfo.cpp:473

Correlations suggest it may be addon related.
Comment 1 Andrew McCreight [:mccr8] 2012-02-03 15:15:37 PST
NoteXPCOMChild is only about 2% of crashes in 11, which kind of suggests it isn't a code change.  But who knows.
Comment 2 Andrew McCreight [:mccr8] 2012-02-03 15:37:01 PST
Looks like they are mostly happening within 15 seconds, so probably the first or second CC after startup.
Comment 3 Andrew McCreight [:mccr8] 2012-02-03 16:07:17 PST
Neal and bz: JST and I were looking over the XBL code, and it looks like Read/WritePrototypeBindings is new in 10 (bug 94199).  Is it possible that this could cause some kind of startup crash after upgrading from 9?
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2012-02-03 18:39:50 PST
Possible, sure...

None of the crashes I see have the stack from comment 0, though.  I do see some crashing from the NoteXPCOMChild(mBinding) call in nsXBLPrototypeBinding::Traverse.

Andrew, were the crashes you were looking at with the stack from comment 0 null-derefs, or something else?
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2012-02-03 18:40:11 PST
Also, is NoteXPCOMChild null-safe?
Comment 6 Andrew McCreight [:mccr8] 2012-02-03 18:49:14 PST
Most of the ones I see are like that, so maybe we're looking at different lists or something.  Here are 3 out of 5 reports I looked at:
https://crash-stats.mozilla.com/report/index/b25cc986-0967-4617-9579-5029b2120202
https://crash-stats.mozilla.com/report/index/0fa5099e-9e67-4625-a7ad-e23182120202
https://crash-stats.mozilla.com/report/index/73494659-9095-48fb-9cba-a14d62120202

The most common thing seems to be EXCEPTION_ACCESS_VIOLATION_EXEC on addresses that look kind of like 0x4246c83.  Hmm.  In fact, at least judging from the first page of crash reports, the bulk of the crashes are READs or EXECs of that exact address, 0x4246c83.  Mostly EXECs.  That seems... suspicious.

NoteXPCOMChild should return right off the bat if it is passed null.
Comment 7 Andrew McCreight [:mccr8] 2012-02-03 18:51:32 PST
gamesbar@oberon-media.com is also a very common extension for these crashes.
Comment 8 Boris Zbarsky [:bz] (still a bit busy) 2012-02-03 19:05:58 PST
I was just looking at the list that a search for NoteXPCOMChild gave me on crash-stats, and looking at the crashes for 10.

The crash is on this line:

  if (!child || !(child = canonicalize(child)))

Does this involve a virtual function call, perhaps?  If we always pass in things with the same busted vtable that would explain EXEC on the same address...
Comment 9 Andrew McCreight [:mccr8] 2012-02-03 19:10:39 PST
Ah ok.  I clicked on the link for NoteXPCOMChild that you get when you go to top crashes for 10.

Canonicalize is basically just a wrapper around a QI:
canonicalize(nsISupports *in)
{
    nsISupports* child;
    in->QueryInterface(NS_GET_IID(nsCycleCollectionISupports),
                       reinterpret_cast<void**>(&child));
    return child;
}

It probably gets inlined.  Does a QI involve a virtual call?  I'm not really sure how that all works.
Comment 10 Andrew McCreight [:mccr8] 2012-02-03 19:14:37 PST
Ah, right, the in->QI is a virtual method invocation...
Comment 11 Boris Zbarsky [:bz] (still a bit busy) 2012-02-03 19:32:43 PST
That's my best guess so far, then, though having the same exact value there is still pretty odd....
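
To make the theory in comments 8 through 11 concrete: a virtual call such as in->QueryInterface(...) reads a function address out of the object's vtable and jumps to it, so objects that share one bad vtable pointer would all jump to the same bad address. The sketch below models that with a hand-written table of function pointers. FakeVTable, FakeObject, callTarget, callQueryInterface and noteChild are hypothetical names invented for this illustration, not Gecko types, and the sketch only ever uses a valid table. It also shows why the null check on the crashing line guards against null but not against a non-null pointer to garbage.

```cpp
#include <cassert>

// Hypothetical model of a polymorphic object: its first member points at a
// table of function addresses, which is how compilers commonly implement
// virtual dispatch.
struct FakeObject;
using QueryFn = int (*)(FakeObject* self);

struct FakeVTable {
    QueryFn queryInterface;
};

struct FakeObject {
    const FakeVTable* vtable;  // read on every "virtual" call
    int payload;
};

static int realQueryInterface(FakeObject* self) { return self->payload; }
static const FakeVTable kGoodVTable = { realQueryInterface };

// The code address a call through obj would jump to. It depends only on the
// vtable pointer, so objects sharing a vtable share the same target.
QueryFn callTarget(const FakeObject* obj) {
    return obj->vtable->queryInterface;
}

// Stand-in for in->QueryInterface(...): load the address, then call it.
int callQueryInterface(FakeObject* obj) {
    assert(obj != nullptr && obj->vtable != nullptr);
    return obj->vtable->queryInterface(obj);
}

// Same shape as `if (!child || !(child = canonicalize(child)))`: the
// short-circuit keeps a null child from being dereferenced, but a non-null
// pointer is trusted and dispatched through.
bool noteChild(FakeObject* child, int* out) {
    if (!child) {
        return false;
    }
    *out = callQueryInterface(child);
    return true;
}
```

In this model two objects built with &kGoodVTable report the same callTarget, which is the well-behaved counterpart of many crashes executing one identical address.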
Comment 12 Andrew McCreight [:mccr8] 2012-02-04 08:18:03 PST
Tomer Cohen posted the following over in bug 724267:
This bug was filed from the Socorro interface and is 
report bp-186ed3b4-f1df-49a3-a471-b99d02120204 .
============================================================= 

Since the last upgrade (9.0.1→10.0), Windows Firefox users have been reporting on our community forum a crash every time the browser starts. While we could not easily reproduce it on our own machines, we've found that we could fix it by giving the following instructions to users:

a. Start Firefox in Safe-Mode (Please note that the usual routine of restarting in safe mode from the Help menu won't help because the users can't access the Firefox UI)
b. Tools→Addons
c. Uninstall Greasemonkey


My tests show that after reinstalling Greasemonkey on the users' machines, nothing went wrong, so it is safe to remove and reinstall.


See URL below for our forum thread and lists of crash ids. 
http://www.mozilla.org.il/board/viewtopic.php?t=10916
Comment 13 Andrew McCreight [:mccr8] 2012-02-04 08:18:32 PST
*** Bug 724267 has been marked as a duplicate of this bug. ***
Comment 14 Marcia Knous [:marcia - use ni] 2012-02-04 08:21:45 PST
Juan and I both tested the Oberon toolbar you get from their site by downloading a game, but I noted in my test results that it didn't seem to be the same version that was in some individual reports. Another thing I noted while combing through individual reports is that a number of people had more than one toolbar installed in their extension list (Oberon and Yahoo, Oberon and MSN, etc.).

(In reply to Andrew McCreight [:mccr8] from comment #7)
> gamesbar@oberon-media.com is also a very common extension for these crashes.
Comment 15 Andrew McCreight [:mccr8] 2012-02-04 08:24:10 PST
Thanks, Tomer Cohen, that's very interesting!  So it sounds like it is not the addon per se that is causing the problem, but the browser has some information associated with it that is causing problems in 10.  Sounds like it could be related to bug 94199, but I don't know anything about how caching for addon-related information works.  What kinds of things does the browser save along with an addon that would be deleted when the addon is uninstalled?

Marcia, it sounds like it might be worth starting up the browser in 9 with some of these addons installed, using it for a little bit, then upgrading to 10.
Comment 16 Marcia Knous [:marcia - use ni] 2012-02-04 08:41:27 PST
Created attachment 594464 [details]
Some Toolbar Correlations

Attaching some of the toolbar version correlations to help in the hunt.

Juan, Anthony and I were posting our testing results here: https://etherpad.mozilla.org/Bug-724129-Testing

So far we tried a number of combinations and have not been able to reproduce the crash. As I noted, I don't believe I had the version of the Oberon toolbar that seems to be highly correlated in the attached report. Will keep trying some combinations. In all cases I started with all the extensions in FF 9.0.1 and then moved to 10 via update.
Comment 17 Neil Deakin 2012-02-04 08:48:59 PST
Installing/uninstalling an addon will invalidate the startup cache. If this is caused by 94199, you won't see the bug on the next startup (since there isn't anything cached any more) but you might see a crash on later startups.
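
The timeline described in comment 17 can be sketched as a toy state machine. This is a minimal illustration that assumes the cache is simply either valid or invalid; ToyProfile, StartupSource and firstCachedStartup are hypothetical names, not Gecko code, and nothing here reads a real startup cache.

```cpp
#include <cassert>

// Where a startup got its data from (toy model only).
enum class StartupSource { BuiltFresh, ReadFromCache };

// Hypothetical stand-in for a profile that owns a startup cache.
class ToyProfile {
public:
    // Installing or uninstalling an addon invalidates the cache.
    void changeAddons() { cacheValid_ = false; }

    // Simulates one browser startup and reports which path it took.
    StartupSource startup() {
        if (!cacheValid_) {
            // Nothing usable is cached: build from source, then write the
            // result out so the next run can reuse it.
            cacheValid_ = true;
            return StartupSource::BuiltFresh;
        }
        // Cached data exists: this is the read path that a bug in loading
        // cached data would affect.
        return StartupSource::ReadFromCache;
    }

private:
    bool cacheValid_ = false;
};

// Returns the 1-based index of the first startup after an addon change
// that reads from the cache.
int firstCachedStartup(ToyProfile& profile) {
    profile.changeAddons();
    int n = 1;
    while (profile.startup() != StartupSource::ReadFromCache) {
        ++n;
    }
    assert(n >= 2);  // the first startup after a change always rebuilds
    return n;
}
```

Under this model the cached read path is first taken on the second startup after an addon change, which would match a crash that only shows up on later startups.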
Comment 18 Marcia Knous [:marcia - use ni] 2012-02-04 09:06:43 PST
Went back to my VM and I am now able to reproduce the issue consistently in a Win XP VM with the configuration I have - Neil was absolutely correct in that it did not show until later startups.  Here are the addons I have installed:

        Name                                Version               Enabled  ID
        Add-on Compatibility Reporter       1.0.3                 true     compatibility@addons.mozilla.org
        Ant Video Downloader                2.4.5                 true     anttoolbar@ant.com
        ant.com Community Toolbar           3.9.0.3               true     {60190dac-b475-4be9-a099-4ca691de0d4f}
        freeride games Community Toolbar    3.10.0.1              true     {6c94176c-d88a-4a15-b840-703b4237f992}
        Music Player Minion 2               2.2.0                 true     Music_Player_Minion@code.google.com
        Yahoo! Toolbar                      2.4.6.20120119024823  true     {635abd67-4fe9-1b23-4f01-e679fa7484c1}
        DataMngr                            1.0                   false    {1FD91A9C-410C-4090-BBCC-55D3450EF433}
        Microsoft .NET Framework Assistant  0.0.0                 false    {20a82645-c095-46ed-80e3-08825760534b}
        ZoneAlarm Security Engine           1.5.350.0             false    {FFB96CC1-7EB3-449D-B827-DB661701C6BB}

(In reply to Neil Deakin from comment #17)
> Installing/uninstalling an addon will invalidate the startup cache. If this
> is caused by 94199, you won't see the bug on the next startup (since there
> isn't anything cached any more) but you might see a crash on later startups.
Comment 19 Tomer Cohen :tomer 2012-02-04 09:11:06 PST
I backed up the profile directory of an affected computer, then went ahead and uninstalled Greasemonkey. After seeing that everything went smoothly, I reverted to the old profile directory, and I am unable to reproduce the issue now.

(I was unable to reproduce it on my own computer(s), so I borrowed a computer with a crashing browser)
Comment 20 Marcia Knous [:marcia - use ni] 2012-02-04 09:11:37 PST
I will try to narrow down and see which addon is the truly problematic one. https://crash-stats.mozilla.com/report/index/8ab9ca1d-04e6-4764-bf03-b5ed92120204 was one of my crash reports.

After the crash you can relaunch, but just trying to open a new tab or do something in the URL bar seems to generate a crash quite easily.
Comment 21 Marcia Knous [:marcia - use ni] 2012-02-04 09:18:58 PST
Adding Juan and Anthony so they can track what the status is.
Comment 22 Marcia Knous [:marcia - use ni] 2012-02-05 09:33:56 PST
I have done some additional testing with the set of addons in the attachment.  So far I tried doing the following:

1. Disabled Yahoo Toolbar - still crashed
2. Disabled Ant Community toolbar - still crashed
3. Disabled Music Player Minion - no crash yet

I will keep trying to see if getting to Step 3 really prevents the crash and will play around with some other combinations.

One additional note: With ZoneAlarm installed, it sometimes detects the browser as "unstable" and restarts it when you hit OK. So people who have that program installed may see fewer instances of the crash, since in some cases it restarts the browser before the crash happens.
Comment 23 Tomer Cohen :tomer 2012-02-05 10:30:33 PST
Please note that the issue I was facing did not involve these toolbars. It is possible that it is caused by a component these addons have in common, though.
Comment 24 Tomer Cohen :tomer 2012-02-07 12:36:16 PST
I am still hearing about people facing this issue, including one who says that it appeared on 20 computers he is responsible for. Others say it is caused by Greasemonkey or Video Downloader. Should I post more user reports here, or do we have enough of them?
Comment 25 Olli Pettay [:smaug] 2012-02-07 12:40:44 PST
As a workaround, deleting the startupcache (wherever it is on Windows) should help.
Comment 26 Ed Morley [:emorley] 2012-02-07 14:17:16 PST
> As a workaround deleting the startupcache (wherever it is on Windows)

On Windows 2000 and Windows XP:
%USERPROFILE%\Local Settings\Application Data\Mozilla\Firefox\Profiles\<ZZZZZZ>.default\startupCache\

On Windows Vista and later:
%USERPROFILE%\AppData\Local\Mozilla\Firefox\Profiles\<ZZZZZZ>.default\startupCache\

Note: Local profile folder rather than the main roaming one.
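
For anyone scripting the workaround from comments 25 and 26, here is a minimal C++17 sketch that removes the startupCache folder under a given local profile directory. clearStartupCache is a hypothetical helper, not part of Firefox; the caller has to supply the actual local profile path, and Firefox should be closed while it runs.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Removes <localProfileDir>/startupCache and everything inside it.
// Returns the number of entries removed, 0 if the folder did not exist,
// or -1 if the filesystem reported an error.
long long clearStartupCache(const fs::path& localProfileDir) {
    assert(!localProfileDir.empty());  // refuse to act on an empty path

    const fs::path cacheDir = localProfileDir / "startupCache";
    std::error_code ec;
    if (!fs::exists(cacheDir, ec)) {
        return ec ? -1 : 0;
    }

    // The error_code overload reports failure through ec instead of throwing.
    const std::uintmax_t removed = fs::remove_all(cacheDir, ec);
    if (ec) {
        return -1;
    }
    return static_cast<long long>(removed);
}
```

Only the startupCache subfolder is touched; the rest of the profile is left alone.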
Comment 27 Tomer Cohen :tomer 2012-02-13 04:53:27 PST
People are reporting on our forum (mozilla-il) that this issue disappeared for them after updating to 10.0.1, and that their workaround on 10.0 was to disable the Greasemonkey addon. I'm not sure whether the update is what fixed it, as they had to work around the problem in order to update, and if they did, the workaround is probably still in effect.

We have a lot of crash-ids there for further investigations.
Comment 28 Andrew McCreight [:mccr8] 2012-02-13 04:55:58 PST
10.0.1 included a fix for this issue.  Thanks for the update!  Good to know that it helped.
