Closed Bug 469267 Opened 16 years ago Closed 12 years ago

[meta] Crashes in GCGraphBuilder::NoteXPCOMChild

Categories

(Core :: XPCOM, defect)

defect
Not set
critical

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox6 - ---
firefox10 - ---
firefox11 - ---
firefox12 - ---
blocking2.0 --- -

People

(Reporter: morac, Assigned: mccr8)

Details

(Keywords: crash, meta)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2

The browser crashed when I closed Firefox 3.1b2 while on the movenetworks.com page with the plugin installed.


Reproducible: Couldn't Reproduce




Crash Report:
http://crash-stats.mozilla.com/report/index/8697bc7b-7b2b-4318-a4b9-d90572081211

crashed on xpcom/base/nsCycleCollector.cpp:1394
It happened again.

Both times I went to http://www.movenetworks.com and played video full screen.  Everything seemed fine until I tried to exit Firefox and I got the crash report box.

Second crash report:
http://crash-stats.mozilla.com/report/index/be8c4844-3c08-4eac-804e-07dd02081211?p=1
I just reproduced it a 3rd time.  Same steps, go to http://www.movenetworks.com and played video full screen.  I then restarted Firefox and crashed.  It's not crashing in the plug in though.

Move Network's plugin is used on a number of web sites such as abc.com and fox.com.
Component: XUL → Plug-ins
QA Contact: xptoolkit.widgets → plugins
Summary: EXCEPTION_ACCESS_VIOLATION in xul.dll - GCGraphBuilder::NoteXPCOMChild(nsISupports*) when closing Firefox 3.1b2 → EXCEPTION_ACCESS_VIOLATION in xul.dll - GCGraphBuilder::NoteXPCOMChild(nsISupports*) [@GCGraphBuilder::NoteXPCOMChild(nsISupports*) ]
Lower in the list.  Would love to block on all 20k crashes, but need to focus on the few that affect the most.  -ing.
Flags: blocking1.9.2? → blocking1.9.2-
it looks like this is actually pretty high frequency with around 600 crashes per day of this particular signature and other possibly related signatures

crashes on 2009 11 01

 558 GCGraphBuilder::NoteXPCOMChild(nsISupports*)
  33 @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
  10 _purecall | GCGraphBuilder::NoteXPCOMChild(nsISupports*)

distribution of all versions where the GCGraphBuilder::NoteXPCOMChild crash was found on 20091101-crashdata.csv
 275 Firefox 3.5.4
 199 Firefox 3.0.15
  54 Firefox 3.5.3
  19 Firefox 3.0.14
   7 Firefox 3.6b1

the URLs are concentrated on high-profile sites, but that might just be an indication of overall browsing habits that are unrelated to the crash cause

domains of sites
  70 http://www.facebook.com
  67 \N//
  28 http://apps.facebook.com
  25 http://www.orkut.com.br
  19 http://www.youtube.com
  13 about:blank//
   9 http://vkontakte.ru
   6 http://mail.google.com
   5 http://www.google.fr
   5 about:sessionrestore//
   4 http://www.metaboli.fr
   4 http://nasza-klasa.pl
   4 http://mail.live.com
Move Networks does not serve video content from any of the URLs/sites mentioned above.  Content is only played from authorized providers/URLs.  Crashes from these sites are highly unlikely to be related to the Move plug-in.
I just crashed in this stack using the latest 1.9.2 nightly. http://crash-stats.mozilla.com/report/index/bp-e6c0606a-bfa8-4026-be13-00ab22091216 is the report. I recall the crash happened rather suddenly right after I powered the machine on and tried to check for the latest update.
Another crash today - http://crash-stats.mozilla.com/report/index/5fa44f39-661e-41ec-8689-29cd72091229. The last thing I remember doing was clicking on an ad on the right hand side of the page in hotmail.
I saw this in the 3.6.4 (build1) candidate. I have the Bing Bar toolbar and Silverlight 4 installed. I had a few tabs open, one of which had some Silverlight content. I can crash reliably in this environment if I go to hulu.com, start playing a video, go to full screen by clicking the full-screen icon in the video, pause it while in full screen, and then press ESC.

http://crash-stats.mozilla.com/report/index/bp-39d42cfb-9a01-4c6a-b2cd-d3e032100416
this bug is probably in the wrong component now.  it's sounding more like a GC issue.  if juan has a reproducible set of steps now, that is a good sign.
pretty high volume and it might also be going by these other signatures.  it's currently the #15 non-flash non-hang crash in the 3.6.4 beta

signature list
 623 GCGraphBuilder::NoteXPCOMChild(nsISupports*)
  41 @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   9 _purecall | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   8 nsCOMPtr<nsISupports>::~nsCOMPtr<nsISupports>() | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   5 operator new(unsigned int) | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   2 nsCOMPtr_base::assign_assuming_AddRef(nsISupports*) | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   1 nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener>() | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   1 nsCOMPtr_base::assign_assuming_AddRef(nsISupports*) | nsCOMPtr_base::assign_with_AddRef(nsISupports*) | GCGraphBuilder::NoteXPCOMChild(nsISupports*)
   1 @0x0 | PL_DHashTableOperate | GCGraphBuilder::NoteXPCOMChild(nsISupports*)

checking --- GCGraphBuilder::NoteXPCOMChild.nsISupports 20100428-crashdata.csv
found in: 3.6.3 3.6 3.5.9 3.6.4 3.0.19 3.6.2 3.5.5 3.5.2 3.7a4 3.5.8 3.5.7 3.5.3 3.0b3 3.0.3 3.0.15 3.0 3.6b3 3.6.3plugin1 3.5b4 3.0.10 3.7a3 3.7a2 3.6b5 3.6b4 3.6b1 3.0.8 3.0.6 3.0.5 3.0.4 3.0.16
release   total-crashes  signature-crashes  pct. (fraction)
all       382088         691                0.00180848
3.6.3     262546         389                0.00148165
3.6       16601          89                 0.00536112
3.5.9     35348          85                 0.00240466
3.6.4     15662          40                 0.00255395
3.0.19    12165          21                 0.00172626

Frame 	Module 	Signature 	Source
0 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1460
1 	xul.dll 	nsBindingManager::Traverse 	content/xbl/src/nsBindingManager.cpp:1623
2 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1611
3 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2554
4 	xul.dll 	XPCCycleCollectGCCallback 	js/src/xpconnect/src/nsXPConnect.cpp:391
5 	js3250.dll 	js_GC 	js/src/jsgc.cpp:3537
6 	js3250.dll 	JS_GC 	js/src/jsapi.cpp:2439
7 	xul.dll 	nsXPConnect::Collect 	js/src/xpconnect/src/nsXPConnect.cpp:478
8 	xul.dll 	nsCycleCollector::Collect 	xpcom/base/nsCycleCollector.cpp:2421

tomcat, this might be a good one for listing on the topcrash II wiki page to see if someone can look at the cycle collector.
High daily rate since build b6pre/20100912 : 60 crashes/day

There are several kinds of crashing threads:
* one kind on Windows 7:
Frame 	Module 	Signature 	Source
0 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1485
1 	xul.dll 	nsTypedSelection::cycleCollection::Traverse 	layout/generic/nsSelection.cpp:3420
2 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1636
3 	xul.dll 	xul.dll@0xd8afdb 	
4 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2552
5 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2589

* one kind on Windows XP :
Frame 	Module 	Signature 	Source
0 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1485
1 	xul.dll 	nsSelectionState::DoTraverse 	editor/libeditor/base/nsSelectionState.cpp:67

The regression range is :
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=73ab2c3c5ad9&tochange=cd3c926a7413
blocking2.0: --- → ?
Blocks: 547987
Because of the fixes of other b7pre bugs, it is now #1 top crasher for b7pre/20100922 build.
this has definitely been around before as mentioned above, but we might have seen a volume regression on trunk/4.0b7pre starting about sept15-17.  It's pretty hard to tell given the high overall volume and possibly different usage patterns of trunk users, but right now the crash ratio is 2% of all 4.0b7pre crashes vs. roughly 0.2-0.4% on older and more widely distributed releases.

Ranks #15 on 3.6.10 and #36 on 4.0b7pre, which also may be an indication of a trunk regression where it's now #1 or #2


checking --- GCGraphBuilder::NoteXPCOMChild.nsISupports.. 20100922-crashdata.csv
found in: 3.6.10 3.6.8 4.0b7pre 4.0b6 3.6.9 3.5.13 3.6 3.6.6 3.6.3 3.0.19 3.6.4 4.0b4 4.0b5 4.0b1 3.6b1 3.5.9 3.5.7 3.5.2 3.6b2 3.5b4 3.5 3.0b5 3.7a5 3.6.7 3.5.6 
3.5.5 3.5.1 3.1b3 3.1b2 3.0b1 3.0.1 3.0 4.0b6pre 4.0b3 3.6b5 3.6.3plugin1 3.6.2 3.5b99 3.5.8 3.5.3 3.5.12 3.5.10 3.0b3 3.0.7 3.0.4 3.0.3 3.0.2 3.0.15
release   total-crashes  signature-crashes  pct. (fraction)
all       342643         1226               0.00357807
3.6.10    179878         790                0.00439187
3.6.8     29182          106                0.00363238
4.0b7pre  2622           59                 0.0225019
4.0b6     24477          53                 0.0021653
3.6.9     9378           33                 0.00351887
3.5.13    14415          26                 0.00180368
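
To make the trunk comparison concrete, here is a minimal Python sketch that recomputes each release's signature share from the 20100922 figures and divides it by the all-release share. The function names are illustrative and not part of any crash-stats tooling. The 4.0b7pre row is read as 2622 total crashes and 59 signature crashes, which reproduces the listed 0.0225019.

```python
# (release, total crashes, crashes with this signature), from the 20100922 data.
ROWS = [
    ("all", 342643, 1226),
    ("3.6.10", 179878, 790),
    ("3.6.8", 29182, 106),
    ("4.0b7pre", 2622, 59),
    ("4.0b6", 24477, 53),
    ("3.6.9", 9378, 33),
    ("3.5.13", 14415, 26),
]


def signature_share(total, hits):
    """Fraction of a release's crashes that carry the signature."""
    return hits / total


def relative_to_baseline(rows, baseline="all"):
    """Map each release to its share divided by the baseline row's share."""
    shares = {name: signature_share(total, hits) for name, total, hits in rows}
    base = shares[baseline]
    return {name: share / base for name, share in shares.items() if name != baseline}


# Print releases from the highest relative share to the lowest.
for name, factor in sorted(relative_to_baseline(ROWS).items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {factor:.1f}x the all-release share")
```

A release whose factor sits well above 1 is over-represented for this signature, which is the pattern described for 4.0b7pre. The small 4.0b7pre sample (2622 crashes) means the factor is noisy.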
The latest form of the regressions is probably the same as bug 597344. Older forms can be caused by any number of issues. I'm going to mark this final as a topcrasher, but if it doesn't show up high in later betas it'll be minused.
blocking2.0: ? → final+
Component: Plug-ins → XPCOM
QA Contact: plugins → xpcom
The volume on this one is pretty low now. Removing the topcrash keyword.
Keywords: topcrash
blocking2.0: final+ → -
It is #33 top crasher in 4.0b10.
It is #25 top crasher in 4.0b12.
It is #17 top crasher in 4.0.
Crash Signature: [@GCGraphBuilder::NoteXPCOMChild(nsISupports*) ]
I went looking through the last couple of dozen of these crashes, trying to look for a pattern in the stack traces.  A lot of them are nonsense (NoteXPCOMChild and nothing else, like this https://crash-stats.mozilla.com/report/index/d1da0778-b00a-4559-85a6-cf4612110711 ), and a number are various things, but a few patterns emerge.

One pattern that came up a few times involves TraverseMapEntry in nsDOMAttributeMap::cycleCollection::Traverse.  I found about a dozen of these.  They all look like this:  https://crash-stats.mozilla.com/report/index/f13c8e0c-d370-49a0-bb5f-3e8522110711

There are a few variations on that pattern, where somehow we end up in nsDOMAttributeMap's TraverseMapEntry from nsPresContext, that looks like this: https://crash-stats.mozilla.com/report/index/f6e7ab52-7f75-48df-9aa0-267432110711

I also found quite a few stack traces involving nsGenericElement's Traverse that look like this:
https://crash-stats.mozilla.com/report/index/2b763697-f2c0-4734-bced-c3ef52110711
https://crash-stats.mozilla.com/report/index/97813a0a-3df3-470a-9a17-05feb2110711 (line numbers are slightly different)

Maybe those two Traverses would be a good starting point for investigating these crashes.  The generic element line numbers are all over the place, and it is pretty big and complicated looking, so it may just be a sink for a bunch of separate errors.
the line number differences might correspond to different releases where code is moving around a bit.

you can also use the reports at
https://crash-analysis.mozilla.com/crash_stacks/Stack-summary-5.0.txt
for 5.0 and other releases to search across the top ten frames of the stack on the top 500 crash signatures to go looking for nsDOMAttributeMap::cycleCollection::Traverse and other patterns.
Interesting.  What does "distribution of 25 different stacks" mean?  Is that 25 random stacks from that crash, a representative sample or something else?

Using that, I found that the #143 crash is also in the hash table traversal in nsDOMAttributeMap::cycleCollection::Traverse.
The correlations suggest that half of these are graphics-related.  These are the top correlations, and they all look graphics driver related:

52% (251/486) vs.   3% (2879/104798) atidxx32.dll
52% (251/486) vs.   3% (2883/104798) atiuxpag.dll
52% (251/486) vs.   3% (3108/104798) aticfx32.dll
55% (268/486) vs.   9% (8965/104798) d2d1.dll
55% (268/486) vs.   9% (9006/104798) d3d10.dll
55% (268/486) vs.   9% (9006/104798) d3d10core.dll
56% (271/486) vs.  10% (10074/104798) d3d10_1core.dll
56% (271/486) vs.  10% (10074/104798) d3d10_1.dll
56% (271/486) vs.  10% (10173/104798) dxgi.dll
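
As a rough way to rank correlation lines like the ones above, the sketch below parses one line and computes how much more common a module is in this signature than in all crashes. It assumes the first fraction counts crashes with this signature and the second counts all crashes in the sample; `parse_correlation` and `lift` are hypothetical helper names.

```python
import re

# Matches lines such as "52% (251/486) vs.   3% (2879/104798) atidxx32.dll".
LINE_RE = re.compile(r"\((\d+)/(\d+)\)\s+vs\.\s+\d+%\s+\((\d+)/(\d+)\)\s+(\S+)")


def parse_correlation(line):
    """Return (module, share within this signature, share within all crashes)."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised correlation line: {line!r}")
    sig_hits, sig_total, all_hits, all_total, module = match.groups()
    return module, int(sig_hits) / int(sig_total), int(all_hits) / int(all_total)


def lift(line):
    """How many times more common the module is in this signature than overall."""
    _, in_signature, overall = parse_correlation(line)
    return in_signature / overall


sample = "52% (251/486) vs.   3% (2879/104798) atidxx32.dll"
module, in_signature, overall = parse_correlation(sample)
print(module, round(in_signature, 3), round(overall, 3), round(lift(sample), 1))
```

Working from the raw counts rather than the rounded percentages keeps the ratio meaningful when the overall share is only a few percent.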

I manually looked through 20 or so crash reports, and of those that had atidxx32.dll (which was 12 that I saw), every single one had version 8.17.10.318 of that dll file.  Is there some way to automatically look at all of the driver versions in this crash?

According to http://developer.amd.com/download/ccc/pages/default.aspx this was released as part of the CATALYST 10.11 drivers, which look like they were released in October 2010.  So I would guess that those drivers are buggy, as there have been 7 driver releases since then.

The driver strings in the reports looked like this:
Module|atidxx32.dll|8.17.10.318|atidxx32.pdb|8FD758B1B16F49A0AA5610406B32F6771|0x6f270000|0x6f64afff|0
The second to last number varied, but the rest were identical.
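
One possible way to answer the "automatically look at all of the driver versions" question, assuming the raw module lists of the reports are available as text: split each `Module|...` line on `|` and tally the version field. The field layout is an assumption inferred from the example line above, and `module_versions` and `tally` are hypothetical helpers, not an existing crash-stats API.

```python
from collections import Counter


def module_versions(report_lines, filename):
    """Collect the version field of every Module line naming `filename`."""
    versions = []
    for line in report_lines:
        fields = line.strip().split("|")
        # Assumed layout: Module|file|version|debug file|debug id|base|end|flag
        if len(fields) >= 3 and fields[0] == "Module" and fields[1].lower() == filename.lower():
            versions.append(fields[2])
    return versions


def tally(reports, filename):
    """Count how many reports loaded each version of `filename`."""
    counts = Counter()
    for lines in reports:
        # set(): count a version once per report even if it is listed twice.
        counts.update(set(module_versions(lines, filename)))
    return counts


example = [
    "Module|atidxx32.dll|8.17.10.318|atidxx32.pdb|8FD758B1B16F49A0AA5610406B32F6771|0x6f270000|0x6f64afff|0"
]
print(tally([example], "atidxx32.dll"))
```

Here `reports` is a list of reports, each given as a list of module lines; how those lines are fetched from crash-stats is left out.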

Here are some example reports I found:
https://crash-stats.mozilla.com/report/index/7c979563-781a-44a7-83e2-b37442110726
https://crash-stats.mozilla.com/report/index/733d79cb-c41e-4668-9a29-3964f2110726
https://crash-stats.mozilla.com/report/index/0409d468-c1a8-4b01-9634-885462110726
https://crash-stats.mozilla.com/report/index/ccde4afd-349e-4b0d-a158-c168a2110726
https://crash-stats.mozilla.com/report/index/656fe2a3-e1e3-40f2-bdaf-a0b6d2110726
https://crash-stats.mozilla.com/report/index/278bca74-0a0d-455a-95b0-5da442110726
https://crash-stats.mozilla.com/report/index/c7739a5d-cf9d-4bc4-8322-e50112110726
https://crash-stats.mozilla.com/report/index/c20c3b2c-f3f8-42ef-be35-798d52110725
https://crash-stats.mozilla.com/report/index/194e80f3-e6c8-4a5b-89f1-6c2312110725
https://crash-stats.mozilla.com/report/index/eeb87eb5-729e-4f69-a9d8-667d62110725

CCing Benoit as he's assigned/reported a few graphics driver blacklist bugs.
Assignee: nobody → continuation
Those are 5.0 crashes.  atidxx32.dll doesn't show up at all in the correlations in 6.0, but this crash also isn't as high in the rankings there. (#5 vs #17)
I looked at a few other top crashes, including the JS GC one, and atidxx32.dll didn't show up in the correlations, so maybe there's something more than just simple memory corruption at work.
It is #5 top crasher in 5.0 and #16 on 6.0.

(In reply to comment #24)
> The correlations suggest that half of these are graphics-related.  These are
> the top correlations, and they all look graphics driver related:
This bug was filed before HW acceleration landed in Firefox.

Crash stats on 5.0 show a small correlation with Norton extensions:
    14% (69/486) vs.   6% (6554/104798) {BBDA0591-3099-440a-AA10-41764D9DB4DB} (Norton IPS)
    11% (52/486) vs.   5% (5294/104798) {2D3F3651-74B9-4795-BDEC-6DA2F431CB62} (Norton toolbar)
Crash Signature: [@GCGraphBuilder::NoteXPCOMChild(nsISupports*) ] → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*) ]
Summary: EXCEPTION_ACCESS_VIOLATION in xul.dll - GCGraphBuilder::NoteXPCOMChild(nsISupports*) [@GCGraphBuilder::NoteXPCOMChild(nsISupports*) ] → EXCEPTION_ACCESS_VIOLATION in xul.dll - GCGraphBuilder::NoteXPCOMChild
(In reply to comment #27)
> This bug was filed before HW acceleration landed in Firefox.

Sure, it probably isn't the entire story, but it seems pretty suspicious that half of the crashes on 5.0 have ATI drivers, and many, if not all, have the exact same version of those drivers.  Though I do wonder why those crashes seemed to have dropped away in 6.0.  Maybe due to a bugfix, or maybe people in beta are better about updating their drivers?
(In reply to comment #28)
> (In reply to comment #27)
> it seems pretty suspicious that half of the crashes on 5.0 have ATI drivers,
> and many, if not all, have the exact same version of those drivers.
Probably because it's the same person that crashes a lot with a faulty extension.
Well, in the crash reports above, the sets of extensions look very different in each crash report (except for {972ce4c6-7e08-4474-a285-3208198ce6fd} which looks like it is some kind of default built in thing) so it doesn't really look like the same person to me, but I'm not an expert at interpreting crash reports.
Note that this has been rising significantly in 6* in the last two days. This matches when 6.0b3 rose to higher user numbers. It's #35 overall in 6.0b2 but #14 overall in 6.0b3.
Is there any way to see how many of these crashes involved atidxx32.dll version 8.17.10.318 in 6.0b3 vs 6.0b2 vs. 5?

Looking at 6.0, atidxx32.dll has shown up in the correlations, at an even higher rate than 5.0:

89% (95/107) vs.   3% (840/30851) atidxx32.dll
89% (95/107) vs.   3% (840/30851) atiuxpag.dll
89% (95/107) vs.   3% (971/30851) aticfx32.dll
89% (95/107) vs.   8% (2552/30851) d3d10.dll
89% (95/107) vs.   8% (2552/30851) d3d10core.dll
89% (95/107) vs.   8% (2594/30851) d2d1.dll
90% (96/107) vs.   9% (2885/30851) d3d10_1core.dll
90% (96/107) vs.   9% (2885/30851) d3d10_1.dll
90% (96/107) vs.   9% (2915/30851) dxgi.dll
92% (98/107) vs.  19% (5830/30851) explorerframe.dll

When I looked at the correlations on the 26th or so, they did not include all of ATI and graphicsy things, so it looks like there's been a change.

I did see one crash that had version 8.17.10.370, but it was still the case that most of the crashes I found with the ATI drivers had the .318 version.
(In reply to comment #32)
> Is there any way to see how many of these crashes involved atidxx32.dll
> version 8.17.10.318 in 6.0b3 vs 6.0b2 vs. 5?

Unfortunately not. It's pretty probable that people who have been on 6.0b2 some days ago are the same ones that are now on 6.0b3, though, and the amount of changes on our side between those two is pretty small - could anything of those have an influence here?
I don't know.  It just looks weird to me, because it looks like we have three datapoints for these crashes:

5.0: half of these crashes involve ATI graphics, seem to be heavy on 8.17.10.318, lots of crashes
6.0b2: no substantial presence of ATI graphics crashes, not so many crashes
6.0b3: looks like 5.0, except even more skewed to ATI drivers, lots of crashes
Crashes on Windows 7 happen mainly with the "AuthenticAMD family 20 model 1 stepping 0" CPU, I'd say about 95%.
There are no CPU correlations for those on Windows XP.
Benoit, Joe, seems like we'd cut out the majority of these crashes if we blocklist a specific ATI driver version (details in previous comments). What needs to happen to block things here?
Easiest way is to follow https://wiki.mozilla.org/Blocklisting/Graphics . We can push this out at runtime and see if our crash rate changes.
Depends on: 675384
From the last 20 reports on Windows 7 in 5.0:
* AdapterVendorID - 1002 (AMD/ATI):                100% of all crashes    
* CPU - AuthenticAMD family 20 model 1 stepping 0:  95% of all crashes
* AdapterDriverVersion - 8.792.0.0:                 85% of all crashes
* AdapterDeviceID - 9803 (Radeon HD 6300 series):   40% of all crashes
* AdapterDeviceID - 9802 (Radeon HD 6300 series):   35% of all crashes
* AdapterDeviceID - 9804 (Radeon HD 6200 series):   20% of all crashes
* AdapterDeviceID - 68c1 (Radeon HD 6500M/5600/5700 Series):  5% of all crashes
* AdapterDriverVersion - 8.762.0.0:                  5% of all crashes
* AdapterDriverVersion - 8.813.0.0:                  5% of all crashes
* AdapterDriverVersion - 8.850.0.0:                  5% of all crashes
Source: http://developer.amd.com/download/pc_vendor_id/pages/default.aspx
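
A breakdown like the one above can be produced mechanically once the per-report fields are in hand. The sketch below is only an illustration: the `reports` list is a reconstruction built from the AdapterDriverVersion percentages listed above (85% plus three entries at 5% of 20 reports), not the actual report data, and `value_shares` is a hypothetical helper.

```python
from collections import Counter


def value_shares(reports, field):
    """Return {value: percent of reports} for one field across all reports."""
    counts = Counter(report.get(field, "unknown") for report in reports)
    return {value: 100 * n / len(reports) for value, n in counts.items()}


# Reconstructed stand-in for the 20 reports: 17 + 1 + 1 + 1 entries.
reports = (
    [{"AdapterDriverVersion": "8.792.0.0"}] * 17
    + [{"AdapterDriverVersion": "8.762.0.0"}]
    + [{"AdapterDriverVersion": "8.813.0.0"}]
    + [{"AdapterDriverVersion": "8.850.0.0"}]
)

shares = value_shares(reports, "AdapterDriverVersion")
for value, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{value}: {pct:.0f}% of reports")
```

With only 20 reports, each report moves a share by 5 points, so the smaller percentages should be read as single occurrences.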

Does Firefox crash because of the CPU (nothing to do) or HW acceleration with driver version 8.792.0.0 and some GPUs (blocklist these GPUs/driver version - bug 675384)?
We need to understand why this spiked on b3 and wasn't high for b2 with a similar number of users.
Looking over the list of changes at http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=ea7034005498&tochange=7864cfd02969 nothing really jumps out at me.
I found two patches online to random pieces of software to work around problems on the AMD fusion chip that I don't really understand, but nothing seems particularly related.  One of them is some kind of graphics thing (GART) and the other came up when they were running a VM in 64-bit mode.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8aeb96f80232e9a701b5c4715504f4c9173978bd
http://forums.virtualbox.org/viewtopic.php?f=1&t=41752
It is #41 top crasher in 6.0b1 and #30 in 6.0b2 with no correlation to the AMD CPU above.
It is #10 top crasher in 5.0 and #12 in 6.0b3 with a strong correlation to the AMD CPU above.
The only thing I can think of to do is to compare what the assembly looks like in 6b2, 6b3 and 5 to see if there's anything different about 6b2.  I'm probably not going to be in today, so somebody else can look at this if they want.
Well, what the heck, I'll look at this, though my disassembly skills are not mighty.
It is only #50 top crasher in 6.0b4, #42 in 7.0a2, and #77 in 8.0a1.
Looking at the correlations for 6, the atidxx32 thing (which seems like a good proxy for the AMD crashes) is fairly low.  (10% (5/51) vs.   3% (655/24719) atidxx32.dll)  Those crashes could even be from old versions, I guess.  The way the AMD crashes seem to appear or disappear from build to build, without any obviously related changes, suggests it could be some kind of PGO problem.  I haven't had a chance to compare the assembly yet.
It's #13 overall in 6.0b3, #47 overall in 6.0b4, and #26 so far in 6.0b5 - so it has gone slightly down in ranks, but not really far down. Could need a few more days of data on b5, still, though.
Whiteboard: [crashkill] → [crashkill] [trending down, tracking spike at beta 3]
It's been a few more days, how we looking?
It's #13 in 6.0b3, #35 overall in 6.0b4, and #22 so far in 6.0b5 - "Correlations for Firefox 6.0 Windows NT" are completely inconclusive for 6.0, no ATI ones even mentioned.
Keywords: topcrash
It's #38 top crasher in 9.0.1, #9 in 10.0b3, #20 in 11.0a2 and #17 in 12.0a1.

Some users reported in bug 547987 (TB version) that they hit this crash after several refreshes on html5test.com:
Frame 	Module 	Signature 	Source
0 		@0x0 	
1 	xul.dll 	GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1765
2 	xul.dll 	nsDOMEvent::cycleCollection::Traverse 	content/events/src/nsDOMEvent.cpp:240
3 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1961
4 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2906
5 	xul.dll 	nsCycleCollectorRunner::Run 	xpcom/base/nsCycleCollector.cpp:3626
6 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:660
7 	xul.dll 	nsThreadStartupEvent::Run 	xpcom/threads/nsThread.cpp:221
8 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426

In Safe Mode, they hit the nsEventListenerManager::cycleCollection::Traverse crash signature:
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsEventListenerManager::cycleCollection::Traverse 	content/events/src/nsEventListenerManager.cpp:180
1 	xul.dll 	nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1961
2 	xul.dll 	nsCycleCollector::BeginCollection 	xpcom/base/nsCycleCollector.cpp:2906
3 	xul.dll 	nsCycleCollectorRunner::Run 	xpcom/base/nsCycleCollector.cpp:3626
4 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:660
5 	xul.dll 	nsThreadStartupEvent::Run 	xpcom/threads/nsThread.cpp:221
6 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
Crash Signature: [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*) ] → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)]
There are a lot of Wikipedia-related comments.
11:
"happens about 50% of time when I visit wikipedia through search box"
"This crash occurs every time I attempt to search Wikipedia via the search bar plugin. It does not occur when I use other searches from the same bar. It does not occur when I navigate to Wikipedia using the URL bar or search via the Wikipedia landing page."
"Problem with wikipedia. Crashes everytime after 2 seconds."
12:
"This crash (along with the one submitted from my same machine about 2.5 hours ago, and along with the previous six crashes for which I didn't submit crash reports) was triggered by entering something into the #nav-bar's search bar, then clicking/pressng enter to tell Firefox to search, with Wikipedia as the selected search engine. (It used to seem associated with alt-tabbed away while still loading, but in this case Firefox still had focus.)"
"Yet another crash after entering text into the search bar and searching with Wikipedia as the search engine."
All 5 of these reported Wikipedia crashes occur in nsDOMEvent::cycleCollection::Traverse, in the NoteXPComChildren from the line:
  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mEvent->target)
All the crashes are in canonicalize.  They are all on different CPUs, so they are probably different people.

4 of the 5 have HTTPS Everywhere 1.2.1 installed.  It looks like 1.2.2 is the latest.  The fifth reports 0 extensions, which I assume is erroneous.

One of the things noted in the changelog for 1.2.2 is: "Fixes: Wikipedia, Identi.ca, Verizon, CCC.de, UserScripts, Yandex"
Depends on: 718285
(In reply to Andrew McCreight [:mccr8] from comment #52)
> All 5 of these reported Wikipedia crashes occurs in
> nsDOMEvent::cycleCollection::Traverse, in the NoteXPComChildren from the
> line:
>   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mEvent->target)
> All the crashes are in canonicalize.  They are all on different CPUs, so
> they are probably different people.
> 
> 4 of the 5 have HTTPS Everywhere 1.2.1 installed.  It looks like 1.2.2 is
> the latest.  The fifth reports 0 extensions, which I assume is erroneous.
> 
> One of the things noted in the changelog for 1.2.2 is: "Fixes: Wikipedia,
> Identi.ca, Verizon, CCC.de, UserScripts, Yandex"

I just filed https://bugzilla.mozilla.org/show_bug.cgi?id=718284.
Depends on: 718284
It's "only" #35 top browser crasher in 9.0.1.
But it's currently #5 top browser crasher in 10.0b5, #12 in 11.0a2 and #10 in 12.0a1.

Here are 10.0b4 top correlations (see http://people.mozilla.org/~rhelmer/temp/Firefox-10.0b4-correlation-2012Jan19/):
  GCGraphBuilder::NoteXPCOMChild(nsISupports*)|EXCEPTION_ACCESS_VIOLATION_READ (65 crashes)
     17% (11/65) vs.   0% (34/13378) gamesbar@oberon-media.com
  GCGraphBuilder::NoteXPCOMChild(nsISupports*)|EXCEPTION_ACCESS_VIOLATION_EXEC (48 crashes)
     35% (17/48) vs.   0% (34/13378) gamesbar@oberon-media.com
I looked back over the history for 4.0. This has always been high. It fluctuates but has reliably been in the top 20 - spiking up higher at times. I do think we should investigate, but I don't see it as worth tracking unless we can correlate a specific rise in numbers to a regression range. It's one of these problems that hasn't been easy to fix and I don't think tracking will change that.
bug 718284 is a reproducible crash with this signature, so maybe that should get some kind of tracking.
(In reply to Andrew McCreight [:mccr8] from comment #56)
> bug 718284 is a reproducible crash with this signature, so maybe that should
> get some kind of tracking.

A reproducible crash for a top 20 signature? Yes, definitely looks like something we want investigated. :)
Andrew, now that we have a reproducible case, who is the right person to investigate this?
I'll take a look.
Depends on: 724129
Whiteboard: [crashkill] [trending down, tracking spike at beta 3] → [crashkill]
Users on our community forum were affected by this issue as well. We were able to fix it for them by removing the Greasemonkey add-on. Please see bug 724267.
Depends on: 724267
So we fixed the issue that was causing the spike. Bug 726777 landed on m-c, aurora and beta and was part of 10.0.2. I don't see a reason to track this for FF12 anymore. It's still a top crash but has been since 4. It's back at the same level. I am removing all the various flags.
ranked #38 in 10, #23 in 11, #60 in 12, #101 in 13.  About half of the crashes I looked at in 11 are in nsPresContext::Traverse.  I didn't see that function in any of the stacks I looked at in 10 or 12.  I'm not sure what's going on there.

example: https://crash-stats.mozilla.com/report/index/a2e3f230-af88-4cdf-8dd3-07de52120304
Crash Signature: [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild]
Depends on: 730246, 727426, 655835
No longer depends on: 718285, 724267
OS: Windows XP → All
Hardware: x86 → All
Summary: EXCEPTION_ACCESS_VIOLATION in xul.dll - GCGraphBuilder::NoteXPCOMChild → Crash in GCGraphBuilder::NoteXPCOMChild
My plan with these crashes, if it sounds reasonable to others, is to get NoteXPCOMChild added to the prefix list.  The two major bugs we've seen recently with this signature (bug 724129 and bug 718284) were not due to bugs in the cycle collector, but in other classes. I expect this is going to be the case most of the time. If we split up NoteXPCOMChild based on the underlying signature, it will be easier to see when a new kind of crash spikes up, as well as easier to figure out correlations for particular kinds of crashes. This bug can still be used as a sort of meta bug.

If this were in place before, bug 724129 would have shown up with the signature
[GCGraphBuilder::NoteXPCOMChild | TraverseBinding] and bug 718284 would have shown up with the signature [GCGraphBuilder::NoteXPCOMChild | nsDOMEvent::cycleCollection::Traverse].  A bit long, but at least we can see there's a common underlying issue.
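
The effect of a prefix list can be sketched in a few lines of Python. This is a simplified illustration of the idea described above, not the actual crash-stats signature generator, which applies further rules; `build_signature` and `PREFIX_LIST` are hypothetical names.

```python
def build_signature(frames, prefix_list, separator=" | "):
    """Join frames from the top of the stack until one is not a prefix frame.

    A frame on the prefix list is kept and the next frame is appended too,
    so the signature ends at the first frame that is not on the list.
    """
    parts = []
    for frame in frames:
        parts.append(frame)
        if frame not in prefix_list:
            break
    return separator.join(parts)


# Hypothetical prefix list containing only the function discussed here.
PREFIX_LIST = {"GCGraphBuilder::NoteXPCOMChild"}

stack = [
    "GCGraphBuilder::NoteXPCOMChild",
    "nsDOMEvent::cycleCollection::Traverse",
    "nsCycleCollector::MarkRoots",
]
print(build_signature(stack, PREFIX_LIST))
```

With an empty prefix list the same stack collapses to the single top frame, which is why unrelated crashes all aggregated under one signature before the change.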
Depends on: 735719
Hey Andrew, have we logged the bug to get NoteXPCOMChild added to the prefix list yet?
Depends on: 737690
Sorry, I'd lost track of that.  I filed it just now as bug 737690.
With GCGraphBuilder::NoteXPCOMChild added to the skiplist, this meta bug will die.
I close it as incomplete.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
When will the updated signatures for this start to show up?  I looked at crash stats just now, and I don't see anything.

I'd kind of like to keep around some kind of meta bug for my own purposes.  Maybe I could just strip off all of the crash-related things from this bug?  Does that sound reasonable?  Or maybe I could just come up with a clever bugzilla query that will find all the relevant bugs.
(In reply to Andrew McCreight [:mccr8] from comment #67)
> When will the updated signatures for this start to show up?

This landed late yesterday UTC time, and we're always showing full UTC days of stats, so this probably isn't visible in the stats yet, but should be in a number of hours when today's stats will be available.

> I'd kind of like to keep around some kind of meta bug for my own purposes.

Sure, feel free to, I think Scoobidiver closed all those trackers because they were mostly just for having something to match the signatures we got. If it has a different purpose for you as a normal tracker, that's reason enough to keep it open for sure. No need to strip off stuff as we'll just not have exactly matching signatures in crash-stats any more so we won't get linked here anyhow. :)
Great, thanks.
Status: RESOLVED → REOPENED
Keywords: topcrashmeta
Resolution: INCOMPLETE → ---
Summary: Crash in GCGraphBuilder::NoteXPCOMChild → [meta] Crashes in GCGraphBuilder::NoteXPCOMChild
Whiteboard: [crashkill]
Version: unspecified → Trunk
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> mccr8, are the other signatures in the following query related to this bug
> or are those individual bugs?
> https://crash-stats.mozilla.com/query/query?product=Firefox&query_search=signature&query_type=contains&query=GCGraphBuilder%3A%3ANoteXPCOMChild&do_query=1

These are likely individual bugs (well, multiple signatures can be from a single bug, if they are from similar classes).  Take for instance this signature:

GCGraphBuilder::NoteXPCOMChild(nsISupports*) | xul.dll@0x35521f | xul.dll@0x35521f | nsXULPrototypeNode::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 

What has happened is that we are traversing a nsXULPrototypeNode, and reporting an XPCOMChild to the cycle collector.  Odds are, the nsXULPrototypeNode is bogus somehow, so when the CC touches it, it crashes.  But the underlying problem is with nsXULPrototypeNode, not the CC.

The various session storage signatures first showed up en masse due to some patch that messed up how session storage was handled.
These have the signature NoteChild now, due to some refactoring I did.  This also means they are aggregated again.  I don't see any such crashes on the top list for 16.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME