Closed Bug 604449 Opened 14 years ago Closed 14 years ago

New crash [@ nsQueryInterface::operator() ][@ nsContentUtils::CanCallerAccess ][@ nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ] (sometimes when using tinderboxpushlog)

Categories

(Core :: XPConnect, defect)

defect
Not set
critical

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: dholbert, Assigned: mrbkap)

Attachments

(1 file)

I've hit two crashes in the last hour when using tinderboxpushlog (mstange's instance, at http://tests.themasta.com/tinderboxpushlog/ )

The first, I'd just clicked a bug number (to load that bug page) from a changeset that I'd pushed, on mstange's tinderboxpushlog instance. --> Crash
bp-e5128a3f-e771-4747-8c25-aca812101014

The second, I'd just switched tabs to view tinderboxpushlog, and I crashed less than a second later:
bp-b515da1b-738d-4ac9-a18a-462222101014

The stacks of those two crashes are the same.
Meant to mention -- I'm using today's nightly:
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b8pre) Gecko/20101014 Firefox/4.0b8pre

Filing in component XPConnect since I suspect this may be a regression from Bug 580128 (or something associated with it), since that was a large landing that was new in today's nightly.

From glancing through the backtrace, stacklevel 5 is something that was touched in that landing, too:
http://hg.mozilla.org/mozilla-central/annotate/ad0a0be8be74/content/base/src/nsDocument.cpp#l7423
(not sure if that's actually relevant -- it's just the most recent changed line from stacklevels 1 thru 5, and it seems suspicious since it was changed yesterday)
(I don't have steps to reproduce at this point, sadly -- most of the time I can interact with tbpl just fine, with no crash.)
Just hit what looks like a version of this again, but with a different signature (crash is a few steps up the backtrace from comment 0's crashes, and has some other differences at higher-valued stack frames):
bp-edf24511-9395-4e99-9c6d-22aab2101014
Summary: New crash [@ nsQueryInterface::operator() ] when using tinderboxpushlog → New crash [@ nsQueryInterface::operator() ], [@ nsContentUtils::CanCallerAccess ] (sometimes when using tinderboxpushlog)
... that new signature being [@ nsContentUtils::CanCallerAccess ]  (which in my earlier stacktraces is stacklevel 3 -- it calls nsCOMPtr_base::assign_from_qi, which calls nsQueryInterface::operator() -- and that's where I crashed in my earlier stacktraces)
blocking2.0: --- → ?
Keywords: regression
I can reproduce this crash in the lab using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101013 Firefox/4.0b8pre

STR:
1. Load http://money.cnn.com/2010/10/14/technology/ipod_trade_up_program/index.htm?source=cnn_bin&hpt=Sbin
2. Click on the Google android link on the right.
3. Crash 100%

But my profile has quite a few extensions installed.

Name                           Version  Enabled  ID
Add-on Compatibility Reporter  0.6      true     compatibility@addons.mozilla.org
Firebug                        1.5.4    true     firebug@software.joehewitt.com
Shareaholic                    1.9.9.5  true     firefox-extension@shareaholic.com
It's All Text!                 1.4.2    true     itsalltext@docwhat.gerf.org
Web Developer                  1.1.8    true     {c45c406e-ab73-11d8-be73-000a95be3b12}
JSONView                       0.5      true     jsonview@brh.numbera.com
Adblock Plus                   1.2.2    true     {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}

It doesn't look as if Daniel has the same extensions installed as I do. http://crash-stats.mozilla.com/report/index/bp-1c5859ec-f380-4056-bf32-96e302101014 is my report.
Keywords: reproducible
I'm hitting this a lot too.

e.g.

http://crash-stats.mozilla.com/report/index/d44f158c-823b-4539-a94d-a7a842101015

I can't figure out what's causing it.  It happens periodically, even if I'm not interacting with a page.  I have gmail, greader, zimbra and rememberthemilk open all the time as app tabs.
I can reproduce this reliably with Marcia's STR. (interpreting "google android link on the right" to be just whatever the topmost link in the "Right Now" section at upper-right is)

Crash seems to go away if I disable NoScript.  (however, a fresh profile with NoScript installed doesn't reproduce the crash, so there might be something else going on)

Also, some of the crashes have this bug's signature, [@ nsQueryInterface::operator() ]:
   bp-a5b7c828-985f-4e13-8684-6a5a82101015
...and some of the crashes have bug 604368's signature, [@ XPCWrappedNative::GetObjectPrincipal ]:
   bp-aebcf3b6-f7d2-4636-936d-b6a482101015
   bp-e050b74a-c6fb-4a89-85d1-037ee2101015

The stacks don't look very similar, but since I'll get one or the other with the same STR, I wonder if this is somehow a variant of bug 604368...
Ok -- I'm still able to reproduce Marcia's CNN crash in my main browsing profile, with all extensions disabled except for NoScript.  (Still can't reproduce in a fresh profile + NoScript, so it must be dependent on something specific to my NoScript cfg or Firefox settings.  Not sure what though.)

Also: Turns out the CNN crashes nearly all land at bug 604368's signature -- I think I've only had it hit this bug's signature once (the first crash report in previous comment).  So, since that bug has a reviewed patch, it'd probably be good to see if this crash still happens after that patch lands.
(In reply to comment #8)
> Also: Turns out the CNN crashes nearly all land at bug 604368's signature

D'oh -- turns out **all** of my crashes with marcia's CNN STR are that other bug.  The first crash report that I mentioned in comment 7 was actually from *just before* I started trying Marcia's STR -- it was a sudden crash when I pressed "apply update" in the About Minefield dialog.

So: marcia's STR have only triggered bug 604368's crash for me, so far.  (but maybe there are really two potential crashes lurking there, and bug 604368's crash is just happening first for me?)
Marking blocking final until we can get more info here.  It's currently number 14 on b8pre and it also exists in b6 so it's not related to compartments.
blocking2.0: ? → final+
Severity: normal → critical
Just caught this in a debugger, & mrbkap/jst inspected the stack a bit and came up with this as a possible fix.

Posting this on behalf of mrbkap -- I'll leave it to him to get review on this & land (if it ends up being the right fix).
Comment on attachment 485440 [details] [diff] [review]
mrbkap's possible fix

r=jst, I think we should take this for beta7.
Attachment #485440 - Flags: review+
blocking2.0: final+ → beta7+
Assignee: nobody → mrbkap
Whiteboard: [compartments]
Summary: New crash [@ nsQueryInterface::operator() ], [@ nsContentUtils::CanCallerAccess ] (sometimes when using tinderboxpushlog) → New crash [@ nsQueryInterface::operator() ][@ nsContentUtils::CanCallerAccess ][@ nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ] (sometimes when using tinderboxpushlog)
Not sure what's going on, but the nsContentUtils::CanCallerAccess.nsPIDOMWindow.. signature spiked yesterday on trunk builds, and it looks like it's continuing today.

signature: nsContentUtils::CanCallerAccess.nsPIDOMWindow..

date         total  count per build (version, build ID)
20101010
20101011     1      1 on 4.0b8pre 2010101104
20101012     1      1 on 4.0b8pre 2010101204
20101013     1      1 on 3.6.10 2010091412
20101014
20101015-19         no reports
20101020
20101021     1      1 on 3.7a2 2010022816
20101022
20101023
20101024     18     17 on 4.0b8pre 2010102404
                     1 on 3.6.11 2010101211

see this query and sort by date

http://crash-stats.mozilla.com/report/list?signature=nsContentUtils::CanCallerAccess%28nsPIDOMWindow*%29
> not sure what's going on
It is a moving crash signature. I think the daily crash rate is constant if you look at all the crash signatures together, that is:
[@ nsQueryInterface::operator() ][@ nsContentUtils::CanCallerAccess ][@ nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ][@ nsPIDOMWindow::IsOuterWindow() ][@ nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) | nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ] [@ _purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) | nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ]
ok, those combined signatures result in this number of crashes per day on 4.0b8pre

   1 20101009-crashdata.csv 4.0b8pre
   1 20101011-crashdata.csv 4.0b8pre
   1 20101012-crashdata.csv 4.0b8pre
  81 20101014-crashdata.csv 4.0b8pre
  73 20101015-crashdata.csv 4.0b8pre
  76 20101016-crashdata.csv 4.0b8pre
  89 20101017-crashdata.csv 4.0b8pre
  99 20101018-crashdata.csv 4.0b8pre
 126 20101019-crashdata.csv 4.0b8pre
  93 20101020-crashdata.csv 4.0b8pre
  92 20101021-crashdata.csv 4.0b8pre
  82 20101022-crashdata.csv 4.0b8pre
  79 20101023-crashdata.csv 4.0b8pre
  86 20101024-crashdata.csv 4.0b8pre
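A minimal sketch of how per-day totals like the ones above could be produced by treating the listed signatures as one "moving" crash. The column names (`date`, `version`, `signature`) and the tab-separated layout are assumptions made for illustration; the real crashdata.csv layout may differ, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Signatures treated as one "moving" crash (copied from the list above).
COMBINED_SIGNATURES = {
    "nsQueryInterface::operator()",
    "nsContentUtils::CanCallerAccess",
    "nsContentUtils::CanCallerAccess(nsPIDOMWindow*)",
    "nsPIDOMWindow::IsOuterWindow()",
    "nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)"
    " | nsContentUtils::CanCallerAccess(nsPIDOMWindow*)",
    "_purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)"
    " | nsContentUtils::CanCallerAccess(nsPIDOMWindow*)",
}


def count_per_day(rows, version="4.0b8pre"):
    """Count crashes per date whose signature is in the combined set."""
    counts = Counter()
    for row in rows:
        if row["version"] != version:
            continue
        if row["signature"].strip() in COMBINED_SIGNATURES:
            counts[row["date"]] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data; tab-separated because signatures contain commas.
SAMPLE = (
    "date\tversion\tsignature\n"
    "20101014\t4.0b8pre\tnsQueryInterface::operator()\n"
    "20101014\t4.0b8pre\tnsContentUtils::CanCallerAccess\n"
    "20101014\t4.0b8pre\tSomeOther::Signature\n"
    "20101015\t4.0b8pre\tnsPIDOMWindow::IsOuterWindow()\n"
    "20101015\t3.6.11\tnsQueryInterface::operator()\n"
)

sample_counts = count_per_day(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print(sample_counts)
```

Grouping by a fixed set of signatures rather than by a single one is what keeps the daily total stable when the top frame shifts between builds.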

> r=jst, I think we should take this for beta7.

Before 2010-10-14, while the trunk was b7pre, it appeared only at that level of about one crash per day. If the changes that caused the regression on Oct 14 are going to the branch, it does sound like a fix for these combined signatures needs to go to the branch too.
This too.
Whiteboard: [compartments] → [compartments][can land]
http://hg.mozilla.org/mozilla-central/rev/6482443105ca
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I haven't hit this recently, but I just got a new report of this crash in #firefox:  bp-3200d31b-d732-4d12-b6d2-9f99f2101029

It's from today's nightly (build ID 20101029030658 ).  From a quick glance, the stack looks identical to the first crash report in comment 0.

  --> Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #20)
> From a quick glance, the
> stack looks identical to the first crash report in comment 0.

(The first 18 stack levels are identical, at least.  They differ after that.)
Re-running the stats from comment 17 to cover all the signatures on the days since the fix in comment 19 landed suggests that it's having some positive effect.

count date

 104 20101025-crashdata.csv 4.0b8pre
  95 20101026-crashdata.csv 4.0b8pre
  53 20101027-crashdata.csv 4.0b8pre
  39 20101028-crashdata.csv 4.0b8pre

but there still are some crashes remaining on recent builds.
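As a rough check on the size of that effect, the sketch below compares the most recent day against the mean of the 20101014 to 20101024 counts from comment 17. Using that window as the baseline is an assumption made here for illustration, not something stated in the bug.

```python
from statistics import mean

# Daily counts on 4.0b8pre for 20101014 .. 20101024 (from comment 17).
before = [81, 73, 76, 89, 99, 126, 93, 92, 82, 79, 86]

# Daily counts on 4.0b8pre from this comment.
after = {"20101025": 104, "20101026": 95, "20101027": 53, "20101028": 39}

baseline = mean(before)            # assumed pre-fix baseline, crashes per day
latest = after["20101028"]         # most recent day in the data
relative_drop = 1 - latest / baseline

print(f"baseline {baseline:.1f}/day, latest {latest}/day, drop {relative_drop:.0%}")
```

On these numbers the latest day is roughly 56% below the baseline, which fits with "some positive effect" while leaving a substantial residue.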

looks like it's mostly
_purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)
and
nsCOMPtr_base::assign_from_qi | nsContentUtils::CanCallerAccess

count build_id      vers   signature

  14 20101027063203 4.0b8pre 
       _purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)
  13 20101028042244 4.0b8pre 
       _purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)

   2 20101028030911 4.0b8pre 
       nsCOMPtr_base::assign_from_qi | nsContentUtils::CanCallerAccess  
   2 20101027030747 4.0b8pre 
       nsCOMPtr_base::assign_from_qi | nsContentUtils::CanCallerAccess  


   1 20101028042244 4.0b8pre nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) | nsContentUtils::CanCallerAccess(nsPIDOMWindow*)
   1 20101027063203 4.0b8pre nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) | nsContentUtils::CanCallerAccess(nsPIDOMWindow*)
   1 20101027031021 4.0b8pre nsQueryInterface::operator()    
   1 20101027030816 4.0b8pre nsQueryInterface::operator()    
   1 20101026030724 4.0b8pre 
     nsCOMPtr_base::assign_from_qi | nsContentUtils::CanCallerAccess
What's the volume here? Can we do this after b7?
looks like it will end up at about 30 crashes per day; that's down from over 100 per day before the patch in comment 19.  

probably good enough for b7, then someone should look at what is left of these signatures

 _purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&)
and
  nsCOMPtr_base::assign_from_qi | nsContentUtils::CanCallerAccess

maybe that's a different bug.
Also confirmed here, when clicking the right border with the middle mouse button:
bp-7b22488d-8f6a-424c-ba27-3f13b2101030
Not holding beta7 for this.
blocking2.0: beta7+ → betaN+
This is not related to compartments and existed in b6 and "only" top 14. It doesn't seem we have any resources to investigate this one right now. mrbkap is definitely busy with other stuff. We should move this to b8, except if any other xpconnect expert has free cycles.
208 crashes so far in Beta 7 - http://tinyurl.com/2d93rnw. There is a Twitter user who reports that he has issues with WordPress crashing. The exact stack he is getting is in Bug 605017, which was duped to this bug.
_purecall | nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) | nsContentUtils::CanCallerAccess(nsPIDOMWindow*)  is the #5 topcrash in early beta7 data and it probably ranks a bit higher if we add up all the other signatures.
If there is a patch and it's been reviewed, can we get this in for beta8?
The reviewed patch has already landed (comment 19), but that only fixed one particular way of triggering this crash / crash-signature, and apparently other way(s) of triggering this still remain.

Perhaps, to minimize confusion, it'd be best to spin off a new version of this bug for the remaining issues, and close this here existing bug as fixed in comment 19?
Whiteboard: [compartments][can land] → [compartments]
Blocks: 612383
Yup, this bug as filed is fixed, marking as such, but opening up new bug 612383 on the remaining crashes with the same signature, but different reasons.
No longer blocks: 612383
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Whiteboard: [compartments] → [compartments][can land]
Whiteboard: [compartments][can land]
Crash Signature: [@ nsQueryInterface::operator() ] [@ nsContentUtils::CanCallerAccess ] [@ nsContentUtils::CanCallerAccess(nsPIDOMWindow*) ]