Closed Bug 640904 Opened 13 years ago Closed 12 years ago

Crash in JSAutoEnterCompartment::enter

Categories

(Core :: XPConnect, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
firefox12 + ---
firefox13 + ---
firefox14 + fixed
firefox15 + verified
firefox-esr10 14+ wontfix

People

(Reporter: scoobidiver, Assigned: bholley)

References

Details

(Keywords: crash, regression, reproducible)

Crash Data

Attachments

(2 files, 1 obsolete file)

It is a new crash signature in the trunk.
It starts showing up as #59 top crasher in 4.0 RC1.

Signature	JSAutoEnterCompartment::enter(JSContext*, JSObject*)
UUID	3aec6aab-26ad-4f53-b2dd-3ff712110310
Time 	2011-03-10 07:22:19.412776
Uptime	205
Install Age	205 seconds (3.4 minutes) since version was first installed.
Product	Firefox
Version	4.0
Build ID	20110303194838
Branch	2.0
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
CPU	x86
CPU Info	AuthenticAMD family 16 model 6 stepping 2
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
User Comments	chrash when opening google docs
App Notes 	AdapterVendorID: 10de, AdapterDeviceID: 03d0, AdapterDriverVersion: 8.17.12.5896
D2D? D2D-
D3D10 Layers? D3D10 Layers-
D3D9 Layers? D3D9 Layers+

Frame 	Module 	Signature [Expand] 	Source
0 	mozjs.dll 	JSAutoEnterCompartment::enter 	js/src/jsapi.cpp:1221
1 	xul.dll 	nsXPConnect::WrapNativeToJSVal 	js/src/xpconnect/src/nsXPConnect.cpp:1323
2 	xul.dll 	nsDocShell::GetInterface 	docshell/base/nsDocShell.cpp:1017
3 	xul.dll 	nsDOMClassInfo::WrapNative 	dom/base/nsDOMClassInfo.h:144
4 		@0x938587f 	
5 	xul.dll 	xpc::holder_get 	js/src/xpconnect/wrappers/XrayWrapper.cpp:194
6 	mozjs.dll 	js::JSProxyHandler::get 	js/src/jsproxy.cpp:139
7 	xul.dll 	xpc::XrayWrapper<JSCrossCompartmentWrapper>::get 	js/src/xpconnect/wrappers/XrayWrapper.cpp:781
8 	mozjs.dll 	js::JSProxy::get 	js/src/jsproxy.cpp:800
9 	mozjs.dll 	js::proxy_GetProperty 	js/src/jsproxy.cpp:917
10 	mozjs.dll 	JSObject::getProperty 	js/src/jsobj.h:1227
11 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4565
12 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:653
13 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:740
14 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:863
15 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5173
16 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1672
17 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:588
18 	xul.dll 	XPCWrappedNative::GetNewOrUsed 	js/src/xpconnect/src/xpcwrappednative.cpp:622
19 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
20 	xul.dll 	nsEventListenerManager::HandleEventSubType 	content/events/src/nsEventListenerManager.cpp:1127

More reports at:
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=JSAutoEnterCompartment%3A%3Aenter%28JSContext*%2C%20JSObject*%29
I'm assuming that this was filed in Editor by mistake.
Assignee: nobody → general
Component: Editor → JavaScript Engine
QA Contact: editor → general
> I'm assuming that this was filed in Editor by mistake.
Frame #2 let me think it could be an editor bug.
(In reply to comment #2)
> > I'm assuming that this was filed in Editor by mistake.
> Frame #2 let me think it could be an editor bug.

nsDocShell::GetInterface?
Assignee: general → nobody
Component: JavaScript Engine → Document Navigation
QA Contact: general → docshell
Crash Signature: [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ]
Stack traces are various.

Mac reports at:
https://crash-stats.mozilla.com/report/list?signature=JSAutoEnterCompartment%3A%3Aenter
Crash Signature: [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ] → [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ][@ JSAutoEnterCompartment::enter ]
Component: Document Navigation → XPConnect
OS: Windows 7 → All
QA Contact: docshell → xpconnect
Summary: Crash in nsDocShell::GetInterface [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ] → Crash in JSAutoEnterCompartment::enter
Crash Signature: [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ][@ JSAutoEnterCompartment::enter ] → [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ] [@ JSAutoEnterCompartment::enter ]
There has been a significant rise of this signature on Nightly since 2011-10-12, this really should be investigated.
Getting some URLs now to see if I can reproduce. One of the Mac reports mentions blackboard.
Here are the Windows correlations from 7.0.1 - nothing really jumps out:

37% (165/451) vs.   4% (4379/97799) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
34% (152/451) vs.   4% (3660/97799) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D}
6% (27/451) vs.   1% (502/97799) support@lastpass.com (LastPass Password Manager, https://addons.mozilla.org/addon/8542)

Some of the 7.0.1 comments mention ebay. The trunk crashes have very few comments to go on.
Keywords: needURLs
I looked at some URLs with the signature JSAutoEnterCompartment..enter from 11/16 and I see some commons sites:

*https://webmail2-gn.ziggo.nl/iwc_static/layout/login.html?lang=en-us&14.01_234924&svcs=abs,mail,c11n (there were a number of users who had crash reports that were using this Webmail, which uses Java)
*Searching Getty Images: http://www.gettyimages.com/Search/Search.aspx?EventId=133279901&EditorialProduct=Entertainment
*Searching Neiman Marcus: 
*searching Finish LIne store: 1 http://www.finishline.com/store/shop/womens/womens-running-shoes/nike/_/N-1sb9kZ31d78Z1z141x6/Ns-P_BestSellers%7C1?categoryId=cat20137&pageTitle=Women%27s+Shoes

Here are some of the other top hits:

31 \N
      6 http://www.comcast.net/
      5 http://xfinity.comcast.net/?
      5 about:blank
      4 http://vkontakte.ru/
      4 http://streamwm.ru/index.php?page=members
      3 javascript:false
      3 http://www.neimanmarcus.com/
      3 http://www.avito.ru/search
      2 wyciwyg://40/https://webmail2-gn.ziggo.nl/iwc_static/layout/main.html?lang=nl&14.01_234924&
      2 http://www.skatehut.co.uk/protection/helmets/triple_8_brain_saver_multi-sport_rubber_black_helmet.htm
      2 http://www.otvetru.com/
      2 http://www.odnoklassniki.ru/dk?st.cmd=userFeedNewsAlternative&st.page=2&st.replaceQuery=true
      2 http://www.odnoklassniki.ru/
      2 http://www.google.ru/
      2 http://www.google.com/
      2 http://www.finishline.com/store/results/index.jsp?N=3000583+5102596
      2 http://www.finishline.com/store/product/puma-tune-cat-toddler-shoe/_/A-64775?categoryId=cat20175&productId=prod671992
      2 http://www.finishline.com/
      2 http://www.ebay.co.uk/sch/i.html?_nkw=christmas+tree&_sacat=0&_odkw=christmas+trees&_osacat=0&_trksid=p3286.c0.m270.l1313
      2 http://www.boc.cn/sourcedb/whpj/
      2 http://vkontakte.ru/feed
      2 http://pages.ebay.com/ebaymotors/monthlypayment.html
      2 http://organisasi.org/taxonomy_menu/2?page=3
      2 http://dict.youdao.com/search?q=heartstirring&keyfrom=dict.plugin#q%3Dheartstirring%26keyfrom%3Ddict.plugin
      2 http://dict.youdao.com/search?q=enhance&keyfrom=dict.plugin#q%3Denhance%26keyfrom%3Ddict.plugin
      2 http://comcast.net/

*Searching ebay.com
Keywords: needURLsqawanted
This is topcrash #21 on 8.* in yesterday's data, with 10 crashes per million ADU.
In the Mac reports I noticed a high correlation - not sure if it is significant but just thought I would mention it.

100% (13/13) vs.  37% (708/1914) PrintCocoaUI
(In reply to comment #10)

That's puzzling.  PrintCocoaUI has to do with printing, but none of the Mac comments over the last week mention printing.

Do you find a similar correlation on Windows?
(Following up comment #11)

On second thought the PrintCocoaUI correlation may just be an accident -- there are only 13 cases.
Keywords: topcrash
There's a correlation in 9.0.1 with Спутник @Mail.Ru and Yandex.Bar:
48% (238/492) vs.   7% (7042/101230) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
44% (216/492) vs.   7% (6651/101230) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D}
Just got the same error, but with different stack trace:
https://crash-stats.mozilla.com/report/index/bp-212028f0-d015-449a-ba5b-c248a2120201
(In reply to gadjo from comment #14)
> Just got the same error, but with different stack trace:
Try with Fx 10.0 that fixed many crashes with Firebug.
In 11.0 Beta where it's #13 top crasher, there's a correlation with Spam Free Search Bar:
  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (130 crashes)
     50% (65/130) vs.   1% (159/30240) {00f12770-e60e-4dc6-9105-425bface7c73}
It's #12 top browser crasher in 11.0, #6 in 12.0, #60 in 13.0a2, and #47 in 14.0a1.

Here are fresh correlations in:
* 11.0:
  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (696 crashes)
     45% (310/696) vs.   0% (574/116167) {00f12770-e60e-4dc6-9105-425bface7c73} (Spam Free Search Bar)
     27% (191/696) vs.   6% (6534/116167) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
     26% (181/696) vs.   6% (6678/116167) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D} (Спутник @Mail.Ru)
     21% (145/696) vs.   6% (6844/116167) ffxtlbr@babylon.com
     10% (73/696) vs.   1% (805/116167) crossriderapp2258@crossrider.com
     11% (77/696) vs.   2% (2201/116167) plugin@yontoo.com
      8% (56/696) vs.   1% (942/116167) ffxtlbr@funmoods.com
      9% (63/696) vs.   3% (3757/116167) toolbar@ask.com
      6% (41/696) vs.   1% (726/116167) {C9B68337-E93A-44EA-94DC-CB300EC06444}

*12.0:
     52% (101/195) vs.   1% (175/25337) {00f12770-e60e-4dc6-9105-425bface7c73} (Spam Free Search Bar)
     21% (40/195) vs.   3% (742/25337) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
     18% (36/195) vs.   4% (892/25337) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D} (Спутник @Mail.Ru)
     13% (25/195) vs.   1% (218/25337) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
     18% (36/195) vs.   7% (1727/25337) ffxtlbr@babylon.com
     14% (27/195) vs.   3% (880/25337) toolbar@ask.com
     12% (24/195) vs.   3% (635/25337) plugin@yontoo.com
     89% (173/195) vs.  79% (20073/25337) testpilot@labs.mozilla.com (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)
      9% (18/195) vs.   2% (632/25337) bbrs_002@blabbers.com
      9% (17/195) vs.   2% (511/25337) {EEE6C361-6118-11DC-9C72-001320C79847}
      8% (16/195) vs.   2% (482/25337) m3ffxtbr@mywebsearch.com
      7% (14/195) vs.   1% (254/25337) ffxtlbr@funmoods.com
This is spiking again, now #3 on 12.*, #8 in 11.* in yesterday's data.
It's #10 top browser crasher in 11.0 and #1 in 12.0b6.

Correlations of the day are similar to the ones in comment 17:
*11.0:
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (32 crashes)
     41% (13/32) vs.   1% (26/3085) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)

  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (955 crashes)
     42% (397/955) vs.   1% (724/130698) {00f12770-e60e-4dc6-9105-425bface7c73} (Spam Free Search Bar 1.0)
     17% (163/955) vs.   1% (695/130698) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
     17% (162/955) vs.   2% (2945/130698) plugin@yontoo.com
     19% (177/955) vs.   6% (7552/130698) ffxtlbr@babylon.com

*12.0:
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (18 crashes)
     89% (16/18) vs.   5% (30/633) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)

  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (730 crashes)
     58% (421/730) vs.   2% (668/36794) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
     27% (200/730) vs.   1% (319/36794) {00f12770-e60e-4dc6-9105-425bface7c73} (Spam Free Search Bar 1.0)
     18% (133/730) vs.   7% (2706/36794) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
     16% (118/730) vs.   7% (2553/36794) ffxtlbr@babylon.com
     11% (80/730) vs.   3% (1063/36794) plugin@yontoo.com
This crash appears to be happening across all versions of Firefox, and has been for a long time. I don't think we need to track for any particular release, unless there's reason to believe this doesn't have to do a particular version instead of 3rd party SW.
One user using 12.0b6 says in bp-e60f1f74-1dbc-42cd-b636-1e82c2120420:
"The new update is very unstable. Firefox keeps crashing every 30 mins or so."

Moreover, in 11.0 it represents 0.78% of browser crashes while in 12.0b6 it's 22%.

So there's a code change in 12.0 that makes it more sensitive to those third-party add-ons.
In 12.0b6, it's more sensitive to Ant.com Toolbar:
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (15 crashes)
     93% (14/15) vs.  19% (18/96) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          7% (1/15) vs.   2% (2/96) 2.4.6.2
         87% (13/15) vs.  17% (16/96) 2.4.6.4
  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (516 crashes)
     93% (478/516) vs.  10% (526/5510) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          0% (1/516) vs.   0% (1/5510) 2.4.5
          6% (29/516) vs.   1% (31/5510) 2.4.6.2
         87% (448/516) vs.   9% (494/5510) 2.4.6.4
OK - we'll get QA on this bug and track in case it warrants chemspilling in the future.
Also including Dave to get his opinion on what may have changed in FF12 that would cause this crash to rise.
(In reply to Alex Keybl [:akeybl] from comment #24)
> Also including Dave to get his opinion on what may have changed in FF12 that
> would cause this crash to rise.

Alex: This really seems to be an XPConnect crash, and it seems fairly unlikely that a JS change affected it very much. It looks like 90%+ of the reports involve the Ant extension. It may just be that more Beta users use that extension.

Bobby/Blake/Peter: I checked this crash out:

 https://crash-stats.mozilla.com/report/index/6cc120e8-c001-41a5-b0ae-1c61b2120424
  
It's crashing right here, with |target == NULL|:

bool
JSAutoEnterCompartment::enter(JSContext *cx, JSObject *target) {
    AssertNoGC(cx);
    JS_ASSERT(state == STATE_UNENTERED);
    if (cx->compartment == target->compartment()) { 

I think they're pretty much all crashing that way (crash address is almost always 0x0). The one I looked at crashes in XPCWrappedNative::GetNewOrUsed, passing NULL by the name of |parent|. It looks like parent comes from either Scope->GetGlobalJSObject() or the PreCreate hook. I don't know if the invariants allow either of those to return NULL, so I'm not sure where the bug would be. Any thoughts?

Everybody: Depending on what the experts say, this might be a TE bug, or it might be a bug in XPConnect. It also occurs to me that we could make JSAutoEnterCompartment::Enter return false if |target == NULL|, which would at least stop this crash. That could stop the crash for now, but it might just hide the bug (depending on how this is supposed to work in XPConnect).
Depends on: 748269
(In reply to David Mandelin from comment #25)
> It may just be that more Beta users use that extension.
In order to know how it's widespread per Fx version, I removed crashes with this crash signature and extension in the overall crashes:
* 11.0: (2269-1826)/(103530-1826)=0.44%
* 12.0: (2660-2492)/(28831-2492)=0.64%
The breakdown is almost the same in 11.0 and 12.0. It doesn't explain that this crasher represents 1.67% of browser crashes in 11.0 and 13.26% in 12.0.
(In reply to David Mandelin from comment #25)
> The one I looked at crashes in XPCWrappedNative::GetNewOrUsed,
> passing NULL by the name of |parent|. It looks like parent comes from either
> Scope->GetGlobalJSObject() or the PreCreate hook.

I went through a bunch of the PreCreate hooks and they take care to not set parent to null, leaving the old value instead. If that's true for all PreCreate hooks it seems like GetGlobalJSObject() might return null here.
(In reply to Peter Van der Beken [:peterv] from comment #27)
> I went through a bunch of the PreCreate hooks and they take care to not set
> parent to null, leaving the old value instead. If that's true for all
> PreCreate hooks it seems like GetGlobalJSObject() might return null here.

The 2 possibilities would be that the scope was initialized with a null global, or that somehow mGlobalJSObject becomes null during its useful lifetime.

Let's consider the first case. We can either create a global and scope at the same time in XPCWrappedNative::WrapNewGlobal, or retroactively create a scope for an existing global in nsXPConnect::InitClasses. The first case definitely isn't the culprit, because we would have already entered the global's compartment and crashed. The second case is deprecated, and used in 3 places (XPCThreadContext, jsd, and sandbox creation). My cursory analysis indicates that none of those can end up with a null global. I'm quite sure about XPCThreadContext and sandboxes, jsd I didn't check quite as closely. Anyway, nothing jumping out at me here.

Now, let's consider the second case. This can only happen if we call SetGlobal on an existing scope, which can happen on XPCWrappedNativeScope::GetNewOrUsed. But if aGlobal were null, FindInJSObjectScope would fail, so that can't happen.

So modulo addons, I don't really see how a global could end up null here. It would be really nice if somebody could figure out a way to reproduce this...
(In reply to Bobby Holley (:bholley) from comment #28)
> So modulo addons, I don't really see how a global could end up null here. It
> would be really nice if somebody could figure out a way to reproduce this...

This is QA's first priority this morning after we release FF12.
There's no correlation with firebug or anything else, right? That would indicate jsd shenanigans...
(In reply to Bobby Holley (:bholley) from comment #30)
> There's no correlation with firebug or anything else, right?
No. It's mainly correlated to downloaders: Ant Tube Downloader, Video DownloadHelper, Easy YouTube Video Downloader, Flash Video Downloader, DownThemAll!, NetVideoHunter Video Downloader, Fast Video Download...
I've been trying with DownThemAll, Easy YouTube Video Downloader and Flash Video Downloader YouTube Downloader.  I have yet to crash.  Kind of testing in the dark here.
(In reply to Tracy Walker [:tracy] from comment #32)
> I've been trying with DownThemAll, Easy YouTube Video Downloader and Flash
> Video Downloader YouTube Downloader.  I have yet to crash.  Kind of testing
> in the dark here.
The highest correlation is with Ant Tube Downloader (89%).
I've been trying Ant Downloader 2.4.6.4 in Firefox 12.0 on Windows 7 64-bit and have yet to get it to crash. I've tried several different things like:
 * using the toolbar
 * using the addon bar
 * popping out video in a tab
 * popping out video in a window
 * downloading videos from Youtube, Vimeo and cbc.ca

Unfortunately, we are flying completely blind here. I think we are approaching the extent of what QA can accomplish without assistance from Engineering. Can we can some assistance in narrowing down the scope of our testing? Is there any particular activities or environment variables we should focus our attention?
I've been trying Ant as well.  No crashes.  I have fully up to date flash.  Note: Ant's help page points to updating flash if Fx users are having trouble with the player.
Yeah, I'm using Flash 11.2. Do we have any correlations to flash versions?
(In reply to Bobby Holley (:bholley) from comment #28)
> (In reply to Peter Van der Beken [:peterv] from comment #27)
> > I went through a bunch of the PreCreate hooks and they take care to not set
> > parent to null, leaving the old value instead. If that's true for all
> > PreCreate hooks it seems like GetGlobalJSObject() might return null here.

Can addons create PreCreate hooks, or is the basic Firefox set all of them?

> So modulo addons, I don't really see how a global could end up null here.

Almost all of these crashes are with addons, specifically Ant. Do you know in what way an addon could cause the global to end up null? Is there some way we can detect it or protect against it?

Failing that, we could just check for NULL either in JSAutoEnterCompartment::enter, or at its call sites. Would that make sense, or would that be dangerous?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #34)
> Can we can some assistance in narrowing down the scope of our
> testing? Is there any particular activities or environment variables we
> should focus our attention?
Read the comments (1293 over four weeks):
"I was sorting my bookmarks and it crashed"
"Was watching a video on youtube and it just stopped and crashed."
"when watching any online movie it automatically crash. http://www.animecrazy.net/area-no-kishi-episode-1/277512"
"Using the ANT add-on to download Youtube video "A Brief History of Time."
"For some reason every time i go to "Comcast.net" it crashes. like almost every time." (support@lastpass.com)

(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> Yeah, I'm using Flash 11.2. Do we have any correlations to flash versions?
Flash DLLs are not loaded during the crash.
(In reply to David Mandelin from comment #37)
> (In reply to Bobby Holley (:bholley) from comment #28)
> > (In reply to Peter Van der Beken [:peterv] from comment #27)
> > > I went through a bunch of the PreCreate hooks and they take care to not set
> > > parent to null, leaving the old value instead. If that's true for all
> > > PreCreate hooks it seems like GetGlobalJSObject() might return null here.
> 
> Can addons create PreCreate hooks, or is the basic Firefox set all of them?

They could, but it would be pretty esoteric. I doubt that's the cause.

> 
> > So modulo addons, I don't really see how a global could end up null here.
> 
> Almost all of these crashes are with addons, specifically Ant. Do you know
> in what way an addon could cause the global to end up null? Is there some
> way we can detect it or protect against it? Failing that, we could just
> check for NULL either in
> JSAutoEnterCompartment::enter, or at its call sites. Would that make sense,
> or would that be dangerous?

That seems totally sane, but it adds a little bit more overhead to cross-compartment stuff, which isn't great for compartment-per-global. I'd prefer to just null-check the result of PreCreate here if it comes down to that.

More info: I realized that nsXPConnect::InitClasses enters the global's compartment before doing anything else, so we'd crash earlier if that were this issue. This means that I _think_ that XPCWrappedNativeScope is pretty much always guaranteed to have a global object.

This brings us back to peter's original claim in comment 27 that PreCreate is probably safe. I'm not so convinced. It's true that nobody sets parentObj to null, but there are a number of cases where they set parentObj to a value that comes from elsewhere without checking it directly. Here's the set (all of these live in nsDOMClassInfo.cpp):

nsDOMClassInfo::PreCreate (generic):

>  if (piwin->IsOuterWindow()) {
>    *parentObj = ((nsGlobalWindow *)piwin.get())->
>      GetCurrentInnerWindowInternal()->GetGlobalJSObject();
>  }

nsWindowSH::PreCreate - does a null check, so I think it's safe.

nsDOMConstructor::PreCreate:

*parentObj = win->FastGetGlobalJSObject();

nsLocationSH::PreCreate:

*parentObj = win->FastGetGlobalJSObject();

nsNavigatorSH::PreCreate - null checks, so I think it's safe.

nsNodeSH::PreCreate - uses the results of various wrapping operations, if they succeed. Not the likeliest issue, but possible.

nsEventTargetSH::PreCreate - Null checks.

IDBEventTargetSH::PreCreate - Null checks.

nsElementSH::PreCreate - just invokes nsNodeSH::PreCreate for this stuff.

nsDOMStringMapSH::PreCreate - Null checks.

nsHTMLPluginObjElementSH::PreCreate - just invokes nsElementSH::PreCreate.

nsHistorySH::PreCreate:
> *parentObj = static_cast<nsGlobalWindow*>(innerWindow.get())->FastGetGlobalJSObject();

nsCSSStyleDeclSH::PreCreate:
calls WrapNativeParent

WebGLExtensionSH::PreCreate:
calls WrapNativeParent


So by now, I've got a pretty good guess as to what's going on: one of the preCreate hooks that tries to pull the global off the inner window is somehow getting called after the window has been torn down, and that returns null. We've had security bugs on this in the past (bug 691178), and so we know it's possible for content JS code (and certainly chrome JS code) to cause this to happen.

So I can whip up a patch in the (european) morning that checks win->IsClosedOrClosing() in all of the hooks that pull the object off the window, and make it cause PreCreate to throw. If we want to be extra sure to beat this crash, I could also a MOZ_ASSERT+NULL-check at the PreCreate call site where we're crashing, which isn't ideal but also isn't the end of the world. If we really want to kill this crash, that's the most sure-fire way to do it. Thoughts?
To elaborate on that last point - Null-checking the call-site would definitely fix this crash, but it wouldn't tell us if we correctly identified the underlying problem, the the issue might manifest itself in a different way (and not obviously be connected to this one), and we'd be left with another null check in XPConnect for eternity.

So if this is important to fix ASAP, we should null-check the call site. If we have a few days, I'd obviously prefer to check win->IsClosedOrClosing() first and see if it reduces crash volume, since that would tell us if we correctly identified the issue.
(In reply to Bobby Holley (:bholley) from comment #40)
> To elaborate on that last point - Null-checking the call-site would
> definitely fix this crash, but it wouldn't tell us if we correctly
> identified the underlying problem, the the issue might manifest itself in a
> different way (and not obviously be connected to this one), and we'd be left
> with another null check in XPConnect for eternity.
> 
> So if this is important to fix ASAP, we should null-check the call site. If
> we have a few days, I'd obviously prefer to check win->IsClosedOrClosing()
> first and see if it reduces crash volume, since that would tell us if we
> correctly identified the issue.

akeybl might have a better sense of the urgency, but I don't think he plans to chemspill just yet, so trying to fix it properly seems good. Do we need to notify support so they can tell people about the affected add-ons?

Nick-just want to check-does the "using closed window" part of the cause of the bug suggest the addons are holding onto windows for too long, or creating zombies?
(In reply to David Mandelin from comment #41)
> Nick-just want to check-does the "using closed window" part of the cause of
> the bug suggest the addons are holding onto windows for too long, or
> creating zombies?
khuey's been looking at that recently.
> Nick-just want to check-does the "using closed window" part of the cause of
> the bug suggest the addons are holding onto windows for too long, or
> creating zombies?

Um, not sure, sorry.  CC'ing jlebar just in case he has anything to add.  (jlebar, comment 39 is the most important comment.)
> Nick-just want to check-does the "using closed window" part of the cause of
> the bug suggest the addons are holding onto windows for too long, or
> creating zombies?

  bool nsGlobalWindow::IsClosedOrClosing() {
    return (mIsClosed ||
            mInClose ||
            mHavePendingClose ||
            mCleanedUp);
  }

mCleanedUp is set to true during SetDocShell(null).  The other variables are set during Close() and related methods.  I don't know if Close() happens before or after SetDocShell(null).

If someone is touching a window after it's been SetDocShell(null)'ed, then they're holding onto a window longer than we'd like, yes.  It's not necessarily a zombie-compartment-type leak; for example, a closure could be holding onto the window, so the window will die after one or two trips through the event loop.

Does that help?
(In reply to Bobby Holley (:bholley) from comment #39)
> nsNodeSH::PreCreate - uses the results of various wrapping operations, if
> they succeed. Not the likeliest issue, but possible.

These seem fine to me, because of the checks after the wrapping.

> nsCSSStyleDeclSH::PreCreate:
> calls WrapNativeParent

This should be fine too, because we null-check the parent and the result of the wrapping.

> WebGLExtensionSH::PreCreate:
> calls WrapNativeParent

This one looks unsafer, we only create a WebGLContext in nsHTMLCanvasElement::GetContextHelper. If an addon creates one they have to make sure to set the element. We should probably null-check here before calling WrapNativeParent.


> So I can whip up a patch in the (european) morning that checks
> win->IsClosedOrClosing() in all of the hooks that pull the object off the
> window, and make it cause PreCreate to throw.

Yeah, let's do this.

> If we want to be extra sure to
> beat this crash, I could also a MOZ_ASSERT+NULL-check at the PreCreate call
> site where we're crashing, which isn't ideal but also isn't the end of the
> world. If we really want to kill this crash, that's the most sure-fire way
> to do it. Thoughts?

I'd rather crash if this happens, unless we think the problem is really widespread in addons (which seems unlikely to me).
(In reply to Peter Van der Beken [:peterv] from comment #45)
> I'd rather crash if this happens, unless we think the problem is really
> widespread in addons (which seems unlikely to me).

I mean the problem of addons implementing PreCreate badly by returning a null parent.
Actually, anything calling WrapNativeParent is not the cause of this issue, since we'd crash earlier because of a null nativeWrapperCache when doing |nativeWrapperCache->GetWrapper()|.
Attaching a patch. Flagging peterv for review.
Attachment #618206 - Flags: review?(peterv)
(In reply to Scoobidiver from comment #38)
> (In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> > Yeah, I'm using Flash 11.2. Do we have any correlations to flash versions?
> Flash DLLs are not loaded during the crash.

Just FYI, that's because the browser process crashes, but we load Flash only in plugin processes, so we don't know about Flash versions in browser crashes like this.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #34)
> I've been trying Ant Downloader 2.4.6.4 in Firefox 12.0 on Windows 7 64-bit
> and have yet to get it to crash.

> Can we can some assistance in narrowing down the scope of our
> testing?

Some more data on versions this happens with:

From add-on version correlations for Firefox 12.0 Windows NT:

95% (120/126) vs.  10% (126/1274) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
6% (8/126) vs.   1% (10/1274) 2.4.6.2
89% (112/126) vs.   9% (116/1274) 2.4.6.4

From signature summary of https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=JSAutoEnterCompartment%3A%3Aenter%28JSContext*%2C%20JSObject*%29

Operating System 	Percentage 	Number Of Crashes
Windows 7 	56.974 %	51363
Windows XP 	33.619 %	30308
Windows Vista 	8.916 %	8038


So it looks like the combination you tested with matches the majority of users hitting this.
Comment on attachment 618206 [details] [diff] [review]
Check for null window globals within PreCreate hooks and assert IsClosedOrClosing(). v1

Review of attachment 618206 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsDOMClassInfo.cpp
@@ +1979,5 @@
> +  MOZ_ASSERT(win);
> +  MOZ_ASSERT(win->IsInnerWindow());
> +  *parent = win->FastGetGlobalJSObject();
> +
> +  if (NS_UNLIKELY(!*parent)) {

I've started using MOZ_UNLIKELY.

@@ +5182,5 @@
>    nsGlobalWindow *win = nsGlobalWindow::FromSupports(nativeObj);
>    NS_ASSERTION(win->IsInnerWindow(), "Should be inner window.");
>  
> +  // If we're bootstrapping, we don't have a JS object yet.
> +  if (win->GetOuterWindowInternal()->IsCreatingInnerWindow())

Hmm, this is slightly different logic. This is only equivalent if FastGetGlobalJSObject() always returns null when win->GetOuterWindowInternal()->IsCreatingInnerWindow() is true. Is that the case?
Attachment #618206 - Flags: review?(peterv) → review+
Comment on attachment 618206 [details] [diff] [review]
Check for null window globals within PreCreate hooks and assert IsClosedOrClosing(). v1

Review of attachment 618206 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsDOMClassInfo.cpp
@@ +1973,5 @@
>  }
>  
> +// Helper to handle torn-down inner windows.
> +static inline nsresult
> +SetParentToWindow(nsGlobalWindow *win, JSObject **parent)

* on the left in dom/
Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20120424 Firefox/13.0a2

I managed to crash Firefox several times using the following add-ons:
- Ant Video Downloader 2.4.6.4
- DownloadHelper 4.9.9
- Easy YouTube Video Downloader 5.9
- Fast Video Download (with SearchMenu) 1.4.6

Steps to reproduce:
1. Install all the add-ons listed above
2. Open YouTube
3. Play some videos (I also played with the progress bar).

The crashes are intermittent. 

Reproduced crash on Firefox 12 beta 6 and Aurora 13.
https://crash-stats.mozilla.com/report/index/bp-c07023a1-7d2c-4b09-91de-d53842120425
https://crash-stats.mozilla.com/report/index/bp-1c6f5ae7-60d6-411a-b766-51b792120425
I forgot to mention in the previous comment that I also installed Babylon 9 from:
ffxtlbr@babylon.com

Flash version: 11.2.202.233
(In reply to Bobby Holley (:bholley) from comment #48)
> Created attachment 618206 [details] [diff] [review]
> Check for null window globals within PreCreate hooks and assert
> IsClosedOrClosing(). v1
> 
> Attaching a patch. Flagging peterv for review.

Were there recent changes in the related code in FF12? I'd like to understand if this crash is caused by changes in FF12 or the recent release of a new version of Ant Video Downloader (see [1]), and where the regression lies.

[1] https://addons.mozilla.org/en-US/firefox/addon/video-downloader-player/versions/
I was able to reproduce this crash with Firefox 13.0a2 2012-04-24 after playing some random videos on youtube and clicking around the progress bar. Took about 10 minutes or so before it crashed. So presumably this will happen in 13.0b1.

Now seeing if I can reproduce in Firefox 11.
Unable to reproduce this so far in Firefox 11.
I've been unable to reproduce this with Ant Downloader 2.6.2.3 as compared to Ant Downloader 2.6.2.4. Presumably this is a regression introduced by a change in Firefox 12 and/or Ant Download 2.6.2.4.
Interestingly I am unable to reproduce this crash on the 12.0a1 Nightly builds. I tried 2012-01-01, 2012-01-15, 2012-01-30, and 2012-01-31; after which point we merged to 13.0a1.

I'm not sure where to go from here.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #57)
> Unable to reproduce this so far in Firefox 11.

Interesting, as the signature is also quite high on 11, it's been rising there recently and now is #5 in the 7-day topcrashers of 11.0 release.
(In reply to Peter Van der Beken [:peterv] from comment #51)
> I've started using MOZ_UNLIKELY.

Ok, I'll change it.

> Hmm, this is slightly different logic. This is only equivalent if
> FastGetGlobalJSObject() always returns null when
> win->GetOuterWindowInternal()->IsCreatingInnerWindow() is true. Is that the
> case?

Yeah. It becomes no-null at the tail of nsJSContext::CreateNativeGlobalForInner, right before returning to nsGlobalWindow::SetNewDocument and setting mCreatingInnerWindow to false. There are no intervening calls to PreCreate.
Attaching an updated patch and carrying over review.

I've pushed this to try here:
https://tbpl.mozilla.org/?tree=Try&rev=cab09e2221e2

Inbound is closed right now, and I'm about to head out for the evening. If the try push goes green, someone should push the patch.
Attachment #618206 - Attachment is obsolete: true
Attachment #618369 - Flags: review+
Last one didn't seem to do anything. Trying again: https://tbpl.mozilla.org/?tree=Try&rev=bcf1c37952d1
So, it looks like we actually hit one of my MOZ_ASSERTs in nsWindowSH::PreCreate (on windows, during the file api tests). Windows is crappy about saying which assertion it actually hit, but I'm pretty sure it's the "assert IsClosedOrClosing if we got null" assertion).

This might be an indicator of the underlying bug here (since video downloaders also do stuff with files). But I think the best course of action here is just to convert it to a nonfatal assertion and get the fix landed.

Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=cc61fef6cb82
Looks green. Pushed to inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/cc0e289b9f34

Lets see if this has an effect on the crash volume.
Assignee: nobody → bobbyholley+bmo
Whiteboard: [Leave open after merge]
Target Milestone: --- → mozilla15
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #59)
> Interestingly I am unable to reproduce this crash on the 12.0a1 Nightly
> builds. I tried 2012-01-01, 2012-01-15, 2012-01-30, and 2012-01-31; after
> which point we merged to 13.0a1.
> I'm not sure where to go from here.
It can be a regression from a change in 13.0a1 back ported in 12.0a2.
Keywords: reproducible
This is #2 in early 12.0 release data. Do we know enough that we can contact the Ant Video Downloader people so they can ship an update that is causing fewer/no crashes?
So yeah, this got backed out. Looks like I mis-diagnosed the file API issue.

Pushed to try again with a null check (r=smaug): https://tbpl.mozilla.org/?tree=Try&rev=6ab8a241d590
Target Milestone: --- → mozilla15
Target Milestone: mozilla15 → ---
Looks green on windows, though that doesn't tell us everything, since the failure was intermittent.

Trying again: http://hg.mozilla.org/integration/mozilla-inbound/rev/712bca8b8674
Target Milestone: --- → mozilla15
I did install the previous version of Ant and Firefox  still crashed but with a different error look:

Firefox 12.0 Crash Report [@ JSAutoEnterCompartment::enter(JSContext*, JSObject*) ]
Search Mozilla Support for Help
ID: 0c6a9e32-78a4-4176-b810-749532120429
Signature: JSAutoEnterCompartment::enter(JSContext*, JSObject*)

    Details
    Modules
    Raw Dump
    Extensions
    Comments
    Correlations

Signature 	JSAutoEnterCompartment::enter(JSContext*, JSObject*) More Reports Search
UUID	0c6a9e32-78a4-4176-b810-749532120429
Date Processed	2012-04-29 22:24:56
Uptime	855
Last Crash	19.3 hours before submission
Install Age	3.1 days since version was first installed.
Install Time	2012-04-26 14:43:32
Product	Firefox
Version	12.0
Build ID	20120420145725
Release Channel	release
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
User Comments	
App Notes 	

AdapterVendorID: 0x8086, AdapterDeviceID: 0x0106, AdapterSubsysID: 00000000, AdapterDriverVersion: 8.15.10.2509

D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 

Processor Notes 	
EMCheckCompatibility	True
Winsock LSP	

MSAFD Tcpip [TCP/IP] : 2 : 1 :  
 MSAFD Tcpip [UDP/IP] : 2 : 2 : %SystemRoot%\system32\mswsock.dll 
 MSAFD Tcpip [RAW/IP] : 2 : 3 :  
 MSAFD Tcpip [TCP/IPv6] : 2 : 1 : %SystemRoot%\system32\mswsock.dll 
 MSAFD Tcpip [UDP/IPv6] : 2 : 2 :  
 MSAFD Tcpip [RAW/IPv6] : 2 : 3 : %SystemRoot%\system32\mswsock.dll 
 RSVP TCPv6 Service Provider : 2 : 1 :  
 RSVP TCP Service Provider : 2 : 1 : %SystemRoot%\system32\mswsock.dll 
 RSVP UDPv6 Service Provider : 2 : 2 :  
 RSVP UDP Service Provider : 2 : 2 : %SystemRoot%\system32\mswsock.dll

Adapter Vendor ID	
Adapter Device ID	
Total Virtual Memory	4294836224
Available Virtual Memory	3221057536
System Memory Use Percentage	64
Available Page File	5470371840
Available Physical Memory	1510686720

Bugzilla - Report this bug in Firefox, Core, Plug-Ins, or Toolkit
Related Bugs

    640904 NEW Crash in JSAutoEnterCompartment::enter
    723894 RESOLVED FIXED Firefox 13.0a1 Crash @ JSAutoEnterCompartment::enter
    705159 RESOLVED INCOMPLETE After crash, session is lost, FF boots into a weird UI, URL bar does not work until new tab is opened

Crashing Thread
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	JSAutoEnterCompartment::enter 	js/src/jsapi.cpp:1415
1 	xul.dll 	XPCWrappedNative::GetNewOrUsed 	js/xpconnect/src/XPCWrappedNative.cpp:453
2 	xul.dll 	XPCConvert::NativeInterface2JSObject 	js/xpconnect/src/XPCConvert.cpp:1056
3 	xul.dll 	XPCConvert::NativeData2JS 	js/xpconnect/src/XPCConvert.cpp:396
4 	xul.dll 	XPCConvert::NativeData2JS 	js/xpconnect/src/xpcprivate.h:3237
5 	xul.dll 	XPC_WN_GetterSetter 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1590
6 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:519
7 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:569
8 	mozjs.dll 	js::InvokeGetterOrSetter 	js/src/jsinterp.cpp:643
9 	mozjs.dll 	js::Shape::get 	js/src/jsscopeinlines.h:287
10 	mozjs.dll 	js::GetPropertyHelper 	js/src/jsobj.cpp:5400
11 	mozjs.dll 	js::GetPropertyOperation 	js/src/jsinterpinlines.h:288
12 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2664
13 	mozjs.dll 	js::ContextStack::pushInvokeFrame 	js/src/vm/Stack.cpp:648
14 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:537
15 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5449
16 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/xpconnect/src/XPCWrappedJSClass.cpp:1510
17 		@0xffffff81 	
18 	KERNELBASE.dll 	GetVersionExA 	
19 	xul.dll 	nsView::GetNearestWidget 	view/src/nsView.cpp:1086
20 	xul.dll 	nsWindow::GetParent 	widget/windows/nsWindow.cpp:1023
21 	xul.dll 	nsLayoutUtils::TranslateWidgetToView 	layout/base/nsLayoutUtils.cpp:1333
22 	xul.dll 	Flush 	content/base/src/nsDocument.cpp:6330
23 	xul.dll 	nsRefPtr<nsIRunnable>::~nsRefPtr<nsIRunnable> 	obj-firefox/dist/include/nsAutoPtr.h:908
24 	xul.dll 	_moz_pixman_region32_intersect 	gfx/cairo/libpixman/src/pixman-region.c:1220
25 	xul.dll 	nsCycleCollectingAutoRefCnt::decr 	obj-firefox/dist/include/nsISupportsImpl.h:211
26 	xul.dll 	nsPresContext::Release 	layout/base/nsPresContext.cpp:365
27 	xul.dll 	nsEventStateManager::PostHandleEvent 	content/events/src/nsEventStateManager.cpp:3038
28 		@0x6649e57 	
29 	xul.dll 	PresShell::HandleEventInternal 	layout/base/nsPresShell.cpp:6700
30 	xul.dll 	nsXULElement::Release 	content/xul/content/src/nsXULElement.cpp:388
31 	xul.dll 	PresShell::PopCurrentEventInfo 	layout/base/nsPresShell.cpp:5612
32 	xul.dll 	PresShell::HandlePositionedEvent 	layout/base/nsPresShell.cpp:6337
33 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/xpconnect/src/XPCWrappedJS.cpp:617
34 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:117
35 	xul.dll 	nsObserverList::NotifyObservers 	xpcom/ds/nsObserverList.cpp:130
36 	xul.dll 	nsObserverService::NotifyObservers 	xpcom/ds/nsObserverService.cpp:182
37 	nspr4.dll 	PR_GetHostByAddr 	nsprpub/pr/src/misc/prnetdb.c:1171
38 	xul.dll 	nsHttpChannel::ProcessResponse 	netwerk/protocol/http/nsHttpChannel.cpp:995
39 	xul.dll 	nsHttpChannel::OnStartRequest 	netwerk/protocol/http/nsHttpChannel.cpp:4164
40 	xul.dll 	nsInputStreamPump::OnStateStart 	netwerk/base/src/nsInputStreamPump.cpp:444
41 	xul.dll 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:399
42 	xul.dll 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:114
43 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
44 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
45 	xul.dll 	xul.dll@0xb6d5e3 	
46 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
47 	xul.dll 	_SEH_epilog4 	
48 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
49 	xul.dll 	nsSVGPathGeometryFrame::Render 	layout/svg/base/src/nsSVGPathGeometryFrame.cpp:469
50 		@0x65726965 	
51 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:220
52 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3537
53 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:107
54 	firefox.exe 	firefox.exe@0x4033 	
55 	firefox.exe 	__tmainCRTStartup 	crtexe.c:594
56 	firefox.exe 	_SEH_epilog4 	
57 	kernel32.dll 	BaseThreadInitThunk 	
58 	ntdll.dll 	__RtlUserThreadStart 	
59 	kernel32.dll 	LoadStringByReference 	
60 	kernel32.dll 	LoadStringByReference 	

What could be happening?
(In reply to PunkMaister from comment #73)
> What could be happening?
Can you test Nightly (http://nightly.mozilla.org/) that is going to released today around 14H GMT to see if you still crash with this signature? Thanks in advance for you help.
For anybody that missed it, we're tracking the blocklisting of the latest ANT Downloader version in bug 748269.
I'm the maintainer of the Ant Downloader add-on.

My understanding from reading the bug so far is that the highest correlation of crashes is on Firefox 12 with Ant Downloader 2.6.4 (currently blocked, 2.4.6.5 is public and is using code based on a previous version that doesn't crash).

So far I have been unable to reproduce, on Mac and Windows.

I've looked at the diff between the crash and non-crash version of the add-on and it mostly looks innocious. One thing that stands out is the last fix for bug 731100.

http://pastebin.mozilla.org/1573117

Basically what is happening is that an object is assigned to a listner passed into a Firefox notification. This object contains some document/window references. When the notification is torn down, the object is assigned to null.

If anyone has any solid steps to repro, or some ideas on how I can track this down, let me know it would be much appreciated.
Comment 53 had the reproducible case which I was able to confirm, though it takes sometimes 10-15 minutes to reproduce. I was able to reproduce that case pretty reliably with Firefox 13.0a2 2012-04-24 and Ant 2.4.6.4.
(In reply to Simona B [QA] from comment #53)
 > I managed to crash Firefox several times using the following add-ons:
> - Ant Video Downloader 2.4.6.4
> - DownloadHelper 4.9.9
> - Easy YouTube Video Downloader 5.9
> - Fast Video Download (with SearchMenu) 1.4.6

Has anyone been able to reproduce with *only Ant* installed? If it is a combination of different downlaoder add-ons, we are getting into murkier waters.
Did the fix I pushed solve the problem? If so, we should try to get it landed everywhere.
See https://support.mozilla.org/en-US/questions/925877
It looks like Video Downloader was the culprit.
(In reply to Bobby Holley (:bholley) from comment #79)
> Did the fix I pushed solve the problem? If so, we should try to get it
> landed everywhere.

Given there is a Firefox fix, would it be fair to classify this as a Firefox bug?

I am actively trying to track down the add-on code triggering it, but still shooting in the dark as I can't reproduce (see comment 76).
(In reply to Bobby Holley (:bholley) from comment #79)
> Did the fix I pushed solve the problem? If so, we should try to get it
> landed everywhere.

Looks like https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A15.0a1&range_value=4&range_unit=weeks&signature=JSAutoEnterCompartment%3A%3Aenter%28JSContext*%2C%20JSObject*%29 doesn't show a drop on the 30th as I would expect from the checkin. :(
FYI...

I have uploaded a new version of the Ant Downloader, 2.4.7, which in our tests at least fixes the crash issue:

https://addons.mozilla.org/en-US/firefox/addon/video-downloader-player/versions/
This hasn't dropped at all on Fx12. At what point can we determine this was fixed?
(In reply to Sheila Mooney from comment #84)
> This hasn't dropped at all on Fx12. At what point can we determine this was
> fixed?

Nothing was ever landed on Fx12, right?
(In reply to Sheila Mooney from comment #84)
> This hasn't dropped at all on Fx12. At what point can we determine this was
> fixed?

It is fixed in the add-on (see comment 83), but the new version hasn't been approved by AMO yet. I'll nudge them.
Ant Video Downloader 2.4.7 is now released on AMO.
(In reply to Scoobidiver from comment #87)
> Ant Video Downloader 2.4.7 is now released on AMO.

Let us now if the crash numbers drop.
Brian: I seem to still be hitting the crash using Firefox 12 on Mac 10.8 - https://crash-stats.mozilla.com/report/index/bp-a6a278fb-5890-4a1b-8819-ae6302120516 is one of my crashes.

I am using Version 2.4.7 of the addon.
I am not sure version 2.4.7 fix some crashes based on the crash ratio (8.36% over the last day versus 8.65% over the last week) and the latest correlations:
*Linux:
  JSAutoEnterCompartment::enter|SIGSEGV (54 crashes)
     96% (52/54) vs.   6% (67/1196) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          2% (1/54) vs.   0% (4/1196) 2.4.6.4
         19% (10/54) vs.   1% (14/1196) 2.4.6.5
         76% (41/54) vs.   4% (49/1196) 2.4.7
*Mac:
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (398 crashes)
     93% (370/398) vs.  14% (472/3408) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          5% (20/398) vs.   1% (20/3408) 2.4.6.2
          6% (23/398) vs.   1% (29/3408) 2.4.6.4
         29% (115/398) vs.   4% (143/3408) 2.4.6.5
         53% (210/398) vs.   8% (275/3408) 2.4.7
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE (106 crashes)
     88% (93/106) vs.  14% (472/3408) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          2% (2/106) vs.   0% (4/3408) 2.4.5
          5% (5/106) vs.   1% (29/3408) 2.4.6.4
         24% (25/106) vs.   4% (143/3408) 2.4.6.5
         58% (61/106) vs.   8% (275/3408) 2.4.7
*Windows:
  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (10063 crashes)
     86% (8668/10063) vs.   7% (9218/141435) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          0% (1/10063) vs.   0% (1/141435) 2.4.2rc1
          0% (1/10063) vs.   0% (1/141435) 2.4.3
          0% (1/10063) vs.   0% (1/141435) 2.4.4
          0% (7/10063) vs.   0% (7/141435) 2.4.5
          1% (118/10063) vs.   0% (129/141435) 2.4.6.2
          0% (2/10063) vs.   0% (2/141435) 2.4.6.3
          2% (170/10063) vs.   0% (184/141435) 2.4.6.4
         19% (1958/10063) vs.   1% (2067/141435) 2.4.6.5
         64% (6410/10063) vs.   5% (6826/141435) 2.4.7
A(In reply to Scoobidiver from comment #90)
> I am not sure version 2.4.7 fix some crashes based on the crash ratio

And what about with the build containing the fix checked in from comment 70? Any changes there.

BTW I don't really know how to read this data. Can you break it down in simple terms for me?

We will continue to test more and try and identify the code. With 2.4.7 we have been unable to reproduce but obviosly some of our users have reported it.
No longer depends on: 748269
Depends on: 748269
A few days after the release of version 2.4.7, crash stats are better and it's now correlated to Ant Tube Downloader at a lower percentage.
Indeed, it's #8 top browser crasher (was #2 a few days ago), #10 on Mac OS X in 12.0 over the last day.

On Windows, it's now correlated to Spam Free Search Bar:
* Mac OS X:
  JSAutoEnterCompartment::enter|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (18 crashes)
     28% (5/18) vs.   1% (25/2194) support@lastpass.com (LastPass Password Manager, https://addons.mozilla.org/addon/8542) (1.90.6)
     28% (5/18) vs.   3% (65/2194) {c0c9a2c7-2e5c-4447-bc53-97718bc91e1b} (Easy YouTube Video Downloader, https://addons.mozilla.org/addon/10137)
         11% (2/18) vs.   1% (16/2194) 5.9
         17% (3/18) vs.   2% (49/2194) 6.0
     22% (4/18) vs.   1% (25/2194) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          6% (1/18) vs.   0% (1/2194) 2.4.6.2
          0% (0/18) vs.   0% (1/2194) 2.4.6.4
          6% (1/18) vs.   0% (4/2194) 2.4.6.5
         11% (2/18) vs.   1% (19/2194) 2.4.7

* Windows:
  JSAutoEnterCompartment::enter(JSContext*, JSObject*)|EXCEPTION_ACCESS_VIOLATION_READ (1135 crashes)
     43% (487/1135) vs.   1% (881/121229) {00f12770-e60e-4dc6-9105-425bface7c73} (Spam Free Search Bar 1.0)
     27% (311/1135) vs.   5% (5728/121229) {37964A3C-4EE8-47b1-8321-34DE2C39BA4D} (Спутник @Mail.Ru)
          1% (6/1135) vs.   0% (87/121229) 2.5.2.32
          0% (1/1135) vs.   0% (14/121229) 2.5.2.42
          0% (2/1135) vs.   0% (23/121229) 2.5.2.54
         27% (301/1135) vs.   5% (5537/121229) 2.5.2.66
          0% (1/1135) vs.   0% (38/121229) 2.5.2.67
     27% (309/1135) vs.   5% (6427/121229) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
         27% (303/1135) vs.   4% (4766/121229) 6.5.0
          0% (2/1135) vs.   0% (47/121229) 6.5.1
          0% (1/1135) vs.   0% (250/121229) 6.7
          0% (3/1135) vs.   1% (1181/121229) 6.8
     20% (223/1135) vs.   3% (3869/121229) plugin@yontoo.com (1.20.00)
     21% (233/1135) vs.   6% (7855/121229) ffxtlbr@babylon.com
          0% (3/1135) vs.   0% (39/121229) 1.1.8
          9% (106/1135) vs.   4% (4924/121229) 1.1.9
         11% (123/1135) vs.   2% (2868/121229) 1.2.0
          0% (1/1135) vs.   0% (1/121229) 1.5.0
     14% (155/1135) vs.   1% (677/121229) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
          0% (2/1135) vs.   0% (13/121229) 2.4.6.2
          0% (2/1135) vs.   0% (11/121229) 2.4.6.4
          2% (19/1135) vs.   0% (131/121229) 2.4.6.5
         12% (132/1135) vs.   0% (521/121229) 2.4.7
Whoops, I accidentally removed the tracking-firefox13 flag. Alex, can you please set it again?
Unable to reproduce any crashes so far. I can't find any add-on specifically called "Spam Free Search Bar". Searches for it seem to imply Blekko as the vendor. 

I did find the following Blekko add-ons:
 * Blekko 1.0 (on AMO)
 * Blekko Search Plugin 0.6 (on AMO)
 * Blekko Search Bar 1.5.18.12 (http://blekko.com/about/blekkogear)

So far I've been unable to crash in Firefox 12 on Windows 7. I've been playing around with different search functionality and page previews. The toolbar has some Twitter and Facebook integration which I've tried but have not been able to reproduce a crash.

Are there any specific leads QA can follow?
It looks like Blekko is, in fact, the vendor:
http://help.blekko.com/index.php/how-do-i-uninstall-search-bar/

Maybe Kev has contacts there who can help us with this?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #94)
> I can't find any add-on specifically called "Spam Free Search Bar".
You found it more than one month ago. See bug 741179 comment 2.
(In reply to Scoobidiver from comment #96)
> (In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #94)
> > I can't find any add-on specifically called "Spam Free Search Bar".
> You found it more than one month ago. See bug 741179 comment 2.

Hmm...I guess I should have taken note of where I found it then, because I can't find it now.
Is it possible they updated this and renamed it to "Blekko Search Bar"?

The only other company I've been able to find with a product called "Spam Free Search Bar" is Visicom Media Inc. but I've been unable to find a download for the add-on.
bp-362c4af8-aed8-45e7-9c03-1b7072120525
http://hg.mozilla.org/mozilla-central/rev/f43e8d300f21
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/15.0 Firefox/15.0a1 ID:20120524030520

STR:
1. Start Firefox with clean profile
2. Install Secure Login 0.9.9( https://addons.mozilla.org/eu/firefox/addon/secure-login/ )and Restart
3. Open Option setting of Secure Login
     Check 1st.2nd.3rd.4th checkbox in Main tab, (unchecked 5th, 6th checkbox)
     Check 1st.2nd.4th, 5th checkbox in Interfae tab, (unchecked 3rd checkbox)
4. Restart
5. Open http://dict.youdao.com/search?q=heartstirring&keyfrom=dict.plugin#q%3Dheartstirring%26keyfrom%3Ddict.plugin

Actual:
  Crashes
Regression window(mozilla-central) with STR in comment #98
Not crash:
http://hg.mozilla.org/mozilla-central/rev/b57524d6706f
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110211 Firefox/4.0b12pre ID:20110211093147
Crash:
http://hg.mozilla.org/mozilla-central/rev/1ae724a3a546
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110211 Firefox/4.0b12pre ID:20110211100318
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b57524d6706f&tochange=1ae724a3a546

Regression window(tracemonkey)
Not crash:
http://hg.mozilla.org/tracemonkey/rev/85610aaf53cc
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110209 Firefox/4.0b12pre ID:20110209152220
Crash:
http://hg.mozilla.org/tracemonkey/rev/2d60ba7e95a1
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b12pre) Gecko/20110207 Firefox/4.0b12pre ID:20110210134023
Pushlog:
http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=85610aaf53cc&tochange=2d60ba7e95a1
Hardware: x86 → All
Version: Trunk → 2.0 Branch
On Local build 
First bad; 02be97f9ef0d	Luke Wagner — Bug 627954, part 2: ensure nsXPCConvert::VariantData2JS et al are in the correct compartment (r=mrbkap)
Last Good: d518bc36d7b4	Luke Wagner — Bug 627954, part 1: take out unnecessary compartment hack (r=gal)
Hardware: All → x86
Version: 2.0 Branch → Trunk
Assignee: bobbyholley+bmo → luke
Great job finding STR!  Wow, the csets in comment 100 are from FF4, fixing bugs where we weren't even close to using the right compartment to create wrappers.

Fortunately, the cause is really simple: apparently nsGlobal::GetGlobalJSObject() can return NULL (other places in nsGlobalWindow.cpp and nsDOMClassInfo.cpp check for this possibility) and this is unchecked when calling WrapNative in nsWindowSH::GetProperty.  There is an NS_ASSERTION in WrapNativeToJSVal that seems to imply that the caller is wrong for doing this.

I'm not sure what the right "fix" is.  Can nsWindowSH::GetProperty just test and early exit like other methods?  I don't understand this part of the code at all, so I'm not really the right person to make a judgment.

Perhaps jst or peterv knows the right fix?
Assignee: luke → nobody
Thanks Alice. I confirm that the test in comment 98 is reproducible.
Whiteboard: [Leave open after merge] → [Leave open after merge][qa+]
If someone can debug and find out why the EnsureInnerWindow just above the call to GetGlobalJSObject is failing, that'd be a huge help here.
Bobby, can you dig in here? Luke pretty much explains what the problem is, we just need to iron out what the fix is, and where to put it (and also verify how this is happening per mrbkap's comment). Seems to me that throwing exceptions is fine in the case where a closing window is being accessed after having been torn down enough to not have a global object any more, so we should audit the cases where that can happen.
Assignee: nobody → bobbyholley+bmo
(In reply to Blake Kaplan (:mrbkap) from comment #103)
> If someone can debug and find out why the EnsureInnerWindow just above the
> call to GetGlobalJSObject is failing, that'd be a huge help here.

Here's the stack: http://pastebin.mozilla.org/1656020

What appears to be happening is that we run a bunch of script in nsDocShell::Destroy, and it eventually manages to trigger nsWindowSH::GetProperty.
* nsWindowSH::GetProperty calls nsGlobalWindow::EnsureInnerWindow
* which calls nsGlobalWindow::GetDocument to force window creation
* which grabs the docshell and does GetInterface(nsIDocument)
* which calls EnsureContentViewer
* which throws NS_ERROR_FAILURE if mIsBeingDestroyed is true

The existence of mIsBeingDestroyed suggests that this is a known phenomenon, so presumably the answer here is to null-check after EnsureInnerWindow(). I'll write a patch to do that.
Attaching a patch. Flagging mrbkap for review.
Attachment #630133 - Flags: review?(mrbkap)
Comment on attachment 630133 [details] [diff] [review]
Null-check after EnsureInnerWindow in nsWindowSH::GetProperty. v1

Review of attachment 630133 [details] [diff] [review]:
-----------------------------------------------------------------

We have too many notifications.
Attachment #630133 - Flags: review?(mrbkap) → review+
Pushed to m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/793d4912247b

Let's see what it does to crash volume.
https://hg.mozilla.org/mozilla-central/rev/793d4912247b
Target Milestone: mozilla15 → mozilla16
KaiRo, Scoobidiver - did this have an effect on crash volume? If it did, we should get it landed on the branches.
(In reply to Bobby Holley (:bholley) from comment #110)
> KaiRo, Scoobidiver - did this have an effect on crash volume? If it did, we
> should get it landed on the branches.
Before the patch landed, there were about 3 crashes per build. After there have been one crash related to DOM worker (bp-02ad6236-acbc-4702-b327-c8fa52120611) and another crash on Mac (bp-73b1fe1c-7993-4fd9-a81e-e617c2120614) for 7 consecutive builds.
For me, it's fixed.

It's #61 top browser crasher in 13.0, #93 in 14.0b6 and #59 in 15.0a2.
Status: NEW → RESOLVED
Closed: 12 years ago
Keywords: topcrash
Resolution: --- → FIXED
Comment on attachment 630133 [details] [diff] [review]
Null-check after EnsureInnerWindow in nsWindowSH::GetProperty. v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): long ago
User impact if declined: topcrasher
Testing completed (on m-c, etc.): Fixed on m-c. See comment 111
Risk to taking this patch (and alternatives if risky): Very low risk - just a null check.
String or UUID changes made by this patch: None.
Attachment #630133 - Flags: approval-mozilla-beta?
Attachment #630133 - Flags: approval-mozilla-aurora?
Comment on attachment 630133 [details] [diff] [review]
Null-check after EnsureInnerWindow in nsWindowSH::GetProperty. v1

[Triage Comment]
Null check - let's take on both Aurora 15 and Beta 14.
Attachment #630133 - Flags: approval-mozilla-beta?
Attachment #630133 - Flags: approval-mozilla-beta+
Attachment #630133 - Flags: approval-mozilla-aurora?
Attachment #630133 - Flags: approval-mozilla-aurora+
Would be great to see this landed today so it can go into our third beta of this cycle, which goes to build tomorrow.
Optimistically marking this fixed on branch given that it landed. Let me know if the crash volume doesn't go down.
Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0

While verifying this fix I encountered a crash on Firefox 14 beta 8.
The crash signature is the same but the stack seems to be different:
https://crash-stats.mozilla.com/report/index/6b38fb79-e24b-4bc3-b46f-b32bd2120628

Is the above crash affiliated with this bug?
It's #7 top browser crasher in 10.0.5 ESR over the last four weeks.
please land this today so it can be in the upcoming esr release.
(In reply to Lukas Blakk [:lsblakk] from comment #122)
> please land this today so it can be in the upcoming esr release.

sorry, my comment was premature, now i see there's no nomination yet. if this patch is able to land cleanly on ESR please nominate and ping me for approval to land.  since it's a crash i'm not wouldn't hold go to build on this fix landing, but we if we can get it landed before tomorrow morning pacific then it would be in the ESR that ships with 14 and that's a nice win, stability-wise.
I'm in europe, so I didn't get this in time.
Depends on: 778424
Depends on: 777875
This caused bug 777875 (and the two clones bug 778424 and bug 781078, due to running out of room in the summary - which combined are the current #1 toporange on trunk and have double the failures of the #2 orange.

I don't suppose you could take a look at them when you get back? :-)
(In reply to Ed Morley [:edmorley] from comment #125)
> This caused bug 777875 (and the two clones bug 778424 and bug 781078

Hm, really? It seems to me that those oranges were first in late july, over 3 months after this bug landed that assertion (comment 72). Can we correlate the regression range of what actually caused us to start hitting it?
Keywords: verifyme
Bobby, can you please take a look at the crashes mentioned in the second part of comment 127? I need to know if they mean this bug still reproduces or I can mark it as verified. Thanks
(In reply to Ioana Budnar [QA] from comment #128)
> Bobby, can you please take a look at the crashes mentioned in the second
> part of comment 127? I need to know if they mean this bug still reproduces
> or I can mark it as verified. Thanks

From the crash stack, that appears to be a different bug.
Marking this bug as verified per the above comments...
Keywords: verifyme
QA Contact: ioana.budnar
Whiteboard: [Leave open after merge][qa+] → [Leave open after merge]
This is not an easy bug to write a crashtest for, and I'm unlikely to ever do it.
Flags: in-testsuite? → in-testsuite-
Whiteboard: [Leave open after merge]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: