Closed Bug 637304 Opened 9 years ago Closed 6 years ago

Crash @ js::Shape::getChild

Categories

(Core :: JavaScript Engine, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: scoobidiver, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

It is a new crash signature in the trunk.
It is currently #148 top crasher in 4.0b12.

Signature	js::Shape::getChild
UUID	ef67a20d-9457-4d03-8427-9b5322110227
Time 	2011-02-27 20:17:01.947168
Uptime	43189
Last Crash	463215 seconds (5.4 days) before submission
Install Age	43189 seconds (12.0 hours) since version was first installed.
Product	Firefox
Version	4.0b13pre
Build ID	20110227030400
Branch	2.0
OS	Mac OS X
OS Version	10.6.6 10J567
CPU	amd64
CPU Info	family 6 model 37 stepping 2
Crash Reason	EXC_BAD_ACCESS / 0x0000000d
Crash Address	0x0
App Notes 	Renderers: 0x22600,0x20400

Frame 	Module 	Signature [Expand] 	Source
0 	XUL 	js::Shape::getChild 	js/src/jsscope.h:632
1 	XUL 	js::Bindings::add 	js/src/jsscript.cpp:159
2 	XUL 	CompileUCFunctionForPrincipalsCommon 	js/src/jsscript.h:247
3 	XUL 	JS_CompileUCFunctionForPrincipalsVersion 	js/src/jsapi.cpp:4912
4 	XUL 	nsJSContext::CompileEventHandler 	dom/base/nsJSEnvironment.cpp:1756
5 	XUL 	nsEventListenerManager::AddScriptEventListener 	content/events/src/nsEventListenerManager.cpp:836
6 	XUL 	nsGenericElement::AddScriptEventListener 	content/base/src/nsGenericElement.cpp:4558
7 	XUL 	nsGenericHTMLElement::AfterSetAttr 	content/html/content/src/nsGenericHTMLElement.cpp:1136
8 	XUL 	nsGenericElement::SetAttrAndNotify 	content/base/src/nsGenericElement.cpp:4777
9 	XUL 	nsGenericElement::SetAttr 	content/base/src/nsGenericElement.cpp:4667
10 	XUL 	nsGenericHTMLElement::SetAttr 	content/html/content/src/nsGenericHTMLElement.cpp:1210
11 	XUL 	nsHtml5TreeOperation::Perform 	parser/html/nsHtml5TreeOperation.cpp:505
12 	XUL 	nsHtml5TreeOpExecutor::RunFlushLoop 	parser/html/nsHtml5TreeOpExecutor.cpp:489
13 	XUL 	nsHtml5ExecutorFlusher::Run 	parser/html/nsHtml5StreamParser.cpp:153
14 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:633
15 	XUL 	NS_ProcessPendingEvents_P 	nsThreadUtils.cpp:200
16 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:132
17 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:399
18 	CoreFoundation 	CoreFoundation@0x4e400 	
19 	CoreFoundation 	CoreFoundation@0x4c5f8 

More reports at:
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=js%3A%3AShape%3A%3AgetChild
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=js%3A%3AShape%3A%3AgetChild%28JSContext*%2C%20js%3A%3AShape%20const%26%2C%20js%3A%3AShape**%29
It is now #10 top crasher in 4.0 over the last 3 days.

Windows 4.0 correlations by module give:
  js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**)|EXCEPTION_ACCESS_VIOLATION_READ (2599 crashes)
     80% (2067/2599) vs.   2% (7590/388604) kikin_4_0.dll
Depends on: 644712
Keywords: topcrash
Summary: Crash [@ js::Shape::getChild ][@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] → Crash [@ js::Shape::getChild ][@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] mainly with Kikin or Firebug
Hello, I'm the developer of the Firefox extension at kikin. I'm looking for information that help me fix this problem. Since kikin is not in the crash call stack it is difficult for me to know what I'm doing wrong exactly. kikin works fine in Firefox 3.6 but I suspect I'm not doing something FF 4 requires. 

If me providing a description of what the extension is trying to do, or source code, would help in figuring out the problem please let me know. 

Any information would be appreciated. Thank you in advance.
Summary: Crash [@ js::Shape::getChild ][@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] mainly with Kikin or Firebug → Crash [@ js::Shape::getChild ][@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] mainly with Kikin (Windows) or Firebug (Mac)
I believe the Kikin add-on injects content scripts from chrome. I'm not sure if there's various callbacks going between the two. There's been a lot of changes to the JS engine since 3.6, so figuring out which part of the engine is leading to this might be useful by turning off various parts from about:config.

Would it be possible to find a regression range without recompiling kikin_4_0.dll assuming we find reliable STR?
We set the following flags to false in about:config:

javascript.options.tracejit.content
javascript.options.methodjit.content
javascript.options.tracejit.chrome

And reproduced the crash (https://crash-stats.mozilla.com/report/index/bp-33225302-b6ab-41c8-b00d-bc7902110328). I'm not sure whether it offers any new information though :(.
I have more reports generated in OSX with the above jit flags set to false. In some of these the crash location is different but also inside JS compiler/parser. It all seems to start at EvaluateUCScriptForPrincipalsCommon.

https://crash-stats.mozilla.com/report/index/bp-1467933e-e806-4b77-b5a5-970b02110328
http://crash-stats.mozilla.com/report/index/bp-5b2748f7-caaa-461b-8e12-674702110328
https://crash-stats.mozilla.com/report/index/bp-7bebdb06-3141-4974-a691-3e76b2110328
https://crash-stats.mozilla.com/report/index/bp-cc29011c-ca6a-4051-87ff-331452110329
https://crash-stats.mozilla.com/report/index/e6bc0387-bfea-481f-8739-299aa2110329
https://crash-stats.mozilla.com/report/index/55aa6f6a-0575-4830-846a-2cf492110329

In one of these the kikin binary is part of the call stack. Is there a way to submit my symbols to crash-stats so they are included when generating the report, or is there a way to download the raw dump to process it against my own symbols?

The kikin add-on does inject scripts from the C++ component running in chrome into browser elements of type "content". The script in "content" can set callbacks that the C++ component will invoke at a later time, but we save the JSContext* for each callback and push this context before invoking the callback.
(In reply to comment #5)
> In one of these the kikin binary is part of the call stack. Is there a way to
> submit my symbols to crash-stats so they are included when generating the
> report, or is there a way to download the raw dump to process it against my own
> symbols?

If you email me, I can help you get symbols for your binary onto our crash-stats system. I'd also be happy to give you a stack for that crash if you provide me with a PDB file that matches the binary in the crash report. Unfortunately we can't provide minidumps to non-employees for privacy reasons.
Oh, these are on OS X, so s/PDB/dSYM/.
(In reply to comment #5)
> I have more reports generated in OSX with the above jit flags set to false. In
> some of these the crash location is different but also inside JS
> compiler/parser. It all seems to start at EvaluateUCScriptForPrincipalsCommon.
> 
> https://crash-stats.mozilla.com/report/index/bp-1467933e-e806-4b77-b5a5-970b02110328
> http://crash-stats.mozilla.com/report/index/bp-5b2748f7-caaa-461b-8e12-674702110328
> https://crash-stats.mozilla.com/report/index/bp-7bebdb06-3141-4974-a691-3e76b2110328
> https://crash-stats.mozilla.com/report/index/bp-cc29011c-ca6a-4051-87ff-331452110329
> https://crash-stats.mozilla.com/report/index/e6bc0387-bfea-481f-8739-299aa2110329
> https://crash-stats.mozilla.com/report/index/55aa6f6a-0575-4830-846a-2cf492110329
> 
> In one of these the kikin binary is part of the call stack. Is there a way to
> submit my symbols to crash-stats so they are included when generating the
> report, or is there a way to download the raw dump to process it against my own
> symbols?
> 
> The kikin add-on does inject scripts from the C++ component running in chrome
> into browser elements of type "content". The script in "content" can set
> callbacks that the C++ component will invoke at a later time, but we save the
> JSContext* for each callback and push this context before invoking the
> callback.

It sounds like this might be generating violations of the new Compartment model. I can't say for sure without more detail on what you're doing, but it sounds kind of likely.

With compartments, each JavaScript heap value (object, function/script, or string) belongs to a certain compartment. A compartment is roughly identified by its global object. JS/JSAPI are not allowed to call directly from one compartment to another: they must use the compartment-switching APIs JS_EnterCrossCompartmentCall/JS_LeaveCrossCompartmentCall. This is not explicitly enforced, but things will tend to crash if those rules are violated. 

Content is in separate comparments (roughly, one per window/iframe) from chrome. So if you have a chrome function calling back a content function, or vice versa, it needs to switch compartments. I suspect that the places where you now switch to a different context might be the places where you need to switch compartments as well.
I'll check the compartment idea ASAP. In the meantime, this is the way I handle callbacks:

------------------------------
When the callback is specified from JavaScript (content) to C++ (running in Chrome):
------------------------------
Let cx be the JSContext of the incoming call. 

JS_BeginRequest(cx)
JS_AddValueRoot(cx, jsVal)
JS_EndRequest(cx)

if (JS_GetOptions(cx) & JSOPTION_PRIVATE_IS_NSISUPPORTS)
    static_cast<nsISupports*>(JS_GetContextPrivate(cx))->AddRef();

Save cx and jsval for invoking later.

------------------------------
At invocation time:
------------------------------
Let cx be the JSContext received when the callback was specified
Let func be the jsval of the callback
Let argv be the 
Let argc be the number of elements in argv

Get nsIJSContextStack from do_GetService("@mozilla.org/js/xpc/ContextStack;1") -> contextStack
contextStack->Push(cx)
JSObject* globalObject = JS_GetGlobalObject(cx)
JS_CallFunctionValue(cx, globalObject, jsval, argc, argv, &retval);
stack->Pop(cx);
--------------------------
To continue my last post. This is the way our chrome C++ code evals a piece of JavaScript in the context of a page. I'll look into adding JS_EnterCrossCompartmentCall/JS_LeaveCrossCompartmentCall here and when invoking callbacks:

--------------------------------------
Let cx be the JSContext of the page where we will evaluate the script
--------------------------------------

Get nsIScriptSecurityManager instance -> securityManager
Call securityManager->GetPrincipalFromContext(cx) to get nsIPrincipal -> principal
Call principal->GetJSPrincipals(cx) to get JSPrincipals* jsPrincipals

Get nsIJSContextStack from do_GetService("@mozilla.org/js/xpc/ContextStack;1") -> contextStack
contextStack->Push(cx)

Get nsIScriptGlobalObject from cx (via nsIDOMWindowInternal -> nsPIDOMWindow -> nsIScriptGlobalObject)
Get JSObject* from sgo_inner->GetGlobalJSObject() -> object

JS_BeginRequest(cx)
JS_EvaluateUCScriptForPrincipals(cx, object, jsPrincipals, script)
JS_EndRequest(cx)

stack->Pop(cx);
(void)JSPRINCIPALS_DROP(cx, jsPrincipals)
Delfin, we have a few ideas.

1. If you could supply us with a debug version of your addon, we can try to reproduce the problem here and debug it. In particular, we need a version that will work with a debug build of Firefox--one where you have '#define DEBUG' before you include any JS headers. We have compartment assertions that might show the problem pretty quickly.

2. From your comment about calling callbacks:

contextStack->Push(cx)
JSObject* globalObject = JS_GetGlobalObject(cx)
JS_CallFunctionValue(cx, globalObject, jsval, argc, argv, &retval);
stack->Pop(cx);

For this to be correct, cx, globalObject, jsval, and argv[0]...argv[argc-1] must all be in the same compartment. Given the above:

  cx and globalObject will definitely be in the same compartment, because globalObject is the
                      global of cx
  jsval *may* be in the same compartment, depending on where it came from.

 ** It seems that argv[i] is probably not in the same compartment.

If any of the arguments are not in the same compartment, you must call JS_WrapObject on it before passing it to JS_CallFunctionValue.

There are several promising next steps:

A. Give us the ability to run Kikin with a debug build of Firefox so we can try to reproduce and debug.

B. Try putting JS_WrapObject in place to see if it helps. It will be harder to know that we've fixed the problem for sure if you just do that, though.

C. Tell us more about how this works, so we can try to spot other potential problems. In particular, we don't understand where the callback functions come from (what compartment they belong to, how they are generated, how they are passed to your C++ component).

We prefer A.
Blocks: 646745
We'll do A. I will build an XPI and send you a URL ASAP. I tried to build a debug version of Firefox for OSX but after the build finished I could not find a Firefox executable anywhere in the dist folder. My .mozconfig is as follows:

. $topsrcdir/build/macosx/universal/mozconfig
. $topsrcdir/browser/config/mozconfig
. $topsrcdir/xulrunner/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-sm-debug
ac_add_options --disable-optimize
ac_add_options --enable-debug 
ac_add_options --enable-tests
ac_add_options --enable-codesighs
ac_add_options --disable-install-strip

export CFLAGS="-gdwarf-2"
export CXXFLAGS="-gdwarf-2"

# For NSS symbols

export MOZ_DEBUG_SYMBOLS=1

ac_add_options --enable-debug-symbols="-gdwarf-2"

# Needed to enable breakpad in application.ini
export MOZILLA_OFFICIAL=1

# Enable parallel compiling
mk_add_options MOZ_MAKE_FLAGS="-j4"
(In reply to comment #12)
> We'll do A. I will build an XPI and send you a URL ASAP. I tried to build a
> debug version of Firefox for OSX but after the build finished I could not find
> a Firefox executable anywhere in the dist folder.

Relative to an OSX build dir, the binary should be something like:

  dist/MinefieldDebug.app/Contents/MacOS/firefox-bin
Assignee: general → wmccloskey
I have uploaded a debug build of the kikin extension (XPI installer) to:

http://www.kikin.com/download/kikin/2_11_16/kikin_installer_2.11.16_kikin_osx.xpi <- OSX
http://www.kikin.com/download/kikin/2_11_16/kikin_installer_2.11.16_kikin_win.xpi <- Win32

I have also placed the PDB for KikinFirefoxPlugin_4_0.dll at
http://www.kikin.com/download/kikin/2_11_16/KikinFirefoxPlugin_4_0.pdb 
and the dSYM for libkikinfirefoxplugin_4_0.dylib at
http://www.kikin.com/download/kikin/2_11_16/libkikinfirefoxplugin_4_0.dylib.dSYM/Contents/Resources/DWARF/libkikinfirefoxplugin_4_0.dylib

It seems easier to reproduce the bug (at least in OSX) if you open a second
Firefox window and continue navigating in the first one.
(In reply to comment #13)
> (In reply to comment #12)
> > We'll do A. I will build an XPI and send you a URL ASAP. I tried to build a
> > debug version of Firefox for OSX but after the build finished I could not find
> > a Firefox executable anywhere in the dist folder.
> 
> Relative to an OSX build dir, the binary should be something like:
> 
>   dist/MinefieldDebug.app/Contents/MacOS/firefox-bin

Thanks. I was finally able to get MinefieldDebug.app (my .mozconfig was not completely correct) but I ran it from objdir-sm-debug/i386/dist/universal/firefox/MinefieldDebug.app and it crashes on launch :(
(In reply to comment #11)
> Delfin, we have a few ideas.
> 
> <snip/>
> 
> C. Tell us more about how this works, so we can try to spot other potential
> problems. In particular, we don't understand where the callback functions come
> from (what compartment they belong to, how they are generated, how they are
> passed to your C++ component).

Let's see:

1. The kikin overlay js initializes the kikin C++ component via XPConnect.

2. The kikin overlay listens to tab addition/removal events.

3. For every tab present at startup and for every tab added thereon, the kikin extension adds a <xul:browser> element to the chrome of the browser (the kikin bar). This browser will host the "kikin bar" JavaScript code. The kikin extension listens to browser navigation events and page DOMContentLoaded events for each page and each kikin bar. 

4. The extension loads a remote URL into each kikin bar <xul:browser> element. This URL contains the kikin bar javascript. When the page is loaded into the bar, the C++ part of the extension will inject a global javascript object into the page via JSAPI. This object is an entry point to several services the C++ part of the component exposes to the bar. These services include:

a) A KikinXmlHttpRequest object modeled after the normal XmlHttpRequest object provided by the browser but it contains some extra encryption features. The script of the bar can create instances of the KikinXmlHttpRequest object and use them to communicate only with kikin.com servers. The script may get callbacks when the request state changes (just like for the standard XmlHttpRequest object). This is one case in which the C++ invokes JavaScript-provided callbacks.

b) An API for evaluating a custom piece of javascript in the context of the active browser page. When this API is invoked, the bar javascript passes a string to the C++ component, and the C++ component evals this string on the JSContext of the page. This call is synchronous and if the eval returns a value, the C++ component will return that value to the javascript of the bar making the kikin API call.

c) APIs to retrieve kikin-specific information from the C++ component such as version number.

d) APIs to show and hide the bar, or change the bar location from top to side. In response to these calls the C++ component may move the <xul:browser> element hosting the bar.

e) APIs to request notifications when the browser page is about to navigate or when it has finished loading. This is another case where the C++ component may invoke a Javascript callback.

5. The C++ component also injects a custom JavaScript object into the page context. This object exposes a limited set of the API described above. Basically the JavaScript of the page can send messages to kikin.com servers (KikinXmlHttpRequest) or send messages to the bar through the C++ component (similar to the postMessage API of the browser). 

As you can see, there are at least 2 JavaScript contextes we handle for every tab and we pass objects from one to the other, and they can both receive callbacks.
Delfin,

Thanks for the debug plugin. I debugged on Windows using just-in-time debugging and spotted a few things:

1. The first assert occurs when trying to evaluate some JS code where the scope chain is set to a browser window. I think the problem there is that you need to add JSOPTION_VAROBJFIX to the contexts you create (with JS_NewContext or whatever).

2. I forced that change in Firefox, and soon tripped compartment asserts with the Kikin plugin. One comes via:

        Kikin::Plugin::Firefox::KikinComponent::OnDomContentLoaded
  calls Kikin::Plugin::Firefox::KikinComponent::InjectKikinGlobalObject
  calls Kikin::Plugin::JsObject::SetPropertyObject
  calls Kikin::Plugin::Detail::JsObjectImpl::SetProperty
  calls Kikin::Plugin::Firefox::jsapi::SetObjectProperty
  calls JS_DefineProperty

The assert is because the cx argument and the jsval argument (the property value to set) are in different compartments. cx and obj do have matching compartments. Since I don't know what those Kikin functions do, I'm not exactly sure why that is.

One change that should fix this would be to wrap the jsval with JS_WrapValue before passing it to that call. This assumes that it is intentional that 'obj' and 'value' are in different compartments. E.g., if |obj| is a content object, and |value| is a Kikin object created by your C++ component, then they are necessarily in different compartments, and wrapping is the right thing to do.

3. Another compartment assert I got involved the return value from 

  Kikin::Plugin::Firefox::JsContextWrapper::JsWrapperCaller2

The return value is in a different compartment from |cx|. This is another place that calls for JS_WrapValue.

When we get bugs like this, we usually run with debug builds, see where the compartment assertions trip, and add JS_WrapValue/JS_EnterCrossCompartmentCall as needed. Since I don't know your code, I can't directly offer anything better than putting those calls at the points of the assertions. But, if given your knowledge of the code you know which values should be in the same compartment in the Kikin plugin, you might want to add compartment-check assertions of your own (js::assertSameCompartment, see jscntxtinlines.h) to enforce your own design invariants, which would allow you to catch the bugs closer to the point where things actually go wrong.
(In reply to comment #17)
> Delfin,
> 
> Thanks for the debug plugin. I debugged on Windows using just-in-time debugging
> and spotted a few things:
> 
> 1. The first assert occurs when trying to evaluate some JS code where the scope
> chain is set to a browser window. I think the problem there is that you need to
> add JSOPTION_VAROBJFIX to the contexts you create (with JS_NewContext or
> whatever).

The JSContextes I store (in which I execute scripts) are the ones from the browser pages. When the browser page loads, I take the nsIDOMWindow of the page, QI it to nsIScriptGlobalObject, call GetContext() on it to obtain a nsIScriptContext* and call GetNativeContext() on that to get the JSContext*.

I don't think I should add options like JSOPTION_VAROBJFIX to this JSContext before it is created by someone else but maybe I'm wrong?

2. I forced that change in Firefox, and soon tripped compartment asserts with
> the Kikin plugin. One comes via:

I'm fixing those at the moment and I'll upload a new build as ASAP.

> When we get bugs like this, we usually run with debug builds, see where the
> compartment assertions trip, and add JS_WrapValue/JS_EnterCrossCompartmentCall
> as needed. 

Understood. We'll make "running extension in debug Firefox" a standard test case for us. Hopefully this will uncover many potential bugs.

Thanks!
(In reply to comment #17)
> When we get bugs like this, we usually run with debug builds, see where the
> compartment assertions trip, and add JS_WrapValue/JS_EnterCrossCompartmentCall
> as needed. 

Is this kind of info captured anywhere on https://addons.mozilla.org/developers/ ? Would seem a shame if useful advice like the above got lost in a bug report, if it could help other addon authors too?
David, 

(In reply to comment #17)
> 
> 1. The first assert occurs when trying to evaluate some JS code where the scope
> chain is set to a browser window. I think the problem there is that you need to
> add JSOPTION_VAROBJFIX to the contexts you create (with JS_NewContext or
> whatever).
 
I fixed this problem by using OBJ_TO_INNER_OBJECT before evaluating the script. In other words:

JSObject* aScopeObject = ::JS_GetGlobalObject(js_context_of_window); <-- This returns an object of class Proxy.

OBJ_TO_INNER_OBJECT(js_context_of_window, aScopeObject); <-- aScopeObject is now a JSObject of class Window. I pass this to JS_EvaluateUCScriptForPrincipals and the assert is gone.

> 2. I forced that change in Firefox, and soon tripped compartment asserts with
> the Kikin plugin. One comes via:
> 
>         Kikin::Plugin::Firefox::KikinComponent::OnDomContentLoaded
>   calls Kikin::Plugin::Firefox::KikinComponent::InjectKikinGlobalObject
>   calls Kikin::Plugin::JsObject::SetPropertyObject
>   calls Kikin::Plugin::Detail::JsObjectImpl::SetProperty
>   calls Kikin::Plugin::Firefox::jsapi::SetObjectProperty
>   calls JS_DefineProperty

I don't get this assert (although I'm running a debug Firefox build in OSX). I placed a js::assertSameCompartment inside Kikin::Plugin::Firefox::jsapi::SetObjectProperty and that always passes. 

> The assert is because the cx argument and the jsval argument (the property
> value to set) are in different compartments. cx and obj do have matching
> compartments. Since I don't know what those Kikin functions do, I'm not exactly
> sure why that is.
>  
> 3. Another compartment assert I got involved the return value from 
>

I don't get this assert either. I have placed a js::assertSameCompartment before returning from Kikin::Plugin::Firefox::JsContextWrapper::JsWrapperCaller2 and that always passes.
 
in fact, the compartment assert I do get has the following call stack:

#0	0x101a250af in JS_Assert at jsutil.cpp:80
#1	0x1019c0c71 in js::CompartmentChecker::fail at jscntxtinlines.h:545
#2	0x1019c0cd9 in js::CompartmentChecker::check at jscntxtinlines.h:561
#3	0x1019c0d08 in js::CompartmentChecker::check at jscntxtinlines.h:569
#4	0x10189d71c in js::CompartmentChecker::check at jscntxtinlines.h:579
#5	0x1018a09d3 in js::assertSameCompartment<js::Value> at jscntxtinlines.h:641
#6	0x101959b68 in js::CallJSNative at jscntxtinlines.h:703
#7	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#8	0x101956701 in js::RunScript at jsinterp.cpp:653
#9	0x101956e07 in js::Invoke at jsinterp.cpp:740
#10	0x101903816 in js_fun_apply at jsfun.cpp:2206
#11	0x101959b4f in js::CallJSNative at jscntxtinlines.h:701
#12	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#13	0x101956701 in js::RunScript at jsinterp.cpp:653
#14	0x101956e07 in js::Invoke at jsinterp.cpp:740
#15	0x101903816 in js_fun_apply at jsfun.cpp:2206
#16	0x101959b4f in js::CallJSNative at jscntxtinlines.h:701
#17	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#18	0x101956701 in js::RunScript at jsinterp.cpp:653
#19	0x101956e07 in js::Invoke at jsinterp.cpp:740
#20	0x101903816 in js_fun_apply at jsfun.cpp:2206
#21	0x101959b4f in js::CallJSNative at jscntxtinlines.h:701
#22	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#23	0x101956701 in js::RunScript at jsinterp.cpp:653
#24	0x101956e07 in js::Invoke at jsinterp.cpp:740
#25	0x1019034f1 in js_fun_call at jsfun.cpp:2143
#26	0x101959b4f in js::CallJSNative at jscntxtinlines.h:701
#27	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#28	0x101956701 in js::RunScript at jsinterp.cpp:653
#29	0x101956e07 in js::Invoke at jsinterp.cpp:740
#30	0x101903816 in js_fun_apply at jsfun.cpp:2206
#31	0x101959b4f in js::CallJSNative at jscntxtinlines.h:701
#32	0x101943e35 in js::Interpret at jsinterp.cpp:4799
#33	0x101956701 in js::RunScript at jsinterp.cpp:653
#34	0x101958647 in js::Execute at jsinterp.cpp:1028
#35	0x101880ebe in EvaluateUCScriptForPrincipalsCommon at jsapi.cpp:5059
#36	0x101881246 in JS_EvaluateUCScriptForPrincipalsVersion at jsapi.cpp:5075
#37	0x1009a33d8 in nsJSContext::EvaluateString at nsJSEnvironment.cpp:1460
#38	0x1006eb881 in nsScriptLoader::EvaluateScript at nsScriptLoader.cpp:906
#39	0x1006ebc7c in nsScriptLoader::ProcessRequest at nsScriptLoader.cpp:799
#40	0x1006ebf1b in nsScriptLoader::ProcessPendingRequests at nsScriptLoader.cpp:965
#41	0x1006ec411 in nsScriptLoader::OnStreamComplete at nsScriptLoader.cpp:1183
#42	0x1000c99d6 in nsStreamLoader::OnStopRequest at nsStreamLoader.cpp:125
#43	0x1000eea08 in nsHTTPCompressConv::OnStopRequest at nsHTTPCompressConv.cpp:127
#44	0x1000c8c15 in nsStreamListenerTee::OnStopRequest at nsStreamListenerTee.cpp:71
#45	0x1001878c6 in nsHttpChannel::OnStopRequest at nsHttpChannel.cpp:4032
#46	0x1000907ba in nsInputStreamPump::OnStateStop at nsInputStreamPump.cpp:578
#47	0x1000908d8 in nsInputStreamPump::OnInputStreamReady at nsInputStreamPump.cpp:403
#48	0x10165aae3 in nsInputStreamReadyEvent::Run at nsStreamUtils.cpp:112
#49	0x101688266 in nsThread::ProcessNextEvent at nsThread.cpp:633
#50	0x10160e2cb in NS_ProcessPendingEvents_P at nsThreadUtils.cpp:200
#51	0x101418118 in nsBaseAppShell::NativeEventCallback at nsBaseAppShell.cpp:132
#52	0x1013cb842 in nsAppShell::ProcessGeckoEvents at nsAppShell.mm:399
#53	0x7fff88e45401 in __CFRunLoopDoSources0
#54	0x7fff88e435f9 in __CFRunLoopRun
#55	0x7fff88e42dbf in CFRunLoopRunSpecific
#56	0x7fff85d957ee in RunCurrentEventLoopInMode
#57	0x7fff85d955f3 in ReceiveNextEventCommon
#58	0x7fff85d954ac in BlockUntilNextEventMatchingListInMode
#59	0x7fff81d0be64 in _DPSNextEvent
#60	0x7fff81d0b7a9 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
#61	0x7fff81cd148b in -[NSApplication run]
#62	0x1013cb15e in nsAppShell::Run at nsAppShell.mm:746
#63	0x10111b054 in nsAppStartup::Run at nsAppStartup.cpp:220
#64	0x100036a4a in XRE_main at nsAppRunner.cpp:3786
#65	0x100001287 in main at nsBrowserApp.cpp:158

As you can see, the call stack says the error happens when a dynamic script fetched from the network is executed by nsScriptLoader. I am sure this is a kikin script but I don't understand how a script loaded and executed by nsScriptLoader could have a compartment error. I guess maybe the error happens when this script tries to access one of the C++ objects through the JSAPI API but then why are the assertSameCompartment checks I place succeeding then (I check the return jsval compartment before returning anything via JSAPI)? 

If you can shed any light on this issue it would be awesome.

Thanks,

-delfin
Just an update,

I have now reproduced and fixed the compartment-crossing issues David Mandelin identified. We are testing a new version internally before updating our distribution channels and users.
(In reply to comment #21)
> Just an update,
> 
> I have now reproduced and fixed the compartment-crossing issues David Mandelin
> identified. We are testing a new version internally before updating our
> distribution channels and users.

Great! By the way, your addition of OBJ_TO_INNER_OBJECT seems like a correct alternative fix, at least to the immediate problem I found (I don't know a lot about the VAROBJFIX option, so there could be other issues involved, but I'm not aware of any right now).
(In reply to comment #22)
> (In reply to comment #21)
> > Just an update,
> > 
> > I have now reproduced and fixed the compartment-crossing issues David Mandelin
> > identified. We are testing a new version internally before updating our
> > distribution channels and users.
> 
> Great! By the way, your addition of OBJ_TO_INNER_OBJECT seems like a correct
> alternative fix, at least to the immediate problem I found (I don't know a lot
> about the VAROBJFIX option, so there could be other issues involved, but I'm
> not aware of any right now).

David, one more thing. I ended up using JS_WrapObject in one place where a JavaScript object is created by the browser page, handed to the C++ component,  and from there handed to the JavaScript running in the kikin bar (different <xul:browser> element). Using JS_WrapObject passing the JSConext of the kikin bar (destination) fixed the cross-compartment asserts but the Javascript trying to access the wrapped object gets an access-denied error. Is this normal? Is there a way to avoid it?

Thanks
(In reply to comment #23)
> David, one more thing. I ended up using JS_WrapObject in one place where a
> JavaScript object is created by the browser page, handed to the C++ component, 
> and from there handed to the JavaScript running in the kikin bar (different
> <xul:browser> element). Using JS_WrapObject passing the JSConext of the kikin
> bar (destination) fixed the cross-compartment asserts but the Javascript trying
> to access the wrapped object gets an access-denied error. Is this normal? Is
> there a way to avoid it?

Blake, can you help with this question?
It seems not even a regular alert is able to work with the object in the destination environment after JS_WrapObject has been applied. I was looking at the JSAPI and I saw JS_TransplantObject. Should I be using this instead? If so, I don't understand the parameters very well :(
After much searching it seems the problem is security related. Here is my scenario again:

1. I have two Javascript contextes, one is the browser page and the other one is the kikin bar. 
2. The kikin bar is hosted in a xul:browser element of type "content-targetable" and source is a kikin http URL.
3. The bar environments asks the C++ component to execute code on the context of the page. 
4. The C++ component executes the code via JS_EvaluateUCScriptForPrincipals on the context of the page
5. The code executed returns an object (for example { foo : { a } } )
6. The C++ component detects the compartment of this returned object is different from the compartment of the bar, so it calls JS_WrapObject to wrap the result object
7. The bar environment tries to access a property in the returned (wrapped) object, and it gets access denied error.

However, if I change the type of the <xul:browser> element where the bar is hosted to "chrome" then the code can access the returned (wrapped) object properties without any errors. 

Is this happening because the code which produces the object and the code which reads the properties are in different contextes/principals but they are both "content" browsers? We are already planning to move our kikin bar xul:browser to type "chrome" but this will take a few days because of all the references to "top" in the code. If there is a way for the C++ code to do something with the returned object to work around this problem (similar to how the C++ code calls JS_WrapObject to work around the different apartment problem)?

Thanks in advance.
(In reply to comment #26)
> Is this happening because the code which produces the object and the code which
> reads the properties are in different contextes/principals but they are both
> "content" browsers? We are already planning to move our kikin bar xul:browser
> to type "chrome" but this will take a few days because of all the references to
> "top" in the code. If there is a way for the C++ code to do something with the
> returned object to work around this problem (similar to how the C++ code calls
> JS_WrapObject to work around the different apartment problem)?

This question really needs a DOM/security expert. My rough understanding is that the security checks are functioning as designed: one content compartment should not normally be able to access objects from another. I think there are exceptions for parent iframes and such--they must use a different kind of wrapper object that allows access--so there may be some workaround like you want. But I don't know how that's done or if it would be valid to do so in this case.
Hi Delfin,

Almost all of our work over the past few months has been to make it easier to do JS -> JS communication. Is there any reason that you have to have a C++ component in the middle here? (Re-reading what you've said, if your JS was not chrome to begin with, this makes more sense.)

Given what you said about making your <xul:browser> type="chrome", it sounds like right now, it isn't being compiled with chrome principals. I don't know what principals we'd use otherwise (do you load the internal page with a resource:// URL?) but it does sound like our security model is now catching something that was never intended to be allowed (you're breaking the same-origin model here). This isn't something that we really wanted to support without a really good reason. It sounds like making the bar a chrome page will obviate the need for this.

As to other solutions... since we really didn't consider this use-case, I don't really have any easy solutions. There are a couple of things I can think of, but they're both pretty fragile and will be difficult to get right. It sounds to me like making your bar chrome is the easiest way to go at this point.
This is definitely happening to non-kikin users.  See thread at http://slashdot.org/comments.pl?sid=2084712&cid=35826410 and going through my reply.
Oh, I guess Firebug could be triggering that, per summary...  Steps to reproduce would be interesting.
(In reply to comment #28)
> Hi Delfin,
> 
> Almost all of our work over the past few months has been to make it easier to
> do JS -> JS communication. Is there any reason that you have to have a C++
> component in the middle here? (Re-reading what you've said, if your JS was not
> chrome to begin with, this makes more sense.)
> 
> Given what you said about making your <xul:browser> type="chrome", it sounds
> like right now, it isn't being compiled with chrome principals. I don't know
> what principals we'd use otherwise (do you load the internal page with a
> resource:// URL?) but it does sound like our security model is now catching
> something that was never intended to be allowed (you're breaking the
> same-origin model here). This isn't something that we really wanted to support
> without a really good reason. It sounds like making the bar a chrome page will
> obviate the need for this.
> 
> As to other solutions... since we really didn't consider this use-case, I don't
> really have any easy solutions. There are a couple of things I can think of,
> but they're both pretty fragile and will be difficult to get right. It sounds
> to me like making your bar chrome is the easiest way to go at this point.

Hi Blake,

Thanks for your answer. In order to fix the problem we set the type of the kikin bar xul:browser to "chrome". However, as you well say, now that the bar is running in the chrome there should be no need to go through C++ for communication with the page. I'm looking forward to change this in the near future.

-delfin
I have released kikin 2.11.19 to the kikin site and existing users are getting it via update.rdf (and via our own update mechanism starting today). kikin 2.11.19 fixes all the cross-compartment violation issues as identified by Firefox debug build asserts so it should not cause the issue discussed in this bug.

Thanks,

-delfin
(In reply to comment #30)
> Oh, I guess Firebug could be triggering that, per summary...  Steps to
> reproduce would be interesting.

I crash FF4.0 occasionally (<20%) of the time running Firebug's FBTest suite on Win7 (not Mac). The only signature I remember seeing has js::Shape on top.

I remember the crash-stats links a bug, and I recall reading the bug the first time and thinking "oh, this is a real common crasher someone else will have a test case".  

The machine I'm currently on does not seem to have these crashes.  I'll look later on the one I know has it.

In view of comment 32, consider closing this bug as fixed by Rojas for Kikin and opening another one with the same signature for the ones related to Firebug.
(In reply to comment #33)
> In view of comment 32, consider closing this bug as fixed by Rojas for Kikin
> and opening another one with the same signature for the ones related to
> Firebug.
Kikin is "only" responsible of 78% of crashes on Windows. If you close this bug, what about Windows crashes without Kikin or Firebug: engine@conduit.com, toolbar@ask.com, Yahoo toolbar, and more.
Ok the crashes I see with Firebug running FBTest are all point to this bug. Not running Kikin or Test Pilot.  My stacks are *not* the ones reported in comment 4 or comment 5 or the original report.

https://crash-stats.mozilla.com/report/index/bp-067a2b54-d1c8-414b-84f8-c1fc22110410
https://crash-stats.mozilla.com/report/index/bp-52cf2e24-edde-4932-9819-d3a2c2110413
https://crash-stats.mozilla.com/report/index/bp-0ff99921-c1b1-4188-b816-3cc302110415
(In reply to comment #35)
> Ok the crashes I see with Firebug running FBTest are all point to this bug. Not
> running Kikin or Test Pilot.  My stacks are *not* the ones reported in comment
> 4 or comment 5 or the original report.

These stacks look more like bug 601102 to me. Is it possible that the patch in that bug fixes it for you?
(In reply to comment #36)
> These stacks look more like bug 601102 to me. 

Yes, I agree.  So my crashes may go away when that fix is released.
(In reply to comment #36)
> These stacks look more like bug 601102 to me. Is it possible that the patch in
> that bug fixes it for you?

In that case, the candidate 4.0.1 builds should have fixed it - can that be proven?
(In reply to comment #38)
> In that case, the candidate 4.0.1 builds should have fixed it - can that be
> proven?
bug 601102 has been reopened so this bug is not fixed.
(In reply to comment #39)
> (In reply to comment #38)
> > In that case, the candidate 4.0.1 builds should have fixed it - can that be
> > proven?
> bug 601102 has been reopened so this bug is not fixed.

I think the problem is that we crashed in Shape::trace for several different reasons. In bug 601102, we fixed the most common of these crashes---those that have exn_trace on the stack. And the crashes that John posted have this form. So I think that his crashes should be fixed in 4.0.1.
> So I think that his crashes should be fixed in 4.0.1.
Crash stats confirm it is not fixed (see links in comment 0).
(In reply to comment #41)
> > So I think that his crashes should be fixed in 4.0.1.
> Crash stats confirm it is not fixed (see links in comment 0).

I guess I was ambiguous. John posted some crashes in comment 35 that he thought were related to this bug. I replied that those crashes are different, and they should be fixed by the patch in bug 601102. Looking at the 4.0.1 crash stats for that bug, I don't see any signatures with exn_trace in them. So I still think his bug is fixed.

I agree that there are still a lot of crashes in this bug that we need to deal with (possibly in a separate bug).
(In reply to comment #32)
> I have released kikin 2.11.19 to the kikin site and existing users are getting
> it via update.rdf (and via our own update mechanism starting today). kikin
> 2.11.19 fixes all the cross-compartment violation issues as identified by
> Firefox debug build asserts so it should not cause the issue discussed in this
> bug.
> 
> Thanks,
> 
> -delfin

By the way, thanks for your quick work in updating this. I know it's even more important to you than me, but I was happy to see this get fixed so fast.
Crash Signature: [@ js::Shape::getChild ] [@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ]
We have 52 crashes on 8.0 in the past 4 weeks. A few crashes on Aurora. Low volume but still valid. Removing the top crash keyword.
Crash Signature: [@ js::Shape::getChild ] [@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] → [@ js::Shape::getChild ] [@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ]
Keywords: topcrash
No longer depends on: 644712
There is not enough crash data to get correlations.
No longer blocks: 646745
Summary: Crash [@ js::Shape::getChild ][@ js::Shape::getChild(JSContext*, js::Shape const&, js::Shape**) ] mainly with Kikin (Windows) or Firebug (Mac) → Crash @ js::Shape::getChild
Assignee: wmccloskey → general
Blocks: 645881
There have been no crashes for the last four weeks after 10.0.3.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.