Closed Bug 418443 Opened 16 years ago Closed 16 years ago

XPCOM embedded SpiderMonkey cannot access DOM properties - new in Beta 3

Categories

(Core :: JavaScript Engine, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED INVALID
mozilla1.9

People

(Reporter: ron.langham, Assigned: jst)

References

Details

(Keywords: regression)

Attachments

(1 obsolete file)

User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3) Gecko/2008020514 Firefox/3.0b3

We have a XPCOM extension that has its own internal SpiderMonkey engine that runs scripts on the loaded page Document. We injected some javascript into the page via XUL. A load event in the page then call methods in our XPCOM DLL that then passes the document object to our internal SM engine for other processing on the page. This has been working for a couple of years in FF 1.5, 2.0, and worked fine up to Beta 2, but broke in Beta 3. 

From high-level view, within the load event of the page, we add a 'document' variable to our SM that refers to the same document object in the main page.  We use the nsIDOMHTMLDocument object that we get from the event and add it to our SM variables.  We compile the JS script using the proper 'Principal' method,  JS_CompileScriptForPrincipals.  Other variables are added using JS_EvaluateScriptForPrincipals.

When we execute our script using JS_ExecuteScript, everything runs fine, we are able to access all document elements, properties, etc.  We can access all other DOM elements and their properties in the page with no problem.   If we then call a method in the javascript using JS_CallFunctionName immediately after the JS_ExecuteScript, then we start seeing problems. e.g.

     // this works ok
     JS_ExecuteScript(m_pJSContext, m_pJSGlobalObject, m_pJSScript, &vRet);

     // this returns FALSE
     JS_CallFunctionName(m_pJSContext, m_pJSGlobalObject, sName.c_str(), (uintN)argc, argv, &vRet);

The problem occurs if the method in the JS tries to access any properties on any of the DOM elements. The 'document' object is available and a typeof(document) properly returns [object HTMLDocument].  Other elements can be retrieved using document.getElementById('validName').  The element returned is valid and a typeof returns the proper value.  The problem occurs when there is any access to a property on any DOM element. For instance, the following examples cause the JS method to abort with no exception as soon as they are executed, the JS_CallFunctionName method simply returns FALSE.

        temp = document.location;
	temp = typeof(document.location);
        temp = elem.innerHTML;   // where elem is a valid value retrieved by getElementById
        temp = typeof(elem.innerHTML);

This error of accessing any property is very similar to what is seen in JS_ExecuteScript if we do not compile and load variables using the Principal methods.  Therefore, I suspect the the Principals assigned is lost for some reason when we call JS_CallFunctionName.

In other tests, I was able to do reflection on DOM objects in the method and see all the properties, e.g. for (var prop in elem), but it just fails when I try to access one, even just doing a typeof() on an element property.

I started going back to older nightly builds in order to track down when this bug occurred.  I was able track down the problem to a few lines of code added to the NewCompilerFunction() in jsparse.c on 12/19 by Igor Bukanov in version 3.316 of jsparse.c.  The change was related to bug 398219. There were many other changes similar to this, as well as some Principal specific code removed from fun_resolve() method in jsfun.c, but this code change didn't seem to affect the problem.

If I commented out the code in NewCompilerFunction() as shown below in the Beta 3 release source code, our extension then began to work again with no problems.   I copied the new patched js3250.dll to the Beta 3 release and did our tests.

    NewCompilerFunction(...)
    {
       // ...

       fun = js_NewFunction(cx, NULL, NULL, 0, JSFUN_INTERPRETED | lambda,
                         parent, atom);

       // COMMENT OUT THE LINES BELOW

       //if (fun && !(tc->flags & TCF_COMPILE_N_GO)) {
       //   STOBJ_SET_PARENT(fun->object, NULL);
       //   STOBJ_SET_PROTO(fun->object, NULL);
       //}

       return fun;
    }

Here are more low-level details on our implementation...

1) Take over the TabOpen event on Firefox and When a Tab component is first created, we initialize SM and get principals, e.g.

	nsCOMPtr<nsIJSRuntimeService> rtsvc;
	m_servMan->GetServiceByContractID("@mozilla.org/js/xpc/RuntimeService;1", 		NS_GET_IID(nsIJSRuntimeService), getter_AddRefs(rtsvc)); 
	rtsvc->GetRuntime(&m_pJSRuntime);
	...
     	nsAXPCNativeCallContext *callContext;
	m_xpc->GetCurrentNativeCallContext(&callContext);
	...
	callContext->GetJSContext(&m_pJSCallContext);
	...
	sSecurityManager->GetPrincipalFromContext(m_pJSCallContext, getter_AddRefs(principal));
	...
	principal->GetJSPrincipals(m_pJSCallContext, &m_pJSPrincipals);
	...
	m_pJSContext = JS_NewContext(m_pJSRuntime, JS_STACK_SIZE);
	...
	m_pJSGlobalObject = JS_NewObject(m_pJSContext, &globalClass, nsnull, nsnull);
	JS_InitStandardClasses(m_pJSContext, m_pJSGlobalObject);
	...
	m_xpc->InitClasses(m_pJSContext, m_pJSGlobalObject);

   These value are maintained in a C++ instance attached to the tab document.
    
2) On a page load event, take the nsIDOMHTMLDocument of the page and create new JS objects in our SM, add a variable 'document' into our SM for the document so that our scripts can access it as normal. e.g.

	nsCOMPtr<nsIXPConnectJSObjectHolder> wrapper;
	m_xpc->WrapNative(m_pJSContext, m_pJSGlobalObject, pDoc, NS_GET_IID(nsIDOMHTMLDocument),
		getter_AddRefs(wrapper));
	...
	JSObject* ourJSObject;
	wrapper->GetJSObject(&ourJSObject);
	...
	JS_DefineProperty(m_pJSContext,
		m_pJSGlobalObject,
		"document",
		OBJECT_TO_JSVAL(ourJSObject),
		nsnull, nsnull,
		JSPROP_PERMANENT);

3) Load and compile our script using JS_CompileScriptForPrincipals. e.g.

	m_pJSScript = JS_CompileScriptForPrincipals(m_pJSContext, m_pJSGlobalObject, 
             m_pJSPrincipals, sProcessedScript.c_str(), sProcessedScript.size(), 
             scriptName.c_str(), 0);

4) Execute our script using JS_ExecuteScript...

	JSBool bRet = JS_ExecuteScript(m_pJSContext, m_pJSGlobalObject, 
			m_pJSScript, &vRet);

5) Then try to call a method in the script

	JSBool bRet = JS_CallFunctionName(m_pJSContext, m_pJSGlobalObject, sName.c_str(), 
		(uintN)argc, argv, &vRet);

Scripts run fine as long as they do not access properties on any of the DOM elements. The script has no problem with other variables, even other of our XPCOM objects that we injected in to the SM engine.

Our Firefox add-on is a very large and complex add-on. It relies heavily on some XPCOM classes, e.g. CIDs, IIDs, etc.  Therefore, we generally have to compile the Firefox source and copy over the SDK directory and some unfrozen in order to properly re-compile our extension.  I can be available to test a patched js3250.dll or assist in any other method.  I can also be open to chat on IRC.


Reproducible: Always

Steps to Reproduce:
Full details in the 'Details' section


Actual Results:  
Full details in the 'Details' section


Expected Results:  
javascript loaded in our SM should be able to access the DOM, properties, etc without error.  This worked until Beta 3.

Bug is CRITICAL to us.  Without the fix, our extension will no longer be able to run under Firefox 3!!!
Version: unspecified → Trunk
Assignee: nobody → general
Status: UNCONFIRMED → NEW
Component: Extension Compatibility → JavaScript Engine
Ever confirmed: true
Product: Firefox → Core
QA Contact: extension.compatibility → general
Version: Trunk → unspecified
Brendan, Blake, Igor, what's up here?  We should be picking up the function's principals from the script, I would have thought...
Blocks: 398219
Flags: blocking1.9?
Keywords: regression
Version: unspecified → Trunk
Errrr, not if it's a cloned function, in that case, we'll walk up the parent chain, which has been severed! We used to fail silently in this case (see bug 409514) but I should have also added an NS_WARNING. Can someone test to see if that's what's happening here?
(In reply to comment #2)
> Errrr, not if it's a cloned function, in that case, we'll walk up the parent
> chain, which has been severed!

It is the parent chain of the original function stored in JSScript that is severed, not the parent chain of the clone. The clone's parent chain should include JSContext.globalObject. 
I am concerned you are not using the JS GC rooting (root set management) API at all, or correctly. You show retained C++ JSObject pointers, and even JSScript * usage that may be GC-unsafe. Can you show more code, perhaps by email? You need to follow the rules of SpiderMonkey's exact GC, which are tricky and onerous but which can be followed.

I'm not saying any GC safety bugs in your addon are involved in the proplem you report here -- it's not clear yet either way (they could be related).

/be
As I understood it, the function is defined by the script during the JS_ExecuteScript call.  So it shouldn't be a cloned function.

Perhaps the real issue is in one of the functions on one of our wrappers, of course...
(In reply to comment #4)
> I am concerned you are not using the JS GC rooting (root set management) API at
> all, or correctly. You show retained C++ JSObject pointers, and even JSScript *
> usage that may be GC-unsafe. Can you show more code, perhaps by email? 

Yes, let me extract some of our SpiderMonkey wrapper code and I will email to you.  

Also, regarding comment #5, yes it is a function that is defined within the script.

Thanks.

I have sent our Spider Monkey wrapper code and details on how it is used to Brendan
Any update on this bug?  
Ron, although this bug has been nominated for 1.9, it has not been assigned nor given a sufficient priority to be considered at this point. Pinging brendan or igor about the priority and blocking status:
Flags: tracking1.9? → blocking1.9?
Assignee: general → brendan
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.9
Assignee: brendan → igor
Status: ASSIGNED → NEW
Depends on: 420807
Ron:  You might get more eyes on this with a simplified testcase attached -to- the bug, instead of sent privately to Brendan...  is that possible?
Igor took this bug, he and I know what is going on. Plan is to fix general bug and then Ron should do some fast testing.

/be
It would be  somewhat difficult to provide a simple testcase, probably a couple of days work or more in order to  make a new extension that does this.  Our current extension has a very large code base.   If Brendan and Igor know what the bug is, then I could grab the nightly build once checked in and verify on our end, if that will be suffucient.

Thanks for handling this issue!
Priority: P2 → P1
Attached patch minimal fix (obsolete) — Splinter Review
The fix uses the global object if function's parent is not set.
Attachment #309174 - Flags: review?(brendan)
Comment on attachment 309174 [details] [diff] [review]
minimal fix

If I have two windows that can communicate, and get a reference to a parent-less function in the second window from a script running in the first, then call it, the call should be statically scoped to the second window, not the first.

But how would one get a reference to a parent-less compiler-created function object? Maybe the fix is nearer to the answer to that question.

/be
(In reply to comment #14)
> (From update of attachment 309174 [details] [diff] [review])
> If I have two windows that can communicate, and get a reference to a
> parent-less function in the second window from a script running in the first,
> then call it, the call should be statically scoped to the second window, not
> the first.

Without the original change that made useless JSFunction wrappers at least not to leak the scope used during compilation in the scenario above the function will be executed using that compiler scope. I claim that using the current global window is not worse than using the compilation scope. 
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 309174 [details] [diff] [review] [details])
> > If I have two windows that can communicate, and get a reference to a
> > parent-less function in the second window from a script running in the first,
> > then call it, the call should be statically scoped to the second window, not
> > the first.
> 
> Without the original change that made useless JSFunction wrappers at least not
> to leak the scope used during compilation in the scenario above the function
> will be executed using that compiler scope. I claim that using the current
> global window is not worse than using the compilation scope. 

The current global window will be the wrong one, it will be the dynamic scope:

In window w1, with target name "w1":

<script>
  this.name = "w1";
  var x = 42;
  var f = leak_proto(function () alert(x));
</script>

in window w2, loaded after w1 from same origin:

<script>
  var x = 24;
  var w1 = frames[0];
  w1.f();
</script>

Without the original change that fixed the leak, 42 would be alerted, per ECMA static scope rules. With the patch, unless I am mistaken 24 would be alerted.

I don't believe that leak_proto can be written except by using the JS API, due to the primordial bug where JSFunction pointers are exposed and a non-commutative fun->object link can be followed. But that's what is happening for Ron et al.'s add-on. Of course we agree on the long-term fix, but it would be best if the short-term one did not introduce dynamic scoping.

To retain static scope linkage without making a leak hazard, we would need a weak reference instead of a strong one from the parent slot of the proto-function to its window. This means overhead in tracking such proto-functions from their window and nulling their parent links when the window is finalized. Is this any easier than the long-term fix?

/be
Oops, w2's source should be:

<script>
  var x = 24;
  var w1 = window.open("", "w1");
  w1.f();
</script>

/be
Priority: P1 → P2
P2 per FF3 meeting today
(In reply to comment #16)
> To retain static scope linkage without making a leak hazard, we would need a
> weak reference instead of a strong one from the parent slot of the
> proto-function to its window.

The weak reference would not solve the problem of null scope after GC collects parent's scope.
Depends on: 423874
Attachment #309174 - Attachment is obsolete: true
Attachment #309174 - Flags: review?(brendan)
I think I will have the proper fix bases on making JSFunction equivalent to JSObject lter today. Most of the preparation work will be done as a part of bug 423874. 
Depends on: 424376
(In reply to comment #12)
> It would be  somewhat difficult to provide a simple testcase, probably a couple
> of days work or more in order to  make a new extension that does this.  Our
> current extension has a very large code base.   If Brendan and Igor know what
> the bug is, then I could grab the nightly build once checked in and verify on
> our end, if that will be suffucient.

I have landed the bug 424376 which should address the issue. Could you check your extension with a a nightly built compiled after 2008-03-23 12:00 UTC (04:00 AM PST)?
(In reply to comment #21)

Yes, I will get source and start preparing today.  I will try to let you know later, or tomorrow at the latest . Thanks

> (In reply to comment #12)
> > It would be  somewhat difficult to provide a simple testcase, probably a couple
> > of days work or more in order to  make a new extension that does this.  Our
> > current extension has a very large code base.   If Brendan and Igor know what
> > the bug is, then I could grab the nightly build once checked in and verify on
> > our end, if that will be suffucient.
> 
> I have landed the bug 424376 which should address the issue. Could you check
> your extension with a a nightly built compiled after 2008-03-23 12:00 UTC
> (04:00 AM PST)?
> 

(In reply to comment #21)
It appears that the problem is still present. Same thing is happening.  I did a check on calling a simple function that did not access any DOM and it worked ok, but as soon as I tried something that accessed a property on a DOM element (e.g. document), then had the same problem.

I did a double check on the code to make sure I had the latest.  I inspected the CVS Log on the js.c file that you checked in at 3/23 3:04 AM and my code had the same change, so I believe I am working with the proper code.

> (In reply to comment #12)
> > It would be  somewhat difficult to provide a simple testcase, probably a couple
> > of days work or more in order to  make a new extension that does this.  Our
> > current extension has a very large code base.   If Brendan and Igor know what
> > the bug is, then I could grab the nightly build once checked in and verify on
> > our end, if that will be suffucient.
> 
> I have landed the bug 424376 which should address the issue. Could you check
> your extension with a a nightly built compiled after 2008-03-23 12:00 UTC
> (04:00 AM PST)?
> 

(In reply to comment #21)
fyi, I made the same patch that I did in Beta 3, e.g. commenting out a few lines in the NewCompilerFunction in jsparse.c and did another test.  In this case, it no longer fixes the issue as it did in Beta 3.

> (In reply to comment #12)
> > It would be  somewhat difficult to provide a simple testcase, probably a couple
> > of days work or more in order to  make a new extension that does this.  Our
> > current extension has a very large code base.   If Brendan and Igor know what
> > the bug is, then I could grab the nightly build once checked in and verify on
> > our end, if that will be suffucient.
> 
> I have landed the bug 424376 which should address the issue. Could you check
> your extension with a a nightly built compiled after 2008-03-23 12:00 UTC
> (04:00 AM PST)?
> 

(In reply to comment #21)
I have created a simple extension that shows the bug.  I have sent the extension, source, and details to Igor and Brendan.

> (In reply to comment #12)
> > It would be  somewhat difficult to provide a simple testcase, probably a couple
> > of days work or more in order to  make a new extension that does this.  Our
> > current extension has a very large code base.   If Brendan and Igor know what
> > the bug is, then I could grab the nightly build once checked in and verify on
> > our end, if that will be suffucient.
> 
> I have landed the bug 424376 which should address the issue. Could you check
> your extension with a a nightly built compiled after 2008-03-23 12:00 UTC
> (04:00 AM PST)?
> 

No longer depends on: 423874
No longer depends on: 424376
Here is an explanation of bug details.

Prior clearing the parent and proto slots the compiler would create internal function objects with the parent scope pointing to the global object set during compilation. 

Then when the compiled script was executed against this global object, the interpreter would see that the function objects had the parent scope matching the global. In this case the interpreter avoids cloning assuming the function object was created for a so-called compile-and-go script or a script that would be immediately executed after compilation only single time.

Thus after JS_ExecuteScript returns the global object will be populated with functions that are not clones. For such functions the security manager uses the principals embedded into the script itself. In the case of the extension in question this principal allows access to DOM properties.

Now, with the parent slot of the compile-time function object properly cleared to prevent leaks, the interpreter populates the global scope with clones of compiler-time function objects when JS_ExecuteScripts runs.

For clones the security manager uses the principals embedded into the parent scope chain. For the extension in question the principals do not give access to DOM properties leading to the bug.

Thus to fix the bug the security manager should stop testing for clones and should instead check if the function comes from a brutally shared chrome script. Only in this case the security manager should extract the principals from the scope chain.

Now, given my little experience working with principals implementations, I am not the right person to address this.
--> JST can you take this?
Assignee: igor → jst
Igor/Brendan, how can caps tell if a function comes from a brutally shared chrome
script? Right now such scripts just get caught in the check that tests whether a function is a clone or not. I'm not sure what's different between a function cloned from a brutally shared chrome script and one cloned by the interpreter during normal script execution.
(In reply to comment #28)
> Igor/Brendan, how can caps tell if a function comes from a brutally shared
> chrome
> script? 

That is the essence of the bug - xpconnect has to use its own way to find out that and not to rely on checking if the function was cloned.

I was thinking to mark somehow the principals that is stored in the shared script and its functions. But I do not know how to do that.
Thanks to Igor, Brendan, and Johnny, I now have a solution.  They proposed that I implement nsIScriptObjectPrincipal and provide to the JS engine the principals.  The instance of the nsIScriptObjectPrincipal object is set in the private area of the global data.  The security manager will then find the instance and get the principals from it.   I have implemented this in our code and tests seem to show that it works.

This bug can be closed!  Thanks!
Awesome! Thank you Ron for working through this! Closing bug as invalid based on the above comment.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: