Closed Bug 372769 Opened 17 years ago Closed 17 years ago

[FIX]<field> evaluation happens at an unsafe time

Categories

(Core :: XBL, defect, P1)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta1

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: dev-doc-complete, Whiteboard: [sg:critical])

Attachments

(3 files, 5 obsolete files)

Right now we evaluate fields from nsXBLProtoImplField::InstallMember, called by nsXBLProtoImpl::InstallImplementation, called by nsXBLService::LoadBindings (via nsXBLBinding::InstallImplementation).

I think we should move that code to happening at the same time as we fire the constructor....  Or do something else to make it safe.
Flags: blocking1.9?
Assignee: general → jonas
Flags: blocking1.9? → blocking1.9+
Once this is fixed, we should enable the assertion I couldn't enable in bug 379229.
Target Milestone: --- → mozilla1.9alpha6
Whiteboard: [sg:critical]
Blocks: 384373
Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1
Blocks: 392189
Attached patch Proposed patch (obsolete) — Splinter Review
Unfortunately, I can't seem to reproduce the crashes in bug 384373 abd bug 392189 even without this patch.  So I'm not sure whether it fixes them.  I think it really should.

The idea here is to execute fields when someone tries to do something with them (in particular, to resolve them).  The change is basically to store fields on a separate list from other members, not install them at binding attachment time, and install them lazily as they get resolved.

Concerns that I have:
* Is just returning JS_FALSE in unexpected situations from newresolve() ok?
  Or do I also need to JS_SetPendingException or some such?
* Is setting *objp and returning JS_TRUE without calling JS_Define*Property
  ok?  Right now it would happen for a field with an empty field text or if the
  field evaluates to |undefined|, but I suppose I could just define the prop to
  JSVAL_NULL in those cases.  Or not set *objp in those cases.
* It seems like this would change the enumeration behavior, right?  Should I
  handle that by implementing an enumerate op that just tries to resolve all
  the field names or something?  Or is there a better approach?
Attachment #280980 - Flags: superreview?(brendan)
Attachment #280980 - Flags: review?(jonas)
(In reply to comment #2)
> * Is just returning JS_FALSE in unexpected situations from newresolve() ok?
>   Or do I also need to JS_SetPendingException or some such?

You need to JS_SetPendingException something. If you don't, then the script will silently stop executing.

> * Is setting *objp and returning JS_TRUE without calling JS_Define*Property
>   ok?  Right now it would happen for a field with an empty field text or if the
>   field evaluates to |undefined|, but I suppose I could just define the prop to
>   JSVAL_NULL in those cases.  Or not set *objp in those cases.

I think you need to define a property, or the engine will spin for a bit and not find it. You probably want to define the property to be JSVAL_VOID (instead of _NULL) though.

> * It seems like this would change the enumeration behavior, right?  Should I
>   handle that by implementing an enumerate op that just tries to resolve all
>   the field names or something?  Or is there a better approach?

We talked about this on IRC, it turns out that the way things are currently set up, this is close to impossible. In general, the answer to this question would be "yes," but since only prototype objects know how to enumerate leaf objects (the *Element classes), the JSAPI falls short.
Depends on: 396265
Attachment #281278 - Attachment description: XHTML testcase showing the problem → XHTML testcase showing the problem. THIS WILL CRASH
Attached patch With all those issues addressed (obsolete) — Splinter Review
Attachment #280980 - Attachment is obsolete: true
Attachment #280980 - Flags: superreview?(brendan)
Attachment #280980 - Flags: review?(jonas)
Jonas, can you review all the XBL/DOM goop?  Blake, want to review the JS engine usage?
Assignee: jonas → bzbarsky
Status: NEW → ASSIGNED
Attachment #281282 - Flags: superreview?(jonas)
Attachment #281282 - Flags: review?(mrbkap)
Priority: -- → P1
Summary: <field> evaluation happens at an unsafe time → [FIX]<field> evaluation happens at an unsafe time
Target Milestone: mozilla1.9 M8 → mozilla1.9 M9
Comment on attachment 281282 [details] [diff] [review]
Same as diff -w.  Might be easier to review

I was summoned, so here are a few nits:

> #define ELEMENT_SCRIPTABLE_FLAGS                                              \
>-  (NODE_SCRIPTABLE_FLAGS & ~nsIXPCScriptable::CLASSINFO_INTERFACES_ONLY)
>+  ((NODE_SCRIPTABLE_FLAGS & ~nsIXPCScriptable::CLASSINFO_INTERFACES_ONLY) | nsIXPCScriptable::WANT_ENUMERATE)

Wrap to preserve 80th column, as adjacent code does?

> 
> #define EXTERNAL_OBJ_SCRIPTABLE_FLAGS                                         \
>   (ELEMENT_SCRIPTABLE_FLAGS & ~nsIXPCScriptable::USE_JSSTUB_FOR_SETPROPERTY | \
>    nsIXPCScriptable::WANT_GETPROPERTY |                                       \
>    nsIXPCScriptable::WANT_SETPROPERTY |                                       \
>    nsIXPCScriptable::WANT_CALL)
. . .
>+// Nasty hack.  Maybe we could move some of the classinfo utility methods
>+// (e.g. WrapNative and ThrowJSException) over to nsContentUtils?
>+#include "nsDOMClassInfo.h"

FIXME citing new bug# is called for?

>+  // Default to not resolving things.
>+  JSObject* origObj = *objp;
>+  if (!origObj) {
>+    nsDOMClassInfo::ThrowJSException(cx, NS_ERROR_UNEXPECTED);
>+    return JS_FALSE;
>+  }

You should assert instead -- JS engine won't pass null in *objp given JSCLASS_NEW_RESOLVE{,_GETS_START} or it has a bug.

>+  JSString* str = JSVAL_TO_STRING(id);
>+  nsString fieldName(reinterpret_cast<PRUnichar*>(::JS_GetStringChars(str)),
>+                     ::JS_GetStringLength(str));

Isn't there a helper dependent string type jst created for this, or do you need a copy to mutate?


>+    ::JS_SetPrivate(cx, proto, aProtoBinding);
>+    // Keep this proto binding alive while we're alive
>+    aProtoBinding->XBLDocumentInfo()->AddRef();

Blank line before the comment line would be slightly easier on the eyes (on mine, anyway).

>+    // compile the literal string
>+    // XXX Could we produce a better principal here?  Should be able
>+    // to, really!

Another FIXME candidate.

>   PRBool undefined;
>-  // XXX Need a URI here!
>   nsCOMPtr<nsIScriptContext> context = aContext;
>   rv = context->EvaluateStringWithValue(nsDependentString(mFieldText,
>                                                           mFieldTextLength), 
>-                                        aScriptObject,
>-                                        nsnull, bindingURI.get(),
>+                                          aBoundNode,
>+                                          nsnull, uriSpec.get(),

Indentation two spaces two much?

>                                         mLineNumber, nsnull,
>                                         (void*) &result, &undefined);

Seems so.

>+  // Resolve all the fields for this binding on the object |obj|.
>+  // False return means a JS exception was set.
>+  PRBool ResolveAllFields(JSContext* cx, JSObject* obj) const
>+  {
>+    return
>+      mImplementation ? mImplementation->ResolveAllFields(cx, obj) : PR_TRUE;

ptr ? ptr->bool_return() : true => !ptr || ptr->bool_return().

/be
> Isn't there a helper dependent string type jst created for this

nsDependentJSString, yeah,  Good catch.  Good thing the classinfo include covers the header that defines it too.  ;)

> Indentation two spaces two much?

It's a diff -w.  See the other diff for the correct indent.

Fixed the other issues.
changing the testcase to:

        <implementation>
		<constructor>document.removeChild(document.documentElement)
		</constructor>
        </implementation>

crashes 2.0.0.6
trunk seems safe.
(In reply to comment #7)
>ptr ? ptr->bool_return() : true => !ptr || ptr->bool_return().
Not sure how clever compilers are these days but the last time I read the C spec (which admittedly was about 20 years ago) the latter actually compiles with an implicit ? 1 : 0; which actually makes it less efficient...
(In reply to comment #10)
> (In reply to comment #7)
> >ptr ? ptr->bool_return() : true => !ptr || ptr->bool_return().
> Not sure how clever compilers are these days but the last time I read the C
> spec (which admittedly was about 20 years ago) the latter actually compiles
> with an implicit ? 1 : 0; which actually makes it less efficient...

Right, because we are stuck with lousy int PRBool.

C++ with bool could work better -- Mozilla 2 fodder.

/be
> changing the testcase to:

That's bug 267833.  Not relevant here.

> which actually makes it less efficient...

I think it's more readable, though.
Blocks: 384716
What will happen if a <field> implementation tries to access itself, or enumerate all properties of the element?
> What will happen if a <field> implementation tries to access itself

JS protects against recursive resolve.  It'll get |undefined|.

> or enumerate all properties of the element

Note sure what the field enumeration itself will see (I could add a test for this too, if you want), but JS handles recursive enumeration too; the test in this patch doesn't crash or anything.
Attachment #281281 - Attachment is obsolete: true
Attachment #281282 - Attachment is obsolete: true
Attachment #281282 - Flags: superreview?(jonas)
Attachment #281282 - Flags: review?(mrbkap)
Attached patch Same as diff -w (obsolete) — Splinter Review
The tests show the following changes in behavior due to this patch, by the way:

1) If we have fields named A and B and field A set .B, then the value of .B
   will depend on whether .A has been accessed.
2) If a proto is inserted into the proto chain after binding instantiation, it
   may affect field values (before it didn't).
3) <field name="parentNode">this.parentNode</field> becomes undefined instead
   of the value of parentNode at binding attachment time.
4) A <field> no longer changes the value of a property that was set on the
   object itself (not some proto) before binding attachment. 
5) <field name="sixteen">16</field>
   <field name="sixteen">"16 later"</field>

   With my code we get "16"; before we got "16 later".  I could fix that if it
   matters.
6) <field name="eighteen">"18 field"</field>
   <property name="eighteen" readonly="true" onget="return 18"/>
   <property name="nineteen" readonly="true" onget="return 19"/>
   <field name="nineteen">"19 field"</field>

   With my code, the property wins.  In the old code, the field won.
Attachment #281914 - Flags: superreview?(jonas)
Attachment #281914 - Flags: review?(mrbkap)
Am I reading this right, was the old nsXBLBinding::InitClass dead code?
> Am I reading this right, was the old nsXBLBinding::InitClass dead code?

Yes.  It was.
Comment on attachment 281914 [details] [diff] [review]
Same as diff -w

Looks good to me except there are some tabs in nsXBLProtoImplField::InstallField around the EvaluateStringWithValue call.
Attachment #281914 - Flags: superreview?(jonas) → superreview+
Comment on attachment 281914 [details] [diff] [review]
Same as diff -w

jst, if you get to this before Blake, do you want to review the JS parts?  I'd like to get this into trunk ASAP...
Attachment #281914 - Flags: review?(jst)
Target Milestone: mozilla1.9 M9 → ---
We do want this in beta.
Target Milestone: --- → mozilla1.9 M9
Attachment #281914 - Flags: review?(mrbkap) → review+
Attachment #281914 - Flags: review?(jst)
I tried checking this in and backed it out because it caused leaks.

The basic problem is that I'm adding a reference from the XBL proto object to the XBLDocumentInfo (which I need to keep alive the XBLPrototypeBinding).  I thought this would be safe because there shouldn't be any refs in the other direction, but clearly I was wrong somehow.  Which is worrisome enough in itself, to be honest: how is the nsXBLDocumentInfo ending up holding refs to things in content?

Unfortunately, I don't see any way to teach the cycle collector about this reference (since the private is not nsISupports in this case, it doesn't get traversed, for example).  I suppose I could create an nsISupports-implementing shim object just to handle the cycle collection, but that seems silly....  And I do think we want the strong ref there; otherwise I see nothing guaranteeing that the JSObject for the proto won't outlive the nsXBLPrototypeBinding, right?

Peter, is there any way I can make this play nice with cycle collection?
(In reply to comment #21)
> I tried checking this in and backed it out because it caused leaks.
> 
>  Which is worrisome enough in
> itself :how is the nsXBLDocumentInfo ending up holding refs to
> things in content?

"worrisome" means just a leak or potential use after free - if xbl outlives content?

after this bug is fixed of course.
No, "worrisome" means I don't understand why there's a ref the other way and I don't feel like there should be one.  If there is one, then as long as the proto binding is alive we seem to be keeping all sorts of content documents alive, which is bad, because the proto binding can live for a long time.
since xbl destructors don't fire enough does this mean massive leakage?
This bug changes nothing about destructor behavior.
There's a mechanism for adding non-nsISupports objects to the cycle collector.  Look at the *NATIVE* macros in nsCycleCollectionParticipant.h and their users.
Comment on attachment 281914 [details] [diff] [review]
Same as diff -w

>@@ -1134,89 +1235,37 @@ nsXBLBinding::DoInitJSClass(JSContext *c
...
>+    ::JS_SetPrivate(cx, proto, aProtoBinding);
>+
>+    // Keep this proto binding alive while we're alive
>+    aProtoBinding->XBLDocumentInfo()->AddRef();
>+

Just wondering, why do you need this? DocInfo is AddRefed already in constructor.
(In reply to comment #27)
> Just wondering, why do you need this? DocInfo is AddRefed already in
> constructor.
> 
Hmm, or is that just to make sure prototype stays alive while the JSobject is alive.

David, I know that.  And actually, the object I need to traverse is in fact an nsISupports (the XBLDocumentInfo).  But the JS_GetPrivate of the JSObject is an XBLPrototypeBinding (which has a pointer to the XBLDocumentInfo), and what I don't know is how to tell the cycle collector that it needs to traverse either JS_GetPrivate of this particular JSObject or the JS_GetPrivate itself (either one) unless the private is ISUPPORTS (in which case nsXPConnect::Traverse handles it).

Basically, from the cycle collection point of view, the JSObject is holding a reference to the nsXBLDocumentInfo here....
> Hmm, or is that just to make sure prototype stays alive while the JSobject is
> alive.

Exactly.  Otherwise the XBLResolve method could be messing with deleted objects, which would be bad.
(In reply to comment #21)
> Which is worrisome enough in
> itself, to be honest: how is the nsXBLDocumentInfo ending up holding refs to
> things in content?

I might have made a mistake, but when I added cycle collection it looked as though nsXBLDocumentInfo is the *only* object that holds the nsXBLPrototypeBindings. If that is the case and nsXBLPrototypeBinding is a non-refcounted object then for cycle collection the ownership model can be simplified to nsXBLDocumentInfo holding anything that the nsXBLPrototypeBindings hold.

> Unfortunately, I don't see any way to teach the cycle collector about this
> reference (since the private is not nsISupports in this case, it doesn't get
> traversed, for example).  I suppose I could create an nsISupports-implementing
> shim object just to handle the cycle collection, but that seems silly....  And
> I do think we want the strong ref there; otherwise I see nothing guaranteeing
> that the JSObject for the proto won't outlive the nsXBLPrototypeBinding, right?
> 
> Peter, is there any way I can make this play nice with cycle collection?

In general we don't have a way for JS objects to inform the cycle collector what non-JS objects they hold. The traversal is hardcoded in nsXPConnect::Traverse (http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/xpconnect/src/nsXPConnect.cpp&rev=1.134&mark=841-867#835). Until now this solution got us far enough, but I'm certainly open to improving it. One way to solve it would be to make use of the trace hook to inform about non-JS objects but that's probably non-trivial.

In the short term, could we store the nsXBLDocumentInfo in the private slot (with JSCLASS_PRIVATE_IS_NSISUPPORTS) and use a reserved slot to store the nsXBLPrototypeBinding? That way the nsXBLDocumentInfo will be held alive and cycle collected if necessary, and you have fast access to the nsXBLPrototypeBinding. It's not a very elegant solution and I might be missing something. Let me know if it doesn't work, I can look into the trace hook solution but as I said, it's non-trivial and might take some time.
> nsXBLDocumentInfo is the *only* object that holds the nsXBLPrototypeBindings

That's correct.

And nsXBLPrototypeBindings shouldn't really be holding references to content that I see.  We could have been leaking the binding document and its global via the cycle:  nsXBLDocumentInfo -> xbl script global -> jscontext -> global JSObject -> xbl proto obj set up when compiling binding -> nsXBLDocumentInfo (since this proto is what has the nsXBLPrototypeBinding as a private).  But what I don't see is how we leaked ChromeWindows.  I seem to recall that there was a way to have the cycle collector dump out a graph on shutdown, right?  Is there some documentation on that somewhere?

> In the short term, could we store the nsXBLDocumentInfo in the private slot
> (with JSCLASS_PRIVATE_IS_NSISUPPORTS) and use a reserved slot to store the
> nsXBLPrototypeBinding?

Yeah, that should work.  I'll do it.
OK, so I no longer see this leaking ChromeWindows; I have no idea why I saw that to start with.

This patch clears up the XBL/CSS/etc leaks I do see.  It implements peterv's suggestion and makes unlinking in XBL a little more agressive.  The existing code failed to unlink enough, presumably because some nodes owned the nsIScriptContext (event handlers?), which owned a whole bunch of JSObjects, which now owned the nsXBLDocumentInfo, which owned all sorts of stuff which can hold on to either nodes or JSObjects (in particular, the DestroyMembers call in nsXBLProtoImpl::Unlink is absolutely necessary to avoid leaks.  Some of the other things may not be needed, but I suspect they all are....

Best thing to review is the diff from the previous patch (the one without -w) to this patch.
Attachment #281912 - Attachment is obsolete: true
Attachment #281914 - Attachment is obsolete: true
Attachment #282640 - Flags: review?(jonas)
Comment on attachment 282644 [details] [diff] [review]
Interdiff (for review purposes only)

>@@ -1240,10 +1247,19 @@
>+    // Keep this proto binding alive while we're alive.  Do this first so that
>+    // we can guarantee that in XBLFinalize this will be non-null.
>+    nsIXBLDocumentInfo* docInfo = aProtoBinding->XBLDocumentInfo();
>+    ::JS_SetPrivate(cx, proto, docInfo);
>+    NS_ADDREF(docInfo);
>+
>+    if (!::JS_SetReservedSlot(cx, proto, 0, PRIVATE_TO_JSVAL(aProtoBinding))) {
>+      (nsXBLService::gClassTable)->Remove(&key);
> 
>-    // Keep this proto binding alive while we're alive
>-    aProtoBinding->XBLDocumentInfo()->AddRef();
>+      // |c| will get dropped when |proto| is finalized

what is 'c'?

r=me with that fixed
Attachment #282640 - Flags: superreview+
Attachment #282640 - Flags: review?(jonas)
Attachment #282640 - Flags: review+
Attachment #282640 - Flags: approval1.9+
Feel free to get a more js-inclined sr if you want though.
> what is 'c'?

It's last used abot two lines outside the interdiff context.  It's declared as:

     nsXBLJSClass* c;

and is the JSClass used to set up |proto|.  |proto| keeps it alive until it gets finalized.

Comment on attachment 282640 [details] [diff] [review]
With the leaks fixed

Brendan, I'll probably reland this tomorrow, but whenever you get a chance to give it a once-over, that would be great.  See the interdiff if you only want to look at the stuff Blake didn't look at.
Attachment #282640 - Flags: review?(brendan)
Checked in.  Sadly, this seems to have no real perf improvement on tinderbox.  Which is odd, because it sure does over here...
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Er... actually, we still have to land the crash test (once the bug gets opened).
Flags: in-testsuite+ → in-testsuite?
(In reply to comment #39)
> Checked in.  Sadly, this seems to have no real perf improvement on tinderbox. 
> Which is odd, because it sure does over here...
> 

On the contrary bl-bldlnx03 is showing around a 1% Ts regression that looks to be due to this patch and its Tp and Tp2 numbers aren't looking so healthy either.
Yeah, I definitely can't reproduce that locally.  Locally I get an improvement in Ts and Txul and no effect on Tp (this latter is definitely what I would expect).

I do see tinderbox showing the (linux-only) Ts regression.  I just don't see why that would happen.
Depends on: 397924
Depends on: 398006
Depends on: 398028
Depends on: 398083
Depends on: 398135
Keywords: dev-doc-needed
Depends on: 398254
Depends on: 398466
Depends on: 398346
Can someone distill for me some insight into what about this impacts documentation?  At a quick overview, this looks more like fixing some internal bugginess than something that would directly affect documentation, so I could use a little help on that front.
The key difference from what we used to have is that fields are now evaluated the first time you access them, not at binding attachment time.  This can be observed in the various ways mentioned in comment 15, for example.

It's not a huge compat change, but it's a change that extension developers who were doing weird stuff may need to watch out for.
Added additional information to the documentation at:

http://developer.mozilla.org/en/docs/XBL:XBL_1.0_Reference:Elements#field

Please feel free to let me know if it's poorly worded or illogical (I only know enough about XBL to be dangerous, not enough to guarantee that I make sense when I talk about it).

Marking as documentation complete.
Attachment #282640 - Flags: review?(brendan)
Depends on: 404869
Flags: wanted1.8.1.x+
Depends on: 400705
Depends on: 420649
to verify on 1.8.0 branch after 398006 has landed.
Flags: blocking1.8.0.15+
Group: core-security
See Also: → 1758104
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: