Closed Bug 947555 Opened 11 years ago Closed 11 years ago

Rooting hazard failures (false positives) from script settings stack

Categories

(Core :: XPConnect, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 937317

People

(Reporter: bzbarsky, Assigned: sfink)

References

Details

Attachments

(1 file, 1 obsolete file)

Long story short, the analysis thinks GetIncumbentGlobal can GC, so this sort of code:

  new Function(&val.toObject(), mozilla::dom::GetIncumbentGlobal());

is now a GC hazard.

Or so the analysis thinks.  Its reason for thinking GetIncumbentGlobal can GC is:

  GC Function: nsIGlobalObject* mozilla::dom::GetIncumbentGlobal()
  JSContext* nsContentUtils::GetCurrentJSContextForThread()
  JSContext* nsContentUtils::GetCurrentJSContext()
  FieldCall: nsIXPConnect.GetCurrentJSContext

As in, it sees that nsIXPConnect.GetCurrentJSContext is virtual and assumes it can GC.  However:

  > mrgiggles, can nsXPConnect::GetCurrentJSContext GC?
  <mrgiggles> bz: No, |JSContext* nsXPConnect::GetCurrentJSContext()| cannot GC

so this is in fact a false positive.  That's good.  ;)

Can we just annotate nsIXPConnect.GetCurrentJSContext as not gcing and force the analysis to fail if nsXPConnect::GetCurrentJSContext ever starts being able to GC?

If not, can we rejigger nsContentUtils::GetCurrentJSContext() to not do virtual calls (e.g. store a reference to the XPCJSRuntime or the XPCJSContextStack in nsContentUtils directly)?
I hope to fix this with bug 947400, though unfortunately, I don't think it's adequate yet.

The analysis has had a limitation where virtual calls are treated as always GCing for the purpose of determining whether there's a hazard in the calling function. They are *not* assumed to GC for the purpose of constructing the call graph. So if you created a dummy function with a virtual call in it, and called the dummy function instead of the direct virtual call, the analysis would do the right thing. Bug 947400 fixes this to properly handle virtual function calls.

However, it looks like that's not enough to detect that GetCurrentJSContext() cannot GC. I am still looking into why. But yes, I'll either fix it directly in the analysis, or add an annotation.
The fix in bug 947400 failed to resolve this bug because the relevant method is on a class that inherits from nsISupports, and the analysis special-cases nsISupports stuff because they're overridable via script (or extensions, I guess?). If I undo the special-casing for nsISupports.GetCurrentJSContext (only), it does the virtual function resolution properly. I haven't finished a complete analysis with that change yet, but it's looking good so far.

With this setup, if nsXPConnect::GetCurrentJSContext were modified to GC, the analysis would not fail. But it would start reporting all these hazards, which seems fine since they would be real. We'd have to figure out what to do about it.

I'll attach a patch after verifying that it really does fix this problem.
Assignee: nobody → sphink
> the analysis would not fail. But it would start reporting all these hazards

That would turn the tree orange, yes?  That's all I really want: that we don't silently introduce hazards.  ;)

We have no plans to make GetCurrentJSContext GC.  We do have plans to try to get rid of it entirely.  ;)
(In reply to Boris Zbarsky [:bz] from comment #3)
> > the analysis would not fail. But it would start reporting all these hazards
> 
> That would turn the tree orange, yes?  That's all I really want: that we
> don't silently introduce hazards.  ;)

Yes. Well, ok, it would after I unbreak the result checking code. Which I have a patch for in bug 944158, but I can't land it yet because of the 678 new hazards from this bug. ;-)

And my fix didn't work. Haven't looked into why, yet. Argh.
Bug 947400 + special-casing GetCurrentJSContext revealed the next problem: nsIEventTarget.IsOnCurrentThread. That's pretty much the same story. Special-casing it gets me to:

GC Function: nsIGlobalObject* mozilla::dom::GetIncumbentGlobal()
    nsIGlobalObject* xpc::GetNativeForGlobal(JSObject*)
    nsCOMPtr<T>::nsCOMPtr(nsQueryInterface) [with T = nsIXPConnectWrappedNative]
    void nsCOMPtr<T>::assign_from_qi(nsQueryInterface, const nsIID&) [with T = nsIXPConnectWrappedNative; nsIID = nsID]
    uint32 nsQueryInterface::operator(nsID*, void**)(const nsIID&, void**) const
    FieldCall: nsISupports.QueryInterface

Somebody please tell me that can't GC, and why, so I can tell the analysis. needinfo? bholley.
Flags: needinfo?(bobbyholley+bmo)
(In reply to Steve Fink [:sfink] from comment #5)
> Somebody please tell me that can't GC, and why, so I can tell the analysis.
> needinfo? bholley.

So, we're QIing the global to nsIXPCWrappedNative. This is generally ok, I think, but it's sketchy there are all kinds of funny things that can happen during QI.

What would be the impact of just deciding that this function can GC?
Flags: needinfo?(bobbyholley+bmo) → needinfo?(bzbarsky)
(In reply to Bobby Holley (:bholley) from comment #6)
> 
> What would be the impact of just deciding that this function can GC?

Seems pretty bad. I mean, it triggers about 670 new hazards. Maybe there's an easy fix, but I'll let bz chime in.

Talking to bholley on irc:

	<sfink>	bholley: when you say QIing the global to nsIXPCWrappedNative is sketchy since all kinds of funny things can happen during QI (bug 947555), do you mean QI in general or QI to nsIXPCWrappedNative in particular?
	bholley	sfink: QI in general
	sfink: I think it's probably fine
	sfink: but QI in general _can_ run script
	sfink: and there's all sorts of machinery that can run
	sfink	bholley: right, I know that. I was wondering if we could declare this particular QI to be safe.
	sfink	and if so, why this one feels safer than others, so I can minimize the holes I poke in the analysis
	bholley	sfink: well, we know the object is a global object
	sfink: which means that it isn't an XPCWrappedJS
	sfink: and it isn't an aggregated-JS-with-native
	sfink	bholley: is this stuff easily assertable?
	bholley	sfink: well, in the common cases sure
	sfink: my concern is that maybe there's some case I hadn't thought of where we might hit some QI stuff that runs script
	bholley	sfink: for WNs, the QI implementation is in XPCWrappedNative.cpp: NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(XPCWrappedNative)
	sfink	bholley: right, so I was hoping that I could annotate + add a believed-to-be-accurate assertion to back it up, and watch for assert failures
	I mean, I could always just do an AutoAssertNoGC
	but that'll only fire if it happens to GC, not if it's potentially possible
	bholley	sfink: hm
	sfink: oh!
	sfink: I might have an idea
	sfink: instead of QIing the private, we can just check the isWrappedNative bit on the JSClass
	sfink: and then static_cast
	youpie
	sfink	ok. I mean, I'd be fine with continuing to QI, but asserting the isWrappedNative bit.
	but if that's faster/cleaner, sounds great!
	bholley	sfink: yes, and then we don't have to special-case the analysis
	sfink	right
	yay, thanks!

So I tried that out. It seems to work fine for elminating

  nsCOMPtr<nsIXPConnectWrappedNative> wn = do_QueryInterface(native)

but sadly, the return value of the whole function is

     nsCOMPtr<nsIGlobalObject> global = do_QueryInterface(native);

and that one isn't a plain static_cast<>. There's an offset of 0x8 for a SandboxPrivate, for example. I don't even know if doing a QI to nsIGlobalObject here is gc-free or not. bholley?
Flags: needinfo?(bobbyholley+bmo)
> What would be the impact of just deciding that this function can GC?

You'd need to change the ctor for CallbackObject and all subclasses to take a Handle instead of a raw JSObject* and then fix all callers to actually pass in a useful Handle (probably by sticking the value into a Rooted on the stack).

It's not insane, but it's a bit of work.  And a small (very small, so I'd be ok with it) perf hit.
Flags: needinfo?(bzbarsky)
Oh, and the vast majority of those 670 callsites are codegenned, so the change would not be _that_ big.
(In reply to Steve Fink [:sfink] from comment #7)
> (In reply to Bobby Holley (:bholley) from comment #6)
> > 
> > What would be the impact of just deciding that this function can GC?
> 
> Seems pretty bad. I mean, it triggers about 670 new hazards. Maybe there's
> an easy fix, but I'll let bz chime in.
> 
> Talking to bholley on irc:
> 
> 	<sfink>	bholley: when you say QIing the global to nsIXPCWrappedNative is
> sketchy since all kinds of funny things can happen during QI (bug 947555),
> do you mean QI in general or QI to nsIXPCWrappedNative in particular?
> 	bholley	sfink: QI in general
> 	sfink: I think it's probably fine
> 	sfink: but QI in general _can_ run script
> 	sfink: and there's all sorts of machinery that can run
> 	sfink	bholley: right, I know that. I was wondering if we could declare this
> particular QI to be safe.
> 	sfink	and if so, why this one feels safer than others, so I can minimize
> the holes I poke in the analysis
> 	bholley	sfink: well, we know the object is a global object
> 	sfink: which means that it isn't an XPCWrappedJS
> 	sfink: and it isn't an aggregated-JS-with-native
> 	sfink	bholley: is this stuff easily assertable?
> 	bholley	sfink: well, in the common cases sure
> 	sfink: my concern is that maybe there's some case I hadn't thought of where
> we might hit some QI stuff that runs script
> 	bholley	sfink: for WNs, the QI implementation is in XPCWrappedNative.cpp:
> NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(XPCWrappedNative)
> 	sfink	bholley: right, so I was hoping that I could annotate + add a
> believed-to-be-accurate assertion to back it up, and watch for assert
> failures
> 	I mean, I could always just do an AutoAssertNoGC
> 	but that'll only fire if it happens to GC, not if it's potentially possible
> 	bholley	sfink: hm
> 	sfink: oh!
> 	sfink: I might have an idea
> 	sfink: instead of QIing the private, we can just check the isWrappedNative
> bit on the JSClass
> 	sfink: and then static_cast
> 	youpie
> 	sfink	ok. I mean, I'd be fine with continuing to QI, but asserting the
> isWrappedNative bit.
> 	but if that's faster/cleaner, sounds great!
> 	bholley	sfink: yes, and then we don't have to special-case the analysis
> 	sfink	right
> 	yay, thanks!
> 
> So I tried that out. It seems to work fine for elminating
> 
>   nsCOMPtr<nsIXPConnectWrappedNative> wn = do_QueryInterface(native)
> 
> but sadly, the return value of the whole function is
> 
>      nsCOMPtr<nsIGlobalObject> global = do_QueryInterface(native);
> 
> and that one isn't a plain static_cast<>. There's an offset of 0x8 for a
> SandboxPrivate, for example. I don't even know if doing a QI to
> nsIGlobalObject here is gc-free or not. bholley?

Oh right. Duh.

So, here are the objects we might be dealing with:
* an XPCWN (detectable in the case above). That, in turn, will contain an nsGlobalWindow (inheriting nsIScriptGlobalObject).
* nsInProcessTabChildGlobal
* WorkerScope
* SandboxPrivate
* BackstagePass

(this is all from [1])

I can verify that these QI implementations will not GC when QIed to nsIGlobalObject. But I'm kind of concerned about leaving this footgun around.

I think we should just Handle-ify this API.

[1] http://mxr.mozilla.org/mozilla-central/search?string=public%2BnsIGlobalObject
Flags: needinfo?(bobbyholley+bmo)
> I think we should just Handle-ify this API.

I think I agree, as I said on irc.
Attached patch Root CallbackObject params (obsolete) — Splinter Review
It's straightforward to do. Is this what you were thinking of? My naming is kind of lame. I'll fix up the patch if it's roughly right.
Attachment #8345002 - Flags: feedback?(bzbarsky)
Comment on attachment 8345002 [details] [diff] [review]
Root CallbackObject params

Yes, just like that.  But wrap a scope around the tempRoot usage in codegen, please, to prevent name collisions and whatnot.
Attachment #8345002 - Flags: feedback?(bzbarsky) → feedback+
(In reply to Boris Zbarsky [:bz] from comment #13)
> Comment on attachment 8345002 [details] [diff] [review]
> Root CallbackObject params
> 
> Yes, just like that.  But wrap a scope around the tempRoot usage in codegen,
> please, to prevent name collisions and whatnot.

Ok, but that'll always result in an extra layer of scoping. The 1st tempRoot goes into a parameter passed to wrapObjectTemplate, which always puts it inside an if block. The 2nd tempRoot puts it into an if block just a little further down. I looked into adding a 'wrapScope' parameter to wrapObjectTemplate, but... well, it already always does, so the parameter would be ignored.

But your wish is my command! ;-) Updated patch attached. I tried and failed to come up with a better name for tempRoot.
Attachment #8345409 - Flags: review?(bzbarsky)
Attachment #8345002 - Attachment is obsolete: true
Comment on attachment 8345409 [details] [diff] [review]
Root around GC call GetIncumbentGlobal

>+                "{\n" +

Drop the + sign here and on the next two lines.  Python will automatically concatenate adjacent literal strings for you.  And just put the "% name" at the very end.

Similar at the other spot.  And add a comment after the '{' that it's a scope for tempRoot.

r=me
Attachment #8345409 - Flags: review?(bzbarsky) → review+
Since this may end up blocking bug 937317 from re-landing, moving the patch over there and resolving this bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: