Enable same-compartment-realms for system-principal windows

RESOLVED FIXED in Firefox 66

Status

()

enhancement
P2
normal
RESOLVED FIXED
8 months ago
5 months ago

People

(Reporter: jandem, Assigned: jandem)

Tracking

(Depends on 1 bug)

unspecified
mozilla66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 fixed)

Details

Attachments

(2 attachments)

This is similar to bug 1512029 but for window globals with the system principal. For this we need to fix some WindowProxy stuff (bug 1467124), my Try pushes disable WindowProxy JIT optimizations for now.
Depends on: 1514251
Depends on: 1514261
Depends on: 1514263
Filed bugs for a bunch of assertion failures I see on Try. Once these are fixed, debug builds will hopefully be much greener.
Priority: -- → P2
Depends on: 1514672
Here's a Try push with some in-flight fixes and workarounds:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=67445d5d8cb02837d3dc516fd5c0a5d03013609d

Remaining issues are mostly:

* Debugger test failures because the debugger and debuggee have to be in separate compartments, and we have some chrome-debugging tests. This is a bit unfortunate because if we use a separate compartment for the relevant debugger global(s) then we will slow down devtools for a niche use case, but it's probably the right thing to do for now.

* Bug 1514672.

* Some tests leak a "chrome://browser/content/webext-panels.xul" window.
> and we have some chrome-debugging tests.

We also support chrome debugging in general, right?

Seems like the right solution is to specifically use a separate compartment for debugger globals that plan to debug chrome...
Depends on: 1514776
To me, it sounds like we should be using a Sandbox and the invisibleToDebugger flag when we want to debug chrome code:
  https://searchfox.org/mozilla-central/source/devtools/shared/Loader.jsm#211-220

Here it looks like only platform Debugger API tests are failing, because they are not using Debugger API from a Sandbox... is that right?
(In reply to Alexandre Poirot [:ochameau] from comment #5)
> Here it looks like only platform Debugger API tests are failing, because
> they are not using Debugger API from a Sandbox... is that right?

Hm I was looking at the dt2 failure in browser_jsterm_autocomplete_in_chrome_tab.js for example: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=67445d5d8cb02837d3dc516fd5c0a5d03013609d&selectedJob=217381832

I haven't investigated it yet so maybe that's a different issue?
Oh yes, this one is a DevTools one which should pass.

We should probably somehow force this invisibleToDebugger flag when running such test:
  https://searchfox.org/mozilla-central/source/devtools/shared/Loader.jsm#220

Note that typical chrome debugging should be done right.
The typical way to debug parent process chrome code is to go through the Browser toolbox, which frontend runs from another process and the server it connects to runs in a distinct Loader where we force this invisibleToDebugger flag:
  https://searchfox.org/mozilla-central/source/devtools/client/framework/ToolboxProcess.jsm#146-155
(In reply to Jan de Mooij [:jandem] from comment #3)
> * Some tests leak a "chrome://browser/content/webext-panels.xul" window.

The good news is that these are gone now on top of the other changes, maybe from using the first global as canonical global for WNs. Try is starting to look pretty nice:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=d6fd19a0235e3583ff72997d01eb74dc6cc62dad

I don't know yet what's going on with the remaining few bc* and dt* failures though.

There's also a sandbox failure in a test that (minimized) does this:

  function f() {}
  function g() { return this; }
  f.g = g;
  var s = new Cu.Sandbox(window, { sandboxPrototype: window } );

  is(Cu.evalInSandbox('f.g()', s), f,
     "Should have the right function object");

We fail this with:

  Should have the right function object - got function f() {}, expected function f() {}

I think we now call this with a "sandbox callable proxy" around |f| as |this| instead of |f|.
I'm debugging a problem with the update service and running into a weird issue. Without any of my patches, if I add this LOG call to the code at [0]:

  update.QueryInterface(Ci.nsIWritablePropertyBag);
  update.setProperty("patchingFailed", oldType);
  LOG("+++++++>>>" + update.getProperty("patchingFailed") + "::" + oldType);

We get |undefined| for the result of the getProperty (oldType is "partial").

I think this is causing some test failures *with* my patches because there's a getProperty somewhere else that now returns |undefined| too.

Can you think of any obvious issues in this area?

[0] https://searchfox.org/mozilla-central/rev/13788edbabb04d004e4a1ceff41d4de68a8320a2/toolkit/mozapps/update/nsUpdateService.js#1015-1016
Flags: needinfo?(bzbarsky)
Depends on: 1512029
(In reply to Jan de Mooij [:jandem] from comment #9)
> We get |undefined| for the result of the getProperty (oldType is "partial").

Hm the getProperty ends up calling nsXPCWrappedJS::GetProperty instead of the getProperty defined on the class. I think this is because nsXPCWrappedJS implements nsIPropertyBag? Ugh.
Hmm.  So nsXPCWrappedJS definitely implements nsIPropertyBag.  

Presumably here we are working with an Update instance, right?

It seems a bit weird to me to use the built-in propertybag impl of nsXPCWrappedJS if the underlying object claims to implement nsIPropertyBag.  But that is certainly what we do.  nsIWritablePropertyBag does get punched through down to the underlying object.

That said, when accessing stuff from script, are we ending up with an XPCWrappedNative around an XPCWrappedJS here, in a situation where we used to just have the raw Update object or something?
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #11)
> That said, when accessing stuff from script, are we ending up with an
> XPCWrappedNative around an XPCWrappedJS here, in a situation where we used
> to just have the raw Update object or something?

I debugged some more: the problem happens when we QI nsIPropertyBag instead of or before nsIWritablePropertyBag. Then we find a getProperty method on nsIPropertyBag and things go wrong. If we however QI nsIWritablePropertyBag first then we find it there and we're good.

I think this used to work in this case because there was a compartment boundary and so we created a brand new WN when crossing it. Now we no longer do that and encounter our nsIPropertyBag.

Changing the QI of nsIPropertyBag to nsIWritablePropertyBag will probably fix this. Fingers crossed.
Depends on: 1515590
Depends on: 1515611
Depends on: 1480121
Depends on: 1515754
(In reply to Jan de Mooij [:jandem] from comment #12)
> (In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #11)
> > That said, when accessing stuff from script, are we ending up with an
> > XPCWrappedNative around an XPCWrappedJS here, in a situation where we used
> > to just have the raw Update object or something?
> 
> I debugged some more: the problem happens when we QI nsIPropertyBag instead
> of or before nsIWritablePropertyBag. Then we find a getProperty method on
> nsIPropertyBag and things go wrong. If we however QI nsIWritablePropertyBag
> first then we find it there and we're good.
> 
> I think this used to work in this case because there was a compartment
> boundary and so we created a brand new WN when crossing it. Now we no longer
> do that and encounter our nsIPropertyBag.

There are so many fun bugs with code that only works (or doesn't work) when we have a wrapper with a cached interface queried by some other code...

Anyway, this seems like a hell of a footgun. Can you file a bug to only return the XPCWrappedJS nsIPropertyBag implementation if the wrapped object doesn't query to nsIPropertyBag on its own?

I wonder if anything even uses this feature anymore...
(In reply to Kris Maglione [:kmag] from comment #13)
> I wonder if anything even uses this feature anymore...

It looks like the only thing that "uses" it is this HTTPConnectionManager code which assumes that if an object queries to nsIPropertyBag it's an XPCWrappedJS:

https://searchfox.org/mozilla-central/rev/9528360768d81b1fc84258b5fb3601b5d4f40076/netwerk/protocol/http/nsHttpConnectionMgr.cpp#543-547

Which is absurd, given that it can just query to nsIXPConnectWrappedJS.

The async shutdown code looked like it might rely on it, but it actually has a helper to explicitly convert a plain JS object to a property bag instead...

So it may make more sense to just remove this code altogether.
(In reply to Kris Maglione [:kmag] from comment #13)
> Anyway, this seems like a hell of a footgun. Can you file a bug to only
> return the XPCWrappedJS nsIPropertyBag implementation if the wrapped object
> doesn't query to nsIPropertyBag on its own?

Filed bug 1515884.
Depends on: 1516237
There's a small number of devtools test failures that will be fixed by Alexandre
in bug 1515290. When that lands we can revert this change.
This should be green on top of the patches in the dependent bugs. At least on Linux64; unfortunately our infra has been down all day so I can't test this on more platforms on Try.
(In reply to Jan de Mooij [:jandem] from comment #18)
> This should be green on top of the patches in the dependent bugs.

Well almost... Talos damp (devtools perf tests) is orange on Windows about 50% of the time. Unfortunately only on Windows but I can't repro locally on Windows. Try debugging Windows builds takes time sadly.
Depends on: 1516679
Depends on: 1516967
This should be green now on top of the in flight patches in the dependent bugs. The Win64 damp issue in bug 1516679 magically disappeared on Try after a rebase, probably because of a debugger update.
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ba3a14b06003
part 1 - Create the devtools sandbox in a new compartment for now. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/1b13a38c51e5
part 2 - Use the privileged junk scope's compartment for windows created with the system principal. r=bzbarsky
https://hg.mozilla.org/mozilla-central/rev/ba3a14b06003
https://hg.mozilla.org/mozilla-central/rev/1b13a38c51e5
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Let me just be the first to say \o/
Blocks: 1517694
Bug 1517758 has some Talos perf data. There's the tp5o responsiveness regression I'll investigate but there were many improvements as well.

As bug 1512029 landed in the same push than this bug it is hard to say which patch had the most impact, but we got nice speedups from your work in DevTools. Thanks Jan!

== Change summary for alert #18560 (as of Thu, 03 Jan 2019 09:04:31 GMT) ==

Regressions:

24% damp custom.jsdebugger.stepOver.DAMP osx-10-10 opt e10s stylo 781.48 -> 969.01
16% damp custom.inspector.collapseall.balanced windows7-32 opt e10s stylo 13.90 -> 16.15
14% damp custom.inspector.collapseall.manychildren windows7-32 opt e10s stylo 1.85 -> 2.12
11% damp custom.inspector.collapseall.manychildren windows10-64 opt e10s stylo 1.92 -> 2.13
11% damp custom.inspector.collapseall.balanced windows10-64-qr opt e10s stylo 15.59 -> 17.31
11% damp server.protocoljs.DAMP windows10-64 opt e10s stylo 1,649.41 -> 1,828.28
A couple of others, but I think they can be related to so many changes to the other subtests.
It often happen that making one test significantly faster, makes a couple others slower.

Improvements:

33% damp simple.styleeditor.close.DAMP windows7-32 pgo e10s stylo 16.99 -> 11.38
32% damp simple.netmonitor.reload.DAMP osx-10-10 opt e10s stylo 85.01 -> 57.94
28% damp simple.webconsole.close.DAMP osx-10-10 opt e10s stylo 53.98 -> 39.05
27% damp simple.styleeditor.close.DAMP linux64-qr opt e10s stylo 17.57 -> 12.84
26% damp simple.styleeditor.close.DAMP windows10-64 pgo e10s stylo 17.07 -> 12.66
24% damp simple.styleeditor.close.DAMP linux64 opt e10s stylo 19.17 -> 14.57
23% damp simple.styleeditor.close.DAMP windows10-64-qr opt e10s stylo 16.50 -> 12.68
23% damp custom.jsdebugger.stepOut.DAMP windows7-32 pgo e10s stylo 849.15 -> 653.98
And many other very significant speedups.

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18560

I re-ran the profiler against a recent tree and I'm still seeing cross compartment wrappers when running DevTools.
I imagine bug 1515290 is preventing the completion of this work around compartments in DevTools, so I'm expecting even more improvements later on.

(In reply to Alexandre Poirot [:ochameau] from comment #25)

I imagine bug 1515290 is preventing the completion of this work around compartments in DevTools, so I'm expecting even more improvements later on.

Yeah, when I disabled this for the devtools sandbox (part 1 here) DAMP regressed quite a bit on Try compared to my earlier Try pushes IIRC, so I'm looking forward to bug 1515290 landing :)

Blocks: 1519529
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.