Closed Bug 1512029 Opened 11 months ago Closed 10 months ago

Enable same-compartment realms for system-principal sandboxes

Categories

(Core :: XPConnect, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox64 --- unaffected
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Depends on 1 open bug)

Details

Attachments

(4 files)

See discussion in bug 1511359. 

There are two pieces to this:

(1) Use sameCompartmentAs(existingSystemCompartment) when creating system realms where possible.

(2) Ensure system realms are in the system zone to make (1) more effective. See bug 1511359 comment 15 and later.
Summary: Enable same-compartment realms for system compartments → Enable same-compartment realms for system realms
Flags: needinfo?(jdemooij)
Jan, are you planning to work on this?  Is that what the needinfo is for?

If you do plan to do it, the way I was figuring on doing this was creating the privileged junk scope pretty early on (where we create the unprivileged junk scope, like we used to) and then just using the privileged junk global as the "existing compartment" object for all later system-principal compartment creations.
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #1)
> If you do plan to do it, the way I was figuring on doing this was creating
> the privileged junk scope pretty early on (where we create the unprivileged
> junk scope, like we used to)

That's easier said than done. The code that creates the unprivileged junk scope runs too early for the logic that creates module globals to be able to function. It can still run pretty early, but as things stand now, it would have to happen from a different place.
I see.

The other option is to keep doing the privileged junk scope thing lazily and just do it the first time we need a system-principal global of various sorts.  I guess we'd need to check whether any of those are created "too early".  Would try catch that?
Basically, anywhere else we're currently creating a system scope should late enough for us to create shared JSM global/privileged junk scope. The problem with creating it where we create the unprivileged junk scope is just that it happens in the middle of XPConnect initialization, which means the module loader code which creates the global can't touch xpconnect.

Anything that runs before XPConnect initialization clearly can't create a system global, and anything that runs after it should be fine.
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #1)
> Jan, are you planning to work on this?  Is that what the needinfo is for?

Yeah, but please steal if you're interested!

Using the privileged junk scope like that makes sense to me.
Here's a patch to use the privileged junk scope's compartment for sandboxes created with the system principal and without the sameZoneAs/freshZone options.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=d50daf9080fc5bc4e779943adc67e9b938966311

The browser starts locally after I hacked around a few issues. Let's see what Try says.
That Try push is actually greener than I expected, but there is a bunch of orange. Mostly assertion failures but I also broke the subscript/module loader I think: some tests throw things like "XPCOMUtils is not defined".

We'll just have to debug I guess. I can look into that tomorrow if nobody beats me to it.
(In reply to Jan de Mooij [:jandem] from comment #7)
> That Try push is actually greener than I expected, but there is a bunch of
> orange. Mostly assertion failures but I also broke the subscript/module
> loader I think: some tests throw things like "XPCOMUtils is not defined".

Hrm. This seems to be for the weird case of the devtools loader loading Promise-backend.js as a CommonJS module, and it calling Cu.import() without a target object. I honestly don't really care about this case, since devtools modules aren't supposed to do that anyway (since it pollutes the namespace of all other modules), but it would be nice to figure out why that's not working now.

Essentially, those scripts are all loaded via the subscript loader into plain JS environment objects in a shared sandbox. In the case of a bare Cu.import() call in that environment, we should walk the scope chain, discard all of the scopes, since there's no NSVO, and fall back to the current realm's global.

So either we're failing to find the correct Sandbox global when we do the import, or we're failing to setup the scope chain correctly when we load Promise-backend.js, so that it can't see the properties we imported into that global object.
(In reply to Kris Maglione [:kmag] from comment #8)
> but it would be nice to figure out why that's not working now.

I think what's happening is that the Cu.import native here is in a different realm:

  Cu.import("resource://gre/modules/XPCOMUtils.jsm");

The JS engine switches to Cu.import's realm and then we end up defining XPCOMUtils on the wrong global. Then we return to CallJSNative where we switch back to our caller realm.
(In reply to Jan de Mooij [:jandem] from comment #9)
> (In reply to Kris Maglione [:kmag] from comment #8)
> > but it would be nice to figure out why that's not working now.
> 
> I think what's happening is that the Cu.import native here is in a different
> realm:
> 
>   Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> 
> The JS engine switches to Cu.import's realm and then we end up defining
> XPCOMUtils on the wrong global. Then we return to CallJSNative where we
> switch back to our caller realm.

Oh, yeah, that makes sense. When they were in separate compartments, passing Cu from one scope to the other would have caused us to create a new Cu wrapper for the target compartment/realm.

I suppose we should have require("chrome") return the appropriate objects for the sandbox, but the simplest solution for this case is to just pass `this` as a second argument to Cu.import()
Thanks, with that fix things are slightly better:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=aa86c6298c154e32e4f3a88f80e0fc22a68d009a

One issue is that nukeSandbox on a system sandbox will now nuke the whole system compartment. I think we have to aduit nukeSandbox calls to see what kind of sandbox they operate on.
(In reply to Jan de Mooij [:jandem] from comment #11)
> One issue is that nukeSandbox on a system sandbox will now nuke the whole
> system compartment. I think we have to aduit nukeSandbox calls to see what
> kind of sandbox they operate on.

Maybe there could be a supportsNuking sandbox option that forces the sandbox to be in its own compartment. Then in Cu.nukeSandbox we throw an exception if that option isn't set on the sandbox.
Or maybe nuking should work with a target *realm* instead of a target *compartment*, but then we can't really implement this "nuke references from target" behavior, because wrappers are shared with other realms in the target realm's compartment:

https://searchfox.org/mozilla-central/source/js/src/proxy/CrossCompartmentWrapper.cpp#483-486
At some point, we're going to need to support putting multiple content script sandboxes in the same compartment (mainly so they share the same X-ray expandos), and we're definitely going to want to support nuking those sandboxes.

I'm going to need to put some thought into how that's going to have to work. Either we're going to need to nuke the compartment when the last sandbox is destroyed, or we're going to need to nuke the realms instead, and then cut the wrappers when the last realm in the compartment is nuked.

In the mean time, though... we may just want to have nukeSandbox check if the target compartment is the same as the shared JSM scope, and bail if it is. We already have similar checks in a bunch of places to prevent people from doing stupid things to the shared JSM scope.
Depends on: 1512260
I spun off the wrapper nuking thing to bug 1512260.
Depends on: 1512396
Apart from the dependent bugs, things are starting to look pretty green on Try.

Try pushes uncovered a SpiderMonkey bug with same-compartment realms, but nothing too serious and it's exciting to see it in action :) There's also some devtools stuff I need to track down still.
Depends on: 1512410
Depends on: 1512417
Depends on: 1512509
Depends on: 1512655
Depends on: 1512718
(In reply to Jan de Mooij [:jandem] from comment #17)
> My WIP patches for this improve devtools tests a lot:

\o/
(In reply to Jan de Mooij [:jandem] from comment #17)
> My WIP patches for this improve devtools tests a lot:
> 
> https://treeherder.mozilla.org/perf.html#/
> comparesubtest?originalProject=mozilla-
> central&newProject=try&newRevision=a394b4ee0dec&originalSignature=f79b6a4f1f8
> f53326d3056f6c8008c0ff4de0a94&newSignature=f79b6a4f1f8f53326d3056f6c8008c0ff4
> de0a94&framework=12&selectedTimeRange=172800

By forcing all system compartment sandboxes to be loaded in a unique compartment you actually fixed a long standing compartment issue that I was looking for fixing once this bug was addressed, so thanks for fixing everything at once \o/

DevTools are using two kinds of loaders. Each loader is using a shared global for all the sandboxes/modules so that we are not having cross compartments between two modules of a given loader. But as we started using React, we had to introduce another kind of loader, with a global specific to each panel (console, debugger, ...), exposing the panel's document's globals.
Our long standing issue was that calls between these two kinds of loaders were having cross compartment wrapper's overhead.

But that is only one long standing issue. We are still having another one, which is expected to be even more significant than the one you already fixed. We are still having cross compartments overhead between any of our loader/module and the tool panels. It is especially expensive when react (loaded in a module) interact with the DOM of the panel.
In order to mitigate that, we are trying to migrate React to be loaded from the panel, via a script tag or an es module.
But no matter what we do, we always end up with cross compartment wrappers at one border.
So, I was wondering if, once this bug is fixed, we could also have the panel documents to be sharing the same compartment?
My current comprehension of your patch is that it only impact JSM and Sandboxes, but could that also apply to chrome documents?

Thanks a lot for driving this work!

If there is anything that needs attention for this bug from the DevTools codebase, I would be happy to coordinate.
(Note that I tried to get rid of Promise.jsm completely from DevTools, but got stuck because of these long standing perf issues... So it looks like we may finally be able to remove them soon!)
(In reply to Alexandre Poirot [:ochameau] from comment #19)
> So, I was wondering if, once this bug is fixed, we could also have the panel
> documents to be sharing the same compartment?

These are also using the system principal? After we fix the remaining underlying issues in the dependent bugs and enable this for sandboxes, it will hopefully be easy to add more system-principal globals to this shared compartment.
(In reply to Jan de Mooij [:jandem] from comment #20)
> (In reply to Alexandre Poirot [:ochameau] from comment #19)
> > So, I was wondering if, once this bug is fixed, we could also have the panel
> > documents to be sharing the same compartment?
> 
> These are also using the system principal? After we fix the remaining
> underlying issues in the dependent bugs and enable this for sandboxes, it
> will hopefully be easy to add more system-principal globals to this shared
> compartment.

Yes, they are still using system principal as some are still using XUL.
They are loaded via chrome URI, like:
  chrome://devtools/content/debugger/new/index.html
> it will hopefully be easy to add more system-principal globals to this shared compartment.

It should be, yes.  SelectZone in nsGlobalWindowOuter.cpp is where the relevant logic lives.  The caller has a principal that it could pass into there.
Depends on: 1513277
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Summary: Enable same-compartment realms for system realms → Enable same-compartment realms for system-principal sandboxes
Because it release-asserts the compartment has a single realm.

I also renamed JS_GetCompartmentPrincipals to JS_DeprecatedGetCompartmentPrincipals
to discourage people from using it.
The debuggee and debugger must be in separate compartments. I tried using a null
principal for the debuggee, but that doesn't work because some of these tests
use Components or do interesting principal-related things with the sandbox (see
test_objectgrips-17.js for an example of this).

Depends on D14253
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #22)
> SelectZone in nsGlobalWindowOuter.cpp is where the
> relevant logic lives.  The caller has a principal that it could pass into
> there.

I tried this but we fail the ToWindowIfWindowProxy(obj) == cx_->global() assert in CacheIR.cpp during startup. This depends on the WindowProxy work I guess? It's pretty nice we get that far though :)

The patches here are all green:

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=59d5d27a7c8b2ca536fdde18617e3aadd52e172f
Attachment #9030708 - Attachment description: Bug 1512029 part 3 - Add a newCompartment sandbox option and use it in devtools xpconnect tests. r?bzbarsky → Bug 1512029 part 3 - Use freshZone for debuggee sandbox created in devtools xpconnect tests. r?bzbarsky
Priority: -- → P2
"Allocating" all CCWs in the same realm (bug 1469082) makes this a bigger win on damp than in comment 17:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=46f41eb8013f75875abba9b99668486a0c045225&originalSignature=f79b6a4f1f8f53326d3056f6c8008c0ff4de0a94&newSignature=f79b6a4f1f8f53326d3056f6c8008c0ff4de0a94&framework=12&selectedTimeRange=172800

-77% on jsdebugger.stepIn.DAMP?

Maybe we're now able to GC sandboxes faster. Tomorrow I'll try to run this benchmark locally to collect some data on this, it's pretty intriguing.
> This depends on the WindowProxy work I guess?

Yes, but shouldn't depend on the Gecko bits.  I think bug 1467124 (and especially the last sentence of bug 1467124 comment 0) more or less covers what needs to happen to make this assertion not fire.
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #29)
> Yes, but shouldn't depend on the Gecko bits.  I think bug 1467124 (and
> especially the last sentence of bug 1467124 comment 0) more or less covers
> what needs to happen to make this assertion not fire.

Oh right, I'll look into that after this lands. I guess I should just disable these IC stubs for now for my Try experiments :)
(In reply to Jan de Mooij [:jandem] from comment #28)
> "Allocating" all CCWs in the same realm (bug 1469082) makes this a bigger
> win on damp than in comment 17:
> 
> https://treeherder.mozilla.org/perf.html#/
> comparesubtest?originalProject=mozilla-
> central&newProject=try&newRevision=46f41eb8013f75875abba9b99668486a0c045225&o
> riginalSignature=f79b6a4f1f8f53326d3056f6c8008c0ff4de0a94&newSignature=f79b6a
> 4f1f8f53326d3056f6c8008c0ff4de0a94&framework=12&selectedTimeRange=172800
> 
> -77% on jsdebugger.stepIn.DAMP?

You have to set a custom base revision. Here you are comparing against the last 2 days.
This "compare against last x days" is handy to ensure if there is a regression or not,
but if something is reported, it may just relate to some recent fixes/regression that happened in these X days.
Then it is safer to use a custom base revision.

To do that, push you base mozilla-central revision to try:
  ./mach try fuzzy --query "'damp 'linux64/" --rebuild 6

And on this page, select try on the left box and click on compare against specific revision:
  https://treeherder.mozilla.org/perf.html#/comparechooser
And enter the base try revision you just pushed.

The debugger regressed a lot (up to 800% for stepIn) via bug 1511352 and I know they are trying to land fixes for that these days... See bug 1513737, this report a 75% win on stepIn.
(In reply to Alexandre Poirot [:ochameau] from comment #31)
> The debugger regressed a lot (up to 800% for stepIn) via bug 1511352 and I
> know they are trying to land fixes for that these days... See bug 1513737,
> this report a 75% win on stepIn.

Yeah I profiled this earlier today and, long story short, my patch happens to prevent the shift() slowness. I reopened bug 1513702 and will fix this in the JS engine at some point.
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/091649f047ef
part 1 - Stop calling JS_GetCompartmentPrincipals for system compartments. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/4b2e268f6df4
part 2 - Some CompartmentPrivate changes for same-compartment realms. r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/871140f00537
part 3 - Use freshZone for debuggee sandbox created in devtools xpconnect tests. r=bzbarsky
The main patch still has to land.
Keywords: leave-open
Depends on: 1515290
Blocks: 1514210
Keywords: leave-open
Note: this will initially be disabled for the devtools loader sandbox (patch is in bug 1514210). It requires some devtools test fixes that Alexandre offered to take on in bug 1515290. Bug 1517210 is on file to re-enable once that's done.
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ba685536334
part 4 - Use the privileged junk scope's compartment for sandboxes created with the system principal. r=bzbarsky
https://hg.mozilla.org/mozilla-central/rev/9ba685536334
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.