Last Comment Bug 650353 - (cpg) have one global object per compartment
(cpg)
: have one global object per compartment
Status: RESOLVED FIXED
: dev-doc-needed
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal with 8 votes (vote)
: mozilla15
Assigned To: Luke Wagner [:luke]
:
Mentors:
: 631135 671352 (view as bug list)
Depends on: 752200 774119 839967 596351 650411 654646 667388 674195 683361 688069 720580 720753 722594 722775 729584 729589 731471 731804 731845 733984 734215 734475 734910 735544 735968 736316 737245 738874 739432 739451 739796 743034 743882 744034 746021 746221 747749 749617 750183 750269 751086 751101 751424 751505 751575 751873 751971 751995 752038 752259 752281 752309 752335 752340 752764 752865 753162 753557 754156 754600 757639 761422 761600 762215 763225 771202 781476 782899 CVE-2012-4212 786801 792280
Blocks: 661970 689343 750185 625199 638316 CVE-2012-3985 668558 673248 679939 680784 683069 686079 687724 709529 713813 721039 738080 742444 749173 751283 755158 755186 757332
  Show dependency treegraph
 
Reported: 2011-04-15 12:06 PDT by Luke Wagner [:luke]
Modified: 2013-03-09 15:09 PST (History)
47 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
empty arena distribution before c-p-g, after, and after-with-2K-arenas (1.33 KB, text/plain)
2011-08-25 18:35 PDT, Luke Wagner [:luke]
no flags Details
combined diff (WIP) (39.50 KB, patch)
2012-02-10 14:46 PST, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
Clean up the compartment private. (1.04 KB, patch)
2012-02-24 17:12 PST, Bobby Holley (:bholley) (busy with Stylo)
no flags Details | Diff | Splinter Review
Bug 650353 - Clean up the compartment private. v2 (1.44 KB, patch)
2012-02-29 13:15 PST, Bobby Holley (:bholley) (busy with Stylo)
no flags Details | Diff | Splinter Review
Implement Compartment-Per-Global in XPConnect. v3 (24.43 KB, patch)
2012-03-16 11:15 PDT, Bobby Holley (:bholley) (busy with Stylo)
mrbkap: review+
Details | Diff | Splinter Review

Description Luke Wagner [:luke] 2011-04-15 12:06:59 PDT
There is a never-ending stream of goodness that would flow from having a 1:1 relationship between global objects and compartments.  A few immediate ones:
 - avoid tricky XPCNativeScope/global invariants: just stick XPCNativeScope on the compartment
 - we can bake more pointers into jit code: global object, address of caches, address of initial .prototypes.
 - we can use compartment as "native goop associated with globals" instead of trying to pack it via the reserved-slot scheme in the global JSObject
 - security principals for objects becomes obj->compartment()->principals
 - obj->getGlobal() isn't a loop.  We probably don't even need this function in lieu of faster access via cx->compartment->global.
 - bug 631135 (I think) and a few existing "crossing global objects" bits of logic go away
 - maybe some of this "implicit this" logic can be taken off the hot path in lieu of something on the wrapper path?

... and I know I'm forgetting a few others.

The concern is: that's potentially a lot more compartments, what about all the big stuff we store in compartments?  Well, with bug 649537, we can just stick them in the (single-threaded) compartment.  Gregor Wagner apparently has measured the diff in cross-compartment wrappers and found that not too bad either.
Comment 1 Mike Moening 2011-04-15 15:20:43 PDT
Luke, Since you pointed me here... from another bug.

In our web server we have at least two globals in play at all times.
They are chained together.
The "session global" is the parent of the "request global" via prototype chaining.

This allows us to re-use standard classes and solve the dreaded "instanceOf" bugs when a client hits the server a 2nd time after saving something in the session.

Each request can put temporary stuff on its request global (dies with each request), but can also store stuff in the session global for future returns to the web server.

Standard classes ALWAYS exist in only the session global. Never in the request global.

We haven't switched to the new JSAPI compartments yet...trying to let the dust settle and waiting for per compartment GC to be completed.

Having 1 global per compartment might mess this scenario up pretty bad.

Ideas?
Comment 2 Luke Wagner [:luke] 2011-04-15 15:22:47 PDT
(Fixing dependencies)
Comment 3 Luke Wagner [:luke] 2011-04-15 15:26:40 PDT
MikeM: sorry for this recent flurry of bugs, just trying to get the dependency graph of work items set up.  (That is a good question, lemme think a bit about an answer or (hopefully) someone else can jump in.)
Comment 4 Luke Wagner [:luke] 2011-04-15 15:54:00 PDT
I just talked to jorendorff about this.  So, IIUC, you have a global which you want to live a long time, and a "temporary global", which you want to collect defs and throw away after each request.  I think that is exactly the use of the "var obj", which is an object that sits on the scope chain in front of the global and can be disposed of while keeping the global for future script execution.  Wes does this and explains his use here:
  http://groups.google.com/group/mozilla.dev.tech.js-engine/msg/f6a4b35232ff1572?pli=1)

Do you think that would work for you?
Comment 5 Mike Moening 2011-04-15 16:15:56 PDT
I'm not sure it would work.  I think I reviewed all possible solutions back when wrote it a few years back, including the CommonJS stuff Wes did. I know I re-wrote it at least 5 times.

Unfortunately its more complex than I described. We have two resolver hooks in the mix and the request global has different options than the session global to prevent standard classes from being cached there.  That "memoized" standard class thing is total disaster for those with resolver hooks in place.  I've got a few bugs filed in the past on this alone.

Any code run on the request global CANNOT save properties in the session global.
To access the session global directly in the script we can do this:

Session.global.blah = "Save Me";

Which adds a property to the session global.
It gets even more complex if you do this:

var localObject="put object/function or something here";
Session.global.blah = localObject;

In that case the object living on the request global is saved to the session global.  That took some special magic to keep it from falling out of scope when the request dies.

It's a complex problem but is currently working well for us.
There are many many test cases I can't even remember them all.

I'm certainly willing to change things.  But it needs to be researched & tested thoroughly before we can adopt it for production code.  So would it work?  I don't know.

Do you know if compartment level GC is done yet?  We were waiting for that before upgrading.
Comment 6 Luke Wagner [:luke] 2011-04-15 16:33:02 PDT
Well, except perhaps for the resolve hooks, nothing you mentioned (incl. the code sample) seems to be deal-breaker.  Of course I don't understand all your constraints.

Per-compartment GC (bug 605662) shipped with FF4 :)
Background compartment GC (GC one compartment while another is running) is in progress in bug 616927.  (Incidentally, you can follow the dependencies back from bug 616927 to this bug.)
Comment 7 Luke Wagner [:luke] 2011-05-17 12:21:30 PDT
More things this bug enables:
 - removing obj->parent
 - can remove findObjectPrincipals hook (and calls) in js
 - document.domain isn't such a corner case
Comment 8 Luke Wagner [:luke] 2011-06-10 18:13:03 PDT
Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have the property and is it permanent" optimization check in the jit at method-compile time.
Comment 9 Brian Hackett (:bhackett) 2011-06-10 18:19:31 PDT
(In reply to comment #8)
> Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting
> JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have
> the property and is it permanent" optimization check in the jit at
> method-compile time.

Can't we do this already?  For compileAndGo code we have the global the script will run against and can check its properties while compiling.  Also, in TI we statically guard against global properties being deleted or reconfigured, as well as against reallocation of the global slots.  This allows GNAME and GLOBAL opcodes to run at the same speed, so there might not be much use in keeping GLOBAL anyways.
Comment 11 Luke Wagner [:luke] 2011-06-10 19:56:34 PDT
Haha, good point.  I just assumed there was some reason but I didn't really think about it.
Comment 12 Boris Zbarsky [:bz] 2011-06-21 08:48:16 PDT
How will this affect compat (e.g. with Components.utils.import, which currently creates globals but not compartments)?

See bug 665810 for some concerns about our cross-compartment proxies (e.g. can't reflect XML objects)...
Comment 13 Luke Wagner [:luke] 2011-06-21 08:58:47 PDT
For the former question: compartments would need to be inserted in all those places.

For the latter: yes, this has been the concern and I think it will force us to fix proxies enough for what needs to work.  Jason has pointed out that there is engine work needed to correctly recognize, e.g., arrays from other compartments as arrays.  I think compartment-per-domain and the highly-restrictive sharing this entails has protected us putting full weight on the transparency of proxies.  So noone expects this to be easy or soon.

As for E4X, I would hate to see it be the only thing standing in the way of compartment-per-global...
Comment 14 Boris Zbarsky [:bz] 2011-06-21 09:49:34 PDT
> I would hate to see it be the only thing standing in the way

I would too, and I don't think it should unless there are actual Firefox extension compat issues involved.  If there are, though, we have a problem.
Comment 15 Blake Kaplan (:mrbkap) 2011-06-22 08:14:52 PDT
For Components.utils.import and friends, I was assuming that we'd still have one chrome compartment (aside from sandboxes).
Comment 16 Luke Wagner [:luke] 2011-08-25 18:35:21 PDT
Created attachment 555919 [details]
empty arena distribution before c-p-g, after, and after-with-2K-arenas

I got an incomplete version of CPG working by doing the obvious change to xpc_Create(MT)GlobalObject (this leaves out globals created via xpc_NewSystemInheritingJSObject when we wrap an nsISupports as a global).  Surprisingly, there was only a single cross-compartment assertion to fix (XUL proto cache).

With this, I was able to do basic memory/perf measurements to asses the impact of CPG.  I also included the patch from bug 675078 and a patch to hoist stuff out of JSCompartments into JSRuntime (allowed by bug 650411).

The main concerns were:
 1. increased memory usage due to cross-compartment cloning
 2. increased memory usage due to more, separate arena lists
 3. decreased perf due to cross-compartment calls.

In particular, for (3) I was worried about Dromaeo since I thought it did lots of nasty cross-global calls.  However, Dromaeo (JS and DOM tests) shows no drop and maybe a slight speedup.  V8 has high variance but seems to show a 100 point improvement (~7100 to ~7200).  SunSpider seems to get slightly worse (4ms), but not in the shell.

For 1 and 2, I used about:memory (after two "Minimize memory" clicks) to measure startup, 1 GMail tab, and 20 tech news tabs opened from news.google.com.
 One way to mitigate 2 is to shrink arenas down to 2KB, so I measured CPG with 2 and 4KB arenas.  Igor is apparently already wanting this for bug 671702.  V8 and SS did get a perf drop, but Igor is also looking at this (bug 681884).

                    Before CPG   After 2KB   After 4KB
Startup:
explicit            27.2M        28.6M       28.6M
js                  21.2M        22.2M       22.8M
js-gc-heap          8M           12M         13M
arena-unused        .8M          1.8M        3M
chunk-dirty-unused  1.7M         3.4M        3.5M

GMail:
explicit            66.4M        67.7M       66.3M
js                  55M          56.1M       56.6M
js-gc-heap          24M          28M         29M
arena-unused        3.6M         4.5M        4.6M
chunk-dirty-unused  3.5M         4.6M        4.6M

20x tech news articles
explicit            589M         586M        599M
js                  469M         468M        483M
js-gc-heap          217M         250M        253M
arena-unused        29.3M        42.8M       52.8M
chunk-dirty-unused  5.1M         12.5M       10.6M

Observations:

Total explicit allocation: not terribly effected.  In fact, 2K-arena CPG has less explicit/js allocation than trunk, despite having 33M more gc-heap!  I'm fairly certain this is the hoisting from JSCompartment to JSRuntime which saves memory on jaeger compartments and trace monitors.

The 2K arenas help arena-unused.  Attached is the distribution of full arenas, which show that 2K is also not packing gc things as well as 4K and hence may benefit from further twiddling.  However, 2K arenas hurt chunk-dirty-unused; this could be further improved by giving arenas an "affinity" toward chunks allocated/shared by a domain (this is equivalent to bug 671702 comment 61).

So this is positive for CPG.  There is still more work (mentioned above) to get CPG to actually give the necessary 1:1 invariant that makes all the magic happen; also work will be needed to fix bugs resulting from proxies not being totally transparent (obj->getClass() == &js_SomeClass, I'm looking at you).
Comment 17 Nicholas Nethercote [:njn] 2011-08-25 18:54:53 PDT
> Total explicit allocation: not terribly effected.  In fact, 2K-arena CPG has
> less explicit/js allocation than trunk, despite having 33M more gc-heap! 
> I'm fairly certain this is the hoisting from JSCompartment to JSRuntime
> which saves memory on jaeger compartments and trace monitors.

So you've already done that hoisting in the patch?

Yeah, those results aren't bad.  A good stress test is http://gregor-wagner.com/tmp/mem.
Comment 18 Boris Zbarsky [:bz] 2011-08-25 19:17:35 PDT
Dromaeo does do a bunch of cross-global calls, but they don't dominate the benchark, which is good in this case.

I'd be interested in measuring how the the performance of walking a same-origin subframe's DOM is affected (read: gmail).  So maybe just grab the source of http://www.whatwg.org/specs/web-apps/current-work/, stick it in a local file, create another local file pointing an iframe at it, walk over the DOM from script in the parent page, and measure how long that takes before and after the CPG changes?  That should be a worst-case of sorts for cross-global calls, I'd think.
Comment 19 Luke Wagner [:luke] 2011-08-26 09:05:32 PDT
(In reply to Nicholas Nethercote [:njn] from comment #17)
> Yeah, those results aren't bad.  A good stress test is
> http://gregor-wagner.com/tmp/mem.

Ah, yes, I forgot about that.  Good results (for 2K arenas, at least); similar to 20x news tabs:

                    Before CPG   After 2KB   After 4KB
explicit            1.50G        1.46G       1.58G
js                  1.11G        1.05G       1.18G
js-gc-heap          521M         552M        665M
arena-unused        109M         108.6M      152M
chunk-dirty-unused  18.7M        20.5M       23.7M
Comment 20 Luke Wagner [:luke] 2011-08-26 09:57:51 PDT
(In reply to Boris Zbarsky (:bz) from comment #18)
Good idea.  I iterated over all the children via childNodes and build a string containing triplets (nodeName, nodeType, nodeValue) (each of which is a cross-compartment access/call, IIUC).

Pretty harsh results: recursion time went from 1530ms to 3400ms.

So it's pretty clear we'll need jit fast paths for transparent cross-compartment wrappers.  Gal seems to be wanting this anyway for new-fangled proxy-based dom.  The question is whether it blocks CPG.
Comment 21 Boris Zbarsky [:bz] 2011-08-26 10:07:10 PDT
What does the corresponding number look like in other browsers?

My gut feeling is that it need not block as long as we actually commit to doing it and do it.... but I was serious about the gmail thing; cross-frame access is pretty common on the web.  The question is whether the difference is felt in practice (e.g. when using gmail on a phone).
Comment 22 Luke Wagner [:luke] 2011-08-26 17:54:43 PDT
(In reply to Boris Zbarsky (:bz) from comment #21)
> What does the corresponding number look like in other browsers?

Chrome gets anywhere from 1200ms to 1400ms.

> My gut feeling is that it need not block as long as we actually commit to
> doing it and do it.... but I was serious about the gmail thing; cross-frame
> access is pretty common on the web.  The question is whether the difference
> is felt in practice (e.g. when using gmail on a phone).

I made a counter that incs every time a JSCrossCompartmentWrapper::* function was called.  Starting gmail triggers 10K such calls.  Clicking "All Mail" triggers 3K.  1K to click "New Mail".  10K cross-compartment calls in the shell measures 2-3ms, so the problem is real but it seems ignorable for GMail.  Note that CPG allows bug 625199 which decreases this cost further: enter/leaving a compartment is just changing cx->compartment, no frame push/pop.
Comment 23 Boris Zbarsky [:bz] 2011-08-26 18:30:54 PDT
> 10K cross-compartment calls in the shell measures 2-3ms

Shell cross-compartment calls don't have to do security checks, unlike browser ones, right?  It's worth measuring the browser number here too, just to be sure.

But yes, with any luck that's low enough too and we can just get on with this change.
Comment 24 Luke Wagner [:luke] 2011-08-26 20:07:47 PDT
If a same-domain wrapper is doing security checks, it would seem like we're doing something wrong.  I'll double-check though.
Comment 25 Blake Kaplan (:mrbkap) 2011-08-27 09:22:25 PDT
(In reply to Boris Zbarsky (:bz) from comment #23)
> > 10K cross-compartment calls in the shell measures 2-3ms
> 
> Shell cross-compartment calls don't have to do security checks, unlike
> browser ones, right?  It's worth measuring the browser number here too, just
> to be sure.

The only times we do security checks for cross compartment calls are for cases involving either document.domain or UniversalXPConnect. Otherwise, we don't do any dynamic checking at the call site (the computation of which wrapper that we chose suffices in other cases).
Comment 26 Luke Wagner [:luke] 2011-09-27 15:03:45 PDT
*** Bug 631135 has been marked as a duplicate of this bug. ***
Comment 27 Eric Shepherd [:sheppy] 2011-10-27 10:49:40 PDT
Note to self for docs later: need to update the debugger API guide info on global ojects.
Comment 28 Boris Zbarsky [:bz] 2011-11-30 12:07:22 PST
So the main cost of the cross-compartment calls, if security checks are not an issue, is going to be lack of ICs and consequent slow paths through the proxy and the DOM binding, right?

Is there any way we can special-case any of that stuff to keep IC goodness and a fast path into the DOM?  I guess we still need to wrap things on the way out as needed, so can't just punch through the cross-compartment proxy entirely in jitcode... :(
Comment 29 Boris Zbarsky [:bz] 2011-11-30 12:08:35 PST
Also, things that used to be same-origin (and hence not needing a security check) can stop being so when document.domain is set.  We have a bug on that...
Comment 30 Luke Wagner [:luke] 2011-11-30 17:07:08 PST
(In reply to Boris Zbarsky (:bz) from comment #28)
> I guess we still need to wrap things on the
> way out as needed, so can't just punch through the cross-compartment proxy
> entirely in jitcode... :(

If need be, we could inline the crossCompartmentWrappers.lookup so that, on hit, we stay entirely in jit code.  That only helps if the same object is being wrapped, though, so it wouldn't help the "traverse the w3c spec" in comment 20 (since each call would wrap a new node).

(In reply to Boris Zbarsky (:bz) from comment #29)
> Also, things that used to be same-origin (and hence not needing a security
> check) can stop being so when document.domain is set.  We have a bug on
> that...

In that case, since proxies are already in place, it seems like we should be able to go around and update those proxies' ProxyHandlers to be security wrappers.
Comment 31 Andrew McCreight [:mccr8] 2011-12-01 16:06:07 PST
Luke, what sort of impact with this have on the heap graph?  I suppose it will make it strictly less connected than before, because there are more global objects?  I'm just trying to think about if this might have any negative effect on the cycle collector.
Comment 32 Luke Wagner [:luke] 2011-12-01 19:10:54 PST
(In reply to Andrew McCreight [:mccr8] from comment #31)
The graph structure should remain the same modulo extra proxies injected between objects that used to be different-global-but-same-compartment and are now different-global-different-compartment.  I'm not sure how these (transparent, non-security) wrappers interact with the cycle collector.  

Another difference is that there will be a bunch more compartments; I'm not sure if that adds any overhead to the CC algorithm.
Comment 33 Andrew McCreight [:mccr8] 2011-12-01 19:14:21 PST
Ah, I see.  Thanks!  The extra wrappers will probably add a few extra nodes, but the CC overhead is probably the least of your problems.

I think the CC is pretty much blind to compartments, except that XPConnect root traversal involves iterating over roots, so I guess it could end up spending more time traversing, but that's a pretty insignificant part of the time CC takes.
Comment 34 Andrew McCreight [:mccr8] 2011-12-01 19:15:18 PST
> involves iterating over roots

That should be "involves iterating over compartments".
Comment 35 Luke Wagner [:luke] 2012-02-10 14:46:27 PST
Created attachment 596175 [details] [diff] [review]
combined diff (WIP)

I just wanted to post a folded set of patches that let a basic c-p-g start and run.  The patch browses the web without issue.  With this applied atop the patch in bug 720580, we can push again to try and identify remaining issues exposed by mochitests.
Comment 36 Nicholas Nethercote [:njn] 2012-02-19 14:21:31 PST
This just occurred to me:  zombie compartments are pretty common, especially in add-ons.  The most common case is when a single global reference holds onto a particular compartment.  When that reference gets updated subsequently, the old zombie will be released and a new one created, i.e. only one zombie compartment is alive at any time.  Bug 669845 is a good example.

With CPG, each compartment will, on average, contain much less data.  So the impact of such zombie compartments will be less.  Nice!
Comment 37 Bobby Holley (:bholley) (busy with Stylo) 2012-02-24 17:12:22 PST
Created attachment 600576 [details] [diff] [review]
Clean up the compartment private.

This was the source of a leak with the c-p-g patch. The old code subtly cleaned up the compartment private with an nsAutoPtr. I've made it more explicit here.
Comment 38 Luke Wagner [:luke] 2012-02-24 17:16:12 PST
Ouch, nice catch!  Any more work to do in bug 729589?
Comment 39 Bobby Holley (:bholley) (busy with Stylo) 2012-02-24 17:16:59 PST
(In reply to Luke Wagner [:luke] from comment #38)
> Ouch, nice catch!  Any more work to do in bug 729589?

I've still got a failure in a cross-origin location XPConnect test that I need to ask someone about on monday, but otherwise the XPConnect test suite all seems to pass. Pushed this whole thing to try to get a better sense of how close we are:

https://tbpl.mozilla.org/?tree=Try&rev=5e535dd9d9f9
Comment 40 Bobby Holley (:bholley) (busy with Stylo) 2012-02-24 18:28:47 PST
Last push was borked - trying again: https://tbpl.mozilla.org/?tree=Try&rev=bef6f0742ea5
Comment 41 Bobby Holley (:bholley) (busy with Stylo) 2012-02-29 13:15:21 PST
Created attachment 601733 [details] [diff] [review]
Bug 650353 - Clean up the compartment private. v2

Looks like I can't assert what I thought I could assert in the compartment private cleanup patch. Updating.
Comment 42 :Ms2ger (⌚ UTC+1/+2) 2012-02-29 13:18:18 PST
Comment on attachment 601733 [details] [diff] [review]
Bug 650353 - Clean up the compartment private. v2

Will need rebasing over bug 730281, and static_cast please.
Comment 43 Bobby Holley (:bholley) (busy with Stylo) 2012-02-29 18:00:40 PST
I've been doing a lot of fixing, so here's another try push: https://tbpl.mozilla.org/?tree=Try&rev=742de674fd1f
Comment 44 Bobby Holley (:bholley) (busy with Stylo) 2012-03-07 15:33:01 PST
I'm maintaining a github branch for my current CPG commit stack (dependent patches and luke's patches). I'll be adding new fixes underneath the final patch, and removing the ones merged upstream when I rebase.

Follow along if you feel so inclined:

https://github.com/bholley/mozilla-central/commits/cpgtrain
Comment 45 Bobby Holley (:bholley) (busy with Stylo) 2012-03-09 14:57:49 PST
Gave this another try push:

https://tbpl.mozilla.org/?tree=Try&rev=b836b7809e8f

The only unresolved issue from the last try push is the cross-compartment stack thing (the CAPS test failure). So with any luck, this might look pretty green...
Comment 46 Bobby Holley (:bholley) (busy with Stylo) 2012-03-12 09:57:37 PDT
Arg, hit another bug (bug 734910)

Pushed again for linux and macosx64, since those build fast: https://tbpl.mozilla.org/?tree=Try&rev=ef777f8211a0
Comment 47 Bobby Holley (:bholley) (busy with Stylo) 2012-03-12 17:10:47 PDT
https://tbpl.mozilla.org/?tree=Try&rev=c1ea48f3c440
Comment 48 Bobby Holley (:bholley) (busy with Stylo) 2012-03-13 19:58:37 PDT
https://tbpl.mozilla.org/?tree=Try&rev=99e4053c4f2b
Comment 49 Bobby Holley (:bholley) (busy with Stylo) 2012-03-15 17:04:39 PDT
https://tbpl.mozilla.org/?tree=Try&rev=bc121fd5159a
Comment 50 Bobby Holley (:bholley) (busy with Stylo) 2012-03-16 11:15:47 PDT
Created attachment 606646 [details] [diff] [review]
Implement Compartment-Per-Global in XPConnect. v3

This patch is pretty much finalized. There are still a few mochitest-browser-chrome test failures, but they'll almost certainly be separate patches (possibly to the chrome code itself). Flagging mrbkap for review.
Comment 51 Blake Kaplan (:mrbkap) 2012-03-22 08:10:33 PDT
Comment on attachment 606646 [details] [diff] [review]
Implement Compartment-Per-Global in XPConnect. v3

Review of attachment 606646 [details] [diff] [review]:
-----------------------------------------------------------------

Straightforward. I like it!

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +262,3 @@
>  
> +    // Get the current compartment private.
> +    xpc::CompartmentPrivate *priv =

Why prefer the explicit delete to the AutoPtr?
Comment 52 Bobby Holley (:bholley) (busy with Stylo) 2012-03-23 18:22:40 PDT
https://tbpl.mozilla.org/?tree=Try&rev=7c454ec7fafd

Note that this stack will still break on the devtools tests and the webgl tests. The former I know the fix for but haven't yet written, and the latter is bug 737245 (typed array wrapper handling). I had a patch stack for that, but it wasn't really the right approach, and more architecture work was required. So sfink and evilpies are working on it, and will hopefully land their stuff mid next week.
Comment 53 Bobby Holley (:bholley) (busy with Stylo) 2012-03-23 23:20:30 PDT
Various failures that might be because of bug 738874. Did a push without it:

https://tbpl.mozilla.org/?tree=Try&rev=ba083543a99e
Comment 54 Bobby Holley (:bholley) (busy with Stylo) 2012-03-27 16:51:10 PDT
https://tbpl.mozilla.org/?tree=Try&rev=f886d566a87a

This is still waiting on some spidermonkey work by sfink and some Places debugging by mak.
Comment 55 Bobby Holley (:bholley) (busy with Stylo) 2012-04-06 12:40:21 PDT
https://tbpl.mozilla.org/?tree=Try&rev=df82f0071e57
Comment 56 Bobby Holley (:bholley) (busy with Stylo) 2012-04-06 13:50:30 PDT
Try broke this morning. Pushing again...

https://tbpl.mozilla.org/?tree=Try&rev=8eebe80e23f9
Comment 57 Bobby Holley (:bholley) (busy with Stylo) 2012-04-06 18:02:11 PDT
Alright, looking better. The typedarray stuff is still in an inconsistent state, so it's crashing the mochitest-o runs. Added a push with those commented out, and perf tests enabled:

https://tbpl.mozilla.org/?tree=Try&rev=42fa3aa03f84

And, for comparison, a baseline Talos run:
https://tbpl.mozilla.org/?tree=Try&rev=99d0b7c3668a
Comment 58 Bobby Holley (:bholley) (busy with Stylo) 2012-04-07 00:17:18 PDT
Oh, hm. 705.39->358.37 on dromaeo? I thought comment 16 indicated that dromaeo was fine. Ah, CPG...

I'll fire up a profiler on monday and see what's up.
Comment 59 Boris Zbarsky [:bz] 2012-04-07 11:00:48 PDT
Comment 16 seems to only cover a subset of the Dromaeo tests.  Specifically the DOM and jslib ones.  http://perf.snarkfest.net/compare-talos/index.html?oldRevs=42fa3aa03f84&newRev=99d0b7c3668a&submit=true shows that those are only a few % different with the patch.  But the "basics" tests are about 2x slower as you note, the "css" tests are 10% slower, and the Dromaeo versions of "sunspider" and "v8" are about 2.5 and 3x slower respectively.  That's overall.  Some subtests are hit more than others.

I'd probably start by profiling the dromaeo version of Sunspider's string-fasta, which got about 8x slower....
Comment 60 Boris Zbarsky [:bz] 2012-04-07 11:02:06 PDT
Comment 20 may be relevant here, btw.
Comment 61 Luke Wagner [:luke] 2012-04-09 10:41:40 PDT
I don't think comment 20 is what's happening: comment 20 did nothing but touch objects from another compartment.  For Dromaeo SS, there is the cross-compartment call per test-run, but otherwise there shouldn't any cross-compartment touching.  I looked at a profile with bholley and indeed there was no time under JSCompartment::wrap or CrossCompartmentWrapper.  It looks like we are spending a lot of time in SetElem and iterator stubs that should ordinarily be in jit fast-path code.  That suggests some fast-path pre-condition is being violated.  I still don't know why, so I'll have to dig in.
Comment 62 Luke Wagner [:luke] 2012-04-10 01:58:37 PDT
Hah!  So I was working my way back from "why are we calling stubs::Iter in string-fasta so much" until I realized that TI is completely disabled for the inner dromaeo iframe.  Slowlarity ensues.

The way TI is disabled is that, when the iframe's compartment is created, !(cx->runOptions & JSOPTION_TYPE_INFERENCE), which sets newCompartment->type.inferenceEnabled = false.  In theory, this setting gets set on the new context by JSOptionChangedCallback, but that isn't happening (or perhaps its happening after the new compartment is created?).  I tried to understand what nsJSContext is doing but it's all rather entwined.  Bug 742497 can't come soon enough.
Comment 63 Boris Zbarsky [:bz] 2012-04-10 07:52:28 PDT
If you breakpoint in xpc_CreateGlobalObject conditioned on !(cx->runOptions & JSOPTION_TYPE_INFERENCE), what's the stack look like?
Comment 64 Bobby Holley (:bholley) (busy with Stylo) 2012-04-10 09:15:55 PDT
Let's move the perf discussion over to bug 744034.
Comment 65 Nicholas Nethercote [:njn] 2012-04-17 17:49:13 PDT
*** Bug 671352 has been marked as a duplicate of this bug. ***
Comment 66 Bobby Holley (:bholley) (busy with Stylo) 2012-04-22 08:03:21 PDT
Try Push: https://tbpl.mozilla.org/?tree=Try&rev=8f2994ea0f7b

And a talos reference push: https://tbpl.mozilla.org/?tree=Try&rev=20892516ccc7

The last failure that I know about is bug 747749 (android reftest crashes). Let's hope that this try push doesn't reveal any new ones. ;-)
Comment 67 Bobby Holley (:bholley) (busy with Stylo) 2012-04-23 07:26:57 PDT
Alright, so things are looking pretty green. Here's what's left:

1 - The last test failure, bug 747749. I'm hoping luke will have some guesses here as to what's going on.

2 - Performance. I did some try runs, and here's what the numbers look like:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=20892516ccc7&newRev=8f2994ea0f7b&submit=true

The significant factors seem to be:
A: 3% dromaeo CSS regression.
B: 7-23% tpaint regression, depending on the platform.
C: A ~10% memory hit. AIUI, luke has patches to make that go away.

My gut feeling is that B might be unavoidable, since we're inherently doing a lot more work when we open windows (since we're always creating a compartment). But I don't have a ton of experience in this area, so I'm hoping that bz can weigh in.

As for A, I've done some profiling. The profiles look pretty similar, with the exception that 40% of the calls into the hot code come via JaegerShotAtSafePoint without CPG (with the other 60% coming through JaegerShot), but they _all_ come through JaegerShot with CPG. bhackett tells me that there isn't a performance difference, but I still think that this is somewhat suspicious since it indicates that the JIT is behaving differently in the two cases. bhackett suggests that someone investigate this and determine whether we're making more stub calls (which would indicate that we have to bail from the jit). Anyway, this isn't my area of expertise, so I'm hoping luke can look into it.

Apart from that, we should be just about ready to land. Onward!
Comment 68 Bobby Holley (:bholley) (busy with Stylo) 2012-04-26 07:02:11 PDT
https://tbpl.mozilla.org/?tree=Try&rev=3230c667f111

Theoretically, this should be totally green. Let's see if theory matches practice.
Comment 69 Luke Wagner [:luke] 2012-04-30 18:14:34 PDT
I looked into the cssquery-prototype regression.  There were no subtest outliers, most had a slowdown of 1-4%.  So, I focused on one subtest ($$("#speech5")) that has a reliable 4-5% slowdown.  Shark points to overhead from CCW calls for each execution of $$("#speech5").  To confirm this, I doubled/quadrupled the body of the test (to amortize the CCW call) and the slowdown decreased proportionately.  Thus, we have a short-running test that crosses globals 50K times.  To fix this, we need the CCW call ic mentioned earlier.  This wants IM and bug 625199 (which blocks on this bug).  Thus, I think we should eat this loss for now and fix CCW calls later.

Curiously, I couldn't get the overhead less than 1-2%.  Sharking $$("#speech5") executed 1000 times in a loop, I see all the CCW overhead gone, and instead the extra 1-2% is all in nsGenericElement::doQuerySelectorAll self time.  Any possible explanations bz?
Comment 70 Boris Zbarsky [:bz] 2012-04-30 18:28:53 PDT
Not offhand.  The only thing I can think of is that doQuerySelectorAll is very cache-sensitive (it's very tight-loop code mostly gated on how fast it can get data from the pointer-chase to the CPU), so if the CCW calls in between blow out the cache somehow....  I know it's a stretch.
Comment 71 Luke Wagner [:luke] 2012-04-30 22:09:35 PDT
I sent off some talos pushes to measure cpg when combined with bug 720753:
trunk:      https://tbpl.mozilla.org/?tree=Try&rev=36538bb034fb
bug 720753: https://tbpl.mozilla.org/?tree=Try&rev=94a57710a02b
cpg+720753: https://tbpl.mozilla.org/?tree=Try&rev=35de20b7eae9
Comment 72 Luke Wagner [:luke] 2012-05-01 09:38:21 PDT
Here's the talos results of trunk vs. cpg+720753:
  http://perf.snarkfest.net/compare-talos/index.html?oldRevs=36538bb034fb&newRev=35de20b7eae9&submit=true
It looks like bug 720753 decreases the tp4_rss regression from 6-8% (in comment 67) to 4-5%.

Note: GC arenas are still 4K.  comment 16 and 19 experimented with 2K arenas but, since those comments, this stopped being an option due to the GC decommit landing (which needs page-sized arenas).  That means we are still going to have a lot of under-utilized arenas as described in comment 16.  I talked about this will billm and we had a few plans to share GC arenas between compartments in many-small-compartment scenarios.  (This would be generally useful because it would avoid the tension between wanting to add more FinalizeKinds (for more GC-thing types or sizes of GC-things) and wanting fewer FinalizeKinds to decrease internal fragmentation.)
Comment 73 Bill McCloskey (:billm) 2012-05-01 10:33:30 PDT
Just to clarify about the arena thing:
I think it would be reasonable to put multiple FinalizeKinds in the same arena. I'm much more skeptical about sharing arenas between compartments. Doing so would require a ton of changes throughout the GC, since that assumption is baked in pretty deeply.
Comment 74 Bobby Holley (:bholley) (busy with Stylo) 2012-05-01 15:38:16 PDT
Gave this one final try push to make sure the tests are green: https://tbpl.mozilla.org/?tree=Try&rev=2b7d9a8bfa12

As for performance, it sounds like we're going to accept the 3% cssquery regression, as well as the 5% memory regression.

Luke's putting together a patch over in bug 749617 to mitigate the 10% tpaint regression (with the longer-term fix coming thereafter). If it's reviewed and green on try by tomorrow morning (europe time), I'll push with it. Otherwise, we'll do it as a followup.
Comment 75 Nicholas Nethercote [:njn] 2012-05-01 15:49:24 PDT
> I think it would be reasonable to put multiple FinalizeKinds in the same
> arena. I'm much more skeptical about sharing arenas between compartments.
> Doing so would require a ton of changes throughout the GC, since that
> assumption is baked in pretty deeply.

That sounds sensible to me.  Would the shared FinalizeKinds need to have the same GCThing size?
Comment 76 Boris Zbarsky [:bz] 2012-05-01 16:48:45 PDT
> As for performance, it sounds like we're going to accept the 3% cssquery regression

"Accept" as in "push to get it fixed in the JS engine", right?
Comment 77 Bobby Holley (:bholley) (busy with Stylo) 2012-05-02 00:53:49 PDT
(In reply to Boris Zbarsky (:bz) from comment #76)
> > As for performance, it sounds like we're going to accept the 3% cssquery regression
> 
> "Accept" as in "push to get it fixed in the JS engine", right?

The 'make-the-jit-CCW-aware' stuff? If so, yeah, I agree.
Comment 78 Bobby Holley (:bholley) (busy with Stylo) 2012-05-02 03:47:13 PDT
Sigh. More stupid leak stuff.

Two more identical pushes to try, so that we can be more sure about any potentially intermittent stuff:
https://tbpl.mozilla.org/?tree=Try&rev=7daa9d995250
https://tbpl.mozilla.org/?tree=Try&rev=295f04b65f06
Comment 79 Bobby Holley (:bholley) (busy with Stylo) 2012-05-03 00:14:41 PDT
Everything looks good, so I'm pushing to inbound. Here goes...

http://hg.mozilla.org/integration/mozilla-inbound/rev/bed8c4e3dfdf
Comment 80 :Ehsan Akhgari (away Aug 1-5) 2012-05-03 09:40:36 PDT
This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc.  Can we please back it out?
Comment 81 :Ehsan Akhgari (away Aug 1-5) 2012-05-03 09:50:35 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #80)
> This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc.  Can
> we please back it out?

OK, after reading the bug, it looks like we were planning to take this regression.  In the future, please send an email to dev-tree-management before landing this kind of expected regression so that people know in advance to expect this.  Thanks!
Comment 82 Ed Morley [:emorley] 2012-05-04 03:21:19 PDT
https://hg.mozilla.org/mozilla-central/rev/bed8c4e3dfdf


                             MMM88&&&,
       ,MMM8&&&.              `'MMM88&&&,
      MMMMM88&&&&                'MMM88&&&,
     MMMMM88&&&&&&                 'MMM88&&&,
     MMMMM88&&&&&&                   'MMM88&&&
     MMMMM88&&&&&&                    'MMM88&&&
      MMMMM88&&&&                       MMM88&&&
       'MMM8&&&'     MMMM888&&&&         'MM88&&&
                     MMMM88&&&&&          MM88&&&
                     MMMM88&&&&&          MM88&&&
       ,MMM8&&&.                          MM88&&&
      MMMMM88&&&&                        ,MM88&&&
     MMMMM88&&&&&&                      MMM88&&&'
     MMMMM88&&&&&&                     MMM88&&&'
     MMMMM88&&&&&&                   MMM88&&&'
      MMMMM88&&&&                  MMM88&&&'
       'MMM8&&&'                MMM88&&&'
                             MMM88&&&'
Comment 83 Danial Horton 2012-05-04 22:09:07 PDT
well... this makes reading about:compartments with memchaser enabled a fun time  an additional 60 lines of js files in the system compartments

Note You need to log in before you can comment on or make changes to this bug.