Closed Bug 650353 (cpg) Opened 13 years ago Closed 12 years ago

have one global object per compartment

Categories: Core :: JavaScript Engine, defect
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla15
People: (Reporter: luke, Assigned: luke)
References: (Depends on 1 open bug, Blocks 1 open bug)
Attachments: (2 files, 3 obsolete files)

There is a never-ending stream of goodness that would flow from having a 1:1 relationship between global objects and compartments.  A few immediate ones:
 - avoid tricky XPCNativeScope/global invariants: just stick XPCNativeScope on the compartment
 - we can bake more pointers into jit code: global object, address of caches, address of initial .prototypes.
 - we can use compartment as "native goop associated with globals" instead of trying to pack it via the reserved-slot scheme in the global JSObject
 - security principals for objects becomes obj->compartment()->principals
 - obj->getGlobal() isn't a loop (see the sketch below).  We probably don't even need this function, given faster access via cx->compartment->global.
 - bug 631135 (I think) and a few existing "crossing global objects" bits of logic go away
 - maybe some of this "implicit this" logic can be taken off the hot path in favor of something on the wrapper path?

... and I know I'm forgetting a few others.
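
For reference, a rough sketch of the pre-CPG lookup this replaces (simplified; names approximate to the engine of that era):

  // Pre-CPG: finding an object's global walks the parent chain, a loop
  // whose length depends on how deeply the object is nested.
  static JSObject *
  GetGlobal(JSObject *obj)
  {
      while (JSObject *parent = obj->getParent())
          obj = parent;
      return obj;
  }

  // Post-CPG: every compartment has exactly one global, so this becomes a
  // constant-time load along the lines of obj->compartment()->global.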

The concern is: that's potentially a lot more compartments, so what about all the big stuff we store in compartments?  Well, with bug 649537, we can just stick it in the (single-threaded) runtime.  Gregor Wagner has apparently measured the diff in cross-compartment wrappers and found it not too bad either.
Blocks: 625199
Luke, since you pointed me here from another bug...

In our web server we have at least two globals in play at all times.
They are chained together.
The "session global" is the parent of the "request global" via prototype chaining.

This allows us to re-use standard classes and solve the dreaded "instanceOf" bugs when a client hits the server a 2nd time after saving something in the session.

Each request can put temporary stuff on its request global (dies with each request), but can also store stuff in the session global for future returns to the web server.

Standard classes ALWAYS exist in only the session global. Never in the request global.
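
Roughly, in JSAPI terms (a sketch only; global_class is a placeholder and error checks are omitted):

  // Session global: long-lived, and the only place the standard classes live.
  JSObject *session = JS_NewObject(cx, &global_class, NULL, NULL);
  JS_InitStandardClasses(cx, session);

  // Request global: created per request with the session global as its
  // prototype, so lookups delegate to it and instanceof stays consistent
  // across requests.
  JSObject *request = JS_NewObject(cx, &global_class, session, NULL);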

We haven't switched to the new JSAPI compartments yet...trying to let the dust settle and waiting for per compartment GC to be completed.

Having 1 global per compartment might mess this scenario up pretty bad.

Ideas?
(Fixing dependencies)
No longer blocks: new-web-workers
Depends on: 650411
MikeM: sorry for this recent flurry of bugs, just trying to get the dependency graph of work items set up.  (That is a good question, lemme think a bit about an answer or (hopefully) someone else can jump in.)
I just talked to jorendorff about this.  So, IIUC, you have a global which you want to live a long time, and a "temporary global" in which you want to collect defs and then throw away after each request.  I think that is exactly the use of the "var obj", which is an object that sits on the scope chain in front of the global and can be disposed of while keeping the global for future script execution.  Wes does this and explains his use here:
  http://groups.google.com/group/mozilla.dev.tech.js-engine/msg/f6a4b35232ff1572?pli=1
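
A sketch of that pattern, assuming that era's JSAPI (signatures approximate, global_class a placeholder, error checks omitted):

  // Long-lived global keeps the standard classes.
  JSObject *global = JS_NewObject(cx, &global_class, NULL, NULL);
  JS_InitStandardClasses(cx, global);

  // Per-request "var object": parented to the global so lookups fall
  // through to it, while new top-level bindings land on varObj itself.
  JSObject *varObj = JS_NewObject(cx, NULL, NULL, global);

  const char *src = "var tmp = 42;";  // hypothetical request script
  JSScript *script = JS_CompileScript(cx, varObj, src, strlen(src),
                                      "request.js", 1);
  jsval rval;
  JS_ExecuteScript(cx, varObj, script, &rval);

  // After the request, drop all references to varObj: tmp dies with it,
  // while the global survives for the next request.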

Do you think that would work for you?
I'm not sure it would work.  I think I reviewed all possible solutions back when I wrote it a few years ago, including the CommonJS stuff Wes did.  I know I re-wrote it at least 5 times.

Unfortunately it's more complex than I described.  We have two resolver hooks in the mix, and the request global has different options than the session global to prevent standard classes from being cached there.  That "memoized" standard class thing is a total disaster for those with resolver hooks in place.  I've got a few bugs filed in the past on this alone.

Any code run on the request global CANNOT save properties in the session global.
To access the session global directly in the script we can do this:

Session.global.blah = "Save Me";

Which adds a property to the session global.
It gets even more complex if you do this:

var localObject="put object/function or something here";
Session.global.blah = localObject;

In that case the object living on the request global is saved to the session global.  That took some special magic to keep it from falling out of scope when the request dies.

It's a complex problem but is currently working well for us.
There are many, many test cases; I can't even remember them all.

I'm certainly willing to change things.  But it needs to be researched & tested thoroughly before we can adopt it for production code.  So would it work?  I don't know.

Do you know if compartment level GC is done yet?  We were waiting for that before upgrading.
Well, except perhaps for the resolve hooks, nothing you mentioned (incl. the code sample) seems to be a deal-breaker.  Of course I don't understand all your constraints.

Per-compartment GC (bug 605662) shipped with FF4 :)
Background compartment GC (GC one compartment while another is running) is in progress in bug 616927.  (Incidentally, you can follow the dependencies back from bug 616927 to this bug.)
More things this bug enables:
 - removing obj->parent
 - can remove findObjectPrincipals hook (and calls) in js
 - document.domain isn't such a corner case
Blocks: 638316
Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have the property and is it permanent" optimization check in the jit at method-compile time.
(In reply to comment #8)
> Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting
> JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have
> the property and is it permanent" optimization check in the jit at
> method-compile time.

Can't we do this already?  For compileAndGo code we have the global the script will run against and can check its properties while compiling.  Also, in TI we statically guard against global properties being deleted or reconfigured, as well as against reallocation of the global slots.  This allows GNAME and GLOBAL opcodes to run at the same speed, so there might not be much use in keeping GLOBAL anyways.
Haha, good point.  I just assumed there was some reason but I didn't really think about it.
How will this affect compat (e.g. with Components.utils.import, which currently creates globals but not compartments)?

See bug 665810 for some concerns about our cross-compartment proxies (e.g. can't reflect XML objects)...
For the former question: compartments would need to be inserted in all those places.

For the latter: yes, this has been the concern, and I think it will force us to fix proxies enough for what needs to work.  Jason has pointed out that there is engine work needed to correctly recognize, e.g., arrays from other compartments as arrays.  I think compartment-per-domain and the highly-restrictive sharing it entails has protected us from putting full weight on the transparency of proxies.  So no one expects this to be easy or soon.

As for E4X, I would hate to see it be the only thing standing in the way of compartment-per-global...
> I would hate to see it be the only thing standing in the way

I would too, and I don't think it should unless there are actual Firefox extension compat issues involved.  If there are, though, we have a problem.
For Components.utils.import and friends, I was assuming that we'd still have one chrome compartment (aside from sandboxes).
Depends on: 654646
Blocks: 679939
Blocks: 668558
I got an incomplete version of CPG working by doing the obvious change to xpc_Create(MT)GlobalObject (this leaves out globals created via xpc_NewSystemInheritingJSObject when we wrap an nsISupports as a global).  Surprisingly, there was only a single cross-compartment assertion to fix (XUL proto cache).

With this, I was able to do basic memory/perf measurements to assess the impact of CPG.  I also included the patch from bug 675078 and a patch to hoist stuff out of JSCompartments into JSRuntime (allowed by bug 650411).

The main concerns were:
 1. increased memory usage due to cross-compartment cloning
 2. increased memory usage due to more, separate arena lists
 3. decreased perf due to cross-compartment calls.

In particular, for (3) I was worried about Dromaeo since I thought it did lots of nasty cross-global calls.  However, Dromaeo (JS and DOM tests) shows no drop and maybe a slight speedup.  V8 has high variance but seems to show a 100 point improvement (~7100 to ~7200).  SunSpider seems to get slightly worse (4ms), but not in the shell.

For 1 and 2, I used about:memory (after two "Minimize memory" clicks) to measure startup, 1 GMail tab, and 20 tech news tabs opened from news.google.com.

One way to mitigate 2 is to shrink arenas down to 2KB, so I measured CPG with both 2KB and 4KB arenas.  Igor apparently already wants this for bug 671702.  V8 and SS did get a perf drop, but Igor is also looking at this (bug 681884).

                    Before CPG   CPG (2KB)   CPG (4KB)
Startup:
explicit            27.2M        28.6M       28.6M
js                  21.2M        22.2M       22.8M
js-gc-heap          8M           12M         13M
arena-unused        0.8M         1.8M        3M
chunk-dirty-unused  1.7M         3.4M        3.5M

GMail:
explicit            66.4M        67.7M       66.3M
js                  55M          56.1M       56.6M
js-gc-heap          24M          28M         29M
arena-unused        3.6M         4.5M        4.6M
chunk-dirty-unused  3.5M         4.6M        4.6M

20x tech news articles:
explicit            589M         586M        599M
js                  469M         468M        483M
js-gc-heap          217M         250M        253M
arena-unused        29.3M        42.8M       52.8M
chunk-dirty-unused  5.1M         12.5M       10.6M

Observations:

Total explicit allocation: not terribly affected.  In fact, 2K-arena CPG has less explicit/js allocation than trunk, despite having 33M more gc-heap!  I'm fairly certain this is due to the hoisting from JSCompartment to JSRuntime, which saves memory on jaeger compartments and trace monitors.

The 2K arenas help arena-unused.  Attached is the distribution of full arenas, which shows that 2K is also not packing gc things as well as 4K and hence may benefit from further twiddling.  However, 2K arenas hurt chunk-dirty-unused; this could be further improved by giving arenas an "affinity" toward chunks allocated/shared by a domain (this is equivalent to bug 671702 comment 61).

So this is positive for CPG.  There is still more work (mentioned above) to get CPG to actually give the necessary 1:1 invariant that makes all the magic happen; also work will be needed to fix bugs resulting from proxies not being totally transparent (obj->getClass() == &js_SomeClass, I'm looking at you).
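
(For the record, the shape of such a fix is roughly the following; names approximate to the engine of that era:)

  // A raw class check sees the CrossCompartmentWrapper's class, not the
  // target's, so obj->getClass() == &js_SomeClass is false for a wrapper
  // around a js_SomeClass object.  Callers need to peel the wrapper first:
  JSObject *inner = obj->unwrap();   // strip any cross-compartment wrappers
  bool isArray = inner->isArray();   // now the real class is visible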
Assignee: general → luke
Status: NEW → ASSIGNED
> Total explicit allocation: not terribly affected.  In fact, 2K-arena CPG has
> less explicit/js allocation than trunk, despite having 33M more gc-heap!
> I'm fairly certain this is due to the hoisting from JSCompartment to
> JSRuntime, which saves memory on jaeger compartments and trace monitors.

So you've already done that hoisting in the patch?

Yeah, those results aren't bad.  A good stress test is http://gregor-wagner.com/tmp/mem.
Dromaeo does do a bunch of cross-global calls, but they don't dominate the benchmark, which is good in this case.

I'd be interested in measuring how the performance of walking a same-origin subframe's DOM is affected (read: gmail).  So maybe just grab the source of http://www.whatwg.org/specs/web-apps/current-work/, stick it in a local file, create another local file pointing an iframe at it, walk over the DOM from script in the parent page, and measure how long that takes before and after the CPG changes?  That should be a worst-case of sorts for cross-global calls, I'd think.
(In reply to Nicholas Nethercote [:njn] from comment #17)
> Yeah, those results aren't bad.  A good stress test is
> http://gregor-wagner.com/tmp/mem.

Ah, yes, I forgot about that.  Good results (for 2K arenas, at least); similar to 20x news tabs:

                    Before CPG   CPG (2KB)   CPG (4KB)
explicit            1.50G        1.46G       1.58G
js                  1.11G        1.05G       1.18G
js-gc-heap          521M         552M        665M
arena-unused        109M         108.6M      152M
chunk-dirty-unused  18.7M        20.5M       23.7M
(In reply to Boris Zbarsky (:bz) from comment #18)
Good idea.  I iterated over all the children via childNodes and built a string containing triplets (nodeName, nodeType, nodeValue) (each of which is a cross-compartment access/call, IIUC).

Pretty harsh results: recursion time went from 1530ms to 3400ms.

So it's pretty clear we'll need jit fast paths for transparent cross-compartment wrappers.  Gal seems to be wanting this anyway for new-fangled proxy-based dom.  The question is whether it blocks CPG.
What does the corresponding number look like in other browsers?

My gut feeling is that it need not block as long as we actually commit to doing it and do it.... but I was serious about the gmail thing; cross-frame access is pretty common on the web.  The question is whether the difference is felt in practice (e.g. when using gmail on a phone).
(In reply to Boris Zbarsky (:bz) from comment #21)
> What does the corresponding number look like in other browsers?

Chrome gets anywhere from 1200ms to 1400ms.

> My gut feeling is that it need not block as long as we actually commit to
> doing it and do it.... but I was serious about the gmail thing; cross-frame
> access is pretty common on the web.  The question is whether the difference
> is felt in practice (e.g. when using gmail on a phone).

I made a counter that increments every time a JSCrossCompartmentWrapper::* function is called.  Starting gmail triggers 10K such calls.  Clicking "All Mail" triggers 3K.  1K to click "New Mail".  10K cross-compartment calls in the shell measure 2-3ms, so the problem is real but it seems ignorable for GMail.  Note that CPG allows bug 625199, which decreases this cost further: entering/leaving a compartment is just changing cx->compartment, no frame push/pop.
> 10K cross-compartment calls in the shell measure 2-3ms

Shell cross-compartment calls don't have to do security checks, unlike browser ones, right?  It's worth measuring the browser number here too, just to be sure.

But yes, with any luck that's low enough too and we can just get on with this change.
If a same-domain wrapper is doing security checks, it would seem like we're doing something wrong.  I'll double-check though.
(In reply to Boris Zbarsky (:bz) from comment #23)
> > 10K cross-compartment calls in the shell measure 2-3ms
> 
> Shell cross-compartment calls don't have to do security checks, unlike
> browser ones, right?  It's worth measuring the browser number here too, just
> to be sure.

The only times we do security checks for cross compartment calls are for cases involving either document.domain or UniversalXPConnect. Otherwise, we don't do any dynamic checking at the call site (the computation of which wrapper that we chose suffices in other cases).
Blocks: 683069
Depends on: 683361
Blocks: 687724
Depends on: 688069
Blocks: 689343
Note to self for docs later: need to update the debugger API guide info on global objects.
Keywords: dev-doc-needed
So the main cost of the cross-compartment calls, if security checks are not an issue, is going to be lack of ICs and consequent slow paths through the proxy and the DOM binding, right?

Is there any way we can special-case any of that stuff to keep IC goodness and a fast path into the DOM?  I guess we still need to wrap things on the way out as needed, so can't just punch through the cross-compartment proxy entirely in jitcode... :(
Also, things that used to be same-origin (and hence not needing a security check) can stop being so when document.domain is set.  We have a bug on that...
(In reply to Boris Zbarsky (:bz) from comment #28)
> I guess we still need to wrap things on the
> way out as needed, so can't just punch through the cross-compartment proxy
> entirely in jitcode... :(

If need be, we could inline the crossCompartmentWrappers.lookup so that, on hit, we stay entirely in jit code.  That only helps if the same object is being wrapped, though, so it wouldn't help the "traverse the w3c spec" case in comment 20 (since each call would wrap a new node).
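
Conceptually, the cache being inlined is just a per-compartment map from target object to its existing wrapper.  A standalone illustration (not engine code; the real table is JSCompartment::crossCompartmentWrappers):

  #include <cstddef>
  #include <map>

  struct Obj;                          // stand-in for JSObject
  std::map<Obj*, Obj*> wrapperCache;   // target -> existing wrapper

  Obj *LookupWrapper(Obj *target)
  {
      std::map<Obj*, Obj*>::iterator it = wrapperCache.find(target);
      if (it != wrapperCache.end())
          return it->second;           // hit: jit code could stay on the fast path
      return NULL;                     // miss: fall back to the VM to create a wrapper
  }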

(In reply to Boris Zbarsky (:bz) from comment #29)
> Also, things that used to be same-origin (and hence not needing a security
> check) can stop being so when document.domain is set.  We have a bug on
> that...

In that case, since proxies are already in place, it seems like we should be able to go around and update those proxies' ProxyHandlers to be security wrappers.
Luke, what sort of impact will this have on the heap graph?  I suppose it will make it strictly less connected than before, because there are more global objects?  I'm just trying to think about whether this might have any negative effect on the cycle collector.
(In reply to Andrew McCreight [:mccr8] from comment #31)
The graph structure should remain the same modulo extra proxies injected between objects that used to be different-global-but-same-compartment and are now different-global-different-compartment.  I'm not sure how these (transparent, non-security) wrappers interact with the cycle collector.  

Another difference is that there will be a bunch more compartments; I'm not sure if that adds any overhead to the CC algorithm.
Ah, I see.  Thanks!  The extra wrappers will probably add a few extra nodes, but the CC overhead is probably the least of your problems.

I think the CC is pretty much blind to compartments, except that XPConnect root traversal involves iterating over roots, so I guess it could end up spending more time traversing, but that's a pretty insignificant part of the time CC takes.
> involves iterating over roots

That should be "involves iterating over compartments".
Blocks: 709529
Blocks: 713813
Blocks: 680784
Depends on: 720580
Depends on: 720753
Depends on: 722594
Depends on: 722775
Attached patch combined diff (WIP) (obsolete) — Splinter Review
I just wanted to post a folded set of patches that let a basic c-p-g build start and run.  The patch browses the web without issue.  With this applied atop the patch in bug 720580, we can push to try again and identify remaining issues exposed by mochitests.
This just occurred to me:  zombie compartments are pretty common, especially in add-ons.  The most common case is when a single global reference holds onto a particular compartment.  When that reference gets updated subsequently, the old zombie will be released and a new one created, i.e. only one zombie compartment is alive at any time.  Bug 669845 is a good example.

With CPG, each compartment will, on average, contain much less data.  So the impact of such zombie compartments will be less.  Nice!
Depends on: 729584
Depends on: 729589
This was the source of a leak with the c-p-g patch. The old code subtly cleaned up the compartment private with an nsAutoPtr. I've made it more explicit here.
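
The shape of the change, sketched (names and signatures approximate; the real patch is the attachment):

  // Before, an nsAutoPtr took ownership of the private and freed it
  // implicitly when it went out of scope.  Now the teardown is explicit:
  xpc::CompartmentPrivate *priv =
      static_cast<xpc::CompartmentPrivate*>(
          JS_GetCompartmentPrivate(cx, compartment));
  JS_SetCompartmentPrivate(cx, compartment, NULL);
  delete priv;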
Ouch, nice catch!  Any more work to do in bug 729589?
(In reply to Luke Wagner [:luke] from comment #38)
> Ouch, nice catch!  Any more work to do in bug 729589?

I've still got a failure in a cross-origin location XPConnect test that I need to ask someone about on Monday, but otherwise the XPConnect test suite all seems to pass. Pushed this whole thing to try to get a better sense of how close we are:

https://tbpl.mozilla.org/?tree=Try&rev=5e535dd9d9f9
Depends on: 731471
Looks like I can't assert what I thought I could assert in the compartment private cleanup patch. Updating.
Attachment #600576 - Attachment is obsolete: true
Comment on attachment 601733 [details] [diff] [review]
Bug 650353 - Clean up the compartment private. v2

Will need rebasing over bug 730281, and static_cast please.
Depends on: 731804
Depends on: 731845
I've been doing a lot of fixing, so here's another try push: https://tbpl.mozilla.org/?tree=Try&rev=742de674fd1f
Depends on: 667388
I'm maintaining a github branch for my current CPG commit stack (dependent patches and luke's patches). I'll be adding new fixes underneath the final patch, and removing the ones merged upstream when I rebase.

Follow along if you feel so inclined:

https://github.com/bholley/mozilla-central/commits/cpgtrain
Depends on: 733984
Depends on: 734475
Gave this another try push:

https://tbpl.mozilla.org/?tree=Try&rev=b836b7809e8f

The only unresolved issue from the last try push is the cross-compartment stack thing (the CAPS test failure). So with any luck, this might look pretty green...
Depends on: 734910
Arg, hit another bug (bug 734910)

Pushed again for linux and macosx64, since those build fast: https://tbpl.mozilla.org/?tree=Try&rev=ef777f8211a0
Depends on: 735544
Depends on: 596351
Depends on: 735968
Depends on: 736316
This patch is pretty much finalized. There are still a few mochitest-browser-chrome test failures, but they'll almost certainly be separate patches (possibly to the chrome code itself). Flagging mrbkap for review.
Attachment #596175 - Attachment is obsolete: true
Attachment #601733 - Attachment is obsolete: true
Attachment #606646 - Flags: review?(mrbkap)
Depends on: 737245
Depends on: 738119
Comment on attachment 606646 [details] [diff] [review]
Implement Compartment-Per-Global in XPConnect. v3

Review of attachment 606646 [details] [diff] [review]:
-----------------------------------------------------------------

Straightforward. I like it!

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +262,3 @@
>  
> +    // Get the current compartment private.
> +    xpc::CompartmentPrivate *priv =

Why prefer the explicit delete to the AutoPtr?
Attachment #606646 - Flags: review?(mrbkap) → review+
Depends on: 738874
https://tbpl.mozilla.org/?tree=Try&rev=7c454ec7fafd

Note that this stack will still break on the devtools tests and the webgl tests. The former I know the fix for but haven't yet written, and the latter is bug 737245 (typed array wrapper handling). I had a patch stack for that, but it wasn't really the right approach, and more architecture work was required. So sfink and evilpies are working on it, and will hopefully land their stuff mid next week.
Various failures that might be because of bug 738874. Did a push without it:

https://tbpl.mozilla.org/?tree=Try&rev=ba083543a99e
Depends on: 739432
Depends on: 739451
Depends on: 739796
https://tbpl.mozilla.org/?tree=Try&rev=f886d566a87a

This is still waiting on some spidermonkey work by sfink and some Places debugging by mak.
Blocks: 742444
Depends on: 743034
Try broke this morning. Pushing again...

https://tbpl.mozilla.org/?tree=Try&rev=8eebe80e23f9
Alright, looking better. The typedarray stuff is still in an inconsistent state, so it's crashing the mochitest-o runs. Added a push with those commented out, and perf tests enabled:

https://tbpl.mozilla.org/?tree=Try&rev=42fa3aa03f84

And, for comparison, a baseline Talos run:
https://tbpl.mozilla.org/?tree=Try&rev=99d0b7c3668a
Oh, hm. 705.39->358.37 on dromaeo? I thought comment 16 indicated that dromaeo was fine. Ah, CPG...

I'll fire up a profiler on Monday and see what's up.
Comment 16 seems to only cover a subset of the Dromaeo tests.  Specifically the DOM and jslib ones.  http://perf.snarkfest.net/compare-talos/index.html?oldRevs=42fa3aa03f84&newRev=99d0b7c3668a&submit=true shows that those are only a few % different with the patch.  But the "basics" tests are about 2x slower as you note, the "css" tests are 10% slower, and the Dromaeo versions of "sunspider" and "v8" are about 2.5 and 3x slower respectively.  That's overall.  Some subtests are hit more than others.

I'd probably start by profiling the dromaeo version of Sunspider's string-fasta, which got about 8x slower....
Comment 20 may be relevant here, btw.
I don't think comment 20 is what's happening: comment 20 did nothing but touch objects from another compartment.  For Dromaeo SS, there is the cross-compartment call per test-run, but otherwise there shouldn't be any cross-compartment touching.  I looked at a profile with bholley and indeed there was no time under JSCompartment::wrap or CrossCompartmentWrapper.  It looks like we are spending a lot of time in SetElem and iterator stubs that should ordinarily be in jit fast-path code.  That suggests some fast-path pre-condition is being violated.  I still don't know why, so I'll have to dig in.
Depends on: 743882
No longer depends on: 738119
Hah!  So I was working my way back from "why are we calling stubs::Iter in string-fasta so much" until I realized that TI is completely disabled for the inner dromaeo iframe.  Slowlarity ensues.

The way TI is disabled is that, when the iframe's compartment is created, !(cx->runOptions & JSOPTION_TYPE_INFERENCE), which sets newCompartment->type.inferenceEnabled = false.  In theory, this setting gets set on the new context by JSOptionChangedCallback, but that isn't happening (or perhaps it's happening after the new compartment is created?).  I tried to understand what nsJSContext is doing but it's all rather entwined.  Bug 742497 can't come soon enough.
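
For reference, the option plumbing involved, sketched (API of that era; flag handling approximate):

  // Inference is decided per compartment at creation time from the
  // context's option bits, so the bit has to be set before the
  // compartment is created for TI to be enabled in it.
  uint32 options = JS_GetOptions(cx);
  if (!(options & JSOPTION_TYPE_INFERENCE))
      JS_SetOptions(cx, options | JSOPTION_TYPE_INFERENCE);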
If you breakpoint in xpc_CreateGlobalObject conditioned on !(cx->runOptions & JSOPTION_TYPE_INFERENCE), what's the stack look like?
Depends on: 744034
Let's move the perf discussion over to bug 744034.
Depends on: 746021
Depends on: 746221
Depends on: 747749
Try Push: https://tbpl.mozilla.org/?tree=Try&rev=8f2994ea0f7b

And a talos reference push: https://tbpl.mozilla.org/?tree=Try&rev=20892516ccc7

The last failure that I know about is bug 747749 (android reftest crashes). Let's hope that this try push doesn't reveal any new ones. ;-)
Alright, so things are looking pretty green. Here's what's left:

1 - The last test failure, bug 747749. I'm hoping luke will have some guesses here as to what's going on.

2 - Performance. I did some try runs, and here's what the numbers look like:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=20892516ccc7&newRev=8f2994ea0f7b&submit=true

The significant factors seem to be:
A: 3% dromaeo CSS regression.
B: 7-23% tpaint regression, depending on the platform.
C: A ~10% memory hit. AIUI, luke has patches to make that go away.

My gut feeling is that B might be unavoidable, since we're inherently doing a lot more work when we open windows (we're now always creating a compartment).  But I don't have a ton of experience in this area, so I'm hoping that bz can weigh in.

As for A, I've done some profiling. The profiles look pretty similar, with the exception that 40% of the calls into the hot code come via JaegerShotAtSafePoint without CPG (with the other 60% coming through JaegerShot), but they _all_ come through JaegerShot with CPG. bhackett tells me that there isn't a performance difference, but I still think that this is somewhat suspicious since it indicates that the JIT is behaving differently in the two cases. bhackett suggests that someone investigate this and determine whether we're making more stub calls (which would indicate that we have to bail from the jit). Anyway, this isn't my area of expertise, so I'm hoping luke can look into it.

Apart from that, we should be just about ready to land. Onward!
https://tbpl.mozilla.org/?tree=Try&rev=3230c667f111

Theoretically, this should be totally green. Let's see if theory matches practice.
Blocks: 749173
Depends on: 749617
Depends on: 750183
Blocks: 750185
Depends on: 750269
I looked into the cssquery-prototype regression.  There were no subtest outliers; most had a slowdown of 1-4%.  So, I focused on one subtest ($$("#speech5")) that has a reliable 4-5% slowdown.  Shark points to overhead from CCW calls for each execution of $$("#speech5").  To confirm this, I doubled/quadrupled the body of the test (to amortize the CCW call) and the slowdown decreased proportionately.  Thus, we have a short-running test that crosses globals 50K times.  To fix this, we need the CCW call IC mentioned earlier.  This wants IM and bug 625199 (which blocks on this bug).  Thus, I think we should eat this loss for now and fix CCW calls later.

Curiously, I couldn't get the overhead less than 1-2%.  Sharking $$("#speech5") executed 1000 times in a loop, I see all the CCW overhead gone, and instead the extra 1-2% is all in nsGenericElement::doQuerySelectorAll self time.  Any possible explanations bz?
Not offhand.  The only thing I can think of is that doQuerySelectorAll is very cache-sensitive (it's very tight-loop code mostly gated on how fast it can get data from the pointer-chase to the CPU), so if the CCW calls in between blow out the cache somehow....  I know it's a stretch.
Here's the talos results of trunk vs. cpg+720753:
  http://perf.snarkfest.net/compare-talos/index.html?oldRevs=36538bb034fb&newRev=35de20b7eae9&submit=true
It looks like bug 720753 decreases the tp4_rss regression from 6-8% (in comment 67) to 4-5%.

Note: GC arenas are still 4K.  Comments 16 and 19 experimented with 2K arenas but, since those comments, this stopped being an option due to the GC decommit landing (which needs page-sized arenas).  That means we are still going to have a lot of under-utilized arenas as described in comment 16.  I talked about this with billm and we had a few plans to share GC arenas between compartments in many-small-compartment scenarios.  (This would be generally useful because it would avoid the tension between wanting to add more FinalizeKinds (for more GC-thing types or sizes of GC-things) and wanting fewer FinalizeKinds to decrease internal fragmentation.)
Just to clarify about the arena thing:
I think it would be reasonable to put multiple FinalizeKinds in the same arena. I'm much more skeptical about sharing arenas between compartments. Doing so would require a ton of changes throughout the GC, since that assumption is baked in pretty deeply.
Gave this one final try push to make sure the tests are green: https://tbpl.mozilla.org/?tree=Try&rev=2b7d9a8bfa12

As for performance, it sounds like we're going to accept the 3% cssquery regression, as well as the 5% memory regression.

Luke's putting together a patch over in bug 749617 to mitigate the 10% tpaint regression (with the longer-term fix coming thereafter). If it's reviewed and green on try by tomorrow morning (europe time), I'll push with it. Otherwise, we'll do it as a followup.
> I think it would be reasonable to put multiple FinalizeKinds in the same
> arena. I'm much more skeptical about sharing arenas between compartments.
> Doing so would require a ton of changes throughout the GC, since that
> assumption is baked in pretty deeply.

That sounds sensible to me.  Would the shared FinalizeKinds need to have the same GCThing size?
> As for performance, it sounds like we're going to accept the 3% cssquery regression

"Accept" as in "push to get it fixed in the JS engine", right?
(In reply to Boris Zbarsky (:bz) from comment #76)
> > As for performance, it sounds like we're going to accept the 3% cssquery regression
> 
> "Accept" as in "push to get it fixed in the JS engine", right?

The 'make-the-jit-CCW-aware' stuff? If so, yeah, I agree.
Depends on: 751086
Depends on: 751101
Sigh. More stupid leak stuff.

Two more identical pushes to try, so that we can be more sure about any potentially intermittent stuff:
https://tbpl.mozilla.org/?tree=Try&rev=7daa9d995250
https://tbpl.mozilla.org/?tree=Try&rev=295f04b65f06
Blocks: 751283
Depends on: 751424
Everything looks good, so I'm pushing to inbound. Here goes...

http://hg.mozilla.org/integration/mozilla-inbound/rev/bed8c4e3dfdf
Target Milestone: --- → mozilla15
Depends on: 751505
Depends on: 751575
This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc.  Can we please back it out?
Depends on: 738803
(In reply to Ehsan Akhgari [:ehsan] from comment #80)
> This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc.  Can
> we please back it out?

OK, after reading the bug, it looks like we were planning to take this regression.  In the future, please send an email to dev-tree-management before landing this kind of expected regression so that people know in advance to expect this.  Thanks!
No longer depends on: 738803
https://hg.mozilla.org/mozilla-central/rev/bed8c4e3dfdf


                             MMM88&&&,
       ,MMM8&&&.              `'MMM88&&&,
      MMMMM88&&&&                'MMM88&&&,
     MMMMM88&&&&&&                 'MMM88&&&,
     MMMMM88&&&&&&                   'MMM88&&&
     MMMMM88&&&&&&                    'MMM88&&&
      MMMMM88&&&&                       MMM88&&&
       'MMM8&&&'     MMMM888&&&&         'MM88&&&
                     MMMM88&&&&&          MM88&&&
                     MMMM88&&&&&          MM88&&&
       ,MMM8&&&.                          MM88&&&
      MMMMM88&&&&                        ,MM88&&&
     MMMMM88&&&&&&                      MMM88&&&'
     MMMMM88&&&&&&                     MMM88&&&'
     MMMMM88&&&&&&                   MMM88&&&'
      MMMMM88&&&&                  MMM88&&&'
       'MMM8&&&'                MMM88&&&'
                             MMM88&&&'
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 751873
Depends on: 751971
Depends on: 751995
Depends on: 752038
Depends on: 752081
No longer depends on: 752081
Well... this makes reading about:compartments with memchaser enabled a fun time: an additional 60 lines of js files in the system compartments.
Depends on: 752200
Depends on: 752259
Blocks: 738080
Depends on: 752335
Depends on: 752281
Depends on: 752764
Depends on: 752865
Depends on: 752309
Depends on: 753844
Depends on: 753557
Depends on: 754156
Depends on: 734215
Blocks: 755186
No longer depends on: 751995
Depends on: 751995
Depends on: 757639
Depends on: 762215
Depends on: 763225
Depends on: 752340
Depends on: 753162
Depends on: 771202
Depends on: 774119
Depends on: 761422
Depends on: 781476
Depends on: 674195
Depends on: 786801
Depends on: 792280
Depends on: CVE-2012-4212
Depends on: 839967