Closed
Bug 650353
(cpg)
Opened 14 years ago
Closed 13 years ago
have one global object per compartment
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla15
People
(Reporter: luke, Assigned: luke)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(2 files, 3 obsolete files)
1.33 KB,
text/plain
|
Details | |
24.43 KB,
patch
|
mrbkap
:
review+
|
Details | Diff | Splinter Review |
There is a never-ending stream of goodness that would flow from having a 1:1 relationship between global objects and compartments. A few immediate ones: - avoid tricky XPCNativeScope/global invariants: just stick XPCNativeScope on the compartment - we can bake more pointers into jit code: global object, address of caches, address of initial .prototypes. - we can use compartment as "native goop associated with globals" instead of trying to pack it via the reserved-slot scheme in the global JSObject - security principals for objects becomes obj->compartment()->principals - obj->getGlobal() isn't a loop. We probably don't even need this function in lieu of faster access via cx->compartment->global. - bug 631135 (I think) and a few existing "crossing global objects" bits of logic go away - maybe some of this "implicit this" logic can be taken off the hot path in lieu of something on the wrapper path? ... and I know I'm forgetting a few others. The concern is: that's potentially a lot more compartments, what about all the big stuff we store in compartments? Well, with bug 649537, we can just stick them in the (single-threaded) compartment. Gregor Wagner apparently has measured the diff in cross-compartment wrappers and found that not too bad either.
Comment 1•14 years ago
|
||
Luke, Since you pointed me here... from another bug. In our web server we have at least two globals in play at all times. They are chained together. The "session global" is the parent of the "request global" via prototype chaining. This allows us to re-use standard classes and solve the dreaded "instanceOf" bugs when a client hits the server a 2nd time after saving something in the session. Each request can put temporary stuff on its request global (dies with each request), but can also store stuff in the session global for future returns to the web server. Standard classes ALWAYS exist in only the session global. Never in the request global. We haven't switched to the new JSAPI compartments yet...trying to let the dust settle and waiting for per compartment GC to be completed. Having 1 global per compartment might mess this scenario up pretty bad. Ideas?
Assignee | ||
Comment 2•14 years ago
|
||
(Fixing dependencies)
No longer blocks: new-web-workers
Depends on: 650411
Assignee | ||
Comment 3•14 years ago
|
||
MikeM: sorry for this recent flurry of bugs, just trying to get the dependency graph of work items set up. (That is a good question, lemme think a bit about an answer or (hopefully) someone else can jump in.)
Assignee | ||
Comment 4•14 years ago
|
||
I just talked to jorendorff about this. So, IIUC, you have a global which you want to live a long time, and a "temporary global", which you want to collect defs and throw away after each request. I think that is exactly the use of the "var obj", which is an object that sits on the scope chain in front of the global and can be disposed of while keeping the global for future script execution. Wes does this and explains his use here: http://groups.google.com/group/mozilla.dev.tech.js-engine/msg/f6a4b35232ff1572?pli=1) Do you think that would work for you?
Comment 5•14 years ago
|
||
I'm not sure it would work. I think I reviewed all possible solutions back when wrote it a few years back, including the CommonJS stuff Wes did. I know I re-wrote it at least 5 times. Unfortunately its more complex than I described. We have two resolver hooks in the mix and the request global has different options than the session global to prevent standard classes from being cached there. That "memoized" standard class thing is total disaster for those with resolver hooks in place. I've got a few bugs filed in the past on this alone. Any code run on the request global CANNOT save properties in the session global. To access the session global directly in the script we can do this: Session.global.blah = "Save Me"; Which adds a property to the session global. It gets even more complex if you do this: var localObject="put object/function or something here"; Session.global.blah = localObject; In that case the object living on the request global is saved to the session global. That took some special magic to keep it from falling out of scope when the request dies. It's a complex problem but is currently working well for us. There are many many test cases I can't even remember them all. I'm certainly willing to change things. But it needs to be researched & tested thoroughly before we can adopt it for production code. So would it work? I don't know. Do you know if compartment level GC is done yet? We were waiting for that before upgrading.
Assignee | ||
Comment 6•14 years ago
|
||
Well, except perhaps for the resolve hooks, nothing you mentioned (incl. the code sample) seems to be deal-breaker. Of course I don't understand all your constraints. Per-compartment GC (bug 605662) shipped with FF4 :) Background compartment GC (GC one compartment while another is running) is in progress in bug 616927. (Incidentally, you can follow the dependencies back from bug 616927 to this bug.)
Assignee | ||
Comment 7•14 years ago
|
||
More things this bug enables: - removing obj->parent - can remove findObjectPrincipals hook (and calls) in js - document.domain isn't such a corner case
Blocks: 638316
Assignee | ||
Comment 8•13 years ago
|
||
Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have the property and is it permanent" optimization check in the jit at method-compile time.
Comment 9•13 years ago
|
||
(In reply to comment #8) > Also enables removing JSOP_GETGLOBAL/JSOP_CALLGLOBAL (emitting > JSOP_GETGNAME/JSOP_CALLGNAME instead) and doing the "does the global have > the property and is it permanent" optimization check in the jit at > method-compile time. Can't we do this already? For compileAndGo code we have the global the script will run against and can check its properties while compiling. Also, in TI we statically guard against global properties being deleted or reconfigured, as well as against reallocation of the global slots. This allows GNAME and GLOBAL opcodes to run at the same speed, so there might not be much use in keeping GLOBAL anyways.
Assignee | ||
Comment 11•13 years ago
|
||
Haha, good point. I just assumed there was some reason but I didn't really think about it.
Comment 12•13 years ago
|
||
How will this affect compat (e.g. with Components.utils.import, which currently creates globals but not compartments)? See bug 665810 for some concerns about our cross-compartment proxies (e.g. can't reflect XML objects)...
Assignee | ||
Comment 13•13 years ago
|
||
For the former question: compartments would need to be inserted in all those places. For the latter: yes, this has been the concern and I think it will force us to fix proxies enough for what needs to work. Jason has pointed out that there is engine work needed to correctly recognize, e.g., arrays from other compartments as arrays. I think compartment-per-domain and the highly-restrictive sharing this entails has protected us putting full weight on the transparency of proxies. So noone expects this to be easy or soon. As for E4X, I would hate to see it be the only thing standing in the way of compartment-per-global...
Comment 14•13 years ago
|
||
> I would hate to see it be the only thing standing in the way
I would too, and I don't think it should unless there are actual Firefox extension compat issues involved. If there are, though, we have a problem.
Comment 15•13 years ago
|
||
For Components.utils.import and friends, I was assuming that we'd still have one chrome compartment (aside from sandboxes).
Assignee | ||
Comment 16•13 years ago
|
||
I got an incomplete version of CPG working by doing the obvious change to xpc_Create(MT)GlobalObject (this leaves out globals created via xpc_NewSystemInheritingJSObject when we wrap an nsISupports as a global). Surprisingly, there was only a single cross-compartment assertion to fix (XUL proto cache). With this, I was able to do basic memory/perf measurements to asses the impact of CPG. I also included the patch from bug 675078 and a patch to hoist stuff out of JSCompartments into JSRuntime (allowed by bug 650411). The main concerns were: 1. increased memory usage due to cross-compartment cloning 2. increased memory usage due to more, separate arena lists 3. decreased perf due to cross-compartment calls. In particular, for (3) I was worried about Dromaeo since I thought it did lots of nasty cross-global calls. However, Dromaeo (JS and DOM tests) shows no drop and maybe a slight speedup. V8 has high variance but seems to show a 100 point improvement (~7100 to ~7200). SunSpider seems to get slightly worse (4ms), but not in the shell. For 1 and 2, I used about:memory (after two "Minimize memory" clicks) to measure startup, 1 GMail tab, and 20 tech news tabs opened from news.google.com. One way to mitigate 2 is to shrink arenas down to 2KB, so I measured CPG with 2 and 4KB arenas. Igor is apparently already wanting this for bug 671702. V8 and SS did get a perf drop, but Igor is also looking at this (bug 681884). Before CPG After 2KB After 4KB Startup: explicit 27.2M 28.6M 28.6M js 21.2M 22.2M 22.8M js-gc-heap 8M 12M 13M arena-unused .8M 1.8M 3M chunk-dirty-unused 1.7M 3.4M 3.5M GMail: explicit 66.4M 67.7M 66.3M js 55M 56.1M 56.6M js-gc-heap 24M 28M 29M arena-unused 3.6M 4.5M 4.6M chunk-dirty-unused 3.5M 4.6M 4.6M 20x tech news articles explicit 589M 586M 599M js 469M 468M 483M js-gc-heap 217M 250M 253M arena-unused 29.3M 42.8M 52.8M chunk-dirty-unused 5.1M 12.5M 10.6M Observations: Total explicit allocation: not terribly effected. In fact, 2K-arena CPG has less explicit/js allocation than trunk, despite having 33M more gc-heap! I'm fairly certain this is the hoisting from JSCompartment to JSRuntime which saves memory on jaeger compartments and trace monitors. The 2K arenas help arena-unused. Attached is the distribution of full arenas, which show that 2K is also not packing gc things as well as 4K and hence may benefit from further twiddling. However, 2K arenas hurt chunk-dirty-unused; this could be further improved by giving arenas an "affinity" toward chunks allocated/shared by a domain (this is equivalent to bug 671702 comment 61). So this is positive for CPG. There is still more work (mentioned above) to get CPG to actually give the necessary 1:1 invariant that makes all the magic happen; also work will be needed to fix bugs resulting from proxies not being totally transparent (obj->getClass() == &js_SomeClass, I'm looking at you).
Assignee: general → luke
Status: NEW → ASSIGNED
Comment 17•13 years ago
|
||
> Total explicit allocation: not terribly effected. In fact, 2K-arena CPG has > less explicit/js allocation than trunk, despite having 33M more gc-heap! > I'm fairly certain this is the hoisting from JSCompartment to JSRuntime > which saves memory on jaeger compartments and trace monitors. So you've already done that hoisting in the patch? Yeah, those results aren't bad. A good stress test is http://gregor-wagner.com/tmp/mem.
Comment 18•13 years ago
|
||
Dromaeo does do a bunch of cross-global calls, but they don't dominate the benchark, which is good in this case. I'd be interested in measuring how the the performance of walking a same-origin subframe's DOM is affected (read: gmail). So maybe just grab the source of http://www.whatwg.org/specs/web-apps/current-work/, stick it in a local file, create another local file pointing an iframe at it, walk over the DOM from script in the parent page, and measure how long that takes before and after the CPG changes? That should be a worst-case of sorts for cross-global calls, I'd think.
Assignee | ||
Comment 19•13 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #17) > Yeah, those results aren't bad. A good stress test is > http://gregor-wagner.com/tmp/mem. Ah, yes, I forgot about that. Good results (for 2K arenas, at least); similar to 20x news tabs: Before CPG After 2KB After 4KB explicit 1.50G 1.46G 1.58G js 1.11G 1.05G 1.18G js-gc-heap 521M 552M 665M arena-unused 109M 108.6M 152M chunk-dirty-unused 18.7M 20.5M 23.7M
Assignee | ||
Comment 20•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #18) Good idea. I iterated over all the children via childNodes and build a string containing triplets (nodeName, nodeType, nodeValue) (each of which is a cross-compartment access/call, IIUC). Pretty harsh results: recursion time went from 1530ms to 3400ms. So it's pretty clear we'll need jit fast paths for transparent cross-compartment wrappers. Gal seems to be wanting this anyway for new-fangled proxy-based dom. The question is whether it blocks CPG.
Comment 21•13 years ago
|
||
What does the corresponding number look like in other browsers? My gut feeling is that it need not block as long as we actually commit to doing it and do it.... but I was serious about the gmail thing; cross-frame access is pretty common on the web. The question is whether the difference is felt in practice (e.g. when using gmail on a phone).
Assignee | ||
Comment 22•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #21) > What does the corresponding number look like in other browsers? Chrome gets anywhere from 1200ms to 1400ms. > My gut feeling is that it need not block as long as we actually commit to > doing it and do it.... but I was serious about the gmail thing; cross-frame > access is pretty common on the web. The question is whether the difference > is felt in practice (e.g. when using gmail on a phone). I made a counter that incs every time a JSCrossCompartmentWrapper::* function was called. Starting gmail triggers 10K such calls. Clicking "All Mail" triggers 3K. 1K to click "New Mail". 10K cross-compartment calls in the shell measures 2-3ms, so the problem is real but it seems ignorable for GMail. Note that CPG allows bug 625199 which decreases this cost further: enter/leaving a compartment is just changing cx->compartment, no frame push/pop.
Comment 23•13 years ago
|
||
> 10K cross-compartment calls in the shell measures 2-3ms
Shell cross-compartment calls don't have to do security checks, unlike browser ones, right? It's worth measuring the browser number here too, just to be sure.
But yes, with any luck that's low enough too and we can just get on with this change.
Assignee | ||
Comment 24•13 years ago
|
||
If a same-domain wrapper is doing security checks, it would seem like we're doing something wrong. I'll double-check though.
Comment 25•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #23) > > 10K cross-compartment calls in the shell measures 2-3ms > > Shell cross-compartment calls don't have to do security checks, unlike > browser ones, right? It's worth measuring the browser number here too, just > to be sure. The only times we do security checks for cross compartment calls are for cases involving either document.domain or UniversalXPConnect. Otherwise, we don't do any dynamic checking at the call site (the computation of which wrapper that we chose suffices in other cases).
Comment 27•13 years ago
|
||
Note to self for docs later: need to update the debugger API guide info on global ojects.
Keywords: dev-doc-needed
Comment 28•13 years ago
|
||
So the main cost of the cross-compartment calls, if security checks are not an issue, is going to be lack of ICs and consequent slow paths through the proxy and the DOM binding, right? Is there any way we can special-case any of that stuff to keep IC goodness and a fast path into the DOM? I guess we still need to wrap things on the way out as needed, so can't just punch through the cross-compartment proxy entirely in jitcode... :(
Comment 29•13 years ago
|
||
Also, things that used to be same-origin (and hence not needing a security check) can stop being so when document.domain is set. We have a bug on that...
Assignee | ||
Comment 30•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #28) > I guess we still need to wrap things on the > way out as needed, so can't just punch through the cross-compartment proxy > entirely in jitcode... :( If need be, we could inline the crossCompartmentWrappers.lookup so that, on hit, we stay entirely in jit code. That only helps if the same object is being wrapped, though, so it wouldn't help the "traverse the w3c spec" in comment 20 (since each call would wrap a new node). (In reply to Boris Zbarsky (:bz) from comment #29) > Also, things that used to be same-origin (and hence not needing a security > check) can stop being so when document.domain is set. We have a bug on > that... In that case, since proxies are already in place, it seems like we should be able to go around and update those proxies' ProxyHandlers to be security wrappers.
Comment 31•13 years ago
|
||
Luke, what sort of impact with this have on the heap graph? I suppose it will make it strictly less connected than before, because there are more global objects? I'm just trying to think about if this might have any negative effect on the cycle collector.
Assignee | ||
Comment 32•13 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #31) The graph structure should remain the same modulo extra proxies injected between objects that used to be different-global-but-same-compartment and are now different-global-different-compartment. I'm not sure how these (transparent, non-security) wrappers interact with the cycle collector. Another difference is that there will be a bunch more compartments; I'm not sure if that adds any overhead to the CC algorithm.
Comment 33•13 years ago
|
||
Ah, I see. Thanks! The extra wrappers will probably add a few extra nodes, but the CC overhead is probably the least of your problems. I think the CC is pretty much blind to compartments, except that XPConnect root traversal involves iterating over roots, so I guess it could end up spending more time traversing, but that's a pretty insignificant part of the time CC takes.
Comment 34•13 years ago
|
||
> involves iterating over roots
That should be "involves iterating over compartments".
Assignee | ||
Comment 35•13 years ago
|
||
I just wanted to post a folded set of patches that let a basic c-p-g start and run. The patch browses the web without issue. With this applied atop the patch in bug 720580, we can push again to try and identify remaining issues exposed by mochitests.
Updated•13 years ago
|
Blocks: CVE-2012-3985
Comment 36•13 years ago
|
||
This just occurred to me: zombie compartments are pretty common, especially in add-ons. The most common case is when a single global reference holds onto a particular compartment. When that reference gets updated subsequently, the old zombie will be released and a new one created, i.e. only one zombie compartment is alive at any time. Bug 669845 is a good example. With CPG, each compartment will, on average, contain much less data. So the impact of such zombie compartments will be less. Nice!
Comment 37•13 years ago
|
||
This was the source of a leak with the c-p-g patch. The old code subtly cleaned up the compartment private with an nsAutoPtr. I've made it more explicit here.
Comment 39•13 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #38) > Ouch, nice catch! Any more work to do in bug 729589? I've still got a failure in a cross-origin location XPConnect test that I need to ask someone about on monday, but otherwise the XPConnect test suite all seems to pass. Pushed this whole thing to try to get a better sense of how close we are: https://tbpl.mozilla.org/?tree=Try&rev=5e535dd9d9f9
Comment 40•13 years ago
|
||
Last push was borked - trying again: https://tbpl.mozilla.org/?tree=Try&rev=bef6f0742ea5
Comment 41•13 years ago
|
||
Looks like I can't assert what I thought I could assert in the compartment private cleanup patch. Updating.
Attachment #600576 -
Attachment is obsolete: true
Comment 42•13 years ago
|
||
Comment on attachment 601733 [details] [diff] [review] Bug 650353 - Clean up the compartment private. v2 Will need rebasing over bug 730281, and static_cast please.
Comment 43•13 years ago
|
||
I've been doing a lot of fixing, so here's another try push: https://tbpl.mozilla.org/?tree=Try&rev=742de674fd1f
Comment 44•13 years ago
|
||
I'm maintaining a github branch for my current CPG commit stack (dependent patches and luke's patches). I'll be adding new fixes underneath the final patch, and removing the ones merged upstream when I rebase. Follow along if you feel so inclined: https://github.com/bholley/mozilla-central/commits/cpgtrain
Comment 45•13 years ago
|
||
Gave this another try push: https://tbpl.mozilla.org/?tree=Try&rev=b836b7809e8f The only unresolved issue from the last try push is the cross-compartment stack thing (the CAPS test failure). So with any luck, this might look pretty green...
Comment 46•13 years ago
|
||
Arg, hit another bug (bug 734910) Pushed again for linux and macosx64, since those build fast: https://tbpl.mozilla.org/?tree=Try&rev=ef777f8211a0
Comment 50•13 years ago
|
||
This patch is pretty much finalized. There are still a few mochitest-browser-chrome test failures, but they'll almost certainly be separate patches (possibly to the chrome code itself). Flagging mrbkap for review.
Attachment #596175 -
Attachment is obsolete: true
Attachment #601733 -
Attachment is obsolete: true
Attachment #606646 -
Flags: review?(mrbkap)
Comment 51•13 years ago
|
||
Comment on attachment 606646 [details] [diff] [review] Implement Compartment-Per-Global in XPConnect. v3 Review of attachment 606646 [details] [diff] [review]: ----------------------------------------------------------------- Straightforward. I like it! ::: js/xpconnect/src/XPCJSRuntime.cpp @@ +262,3 @@ > > + // Get the current compartment private. > + xpc::CompartmentPrivate *priv = Why prefer the explicit delete to the AutoPtr?
Attachment #606646 -
Flags: review?(mrbkap) → review+
Comment 52•13 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=7c454ec7fafd Note that this stack will still break on the devtools tests and the webgl tests. The former I know the fix for but haven't yet written, and the latter is bug 737245 (typed array wrapper handling). I had a patch stack for that, but it wasn't really the right approach, and more architecture work was required. So sfink and evilpies are working on it, and will hopefully land their stuff mid next week.
Comment 53•13 years ago
|
||
Various failures that might be because of bug 738874. Did a push without it: https://tbpl.mozilla.org/?tree=Try&rev=ba083543a99e
Comment 54•13 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=f886d566a87a This is still waiting on some spidermonkey work by sfink and some Places debugging by mak.
Comment 56•13 years ago
|
||
Try broke this morning. Pushing again... https://tbpl.mozilla.org/?tree=Try&rev=8eebe80e23f9
Comment 57•13 years ago
|
||
Alright, looking better. The typedarray stuff is still in an inconsistent state, so it's crashing the mochitest-o runs. Added a push with those commented out, and perf tests enabled: https://tbpl.mozilla.org/?tree=Try&rev=42fa3aa03f84 And, for comparison, a baseline Talos run: https://tbpl.mozilla.org/?tree=Try&rev=99d0b7c3668a
Comment 58•13 years ago
|
||
Oh, hm. 705.39->358.37 on dromaeo? I thought comment 16 indicated that dromaeo was fine. Ah, CPG... I'll fire up a profiler on monday and see what's up.
Comment 59•13 years ago
|
||
Comment 16 seems to only cover a subset of the Dromaeo tests. Specifically the DOM and jslib ones. http://perf.snarkfest.net/compare-talos/index.html?oldRevs=42fa3aa03f84&newRev=99d0b7c3668a&submit=true shows that those are only a few % different with the patch. But the "basics" tests are about 2x slower as you note, the "css" tests are 10% slower, and the Dromaeo versions of "sunspider" and "v8" are about 2.5 and 3x slower respectively. That's overall. Some subtests are hit more than others. I'd probably start by profiling the dromaeo version of Sunspider's string-fasta, which got about 8x slower....
Assignee | ||
Comment 61•13 years ago
|
||
I don't think comment 20 is what's happening: comment 20 did nothing but touch objects from another compartment. For Dromaeo SS, there is the cross-compartment call per test-run, but otherwise there shouldn't any cross-compartment touching. I looked at a profile with bholley and indeed there was no time under JSCompartment::wrap or CrossCompartmentWrapper. It looks like we are spending a lot of time in SetElem and iterator stubs that should ordinarily be in jit fast-path code. That suggests some fast-path pre-condition is being violated. I still don't know why, so I'll have to dig in.
Assignee | ||
Comment 62•13 years ago
|
||
Hah! So I was working my way back from "why are we calling stubs::Iter in string-fasta so much" until I realized that TI is completely disabled for the inner dromaeo iframe. Slowlarity ensues. The way TI is disabled is that, when the iframe's compartment is created, !(cx->runOptions & JSOPTION_TYPE_INFERENCE), which sets newCompartment->type.inferenceEnabled = false. In theory, this setting gets set on the new context by JSOptionChangedCallback, but that isn't happening (or perhaps its happening after the new compartment is created?). I tried to understand what nsJSContext is doing but it's all rather entwined. Bug 742497 can't come soon enough.
Comment 63•13 years ago
|
||
If you breakpoint in xpc_CreateGlobalObject conditioned on !(cx->runOptions & JSOPTION_TYPE_INFERENCE), what's the stack look like?
Alias: cpg
Comment 66•13 years ago
|
||
Try Push: https://tbpl.mozilla.org/?tree=Try&rev=8f2994ea0f7b And a talos reference push: https://tbpl.mozilla.org/?tree=Try&rev=20892516ccc7 The last failure that I know about is bug 747749 (android reftest crashes). Let's hope that this try push doesn't reveal any new ones. ;-)
Comment 67•13 years ago
|
||
Alright, so things are looking pretty green. Here's what's left: 1 - The last test failure, bug 747749. I'm hoping luke will have some guesses here as to what's going on. 2 - Performance. I did some try runs, and here's what the numbers look like: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=20892516ccc7&newRev=8f2994ea0f7b&submit=true The significant factors seem to be: A: 3% dromaeo CSS regression. B: 7-23% tpaint regression, depending on the platform. C: A ~10% memory hit. AIUI, luke has patches to make that go away. My gut feeling is that B might be unavoidable, since we're inherently doing a lot more work when we open windows (since we're always creating a compartment). But I don't have a ton of experience in this area, so I'm hoping that bz can weigh in. As for A, I've done some profiling. The profiles look pretty similar, with the exception that 40% of the calls into the hot code come via JaegerShotAtSafePoint without CPG (with the other 60% coming through JaegerShot), but they _all_ come through JaegerShot with CPG. bhackett tells me that there isn't a performance difference, but I still think that this is somewhat suspicious since it indicates that the JIT is behaving differently in the two cases. bhackett suggests that someone investigate this and determine whether we're making more stub calls (which would indicate that we have to bail from the jit). Anyway, this isn't my area of expertise, so I'm hoping luke can look into it. Apart from that, we should be just about ready to land. Onward!
Comment 68•13 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=3230c667f111 Theoretically, this should be totally green. Let's see if theory matches practice.
Assignee | ||
Comment 69•13 years ago
|
||
I looked into the cssquery-prototype regression. There were no subtest outliers, most had a slowdown of 1-4%. So, I focused on one subtest ($$("#speech5")) that has a reliable 4-5% slowdown. Shark points to overhead from CCW calls for each execution of $$("#speech5"). To confirm this, I doubled/quadrupled the body of the test (to amortize the CCW call) and the slowdown decreased proportionately. Thus, we have a short-running test that crosses globals 50K times. To fix this, we need the CCW call ic mentioned earlier. This wants IM and bug 625199 (which blocks on this bug). Thus, I think we should eat this loss for now and fix CCW calls later. Curiously, I couldn't get the overhead less than 1-2%. Sharking $$("#speech5") executed 1000 times in a loop, I see all the CCW overhead gone, and instead the extra 1-2% is all in nsGenericElement::doQuerySelectorAll self time. Any possible explanations bz?
Comment 70•13 years ago
|
||
Not offhand. The only thing I can think of is that doQuerySelectorAll is very cache-sensitive (it's very tight-loop code mostly gated on how fast it can get data from the pointer-chase to the CPU), so if the CCW calls in between blow out the cache somehow.... I know it's a stretch.
Assignee | ||
Comment 71•13 years ago
|
||
I sent off some talos pushes to measure cpg when combined with bug 720753: trunk: https://tbpl.mozilla.org/?tree=Try&rev=36538bb034fb bug 720753: https://tbpl.mozilla.org/?tree=Try&rev=94a57710a02b cpg+720753: https://tbpl.mozilla.org/?tree=Try&rev=35de20b7eae9
Assignee | ||
Comment 72•13 years ago
|
||
Here's the talos results of trunk vs. cpg+720753: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=36538bb034fb&newRev=35de20b7eae9&submit=true It looks like bug 720753 decreases the tp4_rss regression from 6-8% (in comment 67) to 4-5%. Note: GC arenas are still 4K. comment 16 and 19 experimented with 2K arenas but, since those comments, this stopped being an option due to the GC decommit landing (which needs page-sized arenas). That means we are still going to have a lot of under-utilized arenas as described in comment 16. I talked about this will billm and we had a few plans to share GC arenas between compartments in many-small-compartment scenarios. (This would be generally useful because it would avoid the tension between wanting to add more FinalizeKinds (for more GC-thing types or sizes of GC-things) and wanting fewer FinalizeKinds to decrease internal fragmentation.)
Just to clarify about the arena thing: I think it would be reasonable to put multiple FinalizeKinds in the same arena. I'm much more skeptical about sharing arenas between compartments. Doing so would require a ton of changes throughout the GC, since that assumption is baked in pretty deeply.
Comment 74•13 years ago
|
||
Gave this one final try push to make sure the tests are green: https://tbpl.mozilla.org/?tree=Try&rev=2b7d9a8bfa12 As for performance, it sounds like we're going to accept the 3% cssquery regression, as well as the 5% memory regression. Luke's putting together a patch over in bug 749617 to mitigate the 10% tpaint regression (with the longer-term fix coming thereafter). If it's reviewed and green on try by tomorrow morning (europe time), I'll push with it. Otherwise, we'll do it as a followup.
Comment 75•13 years ago
|
||
> I think it would be reasonable to put multiple FinalizeKinds in the same
> arena. I'm much more skeptical about sharing arenas between compartments.
> Doing so would require a ton of changes throughout the GC, since that
> assumption is baked in pretty deeply.
That sounds sensible to me. Would the shared FinalizeKinds need to have the same GCThing size?
Comment 76•13 years ago
|
||
> As for performance, it sounds like we're going to accept the 3% cssquery regression
"Accept" as in "push to get it fixed in the JS engine", right?
Comment 77•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #76) > > As for performance, it sounds like we're going to accept the 3% cssquery regression > > "Accept" as in "push to get it fixed in the JS engine", right? The 'make-the-jit-CCW-aware' stuff? If so, yeah, I agree.
Comment 78•13 years ago
|
||
Sigh. More stupid leak stuff. Two more identical pushes to try, so that we can be more sure about any potentially intermittent stuff: https://tbpl.mozilla.org/?tree=Try&rev=7daa9d995250 https://tbpl.mozilla.org/?tree=Try&rev=295f04b65f06
Comment 79•13 years ago
|
||
Everything looks good, so I'm pushing to inbound. Here goes... http://hg.mozilla.org/integration/mozilla-inbound/rev/bed8c4e3dfdf
Target Milestone: --- → mozilla15
Comment 80•13 years ago
|
||
This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc. Can we please back it out?
Comment 81•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #80) > This regressed many benchmarks, such as Dromaeo, Tp5, Sunspider 2, etc. Can > we please back it out? OK, after reading the bug, it looks like we were planning to take this regression. In the future, please send an email to dev-tree-management before landing this kind of expected regression so that people know in advance to expect this. Thanks!
Comment 82•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/bed8c4e3dfdf MMM88&&&, ,MMM8&&&. `'MMM88&&&, MMMMM88&&&& 'MMM88&&&, MMMMM88&&&&&& 'MMM88&&&, MMMMM88&&&&&& 'MMM88&&& MMMMM88&&&&&& 'MMM88&&& MMMMM88&&&& MMM88&&& 'MMM8&&&' MMMM888&&&& 'MM88&&& MMMM88&&&&& MM88&&& MMMM88&&&&& MM88&&& ,MMM8&&&. MM88&&& MMMMM88&&&& ,MM88&&& MMMMM88&&&&&& MMM88&&&' MMMMM88&&&&&& MMM88&&&' MMMMM88&&&&&& MMM88&&&' MMMMM88&&&& MMM88&&&' 'MMM8&&&' MMM88&&&' MMM88&&&'
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 83•13 years ago
|
||
well... this makes reading about:compartments with memchaser enabled a fun time an additional 60 lines of js files in the system compartments
No longer depends on: 753844
Depends on: 754600
Updated•12 years ago
|
Depends on: CVE-2012-4212
Updated•4 years ago
|
Keywords: dev-doc-needed
You need to log in
before you can comment on or make changes to this bug.
Description
•