Closed
Bug 749617
Opened 12 years ago
Closed 12 years ago
10% tpaint regression with compartment-per-global
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla15
People
(Reporter: bholley, Unassigned)
References
Details
Attachments
(5 files, 1 obsolete file)
2.75 KB,
text/plain
|
Details | |
29.20 KB,
application/x-bzip2
|
Details | |
6.10 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
14.14 KB,
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
24.03 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
This is the very last CPG blocker. CPG gives us a 7-23% tpaint regression, where tpaint measures the time it takes to open up a new window and do the initial paint. I spent some time with a profiler, and the issue appears to be that we're spending time doing script analysis for scripts we cloned across globals. Gecko caches scripts in various places (like the XUL prototype cache). These scripts live in a compartment, so luke added some code in JS_ExecuteScript to just clone them if they come from the wrong compartment: http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#5239 So my question is, given that we only need to clone the script so that it lives in a new compartment, is there some way we can reuse the old analysis? It seems like there isn't a huge amount of stuff that changed. These scripts were being reused against multiple globals before, so they can't be tied to any particular global. It's just a compartment issue. CCing people who know about this stuff. Compartment-per-global is ready to land, modulo this issue. If you guys want it as bad as I do, then hopefully someone can lend a hand to push this over the line. :-)
Reporter | ||
Comment 1•12 years ago
|
||
Also, the hot function seems to be js::analysis::ScriptAnalysis::analyzeBytecode, at least in the profiler.
Comment 2•12 years ago
|
||
One simple idea is to turn off TI in the clone's destination compartment. There is even reason that this may be an overall speedup: clearly this code is not running for long if analysis time is significant (although that may just be how tpaint exercises it). Also, since everything gets thrown out on every GC, every time we run this chrome code we may be analyzing the scripts anew. If nothing else, the patch should be simple (is it as easy as just cx->compartment->inferenceEnabled = false?) so perhaps worth just try'ing.
Comment 3•12 years ago
|
||
Right now we analyzeBytecode in all scripts that are executed, whether TI is enabled or not. I'm don't think this behavior is actually necessary and will try fixing this first thing tomorrow.
Reporter | ||
Comment 4•12 years ago
|
||
(In reply to Brian Hackett (:bhackett) from comment #3) > Right now we analyzeBytecode in all scripts that are executed, whether TI is > enabled or not. Right. These are chrome scripts, so TI is already disabled on the compartment. > I'm don't think this behavior is actually necessary and will try fixing > this first thing tomorrow. Huzzah! Thanks Brian.
Comment 5•12 years ago
|
||
Patch which works in the shell, about to push to try.
Reporter | ||
Comment 6•12 years ago
|
||
First patch didn't work, so Brian and I did some pair debugging. Attaching the resulting patch. This patch makes analyzeBytecode disappear from the profile, but unfortunately doesn't make the tpaint regression go away. So it must be something else... :\
Attachment #619580 -
Attachment is obsolete: true
Reporter | ||
Comment 7•12 years ago
|
||
I've got to head out for the evening, though I'll check mail before heading to bed when I get home. If we're lucky, whatever luke discovers about the cssquery regression might indicate what's going on here.
Reporter | ||
Comment 8•12 years ago
|
||
Alright, so it looks like bhackett's patch didn't do the trick here. I've been running numbers all morning. Each run opens 50 windows, and I did a number of runs for various configurations. Here are the parameters: precpg/cpg: with/without cpg cloneall: a patch that clones _every_ script in JS_ExecuteScript. bhackett3: bhackett's 3rd patch (demonstrated to take analyzeBytecode out of the profile) Here are my results (each number represents the average opening time (in ms) of a 50-window run, with the first open removed from the average): precpg + cloneall + bhackett3: 192,194,192 precpg + cloneall: 199, 201, 193,195,198 (not sure why there's such high variance here) precpg: 181,185,187,181 cpg + cloneall + bhackett3: 200,202,202 cpg + bhackett3: 202, 200,199 cpg: 200, 201, 202 So this tells us a few things. First, cloneall makes a significant difference when applied to precpg, but almost none when applied to cpg. This implies that the cloning is the source of the performance regression. Second, bhackett3 made either slight or no impact in clone-heavy cases, even though it made analyzeBytecode disappear off the profile. So all this seems to indicate that there's some other way that CloneScript is hurting us. I'm going to see if I can convince Instruments to give me another lead.
Summary: Reduce the performance impact of analyzing scripts cloned across globals → 10% tpaint regression with compartment-per-global
Reporter | ||
Comment 9•12 years ago
|
||
I just checked that none of the numbers change if I turned off TI, chrome jit, and content jit. So the issue here is somewhere else.
Reporter | ||
Comment 10•12 years ago
|
||
So I just added some test code to precpg that _just_ does a CloneScript in JS_ExecuteScript, but doesn't actually use the result (it still executes the old scriptArg, if it can). I got 189,191,193,197. So a good chunk (half-ish?) of the overhead is just cloning the script, regardless of whether we're running it.
Comment 11•12 years ago
|
||
At a glance, js::CloneScript looks hideously inefficient --- it XDR-encodes the script into a byte stream, then decodes it into the new compartment.
Comment 12•12 years ago
|
||
You're doing a window.open, right? So the scripts you're cloning include chrome://browser/content/browser.js, at least. All 13000 lines of it. How long does it take to CloneScript just that one script?
Reporter | ||
Comment 13•12 years ago
|
||
I had to do some hacking on tpaint, so I'm attaching my resulting working directory (with a git repo inside of it). There are two branches, "master" and "profiling". The latter does programmatic enabling/disabling of the profiler. To run, you need to do the following: * enable remote xul with a pref * enable mozafterpaint with a pref * ./firefox -P yourprofile -chrome file:///full/path/to/winopen.xul?mozafterpaint=1%26phase1=20 The phase1= parameter specifies how many runs it does.
Reporter | ||
Comment 14•12 years ago
|
||
Rather than making CloneScript faster, it probably makes the most sense just to do bug 679940.
Reporter | ||
Comment 15•12 years ago
|
||
This is the log of the clone going on: http://pastebin.mozilla.org/1610307
Comment 16•12 years ago
|
||
Sharking "window.open" (which exercises this whole JS_Execute/CloneScript path) shows about 5.1% of total time under CloneScript and 2.2% total time under XDRAtom. Since atoms are shareable across compartments, this suggests a hack to avoid serializing+deserializing+hashing atoms.
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #16) > Sharking "window.open" (which exercises this whole JS_Execute/CloneScript > path) shows about 5.1% of total time under CloneScript and 2.2% total time > under XDRAtom. Since atoms are shareable across compartments, this suggests > a hack to avoid serializing+deserializing+hashing atoms. That sounds good - we can do that before the landing, and track bug 679940 (but not block on it). Is this the sort of patch you can put together in a jiffy?
Comment 18•12 years ago
|
||
Actually, there is more CloneScript than I thought; a lot of it comes from JS_CloneFunctionObject (a lot of that from XBL) and more XDRAtom than I thought (under XDRStaticBlock). I'll try to do this hack and see what it gets us.
Comment 19•12 years ago
|
||
Simple preparatory patch: assert (as is already true) that script constant strings are atoms.
Attachment #620198 -
Flags: review?(n.nethercote)
Comment 20•12 years ago
|
||
getLocalNamesArray should provide BindingKind in addition to name.
Attachment #620199 -
Flags: review?(jwalden+bmo)
Comment 21•12 years ago
|
||
Comment on attachment 620198 [details] [diff] [review] rm codeString > template<XDRMode mode> > bool > js::XDRAtom(XDRState<mode> *xdr, JSAtom **atomp) > { > if (mode == XDR_ENCODE) { >- JSString *str = *atomp; >- return xdr->codeString(&str); >+ uint32_t nchars = (*atomp)->length(); >+ if (!xdr->codeUint32(&nchars)) >+ return false; >+ >+ jschar *chars = const_cast<jschar *>((*atomp)->getChars(xdr->cx())); >+ if (!chars) >+ return false; >+ >+ return xdr->codeChars(chars, nchars); > } I was expecting to see a corresponding change in DECODE code for this... I guess there must be a reason why it's not necessary?
Attachment #620198 -
Flags: review?(n.nethercote) → review+
Comment 22•12 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #21) Yep. (Wow, you're fast! Real patch in when some more try results come in.)
Comment 23•12 years ago
|
||
This patch implements CloneScript by doing a clone directly (no XDR involved). This avoids all the atomization overhead as well as the copying overhead to and from script/script->data and the XDR buffer. Running the same window.open micro-bench, this patch cuts the slowdown almost in half (from 20% slower to 11% slower). I'm still waiting for the full try run: https://tbpl.mozilla.org/?tree=Try&rev=674b5a02bc44 but based on the partial results of a previous try push (with a borked assert), I see the tpaint slowdown at ~3% on 10.6 and 10.7.
Attachment #620207 -
Flags: review?(n.nethercote)
Comment 24•12 years ago
|
||
Green on try
Comment 25•12 years ago
|
||
Comment on attachment 620199 [details] [diff] [review] improve getLocalNamesArray Review of attachment 620199 [details] [diff] [review]: ----------------------------------------------------------------- You sure we don't want low-bit twiddling instead? Chicken!
Attachment #620199 -
Flags: review?(jwalden+bmo) → review+
Comment 26•12 years ago
|
||
Comment on attachment 620207 [details] [diff] [review] make CloneScript fast Review of attachment 620207 [details] [diff] [review]: ----------------------------------------------------------------- I don't love this patch, but it'll do. My main concern is about the JSCompartment maps, see below. ::: js/src/jsfun.cpp @@ +373,5 @@ > template<XDRMode mode> > bool > js::XDRInterpretedFunction(XDRState<mode> *xdr, JSObject **objp, JSScript *parentScript) > { > + /* NB: updates to XDRInterpretedFunction should propagate to CloneInterpretedFunction. */ You mean "if you modify this function you should modify CloneInterpretedFunction as well"? If so, can you make the comment clearer? (Maybe "Keep this in sync with CloneInterpretedFunction".) And should CloneInterpretedFunction have a similar comment pointing back here? ::: js/src/jsscript.cpp @@ +397,5 @@ > template<XDRMode mode> > bool > js::XDRScript(XDRState<mode> *xdr, JSScript **scriptp, JSScript *parentScript) > { > + /* NB: updates to XDRScript should propagate to CloneScript. */ Again, I found "propagate" unclear here. @@ -1137,5 @@ > /* > * We assume that calloc aligns on sizeof(Value) if the size we ask to > * allocate divides sizeof(Value). > */ > - JS_STATIC_ASSERT(sizeof(Value) == sizeof(double)); The comment is kind of bogus too. A better comment would be "We assume that calloc aligns on sizeof(Value) if the size we request is greater than sizeof(Value)." And can you add an assertion that |data| actually is sizeof(Value)-aligned? Thanks! @@ +1749,2 @@ > JSScript * > +js::CloneScript(JSContext *cx, JSScript *src) This function is disturbingly low-level, but I guess you gotta do what you gotta do :/ JSCompartment has three maps holding optional per-JSScript info: scriptCountsMap, sourceMapMap, debugScriptMap. Does the info in these maps need to be copied for the new script? I'm not sure. If it does, it does; if it does not, a comment explaining why would be helpful. @@ +1763,5 @@ > + size_t size = ScriptDataSize(cx, src->length, src->numNotes(), src->natoms, > + nobjects, nregexps, ntrynotes, nconsts, nglobals, > + nClosedArgs, nClosedVars); > + > + uint8_t *data = static_cast<uint8_t *>(cx->calloc_(JS_ROUNDUP(size, sizeof(Value)))); Can you assert that this is sizeof(Value)-aligned as well? ::: js/src/vm/RegExpObject.cpp @@ +757,5 @@ > template<XDRMode mode> > bool > js::XDRScriptRegExpObject(XDRState<mode> *xdr, HeapPtrObject *objp) > { > + /* NB: updates to XDRScriptRegExpObject should propagate to CloneScriptRegExpObject. */ Ditto for "propagate", and the comment pointing back. ::: js/src/vm/ScopeObject.cpp @@ +903,5 @@ > template<XDRMode mode> > bool > js::XDRStaticBlockObject(XDRState<mode> *xdr, JSScript *script, StaticBlockObject **objp) > { > + /* NB: updates to XDRStaticBlockObject should propagate to CloneStaticBlockObject. */ Ditto for "propagate". @@ +1018,5 @@ > js::XDRStaticBlockObject(XDRState<XDR_DECODE> *xdr, JSScript *script, StaticBlockObject **objp); > + > +JSObject * > +js::CloneStaticBlockObject(JSContext *cx, StaticBlockObject &srcBlock, > + const AutoObjectVector &objects, JSScript *src) I won't pretend to understand this function -- I don't know anything about StaticBlockObject.
Attachment #620207 -
Flags: review?(n.nethercote) → review+
Comment 27•12 years ago
|
||
Thanks again njn! https://hg.mozilla.org/mozilla-central/rev/6dbb135d3f1f https://hg.mozilla.org/mozilla-central/rev/1dd95b7228a6 https://hg.mozilla.org/mozilla-central/rev/d4e35005f5a9
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Comment 28•12 years ago
|
||
Followup to fix warnings: https://hg.mozilla.org/mozilla-central/rev/d701a77854b1
Comment 29•12 years ago
|
||
On failure return values, I'm insufferable.
You need to log in
before you can comment on or make changes to this bug.
Description
•