Closed Bug 749617 Opened 12 years ago Closed 12 years ago

10% tpaint regression with compartment-per-global

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla15

People

(Reporter: bholley, Unassigned)

References

Details

Attachments

(5 files, 1 obsolete file)

patch 12 years ago Brian Hackett [Laid off!] 2.71 KB, patch		Details \| Diff \| Splinter Review
updated patch 12 years ago Bobby Holley (:bholley) 2.75 KB, text/plain		Details
tpaint setup 12 years ago Bobby Holley (:bholley) 29.20 KB, application/x-bzip2		Details
rm codeString 12 years ago Luke Wagner [:luke] 6.10 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
improve getLocalNamesArray 12 years ago Luke Wagner [:luke] 14.14 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
make CloneScript fast 12 years ago Luke Wagner [:luke] 24.03 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review

Bobby Holley (:bholley)

Reporter

Description

•

12 years ago

This is the very last CPG blocker.

CPG gives us a 7-23% tpaint regression, where tpaint measures the time it takes to open up a new window and do the initial paint.

I spent some time with a profiler, and the issue appears to be that we're spending time doing script analysis for scripts we cloned across globals.

Gecko caches scripts in various places (like the XUL prototype cache). These scripts live in a compartment, so luke added some code in JS_ExecuteScript to just clone them if they come from the wrong compartment:
http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#5239

So my question is, given that we only need to clone the script so that it lives in a new compartment, is there some way we can reuse the old analysis? It seems like there isn't a huge amount of stuff that changed. These scripts were being reused against multiple globals before, so they can't be tied to any particular global. It's just a compartment issue.

CCing people who know about this stuff. Compartment-per-global is ready to land, modulo this issue. If you guys want it as bad as I do, then hopefully someone can lend a hand to push this over the line. :-)

Bobby Holley (:bholley)

Reporter

Comment 1

•

12 years ago

Also, the hot function seems to be js::analysis::ScriptAnalysis::analyzeBytecode, at least in the profiler.

Luke Wagner [:luke]

Comment 2

•

12 years ago

One simple idea is to turn off TI in the clone's destination compartment.  There is even reason that this may be an overall speedup: clearly this code is not running for long if analysis time is significant (although that may just be how tpaint exercises it).  Also, since everything gets thrown out on every GC, every time we run this chrome code we may be analyzing the scripts anew.

If nothing else, the patch should be simple (is it as easy as just cx->compartment->inferenceEnabled = false?) so perhaps worth just try'ing.

Brian Hackett [Laid off!]

Comment 3

•

12 years ago

Right now we analyzeBytecode in all scripts that are executed, whether TI is enabled or not.  I'm don't think this behavior is actually necessary and will try fixing this first thing tomorrow.

Bobby Holley (:bholley)

Reporter

Comment 4

•

12 years ago

(In reply to Brian Hackett (:bhackett) from comment #3)
> Right now we analyzeBytecode in all scripts that are executed, whether TI is
> enabled or not.

Right. These are chrome scripts, so TI is already disabled on the compartment.

> I'm don't think this behavior is actually necessary and will try fixing
> this first thing tomorrow.

Huzzah! Thanks Brian.

Brian Hackett [Laid off!]

Comment 5

•

12 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Patch which works in the shell, about to push to try.

Bobby Holley (:bholley)

Reporter

Comment 6

•

12 years ago

Attached file updated patch — Details

First patch didn't work, so Brian and I did some pair debugging. Attaching the resulting patch.

This patch makes analyzeBytecode disappear from the profile, but unfortunately doesn't make the tpaint regression go away. So it must be something else... :\

Attachment #619580 - Attachment is obsolete: true

Bobby Holley (:bholley)

Reporter

Comment 7

•

12 years ago

I've got to head out for the evening, though I'll check mail before heading to bed when I get home. If we're lucky, whatever luke discovers about the cssquery regression might indicate what's going on here.

Bobby Holley (:bholley)

Reporter

Comment 8

•

12 years ago

Alright, so it looks like bhackett's patch didn't do the trick here. I've been running numbers all morning. Each run opens 50 windows, and I did a number of runs for various configurations. Here are the parameters:

precpg/cpg: with/without cpg
cloneall: a patch that clones _every_ script in JS_ExecuteScript.
bhackett3: bhackett's 3rd patch (demonstrated to take analyzeBytecode out of the profile)

Here are my results (each number represents the average opening time (in ms) of a 50-window run, with the first open removed from the average):

precpg + cloneall + bhackett3: 192,194,192
precpg + cloneall: 199, 201, 193,195,198 (not sure why there's such high variance here)
precpg: 181,185,187,181

cpg + cloneall + bhackett3: 200,202,202
cpg + bhackett3: 202, 200,199
cpg: 200, 201, 202


So this tells us a few things. First, cloneall makes a significant difference when applied to precpg, but almost none when applied to cpg. This implies that the cloning is the source of the performance regression. Second, bhackett3 made either slight or no impact in clone-heavy cases, even though it made analyzeBytecode disappear off the profile.

So all this seems to indicate that there's some other way that CloneScript is hurting us. I'm going to see if I can convince Instruments to give me another lead.

Summary: Reduce the performance impact of analyzing scripts cloned across globals → 10% tpaint regression with compartment-per-global

Bobby Holley (:bholley)

Reporter

Comment 9

•

12 years ago

I just checked that none of the numbers change if I turned off TI, chrome jit, and content jit. So the issue here is somewhere else.

Bobby Holley (:bholley)

Reporter

Comment 10

•

12 years ago

So I just added some test code to precpg that _just_ does a CloneScript in JS_ExecuteScript, but doesn't actually use the result (it still executes the old scriptArg, if it can).

I got 189,191,193,197. So a good chunk (half-ish?) of the overhead is just cloning the script, regardless of whether we're running it.

Brian Hackett [Laid off!]

Comment 11

•

12 years ago

At a glance, js::CloneScript looks hideously inefficient --- it XDR-encodes the script into a byte stream, then decodes it into the new compartment.

Boris Zbarsky [:bzbarsky]

Comment 12

•

12 years ago

You're doing a window.open, right?  So the scripts you're cloning include chrome://browser/content/browser.js, at least.  All 13000 lines of it.  How long does it take to CloneScript just that one script?

Bobby Holley (:bholley)

Reporter

Comment 13

•

12 years ago

Attached file tpaint setup — Details

I had to do some hacking on tpaint, so I'm attaching my resulting working directory (with a git repo inside of it). There are two branches, "master" and "profiling". The latter does programmatic enabling/disabling of the profiler.

To run, you need to do the following:
* enable remote xul with a pref
* enable mozafterpaint with a pref
* ./firefox -P yourprofile -chrome file:///full/path/to/winopen.xul?mozafterpaint=1%26phase1=20

The phase1= parameter specifies how many runs it does.

Bobby Holley (:bholley)

Reporter

Comment 14

•

12 years ago

Rather than making CloneScript faster, it probably makes the most sense just to do bug 679940.

Bobby Holley (:bholley)

Reporter

Comment 15

•

12 years ago

This is the log of the clone going on: http://pastebin.mozilla.org/1610307

Luke Wagner [:luke]

Comment 16

•

12 years ago

Sharking "window.open" (which exercises this whole JS_Execute/CloneScript path) shows about 5.1% of total time under CloneScript and 2.2% total time under XDRAtom.  Since atoms are shareable across compartments, this suggests a hack to avoid serializing+deserializing+hashing atoms.

Bobby Holley (:bholley)

Reporter

Comment 17

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #16)
> Sharking "window.open" (which exercises this whole JS_Execute/CloneScript
> path) shows about 5.1% of total time under CloneScript and 2.2% total time
> under XDRAtom.  Since atoms are shareable across compartments, this suggests
> a hack to avoid serializing+deserializing+hashing atoms.

That sounds good - we can do that before the landing, and track bug 679940 (but not block on it). Is this the sort of patch you can put together in a jiffy?

Luke Wagner [:luke]

Comment 18

•

12 years ago

Actually, there is more CloneScript than I thought; a lot of it comes from JS_CloneFunctionObject (a lot of that from XBL) and more XDRAtom than I thought (under XDRStaticBlock).  I'll try to do this hack and see what it gets us.

Luke Wagner [:luke]

Comment 19

•

12 years ago

Attached patch rm codeString — Details — Splinter Review

Simple preparatory patch: assert (as is already true) that script constant strings are atoms.

Attachment #620198 - Flags: review?(n.nethercote)

Luke Wagner [:luke]

Comment 20

•

12 years ago

Attached patch improve getLocalNamesArray — Details — Splinter Review

getLocalNamesArray should provide BindingKind in addition to name.

Attachment #620199 - Flags: review?(jwalden+bmo)

Nicholas Nethercote [inactive]

Comment 21

•

12 years ago

Comment on attachment 620198 [details] [diff] [review]
rm codeString

> template<XDRMode mode>
> bool
> js::XDRAtom(XDRState<mode> *xdr, JSAtom **atomp)
> {
>     if (mode == XDR_ENCODE) {
>-        JSString *str = *atomp;
>-        return xdr->codeString(&str);
>+        uint32_t nchars = (*atomp)->length();
>+        if (!xdr->codeUint32(&nchars))
>+            return false;
>+
>+        jschar *chars = const_cast<jschar *>((*atomp)->getChars(xdr->cx()));
>+        if (!chars)
>+            return false;
>+
>+        return xdr->codeChars(chars, nchars);
>     }

I was expecting to see a corresponding change in DECODE code for this... I guess there must be a reason why it's not necessary?

Attachment #620198 - Flags: review?(n.nethercote) → review+

Luke Wagner [:luke]

Comment 22

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #21)
Yep.  (Wow, you're fast!  Real patch in when some more try results come in.)

Luke Wagner [:luke]

Comment 23

•

12 years ago

Attached patch make CloneScript fast — Details — Splinter Review

This patch implements CloneScript by doing a clone directly (no XDR involved).  This avoids all the atomization overhead as well as the copying overhead to and from script/script->data and the XDR buffer.

Running the same window.open micro-bench, this patch cuts the slowdown almost in half (from 20% slower to 11% slower).  I'm still waiting for the full try run:
  https://tbpl.mozilla.org/?tree=Try&rev=674b5a02bc44
but based on the partial results of a previous try push (with a borked assert), I see the tpaint slowdown at ~3% on 10.6 and 10.7.

Attachment #620207 - Flags: review?(n.nethercote)

Luke Wagner [:luke]

Comment 24

•

12 years ago

Green on try

Jeff Walden [:Waldo]

Comment 25

•

12 years ago

Comment on attachment 620199 [details] [diff] [review]
improve getLocalNamesArray

Review of attachment 620199 [details] [diff] [review]:
-----------------------------------------------------------------

You sure we don't want low-bit twiddling instead?  Chicken!

Attachment #620199 - Flags: review?(jwalden+bmo) → review+

Nicholas Nethercote [inactive]

Comment 26

•

12 years ago

Comment on attachment 620207 [details] [diff] [review]
make CloneScript fast

Review of attachment 620207 [details] [diff] [review]:
-----------------------------------------------------------------

I don't love this patch, but it'll do.  My main concern is about the JSCompartment maps, see below.

::: js/src/jsfun.cpp
@@ +373,5 @@
>  template<XDRMode mode>
>  bool
>  js::XDRInterpretedFunction(XDRState<mode> *xdr, JSObject **objp, JSScript *parentScript)
>  {
> +    /* NB: updates to XDRInterpretedFunction should propagate to CloneInterpretedFunction. */

You mean "if you modify this function you should modify CloneInterpretedFunction as well"?  If so, can you make the comment clearer?  (Maybe "Keep this in sync with CloneInterpretedFunction".)  And should CloneInterpretedFunction have a similar comment pointing back here?

::: js/src/jsscript.cpp
@@ +397,5 @@
>  template<XDRMode mode>
>  bool
>  js::XDRScript(XDRState<mode> *xdr, JSScript **scriptp, JSScript *parentScript)
>  {
> +    /* NB: updates to XDRScript should propagate to CloneScript. */

Again, I found "propagate" unclear here.

@@ -1137,5 @@
>      /*
>       * We assume that calloc aligns on sizeof(Value) if the size we ask to
>       * allocate divides sizeof(Value).
>       */
> -    JS_STATIC_ASSERT(sizeof(Value) == sizeof(double));

The comment is kind of bogus too.  A better comment would be "We assume that calloc aligns on sizeof(Value) if the size we request is greater than sizeof(Value)."

And can you add an assertion that |data| actually is sizeof(Value)-aligned?  Thanks!

@@ +1749,2 @@
>  JSScript *
> +js::CloneScript(JSContext *cx, JSScript *src)

This function is disturbingly low-level, but I guess you gotta do what you gotta do :/

JSCompartment has three maps holding optional per-JSScript info:  scriptCountsMap, sourceMapMap, debugScriptMap.  Does the info in these maps need to be copied for the new script?  I'm not sure.  If it does, it does;  if it does not, a comment explaining why would be helpful.

@@ +1763,5 @@
> +    size_t size = ScriptDataSize(cx, src->length, src->numNotes(), src->natoms,
> +                                 nobjects, nregexps, ntrynotes, nconsts, nglobals,
> +                                 nClosedArgs, nClosedVars);
> +
> +    uint8_t *data = static_cast<uint8_t *>(cx->calloc_(JS_ROUNDUP(size, sizeof(Value))));

Can you assert that this is sizeof(Value)-aligned as well?

::: js/src/vm/RegExpObject.cpp
@@ +757,5 @@
>  template<XDRMode mode>
>  bool
>  js::XDRScriptRegExpObject(XDRState<mode> *xdr, HeapPtrObject *objp)
>  {
> +    /* NB: updates to XDRScriptRegExpObject should propagate to CloneScriptRegExpObject. */

Ditto for "propagate", and the comment pointing back.

::: js/src/vm/ScopeObject.cpp
@@ +903,5 @@
>  template<XDRMode mode>
>  bool
>  js::XDRStaticBlockObject(XDRState<mode> *xdr, JSScript *script, StaticBlockObject **objp)
>  {
> +    /* NB: updates to XDRStaticBlockObject should propagate to CloneStaticBlockObject. */

Ditto for "propagate".

@@ +1018,5 @@
>  js::XDRStaticBlockObject(XDRState<XDR_DECODE> *xdr, JSScript *script, StaticBlockObject **objp);
> +
> +JSObject *
> +js::CloneStaticBlockObject(JSContext *cx, StaticBlockObject &srcBlock,
> +                           const AutoObjectVector &objects, JSScript *src)

I won't pretend to understand this function -- I don't know anything about StaticBlockObject.

Attachment #620207 - Flags: review?(n.nethercote) → review+

Luke Wagner [:luke]

Comment 27

•

12 years ago

Thanks again njn!

https://hg.mozilla.org/mozilla-central/rev/6dbb135d3f1f
https://hg.mozilla.org/mozilla-central/rev/1dd95b7228a6
https://hg.mozilla.org/mozilla-central/rev/d4e35005f5a9

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla15

Ed Morley [:emorley]

Comment 28

•

12 years ago

Followup to fix warnings:
https://hg.mozilla.org/mozilla-central/rev/d701a77854b1

Luke Wagner [:luke]

Comment 29

•

12 years ago

On failure return values, I'm insufferable.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

12 years ago

Depends on: 752205

Nicholas Nethercote [inactive]

Updated

•

8 years ago

Depends on: 1245233

You need to log in before you can comment on or make changes to this bug.