679940 - split JSScript into a per-compartment portion and a shareable runtime-global portion

Reporter

Description

•

13 years ago

With COMPILE_N_GO removed (see blocking bug), the bytecode, source notes, and many words of JSScript can be shared between compartments.  This can be useful for many reasons:
 - the shareable portion can be cached by necko (see dependent bug)
 - compartment per global wouldn't lose the memory savings of "brutal sharing"

"Brutal sharing" is the JSAPI use of JS_CloneFunctionObject to avoid re-compiling source when creating new documents (e.g., for a tab/window).  This only saves memory because chrome JS is in the same compartment; with compartment-per-global, JS_CloneFunctionObject would have to clone the underlying script.

To see how much memory brutal sharing saves in practice, I instrumented a browser.  Right after startup, brutal sharing saves ~300KB.  Per new tab the savings is ~10K.  Per new window the savings is ~1MB.  So really only new windows are killer, but perhaps most people only use a single window?  Test Pilot should be able to answer this.

Luke Wagner [:luke]

Reporter

Updated

•

13 years ago

Blocks: 679942

David Mandelin [:dmandelin]

Comment 1

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #0)
> To see how much memory brutal sharing saves in practice, I instrumented a
> browser.  Right after startup, brutal sharing saves ~300KB.  Per new tab the
> savings is ~10K.  Per new window the savings is ~1MB.  So really only new
> windows are killer, but perhaps most people only use a single window?  Test
> Pilot should be able to answer this.

What does this look like on Android?

Luke Wagner [:luke]

Reporter

Comment 2

•

13 years ago

(In reply to David Mandelin from comment #1)
Still trying to get this information; Android makes getting log spew difficult.

Luke Wagner [:luke]

Reporter

Comment 3

•

13 years ago

Attached patch measurement patch — Details — Splinter Review

Ah, printf_stderr.  On mobile browser startup, the cross-global sharing is about 100-200KB on both the chrome and content process.  Browsing around (thereby, I guess, exercising more chrome JS) causes this to rise to 500KB per process.  Opening a bunch of tabs (8) brings it up to 700KB.  Even after closing everything, there is still 500KB shared per tab.

So this is a good bit more than desktop (which makes sense, since more of Fennec is implemented in JS).  Measurement patch attached.

David Mandelin [:dmandelin]

Comment 4

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #3)
> Created attachment 555593 [details] [diff] [review]
> measurement patch
> 
> Ah, printf_stderr.  On mobile browser startup, the cross-global sharing is
> about 100-200KB on both the chrome and content process.  Browsing around
> (thereby, I guess, exercising more chrome JS) causes this to rise to 500KB
> per process.  Opening a bunch of tabs (8) brings it up to 700KB.  Even after
> closing everything, there is still 500KB shared per tab.
> 
> So this is a good bit more than desktop (which makes sense, since more of
> Fennec is implemented in JS).  Measurement patch attached.

Thanks. That's not huge, I guess, but it does seem like a pretty significant chunk for those devices and definitely worth keeping the savings.

Till Schneidereit [:till]

Assignee

Comment 5

•

11 years ago

Attached patch Share bytecode and source notes (obsolete) — Details — Splinter Review

After some quick measurements showed that ~60% of all bytecode in a short browsing session were identical to previously stored ones, I took a shot at splitting the bytecodes and source notes off of the JSScripts and storing them uniquely per runtime.

The attached patch does just that, similar to how filenames are handled in JSScripts. Ideally, we wouldn't even create the bytecodes in the first place, but then we'd have to deal with finding a lookup key that's guaranteed to be available. That'd create all sorts of complications with sourceless and nested functions.

Still, this seems to work well and doesn't crash my browser. Try-servering here:
https://tbpl.mozilla.org/?tree=Try&rev=58e3c9b12126

I didn't take any extensive measurements (and didn't yet adapt the memory reporters), but as one datapoint, arstechnica.com creates script-data arrays of ~1.3Mb in total size per compartment. With the patch, this reduces to ~0.4Mb, with the rest being shared among all compartments of that site. As the cache isn't domain-specific, I'm hopeful that we'll achieve considerable savings for common scripts such as jquery.

Attachment #707852 - Flags: review?(luke)

Attachment #707852 - Flags: review?(jwalden+bmo)

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Whiteboard: [MemShrink]

Luke Wagner [:luke]

Reporter

Comment 6

•

11 years ago

bhackett seems to have just started working on lazy bytecode generation (bug 678037).  It would probably make sense for you two to coordinate; perhaps you could give him my review?

Till Schneidereit [:till]

Assignee

Comment 7

•

11 years ago

Comment on attachment 707852 [details] [diff] [review]
Share bytecode and source notes

(In reply to Luke Wagner [:luke] from comment #6)
> bhackett seems to have just started working on lazy bytecode generation (bug
> 678037).  It would probably make sense for you two to coordinate; perhaps
> you could give him my review?

That makes sense, yes.

@bhackett: I don't think that this will immediately affect you too much, but if it does, I'm more than happy to make whatever changes you'd need to make this work better with lazy bytecode.

Attachment #707852 - Flags: review?(luke) → review?(bhackett1024)

Brian Hackett [Laid off!]

Comment 8

•

11 years ago

Till, do you know how much of the code that is common to multiple compartments is ever run?  (script->useCount ever becomes non-zero).  It'd be nice to have a sense for whether this complements lazy bytecode generation (many scripts that run are identical between compartments) or is redundant with it (many scripts that are identical between compartments never run).

Brian Hackett [Laid off!]

Comment 9

•

11 years ago

Comment on attachment 707852 [details] [diff] [review]
Share bytecode and source notes

Review of attachment 707852 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but I'd like to see an updated version of the patch.  Particularly interested in whether the memcmp bug affects the experiments you did in comment 5.

::: js/src/jsscript.cpp
@@ +1504,5 @@
> +jsbytecode *
> +js::SaveScriptBytecode(JSContext *cx, const jsbytecode *bytecode, uint32_t length)
> +{
> +    if (!bytecode)
> +        return NULL;

I don't see any callers that might pass NULL, can you assert bytecode is non-NULL?

@@ +1512,5 @@
> +
> +    ScriptBytecodeTable::AddPtr p = rt->scriptBytecodeTable.lookupForAdd(l);
> +    if (!p) {
> +        size_t size = offsetof(ScriptBytecodeEntry, bytecode) + length;
> +        ScriptBytecodeEntry *entry = (ScriptBytecodeEntry *)cx->malloc_(size);

I think this would be neater and more efficient if SaveScriptBytecode took ownership of its bytecode parameter, storing it directly in the table when adding a new entry or freeing it when the bytecode is a duplicate.  This would mean that ScriptBytecodeEntry would not be heap allocated (HashSet<ScriptBytecodeEntry> rather than HashSet<ScriptBytecodeEntry*>) or embed the bytecode in its structure.

@@ +1532,5 @@
> +    /*
> +     * During the IGC we need to ensure that bytecode is marked whenever it is
> +     * accessed even if the bytecode was already in the table. At this point
> +     * old scripts or exceptions pointing to the bytecode may no longer be
> +     * reachable.

Might want to mention this is a form of read barrier here.

@@ +1843,5 @@
>              return false;
>      }
> +
> +    script->length = prologLength + mainLength;
> +    uint32_t codeLength = sizeof(jsbytecode) * script->length + nsrcnotes * sizeof(jssrcnote);

Can you remove the sizeofs here?  They aren't used in XDRScript and are a bit confusing.

::: js/src/jsscript.h
@@ +1269,5 @@
> +    static HashNumber hash(const Lookup &l) { return mozilla::HashBytes(l.code, l.length); }
> +    static bool match(const ScriptBytecodeEntry *entry, const Lookup &lookup) {
> +        if (entry->length != lookup.length)
> +            return false;
> +        return memcmp(entry->bytecode, lookup.code, lookup.length);

This should be !memcmp, memcmp returns zero when the memmory blocks are equal.

Attachment #707852 - Flags: review?(bhackett1024)

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 10

•

11 years ago

Comment on attachment 707852 [details] [diff] [review]
Share bytecode and source notes

Review of attachment 707852 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscript.cpp
@@ +164,5 @@
>      /* The clone has the same bindingArray_ offset as 'src'. */
>      Bindings &src = srcScript->bindings;
>      ptrdiff_t off = (uint8_t *)src.bindingArray() - srcScript->data;
>      JS_ASSERT(off >= 0);
> +    JS_ASSERT(off <= (srcScript->dataSize));

Less parens?

Till Schneidereit [:till]

Assignee

Comment 11

•

11 years ago

Attached patch Share bytecode and source notes for functions in a runtime wherever possible, v2 (obsolete) — Details — Splinter Review

Addresses comments and adds a memory reporter for the runtime-shared bytecodes.

(In reply to Brian Hackett (:bhackett) from comment #9)
> I think this would be neater and more efficient if SaveScriptBytecode took
> ownership of its bytecode parameter, storing it directly in the table when
> adding a new entry or freeing it when the bytecode is a duplicate.  This
> would mean that ScriptBytecodeEntry would not be heap allocated
> (HashSet<ScriptBytecodeEntry> rather than HashSet<ScriptBytecodeEntry*>) or
> embed the bytecode in its structure.

I did something similar: Instead of passing the bytecode, callers have to malloc instances of ScriptBytecodeEntry which SaveScriptBytecode then takes ownership of. Keeping the bytecode embedded in the structure itself is required to make ScriptBytecodeEntry::fromBytecode work.


> This should be !memcmp, memcmp returns zero when the memmory blocks are
> equal.

Urgh, how embarrasing. That used to be (and is again, now) PodEquals, which returns true on equality, and I didn't refactor it properly. Luckily, this error didn't affect the numbers I mentioned before: I only compared the sizes of the script-data fields, whose decrease was correctly reflected. The error *did* lead to no memory actually being saved, as it caused new entries to be created for each bytecode, which is fixed now.

As for numbers: Our Tp tests apparently don't really contain any scenarios where identical scripts are used in many different compartments, so the automated tests don't show much of an improvement. Some manual testing did, however, show that, on average, the script-data measure sinks to about 1/5 of its previous value, so for each tab of the same site, that's open in parallel, we save about 4/5 of that data's size. The new reporter for the runtime's script-bytecodes confirms that.

Oh, and addons that inject scripts into each site are much less expensive memory-wise, now, too.

Try-servering here:
https://tbpl.mozilla.org/?tree=Try&rev=039c404582c5

Attachment #709504 - Flags: review?(bhackett1024)

Till Schneidereit [:till]

Assignee

Updated

•

11 years ago

Attachment #707852 - Attachment is obsolete: true

Attachment #707852 - Flags: review?(jwalden+bmo)

Mozilla RelEng Bot

Comment 12

•

11 years ago

Try run for 039c404582c5 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=039c404582c5
Results (out of 330 total builds):
    exception: 1
    success: 289
    warnings: 37
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tschneidereit@gmail.com-039c404582c5

Till Schneidereit [:till]

Assignee

Comment 13

•

11 years ago

The test failures are caused by the patches for bug 837416 that were in Inbound when I last rebased. They have been backed out since in rev e2cb5c5978fd.

Looking good otherwise.

Nicholas Nethercote [inactive]

Comment 14

•

11 years ago

> As for numbers: Our Tp tests apparently don't really contain any scenarios
> where identical scripts are used in many different compartments, so the
> automated tests don't show much of an improvement.

Yeah, Talos is crap for testing changes that affect memory consumption.  areweslimyet.com is much better for that.  johns has even implemented a way for you to do run a try build on the AWSY benchmarks.  Unfortunately he hasn't yet written any docs on how to do so, AFAIK :(  Maybe this will provide the necessary motivation...

Till Schneidereit [:till]

Assignee

Comment 15

•

11 years ago

Oh, and I forgot to mention: This new version includes changes to the propery cache to include the JSScript into the hash in addition to the pc and the shape. The hash function might not be ideal, so I'd be happy about suggestions for getting a better distribution there.

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Whiteboard: [MemShrink] → [MemShrink:P1]

Brian Hackett [Laid off!]

Comment 16

•

11 years ago

Comment on attachment 709504 [details] [diff] [review]
Share bytecode and source notes for functions in a runtime wherever possible, v2

Review of attachment 709504 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling review, talked with Till on IRC and it'd be nice to make the atoms vector part of the shared portion, which should improve memory usage further and avoid the need for changes to the property cache.

Attachment #709504 - Flags: review?(bhackett1024)

Till Schneidereit [:till]

Assignee

Comment 17

•

11 years ago

Attached patch Share bytecode, source notes and atoms of functions in a runtime wherever possible, v3 (obsolete) — Details — Splinter Review

Adds the atoms list to the shared things. In light of this, I have renamed the struct to SharedScriptData and fixed all related names accordingly.

Alignment requirements make storing the atoms at the end of SharedScriptData's `data` field a bit annoying, but I think the encapsulation I added makes it not so bad.

The field `JSScript::dataLength` annoys me, so I'll post a follow-up patch removing the need for it by making it possible to derive its value from the `bindings` class.

Attachment #712960 - Flags: review?(bhackett1024)

Till Schneidereit [:till]

Assignee

Updated

•

11 years ago

Attachment #709504 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 18

•

11 years ago

Comment on attachment 712960 [details] [diff] [review]
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v3

Review of attachment 712960 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscript.cpp
@@ +1547,5 @@
> +    if (IsIncrementalGCInProgress(rt) && rt->gcIsFull)
> +        ssd->marked = true;
> +#endif
> +
> +    return ssd->data;

Callers will set script->code to this returned pointer but will not modify script->atoms.  It seems like that will leave script->atoms as a dangling pointer in the case where an existing match was found.  Can you change the interface of this function so that it takes the JSScript*, modifies the ->code and ->atoms fields and just returns a bool?

@@ +1822,5 @@
> +    jsbytecode *code = ssd->data;
> +    code[0] = JSOP_STOP;
> +    code[1] = SRC_NULL;
> +    script->length = 1;
> +    script->code = SaveSharedScriptData(cx, ssd);

Needs a null check.

Attachment #712960 - Flags: review?(bhackett1024)

Till Schneidereit [:till]

Assignee

Comment 19

•

11 years ago

Attached patch Share bytecode, source notes and atoms of functions in a runtime wherever possible, v4 (obsolete) — Details — Splinter Review

Now with proper NULL checking and stuff

Attachment #713007 - Flags: review?(bhackett1024)

Till Schneidereit [:till]

Assignee

Updated

•

11 years ago

Attachment #712960 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 20

•

11 years ago

Comment on attachment 713007 [details] [diff] [review]
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v4

Review of attachment 713007 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jsscript.h
@@ +1250,5 @@
> +    uint32_t length;
> +    jsbytecode data[1];
> +
> +    static SharedScriptData *withDataLengths(JSContext *cx, uint32_t codeLength,
> +                                             uint32_t srcnotesLength, uint32_t natoms);

Missed this earlier, but this method's name should just be 'new_'.  It's not obvious from the name what it's doing.

Attachment #713007 - Flags: review?(bhackett1024) → review+

Till Schneidereit [:till]

Assignee

Comment 21

•

11 years ago

Thanks for the quick turnaround on the reviews!

I pushed to try once more for good measure and will land once that returns green.

http://tbpl.mozilla.org/?tree=Try&rev=993740cfecbf

Nicholas Nethercote [inactive]

Comment 22

•

11 years ago

> I pushed to try once more for good measure and will land once that returns
> green.

Before you do that, some measurements from about:memory would be nice.  Some ideas:

- measure at start-up

- measure after loading techcrunch.com

- measure after running http://gregor-wagner.com/tmp/mem50 (wait until the dialog box pops up saying that it has finished)

Mozilla RelEng Bot

Comment 23

•

11 years ago

Try run for 993740cfecbf is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=993740cfecbf
Results (out of 222 total builds):
    exception: 23
    success: 25
    warnings: 5
    failure: 169
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tschneidereit@gmail.com-993740cfecbf

Till Schneidereit [:till]

Assignee

Comment 24

•

11 years ago

(In reply to Nicholas Nethercote [:njn] from comment #22)
> > I pushed to try once more for good measure and will land once that returns
> > green.
> 
> Before you do that, some measurements from about:memory would be nice.

That makes sense, yes. The try run turned up a crash I have to debug, so I'll do that and do some memory testing before landing.

Till Schneidereit [:till]

Assignee

Comment 25

•

11 years ago

Attached patch Share bytecode, source notes and atoms of functions in a runtime wherever possible, v5 — Details — Splinter Review

Final patch (if the try run at http://tbpl.mozilla.org/?tree=Try&rev=186dba25515c doesn't go awry).

Contains two fixes: one for the crasher in the last try run, the other for not initializing the padding before the atoms list and thus severely reducing cache hits.

Till Schneidereit [:till]

Assignee

Updated

•

11 years ago

Attachment #713007 - Attachment is obsolete: true

Till Schneidereit [:till]

Assignee

Comment 26

•

11 years ago

Comment on attachment 713349 [details] [diff] [review]
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v5

Thanks for making me do the measurements, nnethercote: they were suspiciously bad, so I investigated some and found the bug mentioned in the previous comment. With that fixed, I got the following stats (all values are for the `script-data` fields):

Bug List + 10 Bugs + about:memory; with BugzillaJS installed
nightly: 14.8MB
patched: 05.2MB (1.5MB non-shared + 3.70MB shared)

Startup directly to about:memory; with BugzillaJS installed
nightly: 04.63MB
patched: 02.98MB (0.35MB non-shared + 2.63MB shared)

Startup directly to http://techcrunch.com/2011/07/07/facebook-now-lets-app-developers-see-their-spam-scores/ and about:memory, reload the latter; with BugzillaJS installed
nightly: 09.27MB
patched: 07.86MB (1.03MB non-shared + 6.83MB shared)

Startup directly to techcrunch main page + 3 techcrunch stories and about:memory, reload the latter; with BugzillaJS installed
nightly: 22.94MB
patched: 11.75MB (2.51MB non-shared + 9.24MB shared)

http://gregor-wagner.com/tmp/mem50
nightly: 70.11MB, after closing tabs and reducing memory: 55.80MB
patched: 55.46MB (8.40MB non-shared + 47.06MB shared), after closing and reducing: 47.16MB (6.23MB non-shared + 40.93MB shared)

All in all, I think these are some very nice savings, especially for those scenarios where multiple tabs with content containing the same scripts are open. The fact that mem50 also uses quite a bit less memory demonstrates nicely that the sharing works across domain-boundaries, too.

Attachment #713349 - Flags: review+

Till Schneidereit [:till]

Assignee

Comment 27

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/985508c04c80

Nicholas Nethercote [inactive]

Comment 28

•

11 years ago

> All in all, I think these are some very nice savings

Very much so!  Fantastic work.

One question:  did you confirm that "explicit" went down by similar amounts to "script-data"?  I just want to ensure that the memory saved in "script-data" didn't accidentally end up somewhere else :)

Daniel Holbert [:dholbert]

Updated

•

11 years ago

Blocks: 841132

Till Schneidereit [:till]

Assignee

Comment 29

•

11 years ago

(In reply to Nicholas Nethercote [:njn] from comment #28)
> > All in all, I think these are some very nice savings
> 
> Very much so!  Fantastic work.

Thanks! :)

> 
> One question:  did you confirm that "explicit" went down by similar amounts
> to "script-data"?  I just want to ensure that the memory saved in
> "script-data" didn't accidentally end up somewhere else :)

Yes, I did, and yes, it did. There really wasn't anywhere for it to go to, though ;)

Till Schneidereit [:till]

Assignee

Updated

•

11 years ago

Assignee: general → tschneidereit

Target Milestone: --- → mozilla21

Daniel Holbert [:dholbert]

Comment 30

•

11 years ago

(for future reference: you don't have to set Target Milestone when landing on inbound nowadays -- the sheriff merge tool sets this automagically when the cset is merged to central & the bug is resolved.)

Nicholas Nethercote [inactive]

Comment 31

•

11 years ago

I did some measurements too.

start-up:
- 3.58 --> 3.04 (2.68 + 0.36) == -0.54 MiB

techcrunch.com
- 7.03 --> 6.25 (5.49 + 0.76) == -0.78 MiB

gmail.com
- 10.87 --> 10.89 (9.99 + 0.90) = no change

gmail.com x 3
- 22.89 --> 12.20 (10.20 + 2.00) = -10.79 MiB

10 articles from theage.com.au 
- 27.33 --> 15.63 (8.56 + 7.07) = -11.60 MiB

mem50, just after finishing:
- 69.51 --> 54.67 (46.02 + 8.65) = -14.84 MiB

mem50, after closing all windows and minimizing memory usage *three* times:
- 3.65 --> 3.05 (2.70 + 0.35) = -0.60 MiB

These all roughly match yours, which is good.

But there was no noticeable drop on AWSY, which surprised me.  Looking closely, I see there is a small improvement (this is from the green line):
- 7.51 --> 6.38 (5.58 + 0.80) = -1.13 MiB

AWSY is sufficiently noisy that 1.13 MiB isn't enough to be noticeable.

So that got me wondering why AWSY has only 6--7 MiB of script-data when it finishes and 30 tabs are open, but mem50 has 55--70 MiB when it finishes and 50 tabs are open.  I talked to johns, and it turns out that AWSY's pageset (TP5) contains somewhat old snapshots of sites, and more importantly, that anything that causes external network accesses was deleted(!)  So this means no Facebook/Twitter/GoogleAds/Reddit/etc widgets are loaded.  I hadn't realized that.

Nicholas Nethercote [inactive]

Comment 32

•

11 years ago

In case that wasn't clear:  AWSY showed a small drop, one which is swamped by its noise.  But AWSY isn't a very good test of this change, because it greatly underplays the amount of JS code that real sites use.  The other measurements I did all looked good.

Ed Morley [:emorley]

Comment 33

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/985508c04c80

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Alice0775 White

Updated

•

11 years ago

Depends on: 897507

measurement patch 13 years ago Luke Wagner [:luke] 7.88 KB, patch		Details \| Diff \| Splinter Review
Share bytecode and source notes 11 years ago Till Schneidereit [:till] 23.85 KB, patch		Details \| Diff \| Splinter Review
Share bytecode and source notes for functions in a runtime wherever possible, v2 11 years ago Till Schneidereit [:till] 45.11 KB, patch		Details \| Diff \| Splinter Review
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v3 11 years ago Till Schneidereit [:till] 37.45 KB, patch		Details \| Diff \| Splinter Review
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v4 11 years ago Till Schneidereit [:till] 37.51 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
Share bytecode, source notes and atoms of functions in a runtime wherever possible, v5 11 years ago Till Schneidereit [:till] 37.32 KB, patch	till : review+	Details \| Diff \| Splinter Review