The Ember Render Complex List test allocates more than 300,000 external strings. An allocation log shows many sequences like this: ... http://www.w3.org/1999/xhtml INPUT ... The first one (the namespace URI) is allocated more than 150,000 times! My logging indicates this pattern (same external strings allocated repeatedly) also shows up on websites like GDocs. XPConnect has a 1-entry external string cache. I think it would be nice to move this into SpiderMonkey and experiment with different sizes for it (for instance 4-8 entries should have negligible overhead but be very effective in a lot of cases). It's also pretty common to see the same string chars but different string buffers on the Gecko side, so the current cache doesn't catch it. We should fix that too.
Created attachment 8855284 [details] [diff] [review] Patch Removes the ZoneStringCache from Gecko and adds a per-Zone external string cache to SpiderMonkey. The cache is purged on GC. The cache has 4 entries. This is enough to handle the Ember case (see comment 0, we get a 99% hit rate there). It also works well on other sites. I opened Gmail, Google Docs, Twitter, etc, and I got the following results for the cache: Total: 35120 Miss: 14985 Hits total: 20135 - position 0: 9987 - position 1: 4262 - position 2: 3117 - position 3: 2769 Note that the 3 extra entries improve the hit rate from 28.4% to 57.3%. The cache doesn't just compare the string pointers, it also compares the string chars if the length matches. This makes the cache a lot more effective: of these cache hits, 35% were based on the char comparison. The cache may add some overhead to external string allocation, but I think it should be pretty small and worth it in most cases. r? arai for the JS changes, r? bz for the rest (mostly code removal).
Comment on attachment 8855284 [details] [diff] [review] Patch Review of attachment 8855284 [details] [diff] [review]: ----------------------------------------------------------------- happy to see the API gets improved :D
I was pretty worried about this making things slower due to an extra out of line call before we check the cache, but NonVoidStringToJsval is not inlined without these changes (at least on Mac) and is inlined with, so no change there. The changes in this bug do help a good bit in cases involving a few different values (tested by removing the [Pure] annotation from getAttribute and then running the testcase I'm about to attach), but we're still somewhat slower than Chrome and way (5-6x) slower than Safari. I seem to recall that Safari simply uses a unified string representation for WebKit and JSC.....
Comment on attachment 8855284 [details] [diff] [review] Patch I don't understand the GC interaction here. We purge the cache when marking _starts_, but what if the cache gets filled when marking is ongoing? Why did we use to need to barrier (JS::StringReadBarrier) but the new code doesn't need barriers? Might be worth at least documentation. I'm a little worried about the PodEqual bit. If the length is large enough, this could be more expensive than just allocating a new external string, right? Is it worth adding some sort of length cliff where once we're over it we stop doing the PodEqual bit? Anyway, the xpconnect bits look fine to me, but I'd like to understand the answers to the questions above.
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #4) > I don't understand the GC interaction here. We purge the cache when marking > _starts_, but what if the cache gets filled when marking is ongoing? Why > did we use to need to barrier (JS::StringReadBarrier) but the new code > doesn't need barriers? Might be worth at least documentation. The barrier is not needed because we purge the cache at the start of the GC and any string we pull out of the cache must have been allocated *after* the GC started. Incremental barriers are not needed for things allocated after the GC started. Other caches do similar things - I'll add a comment to the lookup function to document this. > I'm a little worried about the PodEqual bit. If the length is large enough, > this could be more expensive than just allocating a new external string, > right? Is it worth adding some sort of length cliff where once we're over > it we stop doing the PodEqual bit? When both strings are large, in most cases either (1) the lengths won't match, or (2) the strings will be so different that the PodEqual will bail immediately. But true, there might be some pathological cases where we allocate huge strings that are different but have a common "prefix". Do you think a threshold of 50 or 100 chars or so would be reasonable? I can do some measurements tomorrow to see how different thresholds affect the hit rate. Safari is not loop hoisting this btw? 5-6x slower is quite a lot.. I wonder how much of that is the string representation.
> Safari is not loop hoisting this btw? An empty loop in Safari on my hardware takes about 2ns per iteration. A loop with a single getAttribute() call in it takes 7ns. That said, I just poked a bit at their source and their IDL has: [DOMJIT=ReadDOM] DOMString? getAttribute(DOMString name); in Source/WebCore/dom/Node.idl. And the corresponding commit at https://trac.webkit.org/changeset/208320/webkit does talk about LICM. So it's hard to say. :( > When both strings are large, in most cases either (1) the lengths won't match, or > (2) the strings will be so different that the PodEqual will bail immediately. I'm thinking repeated gets of XHR responseText from gecko. But then the pointer would match anyway... So maybe this is not a big worry. > Do you think a threshold of 50 or 100 chars or so would be reasonable? At a guess, yes.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/c1cda5e8a20e Move the external string cache into the JS engine and improve it. r=arai,bz
I went with a max length of 100 for the PodEqual part. I added some logging and it shows this should still handle most strings the previous version handled. Lengths like 400 and much bigger ones (say 50,000 chars) do show up - these will no longer match but that seems fine considering this is a new optimization. Most of the hits from PodEqual are lengths <= 100 so this will still be a win in these cases.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cdd87ce7a015 followup - Also purge external string cache after compacting GC. r=orange