Bug 939088: massive regression in testClosureAssign
Opened 11 years ago • Closed 11 years ago
Categories: Core :: JavaScript Engine, defect
Status: VERIFIED FIXED
Target Milestone: mozilla28
Tracking | Status
---|---
firefox25 | unaffected
firefox26 | unaffected
firefox27 | wontfix
firefox28 | verified
firefox29 | verified
firefox-esr17 | unaffected
firefox-esr24 | unaffected
b2g18 | unaffected
b2g-v1.1hd | unaffected
b2g-v1.2 | unaffected
People: Reporter: fb+mozdev, Assignee: bhackett1024
Keywords: perf, regression
Attachments (3 files)
1.96 KB, patch | luke: review+ | Details | Diff | Splinter Review
12.28 KB, patch | luke: review+ | Details | Diff | Splinter Review
7.21 KB, patch | Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #517164 +++

(In reply to Florian Bender from Bug 517164 comment #9)
> "testClosureAssign" seems regressed (first bad: Aurora), this needs a few
> more runs and bisection.

Last good nightly: 2013-10-11
First bad nightly: 2013-10-12
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6b101d4c6d24&tochange=73f37c7a3860

Does anything stand out? If not, can anyone bisect this further, please?
Comment 1 • 11 years ago (Reporter)
I only bisected on Mac, so resetting the platform info for now, though I suspect the numbers are the same for other platforms.
OS: All → Mac OS X
Hardware: All → x86_64
Comment 2 • 11 years ago
(In reply to Florian Bender from comment #0)
> Does anything stand out? If not, can anyone bisect this further, please?

Yes, bug 880085 seems related. Requesting needinfo from Brian.
Flags: needinfo?(bhackett1024)
Updated • 11 years ago (Reporter)
status-firefox26:
--- → unaffected
status-firefox27:
--- → affected
status-firefox28:
--- → affected
Comment 3 • 11 years ago (Assignee)
This test is pretty silly but does point to a perf bottleneck that could affect large applications using run-once closures. It generates a run-once closure with 14000 variables, with an inner function that assigns to each of those variables once. That inner function is called 150 times, so we never try to Ion compile it. Baseline does a fine job generating ICs for it, and 90% of the time taken here is before we even baseline compile, almost all of it warming up in the interpreter.

That time taken has increased because the interpreter needs to keep track of type side effects on the object (since bug 880085 unbroke run-once closure optimizations), which requires it to compute the property name being accessed. That computation incurs a linear search on the call object shape, which takes a while since there are 14000 entries in it.

The attached patch is a bandaid to avoid the linear search in cases when we know the call object's types don't need to be updated, i.e. when no code under it has been compiled. This improves the testClosureAssign() case from 1560 to 95 for me, but won't have much effect in more diffuse large applications where we baseline/ion compile some inner code and delazify the type but still do a lot of interpreting in cold code. I'd like to do a followup patch to add a cache in the runtime mapping slots to names for singleton scope objects, similar to the source note cache. (This seems a better use of memory than attaching the name to every SETALIASEDVAR op.)
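To make the shape of the workload concrete, here is a reduced, hypothetical sketch of the pattern the test exercises (not the actual benchmark source): a run-once closure declaring many variables, plus an inner function that assigns to each of them. The `makeClosure` helper and the variable count of 100 are illustrative; the real test uses 14000 variables and 150 calls.

```javascript
// Hypothetical reduced sketch of the testClosureAssign pattern.
// Build the closure body dynamically so the variable count can be scaled.
function makeClosure(count) {
  const decls = [];
  const assigns = [];
  for (let i = 0; i < count; i++) {
    decls.push("v" + i);
    assigns.push("v" + i + " = " + i + ";");
  }
  const src =
    "var " + decls.join(",") + ";\n" +
    "return function inner() { " + assigns.join(" ") +
    " return v" + (count - 1) + "; };";
  // Calling the generated outer function once yields the inner closure:
  // every vN is closed over, so each assignment in inner() is an
  // aliased-var store into the outer call object.
  return Function(src)();
}

const inner = makeClosure(100);
let last;
// Called repeatedly, but (as in the real test) not enough to trigger Ion;
// each store must map a call-object slot back to a property name.
for (let i = 0; i < 150; i++) last = inner();
console.log(last); // 99
```

The point of the sketch is that every one of those assignments is a SETALIASEDVAR-style access against a single call object with many entries, which is exactly where the linear shape search described above becomes a bottleneck.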
Assignee: nobody → bhackett1024
Attachment #832936 -
Flags: review?(luke)
Flags: needinfo?(bhackett1024)
Updated • 11 years ago
Blocks: 880085
status-b2g18:
--- → unaffected
status-b2g-v1.1hd:
--- → unaffected
status-b2g-v1.2:
--- → unaffected
status-firefox25:
--- → unaffected
status-firefox-esr17:
--- → unaffected
status-firefox-esr24:
--- → unaffected
tracking-firefox27:
--- → ?
tracking-firefox28:
--- → ?
Comment 4 • 11 years ago (Assignee)
Attachment #8333581 -
Flags: review?(luke)
Comment 5 • 11 years ago
Instead of adding a whole new mechanism that, as you said, doesn't really solve the problem for realistic workloads, can we just blow another 4 bytes on JOF_SCOPECOORD ops and embed the atom?

Actually, with bug 932276, the only remaining use of the 4-byte blockIndex currently embedded in JOF_SCOPECOORD ops is from the Ion/Baseline JITs, so we could use the O(n) GetBlockChainAtPC added by bug 932276 and remove the blockIndex.
Comment 6 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #5)
> Instead of adding a whole new mechanism that, as you said, doesn't really
> solve the problem for realistic workloads, can we just blow another 4 bytes
> on JOF_SCOPECOORD ops and embed the atom?
>
> Actually, with bug 932276, the only remaining use of the 4 byte blockIndex
> current embedded in JOF_SCOPECOORD ops is from Ion/Baseline JITs, so we
> could use the O(n) GetBlockChainAtPC added by bug 932276 and remove the
> blockIndex.

I think the cache mechanism will work fine for realistic workloads; it's the first patch that's just a bandaid. Embedding the atoms directly will be a good deal less efficient on memory than the cache; it's not just the extra four bytes for the atom index, but also (usually) a new word-sized entry in the script's atoms table.
Comment 7 • 11 years ago
(In reply to Brian Hackett (:bhackett) from comment #6)

Comment 5 explains how to avoid increasing the bytecode size. To avoid increasing atoms-table size, we can use the index of the aliased var in the script's Bindings::bindingArray. Before doing either, though, we should see how prevalent aliased-var opcodes even are, or how many extra atoms-table entries would be added; this might only be a few KB we're talking about here.
Comment 8 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #7)
> comment 5 explains how to avoid increasing the bytecode size.

Well, I'm confused now. Comment 5 sounds like an independent optimization which would allow shortening the bytecode length by 4 bytes; if we did that and also used this cache, then the bytecode size would be improved over the current state of things.
Comment 9 • 11 years ago
(In reply to Brian Hackett (:bhackett) from comment #8)
> Well, I'm confused now. Comment 5 sounds like an independent optimization

If we removed the block index *without* embedding the var name's index in the opcode, then we'd add a second O(n) lookup in ScopeCoordinateName, with the same potential for pathological performance we're seeing here. If we remove the block index *and* add the atom index, then ScopeCoordinateName won't need to call GetBlockChainAtPC, so ScopeCoordinateName will become O(1).
Comment 10 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #9)
> If we removed the block index *without* embedding the var name's index in
> the opcode then we'd add a second O(n) lookup in ScopeCoordinateName with
> the same potential for pathological performance we're seeing here.

Does GetBlockChainAtPC need to be O(n)? Looking at the code, it is doing a linear search of what looks like a sorted array and could be doing a binary search instead.
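The optimization comment 10 suggests is the standard one for lookups in a sorted array: replace the linear scan with a binary search for the last block whose start offset is at or before the pc. The sketch below is purely illustrative (the function name, the `start` field, and the data are hypothetical, not SpiderMonkey's actual GetBlockChainAtPC code):

```javascript
// Hypothetical sketch: find the last block with start <= pcOffset in an
// array sorted by start, in O(log n) instead of O(n).
function findBlockAtOffset(blocks, pcOffset) {
  let lo = 0, hi = blocks.length - 1, found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (blocks[mid].start <= pcOffset) {
      found = blocks[mid]; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found; // null if pcOffset precedes every block
}

const blocks = [{ start: 0 }, { start: 10 }, { start: 25 }];
console.log(findBlockAtOffset(blocks, 12)); // { start: 10 }
```

This only turns the per-lookup cost into O(log n); embedding the atom index (comment 9) would make ScopeCoordinateName O(1) outright.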
Comment 11 • 11 years ago
Hmm, that's a nice thought, probably so. Still, embedding the name in the bytecode is the obvious/simple strategy -- it's what one would expect when reading the code -- so I think we need to justify the complexity of adding a caching mechanism with some memory measurements. If the savings are significant, great. It'll also justify the work to remove blockIndex too.
Comment 12 • 11 years ago
Naveed, does this FF27 perf regression have any major user or external test impact? If this is a critical test, shouldn't we be testing more frequently?
Flags: needinfo?(nihsanullah)
Comment 13 • 11 years ago
Alex, this is not a general-case issue. The issues are confined to a few specific, unusual test cases that are not indicative of real-world web apps.
Flags: needinfo?(nihsanullah)
Comment 14 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #11)
> Still, embedding the name in the bytecode is the obvious/simple strategy --
> it's what one would expect when reading the code -- so I think we need to
> justify the complexity of adding a caching mechanism with some memory
> measurements. If the savings are significant, great. It'll also justify
> the work to remove blockIndex too.

Running the asm.js bullet test with --no-asmjs, 26% of the bytecode we emit (194k out of 759k) is used up by aliasedvar ops.
Comment 15 • 11 years ago
Comment on attachment 832936 [details] [diff] [review]
bandaid

Well, that is non-trivial. I measured a few more sites and saw percentages from 6% (GMail) to 18% (Facebook).

Still, it feels strange to add a whole cache for a pathological case (tons of closed-over variables and tons of accesses that don't get baseline-compiled) when it doesn't really fix the problem (X accesses to X unique closed-over variables will miss X times; also, performing aliased var accesses in alternating functions will invalidate the cache each time). If you want to land this first band-aid patch, that's fine, but I think the cache is just a bigger bandaid.

If we want to fix the problem of "it's slow to find the name of an aliased var", I think we should do it fully and embed the atom index. We can trade out the blockIndex and avoid any regression. But I'd like to see this problem show up in a non-synthetic situation first.
Attachment #832936 -
Flags: review?(luke) → review+
Updated • 11 years ago
Attachment #8333581 -
Flags: review?(luke)
Comment 16 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #15)
> Still, it feels strange to add a whole cache for a pathological case (tons
> of closed-over variables and tons of accesses that don't get
> baseline-compiled) when it doesn't really fix the problem (X accesses to X
> unique closed-over variables will miss X times; also performing aliased var
> accesses in alternating functions will invalidate the cache each time).

I don't think you understand what this cache's behavior is. This cache has benefits even in code we eventually baseline compile --- per comment 3 we baseline compile this script fine, but eat a huge amount of time doing linear lookups while we're still in the interpreter. Use of this cache will also benefit lookups during baseline and Ion compilation; even making one pass over scripts with these aliased vars (the interp only does 10) still hurts.

This does fix the problem of X accesses to X closed-over vars when those closed-over vars are associated with the same scope object, and while the cache would repeatedly invalidate when code is visiting different large call objects over and over, the resulting behavior will not be any worse asymptotically than what the code is currently doing.

> If you want to land this first band-aid patch, that's fine, but I think the
> cache is just a bigger bandaid. If we want to fix the problem of "it's slow
> to find the name of an aliased var", I think we should do it fully and embed
> the atom index. We can trade out the blockIndex and avoid any regression.
> But I'd like to see this problem show up in a non-synthetic situation first.

When is a cache *anything* except a bandaid of some size? The reason we have caches is to trade off speed and memory concerns. We already know that the size of aliased var ops can be quite large, so reducing the size of those ops while speeding up their accesses in common cases seems an entirely reasonable thing to do, even if it requires *gasp* a cache.

It's pretty clear this problem can happen in non-synthetic situations, as comment 3 describes; all you need is a big app inside a closure (e.g. all asm.js style apps fit this pattern).
Updated • 11 years ago (Assignee)
Attachment #8333581 -
Flags: review?(luke)
Comment 17 • 11 years ago
(In reply to Brian Hackett (:bhackett) from comment #16)
> This does fix the problem of X accesses to X closed over vars
> when those closed over vars are associated with the same scope object, and

Ah, I misread the code; I thought the cache was lazily adding (id, slot) pairs as they were accessed, but I see now the cache eagerly populates all (id, slot) pairs on first access. (Perhaps you can see why I was hesitant under this assumption.)

However, this seems to suggest a new realistic pathological scenario: a program with several large modules that calls back and forth between them can incur an O(n) purge/fill at each call. This could produce much worse behavior than the status quo. This actually seems likely to occur in practice given the popular requireJS/program-composed-of-many-modules style. This is of course fixable by having a two-level map (shape->(id->slot)), if you want to go there.

> even if it requires *gasp* a cache.

Save the snark.

> It's pretty clear this problem can happen in non-synthetic situations, as comment 3
> describes, all you need is a big app inside a closure (e.g. all asm.js style
> apps fit this pattern).

You also need a bunch of closed-over variables and a bunch of accesses in functions that don't get quickly ICed. asm.js has highly-aliased variables, but in a program-independent fixed quantity (~139, far less than testClosureAssign). That being said, of course it would be nice to solve this problem w/o using more memory. With the abovementioned problem fixed as suggested or otherwise, I think the cache *would* be quite solid and worth the added complexity.
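The two-level map suggested here can be sketched as an outer map keyed by scope shape, each entry holding its own (id -> slot) map, so that calling back and forth between modules never purges another module's entries. Everything below is a hypothetical illustration (the class name, the `properties` array standing in for a shape's property list, and the eager fill on first access); it is not SpiderMonkey's actual data structure.

```javascript
// Hypothetical sketch of a shape -> (id -> slot) two-level cache.
class ScopeNameCache {
  constructor() {
    this.byShape = new Map(); // shape -> Map(name -> slot)
  }
  lookup(shape, name) {
    let inner = this.byShape.get(shape);
    if (!inner) {
      // First access for this shape: eagerly populate all (name, slot)
      // pairs, mirroring the eager fill described in this comment.
      inner = new Map();
      for (const [slot, propName] of shape.properties.entries())
        inner.set(propName, slot);
      this.byShape.set(shape, inner);
    }
    return inner.get(name);
  }
}

// Two "modules" with different shapes keep their entries simultaneously,
// so alternating between them costs one O(n) fill each, not one per call.
const shapeA = { properties: ["x", "y"] };
const shapeB = { properties: ["a", "b", "c"] };
const cache = new ScopeNameCache();
console.log(cache.lookup(shapeA, "y")); // 1
console.log(cache.lookup(shapeB, "c")); // 2
console.log(cache.lookup(shapeA, "x")); // 0  (shapeA entry not purged)
```

Compared with the single-slot cache discussed above, the cost is holding one inner map alive per shape rather than only the most recently used one.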
Comment 18 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #17)
> However, this seems to suggest a new realistic pathologically scenario: a
> program with several large modules that calls back and forth between them
> can incur an O(n) purge/fill at each call. This could produce much worse
> behavior than the status quo.

Right now we eat an O(n) lookup for every single access, so I don't see how purging/filling the cache when calling between modules is 'much worse' than the status quo. Using the cache would be asymptotically the same, with a not much worse constant factor, and anyhow most accesses will typically be within the same module as the previous access, not a different one.

> Save the snark.

Sorry, but you've been dragging this bug out for a week and a half now when the solution I came up with initially is perfectly fine. I'd like to be spending my time on more interesting problems.
Comment 19 • 11 years ago
(In reply to Brian Hackett (:bhackett) from comment #18)
> Right now we eat an O(n) lookup for every single access, so I don't see how
> purging/filling the cache when calling between modules is 'much worse' than
> the status quo. Using the cache would be asymptotically the same, with a
> not much worse constant factor, and anyhow most accesses will typically be
> within the same module as the previous access, not a different one.

Ideally, yes. But I was thinking about the case I explained with small functions calling each other. Also, the constant difference between N linked-list hops and N hash-table insertions/deletions isn't exactly small, nor are aliased variable accesses necessarily randomly distributed. This is all in the pathological realm of "tons of aliased variables", so I guess I don't really know. I'll drop it.

> Sorry, but you've been dragging this bug out for a week and a half now when
> the solution I came up with initially is perfectly fine. I'd like to be
> spending my time on more interesting problems.

I've had a sequence of questions, which I think are quite valid. This is not dragging feet; this is a reticence to add complexity to an area that is already somewhat confusing to those unfamiliar with it, without being quite clear about what we're getting. I'm sorry for the latency, but I'm also doing other things, and this bug doesn't exactly motivate the context switch. I'll get to it now.
Comment 20 • 11 years ago
On the question of "is this worth it?", the initial measurements we did in comment 14/15 still leave open the question of what the actual impact to browser memory usage would be. This is really easy to measure with the attached patch to about:memory. I get these results, where "Saved" is the accumulation of the per-SharedScriptData number of aliased vars * sizeof(uint32_t):

Site | Saved | script-data | explicit | Saved as % of script-data | Saved as % of explicit
---|---|---|---|---|---
fresh browser | 44K | 3.7M | 102M | 1.2% | .04%
amazon.com | 52K | 4.5M | 163M | 1.2% | .03%
gmail.com | 88K | 6.7M | 210M | 1.3% | .04%
cnn.com article | 94K | 5.8M | 165M | 1.6% | .05%
facebook.com | 106K | 5.7M | 250M | 1.8% | .04%
techcrunch.com article | 114K | 5.4M | 150M | 2.1% | .08%
turbulenz.com games | 115K | 6.7M | 888M | 1.7% | .01%
Citadel (w/o asm.js) | 1.3M | 18.9M | 1.2G | 6.8% | .11%

I'm probably biased by this point, but the cache doesn't seem like a particular win over embedding the atomsIndex. Do you still think it is?
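For readers checking the table, the percentage columns derive directly from the raw figures; the helper below is hypothetical (not from the bug's patch), and since the quoted sizes are already rounded, other rows may drift slightly from the printed percentages.

```javascript
// Arithmetic check of the percentage columns, using the already-rounded
// figures quoted in the comment (illustrative helper, not from the patch).
function savingsPct(saved, total, digits) {
  return (saved / total * 100).toFixed(digits);
}

// "fresh browser" row: 44K saved, 3.7M script-data, 102M explicit.
console.log(savingsPct(44e3, 3.7e6, 1)); // "1.2"  -> 1.2% of script-data
console.log(savingsPct(44e3, 102e6, 2)); // "0.04" -> .04% of explicit
```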
Comment 21 • 11 years ago
cc'ing njn and jorendorff in case they have an opinion on this tradeoff.
Comment 22 • 11 years ago
(In reply to Naveed Ihsanullah [:naveed] from comment #13)
> Alex this is not a general case issue. The issues are confined to a few
> specific unusual test cases that are not indicative of real world web apps.

Given this, we are not going to track for release. But if a low-risk uplift is found, feel free to uplift!
tracking-firefox27:
? → ---
tracking-firefox28:
? → ---
Comment 23 • 11 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #20) > I'm probably biased by this point, but the cache doesn't seem like a > particular win over embedding the atomsIndex. Do you still think it is? Thanks for measuring. I think a simple patch to save .05% of explicit is definitely worth taking. It's only a tiny fraction of the effectiveness of a big project like objshrink, but there aren't many such projects left to be done and the little stuff still counts.
Updated • 11 years ago
Attachment #8333581 -
Flags: review?(luke) → review+
Comment 24 • 11 years ago (Assignee)
I'll put together a patch to remove the blockIndex on the aliasedvar ops.

https://hg.mozilla.org/integration/mozilla-inbound/rev/62e94f70b2cd
Comment 25 • 11 years ago
https://hg.mozilla.org/mozilla-central/rev/62e94f70b2cd
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Comment 26 • 11 years ago (Reporter)
(In reply to Brian Hackett (:bhackett) from comment #24)
> I'll put together a patch to remove the blockIndex on the aliasedvar ops.

Will you file a new bug, or do it as part of this bug? If the latter, this bug should be reopened (or marked [leave open]).
Flags: needinfo?(bhackett1024)
Updated • 11 years ago
Comment 27 • 11 years ago (Assignee)
(In reply to Florian Bender from comment #26)
> Will you file a new bug, or do it as part of this bug? Then this bug should
> be re- and [leave open]ed.

Filed bug 944930.
Flags: needinfo?(bhackett1024)
Comment 28 • 11 years ago
(In reply to Luke Wagner [:luke] from comment #21)
> cc'ing njn and jorendorff in case they have an opinion on this tradeoff.

No opinion. Thanks for obtaining actual numbers; I don't have anything to compare them to, though. We do a ton of work on stuff that wins a few percent on some vanishingly small proportion of the web. This wins a vanishingly small percent on pretty much everything. The future portion of the cost of this patch still seems possibly worth taking and possibly not.
Comment 29 • 11 years ago
Florian, can you please verify this is fixed now in Firefox 28?
Flags: needinfo?(fb+mozdev)
Comment 30 • 11 years ago (Reporter)
I got about 180 now (see Bug 517164 comment #9).
Updated • 11 years ago
Comment 31 • 10 years ago
Brian is away on PTO until February. Luke, could you nominate this for Firefox 27 Beta uplift?
Flags: needinfo?(luke)
Comment 32 • 10 years ago
In agreement with comment 13, I'm not sure this problem is so significant as to require uplift.
Flags: needinfo?(luke)
Comment 33 • 10 years ago
Thanks Luke, marking this as wontfix for Firefox 27.