Demo at https://www.elmismopancho.cl/demos/github/everythingformula/index.html is 1.2x slower in Nightly
Categories
(Core :: JavaScript Engine, task, P3)
Tracking
Release | Tracking | Status
---|---|---
firefox138 | --- | fixed
People
(Reporter: mayankleoboy1, Assigned: iain)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
Go to https://www.elmismopancho.cl/demos/github/everythingformula/index.html
Enter the sample number and press enter.
Nightly: https://share.firefox.dev/41C7yix (5m10s)
Chrome: https://share.firefox.dev/3EXL4jg (4m)
Assignee
Comment 1•13 days ago
I see very little GC time in this profile.
We spend ~25% of our time in t.times, whereas V8 spends ~1.5%. Most of that time is in AddOrUpdateSparseElementHelper and SetElementMegamorphic. I think this is a case where our existing sparse element code is not good enough.
Here's the code for t.times. The minification doesn't make it particularly easy to figure out what's going on:
t.mul = t.times = function(n) {
var t, s = this,
h = s.constructor,
f = s.c,
e = (n = new h(n))
.c,
o = f.length,
i = e.length,
u = s.e,
r = n.e;
if (n.s = s.s == n.s ? 1 : -1, !f[0] || !e[0]) return new h(n.s * 0);
for (n.e = u + r, o < i && (t = f, f = e, e = t, r = o, o = i, i = r), t = new Array(r = o +
i); r--; t[r] = 0);
for (u = i; u--;) {
for (i = 0, r = o + u; r > u;) i = t[r] + e[u] * f[r - u - 1] + i, t[r--] = i % 10, i = i / 10 | 0;
t[r] = (t[r] + i) % 10
}
for (i && ++n.e, t[0] || t.shift(), u = t.length; !t[--u]; t.pop());
return n.c = t, n
};
Assignee
Comment 2•13 days ago
Ah, the non-minified version is here: https://github.com/MikeMcl/big.js/blob/master/big.js
I think the culprit is this line of code:
for (c = new Array(j = a + b); j--;) c[j] = 0;
This creates a new array and then iterates through it in reverse order, setting its elements to 0. After writing only one element, the array already exceeds our thresholds for dense storage: if the array length is >1000 and fewer than 1/8 of its indices are filled in, we go sparse. By starting at the highest index, this code hits that case as quickly as possible. It's possible that we redensify as more elements are filled in, but the damage is already done. (I don't see any redensification in the profile, so maybe it doesn't kick in for some reason.)
As the input gets longer, the effect of this problem gets worse, so a 26K digit input is slow.
I looked a little bit at the V8 sparse array heuristics in this comment. Revisiting that, one thing I notice is that they only enforce those heuristics when adding elements to an existing array. It turns out that V8 imposes basically no restrictions on how large an array they are willing to allocate for new Array(n). They have a limit around 16K, but it turns out that's just the limit for how large an array they are willing to allocate from jitcode. Testing in rr, it looks like they're willing to immediately allocate a dense array of any length up to the maximum length of a dense array (kMaxFastArrayLength, which is 0x2000000, or roughly 33M elements, pretty close to our MAX_DENSE_ELEMENTS_COUNT).
In comparison, our limit for eager allocation is 2046, which AFAICT hasn't been changed in 11 years. There's a lot of room between 2k and 33M for us to relax this restriction, which might help this code.
Assignee
Comment 3•12 days ago
It's slightly more complicated than just bumping EagerAllocationMaxLength. The problem is that new Array(N) creates an array with length and capacity equal to N, but an initLength of 0: we've allocated enough space for N elements, but they are not yet initialized. We could initialize them eagerly, but then we'd have to set the non-packed flag on the elements, which would make things worse in the common case where the array is initialized from index 0 to index N.
To make this work, I think we'd have to modify the existing StoreDenseElementHole IC to handle the case where initLength < index < capacity by a) looping from initLength to index and writing the magic hole value, b) updating initLength, and c) setting the non-packed flag. As I write this, I'm realizing that it sounds familiar: we already have bug 1804104 to track doing that.
Reporter
Comment 4•3 days ago
With the try build from here, this is the profile: https://share.firefox.dev/43I6yL1 (3m47s)
So a 27% improvement (5m10s -> 3m47s)
Assignee
Comment 5•3 days ago
This seemed like a reasonable place to draw the line. Combined with the previous patch, code like this:
let arr = new Array(50000);
for (let i = arr.length - 1; i >= 0; i--) {
arr[i] = 0;
}
is 9-10x faster.
Updated•3 days ago
Assignee
Comment 6•3 days ago
I originally intended to reset the NON_PACKED flag in this code if there were no holes, but after a closer look, I realized that we exit early from the loop once we've seen enough non-holes, but resetting the flag would require us to examine every element.
Comment 8•23 hours ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9fee213591f6
https://hg.mozilla.org/mozilla-central/rev/28d7ac36c32b
Reporter
Comment 9•17 hours ago
Either this bug or bug 1804104 led to some improvements:
SP3:
Windows: 2.8% on SP3-perf-Dashboard-Total, driven by a 7.2% improvement on SP3-perf-Dashboard-SelectingPoints-total
Android: 4.8% on Android SP3-perf-Dashboard-SelectingPoints-total
Sunspider:
54% on Sunspider-access-nsieve (probably due to bug 1804104)