Closed Bug 1951338 Opened 14 days ago Closed 23 hours ago

Categories

(Core :: JavaScript Engine, task, P3)

Tracking

RESOLVED FIXED
138 Branch
Tracking Status
firefox138 --- fixed

People

(Reporter: mayankleoboy1, Assigned: iain)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Attached file sample number

I see very little GC time in this profile.

We spend ~25% of our time in t.times, whereas V8 spends ~1.5%. Most of that time is in AddOrUpdateSparseElementHelper and SetElementMegamorphic. I think this is a case where our existing sparse element code is not good enough.

Here's the code for t.times. The minification doesn't make it particularly easy to figure out what's going on:

  t.mul = t.times = function(n) {
    var t, s = this,
      h = s.constructor,
      f = s.c,
      e = (n = new h(n))
      .c,
      o = f.length,
      i = e.length,
      u = s.e,
      r = n.e;
    if (n.s = s.s == n.s ? 1 : -1, !f[0] || !e[0]) return new h(n.s * 0);
    for (n.e = u + r, o < i && (t = f, f = e, e = t, r = o, o = i, i = r), t = new Array(r = o +
      i); r--; t[r] = 0);
    for (u = i; u--;) {
      for (i = 0, r = o + u; r > u;) i = t[r] + e[u] * f[r - u - 1] + i, t[r--] = i % 10, i = i / 10 | 0;
      t[r] = (t[r] + i) % 10
    }
    for (i && ++n.e, t[0] || t.shift(), u = t.length; !t[--u]; t.pop());
    return n.c = t, n
  };
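For the curious, here is a de-minified paraphrase of the digit loops above (schoolbook base-10 multiplication over digit arrays, most significant digit first), ignoring the sign/exponent bookkeeping. This is my own sketch, not the library's actual source:

```javascript
// Illustrative paraphrase of t.times's core. f and e are base-10 digit
// arrays, most significant digit first; returns the digits of the product.
// (The minified code zero-fills t with a reverse loop, which is exactly
// what triggers the sparse-array path discussed below.)
function mulDigits(f, e) {
  const t = new Array(f.length + e.length).fill(0);
  for (let u = e.length - 1; u >= 0; u--) {
    let carry = 0;
    for (let r = f.length + u; r > u; r--) {
      const sum = t[r] + e[u] * f[r - u - 1] + carry;
      t[r] = sum % 10;
      carry = (sum / 10) | 0;
    }
    t[u] = (t[u] + carry) % 10;
  }
  while (t.length > 1 && t[0] === 0) t.shift(); // drop leading zeros
  return t;
}

mulDigits([1, 2], [3, 4]); // [4, 0, 8] (12 * 34 = 408)
```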
Priority: -- → P3

Ah, the non-minified version is here: https://github.com/MikeMcl/big.js/blob/master/big.js

I think the culprit is this line of code:

 for (c = new Array(j = a + b); j--;) c[j] = 0;

This initializes a new array and then iterates through it in reverse order, setting its elements to 0. Pretty quickly, an array with only one element filled in exceeds our thresholds for dense storage: if the array length is >1000 and less than 1/8 of its indices are filled in, we go sparse. By starting at the highest index, we hit that case as quickly as possible. It's possible that we redensify as we fill things in, but the damage is already done. (I don't see any redensification in the profile, so maybe it doesn't kick in for some reason.)
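The heuristic can be sketched like so (the constant names and exact cutoffs here are illustrative, not SpiderMonkey's actual internals):

```javascript
// Hypothetical sketch of the dense -> sparse decision described above.
function shouldGoSparse(highestIndexWritten, numFilled) {
  const SPARSE_LENGTH_THRESHOLD = 1000;
  const length = highestIndexWritten + 1;
  return length > SPARSE_LENGTH_THRESHOLD && numFilled < length / 8;
}

// Reverse fill: the very first write (arr[25999] = 0) trips the heuristic.
shouldGoSparse(25999, 1);     // true
// Forward fill: by the time index 25999 is written, the array is full.
shouldGoSparse(25999, 26000); // false
```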

As the input gets longer, the effect of this problem gets worse, so a 26K digit input is slow.

I looked a little bit at the V8 sparse array heuristics in this comment. Revisiting that, one thing I notice is that they only enforce those heuristics when adding elements to an existing array. It turns out that V8 imposes basically no restrictions on how large an array they are willing to allocate for new Array(n). They have a limit around 16K, but it turns out that's just the limit for how large an array they are willing to allocate from jitcode. Testing in rr, it looks like they're willing to immediately allocate a dense array of any length up to the maximum length of a dense array (kMaxFastArrayLength, which is 0x2000000, or roughly 33M elements, pretty close to our MAX_DENSE_ELEMENTS_COUNT).

In comparison, our limit for eager allocation is 2046, which AFAICT hasn't been changed in 11 years. There's a lot of room between 2k and 33M for us to relax this restriction, which might help this code.

It's slightly more complicated than just bumping EagerAllocationMaxLength. The problem is that new Array(N) creates an array with length and capacity equal to N, but initLength of 0: we've allocated enough space for N elements, but they are not yet initialized. We could initialize them eagerly, but then we'd have to set the non-packed flag on the elements, which would make things worse in the common case where the array is initialized from index 0 to index N.
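The hole-iness of new Array(N) is observable from script, which is why initLength has to be tracked separately from length:

```javascript
// `new Array(N)` semantics: length is N, but no index is initialized,
// so every slot is a hole rather than `undefined`.
let a = new Array(5);
console.log(a.length); // 5
console.log(0 in a);   // false: index 0 is a hole
a[0] = 1;
console.log(0 in a);   // true after the first write
```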

To make this work, I think we'd have to modify the existing StoreDenseElementHole IC to handle the case where initLength < index < capacity by a) looping from initLength to index and writing the magic hole value, b) updating initLength, and c) setting the non-packed flag. As I write this, I'm realizing that it sounds familiar: we already have bug 1804104 to track doing that.
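A rough model of that extended store path, with made-up names standing in for engine internals (not real SpiderMonkey APIs):

```javascript
// (a) fill the gap with the magic hole value, (b) update initLength,
// (c) set the non-packed flag. MagicHole and the elems shape are
// illustrative stand-ins for the engine's internal representation.
const MagicHole = Symbol("hole");

function makeElems(capacity) {
  return { buf: new Array(capacity), capacity, initLength: 0, nonPacked: false };
}

function storeDenseElementHole(elems, index, value) {
  if (index >= elems.capacity) throw new Error("bailout: needs to grow");
  if (index > elems.initLength) {
    for (let i = elems.initLength; i < index; i++) elems.buf[i] = MagicHole; // (a)
    elems.nonPacked = true;                                                  // (c)
  }
  elems.buf[index] = value;
  if (index >= elems.initLength) elems.initLength = index + 1;               // (b)
}
```

Note that a purely forward fill (index == initLength on every write) never takes the gap-filling branch, so the array stays packed, preserving the common case.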

Depends on: 1804104
Summary: Demo at https://www.elmismopancho.cl/demos/github/everythingformula/index.html is 1.2x slower in Nightly and spends tons of time in MinorGC. → Demo at https://www.elmismopancho.cl/demos/github/everythingformula/index.html is 1.2x slower in Nightly

With the try build from here, this is the profile: https://share.firefox.dev/43I6yL1 (3m47s)
So a 27% improvement (5m10s -> 3m47s)

This seemed like a reasonable place to draw the line. Combined with the previous patch, code like this:

let arr = new Array(50000);
for (let i = arr.length - 1; i >= 0; i--) {
  arr[i] = 0;
}

is 9-10x faster.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

I originally intended to reset the NON_PACKED flag in this code if there were no holes, but after a closer look, I realized that we exit early from the loop once we've seen enough non-holes, but resetting the flag would require us to examine every element.
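The asymmetry can be sketched like this (made-up names, not the actual willBeSparseElements code): deciding "stay dense" can stop as soon as enough non-holes have been seen, but proving there are no holes at all would require scanning every element.

```javascript
// Illustrative sketch of the early-exit scan described above.
const HOLE = Symbol("hole"); // stand-in for the engine's hole sentinel

function willStayDense(buf, needed) {
  let nonHoles = 0;
  let scanned = 0;
  for (const v of buf) {
    scanned++;
    if (v !== HOLE && ++nonHoles >= needed) break; // early exit: verdict reached
  }
  return { dense: nonHoles >= needed, scanned };
}
```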

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9fee213591f6 Bump EagerAllocationMaxLength r=jandem
https://hg.mozilla.org/integration/autoland/rev/28d7ac36c32b Add fast path for packed elements in willBeSparseElements r=jandem
Status: ASSIGNED → RESOLVED
Closed: 23 hours ago
Resolution: --- → FIXED
Target Milestone: --- → 138 Branch

Either this bug or Bug 1804104 led to some improvements:

SP3:
Windows: 2.8% on SP3-perf-Dashboard-Total, which was driven by a 7.2% improvement on SP3-perf-Dashboard-SelectingPoints-total
Android: 4.8% on Android SP3-perf-Dashboard-SelectingPoints-total

Sunspider
54% on Sunspider-access-nsieve (Probably due to bug 1804104)

Jetstream2
7.6% on Segmentation
3% on Jetstream2-Lebab-wtb
