Closed Bug 1851662 Opened 2 years ago Closed 8 months ago

Creating a new Map / Set object is 7x more expensive than creating a new array (and 5x slower than in Chrome)

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
134 Branch
Tracking Status
firefox134 --- fixed

People

(Reporter: mstange, Assigned: jandem)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [sp3][js-perf-next])

Attachments

(3 files)

We spend 0.21% of the overall Speedometer 3 time creating Map and Set objects.
We are about 5x slower than Chrome at it, for example in createDep on TodoMVC-Vue or in _$Ek on TodoMVC-Lit.

Profile of MapObject::create: https://share.firefox.dev/3P07AJw

Allocation is the main problem. For a new Map object, we have three calls to js_pod_arena_malloc.

The following three reduced testcases compare Map/Set construction with JS array construction:

new Map() microbenchmark (attachment 9351654 [details]): 230ms in Firefox, 53ms in Chrome
new Set() microbenchmark (attachment 9351655 [details]): 230ms in Firefox, 53ms in Chrome
[] microbenchmark (attachment 9351656 [details]): 20ms in Firefox, 20ms in Chrome


For the curious, here's a breakdown of all Map / Set operations on sp3: https://share.firefox.dev/3sIG9wt (slow to load)

This bug isn't currently among the highest priorities for sp3-affecting JS work, but I expect it'll bubble to the top of the list eventually.

Attached file [] microbenchmark

I think we do a malloc for the OrderedHashTable and then also for its variable-size table. Maybe we could store the OrderedHashTable in the object itself, similar to inline typed array data. Or we could fuse the two allocations into one. Or allocate it in the nursery if small.

It'd also be good to check how many entries we add in practice; if it's very small we'd ideally handle this without malloc.

Severity: -- → S3
Priority: -- → P3

This also shows up in scheduleUpdateOnFiber on react-redux

The best option is probably to try to fuse the two allocations into one. After that we could try to allocate this single buffer in the nursery if the Map or Set itself is nursery allocated.

Making this depend on bug 1787526 to avoid conflicting with that.

Depends on: 1787526
Blocks: 1895667
Blocks: 1898545

This shows up on Sp3-NewsSite-Nuxt too, where we're 2.5x slower on the following function:

function track(target, type, key) {
  if (shouldTrack && activeEffect) {
    let depsMap = targetMap.get(target);
    if (!depsMap) {
      targetMap.set(target, depsMap = /* @__PURE__ */ new Map());
    }
    let dep = depsMap.get(key);
    if (!dep) {
      depsMap.set(key, dep = createDep());
    }
    trackEffects(dep);
  }
}

It shows up on preMatch in SP3-Editor-TipTap as well where we're 2x slower on the following function:

function preMatch(frag, parentDesc) {
  let curDesc = parentDesc, descI = curDesc.children.length;
  let fI = frag.childCount, matched = /* @__PURE__ */ new Map(), matches2 = [];
  outer:
    while (fI > 0) {
      let desc;
      for (; ; ) {
        if (descI) {
          let next = curDesc.children[descI - 1];
          if (next instanceof MarkViewDesc) {
            curDesc = next;
            descI = next.children.length;
          } else {
            desc = next;
            descI--;
            break;
          }
        } else if (curDesc == parentDesc) {
          break outer;
        } else {
          descI = curDesc.parent.children.indexOf(curDesc);
          curDesc = curDesc.parent;
        }
      }
      let node = desc.node;
      if (!node)
        continue;
      if (node != frag.child(fI - 1))
        break;
      --fI;
      matched.set(desc, fI);
      matches2.push(desc);
    }
  return { index: fI, matched, matches: matches2.reverse() };
}

js-perf-next: See comment 6.

Whiteboard: [sp3] → [sp3][js-perf-next]
Blocks: 1925065

Taking this. I want to see if we can merge the malloc calls and maybe nursery-allocate this buffer if small.

I also want to get some data on the number of elements we store in Map/Set objects, to see if it's worth optimizing for empty maps or sets.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

I am also seeing this problem on Android during TodoMVC-React-Redux where we are 12x slower than Chrome for scheduleWork:

Profile: https://share.firefox.dev/3YiWDHt
https://github.com/mozilla/Speedometer/blob/537d5769ef9ef3fa9a236884eae4d109c0628833//resources/todomvc/architecture-examples/react-redux/dist/app.bundle.js#L5109

function scheduleWork(fiber, expirationTime) {
  if (50 < nestedUpdateCount) throw nestedUpdateCount = 0, rootWithNestedUpdates = null, Error(formatProdErrorMessage(185));
  fiber = markUpdateTimeFromFiberToRoot(fiber, expirationTime);
  if (null !== fiber) {
    var priorityLevel = getCurrentPriorityLevel();
    1073741823 === expirationTime ? (executionContext & LegacyUnbatchedContext) !== NoContext && (executionContext & (RenderContext | CommitContext)) === NoContext ? performSyncWorkOnRoot(fiber) : (ensureRootIsScheduled(fiber), executionContext === NoContext && flushSyncCallbackQueue()) : ensureRootIsScheduled(fiber);
    (executionContext & 4) === NoContext || 98 !== priorityLevel && 99 !== priorityLevel || (null === rootsWithPendingDiscreteUpdates ? rootsWithPendingDiscreteUpdates = new Map([[fiber, expirationTime]]) : (priorityLevel = rootsWithPendingDiscreteUpdates.get(fiber), (void 0 === priorityLevel || priorityLevel > expirationTime) && rootsWithPendingDiscreteUpdates.set(fiber, expirationTime)));
  }
}

They seem to have optimized these cases a lot more in the last 1 year.

Compared to comment #0, on my less powerful Win11 hardware I get 260ms in Nightly, and 33ms in Chrome. So on a slower machine, we are slower(as expected), but they are even faster than last year.

Depends on: 1926525

Bug 1926525 merges 2 of the 3 allocations into a single allocation.

I collected some data for the max number of (used) entries in OrderedHashTable instances used for Map/Set objects when starting Firefox and doing a full Speedometer 3 run:

0 entries: 20738 / 153655 = 13.5%
1 entry: 93990 / 153655 = 61.2%

<= 4 entries: 148983 / 153655 = 97%

This suggests it's worth optimizing for a small number of elements but I think it will be hard to do this with OrderedHashTable.

It might be easier and more efficient to use an array of entries for small map/set objects. This array we could allocate in the nursery too. Once we exceed the max number of items for this array (maybe 4 or 8) we would expand to a full OrderedHashTable. There's some complexity with iterators, but we could migrate to an OrderedHashTable the first time we create an iterator - I still need to get some data on iterator usage too.

We should also try to optimize empty Map/Set objects better by delaying the allocations until we add the first entry.

I prototyped the scheme in comment 13 today and it works, but allocating a full OrderedHashTable when we create an iterator makes it less effective than I wanted: we get rid of about 85% of OrderedHashTable instances on sp3. Still nice but not great. It's hard to optimize that more because of how Map/Set iteration works.' I'm still waiting for perf numbers for this prototype.

I also want to prototype a different idea: see if we can use a native object as backing storage for the hash table instead of using OrderedHashTable. If this works out it would let us simplify a lot of the Map/Set GC interaction. I think this is also more similar to what other engines are doing for Map and Set.

Depends on: 1927395
Depends on: 1927405
Depends on: 1927464
Depends on: 1928666
See Also: → 1800806
Depends on: 1931815
Depends on: 1932305

Creating a new Map/Set object is now fast since it's just allocating a JS object and we can do this directly from JIT code when calling the constructor without arguments. We now allocate the buffer when we add the first entry and this is a single memory allocation that can be in the nursery, instead of three mallocs per object.

On the micro-benchmarks Markus attached we're a bit slower than Chrome (some of it is probably outside the Map/Set code) but about as fast as Safari on my machine. The numbers are pretty noisy but if I change it to add 1 item to the Map instead of 3 (common on Sp3) we might be the fastest.

Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks but we can file new bugs for that.

Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED

(In reply to Jan de Mooij [:jandem] from comment #15)

Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks but we can file new bugs for that.

I took a look through the new report and only found one instance, which I filed as bug 1933557. The argument-less constructors now all appear to be fast. Thank you!

Target Milestone: --- → 134 Branch
See Also: → 1799154
Blocks: 1799154
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: