Closed Bug 1851662 Opened 2 years ago Closed 9 months ago

Creating a new Map / Set object is 7x more expensive than creating a new array (and 5x slower than in Chrome)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

134 Branch

Tracking Flags:

Tracking

Status

firefox134

---

fixed

People

(Reporter: mstange, Assigned: jandem)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [sp3][js-perf-next])

Attachments

(3 files)

new Map() microbenchmark 2 years ago Markus Stange [:mstange] 841 bytes, text/html		Details
new Set() microbenchmark 2 years ago Markus Stange [:mstange] 823 bytes, text/html		Details
[] microbenchmark 2 years ago Markus Stange [:mstange] 982 bytes, text/html		Details

Markus Stange [:mstange]

Reporter

Description

•

2 years ago

•

Edited

We spend 0.21% of the overall Speedometer 3 time creating Map and Set objects.
We are about 5x slower than Chrome at it, for example in createDep on TodoMVC-Vue or in _$Ek on TodoMVC-Lit.

Profile of MapObject::create: https://share.firefox.dev/3P07AJw

Allocation is the main problem. For a new Map object, we have three calls to js_pod_arena_malloc.

The following three reduced testcases compare Map/Set construction with JS array construction:

new Map() microbenchmark (attachment 9351654 [details]): 230ms in Firefox, 53ms in Chrome
new Set() microbenchmark (attachment 9351655 [details]): 230ms in Firefox, 53ms in Chrome
[] microbenchmark (attachment 9351656 [details]): 20ms in Firefox, 20ms in Chrome

For the curious, here's a breakdown of all Map / Set operations on sp3: https://share.firefox.dev/3sIG9wt (slow to load)

This bug isn't currently among the highest priorities for sp3-affecting JS work, but I expect it'll bubble to the top of the list eventually.

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-550

Markus Stange [:mstange]

Reporter

Comment 1

•

2 years ago

Attached file new Map() microbenchmark — Details

Markus Stange [:mstange]

Reporter

Comment 2

•

2 years ago

Attached file new Set() microbenchmark — Details

Markus Stange [:mstange]

Reporter

Comment 3

•

2 years ago

Attached file [] microbenchmark — Details

Jan de Mooij [:jandem]

Assignee

Comment 4

•

2 years ago

I think we do a malloc for the OrderedHashTable and then also for its variable-size table. Maybe we could store the OrderedHashTable in the object itself, similar to inline typed array data. Or we could fuse the two allocations into one. Or allocate it in the nursery if small.

It'd also be good to check how many entries we add in practice; if it's very small we'd ideally handle this without malloc.

Will Medina [:willyelm]

Updated

•

2 years ago

Severity: -- → S3

Priority: -- → P3

Jeff Muizelaar [:jrmuizel]

Comment 5

•

2 years ago

This also shows up in scheduleUpdateOnFiber on react-redux

Jan de Mooij [:jandem]

Assignee

Comment 6

•

1 years ago

The best option is probably to try to fuse the two allocations into one. After that we could try to allocate this single buffer in the nursery if the Map or Set itself is nursery allocated.

Making this depend on bug 1787526 to avoid conflicting with that.

Depends on: 1787526

Jan de Mooij [:jandem]

Assignee

Updated

•

1 year ago

Blocks: 1895667

Jan de Mooij [:jandem]

Assignee

Updated

•

1 year ago

Blocks: 1898545

Jeff Muizelaar [:jrmuizel]

Comment 7

•

1 year ago

•

Edited

This shows up on Sp3-NewsSite-Nuxt too, where we're 2.5x slower on the following function:

function track(target, type, key) {
  if (shouldTrack && activeEffect) {
    let depsMap = targetMap.get(target);
    if (!depsMap) {
      targetMap.set(target, depsMap = /* @__PURE__ */ new Map());
    }
    let dep = depsMap.get(key);
    if (!dep) {
      depsMap.set(key, dep = createDep());
    }
    trackEffects(dep);
  }
}

Jeff Muizelaar [:jrmuizel]

Comment 8

•

1 year ago

It shows up on preMatch in SP3-Editor-TipTap as well where we're 2x slower on the following function:

function preMatch(frag, parentDesc) {
  let curDesc = parentDesc, descI = curDesc.children.length;
  let fI = frag.childCount, matched = /* @__PURE__ */ new Map(), matches2 = [];
  outer:
    while (fI > 0) {
      let desc;
      for (; ; ) {
        if (descI) {
          let next = curDesc.children[descI - 1];
          if (next instanceof MarkViewDesc) {
            curDesc = next;
            descI = next.children.length;
          } else {
            desc = next;
            descI--;
            break;
          }
        } else if (curDesc == parentDesc) {
          break outer;
        } else {
          descI = curDesc.parent.children.indexOf(curDesc);
          curDesc = curDesc.parent;
        }
      }
      let node = desc.node;
      if (!node)
        continue;
      if (node != frag.child(fI - 1))
        break;
      --fI;
      matched.set(desc, fI);
      matches2.push(desc);
    }
  return { index: fI, matched, matches: matches2.reverse() };
}

Iain Ireland [:iain]

Comment 9

•

1 year ago

js-perf-next: See comment 6.

Whiteboard: [sp3] → [sp3][js-perf-next]

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Blocks: 1925065

Jan de Mooij [:jandem]

Assignee

Comment 10

•

10 months ago

Taking this. I want to see if we can merge the malloc calls and maybe nursery-allocate this buffer if small.

I also want to get some data on the number of elements we store in Map/Set objects, to see if it's worth optimizing for empty maps or sets.

Assignee: nobody → jdemooij

Status: NEW → ASSIGNED

Denis Palmeiro [:denispal]

Comment 11

•

10 months ago

I am also seeing this problem on Android during TodoMVC-React-Redux where we are 12x slower than Chrome for scheduleWork:

Profile: https://share.firefox.dev/3YiWDHt
https://github.com/mozilla/Speedometer/blob/537d5769ef9ef3fa9a236884eae4d109c0628833//resources/todomvc/architecture-examples/react-redux/dist/app.bundle.js#L5109

function scheduleWork(fiber, expirationTime) {
  if (50 < nestedUpdateCount) throw nestedUpdateCount = 0, rootWithNestedUpdates = null, Error(formatProdErrorMessage(185));
  fiber = markUpdateTimeFromFiberToRoot(fiber, expirationTime);
  if (null !== fiber) {
    var priorityLevel = getCurrentPriorityLevel();
    1073741823 === expirationTime ? (executionContext & LegacyUnbatchedContext) !== NoContext && (executionContext & (RenderContext | CommitContext)) === NoContext ? performSyncWorkOnRoot(fiber) : (ensureRootIsScheduled(fiber), executionContext === NoContext && flushSyncCallbackQueue()) : ensureRootIsScheduled(fiber);
    (executionContext & 4) === NoContext || 98 !== priorityLevel && 99 !== priorityLevel || (null === rootsWithPendingDiscreteUpdates ? rootsWithPendingDiscreteUpdates = new Map([[fiber, expirationTime]]) : (priorityLevel = rootsWithPendingDiscreteUpdates.get(fiber), (void 0 === priorityLevel || priorityLevel > expirationTime) && rootsWithPendingDiscreteUpdates.set(fiber, expirationTime)));
  }
}

Mayank Bansal

Comment 12

•

10 months ago

They seem to have optimized these cases a lot more in the last 1 year.

Compared to comment #0, on my less powerful Win11 hardware I get 260ms in Nightly, and 33ms in Chrome. So on a slower machine, we are slower(as expected), but they are even faster than last year.

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Depends on: 1926525

Jan de Mooij [:jandem]

Assignee

Comment 13

•

10 months ago

Bug 1926525 merges 2 of the 3 allocations into a single allocation.

I collected some data for the max number of (used) entries in OrderedHashTable instances used for Map/Set objects when starting Firefox and doing a full Speedometer 3 run:

0 entries: 20738 / 153655 = 13.5%
1 entry: 93990 / 153655 = 61.2%

<= 4 entries: 148983 / 153655 = 97%

This suggests it's worth optimizing for a small number of elements but I think it will be hard to do this with OrderedHashTable.

It might be easier and more efficient to use an array of entries for small map/set objects. This array we could allocate in the nursery too. Once we exceed the max number of items for this array (maybe 4 or 8) we would expand to a full OrderedHashTable. There's some complexity with iterators, but we could migrate to an OrderedHashTable the first time we create an iterator - I still need to get some data on iterator usage too.

We should also try to optimize empty Map/Set objects better by delaying the allocations until we add the first entry.

Jan de Mooij [:jandem]

Assignee

Comment 14

•

10 months ago

I prototyped the scheme in comment 13 today and it works, but allocating a full OrderedHashTable when we create an iterator makes it less effective than I wanted: we get rid of about 85% of OrderedHashTable instances on sp3. Still nice but not great. It's hard to optimize that more because of how Map/Set iteration works.' I'm still waiting for perf numbers for this prototype.

I also want to prototype a different idea: see if we can use a native object as backing storage for the hash table instead of using OrderedHashTable. If this works out it would let us simplify a lot of the Map/Set GC interaction. I think this is also more similar to what other engines are doing for Map and Set.

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Depends on: 1927395

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Depends on: 1927405

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Depends on: 1927464

Jan de Mooij [:jandem]

Assignee

Updated

•

10 months ago

Depends on: 1928666

Mayank Bansal

Updated

•

9 months ago

Updated

•

9 months ago

Depends on: 1931815

Jan de Mooij [:jandem]

Assignee

Updated

•

9 months ago

Depends on: 1932305

Jan de Mooij [:jandem]

Assignee

Comment 15

•

9 months ago

Creating a new Map/Set object is now fast since it's just allocating a JS object and we can do this directly from JIT code when calling the constructor without arguments. We now allocate the buffer when we add the first entry and this is a single memory allocation that can be in the nursery, instead of three mallocs per object.

On the micro-benchmarks Markus attached we're a bit slower than Chrome (some of it is probably outside the Map/Set code) but about as fast as Safari on my machine. The numbers are pretty noisy but if I change it to add 1 item to the Map instead of 3 (common on Sp3) we might be the fastest.

Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks but we can file new bugs for that.

Status: ASSIGNED → RESOLVED

Closed: 9 months ago

Resolution: --- → FIXED

Markus Stange [:mstange]

Reporter

Comment 16

•

9 months ago

(In reply to Jan de Mooij [:jandem] from comment #15)

Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks but we can file new bugs for that.

I took a look through the new report and only found one instance, which I filed as bug 1933557. The argument-less constructors now all appear to be fast. Thank you!

Mathew Hodson

Updated

•

9 months ago

status-firefox134: --- → fixed

Target Milestone: --- → 134 Branch

Mayank Bansal

Updated

•

6 months ago

Updated

•

6 months ago

Blocks: 1799154

You need to log in before you can comment on or make changes to this bug.