Creating a new Map / Set object is 7x more expensive than creating a new array (and 5x slower than in Chrome)
Categories: Core :: JavaScript Engine, enhancement, P3
Tracking: firefox134: fixed
People: Reporter: mstange, Assigned: jandem
References: Blocks 4 open bugs
Whiteboard: [sp3][js-perf-next]
Attachments: 3 files
We spend 0.21% of the overall Speedometer 3 time creating Map and Set objects. We are about 5x slower than Chrome at it, for example in createDep on TodoMVC-Vue or in _$Ek on TodoMVC-Lit.

Profile of MapObject::create: https://share.firefox.dev/3P07AJw

Allocation is the main problem. For a new Map object, we have three calls to js_pod_arena_malloc.
The following three reduced testcases compare Map/Set construction with JS array construction:

- new Map() microbenchmark (attachment 9351654 [details]): 230ms in Firefox, 53ms in Chrome
- new Set() microbenchmark (attachment 9351655 [details]): 230ms in Firefox, 53ms in Chrome
- [] microbenchmark (attachment 9351656 [details]): 20ms in Firefox, 20ms in Chrome
For the curious, here's a breakdown of all Map / Set operations on sp3: https://share.firefox.dev/3sIG9wt (slow to load)
This bug isn't currently among the highest priorities for sp3-affecting JS work, but I expect it'll bubble to the top of the list eventually.
Updated • 2 years ago

Comment 1 (Reporter) • 2 years ago

Comment 2 (Reporter) • 2 years ago

Comment 3 (Reporter) • 2 years ago

Comment 4 (Assignee) • 2 years ago
I think we do a malloc for the OrderedHashTable and then also for its variable-size table. Maybe we could store the OrderedHashTable in the object itself, similar to inline typed array data. Or we could fuse the two allocations into one. Or allocate it in the nursery if small.

It'd also be good to check how many entries we add in practice; if it's very small we'd ideally handle this without malloc.
Updated • 2 years ago

Comment 5 • 1 year ago
This also shows up in scheduleUpdateOnFiber on react-redux.
Comment 6 (Assignee) • 1 year ago
The best option is probably to try to fuse the two allocations into one. After that we could try to allocate this single buffer in the nursery if the Map or Set itself is nursery allocated.

Making this depend on bug 1787526 to avoid conflicting with that.
Comment 7 • 1 year ago
This shows up on Sp3-NewsSite-Nuxt too, where we're 2.5x slower on the following function:
function track(target, type, key) {
  if (shouldTrack && activeEffect) {
    let depsMap = targetMap.get(target);
    if (!depsMap) {
      targetMap.set(target, depsMap = /* @__PURE__ */ new Map());
    }
    let dep = depsMap.get(key);
    if (!dep) {
      depsMap.set(key, dep = createDep());
    }
    trackEffects(dep);
  }
}
Comment 8 • 1 year ago
It shows up on preMatch in SP3-Editor-TipTap as well, where we're 2x slower on the following function:
function preMatch(frag, parentDesc) {
  let curDesc = parentDesc, descI = curDesc.children.length;
  let fI = frag.childCount, matched = /* @__PURE__ */ new Map(), matches2 = [];
  outer:
  while (fI > 0) {
    let desc;
    for (;;) {
      if (descI) {
        let next = curDesc.children[descI - 1];
        if (next instanceof MarkViewDesc) {
          curDesc = next;
          descI = next.children.length;
        } else {
          desc = next;
          descI--;
          break;
        }
      } else if (curDesc == parentDesc) {
        break outer;
      } else {
        descI = curDesc.parent.children.indexOf(curDesc);
        curDesc = curDesc.parent;
      }
    }
    let node = desc.node;
    if (!node)
      continue;
    if (node != frag.child(fI - 1))
      break;
    --fI;
    matched.set(desc, fI);
    matches2.push(desc);
  }
  return { index: fI, matched, matches: matches2.reverse() };
}
Comment 10 (Assignee) • 9 months ago
Taking this. I want to see if we can merge the malloc calls and maybe nursery-allocate this buffer if small.

I also want to get some data on the number of elements we store in Map/Set objects, to see if it's worth optimizing for empty maps or sets.
Comment 11 • 9 months ago
I am also seeing this problem on Android during TodoMVC-React-Redux, where we are 12x slower than Chrome for scheduleWork:
Profile: https://share.firefox.dev/3YiWDHt
https://github.com/mozilla/Speedometer/blob/537d5769ef9ef3fa9a236884eae4d109c0628833//resources/todomvc/architecture-examples/react-redux/dist/app.bundle.js#L5109
function scheduleWork(fiber, expirationTime) {
  if (50 < nestedUpdateCount)
    throw nestedUpdateCount = 0, rootWithNestedUpdates = null, Error(formatProdErrorMessage(185));
  fiber = markUpdateTimeFromFiberToRoot(fiber, expirationTime);
  if (null !== fiber) {
    var priorityLevel = getCurrentPriorityLevel();
    1073741823 === expirationTime
      ? (executionContext & LegacyUnbatchedContext) !== NoContext &&
        (executionContext & (RenderContext | CommitContext)) === NoContext
        ? performSyncWorkOnRoot(fiber)
        : (ensureRootIsScheduled(fiber),
          executionContext === NoContext && flushSyncCallbackQueue())
      : ensureRootIsScheduled(fiber);
    (executionContext & 4) === NoContext ||
      (98 !== priorityLevel && 99 !== priorityLevel) ||
      (null === rootsWithPendingDiscreteUpdates
        ? rootsWithPendingDiscreteUpdates = new Map([[fiber, expirationTime]])
        : (priorityLevel = rootsWithPendingDiscreteUpdates.get(fiber),
          (void 0 === priorityLevel || priorityLevel > expirationTime) &&
            rootsWithPendingDiscreteUpdates.set(fiber, expirationTime)));
  }
}
Comment 12 • 9 months ago
Chrome seems to have optimized these cases a lot more over the last year. Compared to comment #0, on my less powerful Win11 hardware I get 260ms in Nightly and 33ms in Chrome. So on a slower machine we are slower, as expected, but Chrome is even faster than last year.
Comment 13 (Assignee) • 9 months ago
Bug 1926525 merges 2 of the 3 allocations into a single allocation.

I collected some data for the max number of (used) entries in OrderedHashTable instances used for Map/Set objects when starting Firefox and doing a full Speedometer 3 run:

0 entries: 20738 / 153655 = 13.5%
1 entry: 93990 / 153655 = 61.2%
<= 4 entries: 148983 / 153655 = 97%

This suggests it's worth optimizing for a small number of elements, but I think it will be hard to do this with OrderedHashTable.

It might be easier and more efficient to use an array of entries for small map/set objects. We could allocate this array in the nursery too. Once we exceed the max number of items for this array (maybe 4 or 8) we would expand to a full OrderedHashTable. There's some complexity with iterators, but we could migrate to an OrderedHashTable the first time we create an iterator. I still need to get some data on iterator usage too.

We should also try to optimize empty Map/Set objects better by delaying the allocations until we add the first entry.
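The two-tier scheme sketched in this comment can be modeled at the JS level. This is an illustrative sketch only (the real implementation would be C++ inside SpiderMonkey); the class name `TwoTierMap`, the threshold constant, and the use of `===` instead of SameValueZero key comparison are simplifications introduced here:

```javascript
// Illustrative model: a plain array of [key, value] pairs backs small
// maps; we migrate to a real hash table (modeled by Map) once the
// threshold is exceeded or an iterator is requested. The threshold of
// 8 is one of the values floated in the comment above.
const SMALL_LIMIT = 8;

class TwoTierMap {
  constructor() {
    this.small = [];   // array of [key, value] entries, grown lazily
    this.table = null; // full hash table, allocated only when needed
  }
  #migrate() {
    this.table = new Map(this.small);
    this.small = null;
  }
  set(key, value) {
    if (this.table) { this.table.set(key, value); return this; }
    for (const entry of this.small) {
      if (entry[0] === key) { entry[1] = value; return this; } // SameValueZero elided for brevity
    }
    this.small.push([key, value]);
    if (this.small.length > SMALL_LIMIT) this.#migrate();
    return this;
  }
  get(key) {
    if (this.table) return this.table.get(key);
    for (const [k, v] of this.small) if (k === key) return v;
    return undefined;
  }
  // Iteration forces migration to the full table, matching the
  // "migrate on first iterator" idea above.
  [Symbol.iterator]() {
    if (!this.table) this.#migrate();
    return this.table[Symbol.iterator]();
  }
}
```

Lookups in the small tier are a linear scan, which is fine given the data above showing 97% of Map/Set instances hold at most 4 entries.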
Comment 14 (Assignee) • 9 months ago
I prototyped the scheme in comment 13 today and it works, but allocating a full OrderedHashTable when we create an iterator makes it less effective than I wanted: we get rid of about 85% of OrderedHashTable instances on sp3. Still nice but not great. It's hard to optimize that more because of how Map/Set iteration works. I'm still waiting for perf numbers for this prototype.

I also want to prototype a different idea: see if we can use a native object as backing storage for the hash table instead of using OrderedHashTable. If this works out it would let us simplify a lot of the Map/Set GC interaction. I think this is also more similar to what other engines are doing for Map and Set.
Comment 15 (Assignee) • 8 months ago
Creating a new Map/Set object is now fast: it's just allocating a JS object, and we can do this directly from JIT code when calling the constructor without arguments. We now allocate the buffer when we add the first entry, and this is a single memory allocation that can be in the nursery, instead of three mallocs per object.

On the micro-benchmarks Markus attached we're a bit slower than Chrome (some of it is probably outside the Map/Set code) but about as fast as Safari on my machine. The numbers are pretty noisy, but if I change it to add 1 item to the Map instead of 3 (common on Sp3) we might be the fastest.

Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks, but we can file new bugs for that.
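The delayed-allocation behavior described in this comment can also be modeled in JS. `LazyMap` below is a hypothetical sketch, not engine code: it defers creating any backing storage until the first entry is added, mirroring how the engine now defers the buffer allocation:

```javascript
// Sketch of lazy backing-storage allocation: constructing the object
// allocates nothing beyond the wrapper itself; the inner storage
// (modeled by Map) is created on the first set().
class LazyMap {
  constructor() {
    this.inner = null; // no buffer until the first entry
  }
  set(key, value) {
    (this.inner ??= new Map()).set(key, value);
    return this;
  }
  get(key) {
    return this.inner ? this.inner.get(key) : undefined;
  }
  get size() {
    return this.inner ? this.inner.size : 0;
  }
}
```

Since the data in comment 13 showed 13.5% of Map/Set instances stay empty, never-used maps pay only the cost of the wrapper object under this scheme.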
Comment 16 (Reporter) • 8 months ago
(In reply to Jan de Mooij [:jandem] from comment #15)
> Markus, I'm curious if this still shows up in your profile comparisons. It's possible there's work left that's not covered by the micro-benchmarks but we can file new bugs for that.

I took a look through the new report and only found one instance, which I filed as bug 1933557. The argument-less constructors now all appear to be fast. Thank you!
Updated • 8 months ago