Closed Bug 1599800 Opened 4 years ago Closed 1 year ago

[exploration] Consider growing table capacity exponentially

Categories

(Core :: JavaScript: WebAssembly, task, P3)

task

Tracking

()

RESOLVED FIXED
109 Branch
Tracking Status
firefox109 --- fixed

People

(Reporter: luke, Assigned: bvisness)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

Currently, we just realloc() each time so, if realloc() was O(n), then n grows would be O(n^2). Realloc() mostly helps out out here by doing some internal rounding-up, but it seems dangerous to depend on this. I bring this up b/c there was a Chrome bug where they were, I'm guessing, not using realloc, and thus they were getting O(n^2), and someone reported a bug, and so they are switching to exponential growth, bringing them from slower than us to faster than us.

Priority: -- → P3
Blocks: wasm-perf
Summary: Consider growing table capacity exponentially → [exploration] Consider growing table capacity exponentially

Instead of growing a Table by the requested delta (and relying on
realloc to do the heavy lifting for us to keep this efficient),
increase the table capacity by at least a factor of 1.5 when we need
to grow it.

I believe the attached patch does exactly what we need, for a growth factor of 1.5x. This does not improve the performance on Linux. On Linux, Nightly is quite a bit faster than Chromium [sic] on the benchmark attached to the v8 bug, and the patch regresses performance for the part of the benchmark in the build we care about. (You want to look at the "__assign_got_entries" in the console, and be sure to disable javascript.options.wasm_verbose before you run this, as there's tremendous spew that slows everything down, due to thousands of modules being constructed to simulate WebAssembly.Function.)

It may be that 1.5x is much too aggressive and that this will result in copying because realloc already was implementing this optimization for us and now it doesn't have the breathing room to do so. But 1.1x isn't really much better.

(I need to measure this on other systems where other mallocs may be behaving differently.)

Assignee: nobody → lhansen
Status: NEW → ASSIGNED

On macOS X (M1 MacBook Pro, OS 12.3), Chrome is about 30% faster than Firefox on that segment of the test. Growing the tables exponentially does not improve the Firefox score in my local testing; the cost is spent elsewhere, TBD. The profiler shows almost no time being spent in Table::grow.

Some of the time is background compilation noise: running with either just baseline or ion, the time improves by about 15%. But the profiler still shows almost all time everywhere being spent in compilation.

(I need to test Windows too.)

On Windows, basically no time is spent in Table::grow, the profiler has just 13 hits, for a total of 26ms (this is Nightly 100, without the patch). Chrome 100 is about 15% faster than Nightly 100 on the segment of the test in question so they're winning on something. A lot of time in Firefox is being spent in AllocateCodeBytes (for the memcpy it looks like) and VirtualProtect - this is essentially a compiler benchmark, from what I can see, and what we're likely seeing the most is that when new WebAssembly.Module is being used to simulate WebAssembly.Function, we have more overhead than Chrome.

In short, I think it might be worth digging into this test case some more to see if we can speed up compilation, but the table growth does not seem to be a factor. It's possible we're saved by good realloc implementations.

Profile from Windows: https://share.firefox.dev/3DMQVDn

See Also: → 1728834

Blocking wasm-large-web-properties since they will suffer the most from the compilation overhead of lots of tiny modules, see comment 4, and also from our WebAssembly.Function implementation, which basically does the same thing.

Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Severity: normal normal → S3 S3

Rather than sticking with our current realloc strategy and growing exponentially, we decided to simply store func refs in a vector. Microbenchmark performance indicates that this is 7-8x faster.

(The current implementation spends a lot of time in memset for reasons unclear to me - I imagine it's repeatedly re-zeroing memory, but I don't have a good explanation as to why.)

Assignee: nobody → ben
Status: NEW → ASSIGNED
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/autoland/rev/78831b749c8e
Improve performance when growing wasm tables by small amounts. r=rhunt
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: