Closed Bug 1656638 Opened 7 months ago Closed 7 months ago

Add telemetry for WebAssembly compile and run times


(Core :: Javascript: WebAssembly, enhancement, P3)




81 Branch
Tracking Status
firefox81 --- fixed


(Reporter: cfallin, Assigned: cfallin)




(2 files)

In order to track the quality of our WebAssembly JIT compilers' output, and their compile times, we would like to add telemetry to measure these timings. This telemetry would ideally measure CPU time spent in the compilers themselves, and in the code that they generate.

In order to limit overhead, we will likely not be able to precisely measure CPU time spent in Wasm code, at least not without a sampling-based profiling scheme. Rather, our initial thought is that we could measure wallclock time for each call into a Wasm function from the JS interpreter. This would omit direct calls into Wasm from JIT'd JS code, but we really don't want to pessimize that hot path; and since all systems under measurement would have the same limitation, data should still be comparable across Wasm-compiler changes (given that JS tiering heuristics do not change).

Given that we aren't quite sure what telemetry we might want yet, I haven't created a request for the Data Stewards yet; I will do that as soon as we have some consensus on what we would like.

Blocks: 1649932

This patch adds telemetry to measure the time spent in Wasm compilers
and in the code that they generate.

For simplicity, it measures wallclock time as a proxy for CPU time.
Furthermore, it measures runtime only for Wasm invoked from the JS
interpreter, not for Wasm code entered directly from JIT'd JS code. This
is for efficiency reasons: we do not want to add a VM call in the
transition stubs; that would be a Very Bad Thing for performance, even
for a Nightly build instrumented with telemetry.

The plumbing is somewhat awkward, given that Wasm compilers can run on
background threads. It appears that the telemetry-callback API that
SpiderMonkey includes to avoid a direct dependency on the Gecko
embedding had artificially limited the callback to main-thread use only.
This patch removes that limitation, which is safe at least for Gecko;
the telemetry hooks in Gecko are thread-safe (they take a global mutex).
That way, the background threads performing compilation can directly add
telemetry incrementally, without having to pass this up through the main
thread somehow.

Finally, I have chosen to add the relevant metrics as Scalar telemetry
values rather than Histograms. This is because what we are really
interested in is the sum of all these values (all CPU time spent in
compilation + running Wasm code); if this value goes down as a result of
a Wasm compiler change, then that Wasm compiler change is good because
it reduces CPU time on the user's machine. It is difficult to add two
Histograms together because the bins may have different boundaries. If
we instead need to use a binned histogram for other reasons, then we
could simply report the sum (of all compiler time plus run time) as
another histogram.

One additional thought: after playing with this patch for a bit, I've started to suspect that counting only Wasm entered from interpreted JS might be missing some significant Wasm runtime: e.g., on the Godot Wasm demo (link), the stat remains constant at ~300ms even after jumping around in the game for a while.

An alternative low-overhead approach might be to measure all CPU time within any top-level JS invocation (hooking in at say JS::Call or thereabouts), i.e., any time the main loop enters JS or Wasm code. I suspect that measuring JS+Wasm time might be OK if we're measuring Wasm compiler changes and holding everything else constant; we won't have a speedup ratio for Wasm, but we will have absolute CPU time saved, to balance against compile time spent.

Just updated the patch to do the above: it now measures all JS + Wasm runtime by hooking top-level SpiderMonkey invocations. This appears to be doing the right thing with local testing.

Attachment #9167451 - Attachment description: Bug 1656638 (WIP DRAFT): Add Wasm compile- and run-time telemetry to track Wasm compiler performance. r=lth → Bug 1656638: Add Wasm compile- and run-time telemetry to track Wasm compiler performance. r=lth

Attaching data collection review form for the Data Stewards. Happy to revise and answer any and all questions as necessary -- thanks!

Attachment #9168043 - Flags: data-review?(chutten)
Comment on attachment 9168043 [details]


    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?


    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire in Firefox 88.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default on for pre-release channels only.

    Does the instrumentation include the addition of any new identifiers?


    Is the data collection covered by the existing Firefox privacy notice?


    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :cfallin is responsible for renewing or removing the collection before it expires in Firefox 88.

Result: datareview+
Attachment #9168043 - Flags: data-review?(chutten) → data-review+
Pushed by
Add Wasm compile- and run-time telemetry to track Wasm compiler performance. r=lth

Fixed the bustage -- sorry about that. It was the implicit-constructor lint that's an error in the Treeherder build; I didn't catch it in my local build.

Flags: needinfo?(cfallin)
Pushed by
Add Wasm compile- and run-time telemetry to track Wasm compiler performance. r=lth
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch
See Also: → 1689273
You need to log in before you can comment on or make changes to this bug.