Open
Bug 1341133
Opened 7 years ago
Updated 2 years ago
WebAssembly execution performance stutters in Zen Garden Wasm demo in 64-bit Firefox, but enabling built-in profiler or geckoprofiler takes the performance stutters away
Categories
(Core :: JavaScript Engine, defect, P3)
Core
JavaScript Engine
Tracking
()
NEW
People
(Reporter: jujjyl, Unassigned)
Details
(Keywords: triage-deferred)
Attachments
(5 files)
Here's something that is very peculiar. Benchmarking performance of a WebAssembly + WebGL 2 demo with Unreal Engine 4, performance is generally good, but animation suffers from constant stuttering where execution misses frames randomly every second or two. This execution time that causes the stutters apppers inside the wasm code that is running in the requestAnimationFrame loop. However enabling geckoprofiler or the built-in profiler to analyze where these stutters would come from actually completely remove the stutters, giving perfectly smooth animation. Having the profiler enabled does have a tiny impact on overall performance (CPU load rises from 89.03% to 94.87%), but each rAF() finishes consistently within 16msecs after that, as opposed to spikes of 30-50 msecs when profiler is disabled. Why would enabling the profiler g
Reporter | ||
Comment 1•7 years ago
|
||
View of the Emscripten requestAnimationFrame performance profiler when Firefox profiler is disabled. Each one pixel column represent a single rAF tick, with the height of the column measuring the number of msecs that the rAF took. Note the constant random height spikes at random intervals.
Reporter | ||
Comment 2•7 years ago
|
||
Enabling geckoprofiler completely fixes performance and all rAF()s now finish in < 16.67 msecs for smooth 60fps animation.
Reporter | ||
Comment 3•7 years ago
|
||
Oh, I notice the text had cut off in comment 0. Why would enabling the profiler generate such a different performance footprint? I think this should not be anything GC or CC related, since the added time is spent inside the rAF(), and GC and CC run when the browser main loop is idle?
Comment 4•7 years ago
|
||
Could you perhaps profile with a native C++ profiler and see if any C++ functions (e.g., GC) occur during the stuttering frames? Profiling-mode does affect tracing and discarding of JIT code, iirc.
Reporter | ||
Comment 5•7 years ago
|
||
I don't have a native C++ profiler that would let me see into the stuttering frames. The native profilers that I have all aggregate samples together and don't show them on a timeline. Here is an overview of what AMD CodeXL says as hotspots.
Reporter | ||
Comment 6•7 years ago
|
||
And a detailed view. The biggest overhead, setUniform() which takes 986 samples corresponds to 1.70% of total execution time, so it is difficult to conclude that that would be the slow item. Moz_speex_resampler_set_rate_frac is about audio resampling, which should run the same with or without profiling enabled. AMD CodeXL is not able to show wasm frames, so the "Unknown Module (PID 15540)" is a black box in this case.
Reporter | ||
Comment 7•7 years ago
|
||
Marked down the speex part separately as bug 1341254, although that is not the root issue, since disabling all audio in the demo does not take the stutters away.
Comment 8•7 years ago
|
||
I'm unable to repro in Linux (Nightly or Beta). While I see if I can repo in Windows, one quick test that would be useful, could you enable GC logging (javascript.options.mem.log = true), open the Browser Console (note: not *Web* Console) and see if the GC logging output (which includes max pause times) coincides with the janks you're observing.
Comment 9•7 years ago
|
||
I am able to see more stuttery behavior on Windows (subjectively: in Beta much more than Nightly), but looking at the GC log, GC activity doesn't seem to be the culprit. So far, this isn't looking like a JS issue; it seems like some timing instrumentation is needed here to focus on what is going slower for these dropped frames.
Reporter | ||
Updated•7 years ago
|
Summary: WebAssembly execution performance stutters in requestAnimationFrame-based animation, but enabling built-in profiler or geckoprofiler takes the performance stutters away → WebAssembly execution performance stutters in Zen Garden Wasm demo, but enabling built-in profiler or geckoprofiler takes the performance stutters away
Reporter | ||
Comment 10•7 years ago
|
||
The demo went live today. Here is a public link, which force disables WebGL and WebAudio from the equation to remove any stuttering effects they might cause: https://s3.amazonaws.com/mozilla-games/tmp/2017-03-07-ZenGarden-disablewebaudioandwebgl/EpicZenGarden.html?playback&novsync&cpuprofiler&fakegl&noaudio Note that this bug is about different source of stuttering compared to bug 1345212. Bug 1345212 focuses on extra stuttering that is caused by specifically running in 32-bit Firefox, but this bug focuses on performance spikes on 64-bit Firefox that seem to vanish when enabling the profiler.
Summary: WebAssembly execution performance stutters in Zen Garden Wasm demo, but enabling built-in profiler or geckoprofiler takes the performance stutters away → WebAssembly execution performance stutters in Zen Garden Wasm demo in 64-bit Firefox, but enabling built-in profiler or geckoprofiler takes the performance stutters away
Reporter | ||
Comment 11•7 years ago
|
||
Here is a new profiling run, executed twice with geckoprofiler enabled, which was practically stutter free on both runs, except for three distinct deterministically occurring spikes which geckoprofiler locates to be specific to when the demo initializes particle systems at the first time they come to view in the camera. When running with geckoprofiler disabled however (two bottom graphs), there is much more short term spiking in executing the demo Wasm code, or some FFI calls it does. These are pointed to by red color in the lower graphs. Luke asked whether these could be GCs or CCs? My understanding is that GC&CC are only run in the main thread when the browser main loop is idle. Is that assumption correct? If so, in these cpuprofiler graphs, such GC&CC spikes will then be shown either in light green, light yellow or light red (depending on how much time they take, i.e. the vertical height in the graph). The interesting spikes in question for this bug are drawn in blue in the lower graphs, which is for time spent inside the user requestAnimationFrame() code. The demo is extremely optimized for avoiding temporary garbage, and there's only a handful of locations that do generate temporary garbage. Large memcpys dominate with 52% of all allocations, localtime_r does new Date(), mouse input injection needs to estimate bounding client rectangles for the canvas, that's about it. Profiling the source is hard for this reason, because I cannot attach geckoprofiler or the built-in profiler because that takes the spikes away, and native profilers such as VTune or CodeXL are not able to show interactive timelines (nor do they show wasm call stacks), and attempting to instrument the UE4 codebase itself is difficult since the codebase is huge and I don't have a clue where to start, especially since it does not look like the UE4 code would be the culprit, as indicated by perf being smooth when one enables geckoprofiler. Are geckoprofiler and/or built-in profiler expected to change how GC&CC operations behave?
Comment 12•7 years ago
|
||
One thing that comes to mind is that the Gecko profiler calls timeBeginPeriod/timeEndPeriod on Windows in order to increase the timer resolution. IIRC we also do that for requestAnimationFrame but maybe with a different value.
Comment 13•7 years ago
|
||
When I tried this out, in comment 9 I didn't see any problematic GC/CC behavior, so that's probably not the culprit. Jukka, is there anyway to use that integrated profiler whose graph you're showing to get some insight into what functions are being run during those spikes?
Reporter | ||
Comment 14•7 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #13) > Jukka, is there anyway to use that integrated profiler whose graph you're > showing to get some insight into what functions are being run during those > spikes? Unfortunately not. The integrated profiler is only able to hook into high level JS DOM elements, since it's not possible to inject to WebAssembly code on the fly. If I was able to do that, certainly would have attempted it a long time ago. (In reply to Markus Stange [:mstange] from comment #12) > One thing that comes to mind is that the Gecko profiler calls > timeBeginPeriod/timeEndPeriod on Windows in order to increase the timer > resolution. IIRC we also do that for requestAnimationFrame but maybe with a > different value. Would you be able to help me figure out the effect of this? I'd like to either a) make those time*Period calls happen when profiling is disabled, i.e. perhaps run with these always set to their "profiling enabled" states, or b) adjust profiler to run without changing any of the time*Period values.
Comment 15•7 years ago
|
||
b) is a lot easier. Just remove the two calls in tools/profiler/core/platform-win32.cpp.
Updated•7 years ago
|
Keywords: triage-deferred
Priority: -- → P3
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•