Bug 1329815 (Closed). Opened 3 years ago, closed 3 years ago.

41.5% of CPU time in a WebGL 2 demo is lost to framebuffer completeness checks in gl.drawElements(), gl.clearBufferfv() and gl.blitFramebuffer().


(Core :: Canvas: WebGL, defect, P3)

53 Branch
Windows 10



Tracking Status
firefox51 --- wontfix
firefox53 --- fixed


(Reporter: jujjyl, Assigned: jgilbert)


(Whiteboard: gfx-noted)


(2 files)

Profiling a WebGL 2 demo from one of our partners on Windows, with a GTX 980 Ti and an 8-core/16-thread i7-5960X, with the prefs


set, the demo spends 41.5% of its total CPU time inside glCheckFramebufferStatus() validation calls in the WebGL backend. See the attached screenshot for an illustration.

The framebuffer completeness checks appear in runs with both webgl.force-enabled = false and true, and presumably take the same absolute time. Their proportional overhead is much smaller when ANGLE is enabled only because ANGLE adds CPU overhead of its own in that case.

The motivation for these prefs is to measure how much bypassing ANGLE improves performance on Windows: Jeff Gilbert has raised the possibility that in the future we could do away with some of ANGLE's validation, which makes this a sensible setup to benchmark. The demo can be compiled natively as a Windows application, or compiled with Emscripten to run in the browser. FPS, CPU and GPU utilization in the different cases:

Native Win64:
 - 120-160fps
 - 197% CPU utilization, i.e. ~two full CPU cores worth of work
 - 100% GPU utilization (TDP throttled)

FF Nightly with ANGLE enabled:
 - 11-20 fps
 - 92% CPU utilization, i.e. roughly one core fully utilized
 - 10% GPU utilization, not even enough to spin up the GPU fan

FF Nightly with ANGLE disabled:
 - 22-30 fps
 - 76% CPU utilization, not even one core fully utilized
 - 18% GPU utilization, a bit better

This suggests that when running in the browser, the system is neither fully CPU-bound nor GPU-bound: both sit idle at regular points within each application frame. That points to per-frame pipeline sync bubbles. The GPU is fast enough that it occasionally starves for work, but the CPU also has to stall in the middle of a frame to wait for the GPU.

Ideally, glCheckFramebufferStatus() would only be called at FBO creation time ("load" time), never at render time. What could we do to optimize this? Losing about half of the CPU performance to FBO validation is too much. Could we, e.g., assume that an FBO configuration that has been verified complete once stays complete?
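The caching this question asks about can be sketched as below. This is a minimal model under stated assumptions, not Gecko's WebGLFramebuffer: the class CachedFramebuffer, its methods and ExpensiveDriverCheck() (a placeholder for glCheckFramebufferStatus()) are hypothetical, and it assumes a single color attachment whose change is the only thing that can alter completeness.

```cpp
#include <cstdint>
#include <optional>

using GLenum = std::uint32_t;
constexpr GLenum kFramebufferComplete = 0x8CD5;           // GL_FRAMEBUFFER_COMPLETE
constexpr GLenum kFramebufferMissingAttachment = 0x8CD7;  // GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT

// Hypothetical framebuffer wrapper that caches its completeness status.
class CachedFramebuffer {
public:
    // Changing an attachment may change completeness, so the cached status is dropped.
    void SetColorAttachment(std::uint32_t texture) {
        if (texture != mColorTexture) {
            mColorTexture = texture;
            mCachedStatus.reset();
        }
    }

    // Returns the cached status if there is one; otherwise runs the expensive check once.
    GLenum CheckStatus() {
        if (!mCachedStatus) {
            ++mDriverChecks;
            mCachedStatus = ExpensiveDriverCheck();
        }
        return *mCachedStatus;
    }

    int DriverChecks() const { return mDriverChecks; }

private:
    // Placeholder for glCheckFramebufferStatus(): complete iff a texture is attached.
    GLenum ExpensiveDriverCheck() const {
        return mColorTexture != 0 ? kFramebufferComplete : kFramebufferMissingAttachment;
    }

    std::uint32_t mColorTexture = 0;
    std::optional<GLenum> mCachedStatus;
    int mDriverChecks = 0;
};

// Simulates 1000 draws that each check status, with the attachment never changing.
int DemoDriverChecks() {
    CachedFramebuffer fb;
    fb.SetColorAttachment(1);
    for (int draw = 0; draw < 1000; ++draw) {
        fb.CheckStatus();
    }
    return fb.DriverChecks();
}
```

DemoDriverChecks() returns 1: with a stable configuration the expensive check runs only once. Each attachment change drops the cache, so an app that re-attaches every frame still pays for a check every time.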

Contact me for a live testcase if needed; unfortunately it is not public.

For testing, I did a custom build of Firefox in which I commented out the gl->fCheckFramebufferStatus(LOCAL_GL_FRAMEBUFFER); check in WebGLFramebuffer::CheckFramebufferStatus() and faked it to always return LOCAL_GL_FRAMEBUFFER_COMPLETE.

This gives the following numbers (also with ANGLE disabled):
 - 38-45 fps (+50% to +72%)
 - 92% CPU utilization (+21% more saturated)
 - 26% GPU utilization (+44% more saturated)
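The relative figures are ratios against the ANGLE-disabled baseline above (22-30 fps, 76% CPU, 18% GPU), pairing the low end of each fps range with the low end and the high end with the high end:

```latex
\begin{aligned}
\text{fps, low end:}\quad & 38 / 22 \approx 1.727 \;\Rightarrow\; +72.7\% \\
\text{fps, high end:}\quad & 45 / 30 = 1.50 \;\Rightarrow\; +50\% \\
\text{CPU utilization:}\quad & 92 / 76 \approx 1.211 \;\Rightarrow\; +21\% \\
\text{GPU utilization:}\quad & 26 / 18 \approx 1.444 \;\Rightarrow\; +44\%
\end{aligned}
```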

Running stock FF Nightly on OS X on a Mac Pro (Late 2013), with a 3.5 GHz 6-core Intel Xeon E5 and an AMD FirePro D500, the results are:
 - 20-22 fps
 - 130% CPU utilization (1.3 cores)
 - 23% GPU utilization

In the Gecko Profiler, the FBO completeness checks on OS X are a much smaller overhead, around 5% of total execution time.

I'm confident the app is using the single-FBO antipattern.

I can add a pref warning for this, and we can check.
Assignee: nobody → jgilbert
Priority: -- → P3
Whiteboard: gfx-noted
This build logs invalidations of FB completeness:

There are two prefs:
pref("webgl.max-perf-warnings", -1);
pref("webgl.max-acceptable-fb-status-invals", 0);

It only triggers when the status is already cached as complete and something then invalidates that cache.
It logs to the console once the number of invalidations for a specific FB exceeds uint32_t(webgl.max-acceptable-fb-status-invals), and emits at most uint32_t(webgl.max-perf-warnings) warnings. (-1 => UINT32_MAX, i.e. 'practically forever')
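The gating described above can be sketched as a small counter class. PerfWarningGate and OnCompleteFbInvalidated() are hypothetical names, not the actual patch; keeping one warning budget per gate object is an assumption, and the message text is modeled on the warning quoted later in this thread. It also shows why -1 works as 'practically forever': converting an int32_t of -1 to uint32_t yields UINT32_MAX.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of the two-pref warning gate.
class PerfWarningGate {
public:
    // Both limits come from int32 prefs; static_cast of -1 to uint32_t gives UINT32_MAX.
    PerfWarningGate(std::int32_t maxPerfWarningsPref, std::int32_t maxAcceptableInvalsPref)
        : mMaxWarnings(static_cast<std::uint32_t>(maxPerfWarningsPref)),
          mMaxAcceptableInvals(static_cast<std::uint32_t>(maxAcceptableInvalsPref)) {}

    // Call when a framebuffer whose status was cached as complete gets invalidated.
    // fbInvalCount is that framebuffer's own counter. Returns true if a warning was logged.
    bool OnCompleteFbInvalidated(const char* funcName, std::uint32_t& fbInvalCount) {
        ++fbInvalCount;
        if (fbInvalCount <= mMaxAcceptableInvals) return false;  // not yet over the threshold
        if (mNumWarnings >= mMaxWarnings) return false;          // warning budget used up
        ++mNumWarnings;
        mLog.push_back(std::string("WebGL perf warning: ") + funcName +
                       ": FB was invalidated after being complete " +
                       std::to_string(fbInvalCount) + " times.");
        return true;
    }

    const std::vector<std::string>& Log() const { return mLog; }

private:
    std::uint32_t mMaxWarnings;
    std::uint32_t mMaxAcceptableInvals;
    std::uint32_t mNumWarnings = 0;
    std::vector<std::string> mLog;
};
```

With the values from the two prefs above (-1 and 0), every invalidation after the first complete status is logged; with webgl.max-perf-warnings at 0, nothing is.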

Running with this build (assuming it builds on Try, and not just locally!) should make it obvious if our cache is being invalidated.
Flags: needinfo?(jujjyl)
Thanks Jeff, this is excellent!

I'm seeing a running counter of "WebGL perf warning: framebufferTexture2D: FB was invalidated after being complete 25672 times.". We'll look into using multiple static FBOs instead and compare performance.
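A toy model of why the warning count climbs so fast and why multiple static FBOs should help. It assumes, as the pref description above implies, that every attachment change invalidates the cached status. ModelFramebuffer, SingleFboChecks() and StaticFboChecks() are hypothetical names, and statusChecks stands in for calls to glCheckFramebufferStatus().

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: one attachment slot; the cached status is cleared when it changes.
struct ModelFramebuffer {
    std::uint32_t colorTex = 0;
    bool statusCached = false;
    std::uint32_t statusChecks = 0;  // how many times the expensive check ran

    void Attach(std::uint32_t tex) {
        if (tex != colorTex) {
            colorTex = tex;
            statusCached = false;
        }
    }
    void Draw() {
        if (!statusCached) {
            ++statusChecks;
            statusCached = true;
        }
    }
};

// Single-FBO pattern: one framebuffer, re-attached to each render target on every pass.
std::uint32_t SingleFboChecks(std::uint32_t frames, std::uint32_t targets) {
    ModelFramebuffer fb;
    for (std::uint32_t f = 0; f < frames; ++f) {
        for (std::uint32_t t = 1; t <= targets; ++t) {
            fb.Attach(t);
            fb.Draw();
        }
    }
    return fb.statusChecks;
}

// Static-FBO pattern: one framebuffer per render target, attached once up front.
std::uint32_t StaticFboChecks(std::uint32_t frames, std::uint32_t targets) {
    std::vector<ModelFramebuffer> fbs(targets);
    for (std::uint32_t t = 0; t < targets; ++t) fbs[t].Attach(t + 1);
    for (std::uint32_t f = 0; f < frames; ++f) {
        for (auto& fb : fbs) fb.Draw();
    }
    std::uint32_t total = 0;
    for (const auto& fb : fbs) total += fb.statusChecks;
    return total;
}
```

With 3 render targets over 100 frames, the single-FBO pattern runs the check 300 times, while three static FBOs run it 3 times in total.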
Flags: needinfo?(jujjyl)
Comment on attachment 8826803 [details]
Bug 1329815 - GeneratePerfWarning and warn on completed-FB invalidation. -

::: dom/canvas/WebGLContext.h:2038
(Diff revision 1)
>  public:
>      // console logging helpers
>      void GenerateWarning(const char* fmt, ...);
>      void GenerateWarning(const char* fmt, va_list ap);
> +    void GeneratePerfWarning(const char* fmt, ...) const;

I don't think this needs to be `const`. Consequently, `mNumPerfWarnings` would not need to be `mutable`.

::: dom/canvas/WebGLTexture.h
(Diff revision 1)
> -        void Clear();
> +        void Clear(const char* funcName);
>          ~ImageInfo() {
> -            if (!IsDefined())
> +            MOZ_ASSERT(!mAttachPoints.size());
> -                Clear();

What's the reason for this `Clear()` to be removed?

::: gfx/thebes/gfxPrefs.h:619
(Diff revision 1)
>    DECL_GFX_PREF(Live, "webgl.prefer-16bpp",                    WebGLPrefer16bpp, bool, false);
>    DECL_GFX_PREF(Live, "webgl.restore-context-when-visible",    WebGLRestoreWhenVisible, bool, true);
>    DECL_GFX_PREF(Live, "webgl.allow-immediate-queries",         WebGLImmediateQueries, bool, false);
>    DECL_GFX_PREF(Live, "webgl.allow-fb-invalidation",           WebGLFBInvalidation, bool, false);
> +  DECL_GFX_PREF(Live, "webgl.max-perf-warnings",               WebGLMaxPerfWarnings, int32_t, 0);

shouldn't there be some sensible defaults?
Comment on attachment 8826803 [details]
Bug 1329815 - GeneratePerfWarning and warn on completed-FB invalidation. -

> I don't think this needs to be `const`. Consequently, `mNumPerfWarnings` would not need to be `mutable`.

Conceptually this should be const, since it doesn't mutate any notable state on the object. (There's no case where we'd want const to protect us from incrementing the number of posted warnings, so we exclude that counter from the members that count towards const.)
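The pattern argued for here, a const logging method with a mutable counter, looks roughly like this. The Context class, its constructor parameter and the stderr output are hypothetical simplifications, not WebGLContext's actual implementation.

```cpp
#include <cstdarg>
#include <cstdint>
#include <cstdio>

// Hypothetical, trimmed-down context illustrating the const/mutable design choice.
class Context {
public:
    explicit Context(std::uint32_t maxPerfWarnings) : mMaxPerfWarnings(maxPerfWarnings) {}

    // const: emitting a diagnostic does not change any observable rendering state.
    void GeneratePerfWarning(const char* fmt, ...) const {
        if (mNumPerfWarnings >= mMaxPerfWarnings) return;
        ++mNumPerfWarnings;  // allowed in a const method because the counter is mutable

        va_list ap;
        va_start(ap, fmt);
        std::fputs("WebGL perf warning: ", stderr);
        std::vfprintf(stderr, fmt, ap);
        std::fputc('\n', stderr);
        va_end(ap);
    }

    std::uint32_t NumPerfWarnings() const { return mNumPerfWarnings; }

private:
    std::uint32_t mMaxPerfWarnings;
    // Bookkeeping only, so it is excluded from the object's logical constness.
    mutable std::uint32_t mNumPerfWarnings = 0;
};
```

Because the method is const, it can be called through a const reference or on a const object, and the counter still advances.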

> What's the reason for this `Clear()` to be removed?

Clear is called during WebGLTexture teardown, so these should all be cleared already.
Also note that Clear was previously only called if the ImageInfo was /not/ defined, which was a bug anyway.
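The teardown reasoning can be illustrated with a simplified model. Only the Clear(const char* funcName) signature and the assertion that mAttachPoints is empty come from the diff; Texture::Delete(), the int attach points and the "deleteTexture" string are hypothetical stand-ins, and plain assert replaces MOZ_ASSERT.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: teardown clears each ImageInfo explicitly,
// so the destructor only asserts that nothing is still attached.
struct ImageInfo {
    std::vector<int> mAttachPoints;  // stand-in for framebuffer attach points

    void Clear(const char* funcName) {
        (void)funcName;  // the diff adds this parameter; unused in this sketch
        mAttachPoints.clear();
    }

    ~ImageInfo() {
        // Replaces the old `if (!IsDefined()) Clear();`, which only cleared undefined images.
        assert(mAttachPoints.empty());
    }
};

struct Texture {
    std::vector<ImageInfo> mImageInfos;

    // Teardown path: every ImageInfo is cleared before it is destroyed.
    void Delete() {
        for (auto& info : mImageInfos) info.Clear("deleteTexture");
    }
};
```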

> shouldn't there be some sensible defaults?

These are sensible defaults:
Most people never want to see these, and they don't technically represent an error.
They should be useful for guiding optimizations for devs. We should dev-doc-needed this.
Comment on attachment 8826803 [details]
Bug 1329815 - GeneratePerfWarning and warn on completed-FB invalidation. -
Looks like MozReview bugged out on that bug; it doesn't let me "r+" it.
Attachment #8826803 - Flags: review?(kvark) → review+
Pushed by
GeneratePerfWarning and warn on completed-FB invalidation. - r=kvark
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8826803 [details]
Bug 1329815 - GeneratePerfWarning and warn on completed-FB invalidation. -
Attachment #8826803 - Flags: review+