Open Bug 1684224 Opened 3 years ago Updated 2 years ago

Poor performance in webgl aquarium demo (even with EGL/DmaBuf)

Categories

(Core :: Graphics: CanvasWebGL, defect)

Firefox 86
x86_64
Linux
defect

Tracking


Tracking Status
firefox86 --- disabled

People

(Reporter: sk.griffinix, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: nightly-community, perf)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Android 10; Mobile; rv:86.0) Gecko/86.0 Firefox/86.0

Steps to reproduce:

  1. Enable WebRender and widget.dmabuf-webgl.enabled, and launch Firefox with MOZ_X11_EGL=1.

  2. Run the webgl aquarium benchmark: https://webglsamples.org/aquarium/aquarium.html

Actual results:

Firefox runs it at 60 fps up to 1000 fish, drops to 23 fps at 5000 fish, and runs at 5-6 fps at 30000 fish.

Chrome runs it at 60 fps at 10000 fish, 45-50 fps at 15000, and 22-23 fps at 30000 fish.

Expected results:

Performance should be comparable to Chrome.

Attached file about:support

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
Component: Graphics: WebRender → Canvas: WebGL
OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Which results do you get on KDE Wayland (MOZ_ENABLE_WAYLAND=1) and KDE Xwayland (MOZ_X11_EGL=1) for comparison?
(Do you get the same results with Gnome Wayland and Gnome Xwayland?)

(Martin Stránský [:stransky] from bug 1586696 comment 6)

> Bug 1608800 is also related; when it lands we can create dmabuf framebuffers/textures with modifiers. It should improve dmabuf performance for surfaces/textures used exclusively by the GPU, like the WebGL framebuffer.

(Martin Stránský [:stransky] from bug 1588736 comment 2 + bug 1588736 comment 3)

> Gnome bug: https://bugzilla.gnome.org/show_bug.cgi?id=785779
> This may also be related: https://phabricator.kde.org/T8067

(Martin Stránský [:stransky] from bug 1662409 comment 0)

> Dmabuf modifiers are not used with WebGL right now.

For the record: the aquarium demo is known to perform much better on Chrome (at high fish counts) on all platforms (also Win/Mac), and there are a bunch of issues for that already. I remember having read that it's about SpiderMonkey vs. V8, but on a short search I could only find e.g. bug 1663084.

Anyhow, all I wanted to say is that for this specific demo the result is expected - and it's most likely not something we can fix in the dmabuf code.

Profiles would help with demonstrating that! :)

There's a Windows one in bug 1662811 comment 1 with 10000 fish IIUC (https://share.firefox.dev/3gT2DPq)

It's easy enough for someone who's running Linux to make one here.

Hehe, true. Here are two with 10000 fish on current nightly:

Both the Wayland and X11/EGL profiles use dmabuf buffer sharing and show almost identical performance here.

Jan, would you know why some functions such as renderMono remain running in baseline-interpreter mode for multiple seconds in both profiles? (feel free to fork this to another bug blocking this one)

Flags: needinfo?(jdemooij)
Depends on: 1691504

~25% of time in webgl dispatch code, so webgl's not the long pole here.

(In reply to Nicolas B. Pierron [:nbp] from comment #9)

> Jan, would you know why some functions such as renderMono remain running in baseline-interpreter mode for multiple seconds in both profiles? (feel free to fork this to another bug blocking this one)

I looked into this but I think the profiler is buggy for JIT frames and as a result incorrectly attributes time to renderMono and some C++ functions. I filed bug 1691504 for that.

Flags: needinfo?(jdemooij)

Here is a fresh profile (10000 fish) with the fix from bug 1691504: https://share.firefox.dev/3poBBnv

For comparison a chromium profile: https://share.firefox.dev/2Zkmscq

With fixed profiling, there is a lot of time (17%) in:

// The demo's generator is a linear congruential generator:
// seed' = (134775813 * seed + 1) mod RANDOM_RANGE_, scaled to [0, 1).
tdl.math.pseudoRandom = function() {
  var math = tdl.math;
  return (math.randomSeed_ =
          (134775813 * math.randomSeed_ + 1) %
          math.RANDOM_RANGE_) / math.RANDOM_RANGE_;
};

So this could just be bug 1518857? Bug 1673840 improved the situation for power-of-two moduli that are representable as int32_t, but math.RANDOM_RANGE_ is Math.pow(2, 32), so it doesn't use the fast path. Changing LModPowTwoD to accept larger values may help here.
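As an illustration (a hedged sketch of the idea, not SpiderMonkey's actual lowering), the fast path in question rewrites a modulo by a power of two as a bitwise AND, which only works while the divisor fits in int32 range:

// Illustrative sketch: for a non-negative int32 x and a power-of-two
// divisor d that fits in int32, x % d can be lowered to a bitwise AND.
function modPow2(x, d) {
  return x & (d - 1); // valid for 0 <= x < 2^31 and power-of-two d
}
console.log(modPow2(1000, 64) === 1000 % 64); // true

// math.RANDOM_RANGE_ is Math.pow(2, 32), which does not fit in int32,
// so `% math.RANDOM_RANGE_` currently misses this path.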

Took a quick look at this. Relative to Ion:

  1. Warp emits a pre-barrier when storing to math.randomSeed_, because it doesn't have TI to indicate that the previous value isn't a GC thing
  2. Warp loads math.RANDOM_RANGE_ out of the slot each time instead of using a constant value, presumably because of the singleton optimizations.

However, the overall difference between Ion and Warp on this is minimal (<5%), so it's likely that neither of those optimizations would help us much here. (However, we might need something like the second optimization to enable the LModPowTwoD improvements anba mentions above.)

This function is small enough that Warp is willing to inline it, and in general the LIR looks pretty reasonable. (I haven't looked at the actual generated code.) Unfortunately we end up using double arithmetic, but I don't think that can be avoided: the product of the multiplication is sometimes greater than 2^53, where double arithmetic loses precision, so computing it with exact 64-bit integer math would give a different answer than the rounded double result JS semantics require.
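To make the precision constraint concrete, here is a small sketch (the seed value is arbitrary, chosen only for illustration): once the intermediate product exceeds 2^53, the double result is rounded, so exact 64-bit integer math diverges from what JS double semantics require.

// Illustrative only: why this LCG must stay in double arithmetic.
const seed = 4000000000; // an arbitrary 32-bit seed value
const asDouble = (134775813 * seed + 1) % 2 ** 32; // JS (double) semantics
const asExact = Number((134775813n * BigInt(seed) + 1n) % (2n ** 32n)); // exact integer math
console.log(asDouble === asExact); // false: the ~2^59 product is rounded as
                                   // a double, so the "+ 1" is lost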

Presumably Chrome has the same ISA constraints as us w.r.t. precision, so are we actually slower and they're still correct, and if so, what are they doing differently here?

Is it <5% faster overall? If it's 5% faster overall, and this is 17% of the overall time in the slow version, is this code then 5/17 ≈ 30% faster in Ion? (Is that the faster one?)
It's possible the answer here will be a bunch of 3-5% improvements causing a big win in aggregate.

(In reply to Jeff Gilbert [:jgilbert] from comment #16)

> Presumably Chrome has the same ISA constraints as us w.r.t. precision, so are we actually slower and they're still correct, and if so, what are they doing differently here?

They're using x87 instructions. See bug 1518857, comment #0 for a more detailed explanation.

(In reply to Jeff Gilbert [:jgilbert] from comment #17)

> Is it <5% faster overall? If it's 5% faster overall, and this is 17% of the overall time in the slow version, is this code then 5/17 ≈ 30% faster in Ion? (Is that the faster one?)
> It's possible the answer here will be a bunch of 3-5% improvements causing a big win in aggregate.

Taking this particular function, wrapping it in a loop, and comparing the performance of Ion (our old backend) to Warp (our new backend), we are <5% slower. (Warp deliberately traded off aggressive optimization of hot number-crunching loops for reduced overhead, which is a good deal overall but sometimes causes regressions on code like this.)

Digging in slightly deeper, the ~5% gap only occurs with inlining. If we turn inlining off, the difference between Ion and Warp is actually <0.5%. Warp's issue post-inlining is that we generate guards to ensure that tdl.math.pseudoRandom is still the same function we inlined, and LICM doesn't hoist those guards out of the loop because our alias analysis isn't precise enough to realize that writing to tdl.math.randomSeed_ won't change tdl.math.pseudoRandom. The cost of those guards is amortized over the body of the loop, so for more realistic code that actually uses the random value, the overhead of the extra guards would be smaller.
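For reference, a minimal sketch of the kind of microbenchmark described above (the timing harness is mine; only the pseudoRandom body comes from the demo):

// Wrap the demo's pseudoRandom in a hot loop so the JIT inlines it and
// the per-iteration identity guards discussed above show up in the time.
const tdl = { math: { randomSeed_: 0, RANDOM_RANGE_: Math.pow(2, 32) } };
tdl.math.pseudoRandom = function() {
  var math = tdl.math;
  return (math.randomSeed_ =
          (134775813 * math.randomSeed_ + 1) %
          math.RANDOM_RANGE_) / math.RANDOM_RANGE_;
};

let sum = 0;
const t0 = performance.now();
for (let i = 0; i < 1e7; i++) sum += tdl.math.pseudoRandom();
console.log(sum.toFixed(3), (performance.now() - t0).toFixed(1) + " ms");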

Leo, can you please run Firefox with dmabuf logging:

MOZ_LOG="Dmabuf:5" MOZ_X11_EGL=1 firefox

and attach the log here? I suspect Bug 1696869 may be related and we fall back to shm here.

Flags: needinfo?(sk.griffinix)

I am currently not in possession of a Linux device. I will try to post it within a week.

Hi Martin, I can provide the log you requested, but for an Arch Linux + Nvidia system.
Before posting it and possibly causing confusion (since you mention something related to an AMD Radeon driver configuration), I would like to ask whether you would find it useful, or whether it is unrelated and therefore useless for this bug.

(In reply to Martin Stránský [:stransky] from comment #20)
> Leo, can you please run Firefox with dmabuf logging:
> 
> MOZ_LOG="Dmabuf:5" MOZ_X11_EGL=1 firefox
> 
> and attach the log here? I suspect Bug 1696869 may be related and we fall back to shm here.

Some about:config entries that were active while the log was made. Do tell if I need to change any and paste another log. I also ran the webgl sample while logging was in progress.

media.ffmpeg.dmabuf-textures.disabled false
media.ffmpeg.dmabuf-textures.enabled true
widget.dmabuf-textures.enabled false
widget.dmabuf-webgl.enabled true

Flags: needinfo?(sk.griffinix)

From the log, dmabuf looks to be working correctly. Do you see any difference when you disable the dmabuf framebuffer, i.e. set widget.dmabuf-webgl.enabled to false and restart the browser?
Thanks.

Flags: needinfo?(sk.griffinix)

(In reply to Martin Stránský [:stransky] from comment #25)

> From the log, dmabuf looks to be working correctly. Do you see any difference when you disable the dmabuf framebuffer, i.e. set widget.dmabuf-webgl.enabled to false and restart the browser?
> Thanks.

Setting widget.dmabuf-webgl.enabled to false actually reduces the frame rate by about 16-20%.

Flags: needinfo?(sk.griffinix)

One thing I would like to point out, particularly with regard to the webgl aquarium benchmark, is that frame rates drop suddenly in Firefox as the number of fish increases. It is something of the sort:

1000 fish: 60 fps
5000 fish: 24 fps
10000 fish: 13 fps
15000 fish: 9 fps
20000 fish: 7 fps
25000 fish: 6 fps
30000 fish: 5 fps

With Chrome, on the other hand, it is as follows:
1000 fish: 60 fps
5000 fish: 60 fps
10000 fish: 58 fps
15000 fish: 45 fps
20000 fish: 35 fps
25000 fish: 29 fps
30000 fish: 24 fps

In Chrome, the decrease in fps is gradual as the number of elements increases, while Firefox for some reason takes a massive, sudden dip. I am nowhere close to being trained in computers, but the only time I have seen such dips in performance was when enough memory was not available.

(In reply to Leo_sk from comment #26)

> Setting widget.dmabuf-webgl.enabled to false actually reduces the frame rate by about 16-20%.

In that case the dmabuf backend is working correctly and it must be something different.
How does the performance look when you run Firefox without MOZ_X11_EGL=1 set, i.e. with the GLX backend?

Flags: needinfo?(sk.griffinix)

Sorry for the delay. In the latest nightly (89.0a1 (2021-03-24) (64-bit)), about:support shows X11_EGL as 'blocklisted by env: Blocklisted by gfxInfo'. It shows the same whether MOZ_X11_EGL is 0 or 1. Does that mean it is the GLX backend in both cases?

Flags: needinfo?(sk.griffinix)
See Also: → 1662811
Summary: webgl performance poor in firefox even with DMAbuf backend in EGL/x11 → webgl performance poor
Depends on: 1518857

I just came across an interesting blog post about the Zink driver, which hits issues with exactly this demo as well: https://www.supergoodcode.com/underwater/

Quote:

it’s one of the only test cases for GL_EXT_multisampled_render_to_texture I’m aware of, at least when running Chrome in EGL mode.

This might help explain why this demo is so notoriously slow on Firefox compared to Chrome.

See Also: → slow-linux-webgl
Summary: webgl performance poor → Poor performance in webgl aquarium demo (even with EGL/DmaBuf)
Severity: -- → S3

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
