WebGL performance is behind its competitors in Emscripten-based '10kCubes' benchmark.

RESOLVED DUPLICATE of bug 1133570

Status


Core
Canvas: WebGL
4 years ago
11 months ago

People

(Reporter: Jukka Jylänki, Unassigned)

Tracking

Trunk
x86
Windows 7
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [games] webgl-perf)

(Reporter)

Description

4 years ago
1. Run https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_benchmark/10kCubes.html?/benchmark&/objects&1000
2. Wait for a few minutes for the benchmark to finish. Do not switch tabs or active windows or run other programs while the benchmark is running. When the benchmark finishes, the test score will be printed out to the text box on the page.
3. Repeat the process in competing browsers.

Note: You can change the workload by adjusting the number of objects in the URL. To run interactively (the up/down keys change the number of objects), leave out the GET parameters: https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_benchmark/10kCubes.html

Observed: Firefox falls behind in performance compared to Chrome. See
https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_results.png
https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_results.txt

The tests were run on a MacBook Air laptop with an Intel HD 3000 GPU, running Windows 7 under Boot Camp.

Expected: Firefox should give better performance than its competition.
(Reporter)

Updated

4 years ago
Whiteboard: [games]
Whiteboard: [games] → [games] webgl-perf
(Reporter)

Comment 1

4 years ago
Rerunning the results today with these STR:
1. Go to https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_benchmark/10kCubes_results_20130312.html
2. Select "interactive version"
3. Don't change anything, but observe the FPS (draws 100 cubes)

Firefox 34 Nightly: 138fps
Chrome 36 Stable: 181fps
Chrome 38 Canary: 179fps

For Firefox, I changed prefs to disable vsync:
layers.offmainthreadcomposition.frame-rate;1000
layout.frame_rate;0

For Chrome, I started up with --disable-gpu-vsync parameter.
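
To put the FPS readings above on a common scale, it can help to convert them to time per frame. This is a minimal sketch, not part of the benchmark; the function name `frame_time_ms` is made up for illustration, and the input numbers are the ones reported in this comment.

```python
# FPS readings reported in this comment (100 cubes, vsync disabled).
fps = {
    "Firefox 34 Nightly": 138,
    "Chrome 36 Stable": 181,
    "Chrome 38 Canary": 179,
}


def frame_time_ms(frames_per_second):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / frames_per_second


frame_times = {name: frame_time_ms(value) for name, value in fps.items()}
for name, ms in frame_times.items():
    print(f"{name}: {ms:.2f} ms/frame")

# Extra time Firefox spends per frame relative to Chrome Stable.
gap_ms = frame_times["Firefox 34 Nightly"] - frame_times["Chrome 36 Stable"]
print(f"Firefox is about {gap_ms:.2f} ms/frame slower than Chrome 36 Stable")
```

With these readings the gap works out to roughly 1.7 ms per frame.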
(In reply to Jukka Jylänki from comment #1)
> Rerunning the results today with these STR:
> 1. Go to
> https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_benchmark/
> 10kCubes_results_20130312.html
> 2. Select "interactive version"
> 3. Don't change anything, but observe the FPS (draws 100 cubes)
> 
> Firefox 34 Nightly: 138fps
> Chrome 36 Stable: 181fps
> Chrome 38 Canary: 179fps
> 
> For Firefox, I changed prefs to disable vsync:
> layers.offmainthreadcomposition.frame-rate;1000
> layout.frame_rate;0
> 
> For Chrome, I started up with --disable-gpu-vsync parameter.

Can you get a profile of Firefox?
(Reporter)

Comment 3

4 years ago
A profile is available for download here: https://dl.dropboxusercontent.com/u/40949268/emcc/bugs/10kCubes_profile The 'Share' button did not work, but it gave an 'Error 0'.

In particular, about 12% of the total time is spent in asm.js execution. The biggest spender is the glFinish call, which was also discussed in bug #1008571.

Comment 4

4 years ago
Is the source for that benchmark up somewhere? I'd like to do some tests on the emscripten side with it (gl proxying in particular).
(In reply to Jukka Jylänki from comment #3)
> A profile is available for download here:
> https://dl.dropboxusercontent.com/u/40949268/emcc/bugs/10kCubes_profile The
> 'Share' button did not work, but it gave an 'Error 0'.
> 
> In particular, about 12% of the total time is spent in asm.js execution. The
> biggest spender is the glFinish call, which was also discussed in bug
> #1008571.

So this looks like a performance problem similar to the one we had on B2G last time. Basically, we end up waiting in glFinish for the frame to end, and then we wait for vsync, so we end up doing lots of waiting. I expect the delay patch BenWa wrote for B2G will help here, but really we just need the compositor to do the waiting instead of the main thread.
(Reporter)

Comment 6

4 years ago
Oh sorry Alon, there is no online repository structure for that (the build system is quite a mess). I made a build with -O3 -DNDEBUG --emrun -s PRECISE_F32=2 -s AGGRESSIVE_VARIABLE_ELIMINATION=1 -profiling --proxy-to-worker -s TOTAL_MEMORY=67108861 and uploaded it to:

https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_proxy_to_worker/10kCubes.html

An offline-downloadable copy:

https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_proxy_to_worker/10kCubes.zip

Intermediate files from during the build:

https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_proxy_to_worker/10kCubes_proxy_to_worker_emscripten_temp.zip

Comment 7

4 years ago
Something seems wrong here: the rendering is not identical on Chrome and on Firefox. They both report the same frame rate, but the animation rate is faster in Firefox. Am I the only one seeing a visual difference there? It's not huge, but it is noticeable on my machine if you look.

Anyhow, I got the GL proxying code to work here. Results are odd. On Firefox, I get 166fps on the main thread, 142 with proxying, and *exactly* the same numbers in both cases on Chrome.

Perhaps there is just something wrong with my machine. You can test the proxying stuff by building with incoming, where I fixed some stuff. Then on normal loads it proxies, and if you add ?noProxy to the URL, it will not proxy.
Here are my results with the STR from comment 1, but with 10k cubes:
Chrome Canary 38: 69.00s, 67.22s, 66.67s.
Local trunk FF build from today: 67.85s, 64.40s, 65.09s.
Local build with patches from bug 1054808: 66.65s, 65.90s, 64.34s, 63.72s.

This is a Haswell MBP running Win8.1.

I guess I need an MB Air to test this, as I am seeing nearly strictly better performance in Firefox.
I'll check the new STR though.
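
As a rough way to read the 10k-cube timings above, the sketch below computes the mean and the run-to-run spread for each configuration. The numbers are copied from this comment; the helper name `summarize` is only illustrative, and with three or four runs per configuration this is not a rigorous statistical comparison.

```python
from statistics import mean

# Benchmark completion times in seconds (10k cubes), copied from this comment.
runs = {
    "Chrome Canary 38": [69.00, 67.22, 66.67],
    "Firefox trunk": [67.85, 64.40, 65.09],
    "Firefox + bug 1054808 patches": [66.65, 65.90, 64.34, 63.72],
}


def summarize(times):
    """Return (mean, spread), where spread is max - min of the runs."""
    return mean(times), max(times) - min(times)


summary = {name: summarize(times) for name, times in runs.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean {avg:.2f}s, spread {spread:.2f}s")
```

The spread within each configuration (a few seconds) is comparable to the differences between the means, so these runs alone do not separate the configurations clearly.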
For the new STR on my machine, I get about 425-445fps on Chrome, and about 440-450 on Firefox with the patches from bug 1054808. It's about 425-440 on Firefox trunk without the patches.

Really, I need to increase the workload, but the pageup/pagedown (fn+up/fn+down) key combo apparently isn't changing the workload for me.
Actually, just up/down works. I'll get new data.
3000 cubes:
Trunk: 87-89fps
With patches: 90-92fps
Chrome: 82-83fps (75fps when run immediately after 'with patches'?!)

I suspect this laptop is a poor machine for consistent benchmarking, possibly because of modern techniques that modulate clock speeds to stay inside the hardware's TDP.

At the same time, it's a fine real-world use case, but it would require me to leave the benchmark running for a couple of minutes to approach equilibrium. (Also, I cannot change its environment or move the machine while it's running, and cannot reproduce results if the environment or location has changed since a previous measurement.)
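
One way to approximate the "run until equilibrium" idea is to discard readings taken during a warm-up window and average only the rest. The sketch below is an assumption about how that could be done, not something the benchmark does; the function name `steady_state_fps`, the window length, and the sample values are all made up for illustration.

```python
from statistics import mean


def steady_state_fps(samples, sample_interval_s, warmup_s):
    """Average FPS after discarding samples taken during the warm-up window."""
    skip = int(warmup_s / sample_interval_s)
    steady = samples[skip:]
    if not steady:
        raise ValueError("warm-up window covers the whole run")
    return mean(steady)


# Synthetic readings (NOT measured data), one per 30 s: higher while the
# chip boosts its clocks, lower once it has settled.
synthetic_fps = [92, 90, 88, 84, 83, 83, 82, 83]

# Ignore the first 120 s, i.e. the first four readings.
print(steady_state_fps(synthetic_fps, sample_interval_s=30, warmup_s=120))
```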
(Reporter)

Comment 13

4 years ago
Good point about the frequency changes. I downloaded https://software.intel.com/en-us/articles/intel-power-gadget-20 and can observe that when I start up the page while the CPU is idle (temperature 54°C), the CPU frequency jumps from 0.8GHz to 2.40GHz and the GPU frequency jumps from 0.35GHz to 1.2GHz on my Macbook Air+Windows. After that, it takes only a few seconds for the CPU temperature to reach its 100°C limit, after which the GPU frequency drops back to its idle 0.35GHz and stays there. The CPU frequency does not throttle down, but stays stable at 2.40GHz at all times.
(Reporter)

Comment 14

4 years ago
Alon, I rebuilt with your fixes. Testing on Macbook Air+Windows:

https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_proxy_to_worker_running/10kCubes.html 60fps

https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_proxy_to_worker_running/10kCubes.html?noProxy 120fps

Getting 120fps with noProxy is very odd. I didn't yet look into it in detail, though I wonder if it could be that the noProxy path causes requestAnimationFrame to be registered twice?
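
To illustrate why a double registration would show up as exactly twice the frame rate, here is a toy simulation. It is not browser code and not the benchmark's code: `FrameLoop` and `FpsCounter` are hypothetical stand-ins, and it assumes the application derives FPS by counting render calls per second.

```python
class FrameLoop:
    """Toy stand-in for a browser's animation-frame scheduler (not a real API)."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        # Each registration results in one call per display refresh,
        # like a render loop that re-arms itself every frame.
        self.callbacks.append(callback)

    def run(self, refresh_hz, seconds):
        for _ in range(refresh_hz * seconds):
            for callback in self.callbacks:
                callback()


class FpsCounter:
    """Counts render calls; assumes the app derives FPS from this count."""

    def __init__(self):
        self.frames = 0

    def render(self):
        self.frames += 1

    def fps(self, seconds):
        return self.frames / seconds


def measured_fps(registrations, refresh_hz=60, seconds=2):
    loop, counter = FrameLoop(), FpsCounter()
    for _ in range(registrations):
        loop.register(counter.render)
    loop.run(refresh_hz, seconds)
    return counter.fps(seconds)


print(measured_fps(registrations=1))  # single registration
print(measured_fps(registrations=2))  # accidental double registration
```

On a simulated 60Hz display, the single registration reports 60 and the double registration reports 120, which matches the pattern seen above. It does not confirm that this is what the noProxy path actually does.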

Visually they look identical, and the animation speed is identical too. Alon, note that the animation in the proxy_to_worker builds in this comment and in comment 6 is different from the original builds in comments 0 and 3. The original builds were somewhat heavy on sin and cos per object, so I wanted to lighten the computations a bit to better stress the glDrawArrays() calls.
Hmm, a proxy build run with noProxy should be identical to a non-proxy build in all ways. When you build without proxying, do you see 120fps or 60fps? And, how is your application measuring fps?

Also, since it caps at 60, I'm not sure this is stressing much. It says the arrow keys can adjust the workload, but they don't seem to do anything; it always stays at the default 100. Is there another key I should press?
(Reporter)

Comment 16

11 months ago
It seems the same test case is registered twice on Bugzilla, so I'll mark this one as a duplicate.
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1133570