Open Bug 1133570 Opened 10 years ago Updated 2 years ago

WebGL rendering on 10kCubes sample is 20x slower compared to native.

Status:

NEW

People

(Reporter: jujjyl, Unassigned)

References

Details

(Whiteboard: gfx-noted)

Attachments

(3 files)

10kCubes_native_gl3_profile.png 10 years ago Jukka Jylänki 68.79 KB, image/png		Details
10kCubes_Firefox_profile.png 10 years ago Jukka Jylänki 101.51 KB, image/png		Details
vtune_profile_firefox.png 10 years ago Jukka Jylänki 142.38 KB, image/png		Details

Jukka Jylänki

Reporter

Description

10 years ago

Attached image 10kCubes_native_gl3_profile.png — Details

I just got a new 8-core Intel Core i7 5960X + nVidia GeForce 980 GTX setup, and gave it a go with my "10kCubes" large batch count rendering stress test page. Here are some observations that can hopefully lead to optimizations for WebGL:

Native Windows GL3 executable:
Running with 100000 cubes, I'm getting 42fps, with ~24msecs/frame. nVidia NSight shows that the execution is fully CPU bound, the GPU being idle about 90% of the time. A native profile shows that about 70% of execution is inside the nVidia GL driver. The remaining 30% is in animating and submitting draws.

10kCubes in current Firefox Nightly, with e10s disabled and try-d3d11 enabled: https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_vsync/10kCubes.html (tap B once to switch the rendering back to rAF instead of setTimeout):
Running with 100000 cubes, I'm getting 2fps, with ~510msecs/frame. nVidia NSight shows that the execution is also fully CPU bound, the GPU being idle ~100% of the time. A native profile with AMD CodeXL gives very interesting info about the bottlenecks. I've attached a screenshot that shows the top samples.

These give interesting numbers:
- Only about 20% of samples are inside the nVidia D3D driver (compared to 70% in the native execution).
- libGLESv2.dll dominates by taking up 45.62% of total samples.
- In the samples inside libGLESv2.dll, the entry point (deep samples) gl::Context::drawArrays() dominates with 10.39% of total time being spent by that tree,
  - of which 29.51% is taken up by the function gl::Context::applyTextures(), even though the WebGL application does not use textures at all,
  - and 12.86% is taken up by gl::Context::applyShaders(), even though the WebGL code never changes shaders during the hot rendering loop (it activates the shader once at the beginning of the frame, and never changes it).
- gl::ProgramBinary::sortAttributesByLayout(), std::string creation and destruction, and framebuffer completeness checking also come up high, which looks odd.

How do these call sites look to you guys? Is anything unexpected showing up there?
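For readers unfamiliar with why per-draw applyShaders()/applyTextures() work would be surprising here: a common technique is redundant state filtering, where a bind is forwarded to the backend only when the requested state differs from the cached one. The sketch below is a minimal illustration of that general idea, not ANGLE's or Firefox's actual code; `CachedGLState`, `use_program` and `draw` are hypothetical names made up for this example.

```python
class CachedGLState:
    """Hypothetical wrapper: forwards a program bind only when it changes."""

    def __init__(self, backend_bind):
        self._backend_bind = backend_bind  # callable doing the expensive bind
        self._current_program = None
        self.binds_issued = 0

    def use_program(self, program):
        if program == self._current_program:
            return  # redundant request: this program is already active
        self._backend_bind(program)
        self._current_program = program
        self.binds_issued += 1

    def draw(self, program, draw_call):
        # State is (re)applied lazily before each draw, then the draw is issued.
        self.use_program(program)
        draw_call()


# Simulate a frame with 100000 draws that all use the same shader.
backend_calls = []
draws = []
state = CachedGLState(backend_calls.append)
for i in range(100000):
    state.draw("cube_shader", lambda i=i: draws.append(i))

print(state.binds_issued, len(draws))
```

With a cache like this, a frame that never switches shaders pays for one backend bind regardless of the draw count, which is the behavior the profile above seems to be missing.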

Jukka Jylänki

Reporter

Comment 1

10 years ago

Flipping the pref webgl.angle.try-d3d11 to false improves performance from about 510msecs/frame to 450msecs/frame, so it is slightly better, but looking at the profile, the call sites with the most samples seem to be the same.
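As a quick sanity check on the numbers reported so far, the following sketch converts the frame times into frames per second and slowdown factors relative to the native run. Only the three frame times come from the comments above; the variable names are illustrative.

```python
def fps(ms_per_frame):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


native_ms = 24.0          # native GL3 build, 100000 cubes
try_d3d11_on_ms = 510.0   # Firefox Nightly, webgl.angle.try-d3d11 = true
try_d3d11_off_ms = 450.0  # Firefox Nightly, webgl.angle.try-d3d11 = false

slowdown_on = try_d3d11_on_ms / native_ms
slowdown_off = try_d3d11_off_ms / native_ms
improvement = (try_d3d11_on_ms - try_d3d11_off_ms) / try_d3d11_on_ms

print(f"native: {fps(native_ms):.1f} fps")
print(f"try-d3d11 on:  {fps(try_d3d11_on_ms):.1f} fps, {slowdown_on:.2f}x slower")
print(f"try-d3d11 off: {fps(try_d3d11_off_ms):.1f} fps, {slowdown_off:.2f}x slower")
print(f"frame time reduction from flipping the pref: {improvement:.1%}")
```

So flipping the pref reduces the frame time by roughly 12%, while the gap to native stays at about 19x to 21x.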

Jukka Jylänki

Reporter

Comment 2

10 years ago

Attached image 10kCubes_Firefox_profile.png — Details

Oops, the attachment was wrong and referred to the native version. Here is the Firefox version.

u480271

Comment 3

10 years ago

Is there a magic shortcut key to get to 100K cubes?

u480271

Comment 4

10 years ago

Also is the C++ source code to 10k cubes available?

u480271

Comment 5

10 years ago

After pressing 'up' a lot I reached 100K cubes and I see 120ms/f in an optimised build of demo-hacks branch. This is using DX-OGL interop on nVidia GTX660.

Flags: needinfo?(jujjyl)

Jukka Jylänki

Reporter

Comment 6

10 years ago

Sorry for the delay. The source code for the demo is unfortunately not available, but native builds can be found here, if they are of any use:
- Windows: https://dl.dropboxusercontent.com/u/40949268/code/10kCubes/10kCubes_2015_02_23_Win.zip
- OSX: https://dl.dropboxusercontent.com/u/40949268/code/10kCubes/10kCubes_2015_02_23_OSX.tar.gz

For native runs, passing command line parameters "/objects 100000" gives a startup object count of 100000 objects. For the html build, the command line parameters can be passed in the URL as GET parameters, i.e. "?/objects&100000" will give 100k objects, like this: https://dl.dropboxusercontent.com/u/40949268/emcc/10kCubes_vsync/10kCubes.html?/objects&100000
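To make the URL convention concrete, here is a minimal sketch of how a query string such as "?/objects&100000" could map onto the same argument list as the native "/objects 100000" command line. It assumes the page simply splits the query on "&"; the demo's real parsing code is not available, and `query_to_argv` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def query_to_argv(url):
    """Split a URL's query string on '&' into command-line-style arguments."""
    query = urlsplit(url).query
    # Drop empty pieces so a URL without a query yields an empty list.
    return [part for part in query.split("&") if part]


url = ("https://dl.dropboxusercontent.com/u/40949268/emcc/"
       "10kCubes_vsync/10kCubes.html?/objects&100000")
args = query_to_argv(url)
print(args)
```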

Flags: needinfo?(jujjyl)

Jukka Jylänki

Reporter

Comment 7

10 years ago

I did a profile with VTune, which shows similar data to AMD CodeAnalyst, see the attachment below.

Jukka Jylänki

Reporter

Comment 8

10 years ago

Attached image vtune_profile_firefox.png — Details

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 9

10 years ago

This continues to make me want to figure out if we can make the DX/GL interop rock solid for at least some recent set of users and use native GL where possible... we should be able to optimize ANGLE as well though.

Is your source in github somewhere? I'd be curious to see the layout of the vertex arrays going into the draw call. D3D can only efficiently do interleaved vertex arrays (one struct per vertex basically), and I think that's what that prepareVertexData stuff is doing (interleaving).
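To illustrate what "interleaved" means here, the sketch below packs separate per-attribute arrays (positions and colors) into a single buffer with one struct per vertex. The attribute set and the 16-byte layout (3 floats plus 4 unsigned bytes) are assumptions chosen for the example; the demo's actual vertex format and ANGLE's prepareVertexData internals are not shown in this bug.

```python
import struct

# Assumed layout: little-endian, 3 x float32 position + 4 x uint8 color.
VERTEX_FORMAT = "<3f4B"
VERTEX_STRIDE = struct.calcsize(VERTEX_FORMAT)  # bytes per interleaved vertex


def interleave(positions, colors):
    """Pack separate attribute arrays into one struct-per-vertex buffer."""
    if len(positions) != len(colors):
        raise ValueError("attribute arrays must have the same vertex count")
    buf = bytearray()
    for (x, y, z), (r, g, b, a) in zip(positions, colors):
        buf += struct.pack(VERTEX_FORMAT, x, y, z, r, g, b, a)
    return bytes(buf)


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
colors = [(255, 0, 0, 255), (0, 255, 0, 255)]
vertex_buffer = interleave(positions, colors)

# Read back the second vertex by jumping one stride into the buffer.
second = struct.unpack_from(VERTEX_FORMAT, vertex_buffer, VERTEX_STRIDE)
print(VERTEX_STRIDE, len(vertex_buffer), second)
```

If an application already uploads its data in this form, a translation layer has no interleaving to do at draw time; if it uploads one buffer per attribute, that repacking may end up on the hot path.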

u480271

Comment 10

10 years ago

Testing on my machine, I see:

10kCubes_Win: frametime: ~122-124ms, rendertime: ~41-42ms, swapbuffers: ~79-81ms
10kCubes_html: frametime: ~126-127ms, rendertime: ~87-90ms, swapbuffers: 0ms

This is with the webgl 2 demo-hacks build using DX/GL interop.

Sotaro Ikeda [:sotaro]

Updated

10 years ago

Whiteboard: gfx-noted

Milan Sreckovic [:milan] (needinfo for best results)

Updated

7 years ago

Priority: -- → P3

BMO Automation

Updated

2 years ago

Severity: normal → S3

Categories

(Core :: Graphics: CanvasWebGL, defect, P3)

Security

(public)
