Closed
Bug 522193
Opened 15 years ago
Closed 13 years ago
WebGL calls are slow because of memory allocation churning
Categories
(Core :: Graphics: CanvasWebGL, defect)
Core
Graphics: CanvasWebGL
Tracking
()
RESOLVED
FIXED
mozilla10
People
(Reporter: 00003b, Unassigned)
References
()
Details
Attachments
(1 file, 2 obsolete files)
8.54 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3 Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.3a1pre) Gecko/20091013 Minefield/3.7a1pre In test app at http://3bb.cc/projects/webgl/test1/ ~60% of cpu time is spent making GL calls according to firebug profiler at higher model counts, and framerate is lower than expected. Reproducible: Always Steps to Reproduce: 1.view specified page in browser with WebGL enabled 2.increase number of instances by picking a number from the select box Actual Results: On a core i7 920, nvidia gtx285, ubuntu 9.04 x86-64, nvidia drivers 190.18.04 I get about 10-13 fps with 100 models (10 on select box) According to firebug, ~45% of time is spent in the closure that binds texture and buffer objects, and 20% in the closure that updates the bone uniforms (that also does a bunch of matrix multiplies, but splitting it still shows the single call to uniformMatrix4fv as taking more time than the matrix multiply loop) Expected Results: Hard to say exactly how fast it should be running, but rough tests outside the browser and reports from webkit users suggests there is room for at least 2x improvement. This particular test case could avoid most of those GL calls, since it actually only has 1 model, but that optimization was skipped to try to test the worst case performance where there were a number of different models.
Component: Canvas: 2D → Canvas: WebGL
Comment 1•15 years ago
|
||
Interesting... on N900 it works much faster than on T61p with Nvidia card...
None of WebGL is currently quickstubbed, so it's taking the slow boat between JS and native code. That'll be fixed soon.
Gonna call this WORKSFORME -- currently I get about 45fps with 100 models, though the models themselves are drawing weird/brokenly (not sure if there's something that needs to get updated here). Some more profiling would be interesting, though.
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Comment 4•13 years ago
|
||
With 100 models, I get 40 FPS in Firefox Nightly (10.0a1) and 60 FPS in Chrome 14.
Comment 5•13 years ago
|
||
A profile showed, as usual, a bit of time spent in XPC overhead for WebGL calls; with the patches on bug 638328 applied, I get perhaps a tiny performance improvement (41 FPS instead of 40) and the profile is now as follows: * 22% in methodjit-generated code * 12% in libnvidia-glcore.so * 10% in libGL.so * 1.6% in js_CloneFunctionObject, called by js::mjit::stubs::Lambda, called from jitted code. Should I file a JS engine bug? CC'ing Andreas. * lots of time in realloc and malloc, some of which with stacks like - realloc - 79.11% moz_xrealloc - 99.53% nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity - 38.10% WebGLZeroingObject::AddRefOwner(WebGLObjectBaseRefPtr*) - 92.16% WebGLObjectRefPtr<WebGLBuffer>::assign_with_AddRef - WebGLContext::VertexAttribPointer and similar stacks pointing to BindBuffer and BindTexture. I need to investigate this. * also a lot of time in mutex locking, some of which with stacks like - pthread_mutex_lock - 8.74% nsTArray_base<nsTArrayDefaultAllocator>::ShrinkCapacity - 85.70% WebGLObjectRefPtr<WebGLBuffer>::assign_with_AddRef - 90.11% WebGLContext::VertexAttribPointer so we really need to look into these WebGL functions. * 1.1% in castNativeFromWrapper and 0.6% in nsID::Equals. This is much less XPC overhead than without the patches in bug 638328.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: WORKSFORME → ---
Comment 6•13 years ago
|
||
So, the above-mentioned overhead goes away with this change replacing refptrs by plain pointers. This brings us to 50 FPS, up from 40 FPS, i.e. covers half the distance toward chrome's speed. The question is how to turn this finding into a patch that we could actually land?
It looks like the work is all in manipulating the list of refs in WebGLZeroingObject -- try stubbing just those functions and keeping everything else the same, and see how it behaves?
Comment 8•13 years ago
|
||
Indeed, this patch instead of the previous one gives the same performance increase.
Comment 9•13 years ago
|
||
The benchmark has WebGLBuffers oscillating between 0, 1, and 2 owners. The ShrinkCapacity calls from above profile come from call stacks like below. When the number of owners goes down to 0, ShrinkCapacity throws away the whole mHdr so the next time it grows, it has to reallocate stuff from scratch. #0 nsTArray_base<nsTArrayDefaultAllocator>::ShrinkCapacity (this=0x7fffd68f58d8, elemSize=8) at ../../dist/include/nsTArray-inl.h:143 #1 0x00007ffff583571e in RemoveElementsAt (count=1, start=<optimized out>, this=0x7fffd68f58d8) at ../../../dist/include/nsTArray.h:895 #2 RemoveElementAt (index=<optimized out>, this=0x7fffd68f58d8) at ../../../dist/include/nsTArray.h:900 #3 RemoveElement<mozilla::WebGLObjectBaseRefPtr*, nsDefaultComparator<mozilla::WebGLObjectBaseRefPtr*, mozilla::WebGLObjectBaseRefPtr*> > (item=<synthetic pointer>, this=0x7fffd68f58d8, comp=<optimized out>) at ../../../dist/include/nsTArray.h:919 #4 RemoveElement<mozilla::WebGLObjectBaseRefPtr*> (item=<synthetic pointer>, this=0x7fffd68f58d8) at ../../../dist/include/nsTArray.h:927 #5 mozilla::WebGLZeroingObject::RemoveRefOwner (this=0x7fffd68f58d8, owner=<optimized out>) at /home/bjacob/mozilla-central/content/canvas/src/WebGLContext.h:710 #6 0x00007ffff58493c5 in assign_assuming_AddRef (newPtr=0x7fffd75d1970, this=0x7fffd68c4cc0) at /home/bjacob/mozilla-central/content/canvas/src/WebGLContext.h:237 #7 mozilla::WebGLObjectRefPtr<mozilla::WebGLBuffer>::assign_with_AddRef (this=0x7fffd68c4cc0, rawPtr=0x7fffd75d1970) at /home/bjacob/mozilla-central/content/canvas/src/WebGLContext.h:230 #8 0x00007ffff5849630 in operator= (rhs=<optimized out>, this=<optimized out>) at /home/bjacob/mozilla-central/content/canvas/src/WebGLContext.h:192 #9 mozilla::WebGLContext::BindBuffer (this=0x7fffd68c4c00, target=34963, bobj=<optimized out>) at /home/bjacob/mozilla-central/content/canvas/src/WebGLContextGL.cpp:231
Comment 10•13 years ago
|
||
Yup, this gives me as high performance as I've seen with any of the other patches, so let's go for this.
Attachment #564071 -
Attachment is obsolete: true
Attachment #564083 -
Attachment is obsolete: true
Attachment #564098 -
Flags: review?
Comment 11•13 years ago
|
||
Wouldn't it be a better fix to just grow and not shrink the array so aggressively?
Comment 12•13 years ago
|
||
(In reply to Andreas Gal :gal from comment #11) > Wouldn't it be a better fix to just grow and not shrink the array so > aggressively? Looking at my logs, I'd say yes, but do we have any Array class readily able to do that? While that behavior would be nice to have, it's not worth making a new array class just to have that. The amount of memory that I preallocate here is not big compared to what these objects use anyway (that was the point of having this template parameter so we could adjust the preallocated size for each type of webgl object).
We could quite easily make nsTArray parameterizable with options to control the shrinking policy. But we probably don't need to here.
Comment 14•13 years ago
|
||
@ Andreas, while you're here, please have a look at the js_CloneFunctionObject showing up in the profile in comment 5.
Comment on attachment 564098 [details] [diff] [review] use a nsAutoTArray with some preallocated capacity Review of attachment 564098 [details] [diff] [review]: ----------------------------------------------------------------- Who were you looking for review from? I'm not really a peer, but the patch is trivial so who cares ...
Attachment #564098 -
Flags: review? → review+
Comment 16•13 years ago
|
||
Note that between this patch and the patches from bug 638328, we improved from 50 FPS to 60 FPS on WebGL Aquarium with 1000 fish!
Comment 17•13 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #15) > Who were you looking for review from? Oops, I thought I had entered :vlad, but apparently I failed. Thanks.
Comment 18•13 years ago
|
||
Sure, happy to look. What does the JS code look like around the call site? Can you paste here.
Comment 19•13 years ago
|
||
(In reply to Andreas Gal :gal from comment #18) > Sure, happy to look. What does the JS code look like around the call site? > Can you paste here. I broke in js_CloneFunctionObject and called DumpJSStack, and got alternatively 0 anonymous(cache = undefined, stat = [function], time = 0, animNumber = 0) ["http://3bb.cc/projects/webgl/test1/model.js":120] getKeyR = [function] getKeyP = [function] parent3568 = 5 r = [object Float32Array] p = [object Float32Array] anim = [object Object] i = 5 akl = 46 a = [object Object] this = [object Object] 1 draw(0) ["http://3bb.cc/projects/webgl/test1/foo.js":147] _js_idx3650 = undefined _js_arrvar3651 = undefined m = undefined animNodes = null xx = [object Float32Array] pos3649 = 0,-60,0 instance = [object Object] stat = [function] x = 0 count = 1 drawCount = 1 r = [object Float32Array] d2 = [object Float32Array] d = [object Float32Array] p2 = [object Float32Array] la = [object Float32Array] time = 1317604694.539 this = [object Window] *** and *** 0 anonymous(cache = undefined, stat = [function], time = 0, animNumber = 0) ["http://3bb.cc/projects/webgl/test1/model.js":107] getKeyR = [function] getKeyP = [function] parent3568 = 2 r = [object Float32Array] p = [object Float32Array] anim = [object Object] i = 6 akl = 46 a = [object Object] this = [object Object] 1 draw(0) ["http://3bb.cc/projects/webgl/test1/foo.js":147] _js_idx3650 = undefined _js_arrvar3651 = undefined m = undefined animNodes = null xx = [object Float32Array] pos3649 = 0,-60,0 instance = [object Object] stat = [function] x = 0 count = 1 drawCount = 1 r = [object Float32Array] d2 = [object Float32Array] d = [object Float32Array] p2 = [object Float32Array] la = [object Float32Array] time = 1317604694.539 this = [object Window] The code in http://3bb.cc/projects/webgl/test1/model.js, around lines 107 and 120, is (line markers added, search for ### ): var I = _mat4i(); var Temp = _mat4i(); function makeOldAnimator(obj, nodes, anims, lastAnim) { var pI = new Float32Array([0.0, 0.0, 0.0,0.0]); var rI = new Float32Array([0.0, 1.0, 0.0, 0.0, 0.0]); var pt = new Float32Array([0.0, 0.0, 0.0, 0.0]); var rt = new Float32Array([0.0, 1.0, 0.0, 0.0, 0.0]); var mx = Temp;//_mat4i;//new Float32Array(16); return function (animNumber, time, stat, cache) { var a = anims[animNumber]; if (a.length) time = time % a.length; else time = 0; var akl = a.keys.length; for (var i = 0; i < akl; i += 1) { var anim = (a.keys)[i]; var p = anim.position; var r = anim.rotation; var parent3568 = (obj.nodes)[anim.index].parent; var getKeyP = function (keys, time,dd) { // ### line 107 var k = keys[0]; for (var ii = 0; ii < keys.length; ii += 1) { var ki = keys[ii]; if (ki[0] < time) { k = ki; } else { var dt = (time - k[0]) / (ki[0] - k[0]); return lerppInto(k, ki, dt,dd); }; }; return k; }; var getKeyR = function (keys, time,d) { // ### line 120 var k = keys[0]; for (var ii = 0; ii < keys.length; ii += 1) { var ki = keys[ii]; if (ki[0] < time) { k = ki; } else { var dt = (time - k[0]) / (ki[0] - k[0]); return lerprInto(k, ki, dt,d); }; }; return k; }; animsT++; if (p) { if (p.length == 1) { animsP1++; p = p[0]; } else { p = getKeyP(p, time,pt); }; } else { animsPi++; p = pI; }; if (r) { if (r.length == 1) { animsR1++; r = r[0]; } else { r = getKeyR(r, time,rt); }; _translationxquatInto(p[1], p[2], p[3], r[1], r[2], r[3], r[4], mx); } else { animsRi++; //r = rI; mx[ 0] = 1.0; mx[ 1] = 0.0; mx[ 2] = 0.0; //mx[ 3] = 0.0; mx[ 4] = 0.0; mx[ 5] = 1.0; mx[ 6] = 0.0; //mx[ 7] = 0.0; mx[ 8] = 0.0; mx[ 9] = 0.0; mx[10] = 1.0; //mx[11] = 0.0; mx[12] = p[1]; mx[13] = p[2]; mx[14] = p[3]; //mx[15] = 1.0; }; if (parent3568 == 0) animsXi++; //var pm = parent3568 > 0 ? (obj.animNodes)[parent3568] : I; //_matx3(pm, mx, (obj.animNodes)[anim.index]); if (parent3568 > 0) _matx3((obj.animNodes)[parent3568], mx, (obj.animNodes)[anim.index]); else _mcopy(mx, (obj.animNodes)[anim.index]); }; }; };
Comment 20•13 years ago
|
||
On tryserver: https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=2790be438dc3
Comment 21•13 years ago
|
||
Yeah, the closure in the function escapes and it closes over different objects every time. Not much we can do there. How much time do we spend in the JS engine after your optimization?
Comment 22•13 years ago
|
||
The profile now looks like: 25.64% firefox perf-1813.map [.] 0x7f24ad30b585 14.72% firefox libnvidia-glcore.so.275.28 [.] 0x12e8bc8 11.17% firefox libGL.so.275.28 [.] 0x9bd77 1.61% firefox libxul.so [.] js_CloneFunctionObject 1.25% firefox libxul.so [.] castNativeFromWrapper 0.99% firefox libpthread-2.13.so [.] pthread_self 0.80% firefox libxul.so [.] nsID::Equals(nsID const&) const 0.79% firefox [kernel.kallsyms] [k] system_call_after_swapgs 0.68% firefox libxul.so [.] js::gc::FinalizeArenas 0.67% firefox [usb_storage] [.] 0x5e5a2108 0.65% firefox libpthread-2.13.so [.] pthread_mutex_lock 0.65% firefox libc-2.13.so [.] __memcpy_ssse3_back 0.56% firefox firefox [.] arena_malloc 0.51% firefox libxul.so [.] js_GCThingIsMarked 0.49% firefox libxul.so [.] js_GetClassPrototype 0.47% firefox libxul.so [.] JSObject::isWrapper 0.47% firefox libxul.so [.] XPCWrappedNative::GetWrappedNativeOfJSObject 0.47% firefox libxul.so [.] castNative 0.46% firefox libxul.so [.] mozilla::WebGLContext::ValidateBuffers The top one, perf-1813.map, is methodjit code (many of the call stacks have EnterMethodJIT) The two next ones are the OpenGL libraries.
Updated•13 years ago
|
Summary: WebGL: GL calls are slow → WebGL calls are slow because of memory allocation churning
Comment 24•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/704f37801611
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
You need to log in
before you can comment on or make changes to this bug.
Description
•