Optimization in DrawElements: cache the max-in-sub-array computation

Status: RESOLVED FIXED

Product: Core
Component: Canvas: WebGL
Importance: enhancement
Reported: 8 years ago
Last modified: 6 years ago

People: (Reporter: bjacob, Assigned: bjacob)

Tracking: Trunk
Firefox Tracking Flags: (blocking2.0 -, status2.0 wanted)

Attachments: (2 attachments, 1 obsolete attachment)
As discussed in bug 567565, we should try optimizing this computation.
Benoit's comments from that bug are carried over here for discussion.

> Now discussing caching. I see what you mean but there are a couple of issues to
> solve. One is, what to do in the case where the user uses a lot of
> (count,offset) pairs? One simple solution is that when the cache has more than
> N entries, we clear it.
>
> A second question is whether using a hash table is really the right data
> structure? In a lot of cases the hash table will have only 1 entry, and in that
> case, a hash table would be less efficient than e.g. a plain array of key-value
> pairs.
>
> Finally, maybe it's just more efficient to compute, at the time of the
> BufferData call, an auxiliary helper array that lets us later compute
> the max index for a given (count,offset) pair very quickly? For example, using
> binary partitioning we can make the max computation in a sub-array have only
> log complexity (or something like that... perhaps log^2).
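The binary-partitioning idea sketched above can be realized with a sparse table: at BufferData time, precompute maxima over power-of-two-length blocks; any (count,offset) range is then covered by two overlapping blocks, so the range-max query is O(1) after O(n log n) precompute. A minimal sketch, not the code that landed here (class and method names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sparse table over an index buffer: mTable[k][i] holds the max of the
// 2^k elements starting at position i. Built once per BufferData call.
class MaxIndexTable {
public:
    explicit MaxIndexTable(const std::vector<uint16_t>& indices) {
        size_t n = indices.size();
        mTable.push_back(indices);
        for (size_t k = 1; (size_t(1) << k) <= n; ++k) {
            size_t half = size_t(1) << (k - 1);
            std::vector<uint16_t> row(n - (size_t(1) << k) + 1);
            for (size_t i = 0; i < row.size(); ++i)
                row[i] = std::max(mTable[k - 1][i], mTable[k - 1][i + half]);
            mTable.push_back(std::move(row));
        }
    }

    // Max over indices[offset, offset + count), count >= 1. Two
    // overlapping power-of-two blocks cover the range, so this is O(1).
    uint16_t MaxInSubArray(size_t offset, size_t count) const {
        size_t k = 0;
        while ((size_t(1) << (k + 1)) <= count)
            ++k;
        return std::max(mTable[k][offset],
                        mTable[k][offset + count - (size_t(1) << k)]);
    }

private:
    std::vector<std::vector<uint16_t>> mTable;
};
```

This trades O(n log n) memory and build time for constant-time queries, which only pays off if many distinct (count,offset) pairs are queried per buffer upload.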

I like the partitioning idea, and it would be good to experiment with these data structures.  One additional idea: we should compute the minimum valid index of all currently active vertex attrib arrays and pass that in.  That way we can often avoid computing the full result: for example, if we know that the maximum over the entire range is 200, and we have 300 elements available in the vertex arrays, then any subrange of that will always be valid.  So there's lots of room to be smart here, I think.
Here's a profile showing how bad this can get using our ff4 final audio demo (not online yet):

http://scotland.proximity.on.ca/dxr/tmp/draw-elements.mshark

We consistently see 2-10% of the profile spent in DrawElements.
Blocks: 625871
(Assignee)

Comment 3

7 years ago
Created attachment 504006 [details] [diff] [review]
optimize drawElements  (WIP)

Draft patch, untested, not yet asking for review.
Assignee: nobody → bjacob
Status: NEW → ASSIGNED
(Assignee)

Comment 4

7 years ago
Created attachment 504065 [details] [diff] [review]
optimize drawElements

Patch ready for review. It works by first computing the max index for the whole index array and caching the result; the actual sub-index-array maximum is computed only if the cached value is too high for the current VBO state.

Incidentally, WebKit seems to be doing the same optimization.

We go a bit further, though: the ValidateBuffers function is modified to return the max allowed count of vertices that may be read from the VBOs, rather than doing the check itself. So we only do the loop over VBOs once per drawElements call, even if we have to do the max-index-in-sub-array computation.
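The overall shape of this strategy can be sketched as follows. This is an illustrative reconstruction with hypothetical names, not the actual WebGLBuffer code from the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the caching strategy described above. The whole-array max
// is computed lazily once, reused across draw calls, and invalidated
// whenever the buffer's contents change.
class IndexBuffer {
public:
    explicit IndexBuffer(std::vector<uint16_t> data) : mData(std::move(data)) {}

    void BufferData(std::vector<uint16_t> data) {
        mData = std::move(data);
        mHasCachedMax = false;  // analogous to InvalidateCachedMaxElements()
    }

    // Returns true if drawElements over [offset, offset + count) only
    // touches indices <= maxAllowedIndex (as returned by buffer validation).
    bool ValidateDraw(size_t count, size_t offset, uint16_t maxAllowedIndex) {
        if (!mHasCachedMax) {
            mCachedMax = MaxInSubArray(mData.size(), 0);
            mHasCachedMax = true;
        }
        // Fast path: if even the whole array's max index is in range,
        // any sub-range is valid -- no per-draw scan needed.
        if (mCachedMax <= maxAllowedIndex)
            return true;
        // Slow path: scan only the requested sub-range.
        return MaxInSubArray(count, offset) <= maxAllowedIndex;
    }

private:
    uint16_t MaxInSubArray(size_t count, size_t offset) const {
        uint16_t max = 0;
        for (size_t i = offset; i < offset + count; ++i)
            max = std::max<uint16_t>(max, mData[i]);
        return max;
    }

    std::vector<uint16_t> mData;
    bool mHasCachedMax = false;
    uint16_t mCachedMax = 0;
};
```

In the common case (all indices in range), the per-draw cost drops to a single cached comparison; the linear scan only happens when the cheap check fails.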
Attachment #504065 - Flags: review?(vladimir)
(Assignee)

Updated

7 years ago
Attachment #504006 - Attachment is obsolete: true
I built this patch and ran our demo.  I am going to test a few more sections, but one of the ones I know to be particularly hard hit by DrawElements now doesn't even show it in profiles.  Great stuff!
Yes, I've confirmed that one of our heaviest scenes for geometry drops 16% off its shark profile (all in WebGLContext::DrawElements) with this patch applied.  r+!
Any thoughts about getting this in for 4?  It helps so, so much.
Wouldn't block, would take, need tests, send help.
blocking2.0: --- → -
status2.0: --- → wanted
(Assignee)

Comment 9

7 years ago
Mike: it's true that the WebGL conformance test suite in its current state could easily fail to find a bug here, since this is about caching the results of previous computations. However, David has been happily using the patch on very complex test cases.

Vladimir is probably busy this week with the Khronos F2F in San Jose.
Yes, I have no doubt that it works on David's stuff. :-)

Do we have tests that ensure we don't use the cache when we shouldn't?  The WebGL test suite shouldn't be the only tests we have, probably.
(In reply to comment #10)
> Yes, I have no doubt that it works on David's stuff. :-)
> 
> Do we have tests that ensure we don't use the cache when we shouldn't?  The
> WebGL test suite shouldn't be the only tests we have, probably.

Well, I'd argue that ideally, the WebGL test suite should be the only test we have, but we should expand it as needed. There's a time constraint here, with both Vlad and me spread over many things to take care of.

To be clear, I have a good level of trust in the code I wrote; I just can't mathematically rule out the possibility of a bug. But given that it works on complex demos (I tested other demos too) and on the WebGL test suite, I wouldn't block on writing a new test.
The judges would also accept "a follow-up with a detailed description of what the test would cover, marked .x".
Comment on attachment 504065 [details] [diff] [review]
optimize drawElements

Looks fine, just some style corrections...

(Tests for this are already part of the webgl test suite)

>+    void InvalidateCachedMaxElements() {
>+      mHasCachedMaxUbyteElement = PR_FALSE;
>+      mHasCachedMaxUshortElement = PR_FALSE;
>+    }
>+
>+    PRInt32 FindMaxUbyteElement() {
>+      if (mHasCachedMaxUbyteElement) {
>+        return mCachedMaxUbyteElement;
>+      } else {

No |else| clause after return -- just:

      if (mHasCachedMaxUbyteElement) {
        return mCachedMaxUbyteElement;
      }

      mHasCachedMaxUbyteElement = PR_TRUE;
      mCachedMaxUbyteElement = FindMaxElementInSubArray<GLubyte>(mByteLength, 0);
      return mCachedMaxUbyteElement;
   
>+        mHasCachedMaxUbyteElement = PR_TRUE;
>+        mCachedMaxUbyteElement = FindMaxElementInSubArray<GLubyte>(mByteLength, 0);
>+        return mCachedMaxUbyteElement;
>+      }
>+    }
>+
>+    PRInt32 FindMaxUshortElement() {
>+      if (mHasCachedMaxUshortElement) {
>+        return mCachedMaxUshortElement;
>+      } else {

Same thing

>+        mHasCachedMaxUshortElement = PR_TRUE;
>+        mCachedMaxUshortElement = FindMaxElementInSubArray<GLshort>(mByteLength>>1, 0);
>+        return mCachedMaxUshortElement;
>+      }
>+    }
>+
Attachment #504065 - Flags: review?(vladimir) → review+
(In reply to comment #13)
> Comment on attachment 504065 [details] [diff] [review]
> optimize drawElements
> 
> Looks fine, just some style corrections...
> 
> (Tests for this are already part of the webgl test suite)

Well, if you're talking about draw-elements-out-of-bounds.html, I don't think it would catch caching bugs here... but I shouldn't mention that!

Applied your comments, ready to land.
Fixed a little bug caught by the tests: was using LogMessage instead of ErrorInvalidOperation to generate warnings.
Attachment #504065 - Flags: feedback+
Attachment #504065 - Flags: feedback+ → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/d0a4c07f4a40

Let's mark this as fixed, as the originally envisioned optimization is probably not that useful anymore now that we have this.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Created attachment 507245 [details] [diff] [review]
fix warning in ValidateBuffers

This fixes a warning I introduced:

/home/bjacob/mozilla-central/content/canvas/src/WebGLContextValidate.cpp: In member function ‘PRBool mozilla::WebGLContext::ValidateBuffers(PRInt32*, const char*)’:
/home/bjacob/mozilla-central/content/canvas/src/WebGLContextValidate.cpp:148: warning: comparison between signed and unsigned integer expressions
Attachment #507245 - Flags: review?(vladimir)
Attachment #507245 - Flags: review?(vladimir) → review+
warning fix:
http://hg.mozilla.org/mozilla-central/rev/0ab93abd23cb
Filed bug 732660 about the binary partitioning idea.