Open Bug 980364 Opened 6 years ago Updated 5 years ago

Add GLDrawRectHelper and use it from GLBlitTextureImageHelper

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set

Tracking

()

REOPENED
mozilla31

People

(Reporter: gal, Assigned: gal)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file, 3 obsolete files)

I have a patch queue to use this everywhere where we draw rects (6 more patches), but lets start with this.

Once every place we draw rectangles goes through this path, we can optimize GL state switching since we will also feed the same attribute setup to every shader in the system.
Attached patch patch (obsolete) — Splinter Review
Assignee: nobody → gal
Attachment #8386843 - Flags: review?(bjacob)
Blocks: 978515
Comment on attachment 8386843 [details] [diff] [review]
patch

Review of attachment 8386843 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing bad here, but sufficiently many nits and suggestions for improvement that would be worth having a look at, especially the caching of the variable-size vbo (creating and destroying buffer objects everytime might be slow on some drivers).

I could be happy with just answers to my questions/comments :)

::: gfx/gl/GLBlitTextureImageHelper.cpp
@@ -146,5 @@
>  
>              ScopedBindTextureUnit autoTexUnit(mGL, LOCAL_GL_TEXTURE0);
>              ScopedBindTexture autoTex(mGL, aSrc->GetTextureID());
>  
> -            mGL->fBindBuffer(LOCAL_GL_ARRAY_BUFFER, 0);

Oh my, this was using a client-side array.

::: gfx/gl/GLDrawRectHelper.cpp
@@ +15,5 @@
> +    /* Then quad texcoords */
> +    0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f,
> +};
> +
> +#define BUFFER_OFFSET(i) ((char *)nullptr + (i))

I believe that this would be more explicit as:

  reinterpret_cast<void*>(i)

which would be simple enough, I believe, to not need to be wrapped in a macro (and if wrapping is needed, it could be in an inline function, according to the principle that macros are a last resort).

@@ +48,5 @@
> +    if (aTexCoordAttribIndex != GLuint(-1)) {
> +        mGL->fVertexAttribPointer(aTexCoordAttribIndex,
> +                                  2, LOCAL_GL_FLOAT,
> +                                  LOCAL_GL_FALSE,
> +                                  0, BUFFER_OFFSET(sizeof(quad)/2));

sizeof(quad)/2 is prone to breaking silently if we ever decide to throw a third vertex array into this buffer.

How about, instead, interleaving the two vertex attrib arrays here in this buffer, so that the offset is just 2*sizeof(float) and doesn't depend on any array length?

@@ +69,5 @@
> +                            RectTriangles& aRects)
> +{
> +    GLsizei bytes = aRects.elements() * 2 * sizeof(GLfloat);
> +    GLsizei total = bytes;
> +    if (aTexCoordAttribIndex != GLuint(-1)) {

Do we have a definite use case for drawing without a texcoords attrib?

@@ +81,5 @@
> +                     total,
> +                     nullptr,
> +                     LOCAL_GL_STREAM_DRAW);
> +
> +    mGL->fBufferSubData(LOCAL_GL_ARRAY_BUFFER,

Calling BufferData and BufferSubData here means that this function is not just drawing stuff, it's uploading data to the GL, which can be expensive. I understand that that's necessary given what this function is doing, but:
 1) Is this the right API?  (Or can we instead keep uploading separate from drawing, if that would allow us to upload less frequently than we draw?)
 2) If this is the right API, then at least the function name should make it clear that this is uploading data. How about UploadAndDrawRects?

Note that I do understand that the existing code in GLBlitTextureImageHelper.cpp, that you're changing, was using client-side arrays, which is the worst and means uploading anyway, so there's no question that this patch is going in the right direction.

Another idea. Now that this is in a helper object that stays alive as long as the GLContext is alive, how about keeping the buffer object alive and reusing it, instead of creating, BufferData'ing, and destroying it everytime?
Attachment #8386843 - Flags: review?(bjacob) → review-
> Nothing bad here, but sufficiently many nits and suggestions for improvement
> that would be worth having a look at, especially the caching of the
> variable-size vbo (creating and destroying buffer objects everytime might be
> slow on some drivers).

I wrote VBOArena for that purpose. I still haven't seen a driver where that matters. I have a patch queue of 12 patches to use DrawRects everywhere. Happy to add VBOArena back on top of that once we see any case where it matters.

> > +#define BUFFER_OFFSET(i) ((char *)nullptr + (i))

This is a well known GL define that is missing from our environment. I left it as a define for familiarity. I will turn it into a helper. Doesn't make a difference to me.

> 
> I believe that this would be more explicit as:
> 
>   reinterpret_cast<void*>(i)

This is strictly more confusing. I prefer the above if you don't mind. It reads what it does.

> sizeof(quad)/2 is prone to breaking silently if we ever decide to throw a
> third vertex array into this buffer.
> 
> How about, instead, interleaving the two vertex attrib arrays here in this
> buffer, so that the offset is just 2*sizeof(float) and doesn't depend on any
> array length?

Yes, all coming in the patch queue. Check out the bug this blocks or the full picture. If you don't mind I will leave this unchanged for now to avoid conflicts in my queue.

> 
> @@ +69,5 @@
> > +                            RectTriangles& aRects)
> > +{
> > +    GLsizei bytes = aRects.elements() * 2 * sizeof(GLfloat);
> > +    GLsizei total = bytes;
> > +    if (aTexCoordAttribIndex != GLuint(-1)) {
> 
> Do we have a definite use case for drawing without a texcoords attrib?

Yes, but I am actually removing that in the next patch as well. Again, look at the whole picture. I am unifying the attribute state of every shader we have on the compositor so we can stop switching/restoring it.

> 
> @@ +81,5 @@
> > +                     total,
> > +                     nullptr,
> > +                     LOCAL_GL_STREAM_DRAW);
> > +
> > +    mGL->fBufferSubData(LOCAL_GL_ARRAY_BUFFER,
> 
> Calling BufferData and BufferSubData here means that this function is not
> just drawing stuff, it's uploading data to the GL, which can be expensive. I
> understand that that's necessary given what this function is doing, but:
>  1) Is this the right API?  (Or can we instead keep uploading separate from
> drawing, if that would allow us to upload less frequently than we draw?)
>  2) If this is the right API, then at least the function name should make it
> clear that this is uploading data. How about UploadAndDrawRects?

Upload is a side effect of the way I call the draw. If I used client-side rendering I wouldn't upload. Why is Upload important in the name?

> 
> Note that I do understand that the existing code in
> GLBlitTextureImageHelper.cpp, that you're changing, was using client-side
> arrays, which is the worst and means uploading anyway, so there's no
> question that this patch is going in the right direction.
> 
> Another idea. Now that this is in a helper object that stays alive as long
> as the GLContext is alive, how about keeping the buffer object alive and
> reusing it, instead of creating, BufferData'ing, and destroying it everytime?

Thats actually a performance hazard. You don't want to reuse VBOs until the frame is rendered. Many drivers stall the GPU pipeline when you do. Lets crank through the patch queue and then measure where performance matters, ok? (I did measure by the way that switching the attribute state is expensive, even on MacOSX).
> > Calling BufferData and BufferSubData here means that this function is not
> > just drawing stuff, it's uploading data to the GL, which can be expensive. I
> > understand that that's necessary given what this function is doing, but:
> >  1) Is this the right API?  (Or can we instead keep uploading separate from
> > drawing, if that would allow us to upload less frequently than we draw?)
> >  2) If this is the right API, then at least the function name should make it
> > clear that this is uploading data. How about UploadAndDrawRects?
> 
> Upload is a side effect of the way I call the draw. If I used client-side
> rendering I wouldn't upload. Why is Upload important in the name?

Well, client-side arrays are implicitly uploaded everytime. So at least you're not regressing anything here.

> 
> > 
> > Note that I do understand that the existing code in
> > GLBlitTextureImageHelper.cpp, that you're changing, was using client-side
> > arrays, which is the worst and means uploading anyway, so there's no
> > question that this patch is going in the right direction.
> > 
> > Another idea. Now that this is in a helper object that stays alive as long
> > as the GLContext is alive, how about keeping the buffer object alive and
> > reusing it, instead of creating, BufferData'ing, and destroying it everytime?
> 
> Thats actually a performance hazard. You don't want to reuse VBOs until the
> frame is rendered. Many drivers stall the GPU pipeline when you do.

True, I didn't think of that. Now I understand that point of VBOArena, which I didn't before.

OK to proceed with the rest of your patch queue then.
Attachment #8386843 - Flags: review- → review+
> > > Another idea. Now that this is in a helper object that stays alive as long
> > > as the GLContext is alive, how about keeping the buffer object alive and
> > > reusing it, instead of creating, BufferData'ing, and destroying it everytime?
> > 
> > Thats actually a performance hazard. You don't want to reuse VBOs until the
> > frame is rendered. Many drivers stall the GPU pipeline when you do.
> 
> True, I didn't think of that. Now I understand that point of VBOArena, which
> I didn't before.

Wait --- there's something important that I'm still not understanding here.

I do know about one story that goes like, "updating objects in place can cause stalls in deferred renderers". But that story is about updating *textures*, not VBOs. The reason for that is that the deferred renderers that I know of ( https://wiki.mozilla.org/Platform/GFX/MobileGPUs ) defer the *fragment shader* stage, and fragment shaders consume textures. VBOs, on the other hand, are consumed by the *vertex shader* stage, the fragment shader stage. I don't know of any renderer where the vertex shader stage is deferred; in fact, the deferred renderers that I know are tile-based, and by definition tiling requires already knowing when on screen the rendering needs to go to, i.e. it requires having already executed vertex shaders, i.e. these renderers are still immediate renderers as far as vertex shaders and VBOs are concerned.

So what am I missing here?
(In reply to Benoit Jacob [:bjacob] (on vacation until March 17) from comment #5)
> VBOs, on the other hand, are consumed by the *vertex shader* stage, the fragment
> shader stage.

A 'not' is missing in the above sentence. Read:

VBOs, on the other hand, are consumed by the *vertex shader* stage, NOT the fragment shader stage.
And also s/when/where --- "tiling requires already knowing WHERE on screen the rendering needs to go to".
Nevermind, I missed something obvious in comment 5 --- the problem is that if we issue a draw call using a VBO and immediately after we update the contents of that VBO, then the buffer update will have to block on the completion of at least the vertex shader stage of the draw call. Sorry about the noise.
Depends on: 978479
Attached patch patch ready for check-in (obsolete) — Splinter Review
Keywords: checkin-needed
Attachment #8399024 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/733a0129c739
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Backed out:

https://hg.mozilla.org/integration/mozilla-inbound/rev/a63f5694f562

Reason: is one of two patches in the regression range for bug 990233, 5% tresize regression on OSX 10.6. See conversation on bug 990233.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch patchSplinter Review
Attachment #8386843 - Attachment is obsolete: true
Attachment #8399172 - Attachment is obsolete: true
Depends on: 995805
I spent a bunch of time trying to fix the regression but I can't talk people into caching the underlying geometry so I propose to just eat the 10.6 regression here and call it a day. I have a ton of code clean up and code removal piled on top of this, plus the CSS mesh work, so I want to go ahead here. bjacob?
Flags: needinfo?(bjacob)
The geometry streaming/caching conversation (bug 995805) is nontrivial and will probably require some experimentation, so if you are looking for an immediate way to re-land here, then I suppose that it isn't it.

I'm not strongly opposed to re-landing here and taking the 5% hit on 10.6, because I don't know for sure whether that OSX 10.6 regression is indicative of anything non-10.6-specific; and for sure, a 5% tresize hit specific to OSX 10.6 is not by itself a huge concern. Also, I would rather take a 5% tresize hit on OSX 10.6 than rush the conversation on bug 995805.

I'm not feeling 100% comfortable making the call all by myself of whether a Talos regression like this is acceptable though, but realistically, a 10.6-only regression is something we could very easily have missed anyway (for example, if 10.6 slaves had already been phased out, which is being considered at the moment, bug 990490).
Flags: needinfo?(bjacob)
Thanks. Lets ask Jeff in addition.
Flags: needinfo?(jmuizelaar)
I'll talk with bjacob about this tomorrow.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> I'll talk with bjacob about this tomorrow.

I'm not opposed to regressing 10.6 but I'd like to understand what we think the cause is. Can you explain what it is that causes this regression?
Flags: needinfo?(jmuizelaar)
Comment 20: 10.6 seems to have a bug that makes VBOs (which my new path uses) slower than client-side rendering (which the old code uses). I made a patch to cache geometries which fixes the former, but people didn't like that approach and feel we should not use VBOs at all and only use GL_REPEAT instead of more-than-one-quad geometries (which the mQuadVBO stuff covers).
You need to log in before you can comment on or make changes to this bug.