The default bug view has changed. See this FAQ.

Streaming WebGL Buffers (Double-buffering, etc)

RESOLVED FIXED in B2G C1 (to 19nov)

Status

()

Core
Canvas: WebGL
P1
normal
RESOLVED FIXED
5 years ago
a year ago

People

(Reporter: bjacob, Assigned: jgilbert)

Tracking

(Depends on: 2 bugs, Blocks: 8 bugs, {feature})

unspecified
B2G C1 (to 19nov)
feature
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:-, blocking-basecamp:-, b2g18-, blocking-fennec1.0 -)

Details

(Whiteboard: webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M] regression-risk [tech-p1])

Attachments

(16 attachments, 21 obsolete attachments)

415.36 KB, patch
Details | Diff | Splinter Review
6.01 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
14.62 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
416.30 KB, patch
Details | Diff | Splinter Review
6.46 KB, patch
jgilbert
: review+
Details | Diff | Splinter Review
437.31 KB, patch
Details | Diff | Splinter Review
3.87 KB, patch
Details | Diff | Splinter Review
439.18 KB, patch
jrmuizel
: feedback+
Details | Diff | Splinter Review
5.61 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
31.06 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
4.05 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
955 bytes, patch
jrmuizel
: review+
Details | Diff | Splinter Review
7.39 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
7.58 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
436.55 KB, patch
Details | Diff | Splinter Review
431 bytes, patch
Details | Diff | Splinter Review
Currently, the surface that WebGL commands draw to is the same that is read by the compositor. That's single buffering; that only works because currently compositing cannot happen simultaneously as JS code is running.
With OMTC, that changes: compositing will happen at any time, even in the middle of a WebGL draw command. Double buffering is a requirement.
Double buffering is also the "right way" to fix bug 691347.
(Assignee)

Comment 1

5 years ago
I think we should consider doing triple-buffering, so as to actually desync the producer and consumer. Writing it out, it is quite simple to implement.

Also, it seems like we would still need to clarify how the resize should be handled. That is, should we desync the resize as well, so the framebuffer held by the consumer stays the same size until it is recycled back to the producer? If so, we have to keep track of different heights between the buffers we are swapping, and resize on-the-fly.
(Assignee)

Comment 2

5 years ago
This may be a place where using ARB_sync could win us some win.
(Reporter)

Comment 3

5 years ago
It is! Blocking on 697831.

I believe that we want to desync the resize as well. The user who's resizing the Firefox window wants it to look smooth, but won't notice if the actual canvas resizing happens only a few frames later.
Depends on: 697831

Updated

5 years ago
Blocks: 728524
Created attachment 599233 [details] [diff] [review]
Initial double buffering support for WebGL.

This patch is preliminary and while it works (read: doesn't cause horrible issues), it needs fence code to be useful.  A future patch will tie that in.
Assignee: nobody → cbrocious
Keywords: fennecnative-betablocker
Priority: -- → P1
blocking-fennec1.0: --- → beta+
Status: NEW → ASSIGNED
blocking-fennec1.0: beta+ → -

Updated

5 years ago
Blocks: 710398
(Assignee)

Updated

5 years ago
Depends on: 739421
(Reporter)

Updated

5 years ago
Whiteboard: webgl-next

Updated

5 years ago
Blocks: 749062

Updated

5 years ago
Blocks: 749721
(Assignee)

Updated

5 years ago
Assignee: cbrocious → jgilbert
(Assignee)

Comment 5

5 years ago
Created attachment 625337 [details] [diff] [review]
GLStream for waitless triple-buffering
(Assignee)

Comment 6

5 years ago
Just a note that this patch is still WIP, though it seems to work very nicely locally.
It's still rife with printfs and it could use more than a few clarifying comments.
No longer blocks: 598873
Blocks: 598873
Keywords: dev-doc-needed
(Assignee)

Updated

5 years ago
See Also: → bug 760675
(Assignee)

Updated

5 years ago
Attachment #599233 - Attachment is obsolete: true

Updated

5 years ago
Whiteboard: webgl-next → webgl-next [games:p1]
(Assignee)

Updated

5 years ago
No longer blocks: 728524
Depends on: 728524
blocking-basecamp: --- → ?
(Assignee)

Updated

5 years ago
Blocks: 767484
blocking-basecamp: ? → +
Whiteboard: webgl-next [games:p1] → webgl-next [games:p1] [soft]
(Assignee)

Updated

5 years ago
Depends on: 766366
(Assignee)

Comment 7

5 years ago
Created attachment 651113 [details] [diff] [review]
ScreenBuffer and SharedSurface WIP (non-compilable)

I'm just tying up loose ends, now.

Let's talk about anything you hate about the frameworks/classes. I tried a few things to try to make things easier to understand. (BufferBitMasks, for one) The goal is clarity, so any suggestions are appreciated.

I haven't torn out the existing resize/buffer code, but I think you can see that ScreenBuffer should remove the need for much of that, and move it all into one place. Basically, we Swap() on the ScreenBuffer to try to get a resize, and we'll be calling Morph from the content-side of the layers code, to set up how we need to do accelerated production.

Compositing code is not tied in yet, but basically GLStreams are thread-safe, and we'll just be passing the current GLStream pointer through to the OMTC thread, where it can call SwapCons to get a surface to consume.

OMProcessC will work, but should get a mini-OMTC thread to handle resolves (WaitSync) before dispatching the handle to the Gralloc surface across the process boundary. We can also initially just do CopyDoubleBuffer the slow way with the Finish on the content thread.

I need to add comments about the expected lifetimes and owners of the various objects. The main tricky one is that GLStreams maintain ownership of the SwapProd and SwapCons surfaces, so these surfaces die with the GLStream. (GLStream teardown after connection to a compositor should only happen on GLContext teardown, which should only happen after (context) layers teardown)

Also note that GLStreams don't have set factories any more; SurfaceFactories are owned by ScreenBuffers, and passed in to GLStream::SwapProd.

We'll probably want to vid-chat about this Monday, or whenever you have a chance. Actually, it'd be good to get input from anyone who was any, so I'll ask at GFX meeting.
Attachment #625337 - Attachment is obsolete: true
Attachment #651113 - Flags: feedback?(bjacob)
(Assignee)

Comment 8

5 years ago
Created attachment 651114 [details] [diff] [review]
ScreenBuffer and SharedSurface WIP 2 (non-compilable)

Forgot to add gfx/gl/SharedSurfaceANGLE.cpp.
Attachment #651113 - Attachment is obsolete: true
Attachment #651113 - Flags: feedback?(bjacob)
Attachment #651114 - Flags: feedback?(bjacob)
I'm late to this party (was sick most of last week), but please please try to come up with better names for SwapProd/SwapCons and related Prod/Cons usage?  Breaks my head every time I look at them, not least because 'cons' has a lisp meaning.
At a minimum, add cdr.
Does this feature require any documentation for any potentially web facing changes?
(Assignee)

Comment 12

5 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #11)
> Does this feature require any documentation for any potentially web facing
> changes?

Largely no, but the one exception is that preserveDrawingBuffer=true will now be slower than pDB=false. This is already mentioned in the WebGL spec as a likelihood, but it might do to reiterate and possibly have a brief explanation.

Also, quick update, this is really fighting my tooth and nail to get this compiling, as it's full of cyclic dependencies and namespaces, but it's nearly compiling now.
(Assignee)

Comment 13

5 years ago
Er, "fighting me tooth and nail", not "fighting my"...
(Assignee)

Comment 14

5 years ago
Got it building today, and am now fighting through my own asserts.
Should have it working for MTC-Basic tomorrow. (On linux)
(Assignee)

Comment 15

5 years ago
Created attachment 656748 [details] [diff] [review]
compiling patch

Here's the current state of it. Yes, it's over 200k. A large chunk of that is deletions. There's a bunch of new code as well, naturally.
(Assignee)

Comment 16

5 years ago
I should also note that it's still missing parts, particularly some of the recent work with EGLImages and GL_TEXTURE_EXTERNAL.
(Assignee)

Comment 17

5 years ago
It renders with BasicLayers now. (readback only, OMTC untested)
(Assignee)

Comment 18

5 years ago
Changing the title to be more accurate.
Summary: Double buffered WebGL → Streaming WebGL Buffers (Double-buffering, etc)
Comment on attachment 651114 [details] [diff] [review]
ScreenBuffer and SharedSurface WIP 2 (non-compilable)

Review of attachment 651114 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay on feedback; lots happened since; let me know on what particular aspect you would still like feedback.
Attachment #651114 - Flags: feedback?(bjacob)
Whiteboard: webgl-next [games:p1] [soft] → webgl-next [games:p1] [soft] [WebAPI:P0]
Keywords: feature

Comment 20

5 years ago
Slightly off-topic (not bug related), but does anyone have some example code for webGL that includes triple-buffering in webGL?  I am trying to wrap my head around it.

Would be appreciated, thanks
(Assignee)

Comment 21

5 years ago
(In reply to Lucas from comment #20)
> Slightly off-topic (not bug related), but does anyone have some example code
> for webGL that includes triple-buffering in webGL?  I am trying to wrap my
> head around it.
> 
> Would be appreciated, thanks

Do you mean WebGL code that uses what's being worked on here? There's actually nothing you can do from WebGL that'll be changed by this. Everything here is behind the scenes. The only thing WebGL code should notice is (hopefully) requestAnimationFrame turnover improving.

Comment 22

5 years ago
I was curious to see if I can implement and control triple buffering through WebGL, yes.
In short I am trying to highly synchronize multiple WebGL renderings.  Am I correct to say that WebGL won't give me such low level control?
(Assignee)

Comment 23

5 years ago
(In reply to Lucas from comment #22)
> I was curious to see if I can implement and control triple buffering through
> WebGL, yes.
> In short I am trying to highly synchronize multiple WebGL renderings.  Am I
> correct to say that WebGL won't give me such low level control?

Somewhat correct. I don't know exactly what you're trying to do. As it stands, from a WebGL user's point of view, there is nothing that would benefit from doing WebGL-side double-/triple-buffering. Multibuffering is only useful if one component is going to be asynchronous from another, which is not currently possible in WebGL.

Two WebGL contexts that are each rendered to in the same requestAnimationFrame will generally display at the same time. I don't believe there's a guarantee that would assure that two frames are going to be displayed simultaneously, though.

Comment 24

5 years ago
To clarify I am talking about two WebGL renderings on two different machines through WebGL.  My initial thought is that I could separate rendering and drawing on  low level using a double or tripple buffer, helping me to control the timing as much as possible from the software side.  I think I may be able to accomplish something close by simply using a frame buffer object.
(Assignee)

Comment 25

5 years ago
(In reply to Lucas from comment #24)
> To clarify I am talking about two WebGL renderings on two different machines
> through WebGL.  My initial thought is that I could separate rendering and
> drawing on  low level using a double or tripple buffer, helping me to
> control the timing as much as possible from the software side.  I think I
> may be able to accomplish something close by simply using a frame buffer
> object.

There is no WebGL semantic for interactions between contexts on multiple machines. This too far off topic to continue in this bug. For questions to this end, you can email me (jgilbert@mozilla.com) or email the WebGL mailing list (public_webgl@khronos.org).
(Assignee)

Updated

5 years ago
Blocks: 785838
Whiteboard: webgl-next [games:p1] [soft] [WebAPI:P0] → webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M]
(Assignee)

Comment 26

5 years ago
Created attachment 665108 [details] [diff] [review]
Working on Basic and OGL Layers on Linux

Yes, 300K is daunting. However, much of this are blocks of additions and subtractions, as opposed to small single-line changes.

Approximations:
Of 10415 diff lines, 4140 are additions, and 2944 are subtractions, so 3331 of these lines are unchanged context. About 2500 lines of the additions are the new files.


I need feedback on anything that seems weird, too complicated, or unnecessary.

The main goal of this patch is to add SurfaceStream class that takes care of swapping around buffers, even between threads. It has different implementations, including Single-Buffered (what we do now), Double-Buffered, Double-Buffered w/Copy, and Triple-Buffered. Instead of passing a some texture handle through the OMTC interface, we just request the next frame in the compositing thread directly from the Stream object.

These buffers take the form of SharedSurface objects that abstract away the details of what they're backed by, so that really only their client code in Layers needs to unroll them to pull out the data it needs.

Since this will thrash through GLContext and all our offscreen buffer code, I pulled that out into a ScreenBuffer class. The former functionality is still in GLContext, but it's a bit simpler, and is now hoisted out in a different pair of files.

When we initialize a CanvasLayer for a GLContext, we now 'Morph' the context to switch the type of SharedSurface and SurfaceStream it uses, since determining this depends on which Layers backend we're using, and whether we're on OMTC (or OMPC). Everything has Type enums now, so it's much easier to see which surface and stream types we're using.

Context format requests are simplified to use SurfaceCaps to request capabilities, as opposed to trying to match an arbitrary pixel depth from a ContextFormat.

Things which will be (re)connected later today: (You can already see some of the connecting code, but it hasn't been refreshed/finished)
GLContextProviderCGL
D3D9, D3D10 layers
OMTC with GLTexture
OMTC with EGLImage
OMPC with Gralloc
SharedSurface_GLTexture should use Fence objects

It's also bitrotted from about a week ago.

As soon as I get it standing up on OMTC, I'll upload that.
Attachment #656748 - Attachment is obsolete: true
Attachment #665108 - Flags: feedback?(ncameron)
Attachment #665108 - Flags: feedback?(bjacob)
Attachment #665108 - Flags: feedback?(bgirard)

Comment 27

5 years ago
Comment on attachment 665108 [details] [diff] [review]
Working on Basic and OGL Layers on Linux

Review of attachment 665108 [details] [diff] [review]:
-----------------------------------------------------------------

I had a look over the patch, mostly concentrating on the concepts and the layers side of things (a lot of the GL stuff is over my head). It looks good, I like the design. It should fit well with the layers refactoring (although it will be a bit of work to combine them, there is not much overlap and no obvious conflict). 

A few big comments to give an overview of what is going on in the main .h files would be good.

Are you removing all OSMesa support? Should that be in a separate bug?

::: gfx/gl/TempAsserts.h
@@ +30,5 @@
> +            MOZ_ERROR("Assertion failure: |" #expr "|");\
> +            MOZ_CRASH();\
> +    } while (0)
> +
> +

is this a temporary file to be removed? If not, should it be somewhere other than gfx?

::: gfx/layers/basic/BasicCanvasLayer.cpp
@@ +456,5 @@
>                                      mNeedsYFlip,
>                                      mBackBuffer);
>        // Move SharedTextureHandle ownership to ShadowLayer
> +      // Todo: What does ^this^ mean.
> +

I think it means that by removing our reference to mBackBuffer, the shadow layer has the only remaining reference to the buffer and is responsible for destroying it

::: gfx/layers/d3d9/CanvasLayerD3D9.cpp
@@ +54,5 @@
> +
> +    if (!screen->Morph(factory, streamType)) {
> +      MOZ_NOT_REACHED("Should never happen.");
> +      delete factory;
> +    }

should this block be in the below (aData.mGLCOntext) case?

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +712,5 @@
>  ShadowImageLayerOGL::Init(const SharedImage& aFront)
>  {
>    if (aFront.type() == SharedImage::TSurfaceDescriptor) {
>      SurfaceDescriptor surface = aFront.get_SurfaceDescriptor();
> +    /*

Why is stuff commented out in this file?
Attachment #665108 - Flags: feedback?(ncameron) → feedback+
I took a very quick look but the size is very taunting and I'm a bit worried about the increase code base complexity of all these changes but maybe it wont be so bad once I have it parsed in my head.

It seems like you re-implementing some texture sharing. What's the relation with Romaxa's sharedtextureimage? I'm concerned about ending up with more implementation of opengl resource sharing that don't work well together.
Comment on attachment 665108 [details] [diff] [review]
Working on Basic and OGL Layers on Linux

Review of attachment 665108 [details] [diff] [review]:
-----------------------------------------------------------------

Only got halfway; some comments:

::: content/canvas/src/WebGLContext.cpp
@@ +444,5 @@
>          if (status == nsIGfxInfo::FEATURE_NO_INFO || forceMSAA) {
> +            // Todo: Revive me:
> +            //uint32_t msaaLevel = Preferences::GetUint("webgl.msaa-level", 2);
> +            //format.samples = msaaLevel*msaaLevel;
> +            caps.antialias = true;

Revive which? webgl.msaa-level? I'd just nuke it entirely, if caps.antialias works -- have a global pref for antialias level (if antialiasing is enabled/allowed for the content).  I guess we might want more finer-grained control over the webgl level, but I think that if we do get that, we'll want it content-side for developers to control.

@@ +486,5 @@
>      }
>  #endif
>  
>      // if we're forcing osmesa, do it first
> +    /* Or don't.

I am so OK with killing OSMesa support.  Make it die.  If someone wants to implement some sort of alternate support, they can do it via providing a dummy EGL implementation.

::: content/canvas/src/WebGLContextGL.cpp
@@ +240,5 @@
>  
>      MakeContextCurrent();
>  
>      if (!wfb) {
> +        gl->fBindFramebuffer(target, 0);

sure wish you'd made the GetOffscreenFBO() changes as a single patch, and the format -> caps change as another path, etc :)

::: gfx/gl/GLContext.cpp
@@ +1660,4 @@
>  {
> +    MOZ_ASSERT(mScreen);
> +
> +    return mScreen->Stream()->SwapCons();

Aaa these names still :)  Please make them a bit more descriptive, or at least use SwapConsumer()/SwapProducer() etc.  "SwapCons" triggers all sorts of lisp-related places in my brain.

::: gfx/gl/GLContext.h
@@ +628,5 @@
>       * If this GL context has a D3D texture share handle, returns non-null.
>       */
> +    virtual void* GetEGLContext() { return nullptr; }
> +
> +    virtual void* GetLibraryEGL() { return nullptr; }

This is odd; why did this function name change?  The D3D Share Handle is pretty opaque... or did it just go away and the comment should go away too?

@@ +3059,5 @@
> +        if (mOldState)
> +            mGL->fEnable(mCapability);
> +        else
> +            mGL->fDisable(mCapability);
> +    }

This isn't quite right.  If you're caching the fIsEnabled() result then ok, but you should do the check in UwrapImpl as well.  Otherwise, if you're trying to avoid unneeded state changes, you'll end up always calling fDisable() if mOldState is false, even if the newState was also false.

You may want to have a protected WrapImpl() (maybe change these to SetStateImpl()/UnsetStateImpl()?) that returns a boolean indicating whether it actually made a change -- have ScopedGLWrapper save that boolean and then skip calling Unwrap/Unset entirely if it can.  The constructor can call it directly.

Oooor, just don't bother with the early-out at all (and skip calling IsEnabled) -- if we add state tracking/caching, we'll want to just do it directly in Enable/Disable, or we can trust the driver to already do that (which it 99% most likely does anyway, esp. for expensive states).

::: gfx/gl/GLContextProvider.h
@@ +27,4 @@
>  #define GL_CONTEXT_PROVIDER_NAME GLContextProviderOSMesa
>  #include "GLContextProviderImpl.h"
>  #undef GL_CONTEXT_PROVIDER_NAME
> +*/

Nuke, don't comment out -- be bold! :)
(Assignee)

Comment 30

5 years ago
(In reply to Nick Cameron [:nrc] from comment #27)
> Comment on attachment 665108 [details] [diff] [review]
> Working on Basic and OGL Layers on Linux
> 
> Review of attachment 665108 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I had a look over the patch, mostly concentrating on the concepts and the
> layers side of things (a lot of the GL stuff is over my head). It looks
> good, I like the design. It should fit well with the layers refactoring
> (although it will be a bit of work to combine them, there is not much
> overlap and no obvious conflict). 
> 
> A few big comments to give an overview of what is going on in the main .h
> files would be good.
> 
> Are you removing all OSMesa support? Should that be in a separate bug?
OSMesa support was removed in another bug. I hadn't unbitrotted this patch to include it yet, though.
> 
> ::: gfx/gl/TempAsserts.h
> @@ +30,5 @@
> > +            MOZ_ERROR("Assertion failure: |" #expr "|");\
> > +            MOZ_CRASH();\
> > +    } while (0)
> > +
> > +
> 
> is this a temporary file to be removed? If not, should it be somewhere other
> than gfx?
It will not be present in the final patch.

> 
> ::: gfx/layers/basic/BasicCanvasLayer.cpp
> @@ +456,5 @@
> >                                      mNeedsYFlip,
> >                                      mBackBuffer);
> >        // Move SharedTextureHandle ownership to ShadowLayer
> > +      // Todo: What does ^this^ mean.
> > +
> 
> I think it means that by removing our reference to mBackBuffer, the shadow
> layer has the only remaining reference to the buffer and is responsible for
> destroying it
This part is still more in flux than most. I'll keep that in mind though.
> 
> ::: gfx/layers/d3d9/CanvasLayerD3D9.cpp
> @@ +54,5 @@
> > +
> > +    if (!screen->Morph(factory, streamType)) {
> > +      MOZ_NOT_REACHED("Should never happen.");
> > +      delete factory;
> > +    }
> 
> should this block be in the below (aData.mGLCOntext) case?
Haha, yes, you're totally correct.
> 
> ::: gfx/layers/opengl/ImageLayerOGL.cpp
> @@ +712,5 @@
> >  ShadowImageLayerOGL::Init(const SharedImage& aFront)
> >  {
> >    if (aFront.type() == SharedImage::TSurfaceDescriptor) {
> >      SurfaceDescriptor surface = aFront.get_SurfaceDescriptor();
> > +    /*
> 
> Why is stuff commented out in this file?
The goal was originally to replace the SharedTexture stuff with the new SharedSurface stuff, so that we aren't juggling two such types around. I commented this stuff out so that it won't trigger when I think I'm testing the new stuff, but didn't delete it in case we ended up wanting to do the SharedTexture->SharedSurface conversion in another patch. (Which is the current plan, now)
(Assignee)

Comment 31

5 years ago
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #29)
> Comment on attachment 665108 [details] [diff] [review]
> Working on Basic and OGL Layers on Linux
> 
> Review of attachment 665108 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Only got halfway; some comments:
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +444,5 @@
> >          if (status == nsIGfxInfo::FEATURE_NO_INFO || forceMSAA) {
> > +            // Todo: Revive me:
> > +            //uint32_t msaaLevel = Preferences::GetUint("webgl.msaa-level", 2);
> > +            //format.samples = msaaLevel*msaaLevel;
> > +            caps.antialias = true;
> 
> Revive which? webgl.msaa-level? I'd just nuke it entirely, if caps.antialias
> works -- have a global pref for antialias level (if antialiasing is
> enabled/allowed for the content).  I guess we might want more finer-grained
> control over the webgl level, but I think that if we do get that, we'll want
> it content-side for developers to control.
For the time being, I would like to keep it available at least somehow. Whether or not we change it to gfx.msaa-level or not, I'm not to picky about. I would like to be able to tweak at least msaa-level for WebGL.

> 
> @@ +486,5 @@
> >      }
> >  #endif
> >  
> >      // if we're forcing osmesa, do it first
> > +    /* Or don't.
> 
> I am so OK with killing OSMesa support.  Make it die.  If someone wants to
> implement some sort of alternate support, they can do it via providing a
> dummy EGL implementation.
Indeed, this is exactly what we've done: Pull out OSMesa and add an option to use LLVMpipe to GLX. (And I think EGL?)
> 
> ::: content/canvas/src/WebGLContextGL.cpp
> @@ +240,5 @@
> >  
> >      MakeContextCurrent();
> >  
> >      if (!wfb) {
> > +        gl->fBindFramebuffer(target, 0);
> 
> sure wish you'd made the GetOffscreenFBO() changes as a single patch, and
> the format -> caps change as another path, etc :)
I agree with the former, but the latter would require going over all the old code just prior to ripping it out.
> 
> ::: gfx/gl/GLContext.cpp
> @@ +1660,4 @@
> >  {
> > +    MOZ_ASSERT(mScreen);
> > +
> > +    return mScreen->Stream()->SwapCons();
> 
> Aaa these names still :)  Please make them a bit more descriptive, or at
> least use SwapConsumer()/SwapProducer() etc.  "SwapCons" triggers all sorts
> of lisp-related places in my brain.
Yeah, that works.

> 
> ::: gfx/gl/GLContext.h
> @@ +628,5 @@
> >       * If this GL context has a D3D texture share handle, returns non-null.
> >       */
> > +    virtual void* GetEGLContext() { return nullptr; }
> > +
> > +    virtual void* GetLibraryEGL() { return nullptr; }
> 
> This is odd; why did this function name change?  The D3D Share Handle is
> pretty opaque... or did it just go away and the comment should go away too?
The comment is orphaned. It used to precede the d3d share handle getter, but I removed that, but forgot to remove the comment.
> 
> @@ +3059,5 @@
> > +        if (mOldState)
> > +            mGL->fEnable(mCapability);
> > +        else
> > +            mGL->fDisable(mCapability);
> > +    }
> 
> This isn't quite right.  If you're caching the fIsEnabled() result then ok,
> but you should do the check in UwrapImpl as well.  Otherwise, if you're
> trying to avoid unneeded state changes, you'll end up always calling
> fDisable() if mOldState is false, even if the newState was also false.
> 
> You may want to have a protected WrapImpl() (maybe change these to
> SetStateImpl()/UnsetStateImpl()?) that returns a boolean indicating whether
> it actually made a change -- have ScopedGLWrapper save that boolean and then
> skip calling Unwrap/Unset entirely if it can.  The constructor can call it
> directly.
> 
> Oooor, just don't bother with the early-out at all (and skip calling
> IsEnabled) -- if we add state tracking/caching, we'll want to just do it
> directly in Enable/Disable, or we can trust the driver to already do that
> (which it 99% most likely does anyway, esp. for expensive states).

Paragraph 3 is exactly what we should consider, though it's very true that the driver caches these things.

The reason paragraph 1 applies is if we change the state while the scoped object is active, we want to assure that it resets even if we didn't change it at first. While this does make it safer, it does also add slight expense. It's also not a likely use-case, but I don't think it's worth the drop in safety for saving an enable/disable call, at least at this level. (Caching at the fEnable/fDisable point is better if we want to do this)

The reason I keep the early out is because it's trivially unnecessary to enable/disable if we just checked the state (which we need to do to rebind it later), and we're already in the correct state. We cannot guarantee the same at Unwrap time.
(Assignee)

Comment 32

5 years ago
Created attachment 667313 [details] [diff] [review]
Everything but b2g

Alright, this should work on everything except B2G. I'm also getting a weird case where it seems to revert from 'Fennec OMTC' (Basic->OGLLayers) to just BasicLayers.

Performance seems comparable to existing on Fennec as tested on a Nexus 7. Framerates on Aquarium seem to be Before:[18-19, some 17], After[18-19, some 20], when we the 16bpp pref is true.

With 24bpp, we get about 8fps before, and 18fps after.

Passing over to Vlad to check if there's anything unusual during profiling.
Attachment #651114 - Attachment is obsolete: true
Attachment #665108 - Attachment is obsolete: true
Attachment #665108 - Flags: feedback?(bjacob)
Attachment #665108 - Flags: feedback?(bgirard)
Attachment #667313 - Flags: feedback?(vladimir)
(Assignee)

Comment 33

5 years ago
This try run should be a riot:
https://tbpl.mozilla.org/?tree=Try&rev=319be11c6833
(Assignee)

Comment 34

5 years ago
Ah, warnings as errors. Why don't we have these on normal builds?...
Looks like we're leaking Mutexes as well.
(In reply to comment #34)
> Ah, warnings as errors. Why don't we have these on normal builds?...

You should put --enable-warnings-as-error in your .mozconfig.
(Assignee)

Comment 36

5 years ago
Created attachment 667748 [details] [diff] [review]
Rebased, with some tweaks in GLCProviderEGL,

This build with warnings-as-errors for desktop Linux, at least. Fennec build is still pending. Will post a try if that's clean.
Attachment #667313 - Attachment is obsolete: true
Attachment #667313 - Flags: feedback?(vladimir)
Attachment #667748 - Flags: feedback?(vladimir)
(Assignee)

Comment 37

5 years ago
https://tbpl.mozilla.org/?tree=Try&rev=1f37fb036006

There are a number of interesting failures on the previous try run, even though it only ran on linux+debug.
(Assignee)

Comment 38

5 years ago
And of course I forgot to fix the main issues that made it burn last time. :|
(Assignee)

Comment 39

5 years ago
Alright, I'm blocked on getting the Gralloc primatives working for b2g.

I need a working version of the following, or similar:
gfx/layers/GrallocEGLGlue.h: http://pastebin.mozilla.org/1857618
gfx/layers/GrallocEGLGlue.cpp: http://pastebin.mozilla.org/1857617
(Assignee)

Comment 40

5 years ago
Current errors: (the 2nd seems simple enough, but the first is stumping me)

distcc[9938] ERROR: compile /home/jgilbert/.ccache/tmp/GrallocEGL.tmp.sisyphus.9907.ii on localhost failed
/home/jgilbert/dev/mozilla/b2g2/gecko/gfx/layers/GrallocEGLGlue.cpp: In function 'bool mozilla::layers::IsGrallocDesc(const mozilla::layers::SurfaceDescriptor*)':
/home/jgilbert/dev/mozilla/b2g2/gecko/gfx/layers/GrallocEGLGlue.cpp:21: error: request for member 'type' in 'desc', which is of non-class type 'const mozilla::layers::SurfaceDescriptor*'
/home/jgilbert/dev/mozilla/b2g2/gecko/gfx/layers/GrallocEGLGlue.cpp: In function 'void* mozilla::layers::GetClientBuffer(mozilla::layers::SurfaceDescriptor*)':
/home/jgilbert/dev/mozilla/b2g2/gecko/gfx/layers/GrallocEGLGlue.cpp:51: error: cannot convert 'android::sp<android::GraphicBuffer>' to 'android::GraphicBuffer*' in initialization
(Assignee)

Comment 41

5 years ago
Hah, I was trying to access the member var of a pointer. PEBKAC, indeed.
With the last patch applied, webgl doesn't work at all on my nexus 7.  Can't create a context -- haven't stepped through it to figure out why yet.
Comment on attachment 667748 [details] [diff] [review]
Rebased, with some tweaks in GLCProviderEGL,

Review of attachment 667748 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +2145,4 @@
>  {
> +    if (sEGLLibrary.fBindAPI(LOCAL_EGL_OPENGL_ES_API)) {
> +        NS_WARNING("Failed to bind to the GLES API.");
> +        return nullptr;

This is missing a == LOCAL_EGL_FALSE  (actually, prefer if (!...) here, to make it obvious that a negative is being looked for. LOCAL_EGL_FALSE is pretty ugly!)
With that fixed, things work great.  Here's a profile of the aquarium demo on a Nexus 7, with all the prefs set to defaults:

http://people.mozilla.com/~bgirard/cleopatra/?report=2a6d9e0ce87279eb1a6f1e49d0d8c6b846141900

57% in JS render() (!!), and then 30% in EndTransactionInternal -- canvas paints don't let us do async compositor transactions, which should be the next step after this lands (since I think all the infrastructure for them is in place for webgl canvases, just some stuff needs to be hooked up).
What's the status here?  Vlad/Jeff?
Flags: needinfo?
(Assignee)

Comment 46

5 years ago
Here's the most recent try run:
https://tbpl.mozilla.org/?tree=Try&rev=5c0ffc33634f

Ignore the burning Windows for now, as Windows has been green for a few days now.
So far, Windows is clean, Mac seems clean now, but Linux seems to be hitting a GL OOM condition in m1. I have not succeeded at reproducing this locally so far, but I'll be taking another look at it today.

Android seems fairly broken, but also seems to work more-or-less on my machine, though in DEBUG builds I am hitting a fatal assertion, so maybe that's related.

As a side note, that we don't run Android DEBUG builds on our testing machines is Really Bad.
Flags: needinfo?
(Assignee)

Comment 47

5 years ago
With some modifications, we're passing on linux[1] and windows[2].

[1]: https://tbpl.mozilla.org/?tree=Try&rev=8573fcdf4daf
[2]: https://tbpl.mozilla.org/?tree=Try&rev=076b8dc6c9d7

The linux issue should not affect android, but it's worth a look, as I still don't have a great lead for the android issues. I'll be trying to get a slave to test on directly, and see what's up.

The linux modifications seem odd, as it's necessitated by that we seem to be tearing down some WebGLContexts' GLContext before cleaning up orphaned WebGL objects. It's really important to clean up, as shared objects live as long as the share group, so killing the context alone is not enough to free them.

I think we should move to a regime where GL objects from GLContexts don't ever outlive their contexts. That is, we should track glGen*() calls, and correspondingly call glDelete*() if no one else did, on GLContext teardown. Sharing objects past the lifetime of the usual GLContext would still be possible, but should be done by generating the names for objects on the global shared context.

Mattwoodrow has also notified me that we should add another Layers callback like DidTransactionCallback, and do WebGLContext::PresentFrame in there, instead of in layers. I'm confident I know where everything goes, though, so it should be trivial.
(Assignee)

Comment 48

5 years ago
How embarrassing. It looks like I wasn't bothering to check whether we had the KHR_fence_sync extension before using it. Spinning up a fixed try run.
Bad, bad Zoot!
(Assignee)

Comment 50

5 years ago
The answer is more asserts. Also bug 808253, which is in fact 'more asserts'.
(Assignee)

Comment 51

5 years ago
Here was the try run: https://tbpl.mozilla.org/?tree=Try&rev=ba4cb3a09fc6

Still failing webgl reftests, apparently in gfxUtils::PremultiplyImageSurface. It's probably my fault though. After R2 is clean, I'll run another full set of reftests for all the platforms to check if anything else is up.

I also still need to clean out the printfs and friends.

For reviews, I am thinking bjacob, benwa(layers), and bas(d3d). Feedback from nrc and mattwoodrow? Is there anyone else who should be shooting this full of holes?
We're marking this bug with the C1 milestone since it follows the criteria of "unfinished feature work" (see https://etherpad.mozilla.org/b2g-convergence-schedule).

If this work is not finished by Nov19, this bug will need an exception and will be called out at the upcoming Exec Review.
Target Milestone: --- → B2G C1 (to 19nov)
(Reporter)

Updated

4 years ago
Duplicate of this bug: 806207
(Assignee)

Updated

4 years ago
Blocks: 806207
(Assignee)

Updated

4 years ago
Blocks: 811439
(Assignee)

Comment 54

4 years ago
Created attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

I don't need a full review from everyone:

mattwoodrow: Check what I've done for the callback.
bas: D3D stuff \o/
BenWa: GL Layers and OMTC
bjacob: We should probably talk through this.
Attachment #667748 - Attachment is obsolete: true
Attachment #667748 - Flags: feedback?(vladimir)
Attachment #682285 - Flags: review?(ncameron)
Attachment #682285 - Flags: review?(bjacob)
Attachment #682285 - Flags: review?(bgirard)
Attachment #682285 - Flags: review?(bas)
Attachment #682285 - Flags: feedback?(matt.woodrow)
(Assignee)

Comment 55

4 years ago
nrc: Couldn't decide between feedback and review for you. Since I'm changing how we toss around GLContext frames, it might be a good idea to talk through this.
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SharedSurfaceANGLE.cpp
@@ +195,5 @@
> +        goto CleanUpIfFailed;
> +
> +    failed = false;
> +
> +CleanUpIfFailed:

nit: Can you remove IfFailed here since you also run this when you didn't fail.

::: gfx/layers/d3d10/CanvasLayerD3D10.cpp
@@ +42,5 @@
> +    ScreenBuffer* screen = mGLContext->Screen();
> +    SurfaceStreamType streamType =
> +        SurfaceStream::ChooseGLStreamType(SurfaceStream::MainThread,
> +                                          screen->PreserveBuffer());
> +    SurfaceFactory_GL* factory = nullptr;

I prefer this being a refptr rather than the explicit ownership transfer. I feel there's a less of a chance of future modifications causing us to accidentally leak this.

@@ +138,5 @@
> +
> +        uint8_t* pixels = image.Data();
> +        printf_stderr("Pixels: [[%08x, %08x], [%08x, %08x]].\n",
> +                      pixels[0], pixels[1], pixels[2], pixels[3]);
> +        */

This probably shouldn't be here :)

@@ +149,5 @@
> +        SharedSurface_Basic* shareSurf = SharedSurface_Basic::Cast(surf);
> +        // WebGL reads entire surface.
> +        D3D10_MAPPED_TEXTURE2D map;
> +
> +        HRESULT hr = mTexture->Map(0, D3D10_MAP_WRITE_DISCARD, 0, &map);

This should probably create a new IMMUTABLE texture with initialization data. You might want to test this for performance. If dynamic is better we should stick with that.

Dynamic is good when changing a texture roughly once a frame, and practically never more often. Immutable is better if you expect the content to regularly remain constant for a while.

::: gfx/layers/d3d9/CanvasLayerD3D9.cpp
@@ +92,5 @@
>  
>      D3DLOCKED_RECT r = textureLock.GetLockRect();
>  
> +    gfxImageSurface* frameData = shareSurf->GetData();
> +    uint8_t* pixels = frameData->Data();

Ideally assert there's a 4 byte-per-pixel image format here.
Attachment #682285 - Flags: review?(bas) → review+
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/Layers.h
@@ +1470,5 @@
>  
>    /**
> +   * Register a callback to be called at the before each transaction.
> +   */
> +  typedef void PreTransactionCallback(void* closureData);

(* PreTransactionCallback)

@@ +1471,5 @@
>    /**
> +   * Register a callback to be called at the before each transaction.
> +   */
> +  typedef void PreTransactionCallback(void* closureData);
> +  void Set_PreTransactionCallback(PreTransactionCallback* callback, void* closureData)

I would prefer to have this match the DidTransactionCallback in both naming and style.

SetWillDoTransactionCallback() etc, or SetPreTransactionCallback and rename the other to SetPostTransactionCallback.

@@ +1477,5 @@
> +    mPreTransCallback = callback;
> +    mPreTransCallbackData = closureData;
> +  }
> +
> +  void Fire_PreTransactionCallback()

Can this be under protected: like FireDidTransactionCallback? As above naming, and removing the _
(Assignee)

Comment 58

4 years ago
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

To nical for review of non-d3d canvas layers.
Got the feedback I needed from mattwoodrow.
Attachment #682285 - Flags: review?(nical.bugzilla)
Attachment #682285 - Flags: review?(ncameron)
Attachment #682285 - Flags: feedback?(matt.woodrow)
To help reviewing here is some documentation from talking with Jeff.

There are three new classes there: ScreenBuffer, SurfaceStream, SharedSurface.

ScreenBuffer is the abstraction for the "default framebuffer" used by an offscreen GLContext. Since it's only for offscreen GLContext's, it's only useful for things like WebGL, and is NOT used by the compositor's GLContext. Remember that GLContext provides an abstraction so that even if you are a WebGLContext doing rendering into a non-default framebuffer object, that is still exposed to you as "Framebuffer 0". The new ScreenBuffer class takes the logic handling that out of class GLContext. ScreenBuffer is un-specialized.

SharedSurface abstracts an actual surface (can be a GL texture, but not necessarily) that handles sharing. Its specializations are:
  SharedSurface_Basic (client-side bitmap, does readback)
  SharedSurface_GLTexture
  SharedSurface_EGLImage
  SharedSurface_ANGLEShareHandle

SurfaceStream is where we handle swapping/blitting. Its specializations are:
  SurfaceStream_SingleBuffer
  SurfaceStream_TripleBuffer
  SurfaceStream_TripleBuffer_Copy

In addition there is SurfaceFactory which produces new |SharedSurface|s of the same kind. SurfaceFactory has specializations matching the specializations of SharedSurface.


The overall diagram around compositing WebGL frames looks like this:

WebGLContext

  |
  |  owns a
  V

GLContext

  |
  |  owns a
  V

ScreenBuffer ----------------------------------------------
                                                          |
  |                                                       |
  |  owns a                                               | owns a
  V                                                       V

SurfaceStream  <------------------------------------ SurfaceFactory
                 creates new surface as a fall-back
  |
  |  Swaps buffers.
  |  In OMTC case, the inter-thread communication/syncing happens here.
  |
  |                 (Content)
--+----------------------------------------------------------------------------
  |                 (Compositor)
  V
SurfaceStream

  |
  |  Obtains a
  V

SharedSurface

  |
  |  Is composited by
  V

Layers


In class SurfaceStream, the swapping is done by two functions: SwapProducer and SwapConsumer.
In general:
SwapProducer swaps away the current production buffer, and returns a new buffer to use for production.
SwapConsumer yields the most recent completed frame, if there is one.

For example, for triple-buffering, there are three buffers: the Producer buffer, the Staging buffer and the Consumer buffer.

SwapProducer:
1. Moves the Producer buffer into the Staging buffer
2. Tries to find a suitable buffer in a so-called Scraps area, and move it into the Producer buffer
3. If that fails, creates a new Producer buffer from the SurfaceFactory and moves it into the Producer buffer

SwapConsumer:
1. Moves the Consumer buffer into Scraps
2. Moves the Staging buffer into the Consumer buffer
3. Moves null into the Staging buffer
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/Makefile.in
@@ +32,5 @@
> +	SharedSurfaceEGL.h \
> +	SurfaceFactory.h \
> +	SurfaceStream.h \
> +	TempAsserts.h \
> +	SurfaceTypes.h \

Not that I really care, but I have been asked to place the items in alphabetcal order in some of my own patches.

::: gfx/gl/SharedSurface.h
@@ +19,5 @@
> +class SurfaceFactory;
> +
> +class SharedSurface
> +{
> +    //NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SharedSurface)

comented out code

::: gfx/gl/SharedSurfaceANGLE.cpp
@@ +36,5 @@
> +{
> +    aAttrs.Clear();
> +
> +#define A1(_x)      do { aAttrs.AppendElement(_x); } while (0)
> +#define A2(_x,_y)   do { A1(_x); A1(_y); } while (0)

Perhaps you should statically assert that A1 and A2 are not defined, or just use names that extremely unlikely to have be used as maccro somewhere else.

::: gfx/gl/SharedSurfaceANGLE.h
@@ +5,5 @@
> +
> +#ifndef SHARED_SURFACE_ANGLE_H_
> +#define SHARED_SURFACE_ANGLE_H_
> +
> +#include "GLContext.h"

Do you need this include?
It looks like a forward declaration would be sufficient here.

@@ +10,5 @@
> +#include "SharedSurfaceGL.h"
> +#include "SurfaceFactory.h"
> +#include "GLLibraryEGL.h"
> +#include "SurfaceTypes.h"
> +#include "GLContextProvider.h"

Same thing here, do you need this include here?

@@ +85,5 @@
> +    }
> +
> +    virtual bool WaitSync() {
> +        return true;
> +    }

Please document why we don't do any wait here

::: gfx/gl/SharedSurfaceGL.h
@@ +145,5 @@
> +    virtual void Fence();
> +
> +    virtual bool WaitSync() {
> +        return true;
> +    }

please add a comment to explain why we don't need to wait

::: gfx/gl/SurfaceTypes.h
@@ +28,5 @@
> +        Preserve  = 1 << 5
> +    };
> +};
> +typedef uint8_t CapBitMask;
> +*/

commented out code

::: gfx/layers/ipc/ShadowLayers.h
@@ +294,5 @@
>  
>    void DestroySharedSurface(SurfaceDescriptor* aSurface);
>  
> +
> +

whitespace change

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +153,5 @@
> +    NS_ASSERTION(mGLContext->IsOffscreen(), "Canvas GLContext must be offscreen.");
> +    mIsGLAlphaPremult = aData.mIsGLAlphaPremult;
> +    mNeedsYFlip = true;
> +
> +    printf_stderr("[OGL Layers, MTC] WebGL layer init.\n");

debug printf to remove

@@ +219,5 @@
> +      mLayerProgram = surf->HasAlpha() ? RGBALayerProgramType
> +                                       : RGBXLayerProgramType;
> +      switch (surf->Type()) {
> +        case SharedSurfaceType::Basic: {
> +          printf_stderr("Using Readback...\n");

debug printf to remove

@@ +227,5 @@
> +        }
> +        case SharedSurfaceType::GLTextureShare: {
> +          SharedSurface_GLTexture* textureSurf = SharedSurface_GLTexture::Cast(surf);
> +          mTexture = textureSurf->Texture();
> +          printf_stderr("Using GLTextureShare: %d\n", mTexture);

debug printf to remove

@@ +455,5 @@
>      *aNewBack = mFrontBufferDescriptor;
>      mFrontBufferDescriptor = aNewFront;
>      mNeedsYFlip = needYFlip;
> +  } else if (IsValidSurfaceStreamDescriptor(frontDesc)) {
> +    printf_stderr("Swapping SurfaceStreamDesc.\n");

printf

@@ +564,5 @@
> +  if (IsValidSurfaceStreamDescriptor(mFrontBufferDescriptor)) {
> +    printf_stderr("ShadowCanvasLayerOGL got a SurfaceStream.\n");
> +    const SurfaceStreamDescriptor& streamDesc =
> +        mFrontBufferDescriptor.get_SurfaceStreamDescriptor();
> +    printf_stderr("got a SurfaceStreamDescriptor.\n");

printf

@@ +567,5 @@
> +        mFrontBufferDescriptor.get_SurfaceStreamDescriptor();
> +    printf_stderr("got a SurfaceStreamDescriptor.\n");
> +    surfStream = SurfaceStream::FromHandle(streamDesc.handle());
> +    MOZ_ASSERT(surfStream);
> +    printf_stderr("got surfStream: 0x%08x!\n", (uint32_t)(uintptr_t)surfStream);

printf

@@ +570,5 @@
> +    MOZ_ASSERT(surfStream);
> +    printf_stderr("got surfStream: 0x%08x!\n", (uint32_t)(uintptr_t)surfStream);
> +
> +    sharedSurf = surfStream->SwapConsumer();
> +    printf_stderr("got sharedSurf!\n");

printf

@@ +577,5 @@
> +
> +      gfxImageSurface* toUpload = nullptr;
> +      switch (sharedSurf->Type()) {
> +        case SharedSurfaceType::GLTextureShare: {
> +          printf_stderr("ShadowCanvasLayerOGL with GLTextureShare\n");

some more printfs, i'll stop pointing them it's becoming noisy

@@ +617,5 @@
> +                              LOCAL_GL_RGBA,
> +                              2, 2, 0,
> +                              LOCAL_GL_RGBA, LOCAL_GL_UNSIGNED_BYTE,
> +                              test);
> +            */

you've got a big chunk of commented code here

::: gfx/layers/opengl/CanvasLayerOGL.h
@@ +37,4 @@
>  #endif
>    { 
> +    mImplData = static_cast<LayerOGL*>(this);
> +    mForceReadback = Preferences::GetBool("webgl.force-layers-readback", false);

Not too important in this case, but I prefer to see this kind of methods that add a dependency (Preferences.h) implemented in the .cpp.
Here it's okay-ish though since CanvasLayerOGL.h is only included twice and it was ther before your patch anyway.

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +754,5 @@
>        if (newID != mImageContainerID) {
>          mImageContainerID = newID;
>          mImageVersion = 0;
>        }
> +    /*

The code here should not be commented out
Can this feedback be accommodated in time to close this out for C1 milestone?  The end of C1 milestone is today!
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

Here's my partial review. I'll add more this week.
-Various printf to remove, I'll let you search and replace them.
-Path summary is wrong.

::: gfx/gl/GLContext.cpp
@@ +529,5 @@
> +
> +        // We're ready for final setup.
> +        InitFramebuffers();
> +
> +        bool lol = true;

Test var.

@@ +856,5 @@
>  {
>      GLFormats formats;
>  
>      // If we're on ES2 hardware and we have an explicit request for 16 bits of color or less
>      // OR we don't support full 8-bit color, return a 4444 or 565 format.

What if we have an explicit request for 16 bit color and we're not on ES2? Looks like the code will silently ignore this.

@@ +922,3 @@
>      }
>  
> +    formats.stencil = LOCAL_GL_STENCIL_INDEX8;

This should be moved above instead of init it to 0 before for readability.

@@ +1130,5 @@
>                                   LOCAL_GL_RENDERBUFFER,
>                                   colorMSRB);
>      } else {
> +        drawFB = readFB;
> +        // drawFB==readFB is already bound from the 'if (texture)' block.

Nit: I feel like this is a dangerous assumption since you relying on the if (!colorMSRB && !texture) + if (texture) + else of if (colorMSRB). I agree this is VALID but as soon as someone touch this code it will break. Perhaps you should introduce 'bool readFBBound' that you set in if (texture) and let the compiler perform the static analysis for you and optimize it away. It should make this code more refactor friendly.

@@ +1174,5 @@
>  
> +    if (drawFB_out)
> +        *drawFB_out = drawFB;
> +    else
> +        MOZ_ASSERT(!drawFB);

nit: I wouldn't object to making these runtime aborts. I'd rather have this show up in our nightly crash reports.

@@ +1192,4 @@
>  {
> +    MOZ_ASSERT(mScreen);
> +
> +    // Note that this will not *always* publish a frame.

Useful comment but it leaves the render wondering 'When?'. Can you elaborate?

::: gfx/gl/GLContext.h
@@ +3091,5 @@
>      }
>  };
>  
> +
> +class TextureScrapper {

This appears to work like a Pool or as we use in Mozilla a RecycleBin. Please rename it to TextureRecycleBin or put a block comment explaining how this class differs from a RecycleBin.
Also let's land bjacob ASCII explanation with this patch.
(Assignee)

Comment 64

4 years ago
Agreed.
I'll add this stuff to my review diff from this main patch.

This also seems to be working on b2g locally, pending landing of bug 813278.
Depends on: 813278
(Assignee)

Comment 65

4 years ago
(In reply to Benoit Girard (:BenWa) from comment #62)
> Comment on attachment 682285 [details] [diff] [review]
> nearly-complete patch. Seems to just be failing bpp16 stuff.
> 
> Review of attachment 682285 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Here's my partial review. I'll add more this week.
> -Various printf to remove, I'll let you search and replace them.
Indeed, they were staying as long as I was still doing testing.
> -Path summary is wrong.
I don't know what you mean by this.
> 
> ::: gfx/gl/GLContext.cpp
> @@ +529,5 @@
> > +
> > +        // We're ready for final setup.
> > +        InitFramebuffers();
> > +
> > +        bool lol = true;
> 
> Test var.
Hah, indeed.
> 
> @@ +856,5 @@
> >  {
> >      GLFormats formats;
> >  
> >      // If we're on ES2 hardware and we have an explicit request for 16 bits of color or less
> >      // OR we don't support full 8-bit color, return a 4444 or 565 format.
> 
> What if we have an explicit request for 16 bit color and we're not on ES2?
> Looks like the code will silently ignore this.
Yes. We figure that it's never going to improve performance on desktop, so it's not worth implementing. Incidentally, we get to test this on Windows with ANGLE anyways, though.
> 
> @@ +922,3 @@
> >      }
> >  
> > +    formats.stencil = LOCAL_GL_STENCIL_INDEX8;
> 
> This should be moved above instead of init it to 0 before for readability.
> 
> @@ +1130,5 @@
> >                                   LOCAL_GL_RENDERBUFFER,
> >                                   colorMSRB);
> >      } else {
> > +        drawFB = readFB;
> > +        // drawFB==readFB is already bound from the 'if (texture)' block.
> 
> Nit: I feel like this is a dangerous assumption since you relying on the if
> (!colorMSRB && !texture) + if (texture) + else of if (colorMSRB). I agree
> this is VALID but as soon as someone touch this code it will break. Perhaps
> you should introduce 'bool readFBBound' that you set in if (texture) and let
> the compiler perform the static analysis for you and optimize it away. It
> should make this code more refactor friendly.
Sounds good.
> 
> @@ +1174,5 @@
> >  
> > +    if (drawFB_out)
> > +        *drawFB_out = drawFB;
> > +    else
> > +        MOZ_ASSERT(!drawFB);
> 
> nit: I wouldn't object to making these runtime aborts. I'd rather have this
> show up in our nightly crash reports.
Alright.
> 
> @@ +1192,4 @@
> >  {
> > +    MOZ_ASSERT(mScreen);
> > +
> > +    // Note that this will not *always* publish a frame.
> 
> Useful comment but it leaves the render wondering 'When?'. Can you elaborate?
Sure.
> 
> ::: gfx/gl/GLContext.h
> @@ +3091,5 @@
> >      }
> >  };
> >  
> > +
> > +class TextureScrapper {
> 
> This appears to work like a Pool or as we use in Mozilla a RecycleBin.
> Please rename it to TextureRecycleBin or put a block comment explaining how
> this class differs from a RecycleBin.
It differs in that we don't attempt to reuse anything in here. I could call it 'GarbageBin' though.
(Assignee)

Comment 66

4 years ago
(In reply to Faramarz (:faramarz) from comment #61)
> Can this feedback be accommodated in time to close this out for C1
> milestone?  The end of C1 milestone is today!

No. Inside the next week is likely, but still this by itself doesn't have any perf improvement for b2g. B2G perf would be addressed in a follow up patch. There are roughly two copies which can be avoided fairly easily, and an additional third copy that could be skipped by clever use of gralloc-backed SharedSurfaces. Whether this work blocks or not depends on the priority of performant WebGL. If someone measures how long it takes to copy a screen-sized buffer on B2G, that will tell us what sort of perf hit we're taking.
Frustrating and sad that this missed the boat, but we don't have specific PRs for webgl other than "fast", so we'll just have to recalibrate that.
blocking-basecamp: + → -
blocking-kilimanjaro: --- → +
(Assignee)

Updated

4 years ago
Blocks: 776796
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, the new classes could use some doc comments.
You have to cleanup the commented out code and printfs. Also, ShadowLayers.h doesn't need to be part of this changeset for one whitespace change.
As far as the actual way the patch works, it looks good to me, although parts of it overlap with the layers refactoring, so when the refactoring lands we'll probably change a few things to integrate your stuff.
r=me with the stuff I pointed today and last weekend

::: gfx/gl/SurfaceStream.h
@@ +8,5 @@
> +
> +#include <stack>
> +#include <set>
> +#include "mozilla/Mutex.h"
> +//#include "nsISupportsImpl.h"

commented out code

@@ +18,5 @@
> +    namespace gfx {
> +        class SharedSurface;
> +        class SurfaceFactory;
> +    }
> +}

nit: you can move these two forward declarations below and remove the extra namespace scopes (you are closing mozilla::gfx and reopening it right after).

@@ +26,5 @@
> +
> +// Owned by: ScreenBuffer
> +class SurfaceStream
> +{
> +    //NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SurfaceStream)

commented out code

::: gfx/layers/TiledLayerBuffer.h
@@ +353,5 @@
>          int currTileX = floor_div(x - newBufferOrigin.x, GetTileLength());
>          int currTileY = floor_div(y - newBufferOrigin.y, GetTileLength());
>          int index = currTileX * mRetainedHeight + currTileY;
> +        //NS_ABORT_IF_FALSE(!newValidRegion.Intersects(tileRect) ||
> +        NS_ASSERTION(!newValidRegion.Intersects(tileRect) ||

Commented out code, I don't know too much about this code, but are you killing the symptom of a bug here?
Attachment #682285 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

Review of attachment 682285 [details] [diff] [review]:
-----------------------------------------------------------------

Only reviewed WebGL so far; already 2 r-'s, though I might be missing things.

::: content/canvas/src/WebGLContext.cpp
@@ +775,2 @@
>  
> +    /** DidTransactionCallback gets called by the Layers code everytime the WebGL canvas gets composite,

Update this comment! s/DidTransactionCallback/PreTransactionCallback/

@@ +1112,5 @@
> +    if (mDitherEnabled)
> +        gl->fDisable(LOCAL_GL_DITHER);
> +
> +    if (mScissorTestEnabled)
> +        gl->fDisable(LOCAL_GL_SCISSOR_TEST);

I wouldn't add the if (mScissorTestEnabled) there, because if that value were wrong because of a bug, adding  that if  would turn it into a security flaw. Or even better: keep that if, but add a MOZ_ASSERT that it equals what glIsEnabled(LOCAL_GL_SCISSOR_TEST) returns.

@@ +1126,5 @@
> +
> +        if (mColorClearValue[0] != 0.0f ||
> +            mColorClearValue[1] != 0.0f ||
> +            mColorClearValue[2] != 0.0f ||
> +            mColorClearValue[3] != 0.0f)

I doubt that these two if()'s are worth it. If you have measured that they improve performance, OK to keep them but add a comment about that.

@@ +1137,5 @@
> +        if (!mDepthWriteMask)
> +            gl->fDepthMask(1);
> +
> +        if (mDepthClearValue != 1.0f)
> +            gl->fClearDepth(1.0f);

Same with all these if()'s.

More generally I don't understand the need to rewrite this functions as part of this patch. Seems like an orthogonal patch; the present patch is already big enough without that ;-)

R- because this makes this patch significantly harder t.o review (adds significant security risk)

@@ +1270,2 @@
>  
> +    printf_stderr("WebGL[0x%08x]: Presenting screen buffer.\n", (uint32_t)this);

Remove printfs

::: content/canvas/src/WebGLContext.h
@@ +570,5 @@
> +     *
> +     * WebGL Presents its backbuffer less often than Invalidation happens.
> +     */
> +
> +    // A note that this uses 'clean' without having anything to do with mIsScreenDirty.

Write that note

@@ +590,5 @@
>  
>      const WebGLRectangleObject *FramebufferRectangleObject() const;
>  
> +    // This is similar to GLContext::ClearSafely, but tries to minimize the
> +    // amount of work it does.

Again --- why?

::: content/canvas/src/WebGLContextGL.cpp
@@ -590,5 @@
> -                                  mStencilClearValue  == 0;
> -        if (valuesAreDefault &&
> -            mBackbufferClearingStatus == BackbufferClearingStatus::ClearedToDefaultValues)
> -        {
> -            needClearCallHere = false;

The code being removed here was an optimization skipping a WebGL clear() when it was going to be redundant with the backbuffer clearing that we had just performed. For all I can see, this optimization is removed by this patch. r- for that too unless justification --- by default I assume that avoiding a clear() is worthwhile, as it can be many pixels to fill.
Attachment #682285 - Flags: review?(bjacob) → review-
(Assignee)

Comment 70

4 years ago
(In reply to Nicolas Silva [:nical] from comment #68)
> Comment on attachment 682285 [details] [diff] [review]
> nearly-complete patch. Seems to just be failing bpp16 stuff.
> 
> Review of attachment 682285 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall, the new classes could use some doc comments.
> You have to cleanup the commented out code and printfs. Also, ShadowLayers.h
> doesn't need to be part of this changeset for one whitespace change.
> As far as the actual way the patch works, it looks good to me, although
> parts of it overlap with the layers refactoring, so when the refactoring
> lands we'll probably change a few things to integrate your stuff.
> r=me with the stuff I pointed today and last weekend
Yep.
Hah, ok, ShadowLayers.h sounds like an orphaned fix.
> 
> ::: gfx/gl/SurfaceStream.h
> @@ +8,5 @@
> > +
> > +#include <stack>
> > +#include <set>
> > +#include "mozilla/Mutex.h"
> > +//#include "nsISupportsImpl.h"
> 
> commented out code
> 
> @@ +18,5 @@
> > +    namespace gfx {
> > +        class SharedSurface;
> > +        class SurfaceFactory;
> > +    }
> > +}
> 
> nit: you can move these two forward declarations below and remove the extra
> namespace scopes (you are closing mozilla::gfx and reopening it right after).
I much prefer doing forward declarations all in one place. I would prefer to keep them wholly separate from the code. To that end, I pulled them all the way out of the namespaces we use for the code. I think this makes it more clear that we're forward-declaring them, and makes it more similar to includes, since that's what we're replacing.
> 
> @@ +26,5 @@
> > +
> > +// Owned by: ScreenBuffer
> > +class SurfaceStream
> > +{
> > +    //NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SurfaceStream)
> 
> commented out code
> 
> ::: gfx/layers/TiledLayerBuffer.h
> @@ +353,5 @@
> >          int currTileX = floor_div(x - newBufferOrigin.x, GetTileLength());
> >          int currTileY = floor_div(y - newBufferOrigin.y, GetTileLength());
> >          int index = currTileX * mRetainedHeight + currTileY;
> > +        //NS_ABORT_IF_FALSE(!newValidRegion.Intersects(tileRect) ||
> > +        NS_ASSERTION(!newValidRegion.Intersects(tileRect) ||
> 
> Commented out code, I don't know too much about this code, but are you
> killing the symptom of a bug here?
I'll look into this again. I believe it was an assertion I was hitting after a rebase at some point. I don't remember if it was spurious, since-fixed, or something I need to look into more.
Comment on attachment 682285 [details] [diff] [review]
nearly-complete patch. Seems to just be failing bpp16 stuff.

I gave some early feedback. I'm waiting for some new changes to take another look.
Attachment #682285 - Flags: review?(bgirard) → review-
(Assignee)

Comment 72

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #69)
> Comment on attachment 682285 [details] [diff] [review]
> nearly-complete patch. Seems to just be failing bpp16 stuff.
> 
> Review of attachment 682285 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Only reviewed WebGL so far; already 2 r-'s, though I might be missing things.
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +775,2 @@
> >  
> > +    /** DidTransactionCallback gets called by the Layers code everytime the WebGL canvas gets composite,
> 
> Update this comment! s/DidTransactionCallback/PreTransactionCallback/
Done.
> 
> @@ +1112,5 @@
> > +    if (mDitherEnabled)
> > +        gl->fDisable(LOCAL_GL_DITHER);
> > +
> > +    if (mScissorTestEnabled)
> > +        gl->fDisable(LOCAL_GL_SCISSOR_TEST);
> 
> I wouldn't add the if (mScissorTestEnabled) there, because if that value
> were wrong because of a bug, adding  that if  would turn it into a security
> flaw. Or even better: keep that if, but add a MOZ_ASSERT that it equals what
> glIsEnabled(LOCAL_GL_SCISSOR_TEST) returns.
We check this below in an #ifdef DEBUG block. For these, we should probably be using IsEnabled instead of GetInteger, though.
> 
> @@ +1126,5 @@
> > +
> > +        if (mColorClearValue[0] != 0.0f ||
> > +            mColorClearValue[1] != 0.0f ||
> > +            mColorClearValue[2] != 0.0f ||
> > +            mColorClearValue[3] != 0.0f)
> 
> I doubt that these two if()'s are worth it. If you have measured that they
> improve performance, OK to keep them but add a comment about that.
We should really be setting these naively, and caching behind the scenes, if we want to go this route, agreed.
> 
> @@ +1137,5 @@
> > +        if (!mDepthWriteMask)
> > +            gl->fDepthMask(1);
> > +
> > +        if (mDepthClearValue != 1.0f)
> > +            gl->fClearDepth(1.0f);
> 
> Same with all these if()'s.
> 
> More generally I don't understand the need to rewrite this functions as part
> of this patch. Seems like an orthogonal patch; the present patch is already
> big enough without that ;-)
> 
> R- because this makes this patch significantly harder t.o review (adds
> significant security risk)
> 
> @@ +1270,2 @@
> >  
> > +    printf_stderr("WebGL[0x%08x]: Presenting screen buffer.\n", (uint32_t)this);
> 
> Remove printfs
Printfs are coming out now that we're clean on try.
> 
> ::: content/canvas/src/WebGLContext.h
> @@ +570,5 @@
> > +     *
> > +     * WebGL Presents its backbuffer less often than Invalidation happens.
> > +     */
> > +
> > +    // A note that this uses 'clean' without having anything to do with mIsScreenDirty.
> 
> Write that note
That is the note. ("A note that X" being similar to "Note: X") I'll clear this up, though.
> 
> @@ +590,5 @@
> >  
> >      const WebGLRectangleObject *FramebufferRectangleObject() const;
> >  
> > +    // This is similar to GLContext::ClearSafely, but tries to minimize the
> > +    // amount of work it does.
> 
> Again --- why?
In WebGL land, we know all our state, so we don't have to first query the state so that we can restore it properly later, like ClearSafely needs to. I expanded the comment to this effect.
> 
> ::: content/canvas/src/WebGLContextGL.cpp
> @@ -590,5 @@
> > -                                  mStencilClearValue  == 0;
> > -        if (valuesAreDefault &&
> > -            mBackbufferClearingStatus == BackbufferClearingStatus::ClearedToDefaultValues)
> > -        {
> > -            needClearCallHere = false;
> 
> The code being removed here was an optimization skipping a WebGL clear()
> when it was going to be redundant with the backbuffer clearing that we had
> just performed. For all I can see, this optimization is removed by this
> patch. r- for that too unless justification --- by default I assume that
> avoiding a clear() is worthwhile, as it can be many pixels to fill.
This is handled by PresentScreenBuffer() now, where it early-outs if nothing needs to change.
(Assignee)

Comment 73

4 years ago
Created attachment 685423 [details] [diff] [review]
patch with working 16bpp path

Here's the new version of the base patch, with my 16bpp fixes. I accidentally folded it in. Oops. I'll point out what I changed to...bjacob, I guess, if you have time tomorrow or so?
Attachment #682285 - Attachment is obsolete: true
Attachment #685423 - Flags: review?(bjacob)
(Assignee)

Comment 74

4 years ago
Created attachment 685424 [details] [diff] [review]
interdiff of responses to review comments

Back over to bjacob and bgirard.
Attachment #685424 - Flags: review?(bjacob)
Attachment #685424 - Flags: review?(bgirard)
(Assignee)

Comment 75

4 years ago
This is currently based on top of bug 811181, which is on top of 811958, which hit inbound earlier today.
(Assignee)

Updated

4 years ago
Blocks: 618898
(In reply to Jeff Gilbert [:jgilbert] from comment #73)
> Created attachment 685423 [details] [diff] [review]
> patch with working 16bpp path
> 
> Here's the new version of the base patch, with my 16bpp fixes. I
> accidentally folded it in. Oops. I'll point out what I changed to...bjacob,
> I guess, if you have time tomorrow or so?

Since we are 4 reviewers here, it would be nice if you could post this explanation here.
(Assignee)

Comment 77

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #76)
> (In reply to Jeff Gilbert [:jgilbert] from comment #73)
> > Created attachment 685423 [details] [diff] [review]
> > patch with working 16bpp path
> > 
> > Here's the new version of the base patch, with my 16bpp fixes. I
> > accidentally folded it in. Oops. I'll point out what I changed to...bjacob,
> > I guess, if you have time tomorrow or so?
> 
> Since we are 4 reviewers here, it would be nice if you could post this
> explanation here.

Ok. ReadPixelsIntoImageSurface is no longer trivial. It reads with one of RGBA/UNSIGNED_BYTE, BGRA/UNSIGNED_INT_8_8_8_8_REV, or RGB/UNSIGNED_SHORT_5_6_5_REV, depending on what's available, and converts it to one of ImageFormatARGB32, ImageFormatRGB24, or ImageFormatRGB16_565. Note that because of how we can't rely on anything but RGBA/UNSIGNED_BYTE, we have to support cases where the caller wants to read into a ImageFormatRGB16_565, but we can only ReadPixels with RGBA/UNSIGNED_BYTE. This means after we ReadPixels in, we have to swap the R and B channels (making the format ImageFormatRGB24), then use gfxContext to blit into our destination ImageFormatRGB16_565 surface.

It also slightly complicates readback paths in Layers, as we generally only work with ImageFormatRGB24/ARGB32 there. (From what I've seen)
(Assignee)

Comment 78

4 years ago
Created attachment 685801 [details] [diff] [review]
combined patch

Here's the combined patch.
Comment on attachment 685801 [details] [diff] [review]
combined patch

Review of attachment 685801 [details] [diff] [review]:
-----------------------------------------------------------------

I focused on the layers changed like asked minus D3D which Bas reviewed.

::: gfx/gl/Makefile.in
@@ +21,3 @@
>  	GLContextProvider.h \
>  	GLContextProviderImpl.h \
> +  GLContextSymbols.h \

You need to use TABs in Makefile

::: gfx/gl/SharedSurface.cpp
@@ +15,5 @@
> +// can be temporarily unlocked if needed.
> +void
> +SharedSurface::Copy(SharedSurface* src, SharedSurface* dest, SurfaceFactory* factory)
> +{
> +    printf_stderr("SS::Copy.\n");

printf

::: gfx/gl/SharedSurface.h
@@ +67,5 @@
> +    // Fence may be called multiple times, and should be very light.
> +    virtual void Fence() = 0;
> +    virtual bool WaitSync() = 0;
> +
> +    // Assumes |this| has been Lock()'d:

debug assert

@@ +72,5 @@
> +    // Note that this can be left as a no-op.
> +    virtual void DiscardContents() {}
> +
> +
> +    // Toys for everyone:

That's not a useful comment

::: gfx/gl/SurfaceStream.h
@@ +26,5 @@
> +
> +// Owned by: ScreenBuffer
> +class SurfaceStream
> +{
> +    //NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SurfaceStream)

remove?

::: gfx/gl/SurfaceTypes.h
@@ +15,5 @@
> +
> +typedef uintptr_t SurfaceStreamHandle;
> +
> +/*
> +struct CapabilityBit

This is commented. Should it be removed?

@@ +74,5 @@
> +        return caps;
> +    }
> +};
> +
> +#define TEMP_ENUM_CLASS_BEGIN(Name,Type)\

I'm not a fan of using macro expansion for something like this. It makes searching/indexing the code much harder to save a bit of repetition. I don't think it's worth it. bjacob agrees with me on this one.

::: gfx/gl/TempAsserts.h
@@ +6,5 @@
> +#ifndef TEMP_ASSERTS_H_
> +#define TEMP_ASSERTS_H_
> +
> +#include "mozilla/Attributes.h"
> +#include "mozilla/Assertions.h"

I don't think this belong in GFX code and in this patch.

::: gfx/layers/Layers.h
@@ +1476,5 @@
>      return mDirty; 
>    }
>  
>    /**
> +   * Register a callback to be called at the before each transaction.

s/before/start of/ or similar.

::: gfx/layers/basic/BasicCanvasLayer.cpp
@@ +107,5 @@
>      mNeedsYFlip = true;
> +    MOZ_ASSERT(mGLContext->IsOffscreen(), "canvas gl context isn't offscreen");
> +
> +    // Screen morphing is done in BasicShadowableCanvasLayer.
> +    // [Basic Layers, MTC] WebGL layer init.

I don't understand this comment. Can you reword it? MTC=MainThreadCompositing? Screen morphing?

@@ +379,5 @@
> +          if (!isCrossProcess) {
> +            // [Basic/OGL Layers, OMTC] WebGL layer init.
> +            factory = SurfaceFactory_EGLImage::Create(mGLContext, screen->Caps());
> +          } else {
> +            // [Basic/OGL Layers, OMPC] WebGL layer init.

This should be a runtime abort. Normally we use OOP for cross process not OMP. Might be worth spelling out rather then defining a new abbreviation.

@@ +422,5 @@
>        BasicManager()->GetParentBackendType() == mozilla::layers::LAYERS_OPENGL)
>    {
> +    bool isCrossProcess = XRE_GetProcessType() != GeckoProcessType_Default;
> +    // Todo: If isCrossProcess (OMPC), spin off mini-thread to RequestFrame
> +    // and forward Gralloc handle. Signal WaitSync complete with XPC mutex?

Are you planning on doing this TODO before landing the patch? If not we like want to runtime abort in paths that are incomplete rather then look debug why something isn't rendering.

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +234,5 @@
> +          MOZ_NOT_REACHED("Unacceptable SharedSurface type.");
> +          return;
> +      }
> +    } else {
> +      // No surf, so give it something blank.

This is neat but I'm not sure if this is better then just letting the background show through. What's your reasoning for doing this? Does this actually occur normally?

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ -989,5 @@
>      mTexImage->BeginTileIteration();
>  
>      if (gl()->CanUploadNonPowerOfTwo()) {
>        do {
> -        TextureImage::ScopedBindTextureAndApplyFilter texBind(mTexImage, LOCAL_GL_TEXTURE0);

I don't like changing GL_TEXTURE0 -> 0. This is different from OpenGL and hurts readability. The readers has to look up the constructor for the definition of the second parameter where it was previous obvious. I don't see the benefit in making this change.

::: xpcom/glue/nsCycleCollectionTraversalCallback.h
@@ -58,5 @@
>  
>    uint32_t mFlags;
>  };
>  
> -#endif // nsCycleCollectionTraversalCallback_h__
\ No newline at end of file

\ No newline at end of file

Accidental change.
Attachment #685801 - Flags: review-

Updated

4 years ago
Attachment #685424 - Flags: review?(bgirard)
(Assignee)

Comment 80

4 years ago
(In reply to Benoit Girard (:BenWa) from comment #79)
> Comment on attachment 685801 [details] [diff] [review]
> combined patch
> 
> Review of attachment 685801 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I focused on the layers changed like asked minus D3D which Bas reviewed.
> 
> ::: gfx/gl/Makefile.in
> @@ +21,3 @@
> >  	GLContextProvider.h \
> >  	GLContextProviderImpl.h \
> > +  GLContextSymbols.h \
> 
> You need to use TABs in Makefile
Oops.
> 
> ::: gfx/gl/SharedSurface.cpp
> @@ +15,5 @@
> > +// can be temporarily unlocked if needed.
> > +void
> > +SharedSurface::Copy(SharedSurface* src, SharedSurface* dest, SurfaceFactory* factory)
> > +{
> > +    printf_stderr("SS::Copy.\n");
> 
> printf
> 
> ::: gfx/gl/SharedSurface.h
> @@ +67,5 @@
> > +    // Fence may be called multiple times, and should be very light.
> > +    virtual void Fence() = 0;
> > +    virtual bool WaitSync() = 0;
> > +
> > +    // Assumes |this| has been Lock()'d:
> 
> debug assert
Nothing implements this, so I will remove it for now.
I'll add checks for locking semantics, though.
> 
> @@ +72,5 @@
> > +    // Note that this can be left as a no-op.
> > +    virtual void DiscardContents() {}
> > +
> > +
> > +    // Toys for everyone:
> 
> That's not a useful comment
> 
> ::: gfx/gl/SurfaceStream.h
> @@ +26,5 @@
> > +
> > +// Owned by: ScreenBuffer
> > +class SurfaceStream
> > +{
> > +    //NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SurfaceStream)
> 
> remove?
> 
> ::: gfx/gl/SurfaceTypes.h
> @@ +15,5 @@
> > +
> > +typedef uintptr_t SurfaceStreamHandle;
> > +
> > +/*
> > +struct CapabilityBit
> 
> This is commented. Should it be removed?
Yes.
> @@ +74,5 @@
> > +        return caps;
> > +    }
> > +};
> > +
> > +#define TEMP_ENUM_CLASS_BEGIN(Name,Type)\
> 
> I'm not a fan of using macro expansion for something like this. It makes
> searching/indexing the code much harder to save a bit of repetition. I don't
> think it's worth it. bjacob agrees with me on this one.
Switched this to the existing MOZ_BEGIN/END_ENUM_CLASS. I somewhat strongly disagree. When verboseness gets in the way of legibility, for code-comprehension's sake, I think they're worth it. Especially given how this error-prone code would need to be replicated multiple times, I think it's best to keep it in macros.
> 
> ::: gfx/gl/TempAsserts.h
> @@ +6,5 @@
> > +#ifndef TEMP_ASSERTS_H_
> > +#define TEMP_ASSERTS_H_
> > +
> > +#include "mozilla/Attributes.h"
> > +#include "mozilla/Assertions.h"
> 
> I don't think this belong in GFX code and in this patch.
Nope, it's derelict, and now deleted.
> 
> ::: gfx/layers/Layers.h
> @@ +1476,5 @@
> >      return mDirty; 
> >    }
> >  
> >    /**
> > +   * Register a callback to be called at the before each transaction.
> 
> s/before/start of/ or similar.
> 
> ::: gfx/layers/basic/BasicCanvasLayer.cpp
> @@ +107,5 @@
> >      mNeedsYFlip = true;
> > +    MOZ_ASSERT(mGLContext->IsOffscreen(), "canvas gl context isn't offscreen");
> > +
> > +    // Screen morphing is done in BasicShadowableCanvasLayer.
> > +    // [Basic Layers, MTC] WebGL layer init.
> 
> I don't understand this comment. Can you reword it?
> MTC=MainThreadCompositing? Screen morphing?
Yes, but I suppose "non-OMTC" is easier to understand?
(GL)Screen(Buffer) morphing is where we adapt the 'Screen' of the GLContext to accelerate compositing. Reworded the comment.
> 
> @@ +379,5 @@
> > +          if (!isCrossProcess) {
> > +            // [Basic/OGL Layers, OMTC] WebGL layer init.
> > +            factory = SurfaceFactory_EGLImage::Create(mGLContext, screen->Caps());
> > +          } else {
> > +            // [Basic/OGL Layers, OMPC] WebGL layer init.
> 
> This should be a runtime abort. Normally we use OOP for cross process not
> OMP. Might be worth spelling out rather then defining a new abbreviation.
Changed to OOPC, added comment saying it's "out of process compositing".
This should not be a runtime abort, since this is the branch B2G hits. We just don't define a new factory, instead falling back to readback. (Added a comment to this effect)
> 
> @@ +422,5 @@
> >        BasicManager()->GetParentBackendType() == mozilla::layers::LAYERS_OPENGL)
> >    {
> > +    bool isCrossProcess = XRE_GetProcessType() != GeckoProcessType_Default;
> > +    // Todo: If isCrossProcess (OMPC), spin off mini-thread to RequestFrame
> > +    // and forward Gralloc handle. Signal WaitSync complete with XPC mutex?
> 
> Are you planning on doing this TODO before landing the patch? If not we like
> want to runtime abort in paths that are incomplete rather then look debug
> why something isn't rendering.
No, but it's not incomplete. We just fall back to readback.
> 
> ::: gfx/layers/opengl/CanvasLayerOGL.cpp
> @@ +234,5 @@
> > +          MOZ_NOT_REACHED("Unacceptable SharedSurface type.");
> > +          return;
> > +      }
> > +    } else {
> > +      // No surf, so give it something blank.
> 
> This is neat but I'm not sure if this is better then just letting the
> background show through. What's your reasoning for doing this? Does this
> actually occur normally?
It now relies on having either a texture, or a surface to upload into a texture.
I believe this needs to be MarkDirty'd, though.
> 
> ::: gfx/layers/opengl/ImageLayerOGL.cpp
> @@ -989,5 @@
> >      mTexImage->BeginTileIteration();
> >  
> >      if (gl()->CanUploadNonPowerOfTwo()) {
> >        do {
> > -        TextureImage::ScopedBindTextureAndApplyFilter texBind(mTexImage, LOCAL_GL_TEXTURE0);
> 
> I don't like changing GL_TEXTURE0 -> 0. This is different from OpenGL and
> hurts readability. The readers has to look up the constructor for the
> definition of the second parameter where it was previous obvious. I don't
> see the benefit in making this change.
I really disagree here, and in fact a few lines above, we do SetTextureUnit(0).
GL_TEXTUREn is OpenGL cruft that is best left behind when we abstract away into TextureImages.
Regardless, this should be a separate bug, and this case is left over from when I tried to revert this change. Please point out anywhere else I missed reverting this change.
> 
> ::: xpcom/glue/nsCycleCollectionTraversalCallback.h
> @@ -58,5 @@
> >  
> >    uint32_t mFlags;
> >  };
> >  
> > -#endif // nsCycleCollectionTraversalCallback_h__
> \ No newline at end of file
> 
> \ No newline at end of file
> 
> Accidental change.
This was actually to quiet a warning about missing an EOF newline, but should just be committed separately from this patch.

Comment 81

4 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #80)
> (In reply to Benoit Girard (:BenWa) from comment #79)
> > I don't like changing GL_TEXTURE0 -> 0. This is different from OpenGL and
> > hurts readability. ...
> I really disagree here, and in fact a few lines above, we do
> SetTextureUnit(0).
> GL_TEXTUREn is OpenGL cruft that is best left behind when we abstract away
> into TextureImages.
I just want to point out that GL_TEXTURE0 != 0 so passing 0 means care must be taken elsewhere to pass the correct value to OpenGL.

But, regrettably when setting a sampler2D uniform value you cannot use GL_TEXTURE0, you must use 0 for unit 0 or (GL_TEXTUREn - GL_TEXTURE0) generally.
(Assignee)

Comment 82

4 years ago
(In reply to Mark Callow from comment #81)
> (In reply to Jeff Gilbert [:jgilbert] from comment #80)
> > (In reply to Benoit Girard (:BenWa) from comment #79)
> > > I don't like changing GL_TEXTURE0 -> 0. This is different from OpenGL and
> > > hurts readability. ...
> > I really disagree here, and in fact a few lines above, we do
> > SetTextureUnit(0).
> > GL_TEXTUREn is OpenGL cruft that is best left behind when we abstract away
> > into TextureImages.
> I just want to point out that GL_TEXTURE0 != 0 so passing 0 means care must
> be taken elsewhere to pass the correct value to OpenGL.
> 
> But, regrettably when setting a sampler2D uniform value you cannot use
> GL_TEXTURE0, you must use 0 for unit 0 or (GL_TEXTUREn - GL_TEXTURE0)
> generally.

Indeed. I added an assert that the value we get in here is >= GL_TEXTURE0. If/when we switch away from GL_TEXTURE for TextureImage, we would switch the assert to asserting that val < GL_TEXTURE0.
> > ::: content/canvas/src/WebGLContextGL.cpp
> > @@ -590,5 @@
> > > -                                  mStencilClearValue  == 0;
> > > -        if (valuesAreDefault &&
> > > -            mBackbufferClearingStatus == BackbufferClearingStatus::ClearedToDefaultValues)
> > > -        {
> > > -            needClearCallHere = false;
> > 
> > The code being removed here was an optimization skipping a WebGL clear()
> > when it was going to be redundant with the backbuffer clearing that we had
> > just performed. For all I can see, this optimization is removed by this
> > patch. r- for that too unless justification --- by default I assume that
> > avoiding a clear() is worthwhile, as it can be many pixels to fill.
> This is handled by PresentScreenBuffer() now, where it early-outs if nothing
> needs to change.

As discussed on IRC, this optimization is actually removed by this patch; please add it back in.
(Assignee)

Comment 84

4 years ago
Created attachment 686263 [details] [diff] [review]
supplimental patch for skipping redundent clears
Attachment #686263 - Flags: review?(bjacob)
Comment on attachment 685801 [details] [diff] [review]
combined patch

Review of attachment 685801 [details] [diff] [review]:
-----------------------------------------------------------------

I reviewed in varying degrees of thoroughness:
  WebGL*
  WebGL reftest.list
  GLContext.*
  GLContextProvider.h
  GLScreenBuffer.*
  SharedSurface.*
  SurfaceStream.*
  GLContextTypes.*
  SurfaceFactory.*
  SurfaceTypes.h

I did NOT review:
  specific GLContextProvider implementations (GLX, etc)
  specific SharedSurface implementations (GL, etc)
  Layers stuff

I would like a general design comment about the choice to use plain pointers for everything around SurfaceStream; see below. More generally, please document everything that relates to how we manage objects and avoid leaking them (e.g. my comment in SurfaceFactory).

::: content/canvas/src/WebGLContext.cpp
@@ +1146,5 @@
> +
> +        MOZ_ASSERT(depthWriteMask == mDepthWriteMask);
> +        MOZ_ASSERT(depthClearValue == mDepthClearValue);
> +
> +

2 lines of whitespace here

@@ +1174,5 @@
>      }
>  
>      if (initializeStencilBuffer) {
> +        // "The clear operation always uses the front stencil write mask
> +        //  when clearing the stencil buffer."

Clarify where this quote is from.

FWIW, this is a finer level of detail in the spec than I feel comfortable relying on this this security-sensitive operation, e.g. with this, we would re-open the hole on any driver that mixes up front-vs-back stencil masks, but fine --- it's OK to land this and only care once we know for sure that some driver gets it wrong.

::: content/canvas/src/WebGLContext.h
@@ +214,5 @@
>      bool preserveDrawingBuffer;
>  };
>  
> +/*
> + * The overall diagram around compositing WebGL frames looks like this:

Try to move this
 - either to a place that is more specifically about compositing (e.g. PublishFrame). In fact, in line with our current trend of splitting the WebGL codebase into separate files, how about starting a separate WebGLCompositing.cpp file and put a big comment at its top? Note, that can be a follow-up bug.
 - or make it the start of a separate DESIGN document.

Either way would also have the desirable effect of allowing to throw into it more information that is not directly relevant here, such as, the other information written in the above comment about the various classes there, and about the compositing process, which help make more sense of the class names here.

@@ +348,5 @@
> +
> +    /* There are two different concepts of 'clean' going on here:
> +     * 1) 'ContextClean', which is inherited by canvas drawing contexts to denote
> +     *    their invalidation state.
> +     * 2) 'ScreenClean', which is whether the webgl 'screen' has been cleared

I was wondering if ScreenCleared would be a better name, then?

@@ +351,5 @@
> +     *    their invalidation state.
> +     * 2) 'ScreenClean', which is whether the webgl 'screen' has been cleared
> +     *    to blank.
> +     *
> +     * WebGL Presents its backbuffer less often than Invalidation happens.

Try to connect that last sentence more explicitly to the above terminology: backbuffer -> webgl screen, invalidation -> canvas invalidation. Right now it leaves me wondering what logic connection it is trying to imply between ContextClean and ScreenClean.

@@ +354,5 @@
> +     *
> +     * WebGL Presents its backbuffer less often than Invalidation happens.
> +     */
> +
> +    // A note that this uses 'clean' without having anything to do with mIsScreenDirty.

Comment still not removed.

::: content/canvas/src/WebGLContextGL.cpp
@@ +580,4 @@
>      }
>  
> +    gl->fClear(mask);
> +

As discussed above, I would still like the redundant-clear optimization to be brought back from the Styx.

::: content/canvas/test/reftest/reftest.list
@@ -19,5 @@
>  == webgl-clear-test.html?aa&_____&preserve  wrapper.html?green.png
>  == webgl-clear-test.html?__&alpha&preserve  wrapper.html?green.png
>  == webgl-clear-test.html?aa&alpha&preserve  wrapper.html?green.png
>  
> -pref(webgl.force-layers-readback,true) fails-if(nativeFennec)  == webgl-clear-test.html?readback&__&_____&________  wrapper.html?green.png

Why are these tests failing on fennec? Why is that acceptable?

::: gfx/gl/GLContext.cpp
@@ +330,5 @@
>      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
>      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
>          sDebugMode |= DebugAbortOnError;
> +
> +    sDebugMode |= DebugEnabled;

Remove this debugging tweak (important!)

@@ +344,5 @@
>                  "ATI",
>                  "Qualcomm"
>              };
>  
> +            MOZ_ASSERT(glVendorString);

We have seen some systems where GL strings are null. I'd rather fail them cleanly than crash debug builds and silently continue in release builds.

@@ +1520,1 @@
>      if (gl->IsGLES2()) {

Why are we no longer using EXT_bgra when it's available?

@@ +1614,5 @@
> +    nsAutoPtr<gfxImageSurface> tempSurf;
> +    gfxImageSurface* readSurf = nullptr;
> +    int readPixelSize = 0;
> +    if (needsTempSurf) {
> +        NS_WARNING("Needing intermediary surface for ReadPixels. This will be slow!");

Are we potentially spewing this warning at 60 FPS?

::: gfx/gl/GLContext.h
@@ +1155,5 @@
> +    bool SharesWith(GLContext* that) const {
> +        return (that == this ||
> +                that == this->mSharedContext ||
> +                that->mSharedContext == this ||
> +                that->mSharedContext == this->mSharedContext);

That still wouldn't catch more complex cases... if this matters enough that you wrote this 4-line condition, it probably matters enough to get this right. You could do a potentially expensive computation to accurately group contexts into share groups, but only do it whenever that potentially changes (on context creation and destruction) and here you'd just have to look up a share-group-id.

@@ +1166,5 @@
> +        InitFramebuffers();
> +
> +        mCaps = mScreen->Caps();
> +        UpdateGLFormats(caps);
> +        UpdatePixelFormat();

I'll randomly, arbitrarily, and unfairly pick this one as the target for a little rant. This function is a good example of a function that I'm not honestly reviewing very well, because doing so requires me to look up again each of the other functions that it calls and that were likely affected by this patch. That's a good example of how, without compromises on review quality, review time grows _quadratically_ with patch size for large enough patches (the threshold being where a patch grows large enough that it can't fit in the reviewer's head).

@@ +2035,3 @@
>      void raw_fReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid *pixels) {
>          BEFORE_GL_CALL;
> +        mSymbols.fReadPixels(x, y, width, height, format, type, pixels);

Please explain this

@@ +2784,5 @@
>  
> +    ScopedGLWrapper()
> +        : mGL(nullptr)
> +    {
> +        NS_ERROR("Something either went wrong, or is about to.");

Why allow this to build? Why not fatal?

@@ +2952,5 @@
> +    }
> +
> +public:
> +    ScopedBindRenderbuffer(int val) {
> +        NS_ERROR("You forgot an arg.");

Same questions here.

@@ +3046,5 @@
> +        : ScopedGLWrapper<ScopedFramebufferRenderbuffer>(gl)
> +        , mComplete(false)
> +        , mFB(0)
> +    {
> +        mGL->fGenFramebuffers(1, &mFB);

The name ScopedFramebufferRenderbuffer suggests to me something that attaches and detaches a renderbuffer, not something that creates and destroys it. Probably because I parse this as Scoped(FramebufferRenderbuffer) and FramebufferRenderbuffer is a GL function. Maybe rename this in a way that prevents this mis-parsing? Like ScopedFramebufferAroundRenderbuffer?

::: gfx/gl/GLScreenBuffer.h
@@ +159,5 @@
> +    SurfaceStream* mStream;
> +
> +    DrawBuffer* mDraw;
> +    ReadBuffer* mRead;
> +

That is a lot of plain pointers to manage! Can you post an explanation of either why you did not use refcounting or if there is any simple fact that makes this all safe?

::: gfx/gl/SharedSurface.h
@@ +97,5 @@
> +
> +    // For use when AttachType is correct.
> +    virtual GLuint Texture() const {
> +        MOZ_ASSERT(AttachType() == AttachmentType::GLTexture);
> +        MOZ_NOT_REACHED("Did you forget to override this function?");

Why not make this method pure virtual then?

::: gfx/gl/SurfaceFactory.cpp
@@ +22,5 @@
> +
> +SharedSurface*
> +SurfaceFactory::NewSharedSurface(const gfxIntSize& size)
> +{
> +    while (!mScraps.empty()) {

Add a comment there as this seems to play a crucial role in not leaking surfaces.

::: gfx/gl/SurfaceStream.cpp
@@ +146,5 @@
> +    for (; itr != end; ++itr) {
> +        SharedSurface* cur = *itr;
> +        delete cur;
> +    }
> +    */

Remove commented-out code?
Attachment #685801 - Flags: review-
(Assignee)

Comment 86

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #85)
> Comment on attachment 685801 [details] [diff] [review]
> combined patch
> 
> Review of attachment 685801 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I reviewed in varying degrees of thoroughness:
>   WebGL*
>   WebGL reftest.list
>   GLContext.*
>   GLContextProvider.h
>   GLScreenBuffer.*
>   SharedSurface.*
>   SurfaceStream.*
>   GLContextTypes.*
>   SurfaceFactory.*
>   SurfaceTypes.h
> 
> I did NOT review:
>   specific GLContextProvider implementations (GLX, etc)
>   specific SharedSurface implementations (GL, etc)
>   Layers stuff
Is there someone better to review GLContextProvider and non-d3d SharedSurface backends, or are you saving those for later?
> 
> I would like a general design comment about the choice to use plain pointers
> for everything around SurfaceStream; see below. More generally, please
> document everything that relates to how we manage objects and avoid leaking
> them (e.g. my comment in SurfaceFactory).
Everything has a fixed lifetime, or a lifetime that is a strict sub- or superset of another object's lifetime. We don't need to rely on refcounting if we know when things should be destroyed, so here we don't. Everything has an owner, and the others that are borrowing pointers have shorter lifetimes than the owners.
Refcounting also doesn't work as well when it matters where or when things can be deleted, like it does with GL objects and friends.

I suppose the ideal solution would be refcounting, and asserting if we're not deleting where we think we should be/are kept alive too long? Is there a good way to do that?
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +1146,5 @@
> > +
> > +        MOZ_ASSERT(depthWriteMask == mDepthWriteMask);
> > +        MOZ_ASSERT(depthClearValue == mDepthClearValue);
> > +
> > +
> 
> 2 lines of whitespace here
> 
> @@ +1174,5 @@
> >      }
> >  
> >      if (initializeStencilBuffer) {
> > +        // "The clear operation always uses the front stencil write mask
> > +        //  when clearing the stencil buffer."
> 
> Clarify where this quote is from.
> 
> FWIW, this is a finer level of detail in the spec than I feel comfortable
> relying on this this security-sensitive operation, e.g. with this, we would
> re-open the hole on any driver that mixes up front-vs-back stencil masks,
> but fine --- it's OK to land this and only care once we know for sure that
> some driver gets it wrong.
I see your point. I can have it do both.
> 
> ::: content/canvas/src/WebGLContext.h
> @@ +214,5 @@
> >      bool preserveDrawingBuffer;
> >  };
> >  
> > +/*
> > + * The overall diagram around compositing WebGL frames looks like this:
> 
> Try to move this
>  - either to a place that is more specifically about compositing (e.g.
> PublishFrame). In fact, in line with our current trend of splitting the
> WebGL codebase into separate files, how about starting a separate
> WebGLCompositing.cpp file and put a big comment at its top? Note, that can
> be a follow-up bug.
>  - or make it the start of a separate DESIGN document.
> 
> Either way would also have the desirable effect of allowing to throw into it
> more information that is not directly relevant here, such as, the other
> information written in the above comment about the various classes there,
> and about the compositing process, which help make more sense of the class
> names here.
Ok, design document on the wiki, maybe?
> 
> @@ +348,5 @@
> > +
> > +    /* There are two different concepts of 'clean' going on here:
> > +     * 1) 'ContextClean', which is inherited by canvas drawing contexts to denote
> > +     *    their invalidation state.
> > +     * 2) 'ScreenClean', which is whether the webgl 'screen' has been cleared
> 
> I was wondering if ScreenCleared would be a better name, then?
> 
> @@ +351,5 @@
> > +     *    their invalidation state.
> > +     * 2) 'ScreenClean', which is whether the webgl 'screen' has been cleared
> > +     *    to blank.
> > +     *
> > +     * WebGL Presents its backbuffer less often than Invalidation happens.
> 
> Try to connect that last sentence more explicitly to the above terminology:
> backbuffer -> webgl screen, invalidation -> canvas invalidation. Right now
> it leaves me wondering what logic connection it is trying to imply between
> ContextClean and ScreenClean.
This is obsolete now, so I removed it.
> 
> @@ +354,5 @@
> > +     *
> > +     * WebGL Presents its backbuffer less often than Invalidation happens.
> > +     */
> > +
> > +    // A note that this uses 'clean' without having anything to do with mIsScreenDirty.
> 
> Comment still not removed.
Gone now.
> 
> ::: content/canvas/src/WebGLContextGL.cpp
> @@ +580,4 @@
> >      }
> >  
> > +    gl->fClear(mask);
> > +
> 
> As discussed above, I would still like the redundant-clear optimization to
> be brought back from the Styx.
In the follow up patch up above.
> 
> ::: content/canvas/test/reftest/reftest.list
> @@ -19,5 @@
> >  == webgl-clear-test.html?aa&_____&preserve  wrapper.html?green.png
> >  == webgl-clear-test.html?__&alpha&preserve  wrapper.html?green.png
> >  == webgl-clear-test.html?aa&alpha&preserve  wrapper.html?green.png
> >  
> > -pref(webgl.force-layers-readback,true) fails-if(nativeFennec)  == webgl-clear-test.html?readback&__&_____&________  wrapper.html?green.png
> 
> Why are these tests failing on fennec? Why is that acceptable?
Currently? Because we shouldn't be using readback ever there. We test it and want it to work because it's nice for debug work as a 'known working solution'. Ironically, it's the broken path on Fennec right now.
This fixes it though. :)

> 
> ::: gfx/gl/GLContext.cpp
> @@ +330,5 @@
> >      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
> >      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
> >          sDebugMode |= DebugAbortOnError;
> > +
> > +    sDebugMode |= DebugEnabled;
> 
> Remove this debugging tweak (important!)
> 
> @@ +344,5 @@
> >                  "ATI",
> >                  "Qualcomm"
> >              };
> >  
> > +            MOZ_ASSERT(glVendorString);
> 
> We have seen some systems where GL strings are null. I'd rather fail them
> cleanly than crash debug builds and silently continue in release builds.
Ok, gross. What do we do work WorkAroundDriverBugs there, though? Presumably we should be doing all of the workarounds, if we don't know what we're working with, shouldn't we?
> 
> @@ +1520,1 @@
> >      if (gl->IsGLES2()) {
> 
> Why are we no longer using EXT_bgra when it's available?
EXT_bgra isn't a GLES extension, and it was core in desktop GL by 2.0, which is our minimum.
> 
> @@ +1614,5 @@
> > +    nsAutoPtr<gfxImageSurface> tempSurf;
> > +    gfxImageSurface* readSurf = nullptr;
> > +    int readPixelSize = 0;
> > +    if (needsTempSurf) {
> > +        NS_WARNING("Needing intermediary surface for ReadPixels. This will be slow!");
> 
> Are we potentially spewing this warning at 60 FPS?
Yes. Changed this to depend on DebugMode().
> 
> ::: gfx/gl/GLContext.h
> @@ +1155,5 @@
> > +    bool SharesWith(GLContext* that) const {
> > +        return (that == this ||
> > +                that == this->mSharedContext ||
> > +                that->mSharedContext == this ||
> > +                that->mSharedContext == this->mSharedContext);
> 
> That still wouldn't catch more complex cases... if this matters enough that
> you wrote this 4-line condition, it probably matters enough to get this
> right. You could do a potentially expensive computation to accurately group
> contexts into share groups, but only do it whenever that potentially changes
> (on context creation and destruction) and here you'd just have to look up a
> share-group-id.
I don't see what this doesn't catch. I think I need to add more asserts here.
My assumptions:
If we are using share contexts, we keep a global share context which everyone shares with.
Thus, if a given shared GLContext is not the global share context, its `mSharedContext` *is*.
This can be restated as:
MOZ_ASSERT(!this->mSharedContext || !this->mSharedContext->mSharedContext);
MOZ_ASSERT(!other->mSharedContext || !other->mSharedContext->mSharedContext);
MOZ_ASSERT(!this->mSharedContext ||
           !other->mSharedContext ||
           this->mSharedContext == other->mSharedContext);
GLContext* thisShared = this->mSharedContext ? this->mSharedContext : this;
GLContext* otherShared = other->mSharedContext ? other->mSharedContext : other;
return thisShared == otherShared;

I assume that we never have a bizarre chain or graph of contexts, and that if we're sharing, we're sharing at one point. (Even if sometimes `this` or `other` is that point.
Is this incorrect?
> 
> @@ +1166,5 @@
> > +        InitFramebuffers();
> > +
> > +        mCaps = mScreen->Caps();
> > +        UpdateGLFormats(caps);
> > +        UpdatePixelFormat();
> 
> I'll randomly, arbitrarily, and unfairly pick this one as the target for a
> little rant. This function is a good example of a function that I'm not
> honestly reviewing very well, because doing so requires me to look up again
> each of the other functions that it calls and that were likely affected by
> this patch. That's a good example of how, without compromises on review
> quality, review time grows _quadratically_ with patch size for large enough
> patches (the threshold being where a patch grows large enough that it can't
> fit in the reviewer's head).
You are very correct, but to force a number of these changes into incremental patches would mean trying to match new semantics to old code that's (to a greater or lesser extent) incompatible with the changes the new structures bring.
For example, trying to get WebGL working with rendering to SharedSurfaces without the SurfaceStream pipe would mean wasting development on a transient configuration which should never see the light of day.
> 
> @@ +2035,3 @@
> >      void raw_fReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid *pixels) {
> >          BEFORE_GL_CALL;
> > +        mSymbols.fReadPixels(x, y, width, height, format, type, pixels);
> 
> Please explain this
This can be in a separate patch, but basically, logic dependent on arcane bits of GLContext state crept into important raw_f[...] functions like ReadPixels and CopyTexSubImage. Removed into a different patch.
> 
> @@ +2784,5 @@
> >  
> > +    ScopedGLWrapper()
> > +        : mGL(nullptr)
> > +    {
> > +        NS_ERROR("Something either went wrong, or is about to.");
> 
> Why allow this to build? Why not fatal?
This ended up being a misguided attempt to prevent parameterless construction. Gone now.
> 
> @@ +2952,5 @@
> > +    }
> > +
> > +public:
> > +    ScopedBindRenderbuffer(int val) {
> > +        NS_ERROR("You forgot an arg.");
> 
> Same questions here.
This should have used `explicit` on the (GLContext*) version. Changed.
> 
> @@ +3046,5 @@
> > +        : ScopedGLWrapper<ScopedFramebufferRenderbuffer>(gl)
> > +        , mComplete(false)
> > +        , mFB(0)
> > +    {
> > +        mGL->fGenFramebuffers(1, &mFB);
> 
> The name ScopedFramebufferRenderbuffer suggests to me something that
> attaches and detaches a renderbuffer, not something that creates and
> destroys it. Probably because I parse this as
> Scoped(FramebufferRenderbuffer) and FramebufferRenderbuffer is a GL
> function. Maybe rename this in a way that prevents this mis-parsing? Like
> ScopedFramebufferAroundRenderbuffer?
Fair, as that is the reason behind their naming.
ScopedFramebufferForRenderbuffer and ScopedFramebufferForTexture?
> 
> ::: gfx/gl/GLScreenBuffer.h
> @@ +159,5 @@
> > +    SurfaceStream* mStream;
> > +
> > +    DrawBuffer* mDraw;
> > +    ReadBuffer* mRead;
> > +
> 
> That is a lot of plain pointers to manage! Can you post an explanation of
> either why you did not use refcounting or if there is any simple fact that
> makes this all safe?
Labelled their ownership. GLScreenBuffer owns everything but mGL.
> 
> ::: gfx/gl/SharedSurface.h
> @@ +97,5 @@
> > +
> > +    // For use when AttachType is correct.
> > +    virtual GLuint Texture() const {
> > +        MOZ_ASSERT(AttachType() == AttachmentType::GLTexture);
> > +        MOZ_NOT_REACHED("Did you forget to override this function?");
> 
> Why not make this method pure virtual then?
Because then all descendants would need to override it. Here, only descendants of AttachmentType GLTexture need to override it.
> 
> ::: gfx/gl/SurfaceFactory.cpp
> @@ +22,5 @@
> > +
> > +SharedSurface*
> > +SurfaceFactory::NewSharedSurface(const gfxIntSize& size)
> > +{
> > +    while (!mScraps.empty()) {
> 
> Add a comment there as this seems to play a crucial role in not leaking
> surfaces.
This doesn't so much prevent leaks as it reuses buffers instead of destroying them after each frame is completed.
> 
> ::: gfx/gl/SurfaceStream.cpp
> @@ +146,5 @@
> > +    for (; itr != end; ++itr) {
> > +        SharedSurface* cur = *itr;
> > +        delete cur;
> > +    }
> > +    */
> 
> Remove commented-out code?
Done.
(Assignee)

Comment 87

4 years ago
https://wiki.mozilla.org/Platform/GFX/WebGL/Compositing is now a thing.
I'll flesh it out more tomorrow. I do like the idea of concentrating the WebGL compositing stuff it its own CPP file, though. Let's do that after this.
(Assignee)

Comment 88

4 years ago
Created attachment 686455 [details] [diff] [review]
After round 2 of reviews
Attachment #685801 - Attachment is obsolete: true
Attachment #686455 - Flags: review?(bjacob)
Attachment #686455 - Flags: review?(bgirard)
(Assignee)

Comment 89

4 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #86)
> (In reply to Benoit Jacob [:bjacob] from comment #85)
> > ::: content/canvas/src/WebGLContext.cpp
> > @@ +1146,5 @@
> > > +
> > > +        MOZ_ASSERT(depthWriteMask == mDepthWriteMask);
> > > +        MOZ_ASSERT(depthClearValue == mDepthClearValue);
> > > +
> > > +
> > 
> > 2 lines of whitespace here
Forgot to do this, fixed now.
I also totally changed an #ifdef to an #ifndef here to get code completion on the inner block.
It's back to being an #ifdef now.
> > ::: gfx/gl/GLContext.cpp
> > @@ +330,5 @@
> > >      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
> > >      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
> > >          sDebugMode |= DebugAbortOnError;
> > > +
> > > +    sDebugMode |= DebugEnabled;
> > 
> > Remove this debugging tweak (important!)
Also forgot this. I'll remove this, but why don't we do this on DEBUG builds? They're already going to be slow, and we lose assert coverage for some things without DebugMode().
That's why I'm keeping it in until everything's passing Try.
> > 
> > @@ +344,5 @@
> > >                  "ATI",
> > >                  "Qualcomm"
> > >              };
> > >  
> > > +            MOZ_ASSERT(glVendorString);
> > 
> > We have seen some systems where GL strings are null. I'd rather fail them
> > cleanly than crash debug builds and silently continue in release builds.
> Ok, gross. What do we do work WorkAroundDriverBugs there, though? Presumably
> we should be doing all of the workarounds, if we don't know what we're
> working with, shouldn't we?
We need to figure out what we want to do here.
> > @@ +2035,3 @@
> > >      void raw_fReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid *pixels) {
> > >          BEFORE_GL_CALL;
> > > +        mSymbols.fReadPixels(x, y, width, height, format, type, pixels);
> > 
> > Please explain this
> This can be in a separate patch, but basically, logic dependent on arcane
> bits of GLContext state crept into important raw_f[...] functions like
> ReadPixels and CopyTexSubImage. Removed into a different patch.
Forgot to pull these out. I'll pull them out. Disregard them for now.
(Assignee)

Updated

4 years ago
Attachment #685423 - Flags: review?(bjacob)
(Assignee)

Updated

4 years ago
Attachment #685424 - Flags: review?(bjacob)
(Assignee)

Updated

4 years ago
Attachment #685424 - Attachment is obsolete: true
(In reply to Jeff Gilbert [:jgilbert] from comment #86)
> Is there someone better to review GLContextProvider and non-d3d
> SharedSurface backends, or are you saving those for later?

I can review the non-d3d SharedSurface backends and the GLX and WGL context providers. I think you should ask BenWa to review GLContextProviderEGL and GLContextProviderCGL. I'll still take a quick look at them.

> > 
> > I would like a general design comment about the choice to use plain pointers
> > for everything around SurfaceStream; see below. More generally, please
> > document everything that relates to how we manage objects and avoid leaking
> > them (e.g. my comment in SurfaceFactory).
> Everything has a fixed lifetime, or a lifetime that is a strict sub- or
> superset of another object's lifetime. We don't need to rely on refcounting
> if we know when things should be destroyed, so here we don't. Everything has
> an owner, and the others that are borrowing pointers have shorter lifetimes
> than the owners.

Thanks, that's what I needed to know.

> Refcounting also doesn't work as well when it matters where or when things
> can be deleted, like it does with GL objects and friends.
> 
> I suppose the ideal solution would be refcounting, and asserting if we're
> not deleting where we think we should be/are kept alive too long? Is there a
> good way to do that?

I think that the issue you mention about refcounting can be sorted out by ensuring that what determines when it's safe to destroy an object can be defined purely in terms of the lifespans of other objects, and designing your ownership relations accordingly; but that is not a conversation that needs to block the present bug.

> > Why are these tests failing on fennec? Why is that acceptable?
> Currently?

Nevermind. I misread this hunk as an addition, instead of a deletion.

> > 
> > ::: gfx/gl/GLContext.cpp
> > @@ +330,5 @@
> > >      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
> > >      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
> > >          sDebugMode |= DebugAbortOnError;
> > > +
> > > +    sDebugMode |= DebugEnabled;
> > 
> > Remove this debugging tweak (important!)

This is really important! Hope you're not forgetting about this bit. Will watch out for it in next review.

> > 
> > @@ +344,5 @@
> > >                  "ATI",
> > >                  "Qualcomm"
> > >              };
> > >  
> > > +            MOZ_ASSERT(glVendorString);
> > 
> > We have seen some systems where GL strings are null. I'd rather fail them
> > cleanly than crash debug builds and silently continue in release builds.
> Ok, gross. What do we do work WorkAroundDriverBugs there, though? Presumably
> we should be doing all of the workarounds, if we don't know what we're
> working with, shouldn't we?

I thought that our current InitWithPrefix code gracefully handled null GLStrings (presumably by failing the initialization). I don't remember. On Linux, null strings in glxtest leads to everything getting blacklisted.

> > ::: gfx/gl/GLContext.h
> > @@ +1155,5 @@
> > > +    bool SharesWith(GLContext* that) const {
> > > +        return (that == this ||
> > > +                that == this->mSharedContext ||
> > > +                that->mSharedContext == this ||
> > > +                that->mSharedContext == this->mSharedContext);
> > 
> > That still wouldn't catch more complex cases... if this matters enough that
> > you wrote this 4-line condition, it probably matters enough to get this
> > right. You could do a potentially expensive computation to accurately group
> > contexts into share groups, but only do it whenever that potentially changes
> > (on context creation and destruction) and here you'd just have to look up a
> > share-group-id.
> I don't see what this doesn't catch. I think I need to add more asserts here.
> My assumptions:
> If we are using share contexts, we keep a global share context which
> everyone shares with.

OK, with that assumption, that if() is safe indeed. I guess I wasn't sure how much we wanted to hardcode the idea of a global context.

> I assume that we never have a bizarre chain or graph of contexts, and that
> if we're sharing, we're sharing at one point. (Even if sometimes `this` or
> `other` is that point.
> Is this incorrect?

Should be correct. Even if we got rid of the global context, we would then be sharing at the level of the layermanager's gl context, so there would still be a single point of sharing.

> > The name ScopedFramebufferRenderbuffer suggests to me something that
> > attaches and detaches a renderbuffer, not something that creates and
> > destroys it. Probably because I parse this as
> > Scoped(FramebufferRenderbuffer) and FramebufferRenderbuffer is a GL
> > function. Maybe rename this in a way that prevents this mis-parsing? Like
> > ScopedFramebufferAroundRenderbuffer?
> Fair, as that is the reason behind their naming.
> ScopedFramebufferForRenderbuffer and ScopedFramebufferForTexture?

Sounds good.

> > Why not make this method pure virtual then?
> Because then all descendants would need to override it. Here, only
> descendants of AttachmentType GLTexture need to override it.

Fair.
(In reply to Jeff Gilbert [:jgilbert] from comment #89)
> Also forgot this. I'll remove this, but why don't we do this on DEBUG
> builds? They're already going to be slow, and we lose assert coverage for
> some things without DebugMode().

MOZ_GL_DEBUG is really slow; DEBUG builds aren't otherwise very slow (what is slow is --disable-optimize, which is an orthogonal thing).

Also, we try to keep DEBUG purely mean 'enable asserts' and not evolve into an all-around throw-in-all-debug-helpers build option.
(In reply to Jeff Gilbert [:jgilbert] from comment #87)
> https://wiki.mozilla.org/Platform/GFX/WebGL/Compositing is now a thing.
> I'll flesh it out more tomorrow. I do like the idea of concentrating the
> WebGL compositing stuff it its own CPP file, though. Let's do that after
> this.

Theorem: we only think about updating documentation when it's right under our eyes while we code.

Corollary: out-of-tree documentation never gets updated.

Ideally I would like us to go the route I proposed above with a dedicated WebGLCompositing.cpp file with a big comment at the beginning.
Comment on attachment 686455 [details] [diff] [review]
After round 2 of reviews

Review of attachment 686455 [details] [diff] [review]:
-----------------------------------------------------------------

I amn't done with this review, nor have I found any serious issue, but I want to stop here for today and don't trust Splinter's cache so I'll just hit publish and make a note here that I had reviewed up to SharedSurface.h. r- because you told me it worked better with your workflow than just cancelling reviews.

::: gfx/gl/GLContext.cpp
@@ +330,5 @@
>      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
>      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
>          sDebugMode |= DebugAbortOnError;
> +
> +    sDebugMode |= DebugEnabled;

Remove!!

::: gfx/gl/GLContextProviderEGL.cpp
@@ +371,5 @@
>      void SetIsDoubleBuffered(bool aIsDB) {
>          mIsDoubleBuffered = aIsDB;
>      }
>  
> +    virtual void* GetEGLContext() {

Give me a very good reason why this has to return void*?

@@ +2125,3 @@
>  
> +    EGLConfig configs[1]; // We only need one.
> +    EGLint numConfigs = sizeof(configs)/sizeof(EGLConfig);

I think that we have macros for that, like NS_ARRAY_LENGTH or some such. Otherwise, I prefer something more direct like

const size_t numConfigs = 1; // We only need one.
EGLConfig configs[numConfigs];

::: gfx/gl/GLLibraryEGL.h
@@ +94,5 @@
>  {
>  public:
> +    static GLLibraryEGL* OfContext(GLContext* gl) {
> +        void* ret = gl->GetLibraryEGL();
> +        return (GLLibraryEGL*)ret;

Make this a static_cast.

@@ +534,5 @@
>  
>      bool mIsANGLE;
>  };
>  
> +GLLibraryEGL& GetEGLLibrary();

These GetLibrary... functions sometimes return pointers (like GetLibraryEGL), sometimes return references (like here). Try to standardize on either.

::: gfx/gl/GLScreenBuffer.cpp
@@ +79,5 @@
> +        break;
> +
> +    default:
> +        // In case we got a bad target.
> +        gl->raw_fBindFramebuffer(target, 0);

Why not do nothing and abort here?

@@ +85,5 @@
> +    }
> +}
> +
> +void
> +GLScreenBuffer::BindFB(GLuint fb)

This BindFB method is doing something completely different than the above Bind method. I would like more descriptive names. It should also be mentioned somewhere that |fb| here is a user framebuffer.

@@ +190,5 @@
> +    mGL->raw_fGetIntegerv(LOCAL_GL_DRAW_FRAMEBUFFER_BINDING_EXT, (GLint*)&actual);
> +
> +    GLuint predicted = mInternalDrawFB;
> +    if (predicted != actual) {
> +        NS_ERROR("Draw FB binding misprediction!");

just make that a printf_stderr too so that we dont use 3 different error-reporting things in 3 lines of code. Also, if it's OK to crash only debug builds, as I think that it is (i.e. unless there is a security concern), MOZ_ASSERT will do all you need (it takes varargs).

@@ +499,5 @@
> +        mStencilRB
> +    };
> +
> +    mGL->fDeleteFramebuffers(1, &fb);
> +    mGL->fDeleteRenderbuffers(3, rbs);

this rbs[] business saves 2 lines at the cost of adding 5 lines and nontrivialness.

@@ +570,5 @@
> +        mStencilRB
> +    };
> +
> +    mGL->fDeleteFramebuffers(1, &fb);
> +    mGL->fDeleteRenderbuffers(2, rbs);

same.

::: gfx/gl/GLScreenBuffer.h
@@ +99,5 @@
> +protected:
> +    GLContext* const mGL;
> +
> +    const GLuint mFB;
> +    // mFB contains:

rather 'has the following attachments'

@@ +260,5 @@
> +    GLuint mUserDrawFB;
> +    GLuint mUserReadFB;
> +    GLuint mInternalDrawFB;
> +    GLuint mInternalReadFB;
> +

Please group the data members together.

::: gfx/gl/Makefile.in
@@ +15,5 @@
>  EXPORT_LIBRARY	= 1
>  FAIL_ON_WARNINGS = 1
>  
>  EXPORTS	= \
> +    ForceDiscreteGPUHelperCGL.h \

Here, this list was indented with tabs. Either preserve the tabs, or switch it all to 2 spaces, but don't do 4 spaces.

::: gfx/gl/SharedSurface.h
@@ +59,5 @@
> +    /*
> +     * Lock is called before production.
> +     * Unlock is called before consumption, but
> +     * not necessarily before the someone else's Lock.
> +     * XXX This sounds wrong.

It does. From the name 'lock' I was looking for mutexes here. Please rename.

::: gfx/gl/SurfaceStream.cpp
@@ +76,5 @@
> +    MOZ_ASSERT(!surf);
> +}
> +
> +SharedSurface*
> +SurfaceStream::Surrender(SharedSurface*& surf)

I am french, I can review that!
Attachment #686455 - Flags: review?(bjacob) → review-
Comment on attachment 686263 [details] [diff] [review]
supplimental patch for skipping redundent clears

Review of attachment 686263 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContextGL.cpp
@@ +585,5 @@
> +    // Ok, we're clearing the default framebuffer/screen.
> +
> +    bool needsClear = true;
> +    if (mIsScreenCleared) {
> +        bool isClearRedundent = true;

redundant?

@@ +587,5 @@
> +    bool needsClear = true;
> +    if (mIsScreenCleared) {
> +        bool isClearRedundent = true;
> +        if (mScissorTestEnabled || mDitherEnabled)
> +            isClearRedundent = false;

I don't understand how scissor or dither can make the clear not redundant. The clear is only going to be redundant anyways if the clear color is the default and we didn't draw anything since the last clear due to preservedrawingbuffer=false. In such a case, dithering should have no effect, and a scissorbox is not going to make the clear less redundant.

@@ +604,5 @@
> +                !mColorWriteMask[2] ||
> +                !mColorWriteMask[3])
> +            {
> +                isClearRedundent = false;
> +            }

Likewise I don't understand how having a color write mask can make a clear less redundant.
Attachment #686263 - Flags: review?(bjacob) → review-
Has there been in Talos run of these patches? Might be worth looking into while we're still reviewing.
Comment on attachment 686455 [details] [diff] [review]
After round 2 of reviews

Review of attachment 686455 [details] [diff] [review]:
-----------------------------------------------------------------

r+ conditional on this patch getting a good testing on all platforms with all layer manager supported with that platform including manual testing where try doesn't cover.

::: content/canvas/src/WebGLContext.cpp
@@ +1111,5 @@
>  
> +    // fun GL fact: no need to worry about the viewport here, glViewport is just
> +    // setting up a coordinates transformation, it doesn't affect glClear at all.
> +
> +#ifndef DEBUG

Did you mean ifdef DEBUG?

::: gfx/gl/GLContext.cpp
@@ +1161,5 @@
> +    if (drawFB_out) {
> +        *drawFB_out = drawFB;
> +    } else if (drawFB) {
> +        printf_stderr("drawFB created when not requested!");
> +        MOZ_CRASH();

NS_RUNTIMEABORT

@@ +1168,5 @@
> +    if (readFB_out) {
> +        *readFB_out = readFB;
> +    } else if (readFB) {
> +        printf_stderr("readFB created when not requested!");
> +        MOZ_CRASH();

NS_RUNTIMEABORT

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +247,5 @@
>  
> +      pixels[2] = black;
> +      pixels[3] = purple;
> +
> +      temporarySurface->MarkDirty();

I though I had commented on this part I can't find it. Is this expected to be hit? I don't think we should be doing this for only one backend as it confusing to the user why the browser behaves differently randomly.

Let's do it everywhere or nowhere.
Attachment #686455 - Flags: review?(bgirard) → review+
(Assignee)

Comment 97

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #91)
> (In reply to Jeff Gilbert [:jgilbert] from comment #89)
> > Also forgot this. I'll remove this, but why don't we do this on DEBUG
> > builds? They're already going to be slow, and we lose assert coverage for
> > some things without DebugMode().
> 
> MOZ_GL_DEBUG is really slow; DEBUG builds aren't otherwise very slow (what
> is slow is --disable-optimize, which is an orthogonal thing).
> 
> Also, we try to keep DEBUG purely mean 'enable asserts' and not evolve into
> an all-around throw-in-all-debug-helpers build option.

I don't believe it's extraordinarily slow, particularly for things like the conformance tests. However, I can move everything that's behind DebugMode, but needs test coverage, to #ifdef DEBUG.

The corollary here is that we should not have any asserts that are hidden behind DebugMode, with the exception of abort-on-error. That is, someone with a green debug Try build should be able to use MOZ_GL_DEBUG and not get any new asserts.
(Assignee)

Comment 98

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #93)
> Comment on attachment 686455 [details] [diff] [review]
> After round 2 of reviews
> 
> Review of attachment 686455 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I amn't done with this review, nor have I found any serious issue, but I
> want to stop here for today and don't trust Splinter's cache so I'll just
> hit publish and make a note here that I had reviewed up to SharedSurface.h.
> r- because you told me it worked better with your workflow than just
> cancelling reviews.
Thanks. :)
> 
> ::: gfx/gl/GLContext.cpp
> @@ +330,5 @@
> >      // aborts on GL error. Can be useful to debug quicker code that is known not to generate any GL error in principle.
> >      if (PR_GetEnv("MOZ_GL_DEBUG_ABORT_ON_ERROR"))
> >          sDebugMode |= DebugAbortOnError;
> > +
> > +    sDebugMode |= DebugEnabled;
> 
> Remove!!
Done.
> 
> ::: gfx/gl/GLContextProviderEGL.cpp
> @@ +371,5 @@
> >      void SetIsDoubleBuffered(bool aIsDB) {
> >          mIsDoubleBuffered = aIsDB;
> >      }
> >  
> > +    virtual void* GetEGLContext() {
> 
> Give me a very good reason why this has to return void*?
This was from when EGLContext was defined in GLCProviderEGL. All those types are now in GLDefs, so I switched these to actual types now. (fixed)
> 
> @@ +2125,3 @@
> >  
> > +    EGLConfig configs[1]; // We only need one.
> > +    EGLint numConfigs = sizeof(configs)/sizeof(EGLConfig);
> 
> I think that we have macros for that, like NS_ARRAY_LENGTH or some such.
> Otherwise, I prefer something more direct like
> 
> const size_t numConfigs = 1; // We only need one.
> EGLConfig configs[numConfigs];
I like the latter a lot more. This was existing code, but I can fix it up.
> 
> ::: gfx/gl/GLLibraryEGL.h
> @@ +94,5 @@
> >  {
> >  public:
> > +    static GLLibraryEGL* OfContext(GLContext* gl) {
> > +        void* ret = gl->GetLibraryEGL();
> > +        return (GLLibraryEGL*)ret;
> 
> Make this a static_cast.
Now that GetLibraryEGL is returning a pointer of the correct type, this function can go away.
> 
> @@ +534,5 @@
> >  
> >      bool mIsANGLE;
> >  };
> >  
> > +GLLibraryEGL& GetEGLLibrary();
> 
> These GetLibrary... functions sometimes return pointers (like
> GetLibraryEGL), sometimes return references (like here). Try to standardize
> on either.
This looks unused. Removed.
> 
> ::: gfx/gl/GLScreenBuffer.cpp
> @@ +79,5 @@
> > +        break;
> > +
> > +    default:
> > +        // In case we got a bad target.
> > +        gl->raw_fBindFramebuffer(target, 0);
> 
> Why not do nothing and abort here?
I thought we might get invalid targets from WebGL or some such, but it looks like we check that. Adding a MOZ_NOT_REACHED here.
> 
> @@ +85,5 @@
> > +    }
> > +}
> > +
> > +void
> > +GLScreenBuffer::BindFB(GLuint fb)
> 
> This BindFB method is doing something completely different than the above
> Bind method. I would like more descriptive names. It should also be
> mentioned somewhere that |fb| here is a user framebuffer.
Ok, renaming.
> 
> @@ +190,5 @@
> > +    mGL->raw_fGetIntegerv(LOCAL_GL_DRAW_FRAMEBUFFER_BINDING_EXT, (GLint*)&actual);
> > +
> > +    GLuint predicted = mInternalDrawFB;
> > +    if (predicted != actual) {
> > +        NS_ERROR("Draw FB binding misprediction!");
> 
> just make that a printf_stderr too so that we dont use 3 different
> error-reporting things in 3 lines of code. Also, if it's OK to crash only
> debug builds, as I think that it is (i.e. unless there is a security
> concern), MOZ_ASSERT will do all you need (it takes varargs).
I do not believe MOZ_ASSERT takes varargs a la printf.
I flipped the order of the NS_ERROR and printf, and changed the NS_ERROR to a MOZ_ASSERT(false,...), which should have the same effect.
> 
> @@ +499,5 @@
> > +        mStencilRB
> > +    };
> > +
> > +    mGL->fDeleteFramebuffers(1, &fb);
> > +    mGL->fDeleteRenderbuffers(3, rbs);
> 
> this rbs[] business saves 2 lines at the cost of adding 5 lines and
> nontrivialness.
fDeleteX doesn't accept `const GLuint*`. If we don't want to play with const-casts, we need to assign these four vars to temporary `GLuint`s, then pass those into fDeleteX. I think this this is much nicer than const-casting, and this API was explicitly designed for this.
> 
> @@ +570,5 @@
> > +        mStencilRB
> > +    };
> > +
> > +    mGL->fDeleteFramebuffers(1, &fb);
> > +    mGL->fDeleteRenderbuffers(2, rbs);
> 
> same.
> 
> ::: gfx/gl/GLScreenBuffer.h
> @@ +99,5 @@
> > +protected:
> > +    GLContext* const mGL;
> > +
> > +    const GLuint mFB;
> > +    // mFB contains:
> 
> rather 'has the following attachments'
> 
> @@ +260,5 @@
> > +    GLuint mUserDrawFB;
> > +    GLuint mUserReadFB;
> > +    GLuint mInternalDrawFB;
> > +    GLuint mInternalReadFB;
> > +
> 
> Please group the data members together.
Ok.
> 
> ::: gfx/gl/Makefile.in
> @@ +15,5 @@
> >  EXPORT_LIBRARY	= 1
> >  FAIL_ON_WARNINGS = 1
> >  
> >  EXPORTS	= \
> > +    ForceDiscreteGPUHelperCGL.h \
> 
> Here, this list was indented with tabs. Either preserve the tabs, or switch
> it all to 2 spaces, but don't do 4 spaces.
Aww. Redone with 2-space indents.
> 
> ::: gfx/gl/SharedSurface.h
> @@ +59,5 @@
> > +    /*
> > +     * Lock is called before production.
> > +     * Unlock is called before consumption, but
> > +     * not necessarily before the someone else's Lock.
> > +     * XXX This sounds wrong.
> 
> It does. From the name 'lock' I was looking for mutexes here. Please rename.
I fixed this, actually. It asserts if more than one surf is locked at a time, now. Comment updated.
> 
> ::: gfx/gl/SurfaceStream.cpp
> @@ +76,5 @@
> > +    MOZ_ASSERT(!surf);
> > +}
> > +
> > +SharedSurface*
> > +SurfaceStream::Surrender(SharedSurface*& surf)
> 
> I am french, I can review that!
\o/
(Assignee)

Comment 99

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #94)
> Comment on attachment 686263 [details] [diff] [review]
> supplimental patch for skipping redundent clears
> 
> Review of attachment 686263 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/canvas/src/WebGLContextGL.cpp
> @@ +585,5 @@
> > +    // Ok, we're clearing the default framebuffer/screen.
> > +
> > +    bool needsClear = true;
> > +    if (mIsScreenCleared) {
> > +        bool isClearRedundent = true;
> 
> redundant?
...maybe. For bonus points, I also misspelled 'supplemental'. :(
> 
> @@ +587,5 @@
> > +    bool needsClear = true;
> > +    if (mIsScreenCleared) {
> > +        bool isClearRedundent = true;
> > +        if (mScissorTestEnabled || mDitherEnabled)
> > +            isClearRedundent = false;
> 
> I don't understand how scissor or dither can make the clear not redundant.
> The clear is only going to be redundant anyways if the clear color is the
> default and we didn't draw anything since the last clear due to
> preservedrawingbuffer=false. In such a case, dithering should have no
> effect, and a scissorbox is not going to make the clear less redundant.
You're right about scissor, and since we're clearing to (0, 0, 0, 0), dithering shouldn't matter.
> 
> @@ +604,5 @@
> > +                !mColorWriteMask[2] ||
> > +                !mColorWriteMask[3])
> > +            {
> > +                isClearRedundent = false;
> > +            }
> 
> Likewise I don't understand how having a color write mask can make a clear
> less redundant.
Yep. Same with depthMask (Boolean, so we're either clearing again, or not touching it) and stencilMask. (Zero masked with anything is still zero)
(Assignee)

Comment 100

4 years ago
(In reply to Benoit Girard (:BenWa) from comment #95)
> Has there been in Talos run of these patches? Might be worth looking into
> while we're still reviewing.

Good point. Not yet, but I'll spin one up with the next Try run.

(In reply to Benoit Girard (:BenWa) from comment #96)
> Comment on attachment 686455 [details] [diff] [review]
> After round 2 of reviews
> 
> Review of attachment 686455 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ conditional on this patch getting a good testing on all platforms with
> all layer manager supported with that platform including manual testing
> where try doesn't cover.
So manual testing on Basic+Mac and OGL+Linux? I suppose we don't run unaccelerated m1 on windows anywhere, either.
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +1111,5 @@
> >  
> > +    // fun GL fact: no need to worry about the viewport here, glViewport is just
> > +    // setting up a coordinates transformation, it doesn't affect glClear at all.
> > +
> > +#ifndef DEBUG
> 
> Did you mean ifdef DEBUG?
Yes, fixed.
> 
> ::: gfx/gl/GLContext.cpp
> @@ +1161,5 @@
> > +    if (drawFB_out) {
> > +        *drawFB_out = drawFB;
> > +    } else if (drawFB) {
> > +        printf_stderr("drawFB created when not requested!");
> > +        MOZ_CRASH();
> 
> NS_RUNTIMEABORT
Ok.
> 
> @@ +1168,5 @@
> > +    if (readFB_out) {
> > +        *readFB_out = readFB;
> > +    } else if (readFB) {
> > +        printf_stderr("readFB created when not requested!");
> > +        MOZ_CRASH();
> 
> NS_RUNTIMEABORT
Ok.
> 
> ::: gfx/layers/opengl/CanvasLayerOGL.cpp
> @@ +247,5 @@
> >  
> > +      pixels[2] = black;
> > +      pixels[3] = purple;
> > +
> > +      temporarySurface->MarkDirty();
> 
> I though I had commented on this part I can't find it. Is this expected to
> be hit? I don't think we should be doing this for only one backend as it
> confusing to the user why the browser behaves differently randomly.
> 
> Let's do it everywhere or nowhere.
You did, and this was so that we could assert that we always end up with a texture to display.
Let's do 'nowhere' for now, though, and consider 'everywhere' in another bug.
(Assignee)

Comment 101

4 years ago
Created attachment 689114 [details] [diff] [review]
After round 3

Combined diff of everything through review 3.
Attachment #685423 - Attachment is obsolete: true
Attachment #686263 - Attachment is obsolete: true
Attachment #686455 - Attachment is obsolete: true
Attachment #689114 - Flags: review?(bjacob)
Comment on attachment 689114 [details] [diff] [review]
After round 3

Review of attachment 689114 [details] [diff] [review]:
-----------------------------------------------------------------

Let's call that an R+, I am definitely running out of comments to make on this patch, I just have some nits and comments about things that could have been easier to understand. Above all, I'd like the new classes you're introducing to have comments in the header files about what each member is doing.

As said above, I haven't reviewed the D3D-specific parts; and I'm not sure if I said it earlier, but I also feel incompetent about the gfx/layers/ parts (though I have had a look at them).

This being a 400k patch, I expect/hope that my responsibility will be mitigated if something bad happens :-/

All in all though, this beautiful work, lots of good design here, very much looking forward to this landing.

::: gfx/gl/GLScreenBuffer.h
@@ +249,5 @@
> +     * performance reasons,
> +     * but we haven't made any guarantee that rendering is actually
> +     * done when Morph is run, just that it can't run concurrently
> +     * with rendering.
> +     */

This is really hard to understand. Doesn't seem to explain what Morph does in general. I also don't understand the remark about mPreserveBuffer. Came here to understand why it is OK for a consumer to forget() the factory after passing it to Morph, didn't find the answer.

While we are in this file, please add comments explaining what the other methods do, too. I know that we haven't been doing this very carefully so far, but it would be a good time to start doing it, and would make the review easier.

::: gfx/gl/SharedSurfaceANGLE.cpp
@@ +162,5 @@
> +    // TODO: Pick a config progamatically instead of hoping that
> +    // the first config will be minimally matching our request.
> +    EGLConfig config = configs[0];
> +#ifdef DEBUG
> +    egl->DumpEGLConfig(config);

This may be too verbose to dump unconditionally in DEBUG builds.

::: gfx/gl/SurfaceStream.h
@@ +17,5 @@
> +    namespace gfx {
> +        class SharedSurface;
> +        class SurfaceFactory;
> +    }
> +}

Why don't you move the forward decls just below so that they can be in the same namespace braces.

@@ +31,5 @@
> +        MainThread,
> +        OffMainThread
> +    } OMTC;
> +
> +    const SurfaceStreamType mType;

Does this have to be public?

@@ +40,5 @@
> +    static SurfaceStream* CreateForType(SurfaceStreamType type,
> +                                        SurfaceStream* prevStream = nullptr);
> +
> +    SurfaceStreamHandle GetShareHandle() {
> +        return (SurfaceStreamHandle)this;

Avoid C-style casts. here that would be a reinterpret_cast.

::: gfx/gl/SurfaceTypes.h
@@ +13,5 @@
> +
> +namespace mozilla {
> +namespace gfx {
> +
> +typedef uintptr_t SurfaceStreamHandle;

I see, we must accomodate any kind of system-specific handle, so OK.

@@ +59,5 @@
> +        return caps;
> +    }
> +};
> +
> +#define TEMP_ENUM_CLASS_BEGIN(Name,Type)\

This seems to be unused now, right? Only uses below are commented out.

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +166,4 @@
>      }
>  
> +    if (screen->Morph(factory, streamType)) {
> +      factory.forget();  // Pass ownership off to |screen|.

So, to review this line, I need to understand exactly what Morph does; see comment on GLScreenBuffer.h.

@@ +167,5 @@
>  
> +    if (screen->Morph(factory, streamType)) {
> +      factory.forget();  // Pass ownership off to |screen|.
> +    } else {
> +      NS_ERROR("Morph failed!");

When can a morph fail? is this the appropriate response?

@@ +357,2 @@
>      gl()->MakeCurrent();
> +    gl()->fDeleteTextures(1, &mUploadTexture);

Shouldn't you clear mUploadTexture there? I'm uncomfortable about having a function that deletes resources without writing down what it deleted. Could be convinced that it's fine though.
Attachment #689114 - Flags: review?(bjacob) → review+
(Assignee)

Comment 103

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #102)
> Comment on attachment 689114 [details] [diff] [review]
> After round 3
> 
> Review of attachment 689114 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Let's call that an R+, I am definitely running out of comments to make on
> this patch, I just have some nits and comments about things that could have
> been easier to understand. Above all, I'd like the new classes you're
> introducing to have comments in the header files about what each member is
> doing.
> 
> As said above, I haven't reviewed the D3D-specific parts; and I'm not sure
> if I said it earlier, but I also feel incompetent about the gfx/layers/
> parts (though I have had a look at them).
Great. D3D was Bas, and Layers was Nical/BenWa, so we're fine.
> 
> This being a 400k patch, I expect/hope that my responsibility will be
> mitigated if something bad happens :-/
It'll be on me, don't worry. :P
> 
> All in all though, this beautiful work, lots of good design here, very much
> looking forward to this landing.
Good to hear.
> 
> ::: gfx/gl/GLScreenBuffer.h
> @@ +249,5 @@
> > +     * performance reasons,
> > +     * but we haven't made any guarantee that rendering is actually
> > +     * done when Morph is run, just that it can't run concurrently
> > +     * with rendering.
> > +     */
> 
> This is really hard to understand. Doesn't seem to explain what Morph does
> in general. I also don't understand the remark about mPreserveBuffer. Came
> here to understand why it is OK for a consumer to forget() the factory after
> passing it to Morph, didn't find the answer.
Ok, sure. The idea is that since we know all our lifetimes, we don't need to keep RefPtrs to them.
However, we use RefPtrs in code that calls Morph so we don't have to worry about cleaning up after ourselves.
Once we successfully Morph, we know that the GLScreenBuffer will manage the lifetime of the factory, so we `forget` it, so RefPtr doesn't decrement and garbage collect, as there aren't any other refs.
I added comments to this affect.
> 
> While we are in this file, please add comments explaining what the other
> methods do, too. I know that we haven't been doing this very carefully so
> far, but it would be a good time to start doing it, and would make the
> review easier.
Fair point.
> 
> ::: gfx/gl/SharedSurfaceANGLE.cpp
> @@ +162,5 @@
> > +    // TODO: Pick a config progamatically instead of hoping that
> > +    // the first config will be minimally matching our request.
> > +    EGLConfig config = configs[0];
> > +#ifdef DEBUG
> > +    egl->DumpEGLConfig(config);
> 
> This may be too verbose to dump unconditionally in DEBUG builds.
Yep, this should be MOZ_GL_DEBUG only.
> 
> ::: gfx/gl/SurfaceStream.h
> @@ +17,5 @@
> > +    namespace gfx {
> > +        class SharedSurface;
> > +        class SurfaceFactory;
> > +    }
> > +}
> 
> Why don't you move the forward decls just below so that they can be in the
> same namespace braces.
I really like doing the forward decls ahead of time, as we're basically trying to use them in lieu of headers. Since everything's in the same namespace here, though, it's probably best to fold them together.
> 
> @@ +31,5 @@
> > +        MainThread,
> > +        OffMainThread
> > +    } OMTC;
> > +
> > +    const SurfaceStreamType mType;
> 
> Does this have to be public?
Probably not, but since it may be useful, and is const, I was exposing it anyways. Folded it back down into the `protected`s now.
> 
> @@ +40,5 @@
> > +    static SurfaceStream* CreateForType(SurfaceStreamType type,
> > +                                        SurfaceStream* prevStream = nullptr);
> > +
> > +    SurfaceStreamHandle GetShareHandle() {
> > +        return (SurfaceStreamHandle)this;
> 
> Avoid C-style casts. here that would be a reinterpret_cast.
Fixed.
> 
> ::: gfx/gl/SurfaceTypes.h
> @@ +13,5 @@
> > +
> > +namespace mozilla {
> > +namespace gfx {
> > +
> > +typedef uintptr_t SurfaceStreamHandle;
> 
> I see, we must accomodate any kind of system-specific handle, so OK.
I think we can actually just use pointers for this. I'm not sure why I hide the type like this. I think I thought I needed to. I'll check.
Nope, ipdl doesn't like pointers.
> 
> @@ +59,5 @@
> > +        return caps;
> > +    }
> > +};
> > +
> > +#define TEMP_ENUM_CLASS_BEGIN(Name,Type)\
> 
> This seems to be unused now, right? Only uses below are commented out.
Yep, this is obsolete. 
> 
> ::: gfx/layers/opengl/CanvasLayerOGL.cpp
> @@ +166,4 @@
> >      }
> >  
> > +    if (screen->Morph(factory, streamType)) {
> > +      factory.forget();  // Pass ownership off to |screen|.
> 
> So, to review this line, I need to understand exactly what Morph does; see
> comment on GLScreenBuffer.h.
> 
> @@ +167,5 @@
> >  
> > +    if (screen->Morph(factory, streamType)) {
> > +      factory.forget();  // Pass ownership off to |screen|.
> > +    } else {
> > +      NS_ERROR("Morph failed!");
> 
> When can a morph fail? is this the appropriate response?
Morph failing means we failed to swap out our surface streaming mechanism (double-buffered, et al) to the type we think will work best for the current configuration. This means that performance will probably be negatively impacted, but everything will keep working.

Looking at the code, there's both no reason it can fail now, and I can't see a reason it should ever fail in the future. Let's make it infallible.

I believe it used to fail because we used to assure that `factory` works if Morph succeeds, but we can't guarantee that well, right now. (And you can't exactly guarantee it ever, anyways). We need to be sure that if factory fails to produce, we swap it out with a default Basic factory.
> 
> @@ +357,2 @@
> >      gl()->MakeCurrent();
> > +    gl()->fDeleteTextures(1, &mUploadTexture);
> 
> Shouldn't you clear mUploadTexture there? I'm uncomfortable about having a
> function that deletes resources without writing down what it deleted. Could
> be convinced that it's fine though.
Agreed, we should be setting this to 0.
(Assignee)

Comment 104

4 years ago
Created attachment 700597 [details] [diff] [review]
after round 4 (patch)

Here's the patch on top of the existing queue. With this, we're green on try.
Attachment #689114 - Attachment is obsolete: true
Attachment #700597 - Flags: review?(bjacob)
(Assignee)

Comment 105

4 years ago
Created attachment 700598 [details] [diff] [review]
after round 4 (cumulative)
Comment on attachment 700597 [details] [diff] [review]
after round 4 (patch)

Review of attachment 700597 [details] [diff] [review]:
-----------------------------------------------------------------

I am not competent to review the gfx/layers/ parts.

For the rest, I have enough questions to require another round.

::: content/canvas/src/WebGLContext.cpp
@@ +818,5 @@
>          HTMLCanvasElement *canvas = userdata->mContent;
>          WebGLContext *context = static_cast<WebGLContext*>(canvas->GetContextAtIndex(0));
>  
>          // Mark ourselves as no longer invalidated.
> +        context->MarkContextClean();

Please explain this change?

@@ +1176,4 @@
>  
> +        GLint stencilBits = 0;
> +        gl->fGetIntegerv(LOCAL_GL_STENCIL_BITS, &stencilBits);
> +        GLuint stencilMask = (1 << stencilBits) - 1;

If stencilBits can ever be 32 (unlikely I guess), then it's important that you write this as

GLuint stencilMask = (GLuint(1) << stencilBits) - 1;

Otherwise you're overflowing a signed int, which is undefined.

@@ +1177,5 @@
> +        GLint stencilBits = 0;
> +        gl->fGetIntegerv(LOCAL_GL_STENCIL_BITS, &stencilBits);
> +        GLuint stencilMask = (1 << stencilBits) - 1;
> +
> +        MOZ_ASSERT(((stencilWriteMaskFront ^ mStencilWriteMaskFront) & stencilMask) == 0);

I find (a&c) == (b&c) easier to understand.

::: content/canvas/test/reftest/reftest.list
@@ +177,5 @@
>  
>  # Check that our experimental prefs still work:
>  
>  # 16bpp:
> +skip-if(winWidget) pref(webgl.prefer-16bpp,true)                                        == webgl-color-test.html?16bpp           wrapper.html?colors.png

Why does this have to be skipped on Windows?

::: gfx/gl/GLScreenBuffer.cpp
@@ +345,5 @@
> +
> +        // Swap out the apparently defective old factory.
> +        delete mFactory;
> +        mFactory = basicFactory;
> +    }

Please explain this new if (!nextSurf) { ... } block.

::: gfx/gl/GLScreenBuffer.h
@@ +246,5 @@
>  
> +    /* Morph swaps out our SurfaceStream mechanism and replaces it with
> +     * one best suited to our platform and compositor configuration.
> +     *
> +     * Must be called on the producing thread.

Could you guard this by an assertion?
Attachment #700597 - Flags: review?(bjacob) → review-
(Assignee)

Comment 107

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #106)
> Comment on attachment 700597 [details] [diff] [review]
> after round 4 (patch)
> 
> Review of attachment 700597 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I am not competent to review the gfx/layers/ parts.
> 
> For the rest, I have enough questions to require another round.
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +818,5 @@
> >          HTMLCanvasElement *canvas = userdata->mContent;
> >          WebGLContext *context = static_cast<WebGLContext*>(canvas->GetContextAtIndex(0));
> >  
> >          // Mark ourselves as no longer invalidated.
> > +        context->MarkContextClean();
> 
> Please explain this change?
> 
> @@ +1176,4 @@
> >  
> > +        GLint stencilBits = 0;
> > +        gl->fGetIntegerv(LOCAL_GL_STENCIL_BITS, &stencilBits);
> > +        GLuint stencilMask = (1 << stencilBits) - 1;
> 
> If stencilBits can ever be 32 (unlikely I guess), then it's important that
> you write this as
> 
> GLuint stencilMask = (GLuint(1) << stencilBits) - 1;
> 
> Otherwise you're overflowing a signed int, which is undefined.
Yep, fixed.
> 
> @@ +1177,5 @@
> > +        GLint stencilBits = 0;
> > +        gl->fGetIntegerv(LOCAL_GL_STENCIL_BITS, &stencilBits);
> > +        GLuint stencilMask = (1 << stencilBits) - 1;
> > +
> > +        MOZ_ASSERT(((stencilWriteMaskFront ^ mStencilWriteMaskFront) & stencilMask) == 0);
> 
> I find (a&c) == (b&c) easier to understand.
Ok. Was using ^ to try to keep it on one line. Fixed.
> 
> ::: content/canvas/test/reftest/reftest.list
> @@ +177,5 @@
> >  
> >  # Check that our experimental prefs still work:
> >  
> >  # 16bpp:
> > +skip-if(winWidget) pref(webgl.prefer-16bpp,true)                                        == webgl-color-test.html?16bpp           wrapper.html?colors.png
> 
> Why does this have to be skipped on Windows?
Because 16bpp is both broken and not useful on Windows. We should just not support it for now. We can fix it up in a later bug if we ever have reason to.
> 
> ::: gfx/gl/GLScreenBuffer.cpp
> @@ +345,5 @@
> > +
> > +        // Swap out the apparently defective old factory.
> > +        delete mFactory;
> > +        mFactory = basicFactory;
> > +    }
> 
> Please explain this new if (!nextSurf) { ... } block.
In the case where creating a new buffer fails, we assume the factory we have is defective, and create a Basic (readback) factory to try to salvage things and keep things working.
> 
> ::: gfx/gl/GLScreenBuffer.h
> @@ +246,5 @@
> >  
> > +    /* Morph swaps out our SurfaceStream mechanism and replaces it with
> > +     * one best suited to our platform and compositor configuration.
> > +     *
> > +     * Must be called on the producing thread.
> 
> Could you guard this by an assertion?
I don't think we really can. Especially once we move to WebGL on Web Workers, there won't be a single production thread anymore.
(Assignee)

Comment 108

4 years ago
(In reply to Benoit Jacob [:bjacob] from comment #106)
> Comment on attachment 700597 [details] [diff] [review]
> after round 4 (patch)
> 
> Review of attachment 700597 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I am not competent to review the gfx/layers/ parts.
I'll get one of nrc/nical/BenWa to check those.
> 
> For the rest, I have enough questions to require another round.
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +818,5 @@
> >          HTMLCanvasElement *canvas = userdata->mContent;
> >          WebGLContext *context = static_cast<WebGLContext*>(canvas->GetContextAtIndex(0));
> >  
> >          // Mark ourselves as no longer invalidated.
> > +        context->MarkContextClean();
> 
> Please explain this change?
I think it snuck into this patch because I missed it while merging previous patches with trunk. It's not a new change.
(Assignee)

Comment 109

4 years ago
Created attachment 701404 [details] [diff] [review]
after bjacob's round 5 (patch)
Attachment #701404 - Flags: review?
(Assignee)

Updated

4 years ago
Attachment #701404 - Flags: review? → review?(bjacob)
(Assignee)

Comment 110

4 years ago
Comment on attachment 700597 [details] [diff] [review]
after round 4 (patch)

To nical for the stuff in Layers.
Attachment #700597 - Flags: review?(nical.bugzilla)
(Reporter)

Updated

4 years ago
Attachment #701404 - Flags: review?(bjacob) → review+
Comment on attachment 700597 [details] [diff] [review]
after round 4 (patch)

Review of attachment 700597 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicCanvasLayer.cpp
@@ +392,5 @@
>          }
>        }
>      }
>  
> +    if (factory)

nit: use {} braces with if statement
(it's the convention in this file)

::: gfx/layers/d3d10/CanvasLayerD3D10.cpp
@@ +51,5 @@
>                                                          device(),
>                                                          screen->Caps());
>      }
>  
> +    if (factory)

nit: use braces with if statements (it's the convention in this files)

::: gfx/layers/opengl/CanvasLayerOGL.cpp
@@ +165,5 @@
>        factory = new SurfaceFactory_GLTexture(mGLContext, gl(), screen->Caps());
>      }
>  
> +    if (factory)
> +      screen->Morph(factory, streamType);

nit: use {} braces with if statements
(it's the convention in this files)
Attachment #700597 - Flags: review?(nical.bugzilla) → review+
(Assignee)

Comment 112

4 years ago
(In reply to Nicolas Silva [:nical] from comment #111)
> Comment on attachment 700597 [details] [diff] [review]
> after round 4 (patch)
> 
> Review of attachment 700597 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicCanvasLayer.cpp
> @@ +392,5 @@
> >          }
> >        }
> >      }
> >  
> > +    if (factory)
> 
> nit: use {} braces with if statement
> (it's the convention in this file)
> 
> ::: gfx/layers/d3d10/CanvasLayerD3D10.cpp
> @@ +51,5 @@
> >                                                          device(),
> >                                                          screen->Caps());
> >      }
> >  
> > +    if (factory)
> 
> nit: use braces with if statements (it's the convention in this files)
> 
> ::: gfx/layers/opengl/CanvasLayerOGL.cpp
> @@ +165,5 @@
> >        factory = new SurfaceFactory_GLTexture(mGLContext, gl(), screen->Caps());
> >      }
> >  
> > +    if (factory)
> > +      screen->Morph(factory, streamType);
> 
> nit: use {} braces with if statements
> (it's the convention in this files)

Will do.

Any bets on how many times this bounces from inbound? :P
(Assignee)

Comment 113

4 years ago
Created attachment 702013 [details] [diff] [review]
final patch 1

Final patches for committal.
Attachment #702013 - Flags: review+
(Assignee)

Comment 114

4 years ago
Comment on attachment 702013 [details] [diff] [review]
final patch 1

Actually, I'll just upload the combined patch for this.
Attachment #702013 - Attachment is obsolete: true
(Assignee)

Comment 115

4 years ago
Created attachment 702015 [details] [diff] [review]
Final patch 1

Combined 'final' patch.
Attachment #702015 - Flags: review+
(Assignee)

Updated

4 years ago
Attachment #700597 - Attachment is obsolete: true
(Assignee)

Comment 116

4 years ago
Most recent try build of Android (and incidentally Mac):
https://tbpl.mozilla.org/?tree=Try&rev=9e536066911f

This segfaults on my Nexus S, but runs fine on my Nexus 7. If anyone else can load this[1] up and try running a simple demo[2], it'd help determine how widespread this issue is.

[1]: Preferably the debug build, as that has the asserts it it.
[2]: http://spidergl.org/example.php?id=1
(In reply to Jeff Gilbert [:jgilbert] from comment #116)
> Most recent try build of Android (and incidentally Mac):
> https://tbpl.mozilla.org/?tree=Try&rev=9e536066911f
> 
> This segfaults on my Nexus S, but runs fine on my Nexus 7. If anyone else
> can load this[1] up and try running a simple demo[2], it'd help determine
> how widespread this issue is.
> 
> [1]: Preferably the debug build, as that has the asserts it it.
> [2]: http://spidergl.org/example.php?id=1

I have a bunch of devices here I could try it on. Where is the apk?
Nevermind.
On my Galaxy Nexus, it started to work and then the phone rebooted.
Launching the aquarium demo consistently crashes the browser or reboots the phone on Galaxy Nexus: http://conduit.bitops.com/~vladimir/misc/webglsamples/aquarium/aquarium.html

https://crash-stats.mozilla.com/report/index/bp-a200b95a-1366-4fe9-a9c2-c171e2130117
(Assignee)

Comment 121

4 years ago
My theory is that the segfault is a driver bug that occurs when we create an EGLImage on the consumer thread, and render to it on the producer. Reversing the construction of the EGLImage pipe makes everything work fine on my Nexus S.
(Assignee)

Comment 122

4 years ago
Created attachment 703807 [details] [diff] [review]
r+'d patch
Attachment #700598 - Attachment is obsolete: true
Attachment #701404 - Attachment is obsolete: true
Attachment #702015 - Attachment is obsolete: true
(Assignee)

Comment 123

4 years ago
Created attachment 703809 [details] [diff] [review]
Misc fixes.
Attachment #703809 - Flags: review?(bjacob)
(Assignee)

Comment 124

4 years ago
Created attachment 703810 [details] [diff] [review]
Construct the EGLImage pipe from prod-to-cons, instead of vice-versa.

I will remove the old path in favor of this one.
Attachment #703810 - Flags: review?(bjacob)
(Assignee)

Comment 125

4 years ago
This should be clean except for B2G R6:
https://tbpl.mozilla.org/?tree=Try&rev=becc348d6efa
blocking-b2g: --- → tef?
I need to have a chat with you about these patches in order to make any kind of constructive review. Maybe we can chat on IRC on Monday.
(Assignee)

Comment 127

4 years ago
Created attachment 704686 [details] [diff] [review]
Reverse construction direction of EGLImage pipe.
Attachment #703810 - Attachment is obsolete: true
Attachment #703810 - Flags: review?(bjacob)
Attachment #704686 - Flags: review?(bjacob)
Hey guys, just wanted to ping that this, and the follow up optimizations, are growing in priority.  More and more of the port efforts we are running into are using OpenGL ES and that means that WebGL is the best target on the web.  Jeff, try and make this a focus and let me know if you get pulled off for other issues.
(Assignee)

Comment 129

4 years ago
Here's the current version with the old EGLImage share replaced with the new one: 
https://tbpl.mozilla.org/?tree=Try&rev=e296f135b68f

Updated

4 years ago
Blocks: 833134
This isn't a change we can make for v1.0.0 - too risky.
blocking-b2g: tef? → -
tracking-b2g18: --- → ?
Comment on attachment 703809 [details] [diff] [review]
Misc fixes.

Review of attachment 703809 [details] [diff] [review]:
-----------------------------------------------------------------

Discussed with jgilbert over the phone.
Attachment #703809 - Flags: review?(bjacob) → review+
(Reporter)

Updated

4 years ago
Attachment #704686 - Flags: review?(bjacob) → review+
Re: previous review: please add a big class comment on SharedSurface_EGL to explain the things here, and consider turning the 3 bools there into a single enum so that we can't possibly be in an inconsistent state. Up to you.
FWIW, on a Nexus 10, the latest try builds are only a 10% win over current release Firefox version.  I'm guessing we'll need async layer transactions to see the really big wins here.  The patch will have the biggest immediate impact on (older) Tegra devices because of the CopyTexImage bug, but we will definitely need the async canvas layer transactions to get full benefit.
Blocks: 835165
I'm just curious, are we landing this soon?
(Assignee)

Comment 135

4 years ago
As soon as B2G tests pass. I have been pulled away for meetings all last week and the first part of this week.
This appears to be a major and risky change, and therefore won't be taken in v1.x unless partner required for stability/perf.
tracking-b2g18: ? → -
(Assignee)

Comment 137

4 years ago
Created attachment 709544 [details] [diff] [review]
combined patch
(Assignee)

Updated

4 years ago
Blocks: 788201
Created attachment 710076 [details] [diff] [review]
waitsync and swapconsumer fixes (on top of combined patch)

Tegra has issues with ClientWaitSync with non-FOREVER timeout.  Also guards against a potential crash if SwapConsumer returns null (that was being triggered by the FOREVER thing).
Attachment #710076 - Flags: review?(jgilbert)
(Assignee)

Updated

4 years ago
Attachment #710076 - Flags: review?(jgilbert) → review+
Blocks: 829747
(Assignee)

Comment 139

4 years ago
Created attachment 711199 [details] [diff] [review]
cumulative patch with gralloc/ipc path
(Assignee)

Comment 140

4 years ago
Created attachment 711227 [details] [diff] [review]
interdiff with cumulative patch that gets a cube spinning on unagi
I've gotten a couple of different crashes on B2G during:
http://10.242.24.43:8888/tests/content/canvas/test/reftest/webgl-clear-test.html?__&_____&preserve


#0  0x4129036e in mozilla::gl::SharedSurface_GL::GL (src=0x43be70d0, dest=0x0, factory=0x46c3ccf0)
    at /home/jrmuizel/src/B2G/gecko/gfx/gl/SharedSurfaceGL.h:65
#1  mozilla::gl::SharedSurface_GL::Copy (src=0x43be70d0, dest=0x0, factory=0x46c3ccf0)
    at /home/jrmuizel/src/B2G/gecko/gfx/gl/SharedSurfaceGL.cpp:22
#2  0x4128f97e in mozilla::gfx::SharedSurface::Copy (src=0x43be70d0, dest=0x0, factory=0x46c3ccf0)
    at /home/jrmuizel/src/B2G/gecko/gfx/gl/SharedSurface.cpp:25
#3  0x412914b6 in mozilla::gfx::SurfaceStream_TripleBuffer_Copy::SwapProducer (this=0x4535dac0, factory=0x46c3ccf0, size=...)
    at /home/jrmuizel/src/B2G/gecko/gfx/gl/SurfaceStream.cpp:309
#4  0x4128e65c in mozilla::gl::GLScreenBuffer::Swap (this=0x470f5140, size=...) at /home/jrmuizel/src/B2G/gecko/gfx/gl/GLScreenBuffer.cpp:333
#5  0x4128e862 in mozilla::gl::GLScreenBuffer::PublishFrame (this=0x470f5140, size=...)
    at /home/jrmuizel/src/B2G/gecko/gfx/gl/GLScreenBuffer.cpp:388
#6  0x41289bac in mozilla::gl::GLContext::PublishFrame (this=<value optimized out>) at /home/jrmuizel/src/B2G/gecko/gfx/gl/GLContext.cpp:1239
#7  0x40bfb4ec in mozilla::WebGLContext::PresentScreenBuffer (this=0x42dabbc0)

and another in pixman_composite()
More specifically:
mozilla::gl::SharedSurface_GL::Copy (src=0x43a11580, dest=0x0, factory=0x44bc5940)

This will crash because dest is NULL.

dest is NULL presumeably because New in mozilla::gfx::SurfaceStream_TripleBuffer_Copy::SwapProducer() failed to set mStaging.

(BTW, it seems like New() should return a surface instead of taking it as an out-param.)
(Assignee)

Comment 143

4 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #142)
> More specifically:
> mozilla::gl::SharedSurface_GL::Copy (src=0x43a11580, dest=0x0,
> factory=0x44bc5940)
> 
> This will crash because dest is NULL.
> 
> dest is NULL presumeably because New in
> mozilla::gfx::SurfaceStream_TripleBuffer_Copy::SwapProducer() failed to set
> mStaging.
> 
> (BTW, it seems like New() should return a surface instead of taking it as an
> out-param.)

Yes, this is what it looks like. We need a stronger guarantee for New(), such that if creating a new surface with out 'accelerated' factory type fails, we fall back to our 'baseline' readback factory. If that fails, we should probably force-lose the context.
(Assignee)

Comment 144

4 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #143)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #142)
> > More specifically:
> > mozilla::gl::SharedSurface_GL::Copy (src=0x43a11580, dest=0x0,
> > factory=0x44bc5940)
> > 
> > This will crash because dest is NULL.
> > 
> > dest is NULL presumeably because New in
> > mozilla::gfx::SurfaceStream_TripleBuffer_Copy::SwapProducer() failed to set
> > mStaging.
> > 
> > (BTW, it seems like New() should return a surface instead of taking it as an
> > out-param.)
> 
> Yes, this is what it looks like. We need a stronger guarantee for New(),
> such that if creating a new surface with out 'accelerated' factory type
> fails, we fall back to our 'baseline' readback factory. If that fails, we
> should probably force-lose the context.

This is already half-way in place. However, for 3buff_copy, we don't return null from SwapProd when we fail to alloc the new *staging* surface. Also, we don't yet lose the context if we fail to SwapProd even with the basic readback factory.
(Assignee)

Comment 145

4 years ago
Created attachment 712703 [details] [diff] [review]
Cumulative patch with safe SwapProducer fixes

Here's a cumulative patch based against trunk that should fix this. I'm rebuilding for the emulator now.
Attachment #712703 - Flags: feedback?(jmuizelaar)
(Assignee)

Comment 146

4 years ago
Created attachment 712705 [details] [diff] [review]
interdiff of what's changed for making SwapProducer behavior better
(Assignee)

Comment 147

4 years ago
Oops, that patch is busted. I'll get it fixed before uploading again.
Attachment #712703 - Flags: feedback?(jmuizelaar) → feedback+
(Assignee)

Comment 148

4 years ago
Things are looking better:
https://tbpl.mozilla.org/?tree=Try&rev=5c5497474a38

However, b2g emulator reftests seem to have an issue with webgl with alpha channel: Each reftest that uses alpha fails the first 83 pixels.[1] WebGL seems to work otherwise, and I don't see this effect on local devices or a local copy of the emulator.

I propose allowing a fuzz threshold of 83 on alpha tests for b2g until we can figure out what's up there.

[1]: 
(Assignee)

Comment 149

4 years ago
There are actually a number of alpha and premultAlpha failures in there. Will investigate, but this is something I've fixed on other platforms repeatedly.
(Assignee)

Comment 150

4 years ago
Created attachment 713648 [details] [diff] [review]
Mark failing b2g tests
Attachment #713648 - Flags: review?(vladimir)
(Assignee)

Comment 151

4 years ago
Created attachment 713649 [details] [diff] [review]
Switch Readback to use Textures for easier blitting
Attachment #713649 - Flags: review?(vladimir)
(Assignee)

Comment 152

4 years ago
Created attachment 713654 [details] [diff] [review]
Use centralized SurfaceStream type chooser.
Attachment #713654 - Flags: review?(vladimir)
(Assignee)

Updated

4 years ago
Attachment #712705 - Flags: review?(vladimir)
(Assignee)

Comment 153

4 years ago
Created attachment 713657 [details] [diff] [review]
Misc fixes to get b2g working
Attachment #713657 - Flags: review?(vladimir)
(Assignee)

Comment 154

4 years ago
Created attachment 713659 [details] [diff] [review]
More misc fixes

The finish we add here gets removed by a the patch which fixes the EGLImage pipe, so don't worry about that.
Attachment #713659 - Flags: review?(vladimir)
Attachment #713648 - Flags: review?(vladimir) → review+
Comment on attachment 712705 [details] [diff] [review]
interdiff of what's changed for making SwapProducer behavior better

Review of attachment 712705 [details] [diff] [review]:
-----------------------------------------------------------------

r=bjacob+jrmuizel except for the accidental thing in SurfaceStream_SingleBuffer::SwapProducer
Attachment #712705 - Flags: review?(vladimir) → review+
Attachment #713654 - Flags: review?(vladimir) → review+
Comment on attachment 713649 [details] [diff] [review]
Switch Readback to use Textures for easier blitting

Review of attachment 713649 [details] [diff] [review]:
-----------------------------------------------------------------

r=jrmuizel+bjacob
Attachment #713649 - Flags: review?(vladimir) → review+
Comment on attachment 713659 [details] [diff] [review]
More misc fixes

Review of attachment 713659 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with this assumption:

::: gfx/gl/SharedSurfaceEGL.cpp
@@ +245,5 @@
>          EGLClientBuffer buffer = reinterpret_cast<EGLClientBuffer>(mConsTex);
>          mImage = mEGL->fCreateImage(Display(), context,
>                                      LOCAL_EGL_GL_TEXTURE_2D, buffer,
>                                      nullptr);
> +        consGL->fFinish();

my understanding here is that this finish call is taken out by another patch applied after it. Otherwise please explain.
Attachment #713659 - Flags: review?(vladimir) → review+
Comment on attachment 713657 [details] [diff] [review]
Misc fixes to get b2g working

Review of attachment 713657 [details] [diff] [review]:
-----------------------------------------------------------------

r=jrmuizel+bjacob with this:

::: gfx/layers/basic/BasicCanvasLayer.cpp
@@ +196,2 @@
>        mGLContext->Screen()->Readback(surfGL, readSurf);
> +      readSurf->MarkDirty();

That Flush/MarkDirty is redundant (or if it's not, it should probably be, right? Readback should be taking care of that, it seems). Please resolve/take it out.
Attachment #713657 - Flags: review?(vladimir) → review+
(Assignee)

Comment 159

4 years ago
Created attachment 713683 [details] [diff] [review]
Cumulative patch before ipc/gralloc stuff
Attachment #713683 - Flags: feedback?(bjacob)
Comment on attachment 713683 [details] [diff] [review]
Cumulative patch before ipc/gralloc stuff

Review of attachment 713683 [details] [diff] [review]:
-----------------------------------------------------------------

just land!
Attachment #713683 - Flags: feedback?(bjacob)
(Assignee)

Comment 161

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/38c7d7a7f586
(Assignee)

Comment 162

4 years ago
Try run was:
https://tbpl.mozilla.org/?tree=Try&rev=669e538607d8
It's sticking !!!
There are crashes on the try run along with reftest failures. If what's on try matches what landed, this will need to be backed out.
Backed out for B2G reftest failures and Android R2 failures. Both were visible in the Try run.
https://hg.mozilla.org/integration/mozilla-inbound/rev/25ba41772e0f
Note that the Android R2 failures have the "application ran for longer than allowed maximum time" common in them, but the actual failing test varies.
(Assignee)

Comment 167

4 years ago
Android R2 is the test that runs WebGL reftests on Android. We probably need to increase the time we allow.

Also, this line is giving us an 'unexpected fail' on b2g:
fuzzy(1,65536) fails-if(B2G)        fuzzy-if(Android||B2G,9,65536)                                 == webgl-color-alpha-test.html?colorVal=1.0&alphaVal=0.5&alpha               wrapper.html?colors-half-alpha.png

Can we not have fuzzy-if and fails-if in the same line?
The B2G M1 failure was yours too.
So I had a look at the b2g crashes and it looks like we're crashing during gc because of heap corruption. i.e we hit RELEASE_ASSERT(run->magic == ARENA_RUN_MAGIC). This doesn't seem to be related to particular test as presumably we don't crash until we detect the corruption afterwards.
I looked at the area of memory that is supposed to be the jemalloc stuff and it looks like it is opaque green (ff,0,ff,0)
Blocks: 835058
(Assignee)

Comment 171

4 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #170)
> I looked at the area of memory that is supposed to be the jemalloc stuff and
> it looks like it is opaque green (ff,0,ff,0)

Interesting. Perhaps this is related to the graphical corruption.
So, if I wrap fReadPixels() in GLContext::ReadPixelsIntoImageSurface() with the following:

    uint32_t p = *(uint32_t*)(readSurf->Data()  + readSurf->Stride()*readSurf->Height());

    fReadPixels(0, 0,
                width, height,
                readFormat, readType,
                readSurf->Data());
    uint32_t newp = *(uint32_t*)(readSurf->Data()  + readSurf->Stride()*readSurf->Height());
    if (p != newp)
       NS_RUNTIMEABORT("FOV");

I get aborts. While it's possible that p and newp don't match for good reason (perhaps the neighbouring block is owned by the gl code by coincidence), it does seem somewhat unlikely. However, I'm not really sure how to debug from this point on.

The readFormat and readType are GL_RGBA and GL_UNSIGNED_BYTE and the surface is gfxASurface::ImageFormatRGB24 or some other matching type. Nothing else seems wrong with parameters.
I confirmed that something fishy is going on by allocating my own buffer that extends 100 bytes beyond the surface and the values do get changed.
should probably check the various pack state flags.. something might be setting it to something weird, like an odd stride?
All of
# GL_UNPACK_SWAP_BYTES 
# GL_UNPACK_LSB_FIRST
# GL_UNPACK_ROW_LENGTH
# GL_UNPACK_SKIP_ROWS
# GL_UNPACK_SKIP_PIXELS
# GL_UNPACK_ALIGNMENT
# GL_PACK_SWAP_BYTES
# GL_PACK_LSB_FIRST
# GL_PACK_ROW_LENGTH
# GL_PACK_SKIP_ROWS
# GL_PACK_SKIP_PIXELS
# GL_PACK_ALIGNMENT

are 0 except PACK and UNPACK ALIGNMENT which are 4
Dumb question -- at that spot above, does readSurf->Width() == width, readSurf->Height() == height, and readSurf->Stride() == width * 4?
Looks like the bug is in the qemu pipe stream code:
https://github.com/mozilla-b2g/android-development/blob/435a7da6d02083ba6583e1654db351991cf79578/tools/emulator/opengl/system/OpenglSystemCommon/QemuPipeStream.cpp#L125

Every time around that loop we read len bytes instead of res bytes. This causes us to overwrite.
The changes to emulator are being deployed in bug 841836
Depends on: 841836
We have switched to a repository that contains the fix in: 

https://github.com/mozilla-b2g/b2g-manifest/commit/c0d63f006c810942716dcdf48584c5f1947c2ba9

If you ./repo sync and are on the master branch (BRANCH=master ./config.sh emulator), you should pick up the fix.
https://tbpl.mozilla.org/?tree=Try&rev=a6e960a81673

This is a push with the fixed emulator.
We crashed again on R6 :( This time it was during shutdown...
(In reply to Jeff Gilbert [:jgilbert] from comment #167)
> Android R2 is the test that runs WebGL reftests on Android. We probably need
> to increase the time we allow.
> 
> Also, this line is giving us an 'unexpected fail' on b2g:
> fuzzy(1,65536) fails-if(B2G)        fuzzy-if(Android||B2G,9,65536)          
> == webgl-color-alpha-test.html?colorVal=1.0&alphaVal=0.5&alpha              
> wrapper.html?colors-half-alpha.png
> 
> Can we not have fuzzy-if and fails-if in the same line?

The fuzzy-if() will override the fails-if.

Removing the ||B2G should fix the problem.
And here's one with reftest fixes.

https://tbpl.mozilla.org/?tree=Try&rev=e5e56b9c9f6d
(In reply to Jeff Muizelaar [:jrmuizel] from comment #181)
> We crashed again on R6 :( This time it was during shutdown...

I also can not reproduce the crash locally :(
So two things:
1. The push I did with the adjusted reftest went green.
2. The emulator update caused M1 orange and was backed out.
Some things about the emulator update:
1. I've been able to reproduce the failures locally. However, I can reproduce them with and without my patch. This is weird.

2. These two failures are both related to permissions not working properly.
ERROR TEST-UNEXPECTED-FAIL | /tests/dom/alarm/test/test_alarm_non_permitted_app.html | navigator.mozAlarms should return null - got [xpconnect wrapped nsIDOMMozAlarmsManager], expected null

ERROR TEST-UNEXPECTED-FAIL | /tests/dom/power/test/test_power_basics.html | Shouldn't be able to access power manager without permission.

I have not tried debugging them yet.


3.
ERROR TEST-UNEXPECTED-FAIL | /tests/dom/bindings/test/test_integers.html | Should not start in an error state - got 1286, expected 0

This seems like it might be a legitimate failure caused by this patch, at least when running on the emulator:
     try {
      var gl = $("c").getContext("experimental-webgl");
    } catch (ex) {
      // No WebGL support on MacOS 10.5.  Just skip this test
      todo(false, "WebGL not supported");
      return;
    }
    is(gl.getError(), 0, "Should not start in an error state");
(In reply to Jeff Muizelaar [:jrmuizel] from comment #186)
> 3.
> ERROR TEST-UNEXPECTED-FAIL | /tests/dom/bindings/test/test_integers.html |
> Should not start in an error state - got 1286, expected 0
> 
> This seems like it might be a legitimate failure caused by this patch, at
> least when running on the emulator:
>      try {
>       var gl = $("c").getContext("experimental-webgl");
>     } catch (ex) {
>       // No WebGL support on MacOS 10.5.  Just skip this test
>       todo(false, "WebGL not supported");
>       return;
>     }
>     is(gl.getError(), 0, "Should not start in an error state");

Entirely possible -- we do a bunch of GL calls in getContext, and it's possible we do something that's not supported by the emulator.  1286/0x0506 is GL_INVALID_FRAMEBUFFER_OPERATION.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #187)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #186)
> Entirely possible -- we do a bunch of GL calls in getContext, and it's
> possible we do something that's not supported by the emulator.  1286/0x0506
> is GL_INVALID_FRAMEBUFFER_OPERATION.

Can you think of what would've changed to cause this?
Here's a try push with some tests disabled that should hopefully go green:
https://tbpl.mozilla.org/?tree=Try&rev=ba37feca1bfa
And with an updated emulator:
https://tbpl.mozilla.org/?tree=Try&rev=a36365c8e6cc
(Assignee)

Comment 191

4 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #188)
> (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #187)
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #186)
> > Entirely possible -- we do a bunch of GL calls in getContext, and it's
> > possible we do something that's not supported by the emulator.  1286/0x0506
> > is GL_INVALID_FRAMEBUFFER_OPERATION.
> 
> Can you think of what would've changed to cause this?

It sounds like something to do with how we assemble and handle the GLContext's screenbuffer. Try running it with MOZ_GL_DEBUG=1?
https://hg.mozilla.org/integration/mozilla-inbound/rev/b46c006a7696

Let's try again.
Guess what? We're still failing B2G tests! Since this and bug 841836 landed so close together, I can't separate which actually broke them, so I backed both out.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f04919e7789f
So we may be getting a legitimate intermittent crash:
06:36:00     INFO -  E/GeckoConsole(  815): [JavaScript Warning: "Error: WebGL: Exceeded 2 live WebGL contexts for this principal, losing the least recently used one." {file: "http://10.0.2.2:8888/tests/content/canvas/test/reftest/webgl-utils.js" line: 59}]
06:36:00     INFO -  E/eglCodecCommon(  815): glUtilsParamSize: unknow param 0x00000bd0
06:36:00     INFO -  E/GeckoConsole(  815): [JavaScript Warning: "Use of Mutation Events is deprecated. Use MutationObserver instead." {file: "chrome://reftest/content/reftest-content.js" line: 449}]
06:36:00    ERROR -  F/libc    (  815): Fatal signal 11 (SIGSEGV) at 0x43c00000 (code=2)
06:36:00    ERROR -  This usually indicates the B2G process has crashed
It looks like there have been very similar failures in the past:
https://tbpl.mozilla.org/php/getParsedLog.php?id=19776867&tree=Firefox#error0
https://hg.mozilla.org/integration/mozilla-inbound/rev/82747d694e7a
Third time's a charm?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #196)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/82747d694e7a
> Third time's a charm?

Do we need the updated emulator in bug 841836 for this as well?  It was backed out this morning at the same time as this patch.
I relanded that earlier today.
Depends on: 843490
https://hg.mozilla.org/mozilla-central/rev/82747d694e7a
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Blocks: 843599

Updated

4 years ago
Depends on: 843735
Depends on: 843738
Backed out for causing bug 843738:
https://hg.mozilla.org/mozilla-central/rev/885cde564ff3

Changing test timeouts is wallpapering over the real issue as well as causing longer end-to-end times; so this needs to be fixed before relanding.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Ed Morley [:edmorley UTC+0] from comment #200)
> Backed out for causing bug 843738:
> https://hg.mozilla.org/mozilla-central/rev/885cde564ff3
> 
> Changing test timeouts is wallpapering over the real issue as well as
> causing longer end-to-end times; so this needs to be fixed before relanding.

It would've been nice if we could discuss this before you backed it out.
(In reply to Ed Morley [:edmorley UTC+0] from comment #200)
> Backed out for causing bug 843738:
> https://hg.mozilla.org/mozilla-central/rev/885cde564ff3
> 
> Changing test timeouts is wallpapering over the real issue as well as
> causing longer end-to-end times; so this needs to be fixed before relanding.

Longer end-to-end times on 3 year old Tegra 2 hardware is 150% totally uninteresting.  We should not be using that hardware at all in 2013 (though I guess they're useful for ARMv6 tests!); at the very least we should not be caring about WebGL at all on them.

Let's kill WebGL tests/reftests on that hardware.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #201)
> It would've been nice if we could discuss this before you backed it out.

This.  I understand that we want to clean up oranges and that landing things that introduce oranges is bad; however, at the same time, with a patch and bug such as this, at least a conversation is useful to have (sure, if you can't get a response within 15 minutes, back out by all means.. but that would not have been the case here).  As those discussions were already ongoing, we probably should've starred the builds with that info, but here we are.
Created attachment 716720 [details] [diff] [review]
Disable webgl on ARMv6

If the reftests still work properly with this patch. We'll reland with it.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #201)
> It would've been nice if we could discuss this before you backed it out.

Unfortunately, that's not the workflow for sheriffing cases like these; the "safe position at rest" is with the changeset backed out, at which point discussions can then occur as to whether it's acceptable to bump timeouts/disable whole portions of testsuites. (Particularly for bugs like this that have bounced once already).

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #202)
> This.  I understand that we want to clean up oranges 

This was very much more than "cleaning up oranges"; the failure rate was an order of magnitude higher than what we would classify as a typical intermittent failure - so much so that Ryan very nearly didn't perform today's inbound to mozilla-central merge because of it. (And he only resorted to merging since we were blocking all other development activity).
This was posted in the wrong bug; transposing:

(In reply to Jeff Muizelaar [:jrmuizel] from bug 820756 comment #35)
> Landed again with webgl disabled on ARMv6
> https://hg.mozilla.org/integration/mozilla-inbound/rev/3a7d4085787e
https://hg.mozilla.org/mozilla-central/rev/3a7d4085787e
https://hg.mozilla.org/mozilla-central/rev/6c64bae71de5
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED

Updated

4 years ago
Depends on: 844280
This is another critical piece of technology from the product perspective.  We're kinda pretending that webgl actually works right now, but it's mostly a toy.
blocking-b2g: - → leo?
blocking-kilimanjaro: + → ---
Whiteboard: webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M] → webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M] regression-risk
Blocks: 844910
blocking-b2g: leo? → leo+

Comment 208

4 years ago
mingw bustage fix:

https://hg.mozilla.org/integration/mozilla-inbound/rev/b00914c9e7d4

Updated

4 years ago
No longer blocks: 835058
While I'm sure this comes as a shock to all, this doesn't apply cleanly to b2g18 at all. Presumably it'll need a branch-specific patch for uplift.
Wait, stop.  This cannot go on b2g18.  If b2g wants this, it will need a new platform uplift -- this is way too risky for B2G/B2G18.  If we take this we'll also fragment gecko ridiculously code-wise.  This patch also does nothing for b2g, because it has no gralloc support.  That part is not checked in or even 100% functional yet.

The only patch that we should consider for b2g18 for webgl is bug 837591, which is untested; but it should get some testing and see where it stands.
Note that leo+ means b2g18 (v1.1.x).
This addresses part of the question I asked in bug 837591 comment 43.  Thanks Vlad.
blocking-b2g: leo+ → -
Whiteboard: webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M] regression-risk → webgl-next [games:p1] [soft] [WebAPI:P0] [LOE:M] regression-risk [tech-p1]
https://hg.mozilla.org/mozilla-central/rev/b00914c9e7d4
(Reporter)

Updated

4 years ago
Depends on: 847283
This seems to have caused bug 847283, which is a regression whereby we try to use textures above the MAX_TEXTURE_SIZE to back FBOs without properly checking; we had fixed that in 20 (see bug 827170) and it's back.
Depends on: 863477

Updated

4 years ago
Blocks: 872167

Updated

4 years ago
Depends on: 876721

Updated

4 years ago
Depends on: 881311
(Reporter)

Updated

4 years ago
Depends on: 884057

Updated

4 years ago
Depends on: 901574

Updated

4 years ago
Depends on: 916567
Setting ddn off, see comment 12.
Keywords: dev-doc-needed
You need to log in before you can comment on or make changes to this bug.