Closed Bug 1453501 Opened 2 years ago Closed 2 years ago

When the compositor is restarted after a page has been loaded, the page is not rendered.

Categories

(Core :: Graphics, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed

People

(Reporter: rbarker, Assigned: snorp)

References

Details

(Whiteboard: [geckoview:crow][gfx-noted] [geckoview:fenix] )

Attachments

(3 files)

Steps to reproduce:

Create a GeckoSession and open a URL.
After the page has loaded, start the compositor by passing in a valid Surface to the GeckoSession display.

Expected result, the page will render.
Actual result, only the back ground color layer is rendered.

Page must be reloaded for page to be rendered correctly.
Whiteboard: [geckoview:crow]
Priority: -- → P3
Whiteboard: [geckoview:crow] → [geckoview:crow][gfx-noted]
I think this bug or some variation is happening with the GeckoView example app, so I'm taking a look.
Assignee: nobody → snorp
OK, at least in my case, I think I know what's happening. It's a race between creation of the compositor and other rendering init.

When the <xul:browser> is added to the <xul:window>, we end up in TabParent::InitRenderFrame(), called from nsFrameLoader::TryRemoteBrowser. TabParent::InitRenderFrame then calls SendInitRendering with the result of RenderFrameParent::IsLayersConnected(). This is where things go south. In the successful case, IsLayersConnected() returns true, but in the failure case it returns false. When false, TabChild::InitRenderingState (called by TabChild::RecvInitRendering) does not attempt to create a remote layer manager. Subsequently, the puppet widget creates a fallback BasicLayerManager, and that's why we don't see anything on the screen.

The inflection point in all of this seems to be in RenderFrameParent::Init(). It gets the PCompositorBridgeChild from the layer manager and passes that to GPUProcessManager::AllocateAndConnectLayerTreeId(). If the PCompositorBridgeChild is null, this method returns false, and we end up in the situation above. Sometimes we win the race and it returns true because the compositor has been created. All of this seems caused by the fact that we do not create a compositor until we have a surface. I see at least two possible solutions here.

1) Always create a compositor immediately, regardless of whether or not we have a surface. If we don't have a surface, the compositor would start in a paused state.


2) Allow the BasicLayerManager path we hit now, but migrate to a LayerManagerComposite once the Compositor is created. I don't know if we have enough existing machinery to do this easily. Maybe it's already supposed to work?

Kats, if you have any guidance given the above I would appreciate it. Thanks!
Flags: needinfo?(bugmail)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #2)
> It gets the PCompositorBridgeChild from the layer
> manager and passes that to
> GPUProcessManager::AllocateAndConnectLayerTreeId(). If the
> PCompositorBridgeChild is null, this method returns false, and we end up in
> the situation above.

Do you know if `lm` is also null in RenderFrameParent::Init when this happens? Or is it just `lm->GetCompositorBridgeChild()`?

> 
> 1) Always create a compositor immediately, regardless of whether or not we
> have a surface. If we don't have a surface, the compositor would start in a
> paused state.

This seems like a reasonable approach to take.

> 2) Allow the BasicLayerManager path we hit now, but migrate to a
> LayerManagerComposite once the Compositor is created. I don't know if we
> have enough existing machinery to do this easily. Maybe it's already
> supposed to work?

This seems much harder to do. We *might* have some machinery to assist with this but I don't know how reliable it is. It would need quite a bit more work, I think, and I would prefer option (1).

Another option might be to delay adding the <xul:browser> to the <xul:window> until after we have the surface and stuff set up. I don't know what would be involved in doing that.
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #2)
> > It gets the PCompositorBridgeChild from the layer
> > manager and passes that to
> > GPUProcessManager::AllocateAndConnectLayerTreeId(). If the
> > PCompositorBridgeChild is null, this method returns false, and we end up in
> > the situation above.
> 
> Do you know if `lm` is also null in RenderFrameParent::Init when this
> happens? Or is it just `lm->GetCompositorBridgeChild()`?

lm is also null.

> 
> > 
> > 1) Always create a compositor immediately, regardless of whether or not we
> > have a surface. If we don't have a surface, the compositor would start in a
> > paused state.
> 
> This seems like a reasonable approach to take.

I can look into this then. One concern I have, though, is that we need to be able to use GeckoSession in 'headless' mode (no surface). I guess right now that ends up in this BasicLayerManager state, which is fine. I worry that things won't work well if we have a ClientLayerManager connected to a compositor that is always paused. Do you foresee any problem there?

> 
> > 2) Allow the BasicLayerManager path we hit now, but migrate to a
> > LayerManagerComposite once the Compositor is created. I don't know if we
> > have enough existing machinery to do this easily. Maybe it's already
> > supposed to work?
> 
> This seems much harder to do. We *might* have some machinery to assist with
> this but I don't know how reliable it is. It would need quite a bit more
> work, I think, and I would prefer option (1).

Bummer.

> 
> Another option might be to delay adding the <xul:browser> to the
> <xul:window> until after we have the surface and stuff set up. I don't know
> what would be involved in doing that.

We actually do want to use it without a surface, as described above, so option 1) seems best for now.
Flags: needinfo?(bugmail)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #4)
> I can look into this then. One concern I have, though, is that we need to be
> able to use GeckoSession in 'headless' mode (no surface). I guess right now
> that ends up in this BasicLayerManager state, which is fine. I worry that
> things won't work well if we have a ClientLayerManager connected to a
> compositor that is always paused. Do you foresee any problem there?

Oh. The problem with I foresee is that since we won't be compositing, we won't be sending the "DidComposite" messages back to the content side, so the content will basically stop generating paints (we'll keep going down the path at [1]). What are you planning on doing with the headless gecko? Why do you need a compositor if it's headless? Or is the problem that you don't care what happens graphics-wise during the headless part of it, and then when you do eventually attach a surface you can't recover properly?

> We actually do want to use it without a surface, as described above, so
> option 1) seems best for now.

I was thinking this was just a startup race condition. But if you actually want to use gecko in headless mode for a while before attaching the surface then option (2) might be more reasonable. But honestly I'm not too sure. /cc Matt and Andrew as well who know about swapping layer managers and things like that.

[1] https://searchfox.org/mozilla-central/rev/36dec78aecc40539ecc8d78e91612e38810f963c/layout/base/nsRefreshDriver.cpp#2261
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #4)
> > I can look into this then. One concern I have, though, is that we need to be
> > able to use GeckoSession in 'headless' mode (no surface). I guess right now
> > that ends up in this BasicLayerManager state, which is fine. I worry that
> > things won't work well if we have a ClientLayerManager connected to a
> > compositor that is always paused. Do you foresee any problem there?
> 
> Oh. The problem with I foresee is that since we won't be compositing, we
> won't be sending the "DidComposite" messages back to the content side, so
> the content will basically stop generating paints (we'll keep going down the
> path at [1]). What are you planning on doing with the headless gecko? Why do
> you need a compositor if it's headless? Or is the problem that you don't
> care what happens graphics-wise during the headless part of it, and then
> when you do eventually attach a surface you can't recover properly?

Yes it's exactly this -- we want to be able to defer attaching a surface for an arbitrary time (forever, in some cases). We don't care about painting while there is no surface as long as it doesn't cause adverse affects to content.

Matt, Andrew -- see comments above for more context.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(aosmond)
I now remember why we don't create the compositor immediately. We need an EGLSurface in order to create the GLContext for the compositor. Some devices support EGL_NO_SURFACE now which could help, but I think it'll be a pain otherwise.
It should work to recreate the compositor and switch from basic -> compositor. I agree that it's not really what we want to do though.

We should be able to just deliver DidComposite if the compositor is paused to prevent us from disabling paints if that's what we need.

If we're running headless, then how much painting do we actually want to do? Painting on the content side to push layers to a compositor with no surface seems like a waste of CPU cycles.

Do we want to let the lack of DidComposite messages block painting, since that's the useful thing to do here? We could even disable painting more explicitly.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8971054 [details]
Bug 1453501 - Allow GLContextEGL to be used without a surface

https://reviewboard.mozilla.org/r/239796/#review245792

::: gfx/gl/GLContextProviderEGL.cpp:137
(Diff revision 1)
>  
>  static EGLSurface
>  CreateSurfaceFromNativeWindow(EGLNativeWindowType window, const EGLConfig& config) {
> -    EGLSurface newSurface = nullptr;
> +    EGLSurface newSurface = EGL_NO_SURFACE;
>  
>      MOZ_ASSERT(window);

Shouldn't this MOZ_ASSERT move into the #else block, since the change you're making seems to allow android to continue with !window
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> Comment on attachment 8971054 [details]
> Bug 1453501 - Allow GLContextEGL to be created without a surface
> 
> https://reviewboard.mozilla.org/r/239796/#review245792
> 
> ::: gfx/gl/GLContextProviderEGL.cpp:137
> (Diff revision 1)
> >  
> >  static EGLSurface
> >  CreateSurfaceFromNativeWindow(EGLNativeWindowType window, const EGLConfig& config) {
> > -    EGLSurface newSurface = nullptr;
> > +    EGLSurface newSurface = EGL_NO_SURFACE;
> >  
> >      MOZ_ASSERT(window);
> 
> Shouldn't this MOZ_ASSERT move into the #else block, since the change you're
> making seems to allow android to continue with !window

Correct, my try run choked on that too.
(In reply to Matt Woodrow (:mattwoodrow) from comment #11)
> It should work to recreate the compositor and switch from basic ->
> compositor. I agree that it's not really what we want to do though.
> 
> We should be able to just deliver DidComposite if the compositor is paused
> to prevent us from disabling paints if that's what we need.
> 
> If we're running headless, then how much painting do we actually want to do?
> Painting on the content side to push layers to a compositor with no surface
> seems like a waste of CPU cycles.

We generally don't want to paint at all, with the caveat that it should not disrupt "normal" operation in content. I think it's fine if rAF doesn't fire, though, which is the only thing I really know of that would be affected.

> 
> Do we want to let the lack of DidComposite messages block painting, since
> that's the useful thing to do here? We could even disable painting more
> explicitly.

I think that's fine, yeah, but I also went ahead and tried to create the compositor in a paused state. It seems a little more correct that way, since that's what happens when the surface goes away after the compositor has been started.
Comment on attachment 8971055 [details]
Bug 1453501 - Immediately create Compositor on Android

https://reviewboard.mozilla.org/r/239792/#review245818

Looks ok to me but jchen/rbarker know this code better so their r+ is good enough.
Attachment #8971055 - Flags: review?(bugmail)
Comment on attachment 8971056 [details]
Bug 1453501 - Allow the compositor to be created in a paused state

https://reviewboard.mozilla.org/r/239794/#review245820

Are you going to add code that actually reads this flag and pauses the compositor? That seems to be missing... (or maybe you just forgot to add that file to the commit?)
Attachment #8971056 - Flags: review?(bugmail) → review+
Comment on attachment 8971056 [details]
Bug 1453501 - Allow the compositor to be created in a paused state

https://reviewboard.mozilla.org/r/239794/#review245820

Weird, I know I wrote code for that but it's not any patch. I'll put up another one.
Comment on attachment 8971055 [details]
Bug 1453501 - Immediately create Compositor on Android

https://reviewboard.mozilla.org/r/239792/#review245976

::: widget/android/nsWindow.cpp:1852
(Diff revision 2)
>      // Ensure that gfxPlatform is initialized first.
>      gfxPlatform::GetPlatform();
>  
>      if (ShouldUseOffMainThreadCompositing()) {
> -        CreateCompositor(aCompositorWidth, aCompositorHeight);
> +        LayoutDeviceIntRect rect = GetBounds();
> +        CreateCompositor(rect.Width(), rect.Height());

I think this will always be something like (1, 1) in practice?
Attachment #8971055 - Flags: review?(nchen) → review+
Comment on attachment 8971055 [details]
Bug 1453501 - Immediately create Compositor on Android

https://reviewboard.mozilla.org/r/239792/#review246212
Attachment #8971055 - Flags: review?(rbarker) → review+
(In reply to Jim Chen [:jchen] [:darchons] from comment #21)
> Comment on attachment 8971055 [details]
> Bug 1453501 - Immediately create Compositor on Android
> 
> https://reviewboard.mozilla.org/r/239792/#review245976
> 
> ::: widget/android/nsWindow.cpp:1852
> (Diff revision 2)
> >      // Ensure that gfxPlatform is initialized first.
> >      gfxPlatform::GetPlatform();
> >  
> >      if (ShouldUseOffMainThreadCompositing()) {
> > -        CreateCompositor(aCompositorWidth, aCompositorHeight);
> > +        LayoutDeviceIntRect rect = GetBounds();
> > +        CreateCompositor(rect.Width(), rect.Height());
> 
> I think this will always be something like (1, 1) in practice?

Probably correct. Do you think that's trouble? Should we use some other size at startup?
Flags: needinfo?(nchen)
Comment on attachment 8971054 [details]
Bug 1453501 - Allow GLContextEGL to be used without a surface

https://reviewboard.mozilla.org/r/239796/#review246606

::: gfx/gl/GLContextProviderEGL.cpp:138
(Diff revision 2)
>  CreateSurfaceFromNativeWindow(EGLNativeWindowType window, const EGLConfig& config) {
> -    EGLSurface newSurface = nullptr;
> +    EGLSurface newSurface = EGL_NO_SURFACE;
>  
> -    MOZ_ASSERT(window);
>  #ifdef MOZ_WIDGET_ANDROID
> +    if (window) {

I would really prefer that this doesn't silently return null here.

Ideally you'd realize that you can get away with a surfaceless context and just not call this with null `window`, and we could leave the ASSERT(window) here.

::: gfx/gl/GLContextProviderEGL.cpp:199
(Diff revision 2)
>          }
>      }
>  
>      EGLSurface surface = mozilla::gl::CreateSurfaceFromNativeWindow(aWindow, config);
>  
> -    if (!surface) {
> +    if (surface == EGL_NO_SURFACE && !sEGLLibrary.HasSurfacelessContext()) {

If EGL_NO_SURFACE is 0, don't use ==.

Also, since you know ahead of time here, don't try to create a surfaceless context unless it's supported.

::: gfx/gl/GLLibraryEGL.h:368
(Diff revision 2)
>      bool HasANGLESurfaceD3DTexture2DShareHandle() {
>          return IsExtensionSupported(ANGLE_surface_d3d_texture_2d_share_handle);
>      }
>  
> +    bool HasSurfacelessContext() {
> +        return IsExtensionSupported(KHR_surfaceless_context);

Just call IsExtensionSupported directly. No need for a new boilerplate member func.
Attachment #8971054 - Flags: review?(jgilbert) → review+
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #23)
> (In reply to Jim Chen [:jchen] [:darchons] from comment #21)
> > Comment on attachment 8971055 [details]
> > Bug 1453501 - Immediately create Compositor on Android
> > 
> > >      if (ShouldUseOffMainThreadCompositing()) {
> > > -        CreateCompositor(aCompositorWidth, aCompositorHeight);
> > > +        LayoutDeviceIntRect rect = GetBounds();
> > > +        CreateCompositor(rect.Width(), rect.Height());
> > 
> > I think this will always be something like (1, 1) in practice?
> 
> Probably correct. Do you think that's trouble? Should we use some other size
> at startup?

I thought maybe we should just call `CreateCompositor(1, 1);` to make that clear. Not a big deal though.
Flags: needinfo?(nchen)
Comment on attachment 8971054 [details]
Bug 1453501 - Allow GLContextEGL to be used without a surface

Jeff, can you take another look at this one please? I believe I addressed your review comments, but I also changed the patch to use the fallback surface any time MakeCurrent() requires one (see commit comments).
Attachment #8971054 - Flags: review+ → review?(jgilbert)
Whiteboard: [geckoview:crow][gfx-noted] → [geckoview:crow][gfx-noted] [geckoview:fenix]
Flags: needinfo?(aosmond)
This is blocking nightly dogfooding of GeckoView powered Klar so bumping priority a little.
Priority: P3 → P2
Comment on attachment 8971054 [details]
Bug 1453501 - Allow GLContextEGL to be used without a surface

https://reviewboard.mozilla.org/r/239796/#review248736

::: gfx/gl/GLContextEGL.h:120
(Diff revision 3)
>  
>  public:
>      const EGLConfig mConfig;
>  protected:
>      EGLSurface mSurface;
> +    EGLSurface mFallbackSurface;

I think this can be const?

::: gfx/gl/GLContextProviderEGL.cpp:140
(Diff revision 3)
> -    EGLSurface newSurface = nullptr;
> +    if (sEGLLibrary.IsExtensionSupported(GLLibraryEGL::KHR_surfaceless_context)) {
> +        // We don't need a PBuffer surface in this case
> +        return EGL_NO_SURFACE;
> +    }
> +
> +    nsTArray<EGLint> pbattrs(2 + MOZ_ARRAY_LENGTH(kTerminationAttribs));

I would prefer std::vector here, but it's fine.

::: gfx/gl/GLContextProviderEGL.cpp:142
(Diff revision 3)
> +        return EGL_NO_SURFACE;
> +    }
> +
> +    nsTArray<EGLint> pbattrs(2 + MOZ_ARRAY_LENGTH(kTerminationAttribs));
> +
> +    pbattrs.Clear();

This isn't needed.

::: gfx/gl/GLContextProviderEGL.cpp:144
(Diff revision 3)
> +
> +    nsTArray<EGLint> pbattrs(2 + MOZ_ARRAY_LENGTH(kTerminationAttribs));
> +
> +    pbattrs.Clear();
> +    pbattrs.AppendElement(LOCAL_EGL_WIDTH); pbattrs.AppendElement(16);
> +    pbattrs.AppendElement(LOCAL_EGL_HEIGHT); pbattrs.AppendElement(16);

Try a 1x1 unless that runs into issues.

::: gfx/gl/GLContextProviderEGL.cpp:150
(Diff revision 3)
> +
> +    for (const auto& cur : kTerminationAttribs) {
> +        pbattrs.AppendElement(cur);
> +    }
> +
> +    EGLSurface surface = sEGLLibrary.fCreatePbufferSurface(EGL_DISPLAY(), config, &pbattrs[0]);

I would prefer .get()/.data() or whatever the API here is, instead of &foo[0].

::: gfx/gl/GLContextProviderEGL.cpp:157
(Diff revision 3)
> +
> +    return surface;
> +}
>  
> +static EGLSurface
> +CreateSurfaceFromNativeWindow(EGLNativeWindowType window, const EGLConfig& config) {

{ to new line (for new code)

Also for when you change old code, as above.

::: gfx/gl/GLContextProviderEGL.cpp:423
(Diff revision 3)
>      ReleaseSurface();
>      MOZ_ASSERT(aWidget);
> -    mSurface = mozilla::gl::CreateSurfaceFromNativeWindow(GET_NATIVE_WINDOW_FROM_COMPOSITOR_WIDGET(aWidget), mConfig);
> -    if (!mSurface) {
> -        return false;
> +
> +    void* nativeWindow = GET_NATIVE_WINDOW_FROM_COMPOSITOR_WIDGET(aWidget);
> +    if (nativeWindow) {
> +        mSurface = mozilla::gl::CreateSurfaceFromNativeWindow(nativeWindow, mConfig);

Did you want to return falso if we fail to get a surface but we were expecting to get one?
Attachment #8971054 - Flags: review?(jgilbert) → review+
Comment on attachment 8971054 [details]
Bug 1453501 - Allow GLContextEGL to be used without a surface

https://reviewboard.mozilla.org/r/239796/#review248736

> Did you want to return falso if we fail to get a surface but we were expecting to get one?

Yeah, that's probably correct.
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c231423492b
Allow GLContextEGL to be used without a surface r=jgilbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/c07588586701
Immediately create Compositor on Android r=jchen,rbarker
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b53a7b34121
Allow the compositor to be created in a paused state r=kats
Backed out 3 changesets (bug 1453501) for bustage in build/build/src/gfx/gl/GLContextProviderEGL.cpp on a CLOSED TREE

Problematic push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=77845b8d2930ab73838be0329cb3f17a5f23828d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified
Failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified&selectedJob=178334383
Log: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=90a4eaef784da7af7b7cbbbfa272f784d6f913a0&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/90a4eaef784da7af7b7cbbbfa272f784d6f913a0

13:41:33     INFO -  z:/build/build/src/gfx/gl/GLContextProviderEGL.cpp(425): error C2664: 'EGLSurface mozilla::gl::CreateSurfaceFromNativeWindow(EGLNativeWindowType,const EGLConfig &)': cannot convert argument 1 from 'void *' to 'EGLNativeWindowType'
13:41:33     INFO -  z:/build/build/src/gfx/gl/GLContextProviderEGL.cpp(425): note: Conversion from 'void*' to pointer to non-'void' requires an explicit cast
13:41:33     INFO -  z:/build/build/src/config/rules.mk:1031: recipe for target 'Unified_cpp_gfx_gl0.obj' failed
13:41:33     INFO -  mozmake.EXE[4]: *** [Unified_cpp_gfx_gl0.obj] Error 2
13:41:33     INFO -  mozmake.EXE[4]: Leaving directory 'z:/build/build/src/obj-firefox/gfx/gl'
13:41:33     INFO -  z:/build/build/src/config/recurse.mk:73: recipe for target 'gfx/gl/target' failed
13:41:33     INFO -  mozmake.EXE[3]: *** [gfx/gl/target] Error 2
13:41:33     INFO -  mozmake.EXE[3]: *** Waiting for unfinished jobs....
Flags: needinfo?(snorp)
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bac069b49c19
Allow GLContextEGL to be used without a surface r=jgilbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/a72e0ed80250
Immediately create Compositor on Android r=jchen,rbarker
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1644bd5755
Allow the compositor to be created in a paused state r=kats
You need to log in before you can comment on or make changes to this bug.