LayerManagerOGL not correctly updated on SIZE_CHANGED

RESOLVED FIXED in Firefox 18

Status

()

defect
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

unspecified
Firefox 18
x86
Android
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

In bug 792212, we experience severe reftest failures occasionally on some tegras, and almost-always on other tegras. In the failure runs, nearly all the tests fail, because the screen is blank and remains blank for the duration of the test run.

Some robocop intermittent test failures have very similar symptoms (bug 767215) but have not been conclusively investigated.

Debugging the reftest failures shows that LayerManagerOGL::Render returns early because the width and height derived from mSurfaceSize are both 0. In the failure cases, the CompositorParent is created with a 0-size rect. nsWindow later receives a SIZE_CHANGED event with the correct size, and propagates the size change to windows, but not to the CompositorParent or LayerManagerOGL.

It appears that if SIZE_CHANGED is received after the CompositorParent is created, the CompositorParent and LayerManagerOGL continue to operate with 0-size bounds. I believe that resuming the compositor would rectify the situation, but in the case I am debugging, there is no compositor pause/resume.
Blocks: 792212
Blocks: 767215
Here's a terribly naive approach that appears to fix the reftest failures. I have hardly tested it otherwise. Is it okay to resume the compositor without pausing it?
Attachment #668081 - Flags: feedback?(chrislord.net)
Comment on attachment 668081 [details] [diff] [review]
straw man / patch for discussion

Review of attachment 668081 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/nsWindow.cpp
@@ +1247,5 @@
>      if (mWidgetListener) {
>          mWidgetListener->WindowResized(this, aSize.width, aSize.height);
>      }
> +
> +    ScheduleResumeComposition(aSize.width, aSize.height);

should we pause composition first to make sure the width and height don't change at an inopportune time?
I decided it would be better to pause first. 

I verified that this fixes the reftest problem and a try run looks fine: https://tbpl.mozilla.org/?tree=Try&rev=50684e431314
Attachment #668081 - Attachment is obsolete: true
Attachment #668081 - Flags: feedback?(chrislord.net)
Attachment #668267 - Flags: review?(chrislord.net)
Comment on attachment 668267 [details] [diff] [review]
on SIZE_CHANGED, pause then resume compositor with new size

Review of attachment 668267 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the commend addressed (just move the pause to before the mBounds modification).

::: widget/android/nsWindow.cpp
@@ +1247,5 @@
>      if (mWidgetListener) {
>          mWidgetListener->WindowResized(this, aSize.width, aSize.height);
>      }
> +
> +    SchedulePauseComposition();

I think it'd make sense to schedule the pause before resizing the widget.
Attachment #668267 - Flags: review?(chrislord.net) → review+
Pause moved as per review comment. r=cwiiis
Assignee: nobody → gbrown
Attachment #668267 - Attachment is obsolete: true
Attachment #668415 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/a7a10b14ff06
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 18
I think we should back this out and find another solution. The current patch doesn't even really make sense to me, given that GeckoLayerClient::surfaceChanged already sends a resume event (http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/gfx/GeckoLayerClient.java#641). The patch we have in this bug is really causing some bad threading problems for Flash. (Bug 801627)
But there are other code paths (besides surfaceChanged) that bring about a change to nsWindow bounds, and when that happens, surely we want to propagate the change to CompositorParent, etc. - wouldn't you say?

In the failed reftest (and possibly other tests) cases, I wonder if we are missing a surfaceChanged call, or if maybe the resize is lost because the compositor is not yet created? http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/gfx/GeckoLayerClient.java#628
(In reply to Geoff Brown [:gbrown] from comment #9)
> But there are other code paths (besides surfaceChanged) that bring about a
> change to nsWindow bounds, and when that happens, surely we want to
> propagate the change to CompositorParent, etc. - wouldn't you say?

I'm not 100% certain, but it looks to me like we only send it when the surface size changes.

> 
> In the failed reftest (and possibly other tests) cases, I wonder if we are
> missing a surfaceChanged call, or if maybe the resize is lost because the
> compositor is not yet created?

It looks like it's possible to miss a surfaceChanged call if the LayerView is added enough time before GeckoLayerClient.notifyGeckoReady is called (which hooks up the surfaceChanged listener). We send an initial resize event when the GeckoLayerClient is hooked up, but maybe it should send a compositor resume as well?
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #8)
> The patch we have in this bug is really causing
> some bad threading problems for Flash. (Bug 801627)

Bug 801627 seems like enough reason to back this out until we find a fix.
I checked with :callek today -- reftests are still not being assigned to new tegras, so backing this out should not cause many new test failures. I will look into the alternative suggested in Comment 10.

Backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/907354856a07
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like Armv6 is particularly prone to the all-white reftests - of the 7 runs that have finished since your backout, 4 were all-white.
Blocks: 803408
(In reply to Geoff Brown [:gbrown] from comment #12)
> I checked with :callek today -- reftests are still not being assigned to new
> tegras, so backing this out should not cause many new test failures. I will
> look into the alternative suggested in Comment 10.
> 
> Backed out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/907354856a07

Shouldn't the (still applied AFAIK) workaround in bug 792212 be preventing this problem from reoccuring, still?
There were two that I starred in bug 784278 during the period when that was in but this wasn't yet, dunno how many others there were that other people (or for that matter that I) just starred as "a" without mentioning it in any bug.

Anyway, the failure rate since this was backed out is 44%, all Android reftests on trunk trees are now hidden, bug 803408.
(In reply to William Lachance (:wlach) from comment #15)
> (In reply to Geoff Brown [:gbrown] from comment #12)
> > I checked with :callek today -- reftests are still not being assigned to new
> > tegras, so backing this out should not cause many new test failures. I will
> > look into the alternative suggested in Comment 10.
> > 
> > Backed out:
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/907354856a07
> 
> Shouldn't the (still applied AFAIK) workaround in bug 792212 be preventing
> this problem from reoccuring, still?

I thought/hoped that the bug 792212 workaround would continue to be effective. But I believe it worked by changing the timing of our startup...and that will tend to change as the code evolves.
Following up on Comment 10...

Calling compositionResumeRequested in GeckoLayerClient.notifyGeckoReady is often ineffective because the compositor is not yet created (mCompositorCreated is false).

I thought it would be okay to call compositionResumeRequested in GeckoLayerClient.compositorCreated, but that does not work either -- the CompositorParent surface size is not updated. I don't know why yet.
Calling compositionResumeRequested from GeckoLayerClient.compositorCreated fails because nsWindow.sCompositorParent is null -- nsWindow.SetCompositor has not yet been called. I'm looking for a better place (later in initialization) to update the compositor...open to suggestions!
Here's an alternative solution: Instead of using SIZE_CHANGED to propagate bounds to the CompositorParent, do it in nsWindow::SetCompositor. This is the first chance to update the CompositorParent from nsWindow, and the nsWindow already has the correct bounds. It seems appropriate to sync-up the bounds with the new CompositorParent at this point. 

nsWindow::SetCompositor seems to be called multiple times during Fennec startup; this patch calls ScheduleResumeCompositor only on the first such call. Given the trouble with the previous patch, I'm trying to be conservative: propagate the bounds only when needed to get the reftests passing.

snorp: Can you verify that this patch does not cause trouble with full-screen flash?
Attachment #668415 - Attachment is obsolete: true
Attachment #673472 - Flags: review?(chrislord.net)
Attachment #673472 - Flags: feedback?(snorp)
Comment on attachment 673472 [details] [diff] [review]
update CompositorParent bounds from nsWindow on initialization

Review of attachment 673472 [details] [diff] [review]:
-----------------------------------------------------------------

This looks ok to me, but I'm really not sure I'm the best person to review this anymore - Let's see what snorp says.

::: widget/android/nsWindow.cpp
@@ +2241,5 @@
>  nsWindow::SetCompositor(mozilla::layers::CompositorParent* aCompositorParent,
>                          mozilla::layers::CompositorChild* aCompositorChild)
>  {
> +    bool firstSetting = false;
> +    if (!sCompositorParent && aCompositorParent) {

Could/should this be sCompositorParent != aCompositorParent?
Attachment #673472 - Flags: review?(chrislord.net) → review+
(In reply to Chris Lord [:cwiiis] from comment #21)
> ::: widget/android/nsWindow.cpp
> @@ +2241,5 @@
> >  nsWindow::SetCompositor(mozilla::layers::CompositorParent* aCompositorParent,
> >                          mozilla::layers::CompositorChild* aCompositorChild)
> >  {
> > +    bool firstSetting = false;
> > +    if (!sCompositorParent && aCompositorParent) {
> 
> Could/should this be sCompositorParent != aCompositorParent?

No: If aCompositorParent == 0, we do not want to resize. Possibly aCompositorParent && (sCompositorParent != aCompositorParent) ...
Unfortunately, try runs show that this patch is not always effective:

https://tbpl.mozilla.org/?tree=Try&rev=575957e7324c&noignore=1
I think you want to be setting the size on it whenever aCompositorParent != 0, not just on the "firstSetting". I'm also curious to know why it's getting set multiple times, if you can provide backtraces of the different invocations of that functions that might be useful.
(In reply to Kartikaya Gupta (:kats) from comment #24)
> I think you want to be setting the size on it whenever aCompositorParent !=
> 0, not just on the "firstSetting". 

Good idea but that doesn't work either: https://tbpl.mozilla.org/?tree=Try&rev=d3e4a537874f&noignore=1
(In reply to Kartikaya Gupta (:kats) from comment #24)
> I'm also curious to know why it's getting
> set multiple times, if you can provide backtraces of the different
> invocations of that functions that might be useful.

I'm sure I was originally seeing SetCompositor called 5 times at startup, but now it is only 3 times:
 ~nsWindow calls SetCompositor(NULL, NULL)
 ~nsWindow calls SetCompositor(NULL, NULL)
 nsWindow::GetLayerManager calls SetCompositor(mCompositorParent, mCompositorChild) after CreateCompositor()

I am a little disturbed by the idea of ~nsWindow calling SetCompositor and thereby resetting static members...but I haven't seen it cause a problem.
There are 2 conditions necessary before the CompositorParent can have its size set correctly from nsWindow:
 - the CompositorParent must be created and set in nsWindow: nsWindow::SetCompositor must have been called to set sCompositor
 - nsWindow must know the window size: gAndroidBounds must be set

The test failures with the previous patch came about when SetCompositor was called before nsWindow had processed a SIZE_CHANGED event -- so gAndroidBounds was still (0,0) and the CompositorParent was updated with 0 size.

This patch allows for either SetCompositor or SIZE_CHANGED to occur first and only proceeds to update the CompositorParent size once both sCompositorParent and gAndroidBounds have been set
Attachment #673472 - Attachment is obsolete: true
Attachment #673472 - Flags: feedback?(snorp)
Comment on attachment 674229 [details] [diff] [review]
update CompositorParent bounds from nsWindow on initialization

Review of attachment 674229 [details] [diff] [review]:
-----------------------------------------------------------------

I like this a lot better than the first patch and your explanation makes sense.

::: widget/android/nsWindow.cpp
@@ +2250,5 @@
>                          mozilla::layers::CompositorChild* aCompositorChild)
>  {
> +    bool sizeChangeNeeded = false;
> +    if (aCompositorParent && !sCompositorParent && gAndroidBounds.width != 0) {
> +        sizeChangeNeeded = true;

You can just do this all on one line:

bool sizeChangeNeeded = aCompositorParent && !sCompositorParent && gAndroidBounds.width != 0
(In reply to Geoff Brown [:gbrown] from comment #26)
> (In reply to Kartikaya Gupta (:kats) from comment #24)
> > I'm also curious to know why it's getting
> > set multiple times, if you can provide backtraces of the different
> > invocations of that functions that might be useful.
> 
> I'm sure I was originally seeing SetCompositor called 5 times at startup,
> but now it is only 3 times:
>  ~nsWindow calls SetCompositor(NULL, NULL)
>  ~nsWindow calls SetCompositor(NULL, NULL)
>  nsWindow::GetLayerManager calls SetCompositor(mCompositorParent,
> mCompositorChild) after CreateCompositor()
> 

Ah, thanks for digging into this. The fact that we create a bunch of nsWindows on startup concerns me quite a bit (I've run into this behaviour before and tried investigating it but still don't know why we do it). It's not part of this bug, but good to know that's why SetCompositor was getting called.

> I am a little disturbed by the idea of ~nsWindow calling SetCompositor and
> thereby resetting static members...but I haven't seen it cause a problem.

Yeah, I think there's a bunch of code that assumes there will be only one nsWindow object, but in practice I think once we're started up there are 3 or 4 that remain in existence during the Fennec lifetime. There are a couple that are created and destroyed during startup too, as you noticed.

> Comment on attachment 674229 [details] [diff] [review]
> update CompositorParent bounds from nsWindow on initialization
> 

This patch looks good to me as well, it makes sense conceptually.
Cosmetic change only, as per comment 28. r=snorp (via irc)
Attachment #674229 - Attachment is obsolete: true
Attachment #674329 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ef02503c4387
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Duplicate of this bug: 792212
We need this backed out on Aurora too.
Blocks: 809005
Or rather, we need the new fix on Aurora.
Really, we need to back out the original patch, rev a7a10b14ff06, from aurora, to fix bug 809005. That may cause reftest failures, so we need to also apply the new patch to aurora, rev ef02503c4387.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 674329 [details] [diff] [review]
update CompositorParent bounds from nsWindow on initialization

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 809005
User impact if declined: See comment 37. Fullscreen flash is broken on aurora, by an earlier patch for this bug; that patch must be backed out. Doing so will likely cause reftest failures unless this patch is then applied.
Testing completed (on m-c, etc.): on m-c since 2012-10-23
Risk to taking this patch (and alternatives if risky): Minimal risk.
String or UUID changes made by this patch: none
Attachment #674329 - Flags: approval-mozilla-aurora?
Comment on attachment 674329 [details] [diff] [review]
update CompositorParent bounds from nsWindow on initialization

[Triage Comment]
Approving for Aurora, along with whatever backout is necessary to resolve bug 809005.
Attachment #674329 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
tracking-fennec: --- → ?
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.