Closed Bug 940257 Opened 11 years ago Closed 10 years ago

Only small rectangle area is rendered

Categories

(Core :: Graphics: Layers, defect)

26 Branch
ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
blocking-b2g koi+
Tracking Status
firefox27 --- wontfix
firefox28 --- unaffected
firefox29 --- unaffected
b2g-v1.2 --- fixed
b2g-v1.3 --- unaffected

People

(Reporter: sotaro, Assigned: botond)

References

Details

(Keywords: regression, verifyme, Whiteboard: [b2g])

Attachments

(4 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #923303 +++

With the STR from bug 923303 comment 0, sometimes only a small rectangular area is rendered, as in attachment 8333734 [details]. I confirmed this on v1.2 hamachi and master hamachi. On master, the rendering returned to normal more readily than on v1.2.
No longer blocks: GFXB2G1.2
No longer depends on: 934602, 905882
blocking-b2g: --- → koi?
How often is "sometimes"?  If this is not very common, we may want to wait for 1.3.
With the STR from bug 923303 comment 0, it happens about one time in three. But I rarely see this symptom outside of the STR.
It seems to be related to APZC.
If it involves orientation change to reproduce, Botond may already have a fix.
(In reply to Milan Sreckovic [:milan] from comment #4)
> If it involves orientation change to reproduce, Botond may already have a
> fix.

My screen orientation change-related fix was bug 937896 which landed on Thursday (Nov 14), so if this problem happens with master newer than Thursday then I guess my patch didn't fix this.

That said, there is a known issue with composition bounds on B2G (bug 935219) which may be causing this problem.
(In reply to Botond Ballo [:botond] from comment #5)
> (In reply to Milan Sreckovic [:milan] from comment #4)
> > If it involves orientation change to reproduce, Botond may already have a
> > fix.
> 
> My screen orientation change-related fix was bug 937896 which landed on
> Thursday (Nov 14), so if this problem happens with master newer than
> Thursday then I guess my patch didn't fix this.

I reproduced it by using yesterday's master code.
Using a 1.2 base image, we should try to get a regression range by applying gecko/gaia bits on top.
QA Contact: sparsons
Keywords: qawanted
Still waiting on QA confirmation on the build that comment#7 suggests
This issue started to occur on the Buri 1.2 Build ID: 20131026004003

Gaia   de70f4d9b7a8dd3ba2565184735094eaa2bd3d18
SourceStamp 780de7966fc0
BuildID 20131026004003
Version 26.0a2
Base: 20131115

Last working Buri 1.2 Build ID: 20131025004000

Gaia   606517ceafe0950c2b89822d5f13353743334f2c
SourceStamp 5eabd267ef04
BuildID 20131025004000
Version 26.0a2
Base: 20131115
blocking-b2g: koi? → koi+
Assignee: nobody → sotaro.ikeda.g
Do we have an update on the progress of this bug?
Let's reconsider it.
blocking-b2g: koi+ → koi?
(In reply to Milan Sreckovic [:milan] from comment #11)
> Let's reconsider it.

I don't think so. This is a really bad browser bug that is not acceptable to ship with. We're heavily judged on our browser quality & having proof of a 1/4 rendering of a screen is going to be embarrassing. Along with the fact that this is a confirmed regression.
koi+ for bad browser experience.
blocking-b2g: koi? → koi+
Scanning through the regression range, I do see one APZC bug in the list that might have caused this - bug 923482.

kats - Could bug 923482 cause this regression?
Flags: needinfo?(bugmail.mozilla)
I would consider it pretty unlikely to have been caused by 923482 but I see nothing else in that range that's likely either. It would be worth trying a master build with that backed out to see if it still happens. I think it should back out pretty cleanly.
Flags: needinfo?(bugmail.mozilla)
The bug seems to be fixed on a master hamachi ROM built from today's master. A small rectangle is sometimes drawn with the STR, but soon after the display is redrawn at the correct size.
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> The bug seems fix on master hamachi ROM that made from today's master.
> Sometimes small rectangle is drawn in the STR, but soon after display is
> redrawn as correct size.

Sotaro, is this actually a testable patch on master?  i didnt see a patch in this bug (or a related bug fix), so i'm a bit confused on what fixed it.  If it is indeed fixed, please provide more information, how to test it, and QA can take a look to see if this is fixed and upliftable to 1.2.

Thanks,
Tony
Flags: needinfo?(sotaro.ikeda.g)
The STR is Comment 0 from bug 923303.  Don't know what fixed it.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> The bug seems fix on master hamachi ROM that made from today's master.
> Sometimes small rectangle is drawn in the STR, but soon after display is
> redrawn as correct size.

Can we nail down the last build this reproduces on in master and the first build where it is fixed on master? That might help figure out what patch we need to uplift here.
From the nightly ROM, I already confirmed the fix is between the following.
- hamachi-mozilla-central-20131212040203-ril01.02.00.019.102.zip
- hamachi-mozilla-central-20131213040203-ril01.02.00.019.102.zip
Removing regressionwindow-wanted per comment 21.
I looked for the change that fixed the bug based on git log information. From that check, bug 932728 seems to fix the problem. But it is not directly related to the drawing logic. It seems to fix the problem just by changing the event timing :-( And the patches in bug 932728 cannot be applied to b2g v1.2.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> I would consider it pretty unlikely to have been caused by 923482 but I see
> nothing else in that range that's likely either. It would be worth trying a
> master build with that backed out to see if it still happens. I think it
> should back out pretty cleanly.

I tried a ROM with the patch from bug 923482 backed out, and the problem was not fixed. As in comment 24, a mere timing change might affect the problem.
(In reply to Botond Ballo [:botond] from comment #5)
> 
> That said, there is a known issue with composition bounds on B2G (bug
> 935219) which may be causing this problem.

botond, is it possible to apply the patch from bug 935219? It does not seem easy to apply to b2g v1.2.
Flags: needinfo?(botond)
From the analysis so far, a simple uplift does not seem to work for this bug. Even on master, a small rectangular area is sometimes drawn, but it is redrawn at the correct size soon after.
This week I have to work on bug 925444, and then I go on two weeks of PTO. It might be better for an engineer experienced with APZC to handle this.
(In reply to Sotaro Ikeda [:sotaro] (PTO Dec. 21 ~ Jan. 5) from comment #28)
> This week I have to work for Bug 925444 and go into 2 week PTO. It might be
> better to be handled by an APZC experienced engineer.

Is there someone that you can suggest?   Actually, this bug is marked koi+, and should be higher priority than bug 925444, which is 1.3+. Triage would like to see patches landed before 12/20, else bugs will get bumped or backouts need to happen.

Milan, Vivien, do you know someone that can help?
Flags: needinfo?(milan)
(In reply to Sotaro Ikeda [:sotaro] (PTO Dec. 21 ~ Jan. 5) from comment #28)
> This week I have to work for Bug 925444 and go into 2 week PTO. It might be
> better to be handled by an APZC experienced engineer.

kats - can you help here?
Flags: needinfo?(bugmail.mozilla)
I've kicked off a koi-hamachi build and will see if I can reproduce the problem. If so then I can try to debug it.
I can repro the problem on my hamachi running 1.2. I'll take a look.
Flags: needinfo?(milan)
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(botond)
Assignee: sotaro.ikeda.g → bugmail.mozilla
So as I was debugging this I kept running into the screen flickering issue from bug 923303, so I did a ./flash.sh -f to flash all the partitions and pick up the fix for that bug. It worked and I no longer see the screen flickering, but I also can no longer reproduce *this* bug either.

Sotaro, when you were seeing this bug, were you using the latest images (as you would get from ./flash.sh -f)?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #33)
> So as I was debugging this I kept running into the screen flickering issue
> from bug 923303, so I did a ./flash.sh -f to flash all the partitions and

If you saw the screen flickering from bug 923303, your phone does not seem to have the correct base image. The correct base image is US_V1.2_20131111.cfg. On top of that image, I use ./flash.sh -f to flash a ROM that I built myself.

> Sotaro, when you were seeing this bug, were you using the latest images (as you would get from ./flash.sh -f)?

I often use ./flash.sh -f to update almost all of the images.
Flags: needinfo?(sotaro.ikeda.g)
> If you saw screen flickering of bug 923303. Your phone seems not have
> correct base image. Correct base image is US_V1.2_20131111.cfg.

To flash the rom, you need to use a proprietary tool.
I tried pinging you on IRC but didn't get a response. Can you tell me where to get that base image and how to install it?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #36)
> I tried pinging you on IRC but didn't get a response. Can you tell me where
> to get that base image and how to install it?

See https://intranet.mozilla.org/QA/B2G_Flash_machine#Flashing_on_TCL_Partner_Builds_.28Alcatel_One_Touch_Fire.29.
Jason, thanks. You need to use that tool to flash US_V1.2_20131111.cfg.
Flags: needinfo?(sotaro.ikeda.g)
Just an update: my hamachi may or may not be bricked now. I'll try to revive it a bit longer but if somebody else can take this bug that might be a good idea.

Also, word of warning: running the teleweb tool on windows 8 is not a good idea.
Flags: needinfo?(milan)
Yeah, this tool sometimes causes a problem. I also bricked one buri phone in the past.
I have two hamachi phones. If you come to the Toronto office, I can give you one of them.
I don't know if I can make it in tomorrow. But I'll be in Toronto over the break so if you leave it somewhere I can find it, then I can pick it up when I come and continue debugging. It would be even better if you can make sure it's flashed with the right base image so then I can just use it right away.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #42)
> I don't know if I can make it in tomorrow. But I'll be in Toronto over the
> break so if you leave it somewhere I can find it, then I can pick it up when
> I come and continue debugging. It would be even better if you can make sure
> it's flashed with the right base image so then I can just use it right away.

If Sotaro or someone else in the office has a hamachi phone with the right base image installed, I can take a look. (I've also had trouble running the teleweb tool, and I also only have windows 8 to run it on.)
(In reply to Sotaro Ikeda [:sotaro] (PTO Dec. 21 ~ Jan. 5) from comment #38)
> Jason, thanks. By using the tool you need to flash US_V1.2_20131111.cfg.

FWIW, it's 20131115.cfg

(In reply to Botond Ballo [:botond] from comment #43)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #42)
> > I don't know if I can make it in tomorrow. But I'll be in Toronto over the
> > break so if you leave it somewhere I can find it, then I can pick it up when
> > I come and continue debugging. It would be even better if you can make sure
> > it's flashed with the right base image so then I can just use it right away.
> 
> If Sotaro or someone else in the office has a hamachi phone with the right
> base image installed, I can take a look. (I've also had trouble running the
> teleweb tool, and I also only have windows 8 to run it on.)

Thanks Botond for stepping in to help too.  If you need any real time help with that stupid windows tool, come find me in #fxosqa.  there are others there that can help if i'm offline.
I will provide hamachi phone to Botond. I am flashing the ROM now.
(In reply to Tony Chung [:tchung] from comment #44)
> (In reply to Sotaro Ikeda [:sotaro] (PTO Dec. 21 ~ Jan. 5) from comment #38)
> > Jason, thanks. By using the tool you need to flash US_V1.2_20131111.cfg.
> 
> FWIW, it's 20131115.cfg
> 

My mistake. I noticed it just before flashing the ROM...
(In reply to Sotaro Ikeda [:sotaro] (PTO Dec. 21 ~ Jan. 5) from comment #45)
> I will provide hamachi phone to Botond. I am flashing the ROM now.

I passed the flashed hamachi to Botond.
Thanks, Sotaro! I can reproduce the issue.

The composition bounds values look OK, so it's not related to bug 935219.

I will keep looking to see whether it's some other APZC bug.
I think it is too late to deal with this in 1.2.  Let's make it 1.3? and discuss it there (actually, it's getting late to deal with it there as well)
blocking-b2g: koi+ → koi?
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #49)
> I think it is too late to deal with this in 1.2.  Let's make it 1.3? and
> discuss it there (actually, it's getting late to deal with it there as well)

We've discussed this multiple times - we aren't moving this to 1.3, as the UX is too bad to be acceptable. This needs to be fixed in 1.2.
blocking-b2g: koi? → koi+
OK, but it may be end of January before we get a fix.  It may be earlier, but whoever is tracking 1.2, if this will stop us from shipping it, should be ready for any delays this may cause.
(In reply to Milan Sreckovic [:milan] from comment #51)
> OK, but it may be end of January before we get a fix.  It may be earlier,
> but whoever is tracking 1.2, if this will stop us from shipping it, should
> be ready for any delays this may cause.

We can't wait that long. We've known about this bug since mid-November & known it as a blocker since end of November. Why aren't we prioritizing getting a fix here asap? What are the blocking issues preventing a fix here in the short term?
Mentioned in IRC - I think at this point we need to consider discussing each remaining 1.2 gfx blockers on b2g-release-drivers to figure out what to do here.
Just for a better understanding of severity of this bug - can someone get a video of this bug?
Keywords: qawanted
Assignee: bugmail.mozilla → botond
Here is the link to a video of the bug: 

http://youtu.be/XAAzFIOY7Yw
Keywords: qawanted
(In reply to Botond Ballo [:botond] from comment #48)
> Thanks, Sotaro! I can reproduce the issue.
> 
> The composition bounds values look OK, so it's not related to bug 935219.
> 
> I will keep looking to see whether it's some other APZC bug.

I investigated this a bit. What seems to be happening is:

 - the page has size 980 x 1578 CSS pixels
 - initially, the zoom is 0.327, so the entire width of page is visible (screen width is 320 pixels)
 - accordingly, displayport is initially the whole 980 x 1578 CSS pixels
 - now, APZC gets a NotifyLayersUpdated request with a zoom of 1, and a correspondingly reduced
   displayport of 320 x 740 pixels (this reduction seems wrong, by the way, the width should
   be more than 320 pixels so horizontal scrolling will be smooth)
 - the page seems to get stuck in a state where the zoom has not yet taken effect, but the
   reduced displayport of 320 x 740 pixels has, and accordingly only a small portion of the
   page is rendered

I will continue investigating.
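
The numbers in the analysis above can be checked with a small sketch. The `Size` type and both helper functions are hypothetical names made up for this illustration; they are not Gecko APIs. The sketch assumes the zoom is simply screen width divided by page width, and that the displayport and page sizes are both in CSS pixels.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not a Gecko class.
struct Size {
  double width;
  double height;
};

// Zoom at which a page of the given CSS width exactly fits the screen width.
double FitToWidthZoom(double screenWidthPx, double pageWidthCss) {
  return screenWidthPx / pageWidthCss;
}

// Fraction of the page area covered by a displayport of the given size.
// Each axis is clamped to the page so the result never exceeds 1.
double CoveredFraction(Size displayport, Size page) {
  double w = std::fmin(displayport.width, page.width);
  double h = std::fmin(displayport.height, page.height);
  return (w * h) / (page.width * page.height);
}
```

With the values from this bug, `FitToWidthZoom(320, 980)` is roughly 0.327, and a 320 x 740 displayport covers only about 15% of the 980 x 1578 page, which is consistent with the small rectangle seen on screen.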
After updating to a more recent version of gecko (on the 1.2 branch), I can no longer reproduce this.

I can repro it with this revision: 55c160b2f3755218622564f9cde22dd364761fae (Jan 2).
But not this one: a5644aa31966e3eacdcf28f435bf9c26016ad4ff (Jan 6).

Is it possible that something in between fixed the problem?

Can anyone else still repro this with today's Gecko?
Flags: needinfo?(sparsons)
Flags: needinfo?(sotaro.ikeda.g)
Keywords: qawanted
(In reply to Botond Ballo [:botond] from comment #57)
> After updating to a more recent version of gecko (on the 1.2 branch), I can
> no longer reproduce this.
> 
> I can repro it with this revision: 55c160b2f3755218622564f9cde22dd364761fae
> (Jan 2).
> But not this one: a5644aa31966e3eacdcf28f435bf9c26016ad4ff (Jan 6).
> 
> Is it possible that something in between fixed the problem?
> 
> Can anyone else still repro this with today's Gecko?

I wonder if bug 951361 fixed this bug. What do you think?
I can still repro this issue on the Buri 1.2 Build ID: 20140106004001

Gaia   8441587c3b352e052fee07665c21fd192540f19f
SourceStamp d552c08a72d0
BuildID 20140106004001
Version 26.0
Flags: needinfo?(sparsons)
Keywords: qawanted
(In reply to Sarah Parsons from comment #59)
> I can still repro this issue on the Buri 1.2 Build ID: 20140106004001
> 
> SourceStamp d552c08a72d0

This Gecko revision is from Jan 3: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/d552c08a72d0

The one I could not reproduce it with was from Jan 6.

Perhaps we can wait until the next nightly build and try to repro with that?
(In reply to Botond Ballo [:botond] from comment #60)
> (In reply to Sarah Parsons from comment #59)
> > I can still repro this issue on the Buri 1.2 Build ID: 20140106004001
> > 
> > SourceStamp d552c08a72d0
> 
> This Gecko revision is from Jan 3:
> http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/d552c08a72d0
> 
> The one I could not reproduce it with was from Jan 6.
> 
> Perhaps we can wait until the next nightly build and try to repro with that?

Hmm - the three other patches that aren't included in this build I don't think should have an effect on this, as two of them are JS related (assertion, crash), the last is a test only fix.
Never mind, I can still repro it. It's just fairly intermittent. I will continue debugging it.
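
As a rough illustration of why an intermittent bug can look fixed after a few clean runs: assuming the roughly 1-in-3 reproduction rate reported earlier in this bug, and assuming each attempt is independent (an assumption, not something measured here), the chance of missing the bug can be estimated as below. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that a bug with the given per-attempt reproduction rate
// does NOT show up in any of `attempts` independent tries.
double MissProbability(double reproRate, int attempts) {
  return std::pow(1.0 - reproRate, attempts);
}

// Smallest number of clean attempts needed before the chance of having
// merely been unlucky drops below (1 - confidence).
int AttemptsForConfidence(double reproRate, double confidence) {
  return static_cast<int>(
      std::ceil(std::log(1.0 - confidence) / std::log(1.0 - reproRate)));
}
```

Under these assumptions, three clean attempts still miss the bug about 30% of the time, and it takes 8 clean attempts to be 95% confident the bug is really gone.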
I also confirmed the problem still happens on today's v1.2 hamachi.
Flags: needinfo?(sotaro.ikeda.g)
Here is what some further investigation revealed:

It seems that, for the pages in question, at any point in time, there is just one APZC in the APZC tree, the root APZC for the content document. So far, so good. Over time, though, it seems that for every page that is loaded, two APZC objects are created, both with the same ScrollableLayerGuid (each taking their turn being the only APZC in the tree). 

The first APZC object:
  - appears around the time the page starts loading
  - it never receives a NotifyLayersUpdated (henceforth NLU) with aIsFirstPaint = 1
  - the zoom, resolution, and cumulative zoom in its NLUs is always 1
  - the scrollable rect in its NLUs varies from page to page, is constant over its lifetime,
    yet does not seem to be the correct scrollable rect for the page
  - the displayport in its NLUs is always zero

The second APZC object:
  - appears around the time the page has loaded completely
  - the first NLU it receives has aIsFirstPaint = 1, subsequent ones aIsFirstPaint = 0
  - the zoom, resolution, and cumulative zoom in its NLUs is initially whatever factor is
    necessary to fit the page into the screen (typically 320/980 = 0.327), and after
    zooming whatever the zooming changed it to be
  - its NLUs have correct-looking scrollable rects and displayports

Keep in mind that these two APZCs have the same ScrollableLayerGuid, but the SLG changes when a new page is loaded.

Another thing that happens every time a new page is loaded, is an UpdateCompositionBounds message is sent to APZ, with the target APZC being identified by its SLG. The UpdateCompositionBounds typically causes a RequestContentRepaint.

Now, whether or not this bug occurs seems to depend on which of the above two APZCs are around to handle the UpdateCompositionBounds. 

  - If it's the first, the RequestContentRepaint that is sent off will contain
    a displayport calculated (in RequestContentRepaint) based on the incorrect
    scrollable rect maintained by this APZC. In our test case, this displayport
    is 320 x 740, based on the incorrect scrollable rect of 320 x 740. The correct
    scrollable rect is 980 x 1578. By the time the corresponding NLU gets back
    to APZ, the second APZC has invariably taken the place of the first, and
    enacted its zoom of 0.327 and correct scroll rect and displayport - but
    this NLU overwrites the displayport with the incorrect requested one
    of 320 x 740. The result is that only 320 x 740 of the page's 980 x 1578
    CSS pixels are drawn.

  - If it's the second, the RequestContentRepaint that is sent off will contain
    a correct displayport, and all is well.
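
The two cases above can be played out with a deliberately simplified model. Everything here (`Rect`, `Metrics`, `ApzcModel`, `PlayOut`) is a hypothetical stand-in written for this illustration, not the real Gecko classes, and it assumes the requested displayport simply equals the scrollable rect of whichever instance handles the UpdateCompositionBounds, as in the 320 x 740 case.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; not the real Gecko types.
struct Rect {
  double width;
  double height;
};

struct Metrics {
  Rect scrollableRect;
  Rect displayPort;
  double zoom;
};

// Models one APZC instance. The real class tracks far more state.
struct ApzcModel {
  Metrics metrics;

  // Models RequestContentRepaint: the requested displayport is derived
  // from whatever scrollable rect this instance currently holds.
  Rect RequestContentRepaint() const { return metrics.scrollableRect; }

  // Models the problematic NotifyLayersUpdated: the displayport coming
  // back from content overwrites the current one unconditionally.
  void NotifyLayersUpdated(Rect paintedDisplayPort) {
    metrics.displayPort = paintedDisplayPort;
  }
};

// `handler` answers UpdateCompositionBounds; the resulting NLU then lands
// on `second`. Returns the displayport that `second` ends up with.
Rect PlayOut(const ApzcModel& handler, ApzcModel second) {
  Rect requested = handler.RequestContentRepaint();
  second.NotifyLayersUpdated(requested);
  return second.metrics.displayPort;
}
```

If the first instance (scrollable rect 320 x 740) is the handler, the second instance ends up with a 320 x 740 displayport despite its correct 980 x 1578 scrollable rect. If the second instance is the handler, the displayport stays at 980 x 1578.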

I will continue investigating this tomorrow. My ideas for where to continue to investigate are:

  1) Where the scrollable rect of the first APZC comes from.

  2) Whether we can prevent an NLU corresponding to a RequestContentRepaint 
     sent by the first APZC from being handled by the second APZC.

In the meantime, I just wanted to bring this analysis to Kats' attention in case he has any ideas.
Whoops, bugzilla overwrote my ni?
Flags: needinfo?(bugmail.mozilla)
I'm curious to know why we create a second APZC at all. If it is because the layer instance changes and we don't have the old APZC anymore maybe we should just backport the recent changes to UpdatePanZoomControllerTree (bug 934420 and a bit of bug 937185) which would do a better job of keeping the same APZC instance and solve the problem.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #66)
> I'm curious to know why we create a second APZC at all. If it is because the
> layer instance changes and we don't have the old APZC anymore 

I assumed so. I can look into it and verify.

> maybe we
> should just backport the recent changes to UpdatePanZoomControllerTree (bug
> 934420 and a bit of bug 937185) which would do a better job of keeping the
> same APZC instance and solve the problem.

I'm not sure I see how this would solve the problem. The same possibility of a RequestContentRepaint being fired with the old/wrong values and its NotifyLayersUpdated then stomping on the new/correct values will remain.
(In reply to Botond Ballo [:botond] from comment #67)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #66)
> > I'm curious to know why we create a second APZC at all. If it is because the
> > layer instance changes and we don't have the old APZC anymore 
> 
> I assumed so. I can look into it and verify.

Yep, the layer instance changes.
Here is an attempt at a fix. It involves detecting when a NotifyLayersUpdated contains a displayport from a stale RequestContentRepaint (where "stale" is defined for this purpose as "the scrollable rect has changed"), and requests a content repaint for that purpose. This is the Part 2 patch.

To perform this detection, we need the APZC instance that sent the RequestContentRepaint to be the same as the one handling the NotifyLayersUpdated, so I also uplifted the patch from bug 934420 and part of the patch from bug 937185 which keep the same APZC instance when the layer instance changes but the SLG stays the same. This is the Part 1 patch.

With these patches, you can still see a flash of the small rectangle, but then it corrects itself on the next repaint. Hopefully this is acceptable.
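
The staleness check described for Part 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of the actual patch: `StaleRepaintGuard` and its methods are hypothetical names, and the sketch assumes that comparing the scrollable rect recorded at request time with the one arriving in the update is enough to flag a stale displayport.

```cpp
#include <cassert>

// Hypothetical rectangle type for illustration; not a Gecko class.
struct Rect {
  double x, y, width, height;
  bool operator==(const Rect& o) const {
    return x == o.x && y == o.y && width == o.width && height == o.height;
  }
  bool operator!=(const Rect& o) const { return !(*this == o); }
};

// Hypothetical helper: tracks which scrollable rect a repaint request
// was based on, so a later update can be recognised as stale.
class StaleRepaintGuard {
 public:
  explicit StaleRepaintGuard(Rect scrollableRect)
      : mScrollableRect(scrollableRect), mRectAtLastRequest(scrollableRect) {}

  // Call when a repaint request is sent: remember the scrollable rect
  // that the requested displayport was computed from.
  void OnRequestContentRepaint() {
    mRectAtLastRequest = mScrollableRect;
    mHasPendingRequest = true;
  }

  // Call when an update arrives. Returns true if the displayport in the
  // update was computed from an out-of-date scrollable rect, in which
  // case the caller should request a fresh content repaint.
  bool OnNotifyLayersUpdated(Rect newScrollableRect) {
    bool stale = mHasPendingRequest && newScrollableRect != mRectAtLastRequest;
    mScrollableRect = newScrollableRect;
    mHasPendingRequest = false;
    return stale;
  }

 private:
  Rect mScrollableRect;
  Rect mRectAtLastRequest;
  bool mHasPendingRequest = false;
};
```

This only works if the same object sees both the request and the update, which is why the Part 1 uplift (keeping the same APZC instance while the SLG stays the same) is needed first. It also matches the observed behaviour: the stale 320 x 740 displayport is still painted once, and the follow-up repaint corrects it.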
Attachment #8356627 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8356626 [details] [diff] [review]
Part 1 - Keep the same APZC instance when the layer instance changes but the SLG remains the same (uplift of bug 934420 and part of bug 937185)

Review of attachment 8356626 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comments addressed

::: gfx/layers/composite/APZCTreeManager.cpp
@@ +127,5 @@
> +        // If the content represented by the container layer has changed (which may
> +        // be possible because of DLBI heuristics) then we don't want to keep using
> +        // the same old APZC for the new content. Null it out so we run through the
> +        // code to find another one or create one.
> +        if (apzc && !apzc->Matches(ScrollableLayerGuid(aLayersId, container->GetFrameMetrics()))) {

This should be FullyMatches as well

::: gfx/layers/ipc/AsyncPanZoomController.cpp
@@ +1467,5 @@
>  }
>  
> +bool AsyncPanZoomController::FullyMatches(const ScrollableLayerGuid& aGuid)
> +{
> +  return aGuid.mLayersId == mLayersId 

trailing ws
Attachment #8356626 - Flags: review?(bugmail.mozilla) → review+
Updated Part 1 patch to address review comments. Carrying r+.
Attachment #8356626 - Attachment is obsolete: true
Attachment #8356678 - Flags: review+
Could QA verify that these patches fix the issue? Thanks!
Keywords: qawanted
(In reply to Botond Ballo [:botond] from comment #74)
> Could QA verify that these patches fix the issue? Thanks!

We'll need to do that post landing - we don't have gecko try builds to use here. Let's land this on trunk & test it there.
Keywords: qawanted → verifyme
Note that the patches don't make sense for trunk (they guard against the effects of a RequestContentRepaint caused by an UpdateCompositionBounds, and UpdateCompositionBounds has been removed on trunk). They should land on the 1.2 branch only.
needinfo?ing Ryan to do the actual landing on the 1.2 branch.
Flags: needinfo?(ryanvm)
Botond confirmed on IRC that this doesn't affect Aurora (v1.3) either.

https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/e3fc3dee4775
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/9d2a5bc8947d
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(ryanvm)
Resolution: --- → FIXED