Closed Bug 964517 Opened 10 years ago Closed 10 years ago

[Accuweather] Search result page with an empty string overlaps for a few seconds

Categories

(Core :: Layout, defect)

28 Branch
ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: sarsenyev, Assigned: botond)

References

Details

(Keywords: regression, Whiteboard: dogfood1.3)

Attachments

(11 files, 1 obsolete file)

4.47 MB, audio/ogg
Details
138.35 KB, text/plain
Details
63.23 KB, image/png
Details
64.26 KB, image/png
Details
870 bytes, patch
roc
: review+
Details | Diff | Splinter Review
1.49 KB, patch
botond
: review+
Details | Diff | Splinter Review
419 bytes, patch
Details | Diff | Splinter Review
360 bytes, patch
Details | Diff | Splinter Review
315 bytes, patch
Details | Diff | Splinter Review
289 bytes, patch
Details | Diff | Splinter Review
46 bytes, text/x-github-pull-request
alive
: review+
RyanVM
: checkin+
Details | Review
Attached audio VideoBug.1-27-14.ogg
Description:
After clicking on "Search" with an empty string, the page is navigated to the search page with double layers, black and white color, during switching the background colors the page is overlapped until the new page with the white background opens 

Repro Steps:
1) Updated Buri to BuildID: 20140127004002
2) Download the AccuWeather app from Marketplace
3) Open the app and navigate to the search location bar
4) Hit enter on search button without entering any search string

Actual:
The new search opens with overlapping text for a few sec

Expected:
1))The search page is not overlaps the text
2) A user cannot submit the search without entering any string

 
Device: Buri 1.3 MOZ
BuildID: 20140127004002
Gaia: 25a45a836a4a21a30f63fa7b544b42e8b781180a
Gecko: c40099a42c1f
Version: 28.0a2
Firmware Version: v1.2-device.cfg

Notes:
Repro frequency: 100%
See attached: video
Doesn't reproduce on 1.2
The "Search" result page doesn't switch between black and white background, so the issue haven't been seen

Device: Buri 1.2 MOZ
BuildID: 20140127004002
Gaia: 539a25e1887b902b8b25038c547048e691bd97f6
Gecko: d10e1f965d0c
Version: 26.0
Firmware Version: v1.2-device.cfg
No longer blocks: b2g-accuweather
blocking-b2g: --- → 1.3?
Component: Preinstalled B2G Apps → Graphics: Layers
Product: Tech Evangelism → Core
Version: unspecified → 28 Branch
QA Contact: jzimbrick
Regression Window:

Last Working Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20140109004002
Gaia: 22bc6be5b76cdc6d4e9667ff070979041a20ce2f
Gecko: 2c8f8683bd0d
Version: 28.0a2
Base Image: V1.2-device.cfg

First Broken Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20140110004009
Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
Gecko: c5109884ae3a
Version: 28.0a2
Base Image: V1.2-device.cfg
1.3+ for regression
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(milan)
Botond, can you take a look and see if this is in our area?
Assignee: nobody → botond
Flags: needinfo?(milan) → needinfo?(botond)
Testing on Nexus 4, I see the background change from black to white, but I don't see the text overlapping seen in the video. 

However, something other things trigger text overlapping:
  1) Switching the device's orientation
  2) Scrolling up and down quickly

This makes me wonder if the problem is related to bug 959781.
Nick Cameron sent me some patches that make the text overlapping problem in bug 959781 go away. They also make the text overlapping problems in this bug go away, suggesting that the two have the same cause.

Closing as duplicate of bug 959781. The background changing from black to white is a different issue, I will file a separate bug for that.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(botond)
Resolution: --- → DUPLICATE
965389(In reply to Botond Ballo [:botond] from comment #6)
> Closing as duplicate of bug 959781. The background changing from black to
> white is a different issue, I will file a separate bug for that.

I filed bug 965389 for the background changing issue.
It was brought to my attention that bug 959781 is believed to be a regression caused by bug 951554, which landed for 1.4 only, while this behaviour is seen with 1.3. Reopening while I investigate more carefully.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
See Also: → 959781
See Also: → 965389
I had a lot of trouble getting a Buri 1.3 build with working wifi that could actually access the Marketplace and where the AccuWeather app is functioning, but I finally was able to test with it. 

With this build, I see the text overlapping shown in the video, but not the one described in comment #5, suggesting that they are different issues and thus this bug is unrelated to bug 959781. 

I will try to investigate the overlapping issue shown in the video, but from the symptoms it looks more likely to be a graphics bug rather than an APZ bug (turning on APZ causes displayports to be set which can cause different codepaths to be exercised in graphics code, as well as change timings which can trigger timing-related bugs).
See Also: 959781
(In reply to J Zimbrick from comment #2)
> Regression Window:
> 
> Last Working Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140109004002
> Gaia: 22bc6be5b76cdc6d4e9667ff070979041a20ce2f
> Gecko: 2c8f8683bd0d
> Version: 28.0a2
> Base Image: V1.2-device.cfg
> 
> First Broken Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140110004009
> Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
> Gecko: c5109884ae3a
> Version: 28.0a2
> Base Image: V1.2-device.cfg

I can't reproduce the bug with the "First Broken" revision. I can, however, with the latest code in the 1.3 tree.
Strange. I'm able to reproduce the issue on the build listed for First Broken.

Device: Buri v1.3 Mozilla RIL
BuildID: 20140110004009
Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
Gecko: c5109884ae3a
Version: 28.0a2
Base Image: V1.2-device.cfg
(In reply to J Zimbrick from comment #11)
> Strange. I'm able to reproduce the issue on the build listed for First
> Broken.
> 
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140110004009
> Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
> Gecko: c5109884ae3a
> Version: 28.0a2
> Base Image: V1.2-device.cfg

I tried downloading the prebuilt nightly and it does indeed repro. But not when I build those Gecko/Gaia revisions myself (which is what I was doing before). I wonder what the difference is.
(In reply to J Zimbrick from comment #2)
> Regression Window:
> 
> Last Working Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140109004002
> Gaia: 22bc6be5b76cdc6d4e9667ff070979041a20ce2f
> Gecko: 2c8f8683bd0d
> Version: 28.0a2
> Base Image: V1.2-device.cfg
> 
> First Broken Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140110004009
> Gaia: c606b129a2d1647c0fc7bfb197555026e9b27f9e
> Gecko: c5109884ae3a
> Version: 28.0a2
> Base Image: V1.2-device.cfg

Looking at the Gecko pushlog between these revisions, I see

 Bug 909877 - Enable APZ for all of gaia. rs=milan 

in there. So you probably tested the Last Working with APZ off and the First Broken with APZ on.

When I test Last Working with APZ on, I can repro the bug. 

This suggests that the problem has been there all along (or at least since earlier than Jan-09) with APZ on, which in turn means this revision range isn't very useful for us.
Talked to BenWa about this bug, he suggested it may be an invalidation bug. CC'ing some layout people in case they have any insights/suggestions.
A little more detail about what's going on here: shortly after loading, the page is dynamically inserting a DOM node with the text "Use Current Location" in the middle of the page. This is supposed to have the effect of moving the existing nodes down the page, but for a handful for frames what we actually see if the new text drawn overlapping with the content that used to be at that height on the page, which is another piece of text, "Browse for your location".

Attached is a portion of logcat output from the AccuWeather app's process near the frame where the text overlapping occurs, including display list dumps and invalidation logging.

The output covers six paints. The "Use Current Location" text first appears in the display list in the third paint. The bounds of the "Browse for your location" display item is correctly shifted down in the display list dump (both before and after optimization), and the invalidation logging in between shows the rectangles corresponding to the old and new positions of "Browse for your location" being correctly invalidated.
The above suggests that this is not an invalidation problem, but rather painting/graphics-related. I will take a moz2d recording to try to narrow the cause further.

By the way, this bug seems to reproduce on the 1.3 branch only, not with master, and it reproduces with both Nexus 4 and Buri.
(In reply to Botond Ballo [:botond] from comment #16)
> By the way, this bug seems to reproduce on the 1.3 branch only, not with
> master, and it reproduces with both Nexus 4 and Buri.

Since this bug reproduces on the 1.3 branch, but not on master, it suggests that it was fixed by a commit on the master branch that's not on the 1.3 branch. It would be great to find the commit on the master branch that fixed this - a "fix window" so to speak.

I'm not sure whether "regressionwindow-wanted" or "qawanted" is the more appropriate keyword for this purpose,  marking as "qawanted".
Keywords: qawanted
I checked this issue on 1.4 on the first available build (12/10/13) and several points after that through December and January and could not reproduce this on any of them (Turned APZ on manually on each build, and checked with it off as well on each). The overlapping text does not seem to occur on 1.4 at any point.

Device: Buri v1.4 Mozilla RIL
BuildID: 20131210040206
Gaia: c952e2756c03eceb4de6a3eba15651741a62f9e8
Gecko: df82be9d89a5
Version: 29.0a1
Base Image: V1.2-device.cfg

I'll bet I could find a more specific regression window with APZ manually turned on somewhere after the 10th of December in 1.3, but I'll have to get to that tomorrow.
Keywords: qawanted
(In reply to J Zimbrick from comment #18)
> I checked this issue on 1.4 on the first available build (12/10/13) and
> several points after that through December and January and could not
> reproduce this on any of them (Turned APZ on manually on each build, and
> checked with it off as well on each). The overlapping text does not seem to
> occur on 1.4 at any point.
> 
> Device: Buri v1.4 Mozilla RIL
> BuildID: 20131210040206
> Gaia: c952e2756c03eceb4de6a3eba15651741a62f9e8
> Gecko: df82be9d89a5
> Version: 29.0a1
> Base Image: V1.2-device.cfg
> 
> I'll bet I could find a more specific regression window with APZ manually
> turned on somewhere after the 10th of December in 1.3, but I'll have to get
> to that tomorrow.

Leaving qawanted to check this.
Keywords: qawanted
Whiteboard: dogfood1.3 → dogfood1.3, [ETA: active triage]
Seems that it was just before the 10th on 1.3, around the time of 1.3 and 1.4 branching off. 

It should be noted that the "last working build" in this window was from (as far as I know) the last 1.3 build that was still considered central, and the "first broken" was from the same day, but from the aurora folder.

Both of these were tested with APZ manually turned on.

Last Working Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20131209053402
Gaia: 1d45d1dc3201059d5c8f2efdeb92c04576d8e161
Gecko: 9f12a9fab080
Version: 28.0a1
Base Image: V1.2-device.cfg

First Broken Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20131209182241
Gaia: ff703bf2c0baaf06b2b74144bec9b9cc99f22404
Gecko: e0c328d99742
Version: 28.0a2
Base Image: V1.2-device.cfg
Keywords: qawanted
(In reply to J Zimbrick from comment #20)
> Seems that it was just before the 10th on 1.3, around the time of 1.3 and
> 1.4 branching off. 
> 
> It should be noted that the "last working build" in this window was from (as
> far as I know) the last 1.3 build that was still considered central, and the
> "first broken" was from the same day, but from the aurora folder.
> 
> Both of these were tested with APZ manually turned on.
> 
> Last Working Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20131209053402
> Gaia: 1d45d1dc3201059d5c8f2efdeb92c04576d8e161
> Gecko: 9f12a9fab080
> Version: 28.0a1
> Base Image: V1.2-device.cfg
> 
> First Broken Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20131209182241
> Gaia: ff703bf2c0baaf06b2b74144bec9b9cc99f22404
> Gecko: e0c328d99742
> Version: 28.0a2
> Base Image: V1.2-device.cfg

I tried these two Gecko/Gaia combinations, and I can't repro with either. Strange.

Meanwhile, in my other line of investigation, I succeeded in getting a Moz2D recording (http://people.mozilla.org/~bballo/moz2drec_2_5129.aer) and playing it back. The recording shows the bad frames, but unfortunately it does not show the text drawing commands because recording these is not implemented for B2G. BenWa suggested the problem may be either with what layout is painting, or with buffer rotation, but it's not clear which. As a next step I will write some debug code to try to discriminate between these two cases.
Can we get ETA to fix this bug? Thank you.
Whiteboard: dogfood1.3, [ETA: active triage] → dogfood1.3, [ETA: 2/14]
(In reply to Botond Ballo [:botond] from comment #21)
> Meanwhile, in my other line of investigation, I succeeded in getting a Moz2D
> recording (http://people.mozilla.org/~bballo/moz2drec_2_5129.aer) and
> playing it back. The recording shows the bad frames, but unfortunately it
> does not show the text drawing commands because recording these is not
> implemented for B2G. BenWa suggested the problem may be either with what
> layout is painting, or with buffer rotation, but it's not clear which. As a
> next step I will write some debug code to try to discriminate between these
> two cases.

I added command at the beginning and end of DrawTargetCairo::FillGlyphs() to dump the surface as a PNG file before and after, and then examined the dumped PNGs. I found a before/after pair where the overlapping does not occur before but does occur after (see attachments), suggesting that the overlapping is produced by a bad draw command received from layout.
Here is a stack trace from the offending call to FillGlyphs():

0  mozilla::gfx::DrawTargetCairo::FillGlyphs (this=0xb039abe0, aFont=0xb1625c60, aBuffer=..., aPattern=..., aOptions=...) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/2d/DrawTargetCairo.cpp:935
#1  0xb4eb2726 in GlyphBufferAzure::Flush (this=0xbe9ea0e0, aDT=0xb039abe0, aContextPaint=0x0, aFont=0xb1625c60, aDrawMode=GLYPH_FILL, aReverse=false, aOptions=0xb258c2c0, aThebesContext=0xb1d624e0, aInvFontMatrix=0x0, 
    aDrawOptions=..., aFinish=true) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/thebes/gfxFont.cpp:2320
#2  0xb4eb43ac in gfxFont::Draw (this=0xb2faa060, aTextRun=0xb213c060, aStart=0, aEnd=20, aContext=0xb1d624e0, aDrawMode=GLYPH_FILL, aPt=0xbe9eb930, aSpacing=0x0, aContextPaint=0x0, aCallbacks=0x0)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/thebes/gfxFont.cpp:2830
#3  0xb4eba532 in gfxTextRun::DrawGlyphs (this=0xb213c060, aFont=0xb2faa060, aContext=0xb1d624e0, aDrawMode=GLYPH_FILL, aPt=0xbe9eb930, aContextPaint=0x0, aStart=0, aEnd=20, aProvider=0xbe9ebb88, aSpacingStart=0, aSpacingEnd=20, 
    aCallbacks=0x0) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/thebes/gfxFont.cpp:5880
#4  0xb4ebad4c in gfxTextRun::Draw (this=0xb213c060, aContext=0xb1d624e0, aPt=..., aDrawMode=GLYPH_FILL, aStart=0, aLength=20, aProvider=0xbe9ebb88, aAdvanceWidth=0xbe9ebce0, aContextPaint=0x0, aCallbacks=0x0)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/thebes/gfxFont.cpp:6088
#5  0xb5924cc4 in DrawTextRun (aTextRun=0xb213c060, aCtx=0xb1d624e0, aTextBaselinePt=..., aOffset=0, aLength=20, aProvider=0xbe9ebb88, aTextColor=4290145086, aAdvanceWidth=0xbe9ebce0, aContextPaint=0x0, aCallbacks=0x0)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/generic/nsTextFrame.cpp:5926
#6  0xb5924d0a in nsTextFrame::DrawTextRun (this=0xb039e7f8, aCtx=0xb1d624e0, aTextBaselinePt=..., aOffset=0, aLength=20, aProvider=..., aTextColor=4290145086, aAdvanceWidth=@0xbe9ebce0, aDrawSoftHyphen=false, aContextPaint=0x0, 
    aCallbacks=0x0) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/generic/nsTextFrame.cpp:5942
#7  0xb59254cc in nsTextFrame::DrawText (this=0xb039e7f8, aCtx=0xb1d624e0, aDirtyRect=..., aFramePt=..., aTextBaselinePt=..., aOffset=0, aLength=20, aProvider=..., aTextStyle=..., aTextColor=4290145086, aClipEdges=..., 
    aAdvanceWidth=@0xbe9ebce0, aDrawSoftHyphen=false, aDecorationOverrideColor=0x0, aContextPaint=0x0, aCallbacks=0x0) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/generic/nsTextFrame.cpp:6093
#8  0xb5924bc4 in nsTextFrame::PaintText (this=0xb039e7f8, aRenderingContext=0xb03be700, aPt=..., aDirtyRect=..., aItem=..., aContextPaint=0x0, aCallbacks=0x0)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/generic/nsTextFrame.cpp:5902
#9  0xb5920cec in nsDisplayText::Paint (this=0xb2fdbcd0, aBuilder=0xbe9ec474, aCtx=0xb03be700) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/generic/nsTextFrame.cpp:4546
#10 0xb57fe47c in mozilla::FrameLayerBuilder::PaintItems (this=0xb039ca80, aItems=..., aRect=..., aContext=0xb1d624e0, aRC=0xb03be700, aBuilder=0xbe9ec474, aPresContext=0xb1dbe800, aOffset=..., aXScale=1, aYScale=1, aCommonClipCount=0)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/FrameLayerBuilder.cpp:3407
#11 0xb57feb34 in mozilla::FrameLayerBuilder::DrawThebesLayer (aLayer=0xb2f9cb80, aContext=0xb1d624e0, aRegionToDraw=..., aClip=mozilla::layers::CLIP_DRAW_SNAPPED, aRegionToInvalidate=..., aCallbackData=0xbe9ec474)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/FrameLayerBuilder.cpp:3572
#12 0xb4f23e60 in mozilla::layers::ClientThebesLayer::PaintBuffer (this=0xb2f9cb80, aContext=0xb1d624e0, aRegionToDraw=..., aExtendedRegionToDraw=..., aRegionToInvalidate=..., aDidSelfCopy=false, 
    aClip=mozilla::layers::CLIP_DRAW_SNAPPED) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientThebesLayer.cpp:146
#13 0xb4f23b02 in mozilla::layers::ClientThebesLayer::PaintThebes (this=0xb2f9cb80) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientThebesLayer.cpp:76
#14 0xb4f23d3e in mozilla::layers::ClientThebesLayer::RenderLayer (this=0xb2f9cb80) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientThebesLayer.cpp:107
#15 0xb4f20b7e in mozilla::layers::ClientContainerLayer::RenderLayer (this=0xb03d6800) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientContainerLayer.h:81
#16 0xb4f223d8 in mozilla::layers::ClientLayerManager::EndTransactionInternal (this=0xb288f550, 
    aCallback=0xb57fe669 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*)>, aCallbackData=0xbe9ec474)
    at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientLayerManager.cpp:188
#17 0xb4f224b2 in mozilla::layers::ClientLayerManager::EndTransaction (this=0xb288f550, 
    aCallback=0xb57fe669 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*)>, aCallbackData=0xbe9ec474, 
    aFlags=mozilla::layers::LayerManager::END_NO_COMPOSITE) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/gfx/layers/client/ClientLayerManager.cpp:211
#18 0xb584e7a0 in nsDisplayList::PaintForFrame (this=0xbe9ec7c8, aBuilder=0xbe9ec474, aCtx=0x0, aForFrame=0xb2185298, aFlags=13) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsDisplayList.cpp:1221
#19 0xb584e094 in nsDisplayList::PaintRoot (this=0xbe9ec7c8, aBuilder=0xbe9ec474, aCtx=0x0, aFlags=13) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsDisplayList.cpp:1074
#20 0xb58662d0 in nsLayoutUtils::PaintFrame (aRenderingContext=0x0, aFrame=0xb2185298, aDirtyRegion=..., aBackstop=0, aFlags=772) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsLayoutUtils.cpp:2330
#21 0xb57ddedc in PresShell::Paint (this=0xb1d6d380, aViewToPaint=0xb1f8cf10, aDirtyRegion=..., aFlags=1) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsPresShell.cpp:5896
#22 0xb54466b4 in ProcessPendingUpdatesForView (aFlushDirtyRegion=<optimized out>, aView=0xb1f8cf10, this=0xb1f6e430) at ../../../gecko/view/src/nsViewManager.cpp:421
#23 nsViewManager::ProcessPendingUpdatesForView (this=0xb1f6e430, aView=0xb1f8cf10, aFlushDirtyRegion=<optimized out>) at ../../../gecko/view/src/nsViewManager.cpp:368
#24 0xb57eefc8 in nsRefreshDriver::Tick (this=0xb3118370, aNowEpoch=1391632019000592, aNowTime=...) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsRefreshDriver.cpp:1208
#25 0xb57ecc52 in mozilla::RefreshDriverTimer::TickDriver (driver=0xb3118370, jsnow=1391632019000592, now=...) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsRefreshDriver.cpp:168
#26 0xb57ecbb4 in mozilla::RefreshDriverTimer::Tick (this=0xb28eb500) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsRefreshDriver.cpp:160
#27 0xb57ecc78 in mozilla::RefreshDriverTimer::TimerTick (aTimer=0xb28fcc40, aClosure=0xb28eb500) at /home/botond/dev/mozilla/b2g-nexus-1.3/gecko/layout/base/nsRefreshDriver.cpp:185
#28 0xb4b99f4e in nsTimerImpl::Fire (this=0xb28fcc40) at ../../../gecko/xpcom/threads/nsTimerImpl.cpp:551
Dumps of the painted layer surface show that the layer has a transparent background. This seems uncorrelated with whether we see a black or white background. (The text overlapping, however, only ever occurs on a black background, and almost every time (but not every time) we see the black background, we see the overlapping).

The background being transparent gives rise to a conceivable reason the overlapping occurs: even though the old location of "Browse for your location" is invalidated, there is no background to draw there that would erase that text and make the subsequent "Use Current Location" not overlap it. BenWa said that layers code should handle situations like this by clearing the invalidated regions of a transparent layer here [1]. It's possible that this is not happening for the paints where we see overlapping.

I will continue investigating in this direction. Also cc'ing Nick as he touched this part of the code recently (though only on master, while this bug is only on 1.3).

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/RotatedBuffer.cpp#701
CC'ing john daggett and jonathan kew since this is probably closer to their area of expertise - any insights would be appreciated. Also Botond will be at a conference next week so we might need somebody else to pick this up if it hasn't been sorted out by the end of the week.
Rob, any thoughts on the subject?  Trying to find the right person to look at this while Botond is away next week.
Flags: needinfo?(roc)
Whiteboard: dogfood1.3, [ETA: 2/14] → dogfood1.3, [ETA: 2/21]
(In reply to Botond Ballo [:botond] from comment #27)
> The background being transparent gives rise to a conceivable reason the
> overlapping occurs: even though the old location of "Browse for your
> location" is invalidated, there is no background to draw there that would
> erase that text and make the subsequent "Use Current Location" not overlap
> it. BenWa said that layers code should handle situations like this by
> clearing the invalidated regions of a transparent layer here [1]. It's
> possible that this is not happening for the paints where we see overlapping.
> 
> [1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/RotatedBuffer.
> cpp#701

It looks like this is indeed what's happening. On good paints (ones that produce no overlapped text), the content type in RotatedContentBuffer::BeginPaint() is GFX_CONTENT_COLOR_ALPHA, and invalidated rects are cleared as they should be, but on paints that produce overlapped text, the content type is not COLOR_ALPHA and invalidated rects are not cleared.
Some further observations:

  - The content type being passed into RotatedContentBuffer::BeginPaint()
    is always GFX_CONTENT_COLOR, because CanUseOpaqueSurface() returns
    true for the layer.

    Note that the layer structure of the app is very simple:
    
    <parent process>
      RefLayer
        ContainerLayer
          ThebesLayer

    The ThebesLayer's content type is not CONTENT_OPAQUE, but CanUseOpaqueSurface()
    nevertheless returns true for it because the parent ContainerLayer's content
    type is CONTENT_OPAQUE, and the ThebesLayer is the bottom child of it.

    The ContainerLayer's content type becomes CONTENT_OPAQUE here [1], it isn't
    CONTENT_OPAQUE before then.
 
  - On good paints, the content type is adjusted to GFX_CONTENT_COLOR_ALPHA here [2],
    and the invalidated rects are therefore cleared.

  - On bad paints, this adjustment fails to be made because the 
    "neededRegion.GetNumRects() > 1" condition fails. neededRegion.GetNumRects()
    returns 1 on bad paints and 3 on good paints.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#1221
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/RotatedBuffer.cpp#468
(In reply to Botond Ballo [:botond] from comment #31)
>     The ContainerLayer's content type becomes CONTENT_OPAQUE here [1], it isn't
>     CONTENT_OPAQUE before then.
> 
> [1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#1221

Note that removing this line makes the problem go away. I guess this is not the correct fix though.
(In reply to Botond Ballo [:botond] from comment #31)
>     The ContainerLayer's content type becomes CONTENT_OPAQUE here [1], it isn't
>     CONTENT_OPAQUE before then.
> 
> [1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#1221

After speaking to Jeff, I believe the problem is that the layer isn't opaque to begin with. The app has a solid white background, and it has only one layer of content, so that layer should be opaque. More generally, a root layer should be opaque.

The determination of whether a layer is opaque is made by layout. Moving to Layout component on that basis.
Component: Graphics: Layers → Layout
I'm swamped. Try tnikkel or mats.
Flags: needinfo?(roc) → needinfo?(tnikkel)
I'm working with Botond on irc on this.
Flags: needinfo?(tnikkel)
Finally getting somewhere now:

 - The app really has a transparent background. The white background 
   that we see probably comes from the parent process when we 
   composite the layer.

 - We override the transparent background to be white here [1], but
   only if we're a cross-process root content document, and for some
   reason this app's root document isn't one.

 - We tell the layers system the layer is opaque regardless of whether
   or not it actually is here [2]. If we failed to make the background
   white here [1] there is a mismatch between what we tell the layers
   system the layer is like (opaque) and what it's really like 
   (transparent), and this causes issues like text overlapping.

I discussed this on irc with :roc, :tn, and :romaxa, leading to some
suggestions on how this could be fixed:

 - (roc) Remove the code at [1] and [2] and thus treat child process
   documents the same as other documents. This may lead to a loss of
   performance in some situations (the reason why [1] was written),
   but perhaps that can be fixed in other ways.

 - (roc) Add a UA style rule that gives apps a white background on
   the viewport and let some apps override it with background:transparent.
   We can control which apps can override it by having the system app
   stick a white background on the app's <iframe>.

 - (romaxa) Have the code at [1] run for the browser app only, not
   other apps.

One thing to keep in mind is that the symptoms of this bug can only
be seen with 1.3, not with master. Figuring out what changed might
inform what the most appropriate way to fix this is. Unfortunately
the regression range in comment #20 isn't working for me (can't
repro the bug with the "First Broken" build there), but now that I
know how the bug comes about in 1.3, I can take a look at why the
same thing doesn't happen on master.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#5172
[2] http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#1221
(In reply to Botond Ballo [:botond] from comment #36)
>  - (roc) Remove the code at [1] and [2] and thus treat child process
>    documents the same as other documents. This may lead to a loss of
>    performance in some situations (the reason why [1] was written),
>    but perhaps that can be fixed in other ways.
> 
>  - (roc) Add a UA style rule that gives apps a white background on
>    the viewport and let some apps override it with background:transparent.
>    We can control which apps can override it by having the system app
>    stick a white background on the app's <iframe>.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.
> cpp#5172
> [2]
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.
> cpp#1221
>
>  - (romaxa) Have the code at [1] run for the browser app only, not
>    other apps.
> 
> One thing to keep in mind is that the symptoms of this bug can only
> be seen with 1.3, not with master. Figuring out what changed might
> inform what the most appropriate way to fix this is. Unfortunately
> the regression range in comment #20 isn't working for me (can't
> repro the bug with the "First Broken" build there), but now that I
> know how the bug comes about in 1.3, I can take a look at why the
> same thing doesn't happen on master.

We definitely need to delete the [2] code.

See bug 917416 for a related issue and discussions.

For best performance of app content drawing we do need to make sure apps that don't need to be transparent have an opaque display item making their root layer be marked OPAQUE, and this requires some kind of signal from the container of the app's <iframe>. Here's my idea: if an <iframe> has an opaque CSS background color, ensure the display list of the subdocument (subprocess or not) has a background color item covering its viewport and displayport that is the same color. I guess there would be the limitation that changes to the background color would not be immediate for a cross-process IFRAME because the subprocess would have to update its layer tree. Otherwise this is just an optimization and should not affect any behavior. I assume this is still needed on master; even though the [2] hack marks the ContaineLayer as OPAQUE, there must be some ThebesLayers that should also be marked opaque that aren't.

This could be implemented in PresShell::ComputeBackstopColor in the child process. Of course we'd need to transmit the color via IPDL, probably through FrameLoader.

For 1.3 I'm not sure what to do other than just delete the [2] code and check that all the apps we really care about define a solid-color background somewhere.
(In reply to Botond Ballo [:botond] from comment #36)
>  - (roc) Remove the code at [1] and [2] and thus treat child process
>    documents the same as other documents. This may lead to a loss of
>    performance in some situations (the reason why [1] was written),

This should have said "the reason why [2] was written".
(In reply to Botond Ballo [:botond] from comment #36)
>  - We tell the layers system the layer is opaque regardless of whether
>    or not it actually is here [2]. If we failed to make the background
>    white here [1] there is a mismatch between what we tell the layers
>    system the layer is like (opaque) and what it's really like 
>    (transparent), and this causes issues like text overlapping.
>
> [1]
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.
> cpp#5172
> [2]
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.
> cpp#1221

Just for completeness, the reason this bug only happens with APZ on
is that the code at [2] only runs if we have a display port, and we
only have a display port when APZ is on.
(In reply to Botond Ballo [:botond] from comment #36)
>  - We override the transparent background to be white here [1], but
>    only if we're a cross-process root content document, and for some
>    reason this app's root document isn't one.

I looked into why this app's root document is not a cross-process root content document. It turns out this is actually the case for all apps.

TabChild::IsRootContentDocument() returns true if the app represented by the TabChild does not have an owner app. 

Currently, all apps actually report _themselves_ as their owner app, due to a bug where TabContext::SetTabContextForAppFrame() sets the containing app ID incorrectly. Attached is a one-line patch that fixes this bug.

Even with this patch, however, TabChild::IsRootContentDocument() continues returning false for every app because they now report the System app as their parent. I'm not sure whether or not this is the intended behaviour.

Note that this patch does not solve the problem in this bug, it just fixes something small that is pretty clearly wrong.
Attachment #8372452 - Flags: review?(roc)
I debugged this on master to see why the overlapping was not happening, 
and I now understand why. I also have a better understanding of the 
relationship between the background going black issue (for which I had 
filed bug 965389) and the text overlapping issue.

  - On master, like on 1.3, the app is in fact transparent and we're
    pretending it's opaque.

  - On master, like on 1.3, the layers system sometimes adjusts the
    content type of the layer back from opaque to transparent, as
    described in comment #31. Whether or not this adjustment happens
    seems to hinge on whether the layer's visible region has one
    rectangle (adjustment doesn't happen) or more than one (adjustment 
    happens).

     - If the adjustment doesn't happen, the background we see is 
       black (incorrect) and there is the _potential_ for text 
       overlapping (also incorrect).

        - Text overlapping only _actually_ happens when there is a
          change to the DOM, such as the new text element being
          inserted.

     - If the adjustment does not happen, the background we see is
       white (correct) and there is no text overlapping (correct).

  - After performing step 4 of the STR in comment #0, when the page
    starts to reload, initially the layer's visible region has one
    rectangle, so there is no adjustment and we see a black
    background.

     - Shortly thereafter, two things happen:
      
         (1) The layer's visible region changes to consist of
             more than one rectangle. (I don't know why this
             happens, but it doesn't seem very relevant as
             explained below.)
  
         (2) The page inserts the new text element to the DOM.

       (1) causes the adjustment to happen and thus any black
       background or text overlapping to go away. (2) causes
       text overlapping (if (1) hasn't happened yet).

     - On 1.3, (2) happens before (1), so in addition to seeing
       the black background there is a period of time when we see
       the text overlapping.

     - On master, (1) happens before (2), so by the time the DOM
       element is added, the adjustment started to be made and
       we don't see the text overlapping.

This gives rise to us seeing the black background _and_ text
overlapping on 1.3, but only the black background on 1.3.

The implications of this are that:

  - The underlying cause of both the black background and the
    text overlapping is the same (us pretending the app is
    opaque when it's transparent). Bug 965389 can probably be
    closed as a duplicate of this, although we should check
    that the STRs of other bugs that have been duped on that
    are also fixed by the fix to this.

  - This issue is not "fixed" in any meaningful way on master.
    The apparent "fix" hinges on the timing between (1) and (2)
    which is probably nondeterministic, even if we typically
    see one order on 1.3 and the reverse on master. This explains
    why this bug is intermittent, and why it's difficult to get
    an accurate regression window.

  - Even if we found a regression window, it wouldn't be useful
    because the manner in which the problem is "fixed" on master
    (the adjustment due to the visible region consisting of 
    multiple rectangles) is incidental. The proper fix for this
    is to not pretend an app is opaque if it's really transparent.
Attached is a patch that removes the code that pretends a layer is opaque even if it's transparent, as :roc suggested. The patch keeps the IsCrossProcessRootContentDocument/IsTransparentContainerElement logic in PresShell::UpdateCanvasBackground() because I think that will mitigate some of the potential performance impact.

We should probably evaluate any remaining performance impact before landing.
Attachment #8372586 - Flags: review?(roc)
(In reply to Botond Ballo [:botond] from comment #42)
> We should probably evaluate any remaining performance impact before landing.

In addition to the AccuWeather app, I see the warning that is removed in the above patch in the Contacts and Homescreen apps, but not in Messages or Call Log. I'll test some more apps and report back about them.
Comment on attachment 8372586 [details] [diff] [review]
Do not pretend a layer is opaque if it really is transparent

Review of attachment 8372586 [details] [diff] [review]:
-----------------------------------------------------------------

Let's keep the warning for now. We need to use it as we examine and fix apps to make sure that they don't trigger it.

r+ with that.
Attachment #8372586 - Flags: review?(roc) → review+
(In reply to Botond Ballo [:botond] from comment #43)
> (In reply to Botond Ballo [:botond] from comment #42)
> > We should probably evaluate any remaining performance impact before landing.
> 
> In addition to the AccuWeather app, I see the warning that is removed in the
> above patch in the Contacts and Homescreen apps, but not in Messages or Call
> Log. I'll test some more apps and report back about them.

Other apps where I see the warning: Camera, Email, Keyboard, Settings, Calendar, Video.
Other apps where I do not see the warning: Gallery, Clock.
Updated patch to keep the warning.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=58ba522656fb
Attachment #8372586 - Attachment is obsolete: true
Attachment #8372951 - Flags: review+
(In reply to Botond Ballo [:botond] from comment #45)
> Other apps where I see the warning: Camera, Email, Keyboard, Settings,
> Calendar, Video.

We need to fix these!
I also see the warning all the time when browsing in Fennec.
Vivien, please see comments 45-49.
Flags: needinfo?(21)
Unassigning while I'm at the C++ standards meeting in Seattle this week. The next steps in this bug are:
  - investigate the apps mentioned in comment #43 and comment #45
  - if appropriate, make fixes to these apps
  - land the patch I posted
If someone would like to work on this while I'm away please go ahead, otherwise I'll reassign to myself and continue working on this next week.
Assignee: botond → nobody
Add a background to the dialer app. Can't see any warnings anymore on it.
Flags: needinfo?(21)
Add a background on the contacts app. No more warnings on the main panel. but there is still some on other panels. Need to investigate, even if that's probably a bit less important since the main scrollable is covered.
Add a background to Settings. No more warnings when launching the app but there is still some when transitioning to a panel that has been freshly created.
Add a background to video. Can't see any others warnings with it.
There is also tons of them in the homescreen. Will try to see tomorrow by applying the patch from bug 960325 locally.
(In reply to Botond Ballo [:botond] [c++ standards meeting Feb 10-15] from comment #45)
> (In reply to Botond Ballo [:botond] from comment #43)
> > (In reply to Botond Ballo [:botond] from comment #42)
> > > We should probably evaluate any remaining performance impact before landing.
> > 
> > In addition to the AccuWeather app, I see the warning that is removed in the
> > above patch in the Contacts and Homescreen apps, but not in Messages or Call
> > Log. I'll test some more apps and report back about them.
> 
> Other apps where I see the warning: Camera, Email, Keyboard, Settings,
> Calendar, Video.
> Other apps where I do not see the warning: Gallery, Clock.

It may be because I'm testing mostly on m-c/gaia-master but I do not see the warnings on Camera, Calendar and the Keyboard.

But I'm seeing the warnings in the Call Log.
Assignee: nobody → botond
I tried Vivien's patches, and they work as described, suggesting that the strategy of adding "background-color" styles to the affected apps is the right one.

Here are the places where I still see the warning with Vivien's patches applied:
  - Homescreen
  - Call Log
  - Contacts (on Contact Details and Edit Contact pages)
  - Calendar (in week view)
  - Settings (only briefly while transitioning from one view to another, doesn't seem like a big problem)

I guess we should add "background-color" styles in these places as well.
(In reply to Botond Ballo [:botond] from comment #58)
> Here are the places where I still see the warning with Vivien's patches
> applied:
>   ...
>   - Contacts (on Contact Details and Edit Contact pages)

Strangely, I don't see this any more. Possibly I had been running a stale version of the app from the /data/webapps folder, or something like that.
(In reply to Botond Ballo [:botond] from comment #59)
> (In reply to Botond Ballo [:botond] from comment #58)
> > Here are the places where I still see the warning with Vivien's patches
> > applied:
> >   ...
> >   - Contacts (on Contact Details and Edit Contact pages)
> 
> Strangely, I don't see this any more. Possibly I had been running a stale
> version of the app from the /data/webapps folder, or something like that.

In fact, I don't see the warning anywhere any more except Settings, Camera (I omitted that above), and Homescreen.

For Settings and Camera, I only see it for a few frames when transitioning between screens. I think that's fine for now.

For Homescreen, it looks like it's being handled in bug 960325. I tried applying the patch for that bug locally to verify that it fixes the warning, but the patch does not apply cleanly to trunk. We should test after it lands to make sure.

I think we can just go ahead and land Vivien's Gaia patches and my Gecko patch. I'll send a PR for the Gaia patches, and then land the Gecko patch once those are reviewed and have landed.
Attachment #8377953 - Flags: review?(alive) → review+
Marking as checkin-needed for the gaia patches.

The gecko patches can be landed at the same time, or I can land them myself.
Keywords: checkin-needed
Attachment #8377953 - Flags: checkin?
After talking to RyanVM, decided to land Gecko patches now, and mark Gaia patch as checkin? so he knows to land it when the Gaia tree reopens.

https://hg.mozilla.org/integration/mozilla-inbound/rev/613665a774f2
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ddabb9fd909
Keywords: leave-open
Comment on attachment 8377953 [details] [review]
Pull request for adding backgrounds to Gaia apps

Master: 1b9cffdad52f4c8a4081635c93c65ace9dacaa94
Attachment #8377953 - Flags: checkin? → checkin+
Please nominate these patches for v1.3 uplift when you're ready.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Flags: needinfo?(botond)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment on attachment 8372951 [details] [diff] [review]
Do not pretend a layer is opaque if it really is transparent

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): APZ
User impact if declined: user sees overlapped text in some apps
Testing completed: yes
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8372951 - Flags: approval-mozilla-b2g18?
Flags: needinfo?(botond)
Comment on attachment 8377953 [details] [review]
Pull request for adding backgrounds to Gaia apps

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): the gecko patch for this bug
User impact if declined: potentially slower paints in affected apps
Testing completed: yes
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8377953 - Flags: approval-mozilla-b2g18?
Comment on attachment 8372951 [details] [diff] [review]
Do not pretend a layer is opaque if it really is transparent

Sorry, meant to set 'b2g28', not 'b2g18'.
Attachment #8372951 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g28?
Comment on attachment 8377953 [details] [review]
Pull request for adding backgrounds to Gaia apps

Likewise.
Attachment #8377953 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g28?
Attachment #8377953 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Attachment #8372951 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: