Closed Bug 992218 Opened 10 years ago Closed 10 years ago

Regression: Content glitches/flashes during periods of heavy repainting

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox30 verified, firefox31 verified, fennec30+)

VERIFIED FIXED
Firefox 31
Tracking Status
firefox30 --- verified
firefox31 --- verified
fennec 30+ ---

People

(Reporter: capella, Assigned: kats)

References

()

Details

(Keywords: regression, reproducible)

Attachments

(2 files, 1 obsolete file)

Not sure if this is the right component for this report ...

Using last a current m-c repo / build (No patches in queue), I notice this oddness while moving text selection handles around the page while in reader mode.

(Starts around :05sec)

   https://www.dropbox.com/s/do5fxn065s2kuu1/readerModeFlicker.mp4
Without a doubt a regression from recent graphics refactoring work. CC'ing some folks.
tracking-fennec: --- → ?
Regression window wanted:
mozilla central:
good build: 08-03-2014 
bad build: 09-03-2014

pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d01bf8596d3b&tochange=21f293fc8d34
Teodora, can you use narrow this using mozilla-inbound please?

I suspect this is a regression bug 963073 (enable b2g tiling by default).
Flags: needinfo?(teodora.vermesan)
(In reply to Aaron Train [:aaronmt] from comment #3)
> I suspect this is a regression bug 963073 (enable b2g tiling by default).

Yes, the bug is Bug 963073.
inbound:
good build: 1394227648
bad build: 1394229086

pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c8355055899c&tochange=40b2ac413772
Flags: needinfo?(teodora.vermesan)
Assignee: nobody → chrislord.net
tracking-fennec: ? → 30+
Blocks: b2g-tiling
Flags: needinfo?(bas)
Keywords: reproducible
Summary: Odd Flicker viewing page in readerMode and moving TextSelection handles → Regression: Scrollable layers yields odd content flicking in Reader Move and or moving Text Selection handles
Summary: Regression: Scrollable layers yields odd content flicking in Reader Move and or moving Text Selection handles → Regression: Scrollable layers yields odd content flickering in Reader Mode and or moving text selection handles around in content
This is interesting... I expect it's another case of things going wrong in the progressive update code (can you reproduce if you change layers.progressive-paint to false?)

I'll be on PTO for 2 weeks, so I'm not going to get to this soon. needinfoing myself in case this isn't picked up before then, and kats in case this is related to the progressive update code.
Assignee: chrislord.net → nobody
Flags: needinfo?(chrislord.net)
Flags: needinfo?(bugmail.mozilla)
Added to my list of "possibly progressive update" bugs.
Flags: needinfo?(bugmail.mozilla)
I think what I see is this bug but am not 100% certain: https://www.youtube.com/watch?v=WRpKmnobUNg (720p it)
Yah, that UI "twitching" ... I do see it in other places, I just documented the first one I could repro.
I know little to nothing about this code so for now I'm going to un-need-info myself.
Flags: needinfo?(bas)
Ah, just realized comment #5 might have been a question for me ... in any case after a little more tinkering, I find I can only repro this on my GS/3 4.3, and have been unable to so far on my N7 4.4.2.

On the GS3 I can repro regardless of the layers.progressive-paint setting.
I was also able to repro this on the Nexus 4. I'm looking into it.
Assignee: nobody → bugmail.mozilla
also reproducible just by viewing an animation e.g, a *.gif, http://i.imgur.com/wEGGj3w.gif
Ah, better STR! Does occur on the N7 ...

(Spotted that GIF recently btw! Cooler than Honda's Awesome-o http://www.telegraph.co.uk/technology/technology-video/8877806/Honda-show-off-new-faster-smarter-Asimo-robot.html)
That gif makes it much easier to reproduce, thanks! (Also pretty scary..)

I'm not yet sure of the cause of the glitching but if tiling is enabled it usually seems to affect one or two tiles at a time, and if I disable tiling the entire content area seems affected. So it's not specific to tiling per se. I also looked at the layer tree dumps on both the client and compositor sides and there is no difference between the glitchy composites and the normal ones. I suspect that this is just inadequate locking somewhere and some buffer that is in the middle of getting updated is copied, resulting in partial garbage. I'm going to try to audit the codepaths to see if I can find this by inspection.
Summary: Regression: Scrollable layers yields odd content flickering in Reader Mode and or moving text selection handles around in content → Regression: Content glitches/flashes during periods of heavy repainting
CC'ing nical as well, since nical, Bas and Cwiiis are the ones who worked on the b2g-tiling code that caused this regression.
On a debug build where things are slower this is actually quite funny. Tiles are just being misplaced, making it look like a jigsaw puzzle:

https://www.youtube.com/watch?v=Us5sH6AY_4c

Bas, nical, any thoughts on where the problem might be?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
Flags: needinfo?(nical.bugzilla)
Attached patch WorkaroundSplinter Review
After much debugging I seem to have narrowed it down to the texture pool (unsurprising in hindsight) and shrinking the size of the texture pool to 0 fixes the problem. I had tried some things that I thought were equivalent to this before (such as never returning textures to the pool) but they didn't work, so maybe there are some subtler differences between this and what I was trying. Anyway, stashing this patch here for now.
Flags: needinfo?(bas)
Comment on attachment 8410463 [details] [diff] [review]
Workaround

Review of attachment 8410463 [details] [diff] [review]:
-----------------------------------------------------------------

Would you rather distable the texture client pool some other way, if it comes to that? Tomorrow I'll take another look to see if I can figure out exactly what codepaths are triggering the improper texture reuse.
Attachment #8410463 - Flags: feedback?(bgirard)
Comment on attachment 8410463 [details] [diff] [review]
Workaround

Review of attachment 8410463 [details] [diff] [review]:
-----------------------------------------------------------------

I would be very apprehensive to disable this cache on b2g because it means multiple synchronous round trip to the compositor to allocate every single tile. I'm afraid this will come out as major performance/checkerboarding regressions. If it comes to it we should instead double down on this issue and bring in nical/cwiiis who know the new tiling code the best.
Attachment #8410463 - Flags: feedback?(bgirard) → feedback-
Attached patch Patch (obsolete) — Splinter Review
I narrowed down the problem to recycling the backbuffer while it is still in use. Based on the code at [1] I think this is the correct fix but as we said before this texture pool is kinda funky so I'm not sure if I'm missing something.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/TiledContentClient.cpp?rev=68c8916ca208#510
Attachment #8411058 - Flags: review?(bgirard)
Comment on attachment 8411058 [details] [diff] [review]
Patch

Review of attachment 8411058 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not familiar with this code. -> nical
Attachment #8411058 - Flags: review?(bgirard) → review?(nical.bugzilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22)
> Created attachment 8411058 [details] [diff] [review]
> Patch
> 
> I narrowed down the problem to recycling the backbuffer while it is still in
> use. Based on the code at [1] I think this is the correct fix but as we said
> before this texture pool is kinda funky so I'm not sure if I'm missing
> something.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/
> TiledContentClient.cpp?rev=68c8916ca208#510

In theory, as long as a gralloc buffer is bound to a gl texture, it should remain locked. Except error on our side, if a gralloc buffer is the front buffer of something, it should be bound to a texture.
What should happen here is that the client side blocks on the gralloc lock when over-producing, which is the desired result. obviously something is wrong in what I said but that was the idea.

if we take this patch, recycling will be less efficient, but we might not have a choice in the end. I just want to understand what is breaking my assumptions before giving the r+
(In reply to Nicolas Silva [:nical] from comment #24)
> In theory, as long as a gralloc buffer is bound to a gl texture, it should
> remain locked. Except error on our side, if a gralloc buffer is the front
> buffer of something, it should be bound to a texture.
> What should happen here is that the client side blocks on the gralloc lock
> when over-producing, which is the desired result. obviously something is
> wrong in what I said but that was the idea.

On Fennec we don't use gralloc so the behaviour you're describing is probably not valid there.

> if we take this patch, recycling will be less efficient, but we might not
> have a choice in the end. I just want to understand what is breaking my
> assumptions before giving the r+

Honestly I'll be happy to try out any other suggestions you have - I don't really understand this code that well so I'm just grasping at straws.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #25)
> (In reply to Nicolas Silva [:nical] from comment #24)
> > In theory, as long as a gralloc buffer is bound to a gl texture, it should
> > remain locked. Except error on our side, if a gralloc buffer is the front
> > buffer of something, it should be bound to a texture.
> > What should happen here is that the client side blocks on the gralloc lock
> > when over-producing, which is the desired result. obviously something is
> > wrong in what I said but that was the idea.
> 
> On Fennec we don't use gralloc so the behaviour you're describing is
> probably not valid there.

Oops, my bad. I am still actively thinking about this. I'll either have a counter-proposal or a r+ by the end of tomorrow.
Comment on attachment 8411058 [details] [diff] [review]
Patch

Review of attachment 8411058 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the modification below:

::: gfx/layers/client/TiledContentClient.cpp
@@ +484,5 @@
>  TileClient::DiscardBackBuffer()
>  {
>    if (mBackBuffer) {
>      MOZ_ASSERT(mBackLock);
> +    if (mBackLock->GetReadCount() > 1) {

Please change this condition to 

(!mBackBuffer->ImplementsLocking() && mBackLock->GetReadCount() > 1)

This will keep the current behavior where the texture's blocking lock will prevent over-production (on b2g and windows D3D11), and fallback to discarding tiles more aggressively if the the TextureClient doesn't have a blocking lock (like on fennec).
Attachment #8411058 - Flags: review?(nical.bugzilla) → review+
Makes sense. I verified the updated patch still fixes the problem on Fennec and landed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/a0fb78183db0
Attached patch Patch v2Splinter Review
This is the updated patch. Requesting uplift to 30 as well (currently aurora, soon to be beta)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 963073
User impact if declined: while content is repainting (e.g. animated gifs, dragging around selection handles, and many other scenarios) often the repainted area will be of the wrong spot, because some other tile's content is painted instead of the right thing. This results in ugly visual glitching, pretty severe in some cases.
Testing completed (on m-c, etc.): locally so far
Risk to taking this patch (and alternatives if risky): this code is used on B2G, metro, and fennec, so it has the potential to negatively impact B2G or metro. However the additional condition nical had me add to the patch should restrict it to Fennec.
String or IDL/UUID changes made by this patch: none
Attachment #8411058 - Attachment is obsolete: true
Attachment #8412591 - Flags: review+
Attachment #8412591 - Flags: approval-mozilla-aurora?
> Testing completed (on m-c, etc.): locally so far

For driver, tested the patch locally last evening against the steps outlined in this bug and it's duplicates and it worked for me.
Blocks: 996062
No longer depends on: 996062
Prime example of naming actively confusing even us: this cset landed with this commit msg, " Don't flag texture clients for reuse [...]". A TextureClient is not "a client" but "a texture" (on the client side). When the idea that's floating around, to s/TextureClient/ClientTexture/.
(In reply to Benoit Jacob [:bjacob] from comment #31)
> Prime example of naming actively confusing even us: this cset landed with
> this commit msg, " Don't flag texture clients for reuse [...]". A
> TextureClient is not "a client" but "a texture" (on the client side). When
> the idea that's floating around, to s/TextureClient/ClientTexture/.

+1. While we're at it, let's please fix the inconsistency where in "ClientLayer", "Layer" is at the end but in "LayerComposite", "Layer" is at the beginning.
https://hg.mozilla.org/mozilla-central/rev/a0fb78183db0
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 31
Attachment #8412591 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: RESOLVED → VERIFIED
The patch for this looks sound to me, clearing needinfo.
Flags: needinfo?(chrislord.net)
Using the gif provided in the URL, there are no more glitches/flashes, so:
Verified fixed on:
Build: Firefox for Android 30 Beta 4
Device: Alcatel One Touch (Android 4.1.2)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: