Closed Bug 1063046 Opened 9 years ago Closed 9 years ago

[Flame] Graphics UI shows black artifact on 50% of the time

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla35
blocking-b2g 2.1+
Tracking Status
firefox33 --- wontfix
firefox34 --- fixed
firefox35 --- fixed
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: njpark, Assigned: BenWa)

References

Details

(Keywords: regression)

Attachments

(9 files, 4 obsolete files)

Attached image Dialer_Contact_UI.png
Version Info:
Gaia      a2393917a358a616950d0b02280e03c5c10795b9
Gecko (git commit) 54b65f9f37946f3eeffea00652fc04ff634eecaf
BuildID   20140904101243
Version   34.0a2
ro.build.date   Thu Aug 28 15:16:38 CST 2014
ro.bootloader   L1TC00011660

STR:
With about 50% rate of reproduction, UIs show black artifacts when opened.  When an HTML image email is opened and the artifact appears, after clicking 'show external image', the email cannot be scrolled.
Attached image 2014-09-04-14-31-24.png
Non-Jun, can you attach a logcat log?
Flags: needinfo?(npark)
Sorry, here's the logcat, Is "Error::Loading image failed with ret = -1" msg related to this bug?
Flags: needinfo?(npark)
[Blocking Requested - why for this release]:
Bad UI experience - Black artifact appears across different Apps
blocking-b2g: --- → 2.1?
(In reply to No-Jun Park [:njpark] from comment #3)
> Created attachment 8484384 [details]
> logcat when artifact appeared on Email App
> 
> Sorry, here's the logcat, Is "Error::Loading image failed with ret = -1" msg
> related to this bug?

It seems not related to this bug.

The problem seems to happen without error log. One possibility is that OpenGL Texture mapping is incorrect. Since JB, genlock is removed, by it even when gralloc ownership/mapping is incorrect, gralloc does not print error in logcat.
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> (In reply to No-Jun Park [:njpark] from comment #3)
> > Created attachment 8484384 [details]
> > logcat when artifact appeared on Email App
> > 
> > Sorry, here's the logcat, Is "Error::Loading image failed with ret = -1" msg
> > related to this bug?
> 
> It seems not related to this bug.
> 
> The problem seems to happen without error log. One possibility is that
> OpenGL Texture mapping is incorrect. Since JB, genlock is removed, by it
> even when gralloc ownership/mapping is incorrect, gralloc does not print
> error in logcat.

I'll see whether this happens in on JB on tip of 2.1 as well.
This seems to occur in JB as well with same frequency, so at least it's not KK specific.  Is there any developer option that I can turn on to isolate the cause?
Sorry, I did not read the bug carefully. This is a bug about black rectangle. Then, comment6 is not related to this bug. This bug seems dup of Bug 1050468.
This bug might be a bug of layout area.
It might be better to check regression window. Since firefox os v2.0, Bug 1022612 is very big change around layout. It might be related to this bug.
BenWa, do you have any ideas about this bug?
Flags: needinfo?(bgirard)
This also happens in Calendar app, when the view is set to weekly and swipe to move to next week.  The top part becomes black.
Can someone turn on tile boundary from the Settings app and see if the black squares are algined to tiles?

We used to have a bug like this with tiling. If it matches tile boundary maybe we have a similar bug again.
Flags: needinfo?(bgirard)
Based on the screenshot it seems like it does.

The problem we had was that we support a format per tile. In some case we would reuse a tile surface with a format that didn't match the tile or didn't clear the surface properly for the new format.
The first nightly build this happens is 20140903160206.  The previous nightly build, 20140903000204, does not have this issue.
Attached image 2014-09-04-17-09-07.png
In case of calendar app, it matches the boundary, but not so sure about the contacts app (attached)
Keywords: regression
Summary: [Flame][KK] Graphics UI shows black artifact on 50% of the time → [Flame] Graphics UI shows black artifact on 50% of the time
Adding qawanted to find the reg range. Comment 16 would be useful for bisection.
Possible dupe of 1060006, but definitely related - adding to 'see also'
See Also: → 1060006
QA Contact: ckreinbring
Surely a duplicate of bug 1060006, which is currently pointing to the usage of gaia-header, but apparently there are other scenarios as well.
blocking-b2g: 2.1? → 2.1+
Regression window
Last working
BuildID: 20140902111800
Gaia: 7e469783859785a8bd4bf02a802f32561c84be7b
Gecko: c24ec895b156
Platform Version: 35.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

First broken
BuildID: 20140902113441
Gaia: 7e469783859785a8bd4bf02a802f32561c84be7b
Gecko: 7753c52d5976
Platform Version: 35.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

Working Gaia / Broken Gecko = Repro
Gaia: 7e469783859785a8bd4bf02a802f32561c84be7b
Gecko: 7753c52d5976
Broken Gaia / Working Gecko = No repro
Gaia: 7e469783859785a8bd4bf02a802f32561c84be7b
Gecko: c24ec895b156
Gecko pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c24ec895b156&tochange=7753c52d5976


Mozilla inbound
Last working
BuildID: 20140902021513
Gaia: 44bf2e3bc5ddea9db9a8c851bd353cb234aa883c
Gecko: 39a8d9b2b639
Platform Version: 34.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

First broken
BuildID: 20140902025713
Gaia: 44bf2e3bc5ddea9db9a8c851bd353cb234aa883c
Gecko: ea4cfd84e417
Platform Version: 34.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Working Gaia / Broken Gecko = Repro
Gaia: 44bf2e3bc5ddea9db9a8c851bd353cb234aa883c
Gecko: ea4cfd84e417
Broken Gaia / Working Gecko = No repro
Gaia: 44bf2e3bc5ddea9db9a8c851bd353cb234aa883c
Gecko: 39a8d9b2b639
Gecko pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=39a8d9b2b639&tochange=ea4cfd84e417
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
broken by  Bug 967844? Can you take a look Robert?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell) → needinfo?(roc)
I think bug 1060006 is same issue as well?
Bug 1063171 is likely the same issue too.
See Also: → 1063171
I can reproduce this on my Flame in the import contacts dialog. I'll take a look.
Assignee: nobody → mstange
I've seen this quite a lot; most frequently either writing a long message in the messaging app (seems to trigger once the text entry gets long enough), or scrolling down in an e-mail (temporarily shows black at the top of the message as you scroll down).
Bug 1064181 is also likely the same issue.
This is a bug with progressive painting.

When the layer stops being opaque, its surface mode changes. In that case we need to throw away the old buffers of each tile. This is supposed to happen here:
http://hg.mozilla.org/mozilla-central/annotate/2255d7d187b2/gfx/layers/client/TiledContentClient.cpp#l1096

However, mLastPaintContentType / mLastPaintSurfaceMode (which ClientTiledLayerBuffer::HasFormatChanged() compares the layer's current required format to) are updated at the end of ClientTiledLayerBuffer::PaintThebes, which is called for every tile during the progressive update loop. Consequently, we only throw away the first tile we encounter during the update, because for subsequent tiles mLastPaintContentType and mLastPaintSurfaceMode have already been updated to the new values.
Attached patch track format per tile (obsolete) — Splinter Review
Something like this maybe? This patch at least fixes the bug, but it doesn't look very pretty to me. Is TileClient is a good place to store the information? Maybe we should just discard all tiles of a layer as soon as the layer's format changes?
Attachment #8485873 - Flags: review?(bgirard)
For testing I used these steps to reproduce:
1. Kill the Phone app using the app switcher if it's already running.
2. Launch the Phone app.
3. Switch from the dialer pane to the contacts pane.
4. Press the gear icon in the top right corner.
5. Wait for the settings pane to slide in from the bottom.
(In reply to Markus Stange [:mstange] from comment #30)
> Created attachment 8485873 [details] [diff] [review]
> track format per tile
> 
> Something like this maybe? This patch at least fixes the bug, but it doesn't
> look very pretty to me. Is TileClient is a good place to store the
> information?

IMO I think TextureClient (per front/back buffer) would be better. When we go to paint it we should check what format the paint will be in and perform the discard then if there's a format mismatch.

> Maybe we should just discard all tiles of a layer as soon as
> the layer's format changes?

Perhaps, my goal was to let tiling support having a non homogenous format. The goal of tiling is to amortize changes as we wish. So ideally the discard/allocation would be done over two frames. As well we can even support knowing which tiles need to be transparent and which do not.

In practice maybe this is just too complex for to little win and it's the second time we see a bug like this. So if we want to make the conscious decision to not amortize format changes then I'm happy to take a patch like the one here.
Interesting. Unfortunately I'm not familiar enough with tiling to be able to provide useful input on this question.

Can somebody else take over from here, now that we've established that it's not caused by invalidation?
Assignee: mstange → nobody
Flags: needinfo?(milan)
Unfortunately this testcase doesn't show the bug. I'm attaching it here anyway in case somebody knows what to change, so that we can turn it into a reftest.
Assignee: nobody → bgirard
Flags: needinfo?(milan)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
hey guys, just poking here.  whats the status and do we have a path to a fix?
Going to work on this bug this week.
Is it doable to backout bug 967844 until this bug is resolved, or is it too much work?
(In reply to Tony Chung [:tchung] from comment #38)
> hey guys, just poking here.  whats the status and do we have a path to a fix?

Right, too many duplicate bugs - what Benoit just said, and what I mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1060006#c23 last Wednesday.
No longer blocks: 1060006
It seems we can test to turn this off with this pref:
pref("layout.async-containerless-scrolling.enabled", false);
Milan, Roc: is that right?
Flags: needinfo?(roc)
Flags: needinfo?(milan)
Yes. But it would be better to fix this bug so we don't have to do that.
Flags: needinfo?(roc)
Flags: needinfo?(milan)
Attached patch patch (obsolete) — Splinter Review
Along the same lines but use the format from the TileClient.
Attachment #8485873 - Attachment is obsolete: true
Attachment #8485873 - Flags: review?(bgirard)
Attachment #8490217 - Flags: review?(mstange)
Comment on attachment 8490217 [details] [diff] [review]
patch

Review of attachment 8490217 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ClientTiledThebesLayer.cpp
@@ -336,5 @@
>      MOZ_ASSERT(mContentClient->GetForwarder());
>    }
>  
> -  if (mContentClient->mTiledBuffer.HasFormatChanged()) {
> -    mValidRegion = nsIntRegion();

Don't we still need something like this somewhere?
Attached patch patch v2 (obsolete) — Splinter Review
Actually you're right. Let's leave that in.
Attachment #8490217 - Attachment is obsolete: true
Attachment #8490217 - Flags: review?(mstange)
Attachment #8490267 - Flags: review?(mstange)
Comment on attachment 8490267 [details] [diff] [review]
patch v2

Review of attachment 8490267 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/TiledContentClient.h
@@ +238,4 @@
>  
>    void DiscardBackBuffer();
>  
> +  void UpdateFormatAndMaybeDiscard(gfxContentType aNewContentType,

Maybe call this DiscardIfFormatChanged or something like that instead. With the "update" part I was referring to the fact that the function was storing the new format on the TileClient, and you're no longer doing that.
Attachment #8490267 - Flags: review?(mstange) → review+
remote: You can view the progress of your build at the following URL:
remote:   https://tbpl.mozilla.org/?tree=Try&rev=900ecce6b088
remote: Alternatively, view them on Treeherder (experimental):
remote:   https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=900ecce6b088
Attached file try failure
Attached patch patch v3Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=0bdde994800e
Attachment #8490267 - Attachment is obsolete: true
Attachment #8490912 - Flags: review+
Comment on attachment 8490912 [details] [diff] [review]
patch v3

Review of attachment 8490912 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/TiledContentClient.cpp
@@ +1124,5 @@
>    // the calling code will already have taken care of invalidating the entire
>    // layer.
> +  SurfaceMode mode;
> +  gfxContentType content = GetContentType(&mode);
> +  aTile.DiscardIfFormatChanged(content, mode);

We're in good shape when all we're arguing is what to name methods :)  We already have 'HasFormatChanged' which compares the "new" format from what was last painted, and that seems to be a useful one to keep.  'DiscardIfFormatChanged' sounds suspiciously like it would do something based on the results of 'HasFormatChanged', but that's not really the case here.  It's really more of a 'IsBufferFormatCompatible' + the discard (ignoring the fact that these are on two different classes)

It almost file like we should have the 'IsBufferFormatCompatible', that doesn't do any discarding, and do what we did before - check and call aTile.Discard...x2 if it isn't compatible.

Feel free to fully ignore this; as I said, when we start talking about method names, we're in good shape.
My original patch removed HasFormatChanged because it's really misleading but a mstange points out it would change more of the behavior. I decided to restore it not to change the discard behavior which is unrelated to this bug. Particularly since we're looking for a branch patch.

I'm happy to update the name again.
(In reply to Benoit Girard (:BenWa) from comment #52)
> Created attachment 8490912 [details] [diff] [review]
> patch v3
> 
> https://tbpl.mozilla.org/?tree=Try&rev=0bdde994800e

I ran this patch on my KK flame build, and I can still see the artifact by doing the following:
1. Open calendar app, change to Week view.
2. Instead of swiping to left/right to change weeks (it used to show artifacts before the patch, but now it doesn't anymore), scroll down to see the timeslots.  Black artifacts start to show on top.
I'm looking a bit deeper into this. If we decide to discard a buffer it wont force it to be repainted. So the code as to carefully match 1:1 with the format change.

I think the right thing to do here is to stop supporting having different format for different tiles. We don't benefit from it right now. We would need to modify the display list/frame layer builder code and supporting this optimization wouldn't be highly beneficial.

Most of the work would be in tweaking display list/frame layer builder. If we ever did want to support this could easy restore the support in TileClient.

I'm just going to stop supporting this and simplify the code for now. I just need to make sure it doesn't have any adverse effects with progressive drawing.
Waiting on try to open
Attachment #8491666 - Attachment is obsolete: true
Attachment #8492349 - Flags: review?(bas)
Attachment #8492349 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/77995516ba17
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
See Also: → 1071574
AFAICT this never landed on Aurora. How is this verified on v2.1 already?
Flags: needinfo?(tchung)
Still reproducible on v2.1.
doh! i flipped the wrong bug.  thanks Ryan.
Flags: needinfo?(tchung)
Please request Aurora approval on this when you get a chance :)
Flags: needinfo?(bgirard)
Comment on attachment 8492349 [details] [diff] [review]
patch v5: Simplify format change for tiled layers

Approval Request Comment
[Feature/regressing bug #]: This would of regressed when turning on tiling for scrollable content on b2g.
[User impact if declined]: Risk of showing black square while drawing some scrollable content.
[Describe test coverage new/current, TBPL]: Changes to the drawing code. This has good test coverage on OS X + Fennec + B2G
[Risks and why]: Low. This could lead to incorrect/incomplete rendering if we didn't account for some cases.
[String/UUID change made/needed]: None

Best to just keep aurora and central in sync. This has good test coverage so we're not taking a big risk in uplifting.
Attachment #8492349 - Flags: approval-mozilla-aurora?
Flags: needinfo?(bgirard)
Attachment #8492349 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Issue is verified fixed in Flame 2.2, 2.1 (Full Flash, nightly). 

Actual Results: Graphical errors do not appear in applications. 

Device: Flame Master
Build ID: 20141016040204
Gaia: 841d0d7d1b879f0ff4b5a8727f5dd23c7b0000a9
Gecko: a280a03c9f3c
Version: 36.0a1 (Master)
Firmware Version: v180
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0

Device: Flame 2.1
Build ID: 20141016001201
Gaia: 477a9e61c3edf12f32a62a19d329cd277202cc6b
Gecko: 67573e422a0f
Version: 34.0 (2.1)
Firmware Version: v180
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage+][lead-review+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.