Closed Bug 1098654 Opened 5 years ago Closed 5 years ago

[Messages]Multiple graphic corruptions occur when overscrolling after a large message

Categories

(Core :: Panning and Zooming, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
b2g-v2.0 --- unaffected
b2g-v2.1 --- wontfix
b2g-v2.2 --- fixed

People

(Reporter: cinnes, Assigned: kats)

References

Details

(Keywords: regression, Whiteboard: [2.1-exploratory-3])

Attachments

(8 files, 7 obsolete files)

155.82 KB, image/png
Details
611.27 KB, application/zip
Details
63.97 KB, image/png
Details
106.09 KB, image/png
Details
89.47 KB, image/png
Details
5.32 KB, patch
roc
: review+
Details | Diff | Splinter Review
8.77 KB, patch
tnikkel
: review+
Details | Diff | Splinter Review
33.14 KB, patch
Details | Diff | Splinter Review
Attached image Overscrolling.png
Description:
When the user overscrolls, after sending or receiving a large message, graphical corruptions can be seen throughout the message page. Text also gets partially cut off when user overscrolls.
   
Repro Steps:
1) Update a Flame device to BuildID: 20141113001200
2) Open message app
3) Send/receive message to/from other test device with one word in the message 
4) Send/receive a message back
5) Send/receive a message with 160 characters or more
6) Overscroll up or down the message page
  
Actual:
Graphical corruption occurs around last message and poor overlay happens with first message.
  
Expected: 
No graphical corruptions or poor overlays should occur
  
Environmental Variables:
Device: Flame 2.1 kk (319mb)(Shallow Flash)
BuildID: 20141113001200
Gaia: 569a299ca446f714cd98d5881cc058fd6f6e257b
Gecko: d188e92aa5a6
Version: 34.0 (2.1)
Firmware: V188-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Notes:
Overscrolling varies between 2.0,2.1, and 2.2
-2.0 has the pan and zoom overscrolling 
-2.1 has the stretch and scrunch overscrolling
-2.2 has the elastic overscrolling
  
Repro frequency: 100%
See attached: Screenshot
Flags: needinfo?(dharris)
Issue does NOT occur on Flame 2.2 Master (319mb)(Kitkat Base)(Shallow Flash) and Flame 2.0 (319mb)(Kitkat Base)(Shallow Flash)

Flame 2.2

Device: Flame 2.2 Master (319mb)(Kitkat Base)(Shallow Flash)
BuildID: 20141113040205
Gaia: be8b0151d2f9a4c41fc63952128e0b723cd1161d
Gecko: ab137ddd3746
Version: 36.0a1 (2.2 Master)
Firmware: V188-1
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0

Flame 2.0

Device: Flame 2.0 (319mb)(Kitkat Base)(Shallow Flash)
BuildID: 20141113000201
Gaia: ab83632c92f9fc571b11d8468b6901cc4ed905c0
Gecko: e21bf45e6c44
Version: 32.0 (2.0)
Firmware: V188-1
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Actual results:
Over scrolling does not cause any graphic corruptions within the message app
QA Whiteboard: [QAnalyst-Triage?]
Hey Botond, is it something you should fix?
Flags: needinfo?(botond)
blocking-b2g: --- → 2.2?
Ah I see it happens with the overscroll in v2.1 too.
blocking-b2g: 2.2? → 2.1?
Component: Gaia::SMS → Panning and Zooming
Product: Firefox OS → Core
It looks like graphics bug so I modified the component. 
Feel free to correct if I'm wrong.
Flags: needinfo?(pchang)
Since it doesn't occur on 2.2, can we get a fix window for 2.2? Then we can just uplift the missing change.
Flags: needinfo?(botond)
Keywords: qawanted
Swapping qawanted tag for regression-window tag (slightly more accurate though still confusing, maybe we should ask to create a fixed-window tag)
QA Whiteboard: [QAnalyst-Triage?]
QA Contact: ckreinbring
We are seeing this as affected in 2.2 so we will look for a regular regression window instead.
Probably not worth looking for a regular regression window. It'll probably lead you to the patch that turned on the overscroll behaviour. We can investigate to find out what's going on here.
clear the ni since it may be related to overscroll.
Flags: needinfo?(pchang)
Flags: needinfo?(dharris)
blocking-b2g: 2.1? → 2.1+
Are there STR for this that can be used without having a working SIM in the device?
Yes, here you go. :)

Repro Steps:
1) Open message app
2) Attempt to send two messages with one word each
3) Send one more message with 150+ characters
4) Observe overscroll graphical corruptions
I tried this, but didn't see any graphical corruption. (I do see the issue Kats mentions in https://bugzilla.mozilla.org/show_bug.cgi?id=1096513#c10, but that's unrelated and specific to 2.2.)
will see if I can reproduce
Flags: needinfo?(felash)
ok, I don't reproduce on 2.1.

Corinne, maybe the best could be that you send us your SMS db? Just zip what's "/data/local/storage/persistent/chrome/idb/*ssm*" and send it either by mail or attach to this bug.

Kats, Botond, what I see however is that the element that has "position: sticky" is being transformed by the overscroll. I don't know if that's expected because after all it's not scrolling (unless we're at the edge maybe?).
Flags: needinfo?(felash) → needinfo?(cinnes)
(In reply to Julien Wajsberg [:julienw] from comment #14)
> Kats, Botond, what I see however is that the element that has "position:
> sticky" is being transformed by the overscroll. I don't know if that's
> expected because after all it's not scrolling (unless we're at the edge
> maybe?).

This is being discussed as part of bug 1096513 (see comment 10 on that bug).
Attached file ssm.zip
Flags: needinfo?(cinnes)
NI me to try to reproduce
Flags: needinfo?(felash)
So I definitely reproduce with the attachment, but I still can't reproduce with my own data.

But I think I understand why we reproduce with the attachment but not with other more current data.

In the data we have in the attachment, the thread has just enough data to be scrollable but still the whole thread fits in the screen as you can see in attachment 8530010 [details].

I don't see the issue from the first panel (but I tested in 2.2 only).

This makes this issue quite rare for the SMS app, but maybe the underlying issue can manifest in other ways in other applications or website.
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #19)
> But I think I understand why we reproduce with the attachment but not with
> other more current data.
> 
> In the data we have in the attachment, the thread has just enough data to be
> scrollable but still the whole thread fits in the screen as you can see in
> attachment 8530010 [details].

Interesting. That would suggest you're running into bug 1100680.

However, bug 1100680 only affects 2.2.

This bug is marked as a 2.1 blocker, but there are conflicting accounts as to whether it reproduces on 2.1, 2.2, or both. Could it be that we're conflating two different issues?
I'm also able to reproduce the second and third panels from the screenshot in comment 0 on 2.2. I looked into what's happening and here's what I found:
- The "Mobile, 6509336963" text is set to z-index:6 in the CSS code, and sometimes gets its own layer that sits on top of everything. Sometimes, however, layout can flatten it into the layer underneath and so when we overscroll the scrollable thing (which is slightly transparent), it goes over top of that header. This is unexpected behaviour and results in the visual error. I'll attach some layerscope screenshots that illustrate this.
Attached image layerscope-1.png
Here you can see that the mobile text header (the layer texture is highlighted in red) is the topmost painted layer in the layer tree of the child process (highlighted in yellow).
Attached image layerscope-2.png
And here you can see that the mobile header got flattened into the base layer and the "MONDAY" text with transparent background (highlighted in red/yellow) is the topmost layer in the layer tree. This makes it appear on top of the mobile text. The scrollable layer is supposed to be sandwiched between the base layer and the mobile text always, but I guess layout can't always properly anticipate what the compositor is going to do.
(I'm looking into this)
Assignee: nobody → bugmail.mozilla
Further digging around in the code shows that [1] is supposed to prevent this from happening. However for some reason the PaintedLayerData which *is* subject to async transforms ends up returning false from IsSubjectToAsyncTransforms. Investigating more...

[1] http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp?rev=923c99af6863#2517
Actually it looks like the PaintedLayerData for the scrollable content isn't even in the painted layer data stack at the time that the PLD is picked. so maybe I'm way off here. I'll try posting a reduced test case and bouncing to layout.
Attached patch Hacky WIP (obsolete) — Splinter Review
Here's something that fixes the problem, but only if event-regions are disabled. If they are enabled then there's an event regions display item which triggers the code at http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp?rev=923c99af6863#2536 and that screws things up. I don't understand the purpose of that code so I'll have to talk to somebody about that before I can proceed here.
Attached patch WIP (obsolete) — Splinter Review
Here's an updated version of the WIP that also works with event regions enabled. Instead of needing the patch in bug 1107404 it ensures that when a layer that is "subject to async transforms" is popped off, it marks the layer below it as "AllDrawingAbove" so that future display items don't accidentally fall through an async-transformable layer. Try push with this patch at https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=d8f1c56b29a2 to see how it fares.
Attachment #8531098 - Attachment is obsolete: true
Comment on attachment 8533990 [details] [diff] [review]
WIP

Seems to fix the reftests and the issue I saw on-device. Any objections to this? I can clean it up with more comments and a commit message but let me know if the code needs fixing too.
Attachment #8533990 - Flags: feedback?(matt.woodrow)
Attached patch Patch (obsolete) — Splinter Review
Cleaned it up and ready for proper review
Attachment #8533990 - Attachment is obsolete: true
Attachment #8533990 - Flags: feedback?(matt.woodrow)
Attachment #8534327 - Flags: review?(tnikkel)
Attachment #8534327 - Flags: review?(matt.woodrow)
Hm, looks like the patch reliably causes layout/reftests/image-element/element-paint-native-widget.html to have a "max difference: 1, number of differing pixels: 1" failure on B2G. It's related to the SetAllDrawingAbove() call somehow; when I remove that the test passes. Initial investigation shows no difference in layerization so I'm not sure how this is happening; maybe just something I need to fuzz. Will keep investigating but leaving review request on for now.
Ah it does cause a difference in layerization, but on the -ref page. The difference comes from a layer which has an actively scrolled animated geometry root, but contains only a single event regions. The draw region for this layer is empty but my patch still ends up marking the layer below it as SetAllDrawingAbove which prevents things from falling into it. Arguably my patch is wrong in this case because it results in suboptimal layerization. Updated patch coming.
Attached patch Patch v2 (obsolete) — Splinter Review
Attachment #8534327 - Attachment is obsolete: true
Attachment #8534327 - Flags: review?(tnikkel)
Attachment #8534327 - Flags: review?(matt.woodrow)
Attachment #8534364 - Flags: review?(tnikkel)
Attachment #8534364 - Flags: review?(matt.woodrow)
Comment on attachment 8534364 [details] [diff] [review]
Patch v2

The scroll frame being active doesn't tell us that it will be subject to async transformations. Isn't a displayport what we want to check here?
Agree with what tn said here.

Do we actually need the change to IsSubjectToAsyncTransforms here? If we do, then maybe we should change it to properly catch all possible cases where the compositor can make changes to layer tree (async animations, async scrolling etc) rather than just checking for position:fixed.

CopyAboveRegion already has a fall-through to SetAllDrawingAbove, probably better to add a condition to that rather than a new code path.
(In reply to Timothy Nikkel (:tn) from comment #34)
> The scroll frame being active doesn't tell us that it will be subject to
> async transformations. Isn't a displayport what we want to check here?

Yeah, that makes sense. I'll make that change.

(In reply to Matt Woodrow (:mattwoodrow) from comment #35)
> Do we actually need the change to IsSubjectToAsyncTransforms here? If we do,
> then maybe we should change it to properly catch all possible cases where
> the compositor can make changes to layer tree (async animations, async
> scrolling etc) rather than just checking for position:fixed.

Is there an easy way to check for all of these cases? I'll double-check if we need this (my patch changed significantly since the first version, and it might be that I have stuff there that doesn't need to be) but in case we do what would be a good way to do that?

> CopyAboveRegion already has a fall-through to SetAllDrawingAbove, probably
> better to add a condition to that rather than a new code path.

Sure, I can do that.
Attached patch Patch (v3) (obsolete) — Splinter Review
This one takes into account the previous comments, except for the bit about including OMTA layers in the IsSubjectToAsyncTransforms. I'm not sure what the correct way to do that is - maybe we can defer that to a follow-up? Or if it's trivial to do I can update this patch.
Attachment #8534364 - Attachment is obsolete: true
Attachment #8534364 - Flags: review?(tnikkel)
Attachment #8534364 - Flags: review?(matt.woodrow)
Attachment #8535046 - Flags: review?(tnikkel)
Attachment #8535046 - Flags: review?(matt.woodrow)
Comment on attachment 8535046 [details] [diff] [review]
Patch (v3)

This will capture when the direct animated geometry root is async scrollable, but any ancestor animated geometry root could be async scrollable as well, meaning we would get frame metrics from the loop in ContainerState::SetupScrollingMetadata and then be subject to async transforms from azpc.
Attached patch Patch (v4) (obsolete) — Splinter Review
Good catch! How about this?
Attachment #8535046 - Attachment is obsolete: true
Attachment #8535046 - Flags: review?(tnikkel)
Attachment #8535046 - Flags: review?(matt.woodrow)
Attachment #8535183 - Flags: review?(tnikkel)
Attachment #8535183 - Flags: review?(matt.woodrow)
Comment on attachment 8535183 [details] [diff] [review]
Patch (v4)

My only concern is that walking the animated geometry root chain for each item could be expensive and we might want to limit doing that. But I'm not sure if that is really a problem.
roc (or matt), do you have any performance concerns with the above patch? tn and I discussed it a bit and I could probably add some sort of cache on the animated geometry root to track that if it'll help perf.
Flags: needinfo?(roc)
We need a cache of animated geometry roots. dvander's looked into this, talk to him :-)
Flags: needinfo?(roc)
Attached patch Cache animated geometry roots (obsolete) — Splinter Review
After talking with dvander I figured the most useful approach here would be to put a cache on the displaylist builder. I ran with this patch and am seeing cache hit rates in the neighbourhood of 46-58% for just the new cache. If I include the mCurrentFrame/mCurrentAnimatedGeometryRoot cache that number goes up to 63-73%.
Attachment #8537223 - Flags: review?(roc)
Attachment #8537223 - Flags: feedback?(dvander)
Perhaps a better way to put it is that with just the first patch (the fix for this bug) the cache miss rate is around 60-70%. With the second-level cache I added that drops down to 25-35%.
Comment on attachment 8537223 [details] [diff] [review]
Cache animated geometry roots

Seems reasonable, though I don't know the cost of misses that well.
Attachment #8537223 - Flags: feedback?(dvander) → feedback+
Comment on attachment 8537223 [details] [diff] [review]
Cache animated geometry roots

Review of attachment 8537223 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, we might as well go one step further and have ComputeAnimatedGeometryRootFor look up the cache in each iteration, so as soon as we find an ancestor whose animated geometry root we know, we're done.

Also, the animated geometry root doesn't really depend on aStopAtAncestor. That should be just an optimization, although nsLayoutUtils::GetAnimatedGeometryRootFor looks a bit fragile in that respect. So ignore this for now, but please fix the previous paragraph since that's trivial.
Attachment #8537223 - Flags: review+ → review-
I did a try push and I'm seeing a reftest failure with the latest version of the patches. R7 on B2G: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=678f7c2547a2

I'm investigating.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #48)
> I'm investigating.

As far as I can tell this failure can be fuzzed. A few things get shifted from one layer to another which is expected behaviour from this patch. The reftest is already fuzzed on all platforms by maxDiff=50,diffCount=31 and on B2G where I ran it locally it already has a maxDiff=5,diffCount=4 difference without my patch. With my patch it moves up to maxDiff=5,diffCount=145. If I enable event regions then the corresponding change is from maxDiff=5,diffCount=4 (without my patch) to maxDiff=5,diffCount=18 (with my patch) and so the test passes just fine.

Looking at the reftest analyzer visually all the things that the test is trying to test are correct, it's mostly just text antialiasing differences that's causing the high diffCount.
Attached patch Patch (v5)Splinter Review
Updated to increase fuzz on test.
Attachment #8535183 - Attachment is obsolete: true
Attachment #8535183 - Flags: review?(tnikkel)
Attachment #8540180 - Flags: review?(tnikkel)
Attachment #8540180 - Flags: review?(tnikkel) → review+
https://hg.mozilla.org/mozilla-central/rev/c1dc4e9356f8
https://hg.mozilla.org/mozilla-central/rev/f971784902fe
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): unsure, but it was likely exposed by enabling overscroll
User impact if declined: in some (rare) cases when the user overscrolls things are displayed in the wrong z-order. Note that even in the messages app you need a set of messages that is exactly the right length to have this bug reproduce.
Testing completed: mostly on m-c, but I built and verified that this bug is fixed on 2.1 with this rollup patch. I didn't check for any other regressions from the rollup on 2.1
Risk to taking this patch (and alternatives if risky): It's a fairly large patch in code that's known to be a bit finicky. That combined with the fact that I had to pull in changesets from other bugs to make the uplift work means it's even risker. Honestly I would rather just live with this bug on 2.1 because it's a visual glitch that happens rarely and doesn't affect users interacting with anything. I'm only flagging this for uplift because the bug is a 2.1 blocker.
String or UUID changes made by this patch: none
Attachment #8543303 - Flags: approval-mozilla-b2g34?
Discussed this offline with tony and we tend to agree with kats input here on living with this visual glitch in 2.1 considering the high risk and this maybe a "rare" case . Nevertheless, QA decided to test more apps around this to see if this behavior is observed else where which may warrant us to consider uplift. NI'ing tony so he can help with that request.

I am clearing the nom and approval request for now, we can re-visit depending on further QA input and testing.
blocking-b2g: 2.1+ → ---
Flags: needinfo?(tchung)
Attachment #8543303 - Flags: approval-mozilla-b2g34?
(In reply to bhavana bajaj [:bajaj] from comment #55)
> Discussed this offline with tony and we tend to agree with kats input here
> on living with this visual glitch in 2.1 considering the high risk and this
> maybe a "rare" case . Nevertheless, QA decided to test more apps around this
> to see if this behavior is observed else where which may warrant us to
> consider uplift. NI'ing tony so he can help with that request.
> 
> I am clearing the nom and approval request for now, we can re-visit
> depending on further QA input and testing.

william, as this is a 2.1 request, can you find a tester to run through the testing to see if there are other visual graphical glitch concerns with other apps right now?    Im particularly interested to see if other apps that overscroll can have these graphics issues.   we need to know how bad this problem is or if its isolated to the edge case that the reporter found.   this will determine how important this patch needs to be uplifted or not.   Thanks.
Flags: needinfo?(tchung) → needinfo?(whsu)
(In reply to Tony Chung [:tchung] from comment #56)
> (In reply to bhavana bajaj [:bajaj] from comment #55)

> william, as this is a 2.1 request, can you find a tester to run through the
> testing to see if there are other visual graphical glitch concerns with
> other apps right now?    Im particularly interested to see if other apps
> that overscroll can have these graphics issues.   we need to know how bad
> this problem is or if its isolated to the edge case that the reporter found.
> this will determine how important this patch needs to be uplifted or not.  
> Thanks.

No problem! I will do a RAT test and find some resources to reconfirm it, and then evaluate the impacted scopes.
Hi, everyone,

Sorry for the late reply.

I can reproduce this issue on Flame v2.1 but not easy (Rate: 3/10).
(Follow comment 0, comment 11, and comment 19)

I also tested the following built-in app and third party app.
But, there is no app that it can reproduce this bug (graphical glitch).
- Built-in: "Dialer", "Contacts", "E-mail", "Browser", "Settings", "Music", and "Video" app.
- Third party app: "Facebook", "Line app"

The only concern is third party social app since we don't know if designer uses the same layout to design social app.

--- -- - --- -- - --- -- - --- -- - --- -- - --- -- - --- -- - --- -- -
Hi, Mike,

This is the case I talked to you on Wednesday.
Could you please reconfirm this bug, and evaluate the patches to see if we can uplift it?
If need be, we can discuss this bug with NoJun.
Many thank.
Flags: needinfo?(whsu) → needinfo?(mlien)
Refer to comment 58, this issue might be edge case and hard to reproduce.
Considering high risky patch for v2.1, I think we can live with it.
If partner think it's a blocker for them, they can cherry pick this patch to resolve it.

Tony, Bhavana, how do you think?
Flags: needinfo?(tchung)
Flags: needinfo?(mlien)
Flags: needinfo?(bbajaj)
(In reply to Mike Lien[:mlien] from comment #59)
> Refer to comment 58, this issue might be edge case and hard to reproduce.
> Considering high risky patch for v2.1, I think we can live with it.
> If partner think it's a blocker for them, they can cherry pick this patch to
> resolve it.
> 
> Tony, Bhavana, how do you think?

Mike, William, thanks for the reply.  Given your investigation and also the risk in comment 54, i agree to minus this for 2.1.    you can renom if partner requests later.
Flags: needinfo?(tchung)
NI?whsu
Flags: needinfo?(whsu)
Flags: needinfo?(whsu)
Flags: needinfo?(bbajaj)
You need to log in before you can comment on or make changes to this bug.