Closed Bug 986699 Opened 6 years ago Closed 6 years ago

EMAIL app scrolling shows white screen occasionally

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+)

VERIFIED DUPLICATE of bug 796474
2.0 S2 (23may)
blocking-b2g 2.0+

People

(Reporter: tkundu, Assigned: doliver)

References

Details

(Keywords: perf, Whiteboard: [c=handeye p= s=2014.05.23.t u=2.0])

Attachments

(4 files)

Reference Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=942750#c0

STR:
1) Flash v1.4 FFOS build on device
2) make sure that APZ is turned on for All gaia app in your device
3) Try to scroll EMAIL app as fast as possible on 800x480 display device.
4) You will see white screens occasionally.

Max size 10MB is tool small for video attachment. I can still compress and upload a video if needed.

Gaia:https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v1.4&id=ee89ad8ce3dbaa27b372affb7121a429ffe18f7a
Gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v1.4&id=176fc2ed072055ac33a174d4e92169705874a4b9
blocking-b2g: --- → 1.4?
QRD 8x26/8x28 device? I want to make sure graphics confirms that this particular issue is what we've seen ourselves on other devices.
Related, I am looking at changes to the way email handles the display of messages in a long list, tracked in bug 796474, but even with the approach in that bug of moving around divs that are absolutely positioned, I still see some white screens occasionally. So I do not believe the changes in bug 796474 will be a way to completely address the issue.
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(bgirard)
(In reply to James Burke [:jrburke] from comment #2)
> Related, I am looking at changes to the way email handles the display of
> messages in a long list, tracked in bug 796474, but even with the approach
> in that bug of moving around divs that are absolutely positioned, I still
> see some white screens occasionally. So I do not believe the changes in bug
> 796474 will be a way to completely address the issue.

The GfxTestApp has a demo of how to implement various virtual list. These scroll without checkerboarding:
http://people.mozilla.org/~vladimir/misc/gfxapp/

so it's entirely possible if the app is careful.
Flags: needinfo?(bgirard)
Keywords: perf
triage: James, can you please get a profile with the latest build and with bug 796474? It will help us figure out what's going on. Thanks! Profile guide - https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler

As for the virtual list implementation, bkelly tried this with contacts and it required a very risky change. Bkelly, can you please chime in? Thanks
Flags: needinfo?(jrburke)
Flags: needinfo?(bkelly)
See Also: → 965019
Whiteboard: [c=effect p= s= u=]
Whiteboard: [c=effect p= s= u=] → [c=handeye p= s= u=1.4]
The virtual-list risks for contacts are due to the current implementation in contacts that tightly couples many features to the DOM structure.  That should not directly impact email.
Flags: needinfo?(bkelly)
Marking this as depending on the virtual scroll changes in bug 796474, as it will likely be a more efficient use of resources to use the baseline established by 796474 to address any other checkerboarding.

I have looked at trying to do a perf capture in bug 796474, but caught between two things:

* I have some fixes and want to put out a second code snapshot, and use that as the perf baseline.
* I can no longer build b2g on my box due to bug 935776, and have not taken the hit yet to figure out how to work around it, so cannot attach symbols for the perf captures.
Depends on: 796474
Flags: needinfo?(jrburke)
Dylan,

Per Jame's comment 6, this bug depends on James' E-mail scrolling fix so please assign to him when appropriate.

Thanks,
Mike
Assignee: nobody → doliver
Flags: needinfo?(doliver)
James,

What next steps here?
Flags: needinfo?(jrburke)
I am hoping to get to a resolution for bug 796474, which this bug is marked as dependent for this bug. Then I expect it will make more sense to do any debugging/perf of this issue with that new code.
Flags: needinfo?(jrburke)
Since this is not a regression issue. Maybe we can think about fixing this bug in 2.0, if this bug can't be fixed before the end of April.
(In reply to Kevin Hu [:khu] from comment #10)
> Since this is not a regression issue. Maybe we can think about fixing this
> bug in 2.0, if this bug can't be fixed before the end of April.

This is a QC blocker. We can't remove this from 1.4 scope.
In bug 796474 comment 59, :khu asked if bug 796474 was suitable for 1.4 blocking status given the scope of change in it. For the devs on the email team, it is an important step to generally makes scrolling better. However, it is a bigger change, and it is still possible to cause "You will see white screens occasionally", which the devs on gaia read as "checkerboarding", when a white area may appear on part of the screen while scrolling,  which do go away, are not persistent.

This leads into a discussion of what exactly needs to be fixed for this bug, so asking Tapas Kumar Kundu for more information on what white screen case needs to be fixed. So, having the video mentioned in comment 1 would be helpful, and also to know how reproducible it is.

I believe some cases are just because of APZC quirks, or because of an interaction of email workload and the moment of scrolling, and the amount of effort to figure out what cases are app and what are APZC will be difficult to do as most of the time, exact causes for me have not 100% reproducible.

Tapas Kumar Kundu: it is also worth trying out the latest app zip for bug 796474, to see if in general it fixes the case that triggered this bug:

http://jrburke.com/work/gaia/email-redacto.zip

Video:

https://vimeo.com/92457670

You can install it using the App Manager in firefox:

https://github.com/jrburke/gaia-dev-zip#for-the-ux-person
Flags: needinfo?(tkundu)
Dependency has been renomed for being too risky, so this is being renomed for the same reasons.
blocking-b2g: 1.4+ → 1.4?
Attached video VID_0004.3gp
(In reply to James Burke [:jrburke] from comment #12)

> Tapas Kumar Kundu: it is also worth trying out the latest app zip for bug
> 796474, to see if in general it fixes the case that triggered this bug:
> 
> http://jrburke.com/work/gaia/email-redacto.zip
> 

I tried with above email-redacto app but I still see same issue on v1.4 tip with QRD msm8226 device

I attached video for that.
Flags: needinfo?(tkundu)
Please review the video that Tapas attached. I don't think we can defer fixing this issue till 2.0.
Thanks for the video, that helps!

I tried the following on a hamachi device, so a different device, with a different screen size (shorter aspect ratio), but I could trigger that behavior every so often:

* Set up email account.
* Fetched initial inbox.
* Not enough messages on first fetch to do the two full scroll swipes shown in the video, so I tapped the "Load more messages from server" button 4 times.
* I scrolled to the bottom of that list, then all the way back to the top.
* I then did the two long scroll swipes as done in the video.

At first I could not reproduce, but after a while of doing the two long swipes as timed in the video, I could get it to happen every so often. I was logging the scrollTop number we collect in an event listener to the div's 'scroll' event. When I saw this behavior show up, the log of the scrollTop did not show up.

Near the end, when the scrolling just about stops, I would then see the scrollTop logged, and I would see the messages show up. One time I did not see it logged, and the screen stayed white.

So it seems like perhaps the 'scroll' event was not always fired in that case. I am not sure what would cause that. There was not an error in the log, and at first thought, I could not think of some pathological email app code that would be running to block the scroll listener from firing in the main thread.

For the time when no scroll event seemed to fired and the screen stayed white, I just lightly touched the screen to scroll, a scrollTop log would show up and then the messages appeared.

So probably a perf capture could help debug, but I have inadequate local capabilities to set up a profile-enabled gecko build with symbols. Perhaps if someone has a clue to what might cause scroll events to get dropped, that also might give a clue.

Getting late for me now, but I wanted log what I found so far.
It looks like the problem in the video is that we are performing a snippet fetch for the iTunes email.  This will take a mutex and block slice grow requests until the snippet fetching job completes and releases its mutex.  The bug for this is bug 888807.

This is somewhat guesswork without a logcat, of course.  Tapas, do you have a logcat from the run available that you could provide us with?

(For general reference for everyone, we always want a logcat on email bugs, and it can also be important that other account-related details that are also requested on https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo be provided.  For example, the account type is very important since we support 3 different account types and each of them would interact in this scenario somewhat differently from a latency perspective.)

My basis for suspecting this from the video is that:
- There's no snippet for the iTunes entry initially.
- The refresh icon is not spinning, suggesting this is not a question of a refresh sync blocking the UI.
- The system bar across the top plays its "data was sent and/or received" animation right after our UI paints again.  And it's my understanding that it's just an animation loop triggered by hinted data traffic.


In regards to James' excellent investigation and repro, us potentially missing scroll events does seem concerning.  This fundamentally seems like it would constitute a platform bug, but we could band-aid that with a timer that maybe logs when it seems like the scroll event has betrayed us and by how much it has betrayed us (ex: 40 pixels versus 400 pixels).  It would also be worth making sure that it's not a logic glitch like say where we will call grow() at most once for each scroll event, so if we receive too few events we end up not growing all the way to where the screen ended up.  I'll take a look at this now as I finish by bug 796474 review.
Flags: needinfo?(tkundu)
Pending Release Management decision. Leaving as 1.4? (nom) - FxOS Perf Triage
As to the hypothesis that snippet fetching may be involved, I am less likely to suspect that cause as that snippet fetching is done in the worker, where the scrolling in the main thread should not be blocked by that, unless it somehow took much more resources from the system. But I also think these data points help rule it out:

1) I got it to happen after a while, and luckily, it happened while I was scrolling up, and ended up at the very top of the scroll area. The whole scroll area was white, but the Search bar should have been visible. However it was not there, just white.

This implies to me a drawing/scrolling issue in the platform, as the Search bar is outside the div that is used for the messages, but still inside the scrolling area. No app code is run to show that search bar, it is just sitting at the top of the scrollable region.

2) I turned on logging of all vscroll calls during a different white screen case, it looked like:


I/GeckoDump( 1335): LOG: vscroll: onEvent
I/GeckoDump( 1335): LOG: ON EVENT: 3327: 1398278932888
I/GeckoDump( 1335): LOG: vscroll: onChange
I/GeckoDump( 1335): LOG: SCROLL TOP: 3327
I/GeckoDump( 1335): LOG: vscroll: _startScrollStopPolling
I/GeckoDump( 1335): LOG: vscroll: _scrollTimeoutPoll
I/GeckoDump( 1335): LOG: vscroll: _scrollTimeoutPoll
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (0)
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (1)
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (2)
I/GeckoDump( 1335): LOG: vscroll: emit: (scrollStopped)
I/GeckoDump( 1335): LOG: vscroll: getVisibleIndexRange

[white scroll area now visible, no logging happening now, even after scrolling stopped]


[touch to scroll now, and events continue:]

I/GeckoDump( 1335): LOG: vscroll: onEvent
I/GeckoDump( 1335): LOG: ON EVENT: 5031: 1398278940961
I/GeckoDump( 1335): LOG: vscroll: onChange
I/GeckoDump( 1335): LOG: SCROLL TOP: 5031
I/GeckoDump( 1335): LOG: vscroll: _render: (60)
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (1)
I/GeckoDump( 1335): LOG: vscroll: _startScrollStopPolling
I/GeckoDump( 1335): LOG: vscroll: _scrollTimeoutPoll
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (0)
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (1)
I/GeckoDump( 1335): LOG: vscroll: _isCacheVisible: (2)
I/GeckoDump( 1335): LOG: vscroll: emit: (scrollStopped)
I/GeckoDump( 1335): LOG: vscroll: getVisibleIndexRange
I/Gecko   ( 1335): WLOG: queueOp 2 downloadBodies
I/Gecko   ( 1335): WLOG: runOp(do: {"type":"downloadBodies","longtermId":"2/P","lifecycle":"do","localStatus":"done","serverStatus":"doing","tryCount":0,"humanOp":"downloadBodies","messages":[{"s)
I/Gecko   ( 1335): WLOG: runOp_end(do: {"type":"downloadBodies","longtermId":"2/P","lifecycle":"do","localStatus":"done","serverStatus":"doing","tryCount":0,"humanOp":"downloadBodies","messages":[{"s)
I/Gecko   ( 1335): 

Those last two calls for "downloadBodies" is snippet fetching. Where the white screen occurred, that visible range already had snippets so no snippet fetching was in progress, at least for that region. 

There were many scroll events above the snippet I posted, but a ways up was a snippet request like the above. I suppose it is still conceivable a previous snippet fetch was in play in the worker.

3) So I made sure to scroll through all the items in the list to make sure all had snippets fetched for them. It took a while longer, but I could still trigger a white screen eventually that stayed white, with the same symptom of not getting a scroll event. 

In that case, the last scroll event reported a scrollTop of 1663, then white screen. Then I gently touched it enough to trigger a very small scroll and the scrollTop was logged at 3245, a fairly big jump, bigger than the small scroll I triggered.

In summary: I think we may need some more help from the perf or platform teams for this, but also I would like to see us land bug 796474 first as I think it gives a better baseline to use for investigation. 

If someone wanted to be proactive though, they could use the app zip in comment 12 as a start.
(In reply to Andrew Sutherland (:asuth) from comment #17)
> This is somewhat guesswork without a logcat, of course.  Tapas, do you have
> a logcat from the run available that you could provide us with?
> 

This is collected on v1.4 QRD MSM8x26 device. Please let me know if you have any difficulty in reproducing this issue on your end.
Flags: needinfo?(tkundu)
Preeti -- checked with perf team and we are okay fixing bug 796474 to land the new app in v1.4 and the remaining work tracked for 2.0 but for the very least bug 796474 needs to be fixed for v1.4.
Flags: needinfo?(praghunath)
blocking-b2g: 1.4? → 2.0?
Flags: needinfo?(doliver)
Whiteboard: [c=handeye p= s= u=1.4] → [c=handeye p= s= u=1.4][priority]
Assignee: doliver → nobody
No longer blocks: 984663
QA please restest once bug 796474 has landed.
Keywords: qawanted
Priority: P1 → P2
Whiteboard: [c=handeye p= s= u=1.4][priority] → [c=handeye p= s= u=][priority]
We usually only flag qawanted after the patch lands, not before.
Keywords: qawanted
waiting for https://bugzilla.mozilla.org/show_bug.cgi?id=796474 to land and retest before triaging
Duplicate of this bug: 965019
Attached video part1.mov
Tested part 1 with the jn.rlly@gmail.com test account now that 796474 has landed. We see one large checkerboard while loading new messages and when we hit the bottom.
Attached video part2.mov
Part 2 with bug 796474 landed. Just to confirm what James said (https://bugzilla.mozilla.org/show_bug.cgi?id=796474#c85), I tested scrolling right after touching get new messages. We get an initial checkerboard while scrolling up, another one while in edit mode, and one small one while scrolling from recovering from edit mode. It's not a lot and it's a fairly large list. All testing was done on a flame device.
And a profile up scrolling on a flame + one "load new messages" button. This is with a fairly long list already cached - https://people.mozilla.org/~bgirard/cleopatra/#report=8b84474baf71c89024df9ea9a835591e51af1d55
Some notes from the profile:

1) Getting some fairly long rasterizes - 40ms+.
2) Getting one 40ms gralloc, which might have caused the one checkerboard I saw - at the 13487 ms mark.
3) 350 ms total reflow time, this should probably be the first investigation.
4) Fairly consistent 20ms buildDisplayList, not sure how we optimize this.

To summarize, it seems like when we have a long rasterize time without a long layout, it's because of a gralloc. Second, when we have large layout times, it's because of a reflow. The reflow issues occur more often in this profile. 

Overall, there is some room to optimize, but I'm not sure we checkerboard enough that we should block.
I'm re-adding qawanted since bug 796474 has landed, and given mchang's last words in comment 29.
Keywords: qawanted
With today's 1.4, I CAN get white screen to show occasionally. I'm not sure what behavior is acceptable so I made a video.

http://youtu.be/1YiH8wHLJds

The video shows 2 blank screen instances, one at 0:12, another at 1:05.

Tested on:
Device: Buri 1.4 MOZ
BuildID: 20140430000201
Gaia: 81e97c3ca58be0487292011bc59efa4cebab30be
Gecko: 123485e733d5
Version: 30.0
Firmware Version: v1.2-device.cfg
Keywords: qawanted
Tentatively blocking this bug for 2.0, but we are doing further investigations and will likely break this out into a couple of more directed bugs and either close this one or convert it to a meta bug.

Inder, it would be good to get updated input from your team now that bug 796474 has landed.
Assignee: nobody → doliver
blocking-b2g: 2.0? → 2.0+
Flags: needinfo?(praghunath) → needinfo?(ikumar)
Whiteboard: [c=handeye p= s= u=][priority] → [c=handeye p= s= u=]
Sure, Tapas, can you please give it another try.
Flags: needinfo?(ikumar) → needinfo?(tkundu)
(In reply to Inder from comment #33)
> Sure, Tapas, can you please give it another try.

For #comment 32, I am seeing a big improvement as white-screen is coming less frequently during scrolling emails.
Flags: needinfo?(tkundu)
Priority: P2 → P1
Whiteboard: [c=handeye p= s= u=] → [c=handeye p= s= u=2.0]
Status: NEW → ASSIGNED
Do we have requirements on how much white screen is acceptable. From comment 34 and my own tests, we can still checkerboard, but much less frequently. Any thoughts Dylan? Thanks!
Flags: needinfo?(doliver)
I'm all for numbers, and if we can get them, great, I wonder if we may have to settle for "when there isn't too much of it" in this case, as the answer as to how much checkerboarding we can accept...
Our own testing and comment #32 indicate that we've made significant improvements here. Since we don't have any further actionable data in this bug, we'd like to close this one as a duplicate of bug 796474. 

If there are further improvements required here, please create new bugs where we can get more specific about the remaining requirements.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(doliver)
Resolution: --- → DUPLICATE
Duplicate of bug: 796474
Whiteboard: [c=handeye p= s= u=2.0] → [c=handeye p= s=2014.05.22.t u=2.0]
No longer depends on: 796474
Whiteboard: [c=handeye p= s=2014.05.22.t u=2.0] → [c=handeye p= s=2014.05.23.t u=2.0]
Target Milestone: --- → 2.0 S2 (23may)
Duplicate with Bug 796474, so I mark it to "VERIFIED"
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.