Closed Bug 1001629 Opened 10 years ago Closed 10 years ago

[Email] Closed notifications that launch the email app (because it was not running) will result in the app displaying a black screen until a notification is clicked or email activity is triggered

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.0+, b2g-v1.3 wontfix, b2g-v1.3T wontfix, b2g-v1.4 affected, b2g-v2.0 verified, b2g-v2.1 verified)

RESOLVED FIXED
2.1 S3 (29aug)
blocking-b2g 2.0+
Tracking Status
b2g-v1.3 --- wontfix
b2g-v1.3T --- wontfix
b2g-v1.4 --- affected
b2g-v2.0 --- verified
b2g-v2.1 --- verified

People

(Reporter: jcastro, Assigned: jrburke)

References

()

Details

(Whiteboard: [tarako-exploratory])

Attachments

(7 files)

Description:
User repeatedly accessed the email app. On the 9th attempt a black screen appeared and the app never loaded fully. 

Repro Steps:
1) Update a Tarako to BuildID: 20140424014003
2) Unlock the screen select the email app
3) Return home
4) Repeat steps 2 and 3 until a black screen appears

Actual:
A black screen appears and the app never fully loads in.

Expected:
The app always loads and user is shown email

1.3t Environmental Variables:
Device: Tarako 1.3t MOZ
BuildID: 20140424014003
Gaia: 706f474230432c8cb01e124aee2c055ec902aa1d
Gecko: 26ef3dded9ff
Version: 28.1
Firmware Version: Sp6821a-4-18


Repro frequency:  1/9 attempts
See attached: 
video clip
http://youtu.be/TTobzmbTDog
Logcat
Firewatch
Does not reproduce on a Buri with 1.3

1.3 Environmental Variables:
Device: buri 1.3 MOZ
BuildID: 20140423024002
Gaia: fb2b8e25e8095a5db49b5eb68c74b49f180605d4
Gecko: da2500909f91
Version: 28.0
Firmware Version: V1.2device.cfg
Somewhat ironically, the video on youtube is 34-seconds of complete blackness (and no sound).  There is no obvious smoking gun in the logcat, but given the series of niceness changes followed by the pipe error, I would infer that the email app is just getting low-memory-killed and a bug in the window manager is resulting in just a black screen being displayed.

Because of bug 1001206 we aren't going to reliably be able to infer memory-deaths from the adb logcat at this time, so you will want to also provide the contents of invoking "adb shell dmesg" shortly after triggering the bug if reproducing.
Whiteboard: [tarako-exploratory]
Can you reproduce this with just random launches of the email app that are not in succession with each other?
Keywords: qawanted
I think it might have LMK'ed the one trying to launch?

I've seen the black screen when trying to SMS a picture from gallery with low memory pressure.  Basically both apps get LMK'ed.

This isn't email specific.
Component: Gaia::E-Mail → Gaia::System
(In reply to Jason Smith [:jsmith] from comment #4)
> Can you reproduce this with just random launches of the email app that are
> not in succession with each other?

No. Unable to reproduce this with random launches of the email app that are not in succession with each other. Attempted 50 random launches of the email app - after tapping other apps first - and was never able to reproduce this behavior on the latest 1.3T build. 

The only times behavior reproducible were following the STR from Comment 0.

Environmental Variables:
Device: Tarako 1.3T MOZ
BuildID: 20140428014001
Gaia: 8895b180ed636069473703d0e7b73086989601ce
Gecko: 7caf4b5abfce
Version: 28.1
Firmware Version: sp6821a_gonk4.0_user.pac
Keywords: qawanted
Happened to me twice on Flame / 1.3T during one week. With low probability, yes, but it occurs anyway, perhaps usually after long stand-by...
Happened again to me on latest 2.0 nightly version, so this is surely not fixed yet...
Happened again, twice. It is not related to the fact if email application was started before and only on the background or not.
Once it was the case, but second time, after fresh reboot, the very first opening and email application ended the same - black screen and nothing else. Terminating and running the app again solves tho problem, but it is very annoying...
Is it possible to get a logcat from when this happens?  Details on how to provide one can be found at https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo

Also, can you indicate if you have periodic sync enabled and at what interval?
Flags: needinfo?(marek.raida)
Next time it happens I'll try to dump logcat, I promise.
I've 30 or 15 minutes pull set to both my emails, one to outlook.office365.com and another to GMail.
As far as I can tell it is not related to new incoming messages...
Flags: needinfo?(marek.raida)
Attached file logcat.txt
It happened again. I immediately run adb logcat and there it is for analysis.
This time it was: after OTA update, phone restarted, I entered pin and the very first action I did was opening and email app => and it ended in black-screen...
Attached file logcat2.txt
First log was from stand-by mode, this one is after wake-up with black-screened email app in the foreground...
Comment on attachment 8472548 [details]
logcat.txt

Unfortunately, I think we missed some of the interesting stuff.

At the top we see:
--------- beginning of /dev/log/main^M
I/Gecko   (23315): WLOG: Sync Completed! null days 48 messages synced^M
I/Gecko   (23315): WLOG: folder message count 9719 dbCount 160 syncedThrough null oldest known 1407726020000^M
I/GeckoDump(23315): LOG: email oncronsyncstop: 0,1^M
I/Gecko   (23315): WLOG: Killing unused IMAP connection.^M
I/GeckoDump(23315): LOG: email: clearing wake locks for "id0 1"^M

and then a little bit later:
I/Gecko   (23315): WLOG: Killing stale connection.^M


What I infer from this is that you opened up the email app while it was performing a periodic sync.

It's worth noting that the way the mozAlarms system works is that at startup it loads the database and checks if any alarms should have already fired.  If so, it fires them immediately.  So other apps may also have started around the same time, so the system might even have been a little bit taxed.

Since we've been landing changes to the email app on v2.0, it's also quite possible that the cookie cache had been invalidated and so we would start the app without using the cookie cache.


I suspect what's happening is this (NB: I started out thinking I had found a problem, but by the end of the story I don't think there's a problem):
- The email app is getting started up by an alarm.
- You tap on the email app around roughly the same time.

The core thing is the email app is starting up with an alarm message, this results in the following:
- in html_cache_restore.js:
  - Because (window.htmlCacheRestorePendingMessage.length) is truthy, we do not restore the cached node state.
  - Note that in this case because of the cookie cache difference, we will not restore the cached node state
- in mail_app.js:
  - at the top, startCardId ends up null because cachedNode will be null because we did not use the cookie cache.
  - because appMessages.hasPending('alarm'), we clear cachedNode again (although it doesn't matter since it was already cleared) and set startedInBackground to true
  - we do *not* call pushStartCard because both !cachedNode and !startCardId.
  - we do register various model event listeners
    - 'acctsSlice': we will register for 'foldersSlice', which will then call showMessageList()
  - we do call model.init
    - this will emit 'api', 'acctsSlice', and 'metrics:apiDone'
    - this will call model.changeAccount which will emit 'foldersSlice'
  - okay, so after all that async stuff, we should end up calling showMessageList() (eventually).
  - showMessageList calls showFinalCardState with a closure that can push the 'message_list' card.
  - showFinalCardState saves off the action function if (startedInBackground && document.hidden).
  - we already registered a listener long ago for a 'visibilitychange' and it calls stuff if (startedInBackground && finalCardStateCallback && !document.hidden)

The dance of onVisibilityChange and showFinalCardState seems fairly correct, and even if there's some wackiness with document.hidden relative to that event, unless it can fire when (document.hidden === true), it seems like we'd err on the side of spinning up the UI.

Because if visibilitychange fires before showFinalCardState, then when showFinalCardState gets called, it should be !document.hidden, and we should fall through into fn().  If showFinalCardState gets called first, then finalCardStateCallback should be set and that method should do the right thing.

So in summary, I'm confused and I'm sorta hoping that there is a smoking gun in there, and I just forgot it while tracing down the async ping-ponging.

James, thoughts/ideas?
Flags: needinfo?(jrburke)
Attached file output2.txt
I've captured this logcat DURING - WEN the problem happened. I hope it will help to solve this troublesome case...
Attached file output3.txt
This one follows my previous attachment submit; core problem should be in the output2.txt but perhaps something valuable could be readable from this one too...
Was captured quickly after it...
I am still investigating, and have this in my bug queue, but it is a bit difficult since I do not have a device to reproduce the failure. 

Marek Raida: what if you turn off background sync, does the problem happen then? From the most recent logs, background sync seems to be on. If you cannot reproduce with it turned on, then you can reproduce again after it has been on for a bit, then that would be a useful data point.

As it is now, the logs do not point to a specific error, so if the failure is in the email app, I suspect some sort of logic flow failure as :asuth was attempting to find. The more recent logcat may indicate that, as there are some backend startup log statements but no front end ones.

To turn off background sync: in the account settings, set "Check for new messages" to "Manually" for both accounts that are configured in the email app.
Flags: needinfo?(jrburke)
OK, I'll try to turn both accounts to manual check and we'll see....
MR
:asuth pointed out to me that this can be reproduced on 2.0 on flame, so I will try more tests on device that way instead of asking for more test work from others. Sorry for the extra noise.
No problem for me ;-)
My scenario I was able to reproduce many times is:
- I have set option to manually (not pull)
- restart OS (I have option "Display notifications after restart ON" - could have some impact)
- enter PIN
- very first action is to tam Email app from Homescreen
- screen is black
- open Rocketbar and there are some email notifications (from previous time when I had pull on the background turned ON)
- tap on the notification => email app without necessity to kill opens correctly specific message

Hope this will help in your investigation...
Wish you luck,
Marek
Ahh, notifications!

Those will cause waitForAppMessage to be true, which will cause us to wait for a notification and not push messages_list.

But we filter out "!msg.clicked" notifications (clearing/etc.), so we'll never generate a notification.  Instead of short-circuiting out, we need to do something to kick the app again in mail_app.js or something.
Attached file GitHub pull request
:asuth found that this state can be caused by the system notifying email that a notification was cleared/closed, and not clicked. This process could take a few seconds to occur. When it happens, and the email app was not already running, it leads to the black screen.

This fix makes sure to initialize the app in that notification cleared case.
Attachment #8473463 - Flags: review?(bugmail)
Component: Gaia::System → Gaia::E-Mail
Target Milestone: --- → 2.1 S3 (29aug)
Assignee: nobody → jrburke
I also confirmed there is something weird with the system on reboot -- it sends notification close events to the apps. Perhaps this is expected behavior, but what I saw:

* I background sync an email message, get a notification.
* I then restart the phone
* On restart, I see the email app launched and told of a notification close event (I added extra logging for my test)

So this explains the STR case where after reboot, the first action of launching email, it was blank. Because even before the user has a chance to launch the app, the system already has, to notify of the close event.

It does seem weird to me that the system does this sort of behavior. Regardless of that though, email needs to be able to handle close notifications.
Comment on attachment 8473463 [details] [review]
GitHub pull request

Yeah, close notifications are fairly sad in general.  It seems like it would be better if they didn't exist.  I think most purposes would be satisfied by the tag aggregation stuff and the availability of get() to see what's still displayed.
Attachment #8473463 - Flags: review?(bugmail) → review+
One thing that's worth considering to add to the patch is some more console.log()'s to indicate when we're intentionally not pushing a card/etc.  I feel like we probably have gotten all the bugs out of the system, but it does seem handy if it was explicit what async thing we're waiting on that could then turn out to go wrong.
I added a few logs for when we switch one of the state flags that takes the loading off the default path. So it should only log this new information for those less common cases. Squashed, and if try build is happy, I will merge tomorrow.
Thx for your effort. And may I ask - the 2.1 flag confuses me.
It is so fatal error IMO (mail app - MUST for most of the users, blackscreen - perceived as BIG fail) that it would be worth fixing also in 2.0 (if not also 1.4 which is not released yet either), hmmm?
Target milestone is used by project management to track when the work was done.  We will be requesting blocking v2.0 for this bug once it lands in order to uplift it to v2.0.  (Recent experience has shown approval is not being granted for approval flags, just blockers.)  We should also consider trying to get this up to the v1.4 branch too.  You can learn more about the branch policies at https://wiki.mozilla.org/Release_Management/B2G_Landing if you are interested.
Merged in master:
https://github.com/mozilla-b2g/gaia/commit/0ba00a5a247d22c738c1ea482742ab7157fd323d

from pull request:
https://github.com/mozilla-b2g/gaia/pull/22908

Asking for 2.0? because it can lead to a bad user experience -- a black screen with no clear indication of a problem. 

The main concern is if the user reboots the phone while there are email notifications in the notification tray, on reboot if the email app is launched shortly after that, the email app will just be a black screen.

Also, if the user clears an email notification without tapping on it and the email is not already running in the background, the user will get a black screen when going to the email app shortly after that.

The patch is low risk: it detects if notification event was not a clicked notification, and if so, and only if email startup was waiting on the notification event pathway to complete, and if there are not already cards being shown in the UI, reset the UI to default message list display. All of those tests and state tracking already exist in the code.

It may make sense to push to 1.4, but it is my understanding a separate group is managing the bugs for those, and are being very selective for them. I will ask around about that. Getting 2.0 clearance though is the first step.
[Blocking Requested - why for this release]: (sorry, midair collision, see comment 29 for reason for 2.0?)
Status: NEW → RESOLVED
blocking-b2g: --- → 2.0?
Closed: 10 years ago
Resolution: --- → FIXED
Clarifying bug subject.  I also updated the status flags based on the presence of this code-flow idiom on the branch.  It's conceivable that varying uplifts make things behave slightly differently on those branches, but I think our only other activity in this area has been as it related to multiple-activity triggerings, which is different.
Summary: [B2G][Tarako][Email] Occasionally when accessing the Email app, a black screen appears and stay on screen → [Email] Closed notifications that launch the email app (because it was not running) will result in the app displaying a black screen until a notification is clicked or email activity is triggered
I spoke offline with :doliver on what makes this regression more important for 2.0, given we've shipped with this in 1.3 already.Apparently

a) it's more likely to be an issue on higher memory devices because we aren't getting LMK'ed, so people running flame on 1024mb (default config) are going to see it frequently and likely less seen on lower mem config
b) non-risky fix, isolated to email
c) comment #29

so marking this as a blocker to consider uplift. If we have any fallouts though, we should back this out as its not worth fixing this and causing more code churn and regressions.
blocking-b2g: 2.0? → 2.0+
Attached video verify_video.MP4
This issue has been verified successfully on Flame v2.0 & v2.1
See attachment: verify_video.MP4
Reproducing rate: 0/5
Flame 2.0 versions:
Gaia-Rev        8d1e868864c8a8f1e037685f0656d1da70d08c06
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/c756bd8bf3c3
Build-ID        20141202000201
Version         32.0
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20141202.034707
FW-Date         Tue Dec  2 03:47:18 EST 2014
Bootloader      L1TC00011880

Flame 2.1 versions:
Gaia-Rev        ccb49abe412c978a4045f0c75abff534372716c4
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/18fb67530b22
Build-ID        20141202001201
Version         34.0
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20141202.034824
FW-Date         Tue Dec  2 03:48:34 EST 2014
Bootloader      L1TC00011880
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: