Closed Bug 985598 Opened 7 years ago Closed 7 years ago

[email] Double-triggering of share/new activity by clicking on a mailto link results in black screen.

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

VERIFIED FIXED
1.4 S6 (25apr)
blocking-b2g 1.3+
Tracking Status
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: mclemmons, Assigned: jrburke)

References

()

Details

(Keywords: regression, verifyme, Whiteboard: dogfood1.4 [cert])

Attachments

(4 files, 1 obsolete file)

User sets up Latch App account and opts to submit a Contact Us request without first setting up email on the device. When user generates on screen message of asking whether they want to set up to send or receive email and decides to do so, user is taken to a black screen indefinitely. The user can only exit this section by tapping on homescreen button. This problem does not occur on 1.3 Buri.

Prerequisites:
1. Access Marketplace
2. Download and install Latch App
3. Create and activate Latch App account
4. Email is not setup on device

Repro Steps:
1) Updated Buri to BuildID: 20140319000200
2) Tap Latch App and sign in to account
3) Tap drawers icon in upper right of screen and select Support
4) Tap contact us button
5) Select OK on screen message “You are not set up to send or receive email. Would you like to do that now?

Actual:
Screen turns black except the status bar. Email setup screen never appears and user can only get around this page by tapping homescreen button

Expected:
New Account screen from Email App appears

Environmental Variables:
Device: Buri 1.4 MOZ
BuildID: 20140319000200
Gaia: c03a6af9028c4b74a84b5a98085bbb0c07261175
Gecko: b07ecc057abf
Version: 30.0a2
Base image: v1.2-device.cfg

Notes:
Repro frequency: (5/5 – 100%)
Link to failed test case: none 
See attached: video clip
Other notes: This problem occurs with device USB attached and charging as well as no USB attached and not charging. Waited over 5 minutes for black screen to go away on its own but nothing occurred. However, status bar clock continued as normal throughout the wait time.
This issue does not reproduce on 1.3 Buri. Following the STR from Comment 0 the New Account screen displays from the Email App normally. 

Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20140319004003
Gaia: f2f2be55dcca1d6775592ef478948045935a91ed
Gecko: 87d90ecfe616
Version: 28.0
Base image: v1.2-device.cfg
blocking-b2g: --- → 1.4?
Component: General → Gaia::E-Mail
Can we get a logcat?
Attached file Logcat Email Latch App Black Screen (obsolete) —
(In reply to Jason Smith [:jsmith] from comment #2)
> Can we get a logcat?

Logcat attached. Updating STR as the reported behavior was not reproducible on the same 1.4 build with device connected with USB cable to retrieve logcat.

New STR that reproduced Comment 0 Actual Results for QAWanted
1) Updated Buri to BuildID: 20140319000200
2) Tap Latch App and sign in to account
3) Tap drawers icon in upper right of screen and select Support
4) Tap contact us button and select cancel to screen message “You are not set up to send or receive email. Would you like to do that now?
5) Tap contact us button and select OK to screen message "You are not set up to send or receive email. Would you like to do that now? 

Actual result - Black screen appears with exception of Status bar.
Keywords: qawanted
Comment on attachment 8393814 [details]
Logcat Email Latch App Black Screen

This logcat does not contain any entries from the e-mail app during its 1min 12 sec coverage so I am obsoleting it.  The log does indicate a disconnection; maybe that's why the log has nothing useful in it?
Attachment #8393814 - Attachment is obsolete: true
Please provide a logcat with e-mail logs in there.  As documented on https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo in the logcat section there is a brief mention of what type of log entry you should see in there.
Keywords: qawanted
blocking-b2g: 1.4? → 1.4+
Target Milestone: --- → 1.4 S4 (28mar)
Attached file latch_logcat
I was able to reproduce this issue using the updated STR from comment 4 and am attaching a logcat.
Keywords: qawanted
Based on the log, it appears that Tony managed to trigger the e-mail activity twice before the e-mail app received or was able to service the activity.  This is terrifying on multiple levels because there are potentially a few horrible things intersecting:
- The system may be tracking multiple activities at the same time?
- The e-mail app uses window.confirm() to prompt the user about whether they want to create an account.
  - This is a synchronous API so there's a potential for re-entrancy.
  - The e-mail app is not able to cancel the confirm dialog.  The only way this doesn't turn out the most horrible thing ever is if the system app is somehow closing the first confirm dialog before the e-mail app asks for the second one.  This seems unlikey.


So basically this is a train-wreck.  Varying options:

1) Eliminate the confirm call entirely and be a jerk about it, compelling the user to either create an account or manually go back to where they came from.

2) Eliminate the confirm call entirely but be nice about it.  Our new user dialog optionally has a back button.  If we're creating the first account an activity brought us there, we could show the back button and have pressing it cancel the activity, returning the user to where they came from.

3) Swap out our remaining uses of window.confirm/window.alert for something explicitly handled by the card infrastructure.  We already have a ConfirmDialog hackjob in mail_common and we already know that we don't want to be using window.confirm/window.alert anywhere in our codebase because of the synchronous stuff.  (There's an opposing force about unifying styling/presentation of dialogs, however.)  ConfirmDialog is not part of the card infrastructure which in general is already fixin' to be a footgun.  So when addressing this we would ideally also transform ConfirmDialog to use our card mechanism so that we can removeAllCards it.  As long as its die() method generates a callback with an appropriate cancellation or 'unsolicited force close' semantic, we should be golden and it's a major improvement.

4) Implement some type of mutex-y thing related to activities.  Basically we would end up discarding the second 'new' after we see the first 'new' and throw up that prompt.

5) Implement some kind of mutex related to use of window.confirm/window.alert in the e-mail app.  I hate this and because of the nested event loop stuff this is still fairly horrible.


In general I think option 3 is the right course of action since any use of window.confirm/window.alert/existing ConfirmDialog is a major foot-gun that is going to continue to result in more horrible situations like this.  The ability for us to blow away the entire UI state reliably is now a must-have.

I think there could be some arguments made for the UX flow of option 2, but in practice just seeing the new e-mail account screen is probably tremendously confusing.  And the good bit of option 2 is major feature creep in any other context.


James, your thoughts on this?
Flags: needinfo?(jrburke)
Like you I prefer 3, but I am fine with the UX flow of option 2: removing the confirm() in favor of a real card, the one for creating an account, with the Back button. It is what needs to happen anyway, and the Back button should be enough to indicate to the user they can get back to the other app that way. Maybe we could print a message on that card under the buttons that state email needs to be set up to complete the action. Definitely would need UX review.

If a separate card is wanted by UX, then the general path of 3 still makes sense, using a real card and not a confirm().

However, I am on the fence about wholesale replacement of ConfirmDialog at this point. May not be too bad to replace them all, but also feeling like we have a high rate of change already with the scroll bug and we should be in a mode now for 1.4 about reducing the amount of change.

I also think it is legitimate to open a specific system/platform bug about the double activity calls. I am sure we are not the only app that would have trouble with two of them. It is also not clear if proper cleanup would happen across the system with two that are fired right in a row anyway.

As a platform fix helps all apps, and the current behavior of multiple activities right in a row is likely not desired(?), if that pathway is fixed, it would seem to take care of this bug too.

I also would suspect 1.3 would be susceptible, at least from the email side, particularly since the 1.4 and 1.3 entry point pathways will be the same when bug 969609 is uplifted.
Flags: needinfo?(jrburke)
QA Contact: jschmitt
B2G Inbound Regression Window:

Last Working
Environmental Variables:
Device: Buri 1.4 MOZ
BuildID: 20140314154217
Gaia: 2b5e6cb4dfad3cea948de6653843ca00848b313e
Gecko: 8179026d9140
Version: 30.0a1
Firmware Version: V1.2-device.cfg 

First Broken
Environmental Variables:
Device: Buri 1.4 MOZ
BuildID: 20140314180214
Gaia: 56f8ff83f112b706270150b70b753a1034625a18
Gecko: 5736a56beeb0
Version: 30.0a1
Firmware Version: V1.2-device.cfg

Last Working Gecko First Broken Gaia: Issue DOES repro
Gaia: 56f8ff83f112b706270150b70b753a1034625a18
Gecko: 8179026d9140

First Broken Gecko Last Working Gaia: Issue does NOT repro
Gaia: 2b5e6cb4dfad3cea948de6653843ca00848b313e
Gecko: 5736a56beeb0

Gaia Pushlog:
https://github.com/mozilla-b2g/gaia/compare/2b5e6cb4dfad3cea948de6653843ca00848b313e...56f8ff83f112b706270150b70b753a1034625a18
This is another regression from bug 969609.
Blocks: 969609
Duplicate of this bug: 984203
Summary: [B2G][Marketplace]Latch App Support page leads to black screen after user agrees to email setup on device → [email] Double-triggering of share/new activity by clicking on a mailto link results in black screen.
Attached file GitHub pull request
This change converts all ConfirmDialog uses to be backed by a card insertion instead of the document.body appendChild approach, which made it harder to clean up/reset state for those cases. That new ConfirmDialog is then used in place of the confirm() in the activity flow. This allows for cleaner app reset and avoids the double triggering of activities from freaking out the email app.

Some notes:

* With this change, it is possible for toast banners to appear in front of the ConfirmDialogs as the ConfirmDialogs are now in the card stack.
* The compose card needed a fix for when the compose card is destroyed very quickly when the email app speculates that compose should be the start card for the activity. I believe this is needed now there is no longer a confirm() that is blocking part of the flow. It seemed like a good fix to have in general too.
* There are two styles for calling ConfirmDialog.show: the old way, so that the rest of the ConfirmDialog references did not have to change, and a newer, simpler way that mimics the style of a confirm() dialog, with just a message that can be customized.
* No new l10n strings, but the simple confirm dialog uses l10n ID "message-multiedit-cancel". While not ideologically pure, other dialogs were already using it, so seemed safe to do so here too, although long term, each one of those cases should have its own cancel, or convert more of the code to use the simple ConfirmDialog approach.

It may make sense to file a separate bug to track cleaning any alert() usage to use a similar approach to this one, and then also perhaps a more invasive change that converts ConfirmDialog.show() calls to be more like the simpler approach, with maybe some more config options like button titles and styles like "danger" used in some ConfirmDialogs now.

I still may need to fix some things in the integration tests, but had trouble running them locally, so will look at it more tomorrow and the Travis result against all the tests. Thought I would start this pull request though as the approach may still need some discussion.
Attachment #8396161 - Flags: review?(bugmail)
Assignee: nobody → jrburke
Target Milestone: 1.4 S4 (28mar) → 1.4 S5 (11apr)
Update: I did a preliminary review of this last week and while performing on-device testing I experienced a new regression related to account deletion.  It's hard to be confident about a fix (and we are trying to be minimal here, which is a factor) without the integration tests.  I'm working the integration tests repairing/cleanup; given the very serious risk of whack-a-mole regressions on this, I think it makes sense to block on that.  (And since James is very busy with quasi-infinite scroll and that's important and rather hard, I don't think this is stalling him too much.)
Depends on: 975588
This issue will show up in 1.3 if bug 969609, which is 1.3+, is uplifted to 1.3. 

So asking to get a 1.3? ruling on this bug, so that we know if it is OK to live with in 1.3, and therefore can proceed with the uplifts mentioned in bug 969609 comment 34. If it is considered 1.3+, then we will wait until we have a fix landed on master for this bug before asking for 1.3 uplift for the other bugs mentioned in 969609.

If it is not seen as 1.3+, then the flag should be reset to the previous value of 1.4+.

(Hoping I am doing the process correctly here, sorry for any noise if I am not)
blocking-b2g: 1.4+ → 1.3?
Blocks a blocker.
blocking-b2g: 1.3? → 1.3+
No longer going to block on Mozilla-specific 1.3 blockers unless it hits a special exception list, so moving to 1.4? for triage.
blocking-b2g: 1.3+ → 1.4?
(In reply to Jason Smith [:jsmith] from comment #17)
> No longer going to block on Mozilla-specific 1.3 blockers unless it hits a
> special exception list, so moving to 1.4? for triage.

Actually I'm wrong here - this originally was a required followup to a dupe of bug 981934, which was a bug TCL was requiring us to fix.
blocking-b2g: 1.4? → 1.3+
Dylan

Please provide next steps here
Flags: needinfo?(doliver)
We are still pursuing this bug for 1.3. Andrew is finishing up the work on the tests as stated in comment #14 and then we'd like to get both this and bug 969609 uplifted.
Flags: needinfo?(doliver)
Whiteboard: dogfood1.4 → dogfood1.4 [cert]
Duplicate of this bug: 994689
Target Milestone: 1.4 S5 (11apr) → 1.4 S6 (25apr)
I have tried deleting one account with only one configured, and then one account with two configured, but I did not see an issue with that pathway, no black screen or weird screen, so asking :asuth for guidance on what sort of failure happened in comment 14.

Something I did notice:

* make sure an account is configured in email app.
* double-tap on an email for a contact in the Contacts app.
* cancel the compose.

The phone ends up on a blank screen that just shows the phone background image.

I believe that to be an issue with the system app or a subsystem handling the activity navigation. I tried in the email app to either always use the first activity or the last activity to do the activity.postError call, but both cases still had the same behavior.

So I do not believe that issue is an email issue, just a general activity issue with multiple fast activities fired in a row. I will file a bug for that, but want to land these changes first so it is easier to give a test case where the email app is not failing.

--

Also, asuth: as this is the last 1.3 issue for email that is open, what do you think about proceeding with review without waiting for the test revamp in master to land (assuming account delete does not have a wrinkle)? We likely would not be able to take any new tests for this ticket to the 1.3 branch given the state of the email tests on that branch, and any issues we discover would likely be new issues, not regressions. I ran the existing marionette tests on this pull request branch and they pass.

By landing this changeset, it would also allow landing bug 994689 to 1.3. We can then ask for a bit more QA testing in this area. The upside of landing these changes is that is very visible when it does fail, whereas before the fix in bug 994689, things may have looked OK but where actually in a bad state.
Flags: needinfo?(bugmail)
I did an inspection pass because I really had no idea what random buttons I mashed previously.  The problem, or at least a problem, is that waitForAppMessage can get latched true-ish, breaking resetApp()'s path when the account is deleted.

STR:
0) Have one or more accounts defined.
1) Have the e-mail app be closed.
2) Trigger a share activity from the web browser, waking up e-mail and setting waitForAppMessage true.
3) Hit back to bail on the compose process, discarding the draft.  You're probably now back in the browser.
4) Bring up the e-mail app from the home-screen or card switcher or probably any way.
5) Bring up settings, delete the account.  We trigger resetApp(), the model.latestOnce('foldersSlice') inside the model.on('acctsSlice') will bail because of that return.

We probably just need to have resetApp re-initialize such variables since the contexts in which it's called should already have mooted them.
Flags: needinfo?(bugmail)
Another problem that I think may be linked waitForAppMessage potentially being stale but could also involve listeners hanging around.  The short story is I managed to get 3 message_lists on the card stack at once and I saw messages and then I didn't see messages and things were generally confusing.  With high probability the extra confusion was due to the header_cursor issue that is being tracked on another bug.

STR, apologies for the complexity here, this is potentially reproducible using fewer steps but I want to get this down.  I've tried to spice up the monotony by having the STR ghost-written by an over-excited would be marketer:
phase A:
0) have email be closed and have no accounts.
1) trigger share activity from browser (once)
2) cancel out, end up back in the browser.
phase 2:
3) trigger share activity from the browser (once)
4) say yes to experiencing email by having an account
5) be on the account info screen
6) be a jerk and manually switch back over to the browser without giving email any closure
phase the third:
7) tap the share activity from browser three times real quick using 1337 tapping skills.  (You know you are 1337 if you see three "Received activity: new" logs followed by 3 "pushCard for type: confirm_dialog" logs.
8) say yes to the possibilities of an email account, the worlds it opens to you, the endless possibilities
9) proceed with account setup as usual, choosing to show your mail rather than adding another account.  One set of endless possibilities is already endless enough, amiright?
10) Observe the logcat doing this:
04-16 15:46:16.249 I/GeckoDump(  814): LOG: pushCard for type: setup_done
04-16 15:46:18.299 I/GeckoDump(  814): LOG: pushCard for type: message_list
04-16 15:46:18.389 I/GeckoDump(  814): LOG: pushCard for type: compose
04-16 15:46:18.519 I/GeckoDump(  814): LOG: pushCard for type: message_list
04-16 15:46:18.569 I/GeckoDump(  814): LOG: pushCard for type: message_list
11) back out of the compose card and then see freaky stuff happening


I think in this case part of the problem is that our top-level model.on('acctsSlice', function() {}) might not be being guarded by waitForAppMessage.  Between it and the evt.on('showLatestAccount') logic, that should get us two message_list instances.

Not immediately clear where the third message_list would come from, but looking at setup_done.js, we don't seem to protect against onShowMail firing more than once.  The code in there does a bunch of things but none of them will synchronously trigger eatEventsUntilNextCard which is usually how we get protected from multiple clicks in succession.
The extra 2 message_list cards were created via a bit of a circuitous route:

* ConfirmDialog.hide called Cards.removeCardAndSuccessors() before calling its callbacks. Since on startup, the only card in existence (since the app was already running) was the confirm card, that meant there would be no more cards in the stack and Cards.pushDefaultCard() was called. 
* However, that call used model.latestOnce('foldersSlice', ...) to wait for a folder to be available. Since no folders were available (was in account setup), it just queued up the creation of the card for later. In comment 24's steps to reproduce, the share action was triggered twice effectively, so there were two message_list's lurking in the shadows.
* Once the account was created, that triggered the foldersSlice listeners and the extra two cards are created, in addition to the one we expect to be created.

So the fix I did was to allow a call to removeCardAndSuccessors to not call pushDefaultCard. I did this by passing another arg to removeCardAndSuccessors. While that part is unfortunate, it seemed the most expressive as to intent, and longer term I would probably want to break up the logic in there a bit more.

I considered changing ConfirmDialog.hide() to pass 'none' to removeCardAndSuccessors, and use that as the trigger to know not to pushDefaultCard, but that did not match the 'immediate' use in ConfirmDialog.show(), and we want the immediate behavior of showing the previous card when ConfirmDialog is used in other confirm cases, like message_reader delete actions.

That change is in this changeset that I added to the pull request:

https://github.com/jrburke/gaia/commit/9eccccd20696abd1ebad30d833ec79201e488719

Unfortunately I forgot I had locally rebased of latest master as of yesterday, so I did a force push to get that change up there. Sorry for any swirl that causes.

In addition to the removeCardAndSuccessors change in that commit, the commit contains:

* resetting more app state in resetApp, fixes the failure case in comment 23.
* add a rate limit on accepting activity triggers. There is a comment in the code as to why. Not technically required, but felt it gave a better overall experience, particularly for the compose case when an email account exists.

While I plan to do a bit more manual testing for other ConfirmDialog cases, I think it should be OK for review to continue.
Comment on attachment 8396161 [details] [review]
GitHub pull request

Looks great.  I think the rate-limiting is a very good idea; that kind of edge-casey churn is no good for anyone.

Technical-debt-wise, I feel like in the future we want to move to a somewhat more explicit state machine type thing to avoid further confusion/etc.  I think you favor a more route-based mechanism (which is state machine enough for me) in general and we're only in this current weird situation because of my initial corner-cutting on that regard, so, uh... I won't stand in your way on that front!
Attachment #8396161 - Flags: review?(bugmail) → review+
Agreed on the state machine-like thing for the future, and hope to consider that more via tickets like bug 995306.

Merged in gaia master:
https://github.com/mozilla-b2g/gaia/commit/e3474caf17e7e2a30e5d99f29dbf3413cb24cc12

from pull request:
https://github.com/mozilla-b2g/gaia/pull/17561

Will work on 1.3 uplift approval comment next, needs to go with bug 969609 landing first.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment on attachment 8396161 [details] [review]
GitHub pull request

[Approval Request Comment]

This bug was flagged as 1.3+, so asking for uplift approval. 

NOTE: bug 985598 needs to be uplifted to 1.3 before this bug. If bug 985598 is not uplifted, this bug should not be uplifted. 

[Bug caused by] (feature/regressing bug #):

Bug 985598, which is 1.3+. The fix in bug 985598 allowed the black screen to happen that caused this bug. This bug rounds out the fix for bug 985598, and also avoids some ugly behavior caused by the blocking confirm() that was previously used in the setup path for an activity.

[User impact] if declined:
If declined, then bug 985598 should not be uplifted to 1.3.

[Testing completed]:

Tested in dev on device, trying the different ConfirmDialog options, and a few activity trigger combinations.

[Risk to taking this patch] (and alternatives if risky):
Main risk is that all ConfirmDialog uses are now cards instead of a special thing outside of the card flow. In general this is a good thing as it allows layers of confirms and fits better with the card flow model in email.

From manual dev testing on device though the ConfirmDialog uses are still working correctly.

Other risk is that we do not correctly reset state for some other scenario we did not encounter. This bug partially exists because we missed one of those for bug 985598. However, we have had more time to consider the reset cases, so feel better about having caught them now.

[String changes made]:

none
Attachment #8396161 - Flags: approval-gaia-v1.3?
Sorry for the silly typo: this bug require bug 969609 to be uplifted before itself. That is what I get for typing both uplift requests at the same time.

So to reiterate: bug 969609 needs to be uplifted before this bug. If that bug is not uplifted, this one should not either, and bug 969609 is the bug that contributed to this issue.
Attachment #8396161 - Flags: approval-gaia-v1.3? → approval-gaia-v1.3+
Keywords: verifyme
[Environment]
Gaia      84f4835833c4cb8ac7e5d9e0c94738b1cb0ef45a
Gecko     https://hg.mozilla.org/mozilla-central/rev/c962bde5ac0b
BuildID   20140421160201
Version   31.0a1
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013

[Result]
PASS
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.