Closed Bug 1206156 Opened 9 years ago Closed 9 years ago

[email] Make requestGrowth not throw on misuse to avoid breaking vscroll state machine

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect, P1)

defect

Tracking

(blocking-b2g:2.5+)

RESOLVED FIXED
blocking-b2g 2.5+

People

(Reporter: wilsonpage, Assigned: asuth)

Details

(Keywords: foxfood, Whiteboard: [bzlite])

Attachments

(6 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Mobile; rv:43.0) Gecko/43.0 Firefox/43.0

Not sure how to reproduce. I got.into this state after I tapped on an email notification and then pressed the back button.
Attached file dev-log-main.log
Attached file properties.log
Attached image screenshot.png
Blocking 2.5 for this defect. P1 priority.
blocking-b2g: --- → 2.5+
QA Whiteboard: [foxfood-triage]
Component: Gaia::Feedback → Gaia::E-Mail
Priority: -- → P1
I suspect this bug may be moot.  This bug is reported against build 20150812231434 which definitely does not have the custom elements fix of bug 1176829 which could potentially be involved.  However, it seems like our failure is in message_list which was already on the card-stack, so maybe it's not that bug.  But then the refactoring changes on bug 1196077 landed after this build and may have corrected any state machine failures.

So I think we want a reproduction on a current build to identify that this still happens and get a logcat of what's going on.  If we can't reproduce, we resolve WFM.

From the log Wilson provided (thank you!), I think these are basically the steps to reproduce:

1) Have the email app configured with a single IMAP account with a small number of messages in the inbox.  In this case, there were only seven messages in the inbox.  (I believe this is significant to the state machine and the "grow" errors generated that may have contributed to things breaking.)

2) Have the email app doing periodic sync.  Using the secret debug mode (https://wiki.mozilla.org/Gaia/Email/SecretDebugMode#Getting_to_the_Secret_Debug_Menu) an option can be added to the menu to set the sync interval to 2 minutes/120 seconds to make it go faster.  Alternately, the https://github.com/jrburke/requestsyncmanager app can be sideloaded to manipulate things.

3) Have the email app active in the background with the inbox displayed (probably).  (So, do not close the email app, and do not test on a 319M device where the email app will almost certainly be killed.  Test on a 1gig device or higher.)  Basically, just hit the home button.

4) Send a single email to the account so that when email performs periodic sync, it will generate a notification that you can tap on and that will bring you directly to the message reader.

5) When the notification triggers and the message reader displays, tap the back button.

6) If the message list is displaying the "loading" fake-messages whose contents are just bars, like in the screenshot, then the bug has reproduced.
Keywords: qawanted
Do you have logcat logs you can provide from your reproduction?  (Also, you may be reproducing a different bug, especially since ActiveSync is involved rather than IMAP; your problem sounds like a hung sync, possibly involving mutex loss.  Wilson's logs indicated no hangs or mutex losses.)
Flags: needinfo?(sleedavid)
Setting qawanted to get the log requested in comment 8 using the steps from comment 7.
QA Whiteboard: [foxfood-triage?] → [foxfood-triage+]
Flags: needinfo?(sleedavid)
Flags: needinfo?(jmercado)
Keywords: qawanted
QA Whiteboard: [foxfood-triage+] → [foxfood-triage?]
QA Whiteboard: [foxfood-triage?] → [foxfood-triage]
I was able to reproduce an empty inbox following steps in comment 7.  Hopefully this log will help.
QA Whiteboard: [foxfood-triage] → [foxfood-triage][QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
Attachment #8663942 - Attachment is obsolete: true
Unfortunately that log was fairly boring.  It looks like the "Diag_Lib" logger is filling up the logcat or something; there was less than 20 seconds of data in the logcat.  And although that included a fresh email run, nothing interesting happened there because email chose to not automatically refresh because it had synced sufficiently recently.

To avoid confusion about the STRs, I'm marking comment 7 obsolete since it probably is a different bug (maybe just device id exhaustion that will clear itself tomorrow?).

Anywho, this is a tricky one and it can be handy for me to be able to inspect state when it reproduces, so I'll take over reproducing this for now.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
QA Whiteboard: [foxfood-triage][QAnalyst-Triage?] → [foxfood-triage][QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Hey Andrew, 

Any luck reproducing this and getting a fix? Its a bad P1 that's a must fix for 2.5.

Thank you!
I've been trying to reproduce in the background while dogfooding, but so many key features of the aries builds have been completely unreliable (can't hit "ok" after entering a wifi password, already-configured wifi doesn't reconnect, the requestsync API gives up on doing its job after a few cycles, etc.), it petered out pretty quick.  I'm spinning a build now so I can investigate why requestsync is unreliable (which is arguably the bigger problem) and will actively work that and then this.
I've re-created the triggering scenario (as I understand it) several times and a few variations on it and am unable to reproduce.

Since that's not entirely reassuring on its own, I also did some code inspection of the code as it existed for comment 0.  I found that in general that things should be self-healing; barring permanent mutex-lossage (which the log did not indicate), attempting to scroll the pane should result in new requests being issued and things like the "scrollStopped" event to fire (even in the face of a jumpy clock).  Having said that, I think there are two ingredients to the breakage:
1) Something was spamming the messages_complete event handler.  I'm thinking the model refactor as part of bug 1196077 probably made this stop happening.
2) The "Error: Already growing in 7 dir." is a side-effect of that spamming of messages_complete.  There's a call to loadNextChunk if `this.desiredHighAbsoluteIndex` in there which calls that can-explode requestGrowth method.  If that call explodes, `this.desiredHighAbsoluteIndex` will not be cleared and all subsequent calls will keep failing to get down to the call to `this.vScroll.updateDataBind` which could explain the failure to update the DOM.  If the timing works out right, we can end up never making it to that call.  It is likely that changing folders away and back would have "healed" the problem.

As noted, I think point 1 is already addressed, but fixing point 2 makes point 1 moot.  The `throw` in requestGrowth is one of those misuse assertions that made sense in unit tests but results in issues like this and where we have now more properly adopted use of `logic.fail()`.  I'll attach a patch momentarily that does this.  I believe that this will alleviate the problem if the root cause (point 1) recurs.
Flags: needinfo?(bugmail)
Summary: [email] Inbox blank → [email] Make requestGrowth not throw on misuse to avoid breaking vscroll state machine
Attachment #8672156 - Flags: review?(jrburke)
Attachment #8672156 - Flags: review?(jrburke) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: