Closed
Bug 1151082
Opened 9 years ago
Closed 9 years ago
[Flame] Request-sync API goes into an infinite loop in the main process if pending messages gets backlogged
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: onelson, Assigned: baku)
References
()
Details
(Keywords: regression, Whiteboard: [3.0-Daily-Testing])
Attachments
(3 files)
746.23 KB,
text/plain
|
Details | |
19.10 KB,
text/plain
|
Details | |
3.66 KB,
patch
|
ehsan.akhgari
:
review+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1122276 +++ Description: If the user attempts to add a Gmail account to the email app after another email account has previously been added, the device will entirely lock up after signing in, through the setup UI. Repro Steps: 1) Update a Flame device to BuildID: 20150403010203 2) Connect to a WiFi or Data network 3) Launch the Email app for the first time 4) Sign into a new account, gmail, and progress to the inbox 5) Tap the menu icon to open the Drawer and select the Gear icon 6) Select 'Add account' 7) Sign into another gmail account and select 'Next' 8) Change the time to update inbox multiple times. Actual: The touch screen, and hardware buttons are unresponsive once the Gmail log-in page loads Expected: User still has full device functionality after the Gmail log-in page loads Repro frequency: 5/5 See attached: video: previous- http://youtu.be/fZfFfGv_hH8 today- https://youtu.be/r5lnCF65ck8 logcat Environmental Variables: -------------------------------------------------- Device: Flame 3.0 BuildID: 20150403010203 Gaia: 7969b367a7da62877c3a24a26d3cb5fda89d766c Gecko: 70a113676b21 Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b Version: 40.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0 -------------------------------------------------- This issue DOES occur on Flame 2.2 The touch screen, and hardware buttons are unresponsive once the Gmail log-in page loads Device: Flame 2.2 BuildID: 20150403002503 Gaia: 022eeb91197ba4a9adfd67bd6db5aa03cc69eb31 Gecko: 77fdc6fbcae9 Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429 Version: 37.0 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0 -------------------------------------------------- This issue does NOT occur on Flame 2.1 User still has full device functionality after the Gmail log-in page loads Device: Flame 2.1 BuildID: 20150403001204 Gaia: 4e7d880defb83c25b49462479c02b3e64d21f421 Gecko: e76f5e7a264d Gonk: a814b2e2dfdda7140cb3a357617dc4fbb1435e76 Version: 34.0 (2.1) Firmware: V18D-1 User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Reporter | ||
Updated•9 years ago
|
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Comment 1•9 years ago
|
||
Severe functional regression that hardlocks phone. Requesting a window.
blocking-b2g: --- → 2.2?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: regressionwindow-wanted
Updated•9 years ago
|
QA Contact: jmercado
Comment 2•9 years ago
|
||
The log indicates that the email app was still responding to the presses even after the UI failed to update. If I had to guess, it would be that the graphics subsystem got wedged. The only suspicious thing in the log that jumps out at me are some "[Parent][MessageChannel] Error: Channel error: cannot send/recv" messages. I suppose a queue somewhere could have filled up and become blocking? But that's just a wild guess. I would expect that this is probably impacting the phone across the board, so I'm moving this to a more generic component where there might be other similar bugs. I'm not familiar with what the stability component is actually used for, however, so I could be very wrong. Feel free to move somewhere else again. (Just not back to email ;) (I am staying on the cc list in case email is a factor somehow.)
Component: Gaia::E-Mail → Stability
Summary: [Email] Signing in two consecutive Gmail accounts will freeze, lock up the device. → [Flame] Screen stops updating when rapidly switching <select> options after doing various email stuff
Comment 3•9 years ago
|
||
Er, and specifically, for those going log-diving, the log entry that seems to correspond to the UI no longer responding is: 04-03 14:41:04.311 1734 1734 I GeckoDump: LOG: sync interval changed to 1800000
Comment 4•9 years ago
|
||
I can no longer reproduce this issue on the latest central tinderbox build. Oliver can you check the next Nightly build and verify if this is still happening? Issue DOES occur Environmental Variables: Device: Flame 3.0 BuildID: 20150402033937 Gaia: f37be8b44cb7c3a147b9615ab76743b760f08eeb Gecko: cf8864126c58 Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b Version: 40.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0 Issue does NOT occur Environmental Variables: Device: Flame 3.0 BuildID: 20150405145459 Gaia: ef61ebbe5de8c2c9fc2a8f74a12455044c3b82e9 Gecko: 4fe763cbe196 Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b Version: 40.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(onelson)
Flags: needinfo?(ktucker)
Keywords: regressionwindow-wanted
Updated•9 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage-]
Flags: needinfo?(ktucker)
Comment 5•9 years ago
|
||
Please re-nom if this happens on 2.2 after further testing.
blocking-b2g: 2.2? → ---
Reporter | ||
Comment 6•9 years ago
|
||
Issue REPROES in latest nightly for 3.0 and 2.2 flame devices: Device: Flame 3.0 BuildID: 20150407010204 Gaia: c710bac533b76635161315bf907d004e000549cb Gecko: ab0490972e1e Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b Version: 40.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0 Device: Flame 2.2 BuildID: 20150407002501 Gaia: 5e09637414269728f6f1bc0152d0160f3b6b380e Gecko: 245f37f44017 Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429 Version: 37.0 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
QA Whiteboard: [QAnalyst-Triage-] → [QAnalyst-Triage?]
Flags: needinfo?(onelson) → needinfo?(pbylenga)
Comment 7•9 years ago
|
||
Adding qawanted to check this again on 2.2 per request.
Comment 8•9 years ago
|
||
I was also able to reproduce this issue on the latest 2.2 Flame build. Of note, I had to add a third email account before I could get this issue to reproduce. So if it does not seem to be reproducing with the second try completing it and adding more. Environmental Variables: Device: Flame 2.2 BuildID: 20150407002501 Gaia: 5e09637414269728f6f1bc0152d0160f3b6b380e Gecko: 245f37f44017 Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429 Version: 37.0 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
Flags: needinfo?(ktucker)
Keywords: qawanted
Comment 9•9 years ago
|
||
Nominating 2.2? based on Comment 5 since we could reproduce this on 2.2.
blocking-b2g: --- → 2.2?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Updated•9 years ago
|
Keywords: regressionwindow-wanted
Updated•9 years ago
|
Flags: needinfo?(pbylenga)
Comment 10•9 years ago
|
||
Bug 1098289 seems to have caused this issue Central Regression Window: Last Working Environmental Variables: Device: Flame 3.0 BuildID: 20150211132949 Gaia: e280a660955bbdab265d50f8d9e009de34082332 Gecko: 650863f6987b Version: 38.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0 First Broken Environmental Variables: Device: Flame 3.0 BuildID: 20150211174239 Gaia: d5a71cedb37dd45f439f672489db3994b349ac43 Gecko: 3778ff309988 Version: 38.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0 Last Working gaia / First Broken gecko - Issue does NOT occur Gaia: e280a660955bbdab265d50f8d9e009de34082332 Gecko: 3778ff309988 First Broken gaia / Last Working gecko - Issue DOES occur Gaia: d5a71cedb37dd45f439f672489db3994b349ac43 Gecko: 650863f6987b Gaia Pushlog: https://github.com/mozilla-b2g/gaia/compare/e280a660955bbdab265d50f8d9e009de34082332...d5a71cedb37dd45f439f672489db3994b349ac43
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•9 years ago
|
Keywords: regressionwindow-wanted
Comment 11•9 years ago
|
||
James, can you take a look at this please? This might have been caused by the work done for bug 1098289.
Blocks: 1098289
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(jrburke)
Comment 13•9 years ago
|
||
I reproduced and attached gdb. We're ending up in a tight-loop in some JITted code in the main thread. Unfortunately gdb can't figure out how to unwind the stack so I'm not sure if it's email or the request-sync logic, but I'd guess email since any loops should happen in the main process. I'll take a quick dive into the code and maybe instrument a little if it's not obvious.
Comment 14•9 years ago
|
||
Oh, whoops, but it was the main process that freaked out and that I attached gdb to, not the email process. So I'm guessing request-sync after all. Which makes sense from what we saw from the provided logs and how email was still able to timeout its IMAP connection.
Comment 15•9 years ago
|
||
Okay, yeah, this is a requestsync bug. pendingOperationDone has the following loop inside it: while (this._pendingMessages.length) { this.receiveMessage(this._pendingMessages.shift()); } This assumes that the receiveMessage call always consumes at least one entry from the list. But it doesn't because: // We cannot process this request now. if (this._pendingOperation) { this._pendingMessages.push(aMessage); return; } And you'd think that's safe because pendingOperationDone does the following right before its loop: this._pendingOperation = false; But of course, if receiveMessage ever does anything that sets that flag, infinite loops happen. I turned on the debug in the file and indeed we see a pattern like this: 04-09 06:50:40.799 I/Gecko ( 5921): DEBUG RequestSyncService: pendingOperationDone 04-09 06:50:40.799 I/Gecko ( 5921): DEBUG RequestSyncService: receiveMessage 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: unregister 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: removeRegistration 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: removeRegistrationInternal 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: dbTxn 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: pendingOperationStarted 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: receiveMessage 04-09 06:50:40.809 I/Gecko ( 5921): DEBUG RequestSyncService: receiveMessage * infinity I'm going to attach a larger log excerpt now too in case a larger debug trace proves useful for better understanding the shenanigans email is getting up to in its use of the API. Note that I'm just moving this bug to Core::DOM because the bug is in requestsync, the steps to reproduce hold, and I don't have the ability to mark a new bug in Core::DOM as a blocker. (And last time we left a non-actionable blocker in email dependent on a platform blocker, everyone got confused.)
Component: Gaia::E-Mail → DOM
Flags: needinfo?(jrburke) → needinfo?(amarchesini)
Product: Firefox OS → Core
Summary: [Flame] Screen stops updating when rapidly switching <select> options after doing various email stuff → [Flame] Request-sync API goes into an infinite loop in the main process if pending messages gets backlogged
Comment 16•9 years ago
|
||
Note that I think all that's required for email is to have 2 mail accounts configured for periodic sync and then to just keep changing the select value aggressively until the RequestSyncService falls behind enough to backlog and hit the infinite loop case. No need to ever add a 3rd account, etc.
Assignee | ||
Comment 17•9 years ago
|
||
Andrew, thanks for the debugging!
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 18•9 years ago
|
||
Attachment #8590226 -
Flags: review?(ehsan)
Updated•9 years ago
|
Attachment #8590226 -
Flags: review?(ehsan) → review+
Assignee | ||
Comment 19•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=468f3144249b checkin-needed if green.
Keywords: checkin-needed
Comment 20•9 years ago
|
||
Thanks for the fast turnaround on the patch and the test! And major thanks to :oliverthor, :JMercado, :PBylenga, :KTucker and friends for uncovering this and doing the invaluable bisection.
Assignee | ||
Comment 21•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/0097fbcadabc
Keywords: checkin-needed
Comment 22•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/0097fbcadabc
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox40:
--- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment 23•9 years ago
|
||
Since the email app in b2g 2.2 uses requestsync, this also seems to need approval-mozilla-b2g37? set on it to get it landed for that branch?
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8590226 [details] [diff] [review] requestSync.patch NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): 1018320 User impact if declined: a infinite loop Testing completed: mochitest is included Risk to taking this patch (and alternatives if risky): no risks String or UUID changes made by this patch: none
Flags: needinfo?(amarchesini)
Attachment #8590226 -
Flags: approval-mozilla-b2g37?
Updated•9 years ago
|
Attachment #8590226 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment 25•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/db82a837c490
status-firefox38:
--- → wontfix
status-firefox39:
--- → wontfix
Comment 26•9 years ago
|
||
This issue seems to be fixed in the latest 3.0 and 2.2 builds. Actual Results: The phone does not freeze up when rapidly switching the update time. Environmental Variables: Device: Flame 3.0 BuildID: 20150417010203 Gaia: 3cd0a9facce26c2acc7be3755a17131a6358e33f Gecko: 51e3cb11a258 Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b Version: 40.0a1 (3.0) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0 Environmental Variables: Device: Flame 2.2 BuildID: 20150417002504 Gaia: d50b8a3919a7b4d8d289f150d3b9bed704ebafa9 Gecko: 5ebf32030512 Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429 Version: 37.0 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•9 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•