Closed Bug 1008046 Opened 11 years ago Closed 10 years ago

[Sora][Telefonica][Spain] After sometime corporate email stops synchronizing

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect, P1)


Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed, b2g-v2.1 fixed)

RESOLVED FIXED
2.0 S5 (4july)

People

(Reporter: sync-1, Assigned: jrburke)

References

Details

(Keywords: meta, Whiteboard: [cert])

Attachments

(4 files)

39.44 KB, image/png
40.24 KB, image/png
54.23 KB, image/png
172.67 KB, application/x-zip-compressed
Mozilla Build ID: 20140422024003
FFOS 1.3

Defect description:
===================
After some time, corporate email stops synchronizing.

Steps to reproduce:
===================
1. Configure a corporate account.
2. Set the corporate account sync interval to 10 min.
3. Hotmail, Gmail and yahoo.es accounts are also configured, with sync set to 30 min.

Current result:
===============
The last sync for the corporate account was performed 23 hours ago. Performing a manual sync does nothing: we see no data activity on the status bar, but the sync icon at the bottom left of the email screen never stops. However, the rest of the accounts have been synchronized.

Expected result:
================
Corporate email should sync data correctly, either according to the sync value set in settings or manually.

See attached screenshots and internal logs.
Attached image screenshot3
Attached image screenshot1
Attached image screenshot1
Attached file logcat log
You can download QXDM logs from here: http://we.tl/h5s4i17w9m or long URL: https://www.wetransfer.com/downloads/59efc85dcd00fe50be85dcc9d5e6308b20140508110859/86a672

Note that under this scenario we cannot send any email from the corporate account, and the email app stays in "sending message" indefinitely after trying to send a message. Additionally, there is no option to exit from this screen and access another email account other than powering the device off and on.
Probably need adb log as well, thanks!
Flags: needinfo?(sync-1)
(In reply to Vance Chen [:vchen][vchen@mozilla.com] from comment #6)
> Probably need adb log as well, thanks!

The adb log is in attachment 8419877.
Thank you for the logcat! It appears that the failure that broke periodic sync occurred prior to the start of the provided log. Because of the many accounts used and deficiencies in our logging, it's hard to tell from the log for sure, but it looks like we might have real problems with the corporate email server in use.

From comment 5 and the log it seems like you might not be aware that you can close the email app without restarting the phone by holding down the physical home button on the device. This brings you to the task (sometimes called "card") view where you can swipe upwards on the email app's card to close it or click the 'x' button overlaid in its corner. This also potentially serves as the workaround.

If there is a direct server problem, then we will either need a logcat that captures the failure when it happens or credentials with which to try and locally reproduce the problem. You can mail me credentials at asuth@mozilla.com, but please only do this for a testing account; I/we do not want real user credentials.

Otherwise the main actionable thing from the bug is that we can hack in a fail-safe cronsync timeout that forces the app to close itself in the event that a cronsync does not complete in a reasonable fashion.

Note that if you had rebooted the device within the past 23 hours or otherwise closed the app and were still unable to sync the account, that does suggest some persistent problem communicating with the server. It would be worth checking if there is a server outage or some problem with the credentials.

Setting needinfo back to the reporter to indicate there is nothing directly actionable we can do about this bug at the current time without more info, especially if we are talking about uplifting something all the way back to v1.3.
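The fail-safe cronsync timeout mentioned above could look roughly like the sketch below. It is written in Python rather than the app's JavaScript, and `CronsyncWatchdog`, `timeout_s`, and `on_timeout` are hypothetical names for illustration, not anything from the Gaia code. The idea is only this: arm a timer when the cronsync starts, disarm it when the sync completes, and let the timer's callback force the app closed otherwise.

```python
import threading


class CronsyncWatchdog:
    """Hypothetical fail-safe: run on_timeout() unless finish() is called in time."""

    def __init__(self, timeout_s, on_timeout):
        # threading.Timer calls on_timeout once, timeout_s seconds after start().
        self._timer = threading.Timer(timeout_s, on_timeout)
        self._timer.daemon = True  # never keep the process alive just for the timer

    def start(self):
        """Arm the watchdog when the cronsync begins."""
        self._timer.start()

    def finish(self):
        """Disarm the watchdog once the cronsync has completed normally."""
        self._timer.cancel()
```

In the real app, the callback would presumably be whatever closes the email app, so that the next wakeup starts from a clean state. What counts as a reasonable timeout is left open here.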
==== Interesting stuff in the log (attachment 841987)

== Double resolved promises:

There are 2 of these in there:

ERR: onerror reporting: Error: nope @ app://email.gaiamobile.org/js/mail_app.js : 110

These are from pre-0.0.5 versions of prim in alameda that would throw an Error('nope!') if you attempted to resolve/reject an already resolved/rejected promise. This occurred on the main page (rather than in the worker), and I don't see any use from our JS there. I notice from the screenshots that some seem to be using en-US and some are using Spanish/Portuguese, so I think this is probably a result of an l10n notification firing in such a way that it tries to re-trigger an already resolved promise. This is sorta harmless; we already have a number of bugs relating to not properly relocalizing on the fly, so the loss of some functionality there is not a huge deal.

== successful cronsync for other accounts (less frequent sync interval)

An IMAP, a POP3, and an ActiveSync (hotmail.com) successfully complete.

05-08 12:05:57.049 I/GeckoDump( 1423): LOG: cronsync-main: wake locks acquired: [object MozWakeLock] for account IDs: 2,1,3
05-08 12:05:57.059 I/GeckoDump( 1423): LOG: email oncronsyncstart: 2,1,3
05-08 12:05:59.949 I/GeckoDump( 1423): LOG: email oncronsyncstop: 2,1,3
05-08 12:05:59.969 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id2 1 3"

== failed account 0 cronsyncs

None of these ever complete, they just time out. It's not clear if they start at all; our ActiveSync protocol is not very chatty via console.log. It's very possible a mutex broke long before.

05-08 11:52:54.155 I/GeckoDump( 1423): LOG: cronsync-main: wake locks acquired: [object MozWakeLock] for account IDs: 0
05-08 11:52:54.165 I/GeckoDump( 1423): LOG: email oncronsyncstart: 0
05-08 11:53:39.165 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id0"
05-08 12:02:54.239 I/GeckoDump( 1423): LOG: cronsync-main: wake locks acquired: [object MozWakeLock] for account IDs: 0
05-08 12:02:54.279 I/GeckoDump( 1423): LOG: email oncronsyncstart: 0
05-08 12:02:54.379 I/Gecko ( 1423): WLOG: cronsync: received an syncEnsured via a message handler
05-08 12:12:54.379 I/GeckoDump( 1423): LOG: cronsync-main: wake locks acquired: [object MozWakeLock] for account IDs: 0
05-08 12:12:54.389 I/GeckoDump( 1423): LOG: email oncronsyncstart: 0
05-08 12:13:39.379 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id0"
05-08 12:22:54.509 I/GeckoDump( 1423): LOG: email oncronsyncstart: 0 (manual syncing for an IMAP account)
05-08 12:23:39.499 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id0"

== hotmail sync?

There is an indication of a successful ActiveSync sync in there...

05-08 12:23:53.209 I/Gecko ( 1423): WLOG: Sync completed: added 2, changed 0, deleted 0
05-08 12:23:53.209 I/Gecko ( 1423): WLOG: Sync Completed! 2 messages synced

But below that we see fetches that are clearly on account 2, so it's probably the hotmail account:

05-08 12:23:59.659 I/Gecko ( 1423): WLOG: queueOp 2 downloadBodies
05-08 12:23:59.659 I/Gecko ( 1423): WLOG: runOp(do: {"type":"downloadBodies","longtermId":"2/2","lifecycle":"do","localStatus":"done","serverStatus":"doing","tryCount":0,"humanOp":"downloadBodies","messages":[{"s)

== sync key failure

It's unclear which account is saying this:

05-08 12:17:26.559 I/Gecko ( 1423): WERR: Unable to get sync key for folder
05-08 12:17:26.559 I/Gecko ( 1423): WLOG: Sync Completed! undefined messages synced

The preceding context indicates that we did a lot of account/folder switching:

05-08 12:17:15.659 I/GeckoDump( 1423): LOG: Preloading cards: settings_main,account_picker

One possible interpretation of this is that a new folder was selected whose mutex was not broken. The implication is then that the server is either broken or otherwise responding in a way that we don't know how to understand. The "Unable to get sync key for folder" error is only generated if we perform a successful postCommand but the WBXML that comes back either fails to parse or does not contain a SyncKey.
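For anyone repeating this analysis on a fresh logcat, the start/stop pairing done by hand above can be approximated with a small script. This is only a sketch in Python: `summarize_cronsyncs` and `LINE_RE` are hypothetical names, and the year is an assumption because logcat timestamps carry none. The sample lines are copied from the excerpts above. A run counts as completed only if an `oncronsyncstop` for the same account IDs shows up before the wake locks are cleared.

```python
import re
from datetime import datetime

# Sample lines copied from the logcat excerpts quoted in this comment.
SAMPLE_LOG = '''\
05-08 11:52:54.165 I/GeckoDump( 1423): LOG: email oncronsyncstart: 0
05-08 11:53:39.165 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id0"
05-08 12:05:57.059 I/GeckoDump( 1423): LOG: email oncronsyncstart: 2,1,3
05-08 12:05:59.949 I/GeckoDump( 1423): LOG: email oncronsyncstop: 2,1,3
05-08 12:05:59.969 I/GeckoDump( 1423): LOG: email: clearing wake locks for "id2 1 3"
'''

# Matches only the three email-app cronsync messages seen in the log.
LINE_RE = re.compile(
    r'^(?P<ts>\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}) .*?LOG: email:? '
    r'(?P<event>oncronsyncstart|oncronsyncstop|clearing wake locks for)'
    r':? "?(?:id)?(?P<ids>[\d ,]+)'
)


def summarize_cronsyncs(log_text, year=2014):
    """Return (account_ids, seconds_until_locks_cleared, completed) per run.

    year is an assumption: logcat timestamps do not include one.
    """
    open_runs = {}  # account-id tuple -> [start time, saw oncronsyncstop?]
    runs = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year}-{m['ts']}", "%Y-%m-%d %H:%M:%S.%f")
        ids = tuple(sorted(int(n) for n in re.findall(r"\d+", m["ids"])))
        event = m["event"]
        if event == "oncronsyncstart":
            open_runs[ids] = [when, False]
        elif ids in open_runs:
            if event == "oncronsyncstop":
                open_runs[ids][1] = True
            else:  # wake locks cleared: the run is over either way
                start, completed = open_runs.pop(ids)
                runs.append((ids, (when - start).total_seconds(), completed))
    return runs
```

On the sample, the account 0 run is reported as not completed and the 2,1,3 run as completed. A start with no later "clearing wake locks" line (such as the 12:02:54 one above) is simply never reported by this sketch.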
Dear Andrew: After killing the email app and reopening it, the corporate email was synchronized, so it is not a direct server problem. When performing a manual sync, it always shows the loading message; I think it may be bug 839273. It is a Telefonica blocker, please help to fix.
blocking-b2g: --- → 1.3?
Flags: needinfo?(sync-1)
Vance - Can you find out if this is a cert blocker?
Flags: needinfo?(vchen)
Hi Jason, it's an IOT blocker.
Hi Jack - In order to investigate this bug further, we still need logs captured while the sync issue is happening; otherwise we have no idea why there is a sync problem. Could you help get the logs we need?

Thanks
Vance
Flags: needinfo?(vchen) → needinfo?(liuyongming)
We're going to need the logs to move forward, but we do already know we can block on this per comment 11.
blocking-b2g: 1.3? → 1.3+
Whiteboard: [cert]
Flags: in-moztrap-
Dears, we are still waiting local team to reproduce and catch logs, will upload logs asap when we got. Thanks.
Flags: needinfo?(liuyongming)
(In reply to Jack Liu from comment #14)
> Dears, we are still waiting local team to reproduce and catch logs, will
> upload logs asap when we got.
>
> Thanks.

Any progress on getting the logs? Noticed this bug hasn't moved in a couple days. Thanks
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Target Milestone: --- → 2.0 S3 (6june)
Target Milestone: 2.0 S3 (6june) → 2.0 S4 (20june)
Without logs I think this reduces to a combination of the following:

- root cause: bug 825538 about us getting confused if multiple devices are using the same id.
- source of failure: bug 1009422 about us dying if we fail to download a body part, probably because it doesn't exist because of the former bug.
- bug 1018828 about us not having a cronsync failsafe mode to recover from problems like the second problem.

Please keep in mind that this is still somewhat guesswork without logs. I am propagating 1.3? across all of these bugs. However, it's worth noting that the bug 825538 root cause is likely to only occur in QA testing situations, at least if we assume most people will own at most one FxOS device.
Assignee: bugmail → nobody
Status: ASSIGNED → NEW
Depends on: 825538, 1009422, 1018828
Target Milestone: 2.0 S4 (20june) → ---
(In reply to Andrew Sutherland [:asuth] (important r/f/ni reqs only thru Jun 30) from comment #16)
> - bug 1018828 about us not having a cronsync failsafe mode to recover from
> problems like the second problem.

cut-and-paste fail: I meant bug 1025727

Corrected:

- root cause: bug 825538 about us getting confused if multiple devices are using the same id.
- source of failure: bug 1009422 about us dying if we fail to download a body part, probably because it doesn't exist because of the former bug.
- bug 1025727 about us not having a cronsync failsafe mode to recover from problems like the second problem.
Depends on: 1025727
No longer depends on: 1018828
Assigning to James who is going to continue communication on this bug while asuth is on PTO. Thanks James!
Assignee: nobody → jrburke
Marking this as a meta bug -- per comment #17 we expect the dependent bugs to address this issue.
Keywords: meta
Target Milestone: --- → 2.0 S5 (4july)
All dependencies are fixed, so closing this out.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED