Closed Bug 839273 Opened 11 years ago Closed 9 years ago

[email] offline moves can break sync of target folder

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:+, b2g-v2.1 affected, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S4 (23jan)
tracking-b2g +
Tracking Status
b2g-v2.1 --- affected
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: nhirata, Assigned: mcav)

References

Details

(Keywords: regression, verifyme, Whiteboard: [priority][p=8])

Attachments

(7 files, 1 obsolete file)

Attached file logcat
## Environment :
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/4518b78583d0
Gaia   7e54ca673277b20b1d91d18477dc44d6ad226761
BuildID 20130207070202
Version 18.0
Unagi
  
## Repro :
1. launch email 
2. set up gmail account
3. open an email and mark it then move the file to starred
4. go to the starred directory
5. go back to inbox

## Expected :loading messages will finish

## Actual :stuck at the loading messages

## Note :
1. regression from bug 822327
2. killing email and then going back won't fix it
3. rebooting b2g will fix it.
Sorry for the extra churn in bug 822327.
This is actually disconcertingly similar to bug 822327!  Many thanks for the log; this would be impossible to diagnose without it.

Log indicates:
- E-mail app gets started up, 4 ops are loaded from disk/run: move 'x', star 'y', move 'y', star 'y'.  restored 'op' had only the "move 'x'" having been locally applied; its 'check' operation ran.  The rest were local ops being run.  The implication is that the mutex for the folder was being held for a while.
- IMAP to gmail spun up for sync, sync in progress
- The 2nd star 'y' fails, local_do_modtags fails because the header for 'y' is null.
- sync in 2nd folder apparently started while 1st sync is ongoing
- 2nd IMAP connection to gmail spun up
- IMAP error where we try and do something with a message without specifying a UID.

The failure of "star 'y'" makes complete sense since the local operation for "move 'y'" will have moved the header away from the original location.  Because we already maintain renaming/move maps for operations, we probably should just be transforming id's using the same logic we apply for server operation purposes.  The fundamental problem here is that local operations were assuming they would run to completion faster than the user could do anything, but this is not true given how long we can end up holding mutexes.

The IMAP UID error is indicative of a move (perhaps move 'x') having speculatively moved a header to the target folder without having run the server operation to completion.  This is problem and solution #1 as described here that have not yet been implemented; this should almost be its own bug except the two bugs are going to correlate very highly:
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/data/lib/mailapi/imap/jobs.js#L445
See Also: → 822327
Summary: Mark an email, place it in starred, go to starred, go back to inbox and you will load messages forever → [email] operations performed on a message in the same folder after it is moved but before the UI updates may break sync
Attached file logcat 2
I think I found another case of this, or it may be the same:
1) setup gmail
2) move an email in the inbox to the inbox
3) switch folders
4) switch back to the inbox

Result: keep getting the loading messages (waited 5 minutes)

Second logcat attached.
blocking-b2g: leo? → leo+
tracking-b2g18: ? → ---
Taken and studying ;)
Assignee: nobody → alive
(In reply to Alive Kuo [:alive] (PTO during 3/27~4/7) from comment #4)
> Taken and studying ;)

Cool!  We've created the #gelam channel (gaia-email-libs-and-more) on IRC as a place for those hacking on the email back-end to discuss technical issues, ask questions, etc.  Please feel free to join!
Status update:
I read some mailapi.js code today, but I don't have a clue how to do the fix yet.
Based on that I currently have a more emergent thing to do(audio channel API change, the LOE is still unknown now. could be tef+ soon), I am ni? from yuren to help on this issue.

I will still monitor bugs on email and try to take while I am |tef+|-bug-free, email app/calendar app is the most complex+interesting app and thus I am thirsty to understand them ;)
Flags: needinfo?(yurenju.mozilla)
not sure I have enough time to take it, I still study how to run & write unit test for email library and I have some pending tasks for build system...
Flags: needinfo?(yurenju.mozilla)
Stealing... this bug effects my lazy body/snippet work (and I think I have a quick fix)
Assignee: alive → jlal
Comment on attachment 728826 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/8787

r+ by asuth on GELAM
Attachment #728826 - Flags: review+
Comment on attachment 728826 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/8787

After doing my device testing, this does not fix the entire problem... it looks like there are two bugs and while my patch will likely fix some other errors (or race conditions) it does not fix _this_ bug... I will make a new bug that blocks this to land that work.
Attachment #728826 - Flags: review+
Depends on: 854292
back to nobody- I fixed part of this but not the whole thing.
Assignee: jlal → nobody
(In reply to James Lal [:lightsofapollo] from comment #11)
> Comment on attachment 728826 [details]
> Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/8787
> 
> After doing my device testing, this does not fix the entire problem... it
> looks like there are two bugs and while my patch will likely fix some other
> errors (or race conditions) it does not fix _this_ bug... I will make a new
> bug that blocks this to land that work.

Sending back to Alive who returns from PTO, Monday, 2013.04.08.
Assignee: nobody → alive
Deassign self cuz I am being told to release some blocker.
Assignee: alive → nobody
Whiteboard: c=
Is this still occurring?
Keywords: qawanted
Whiteboard: c= → c= LOE:4d
Whiteboard: c= LOE:4d → c= LOE:1w
Unable to reproduce on Inari Build ID: 20130624070215

Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/9c62297d11b0
Gaia: 93241eb6c5d6c110710fad8da3ccd4423312b0c9
Platform Version: 18.0

Also unable to reproduce on Unagi Build ID: 20130624070224

Environmental Variables
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/62049a1c5d36
Gaia: 885b874029d8143035765ae903c147a9415fcc52
Platform Version: 18.1
Hi James,  This seems to work for me as well:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/62049a1c5d36
Gaia   885b874029d8143035765ae903c147a9415fcc52
BuildID 20130624070224
Version 18.0

If this works for you, should we close it off as WFM?
Flags: needinfo?(jlal)
We haven't actually addressed any of the problems in comment 2.  Bug 822327 was fixed which addresses a lot of the surface area for breakage here, but the STR from comment 0 and comment 3 do not involve putting the device offline and are inherently race-prone depending on the network connection.  Some of the problems cannot be manually tested easily because it depends on the back-end falling behind on executing 'local' operations because of a long-running network task or something else holding the mutex for longer than it should.

The problems that still will exist are:

1) The loss of user actions.  If the user does: [move 'y', star/flag 'y'] where the star happens on the message in its origin folder, the star operation will never take place.  Specifically, we will go to run the operation, fail to find the database header, but because of bug 822327 we won't die.

2) Breakage of synchronization of the moved-to folder by the presence of a speculatively moved offline header by the 'local' move operation.  The breakage may be temporary depending on whether the e-mail app was closed (and the database saved) or not and whether the 'local' operation ran to completion or not.

If we restore the operation from disk, then because we punted on completing our implementation of the move operation in its entirety, the operation will be canceled, leaving the offline header in place and breaking us.


Problem '1' cannot be adequately tested manually.

The steps to reproduce for '2' are:
- Set device offline.
- Move a message from one folder to another.
- Switch to the target folder and verify the message to moved to the target folder.
- Exit the e-mail app.
- Go online.
- Enter the target folder.
- Witness that sync breaks in some fashion.

We can spin these off into distinct bugs that aren't this bug if we want.
Thanks for clarifying the issue.  I was able to reproduce with these steps.
Clearing qawanted and needinfo based on comment 18 and comment 19
Flags: needinfo?(jlal)
Keywords: qawanted
Passing to Dylan to help find an assignee here.
Assignee: nobody → doliver
Summary: [email] operations performed on a message in the same folder after it is moved but before the UI updates may break sync → [email] offline moves can break sync of target folder
Not a regression, no partner blocker, no external feedback, no recent repros internally. Let's fix for 1.2 or later.

Will reconsider if any of that changes.
blocking-b2g: leo+ → -
Target Milestone: --- → 1.2 FC (16sep)
Assignee: doliver → nobody
Tagging this bug to put it in the productivity backlog
Target Milestone: 1.2 FC (16sep) → ---
blocking-b2g: - → backlog
Whiteboard: c= LOE:1w → c= LOE:1w [priority][p=8]
Whiteboard: c= LOE:1w [priority][p=8] → [priority][p=8]
[priority] --> tracking-b2g:+ conversion
tracking-b2g: --- → +
Assignee: nobody → m
This patch resolves the problem as seen when following the steps in Comment 18. As this bugzilla bug has tracked a number of different issues, let me know if there's another facet of this that I've missed.

Using an outdated local B2G build, the tests pass locally.
Attachment #728826 - Attachment is obsolete: true
Attachment #8550393 - Flags: review?(bugmail)
Comment on attachment 8550393 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/357

Looks great, thanks!  It will be great to have this on v2.2!
Attachment #8550393 - Flags: review?(bugmail) → review+
[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): bug 822327 why not
[User impact] if declined: Possible to temporarily or permanently break the sync of a target folder of a move/copy/deletion.  Permanence depending on various factors.
[Testing completed]: High quality unit tests and reviews and a quest for quality that won't quit.
[Risk to taking this patch] (and alternatives if risky):  Low risk.  We're filtering out an argument that we know makes servers angry and breaks sync.
[String changes made]: No string changes made.
Attachment #8552146 - Flags: review+
Attachment #8552146 - Flags: approval-gaia-v2.2?
Attachment #8552146 - Flags: approval-gaia-v2.2? → approval-gaia-v2.2+
Keywords: verifyme
Issue still occurs on Flame 2.2 and Flame 3.0

Following STR from comment 18, disabling cell data and wifi, then moving an email message from inbox to any other folder shows that the email successfully moved. The message appears in the folder. But when the email app is closed, data is turned back on and the app re-opened, the email still shows in the folder on device, but remains in inbox when account is checked on computer.

Email message movement made when phone is offline are never synced. Moving emails to different folders with data connection enabled syncs with email server correctly. Multiple emails moved to multiple different folders were tested in both Gmail and Outlook accounts.

Device: Flame 2.2
BuildID: 20150122002808
Gaia: 237008137f6d72b9cad25ff4faff14ff2c40ac63
Gecko: 4a90da67661e
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 37.0a2 (2.2)
Firmware: V18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Device: Flame 3.0
Build ID: 20150122010203
Gaia: 917b6c36717fddc6e71ffc1ec249633c8044c93c
Gecko: 34e2d2bd7ec4
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage?][failed-verification]
(In reply to Brogan Zumwalt [:BroganZ] from comment #32)
> Issue still occurs on Flame 2.2 and Flame 3.0

Any chance you have the logcats from these around?  And ideally the timestamp of when you checked the online account, including any refresh, as well as confirmation of the client you used to verify.  And for gmail, we definitely want to know the folders in play since gmail's folder semantics are atypical.
Flags: needinfo?(bzumwalt)
QA Whiteboard: [QAnalyst-Triage?][failed-verification] → [QAnalyst-Triage+][failed-verification]
Flags: needinfo?(ktucker)
Attached file b2g38_logcat
Tested again today on Flame 3.0 with same results. Attached logcat from moving email message to junk folder in gmail (b2g38_logcat)

Timestamps and environmental variables below:

Device: Flame 3.0 Master
Build ID: 20150123010227
Gaia: cba2f0bf49b882e0044c3cc583de8fcf83d2ffa4
Gecko: 494632b9afed
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0


Gmail Junk Folder Sync On Device: 9:48:38
Gmail Junk Folder Check On Computer: 9:49:10

Gmail New Custom Folder Sync On Device: 9:50:50
Gmail New Custom Folder Check On Computer: 9:51:30

Outlook Junk Folder Sync On Device: 9:55:56
Outlook Junk Folder Check On Computer: 9:56:21

Outlook New Custom Folder Sync On Device: 9:58:11
Outlook New Custom Folder Check On Computer: 9:58:42

Client Used for Computer Check: Google Chrome Version 40.0.2214.91
Flags: needinfo?(bzumwalt) → needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage+][failed-verification] → [QAnalyst-Triage?][failed-verification]
QA Whiteboard: [QAnalyst-Triage?][failed-verification] → [QAnalyst-Triage+][failed-verification]
Flags: needinfo?(ktucker)
It looks like the server op never gets run, even though we thought we were online. The promise thing looks scary but seems maybe unrelated.
Ah, that's bug 804904.  We're running a "check" operation which is hardcoded to declare the operation moot.

Because we'll now never purge the speculative local header, we probably reallllly want the "check" part for v2.2.  (Undo is not needed currently.)  Also, given the availability of the IMAP MOVE operation, it might make sense to use that when available as part of the same thing.  Or not.

The summary for verification is that the STR for 18 are sufficient to previously reproduce the specific problem but are not suitable for verification of this specific fix.
Hi Reporter and Andrew,
   On Flame 2.2,according to Comment 18,We can't make sure that email is moved to other folder.It will prompt "No mail in this Folder" when we check it after moving one email to it. Can you help with this? Thanks!

Attachments:logcat_0105.txt,0105.mp4
Time:01:05

Note:
If skip STR3 in Comment 18,the sync action will be finished and prompts "No mail in this Folder".

Flame 2.2 build:

Gaia-Rev        80d5b797fd0497a7e3337b7798a21b2e1219681a
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/01bf1516a65b
Build-ID        20150127002504
Version         37.0a2
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20150127.035842
FW-Date         Tue Jan 27 03:58:52 EST 2015
Bootloader      L1TC000118D0
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(bugmail)
Attached video 0105.MP4
Attached file logcat_0105.txt
Looks like your video is corrupt.  Did you have email in the other folders?
Flags: needinfo?(nhirata.bugzilla) → needinfo?(liuyong)
Hi Naoki, 
   There have emails in other folders, but there hasn't any email in the "Unread" fold. I will try again in next week, Today our network is weak to verify it.
Flags: needinfo?(liuyong)
Flags: needinfo?(bugmail)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: