UI race / Undesired trashing or deletion of messages

RESOLVED FIXED in Thunderbird 3.3a3

Status

defect
--
critical
RESOLVED FIXED
9 years ago
8 years ago

People

(Reporter: mozillabugs, Assigned: Bienvenu)

Tracking

(Depends on 1 bug)

unspecified
Thunderbird 3.3a3
x86_64
macOS
Dependency tree / graph
Bug Flags:
in-testsuite +

Thunderbird Tracking Flags

(thunderbird3.1 .10-fixed)

Details

Attachments

(2 attachments)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.14pre) Gecko/20101201 Lanikai/3.1.8pre

This is using a google IMAP account over SSL.  This has been happening on 2 machines over multiple OS and Thunderbird installs, folder rebuilds, etc.

Reproducible: Sometimes

Steps to Reproduce:
1. Delete a few messages in fairly rapid succession (usually a second apart or less), using the Delete key in the message list pane.
2. Move selection to a different message before Thunderbird is finished resolving the corresponding IMAP transactions, or as happened today, click the close button on an open message tab.
Actual Results:  
Thunderbird trashes or even deletes (which was _not_ the original action) some additional messages without being asked to.  And yes, I keep count of how many times I press Delete, so I'm getting n+1 deletions.

Expected Results:  
My mail is where I left it, unmolested.

A couple of other vaguely-related bits of weird UI and IMAP behaviour which -may- point to architectural issues behind this: (I'll file these as their own bugs eventually but for now they're quite possibly relevant to the overwhelmingly more important issue here)

- Nondeterministic UI behaviour:  Quite frequently, closing a message tab, then quickly using cmd-1 to get back to the Inbox (or otherwise), will result in what appears to be the message preview pane appearing below the message list (I normally have it turned off), but it is NOT checked in the View > Layout menu.  Additionally, if I then go and turn it on, it changes appearance somewhat (thus "what appears to be").

- Weird, probably overly-complicated management of folder actions:  Selecting a group of messages and pressing 'm' to toggle their read/unread status will result in them flipping back and forth to read/unread status a couple of times (I believe it's usually 2-3 transitions total, when one would expect one transition from a sane piece of software).

- Often, using cmd-z to move a message back from the trash will result in the message coming back, and then disappearing a few seconds later.  I don't do this often enough to remember whether it goes back to the trash consistently or sometimes ends up deleted.
Reproduced with:  Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7

Transaction starts around L107; Note that while the transaction specifies a list of 2 messages, the command issues specifies a RANGE of 3.  Fail.  The failure seems to be related to the fact that, IMMEDIATELY after the delete was performed via the inbox GUI, I hit Enter to open message 37623 -- this is a common feature of these failures... changing what's selected or opening messages or whatever; if I wait patiently until the operation is completed and then do something else, this doesn't tend to happen.
thx for the log - unfortunately, it's not complete, so I can't really tell what the state of your mailbox is. We issue the range because we don't think there is a 37623 message; my guess is that's because of the expunge, but without getting the sequence numbers from the initial fetch 1:*, that's hard to tell. If you want to send me a complete log privately (you can edit out message contents), that would be helpful.
Whiteboard: [pending new protocol log and analysis]
Sent complete log to David on request.  If anyone else wants to go over it please e-mail me and I will send it; I would rather not post the unredacted log for all to see.
Whiteboard: [pending new protocol log and analysis] → [pending protocol log analysis by bienvenu]
confirming - there's definitely some confusion going on. Somehow, the log shows that a couple different delete urls are being run, one that deletes messages 37622 and 37624, and one that just deletes 37622. The latter gets run first, and then the imap server issues an unsolicited expunge (which is very odd), so that 37622 is gone from flag state. Then, the url that tries to delete 37622 and 37624 runs, and we convert that into a range of 37622:37624, instead of just 37624. I'm hoping that the code that converts a sequence of uids into a range is just confused when the start of the range is missing.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [pending protocol log analysis by bienvenu] → code needs investigation
I added a test to TestImapFlagAndUidState.cpp that attempted to duplicate this issue and it passed, so I have't quite nailed down what's going wrong here.
I doubt it's that simple -- per the title of the bug this is triggered by WHAT else I do in the mailbox UI and WHEN I do it.  If I simply delete message(s) and then wait until everything is calm it never (or rarely?  can't be certain here) happens.  However if I start using arrow keys to change what's selected, or open another message, or whatever, THEN things go south.
running other urls can change the time we run the delete urls. For example, in the log you sent, we're running a url to mark a message read between the first and second deletes. And since we do deletes as offline operations, followed by online playback .5 seconds later, and we only run one url at a time on a given folder, all sorts of strange ordering can happen.
Does turning off auto expunge in your gmail settings make this problem go away?
Just turned it off now, so I'll see what that does.  I've formed habits to avoid this happening, so I'll see if I can deliberately be a bit "reckless" around some messages I don't care about and see if I can get anything to happen.  Might be a while, and perhaps quite a long while before I can consider responding in the negative.
Turns out that the test I added was running in "condstore" mode, and once I flipped it to be normal mode, the bug occurred.  So my suspicion was right : the code that converts a sequence of uids into a range is confused when the start of the range is missing. So I should be able to come up with a fix fairly quickly.
this patch is against 3.1 branch, because the cpp unit tests don't work on the trunk. But the patch should apply fine. We'll want this fix for 3.1.x anyway...
Assignee: nobody → bienvenu
Attachment #510815 - Flags: review?(bugzilla)
Comment on attachment 510815 [details] [diff] [review]
proposed fix with unit test

>+          if (!foundIt)
>+          {
>+            NS_WARNING("flag state missing key");
>+            // The start of this sequence is missing from flag state, so move
>+            // on to the next key.
>+            curFlagStateIndex = -1;
>+            curSequenceEnd = startSequence = nextKey;
>+            continue;

Is it really still a big concern if the flag state is missing a key? I'm just wondering why we need to keep the warning.

>+  // This tests the middle message missing from the flag state.
>+  struct msgState msgState4[] = {
>+    {10, kImapMsgSeenFlag, 0},
>+    {69, kImapMsgSeenFlag, 1},
>+    {70, kImapMsgSeenFlag, 2},
>+    {71, kImapMsgSeenFlag, 3},
>+    {73, kImapMsgSeenFlag, 3}};

Shouldn't 73 have an index of 4?

>+  flagState->ExpungeByIndex(4);
>+  PRUint32 msgUids2[] = {69,71,73};
...
>+  if (!uidString2.EqualsLiteral("69,73"))

The 73 must overwrite the 71 which is why this still "works". I tried changing that index to 4 locally, and it still passed so I guess the code is fine.
Attachment #510815 - Flags: review?(bugzilla) → review+
I'm going to leave the warning in for now - it's not a big concern, but it's still a bit out of the ordinary.
Comment on attachment 510815 [details] [diff] [review]
proposed fix with unit test

requesting approval for 3.1.9 - safe, has a unit test, and fixes a rather bad edge case.
Attachment #510815 - Flags: approval-thunderbird3.1.9?
fixed on trunk - http://hg.mozilla.org/comm-central/rev/d8ee7cd9a11c
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: code needs investigation
marked in testsuite, but really need bug 632429 to be fixed for it to be in testsuite.
Depends on: 632429
Target Milestone: --- → Thunderbird 3.3a3
Attachment #510815 - Flags: approval-thunderbird3.1.10? → approval-thunderbird3.1.10+
You need to log in before you can comment on or make changes to this bug.