Closed Bug 1929105 (imap-corruption-self-healing) Opened 10 months ago Closed 9 months ago

IMAP code should handle local storage read errors by marking message not offline.

Categories

(MailNews Core :: General, defect)

defect

Tracking

(thunderbird_esr128? fixed, thunderbird133 verified)

VERIFIED FIXED
134 Branch
Tracking Status
thunderbird_esr128 ? fixed
thunderbird133 --- verified

People

(Reporter: benc, Assigned: benc)

References

Details

(Whiteboard: ===== Note for uplifting to comm-esr128: ===== In function _readFromCacheStream the parameter name is different (it's still called "stream", not cacheStream). It's easiest to keep that old parameter name. Manually edit the file after merging to add second…)

Attachments

(7 files, 5 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
16.41 KB, patch
Details | Diff | Splinter Review
7.16 KB, patch
Details | Diff | Splinter Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
1.07 KB, patch
Details | Diff | Splinter Review

IMAP uses local storage (mbox, maildir) as a sort of cache, and will use the local copy in preference to streaming the message down from the server. Which is what you'd expect.

But it doesn't seem that this local storage access does much error handling.

If the IMAP code encounters an error when reading a message from local storage, probably the correct response is to invalidate the local copy of the message: clear .storeToken, set .offlineMessageSize to 0 and clear the Offline flag.
Ideally it'd also retry the operation, this time streaming the message from the server instead (and adding a new local copy to local storage, if appropriate).

The upshot is that if it fails to get a message from local store, it should automatically get it from the server instead, essentially performing an incremental "folder repair" on any borked messages.

Same principle applies to other server-based folder types (eg ews)

See Also: → 1926832
Assignee: nobody → benc
Attachment #9436777 - Attachment description: WIP: Bug 1929105 - Respond to IMAP offline message read errors by discarding local copy. → Bug 1929105 - Respond to IMAP offline message read errors by discarding local copy. r=#thunderbird-reviewers

The easiest way to test this is probably:

  1. set up an IMAP account with a few messages in it
  2. exit TB
  3. load the INBOX file into a text editor
  4. insert a few characters into a message and save back to INBOX
  5. run TB again, and click through the messages to view them

step 4 should have the effect of changing the offsets of all the messages following the one you edited. This means the DB storeTokens for those will now be wrong.
Those messages should now fail to load from the local store, because the message no longer begins exactly with "From ".
The first time you click on such a message, the message view should just be blank. This is the read failing, and the local copy should be discarded as a result.
If you click to another message then back to the first one, this time it should a) download from the server b) show up in the message display c) be added to the end of the mbox as a new (working!) local copy.

Things to note:
Auto-discarding local messages like this does not contribute to the expunge counter (as we don't really have any certainty about the size of a damaged message), so an immediate compact will bail without doing anything. No big deal - once some (working) messages have been deleted, the next compact will clean things up including the damaged message we discarded, leaving the new, good copy only.

You can also trigger the message-too-big sanity check by inserting a bunch of extra lines into a message. The read error will trigger when 10% more bytes are read than the .offlineMessageSize was indicating. Again, the error should be caught (the display will stop at the point the error triggered rather than being blank), and the message will be discarded, so next attempt to access should result in it being re-downloaded.

Alternatively, you can programatically screw up .storetoken values on msgHdrs.

Whiteboard: Note for uplifting to comm-esr128: In function _readFromCacheStream the parameter name is different (it's still called "stream", not cacheStream). It's easiest to keep that old parameter name. Manually edit the file after merging to add second parameter o…
Whiteboard: Note for uplifting to comm-esr128: In function _readFromCacheStream the parameter name is different (it's still called "stream" → ===== Note for uplifting to comm-esr128: ===== In function _readFromCacheStream the parameter name is different (it's still called "stream"

I've applied the patch to comm-esr128 locally to my local build.

Using a corrupted IMAP folder (end markers not being found, some junk displayed at end of message),
the self healing works for me in the following way,

When clicking corrupted message A, it's still displayed with the junk at the end.
Then I click another message B.
Then I go back and again click message A.
Now A is displayed correctly.

(In reply to Kai Engert [:KaiE:] from comment #3)

When clicking corrupted message A, it's still displayed with the junk at the end.

A bit unsightly, but not too much we can do there - the error doesn't trigger until the docshell has already received (and displayed) most of the message. We don't have any control over what the docshell does at that point.
Ideally, I'd like the docshell to handle the error by retrying the request, so the new copy would immediately be downloaded, but hey.
(I'm sure there's some way to configure an error handler in the docshell which could do this, but I'm not sure it's worth the bother - in any case you wouldn't want more than one retry, as the error could be due to some reason which caused the request to always fail - eg bad disk, bad network, server down, whatever).

Attached patch esr128-part1-mark-offline.patch (obsolete) — Splinter Review

Ben's patch merged to esr128

Attachment #9437282 - Attachment is obsolete: true
Attached patch esr128-part2-reload.patch (obsolete) — Splinter Review
Attachment #9437246 - Attachment is obsolete: true
Attachment #9437281 - Attachment is obsolete: true
Attachment #9437283 - Attachment is obsolete: true

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/533bf3e32420
Respond to IMAP offline message read errors by discarding local copy. r=kaie
https://hg.mozilla.org/comm-central/rev/574f520e017d
Automatically reload the displayed email message if reading an offline copy failed. r=aleca,BenC,KaiE

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 134 Branch
Regressions: 1931324
Regressions: 1931325

This needs follow-up patches to fix the tests.

I'll look at the simple gtest failure.

The second failures is
https://treeherder.mozilla.org/logviewer?job_id=482633074&repo=comm-central&lineNumber=15147
which is related to new test
mail/base/test/browser/browser_reloadBadMessage.js

Apparently that test triggers loading of URIs that "not same document load",
and is rejected by gecko's internal checks.

To understand this better I'd need to find out what kind of URIs are being loaded here.

Status: RESOLVED → REOPENED
Flags: needinfo?(geoff)
Resolution: FIXED → ---

imap://user@localhost:34065/fetch%3EUID%3E%5EINBOX%3E1
imap://user@localhost:34065/fetch%3EUID%3E%5EINBOX%3E2
imap://user@localhost:34065/fetch%3EUID%3E%5EINBOX%3E3
imap://user@localhost:34065/fetch%3EUID%3E%5EINBOX%3E4

Attachment #9437693 - Attachment is obsolete: true

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/470a95a50c32
Fix empty-mbox gtests in TestMsgMboxRead. r=kaie

Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → FIXED
Flags: needinfo?(geoff)

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/db474dce6ae1
follow-up - Fix test tripping over horrible IMAP code. r=tobyp

Duplicate of this bug: 1931325

We need to reopen, because something isn't right.

I'm investigating regression bug 1931324. It fails, because it is unable to clone an URI. That shouldn't happen.

I've tried to debug why the clone is failing, but when I'm slowly stepping through the code, I don't get the failure. Only if I set a breakpoint on
/home/user/moz/comm-esr128/obj-thunder-debug/dist/include/nsIURIMutator.h:49
I run into the NS_FAILED condition.

  [[nodiscard]] nsresult InitFromURI(T* aURI)
  {
    nsCOMPtr<nsIURI> clone;
    nsresult rv = aURI->Clone(getter_AddRefs(clone));
->  if (NS_FAILED(rv)) {
      return rv;
    }
    mURI = static_cast<T*>(clone.get());
    return NS_OK;
  }

Could there be an unsafe multithreaded race?

I see that the fix for this bug adds checked that we're running on the main thread.
So maybe there isn't yet sufficient protection.

Status: RESOLVED → REOPENED
Flags: needinfo?(benc)
Resolution: FIXED → ---
Regressions: 1931508

Please disregard my previous comment 21.

I wasn't cc'ed on bug 1931324.
That's why I didn't see the comments over there, that the bug was fixed.
I unnecessarily spent time today trying to investigate it...

With that, a remaining regression is bug 1931508.

Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → FIXED

Comment on attachment 9436777 [details]
Bug 1929105 - Respond to IMAP offline message read errors by discarding local copy. r=#thunderbird-reviewers

[Approval Request Comment]
This patch has a bunch of follow-ups to fix tests and edge cases, including in two other bugs. I did a try run of it on beta that you can copy. I don't think the order is important ultimately, but it might be.

Regression caused by (bug #):
User impact if declined: messages that are corrupted by folder compact remain that way unless the user intervenes
Testing completed (on c-c, etc.): landed late last week
Risk to taking this patch (and alternatives if risky): there's some risk, that's why I want it on beta for as long as possible before going to ESR.

Attachment #9436777 - Flags: approval-comm-beta?
Attachment #9437615 - Flags: approval-comm-beta?
Attachment #9437805 - Flags: approval-comm-beta?
Attachment #9437843 - Flags: approval-comm-beta?

Comment on attachment 9436777 [details]
Bug 1929105 - Respond to IMAP offline message read errors by discarding local copy. r=#thunderbird-reviewers

[Triage Comment]
Approved for beta

Attachment #9436777 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9437615 [details]
Bug 1929105 - Automatically reload the displayed email message if reading an offline copy failed. r=aleca,BenC,KaiE

[Triage Comment]
Approved for beta

Attachment #9437615 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9437805 [details]
Bug 1929105 - Fix empty-mbox gtests in TestMsgMboxRead. r=#thunderbird-reviewers

[Triage Comment]
Approved for beta

Attachment #9437805 - Flags: approval-comm-beta? → approval-comm-beta+

Comment on attachment 9437843 [details]
Bug 1929105 follow-up - Fix test tripping over horrible IMAP code. r=#thunderbird-reviewers

[Triage Comment]
Approved for beta

Attachment #9437843 - Flags: approval-comm-beta? → approval-comm-beta+
Alias: imap-corruption-self-healing
Flags: needinfo?(benc)

Confirming this issue as verified fixed using 133.0b5-build2(20241118140457) with macOS 14, Ubuntu 24 and WIndows 11 using the STR from comment 2.

Status: RESOLVED → VERIFIED

Ben, could you please request this for uplift to 128.5.0?

Flags: needinfo?(benc)

Comment on attachment 9437668 [details] [diff] [review]
1929105-128-discard.patch

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is part of the "self-healing" change to automatically detect corrupted messages and re-download them from IMAP.
  • User impact if declined: Users will be left with bad messages unless they perform an IMAP folder repair. NOTE: this also needs the patch in Bug 1931508 to be uplifted
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
Flags: needinfo?(benc)
Attachment #9437668 - Flags: approval-mozilla-esr128?
Attachment #9437669 - Flags: approval-mozilla-esr128?
Attachment #9437668 - Flags: approval-mozilla-esr128?
Attachment #9437669 - Flags: approval-mozilla-esr128?

Comment on attachment 9437668 [details] [diff] [review]
1929105-128-discard.patch

[Approval Request Comment]
Regression caused by (bug #):
User impact if declined: Users will be left with bad messages unless they perform an IMAP folder repair. NOTE: this also needs the patch in Bug 1931508 to be uplifted
Testing completed (on c-c, etc.): c-c
Risk to taking this patch (and alternatives if risky): lowish

Attachment #9437668 - Flags: approval-comm-esr128?

Comment on attachment 9437669 [details] [diff] [review]
1929105-128-reload.patch

[Approval Request Comment]
Regression caused by (bug #):
User impact if declined: Users will be left looking at broken messages that should now be working.
Testing completed (on c-c, etc.): c-c
Risk to taking this patch (and alternatives if risky): low

Attachment #9437669 - Flags: approval-comm-esr128?

Comment on attachment 9437805 [details]
Bug 1929105 - Fix empty-mbox gtests in TestMsgMboxRead. r=#thunderbird-reviewers

A test-only tweak to handle the new logic.
Regression caused by (bug #):
User impact if declined: none
Testing completed (on c-c, etc.): c-c
Risk to taking this patch (and alternatives if risky): low

Attachment #9437805 - Flags: approval-comm-esr128?

Comment on attachment 9437843 [details]
Bug 1929105 follow-up - Fix test tripping over horrible IMAP code. r=#thunderbird-reviewers

A test-only tweak to handle the new logic.

Regression caused by (bug #):
User impact if declined: none
Testing completed (on c-c, etc.): c-c
Risk to taking this patch (and alternatives if risky): low

Attachment #9437843 - Flags: approval-comm-esr128?

[Approval Request Comment]
See comment 36. ESR needed a slightly different version of this patch to avoid test failure.

Attachment #9438665 - Flags: approval-comm-esr128?
Attachment #9437843 - Flags: approval-comm-esr128?

I've done a try run of the patches in this bug and related bugs. There were some slight changes (as mentioned in the whiteboard of this bug) and the patch I've just added instead of the one that landed on central.

Comment on attachment 9437668 [details] [diff] [review]
1929105-128-discard.patch

[Triage Comment]
Approved for esr128

Attachment #9437668 - Flags: approval-comm-esr128? → approval-comm-esr128+

Comment on attachment 9437669 [details] [diff] [review]
1929105-128-reload.patch

[Triage Comment]
Approved for esr128

Attachment #9437669 - Flags: approval-comm-esr128? → approval-comm-esr128+

Comment on attachment 9437805 [details]
Bug 1929105 - Fix empty-mbox gtests in TestMsgMboxRead. r=#thunderbird-reviewers

[Triage Comment]
Approved for esr128

Attachment #9437805 - Flags: approval-comm-esr128? → approval-comm-esr128+

Comment on attachment 9438665 [details] [diff] [review]
1929105-test-imap-cleanup.diff

[Triage Comment]
Approved for esr128

Attachment #9438665 - Flags: approval-comm-esr128? → approval-comm-esr128+
Blocks: 1935109
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: