Closed Bug 1062728 Opened 10 years ago Closed 10 years ago

[email/pop3] POP3 sync breaks on messages that lack a Date header

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.0M fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)

VERIFIED FIXED
2.1 S5 (26sep)
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: sharaf.tir, Assigned: asuth, NeedInfo)

Details

(Whiteboard: [LibGLA, TD-92490, QE1, B])

Attachments

(4 files)

With a particular POP3 account in naver.com syncing is failing.

Its happening with only one account.
Account details I can share through mail.

Account configuration:

POP3 Server: pop.naver.com port 995
SMTP Server: smtp.naver.com port SSL port 465
Whiteboard: [LibGLA, TD-92490, QE1, B]
Is it possible to provide a logcat from the failing sync and information on what version of Firefox OS is in use?  Please see https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo for information on how to provide a logcat and version info.
Flags: needinfo?(sharaf.tir)
Attached file Adb log
Please find the adb log when the issue is reproduced

FFOS Veriosn: 73a7acea
Flags: needinfo?(sharaf.tir)
Thanks very much for the log!  It looks like one of your messages lacks a Date header and this is causing sync to break.  The work-around until we fix this bug would be for you to use a desktop-class POP3 client to locate the message that's missing a date header and to permanently delete it from your account.  If naver.com has a webmail interface, you may also be able to do it that way, although the webmail UI might not make it obvious which is the bad message.

It is conceivable, depending on the message ordering, that this problem will solve itself once more messages are added to your inbox assuming they don't share the Date problem.

I suspect we want to handle this by falling over to a non-suspicious 'Received' header if present, or, failing that, giving the message today's date.  We also probably want to handle problems with specific messages by treating them like the user deleted them and never synchronizing them.  (I think we may have discussed the latter at some point.)

The good news is that mcav's excellent disaster-recovery logic did catch the problem and prevented this turning into something even worse.  Although we do oddly end up with 2 disaster-recovery's happening:

09-10 10:21:22.534 I/Gecko   (11427): WLOG: POP3: Next batch. Messages left: 100
09-10 10:21:23.355 I/Gecko   (11427): WERR: [slog] disaster-recovery:exception {"accountId":"0","error":{},"errorName":"Error","errorMessage":"No date?!","stack":"makeHeaderInfo@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:20778:1\nexports.chewHeaderAndBodyStructure@app://email.gaiamobile.org/js/ext/composite/configurator.js:9886:13\nPop3Client.prototype.parseMime@app://email.gaiamobile.org/js/ext/composite/configurator.js:11061:15\nPop3Client.prototype.downloadPartialMessageByNumber/<@app://email.gaiamobile.org/js/ext/composite/configurator.js:10925:16\nPop3Protocol.prototype.onreceive@app://email.gaiamobile.org/js/ext/composite/configurator.js:9483:11\nTCPSocketProxy/routerInfo</<@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:23708:28\nDisasterRecovery.catchSocketExceptions@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:21834:9\nTCPSocketProxy/routerInfo<@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:23707:7\nreceiveInstanceMessage@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:1593:5\nreceiveMessage
09-10 10:21:23.365 I/Gecko   (11427): WERR: *** Disastrous Error for email accountId 0 -- attempting to recover...
09-10 10:21:23.365 I/Gecko   (11427): WWAR: No job operation was currently running.
09-10 10:21:23.365 I/Gecko   (11427): WLOG: pop3:onclose
09-10 10:21:23.365 I/Gecko   (11427): WERR: [slog] disaster-recovery:exception {"accountId":"0","error":{},"errorName":"TypeError","errorMessage":"message is undefined","stack":"fetchProgress@app://email.gaiamobile.org/js/ext/composite/configurator.js:11723:11\nPop3Client.prototype.listMessages/</nextBatch</</<@app://email.gaiamobile.org/js/ext/composite/configurator.js:10841:1\nPop3Client.prototype.downloadPartialMessageByNumber/<@app://email.gaiamobile.org/js/ext/composite/configurator.js:10907:1\nRequest.prototype._respondWithError@app://email.gaiamobile.org/js/ext/composite/configurator.js:9390:5\nPop3Protocol.prototype.onclose@app://email.gaiamobile.org/js/ext/composite/configurator.js:9504:7\nexports.Pop3Client/this.socket.onclose<@app://email.gaiamobile.org/js/ext/composite/configurator.js:10451:7\nTCPSocketProxy/routerInfo</<@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:23708:28\nDisasterRecovery.catchSocketExceptions@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:21834:9\nTCPSocketProxy/routerInfo<@app://email.gaiam
09-10 10:21:23.365 I/Gecko   (11427): WERR: *** Disastrous Error for email accountId 0 -- attempting to recover...
09-10 10:21:23.365 I/Gecko   (11427): WWAR: No job operation was currently running.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: [Email] Sync issues in POP3 → [email/pop3] POP3 sync breaks on messages that lack a Date header
Hi Andrew -

Is it possible to fix this for 2.0? This one probably will be a IOT blocker for partner

Thanks

Vance
Flags: needinfo?(bugmail)
(In reply to Vance Chen [:vchen][vchen@mozilla.com] from comment #4)
> Is it possible to fix this for 2.0? This one probably will be a IOT blocker
> for partner

Yes, although the 2.0 fix will will need to be constructed separately from the 2.1 fix.  The required fix is likely localized enough that this is not a big deal and may just be a manual uplift.


[Blocking Requested - why for this release]:
Per Vance's comment above, if we are expecting this to be an IOT blocker, we should probably mark this a 2.0 blocker too.  This won't take very long for us to fix.
blocking-b2g: --- → 2.0?
Flags: needinfo?(bugmail)
Triage reviewed, and are blocking due to IOT risk.
blocking-b2g: 2.0? → 2.0+
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Target Milestone: --- → 2.1 S5 (26sep)
During the process of creating a test for this and fixing the bug it turned out POP3 actually can hang its sync process in a variety of ways so I added additional test coverage for disconnection/those edge cases and fixed the underlying problems, at least I think.  See the test comments and fixes for more in-depth discussion (hopefully :)

This is the pull request for trunk which will be fully suitable for v2.1 uplift.  I think the v2.0 uplift will be able to take most of the fixes as well, it's just going to be an annoying process of manually apply hunks.  I'm going to wait on trunk feedback and landing before formulating the v2.0 fix.

Note that all tests were authored in a somewhat TDD fashion where I had them cause a provable hang in the test first, then fixed them.  Unfortunately it's probably a major hassle to try and manually reproduce that and I'm not sure it's necessary.

Note that this also depends on the following mail-fakeservers patch that I've speculatively landed and published on npm (per previous rationale, this has no risk and simplifies travis testing and general sanity; I can spin a new mail-fakeservers release as needed):
https://github.com/mozilla-b2g/mail-fakeservers/pull/20
Attachment #8494224 - Flags: review?(m)
Comment on attachment 8494224 [details] [review]
GELAM trunk pull request

Looks good, too bad there are so many edge cases. The nested setTimeouts are the only thing that stands out as a future booby trap, but since we have test coverage and we hate POP3 I'm not worried about it.
Attachment #8494224 - Flags: review?(m) → review+
(In reply to Marcus Cavanaugh [:mcav] <mcav@mozilla.com> from comment #8)
> Looks good, too bad there are so many edge cases. The nested setTimeouts are
> the only thing that stands out as a future booby trap, but since we have
> test coverage and we hate POP3 I'm not worried about it.

It's worth noting that without the setTimeouts everything arguably still works correctly.  It's just that the sync process's "onclose" event fires first and is the cause of aborting the sync rather than the error-handler associated with the request.  If you remove them the test will still pass and the sync process will correctly still abort.  And I believe as the rest of the code works, it doesn't actually matter (there are no other related resources).

I just thought it might be a little bit less foot-gunny/surprising in the future if we let those more specific handlers try and clean things up before the less specific cleanup fires.  And to that end I wanted to make sure it was the failed TOP 'object' error that got reported rather than the generic "onclosed" error for human verification in the log.  (Note that we do report the duplicate errors in the log too so we can see that everything expected is firing either way.)
(This is the Gaia pull request that was landed based on the actual gaia-email-libs-and-more pull request https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/340, and what we want to uplift to v2.1.  I am creating it for uplift clarity, but the real contents including tests are in the gelam pull request.)

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): Sorta not applicable.  The specific error we throw was introduced in bug 871897 in v1.3 (the same release we introduced POP3 support in), but we would have probably ended up dying for some other reason even without the guard in place.
[User impact] if declined: Users using POP3 (who already have a hard time of it, what with using POP3) who receive messages that lack a 'Date' (most likely caused by incompetent spammers or malicious jerks) would experience hung email syncs that never complete.  The problem could only be mitigated by deleting the message from the server or waiting for enough new messages to be received to push the message out of the sync window.  However, in that case using "load more messages" will eventually cause the problem to recur.
[Testing completed]:  I wrote some spiffy and extensive back-end unit tests that reproduced the problem and correct it, as well as correcting some other POP3 edge cases.  I am going to also reproduce on a real server shortly when I put the v2.0-specific patch up.  I will not land this prior to real testing on a real server.  I pinky-swear.
[Risk to taking this patch] (and alternatives if risky): Considered low risk.  We have good and now even more improved back-end tests and these changes are primarily in the area of error handling and the tests all trigger the new error handling.
[String changes made]: No string changes.
Attachment #8495038 - Flags: review+
Attachment #8495038 - Flags: approval-gaia-v2.1?
Tested this patch and found it to be working. Thanks.
Will you please make patch for 2.0 as well?
Flags: needinfo?(vchen)
Needs a v2.0 approval if this is intended for that branch as well (2.0+ blocking status).
Flags: needinfo?(bugmail)
Attachment #8495038 - Flags: approval-gaia-v2.1? → approval-gaia-v2.1+
[Environment]
Gaia-Rev        13973ab50760d1e8bb773082163f0dff19d35a44
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-aurora/rev/6e317e075d04
Build-ID        20140928160204
Version         34.0a2
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20140925.192608
FW-Date         Thu Sep 25 19:26:18 EDT 2014
Bootloader      L1TC10011800

[Result]
PASS
Status: RESOLVED → VERIFIED
QA Whiteboard: [COM=Gaia::E-Mail]
Hi Andrew, Could you please kindly help us rebase the patch on 2.0 ? 
Thank you very much.
clear my ni since Rachelle already asked to land this on 2.0 branch
Flags: needinfo?(vchen)
[Approval Request Comment]
Please refer to comment 11 for the v2.1 approval request for the form stuff.  The situation is the same except this patch has been explicitly reduced in scope so that it is only a fix for the missing date header and does not include additional error guards because of larger-than-hoped-for differences between POP3 in v2.0 and v2.1/v2.2.

I tested this on a v2.0 device with the patch against a real dovecot POP3 server with a message missing a date header (injected via IMAP APPEND).  I confirmed failure without the patch and success with the patch.
Attachment #8497437 - Flags: review+
Attachment #8497437 - Flags: approval-gaia-v2.0?
Flags: needinfo?(bugmail)
Comment on attachment 8497437 [details] [review]
v2.0-specific uplift fix

Thanks for the testing here :asuth. If you think we need more help from QA to verify this once it lands on 2.0, please feel free to add the verifyme keyword. For now , I am relying on your pinky-swear (:P), automation and the manual verification you've done.
Attachment #8497437 - Flags: approval-gaia-v2.0? → approval-gaia-v2.0+
Hi Sharaf,
    Please provide steps and account, which could help with our verification. Thanks.
Flags: needinfo?(sharaf.tir)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: