Closed Bug 964757 Opened 10 years ago Closed 9 years ago

[b2g] email app sometimes stops loading snippets and message bodies

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bkelly, Unassigned)

Details

(Whiteboard: [dogfood1.1])

I've been using a production build of v1.1hd on a helix as my main phone for the last month.  The email app has been working great until today when I stopped getting message bodies for bugzilla mail.  The list snippets also don't seem to load for these messages.  If I go back to a message that shows a snippet I can load the message body as well.

I don't know if it's relevant, but I recently enabled HTML mail on my bugzilla account.  This folder only contains bugzilla mail due to sorting rules on Zimbra.

:gerard-majax indicated he had a similar issue on his Nexus S at some point, but was unable to isolate steps to reproduce.

I believe the workaround is to delete the account and re-create it.  I have been holding off on that to see if we can debug this.
My device does not have logcat enabled in its production build unfortunately.  We are trying to use the secret email logs to track it down:

  https://wiki.mozilla.org/Gaia/Email/SecretDebugMode
Mysteries abound.  I deleted and re-created my account.  Same problem.

I then moved the last two days of messages to another folder.  This allowed me to get snippets and message bodies.

I then incrementally moved them back to my bugzilla folder.  The problem never re-occurred.

I would have expected the issue to be tied to a specific mail message.  Perhaps it was related to a UID our IMAP server assigned to it, or something similar?
I've analyzed the (censored) log.  Our "downloadBodies" job ended up hanging in a "UID FETCH".

After the UID fetch we have a bunch of data() log records like so:
data! (length: 22)
data! (length: 82)
data! (length: 21)
data! (length: 2)
data! (length: 7)
data! (length: 2)
data! (length: 66)
data! (length: 2)
data! (length: 14)
data! (length: 6)
data! (length: 70)
data! (length: 6)
data! (length: 8)
data! (length: 6)
data! (length: 74)
data! (length: 7)
data! (length: 14)
data! (length: 7)
data! (length: 72)
data! (length: 74)
data! (length: 73)
data! (length: 7)
data! (length: 12)
data! (length: 10)
data! (length: 2)

One of three things is happening here:

1) We're in the literal parsing state and the server does not believe in Nagle/buffering/friends.  We never actually get all the bytes the literal implies.

The theory would be that the message body (meta-)state got corrupted on the server.  The server's meta-data claims it has, say, 100 bytes.  It tells us that, but when it goes to stream the body to us in (ridiculous un-buffered) fashion it finds it has run out of data early at, say, 80 bytes.  It doesn't know to pad out the body with gibberish so we can escape from our literal, and the connection is permanently hung.  When :bkelly moved the message, the meta-data corruption was dealt with by either a re-parse or error handling logic that was able to fix things up.

This seems crazy, but the non-buffering also seems crazy.

2) We're not actually entering the literal parse state and the state machine is somehow failing to explode.  This seems less likely to me since the "length: 2" seems a lot like a blank (CRLF-only) line, and those would not be reported by our response processing logic.  And if there was a line with 2 content bytes, I'm pretty sure our parse state machine would explode a lot, certainly well before we got that far.

3) Parse state machine failures only get reported via console.error(), not super fancy logging.  If parseFetch or the recipient of our 'emit' call throws (EventEmitter does not catch anything on our behalf), we are boned and it won't show up in the log.  I'll file a bug on this.  So in this case there's something wacky about the contents of the body that kills us.  The thing is that this would be deterministic, and unless one of those message moves discarded the evil message or allowed something like a Bayesian classifier on another computer to perceive the message as new and then discard it, this seems less likely than option 1.
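
To make option 1 concrete, here is a minimal sketch of how a literal reader ends up waiting forever when the server declares more bytes than it sends.  This is not the actual Gaia IMAP parser; LiteralReader and its fields are hypothetical names, and the 100/80 byte counts are the made-up numbers from the theory above.

```javascript
// Hypothetical illustration only; not the parser used by the email app.
class LiteralReader {
  constructor(declaredLength) {
    // Number of bytes the server promised in its {N} literal prefix.
    this.remaining = declaredLength;
    this.chunks = [];
  }

  // Feed one network chunk.  Returns the bytes left over once the literal
  // is complete, or null if we are still waiting for more literal data.
  push(chunk) {
    const take = Math.min(this.remaining, chunk.length);
    this.chunks.push(chunk.subarray(0, take));
    this.remaining -= take;
    return this.remaining === 0 ? chunk.subarray(take) : null;
  }

  get done() {
    return this.remaining === 0;
  }
}

// Server claims 100 bytes but only ever delivers 80, in small chunks.
const reader = new LiteralReader(100);
reader.push(Buffer.alloc(50));
reader.push(Buffer.alloc(30));

// Still inside the literal: nothing short of 20 more bytes (or a timeout)
// gets the parser back to normal line processing.
console.log(reader.done, reader.remaining); // false 20
```

Without a timeout or a connection reset, every later byte on that connection (including the tagged response to the next command) would be swallowed as literal content, which would match a "downloadBodies" job that never completes.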


I think the main actionable thing we have is to improve our secret debug logging for parse failures or exploding event consumers.
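
A rough sketch of what that could look like for the "exploding event consumer" case, assuming Node's EventEmitter semantics (a listener that throws propagates synchronously out of emit()).  The app uses its own EventEmitter shim, so loggedEmit, protocolLog, and the 'fetch' event payload here are hypothetical stand-ins rather than real code from the tree.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: wraps emit() so a throwing listener is recorded in
// the (secret debug) log instead of only surfacing via console.error().
function loggedEmit(emitter, log, eventName, ...args) {
  try {
    return emitter.emit(eventName, ...args);
  } catch (err) {
    log.push({
      type: 'listener-threw',
      event: eventName,
      message: String(err && err.message),
    });
    return false;
  }
}

const protocolLog = [];
const conn = new EventEmitter();

// Simulate a consumer that chokes on something wacky in the body.
conn.on('fetch', () => {
  throw new Error('bad body part');
});

loggedEmit(conn, protocolLog, 'fetch', { uid: 1 });
console.log(protocolLog);
```

The same try/catch shape would apply around the parse step itself; whether the connection should then be torn down or the job failed is a separate decision this sketch does not make.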
Filed the IMAP protocol logger deficiency as bug 967764.

I'm going to leave this bug open for now even though we don't have a reproducing test case since we can probably find a reproducing test case via automated synchronization of a giant zimbra mail folder.
If this wasn't resolved looong ago, this was resolved by the snippet fetch hang fix on bug 1128285 that landed recently.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME