Closed Bug 1494764 Opened 6 years ago Closed 5 years ago

MOZ_ASSERT(strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n', "Expect '\n' as only character in this line") - triggered again during last (or only) chunk

Categories

(MailNews Core :: Networking: IMAP, defect)

x86_64
All
defect
Not set
critical

Tracking

(thunderbird_esr6871+ fixed, thunderbird70 fixed)

RESOLVED FIXED
Thunderbird 70.0
Tracking Status
thunderbird_esr68 71+ fixed
thunderbird70 --- fixed

People

(Reporter: aceman, Assigned: gds)

References

()

Details

Attachments

(3 files, 4 obsolete files)

+++ This bug was initially created as a clone of Bug #1451137 +++

I just let a testing IMAP account sync messages and regularly get this TB crash:

Assertion failure: strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n' (Expect '
' as only character in this line), at mailnews/imap/src/nsImapServerResponseParser.cpp:3209

It is supposed to be fixed in bug 1451137, but I still get it on TB trunk.
A few months ago I also saw this error on trunk. I have a fix for it stashed away but haven't revisited it yet or submitted a patch. I think it was caused by fetching a series of messages, each as a single chunk, and on one message the last line of the email was corrupted so that it ended in just \r and not \r\n. This caused the next email to be fetched to be assumed to have just a \n for the first line that triggers the assert.

I will test this again, duplicate it and fix it.
Assignee: nobody → gds
I re-duplicated the problem I saw and tested the patch and it fixes the assert that I saw. If the assert still occurs after this patch applied, I will need more info such as the source for the email that causes the assert and other steps to causes the problem, if possible.

The problem I was seeing was because the last line of the email stored on the server ended in just \r and not \r\n. So this was being treated as a split \r\n and I was expecting to see the \n as the first line of the next chunk. But this was last chunk of message A. When I looked at message B, the assert hit because I was expecting to see just \n but instead see a normal full line.
Attachment #9019931 - Flags: review?(jorgk)
Attachment #9019931 - Flags: feedback?(acelists)
Status: NEW → ASSIGNED
Comment on attachment 9019931 [details] [diff] [review]
1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch

Looks reasonable. I'll run the unit tests for you locally.
Attachment #9019931 - Flags: review?(jorgk) → review+
What's the functional problem mentioned in the commit comment? Does this need backport to TB 60.x?
The problem is that without the assert to cause a crash the next email opened after the one that ends with just \r fails to display. I didn't test that this time but I will to to be sure. Will let you know ASAP.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/7a29a83fb1ab
Ignore missing \n on last line of last chunk. r=jorgk
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 65.0
Comment on attachment 9019931 [details] [diff] [review]
1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch

Review of attachment 9019931 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
I have not seen the crash since this was landed.
It may also happen the problem message is no longer on the server.
I'll speak up if it happens again.
Attachment #9019931 - Flags: feedback?(acelists) → feedback+

Comment on attachment 9019931 [details] [diff] [review]
1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch

Time to move this to ESR 60.

Attachment #9019931 - Flags: approval-comm-esr60+

Hmm, could this be causing problems: https://support.mozilla.org/en-US/questions/1249414 ?

We're doing a TB 60.5.1 next week, so I can revert that change.

Note that this is essentially a one line change:

-      if (charsReadSoFar > numberOfCharsInThisChunk)
+      if (!lastChunk && (charsReadSoFar > numberOfCharsInThisChunk))

The rest of the changes are comments and a string change.

Could you just build it Jorg and I can ask him to test?

That's possible, but I think it's pretty obvious: Change in chunking causes chunking issue. I can do a build, but I need to know the platform. Win, Mac, Linux, 32 or 64 bits? I've offered try builds to SUMO people before and they refused since Windows blocks those binaries.

Oops, thought I was referring to this bug when I referenced the earlier bug with a similar title: bug 1451137. Anyhow, I can check this again with large binary non-inline attachments (e.g., pdf) since I don't see that the support reporter's file types are important and I don't have the microsoft apps to produce them.
Re: https://support.mozilla.org/en-US/questions/1249414

Any chance to get an answer here quickly. I'm getting the code for TB 60.5.1 ready right now and I need to know whether to pull this change out or not.

No luck duplicating the problem and about ready to call it a day. Had the following info queued up to post when I just now saw your message come in. If you want to back it out I guess it's OK and I will still try again tomorrow to duplicate and fix the problem.

Found a large docx file (19M) that I attached and save to a message in an imap folder (no offline storage) but it fetches it in one big chunk and not multiple 98k chunks. Opens OK in Libre Office.

Another test message with an abstract art picture from the original signature bug opens fine with "close to trunk" version pulled fairly recently.

But I'm still a not sure when chunking actually occurs. I think it happen when you are fetching the whole "not modified" message and you are storing the whole thing. If individual parts are accessed, they are fetched completely and not in chunks.

I did see this ram caching error several times:

Assertion failure: !aNew (Logic error: Trying to read part from entire message which doesn't exist), 
at /home/gene/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:9422

Seems to occur when I copy the large message from one folder to another and then try to open it before it has completely loaded. I think this is a known problem that I can fix with my prototype ram cache changes that I haven't yet submitted.

(In reply to Matt from comment #12)

Could you just build it Jorg and I can ask him to test?

Comment posted at SUMO to try
https://queue.taskcluster.net/v1/task/AcRd5mSOROO3yWDvEyrmSQ/runs/0/artifacts/public/build/install/sea/target.installer.exe

Still no luck duplicating the problem. Like I've observed in the past(e.g., bug 1216951 comment29), I rarely see spontaneous fetching in multiple chunks, but typically see the fetch occur as a single large "chunk", even for attachments that exceed the chunk threshold, 98k. But not sure that a single chunk might not be a problem too since it still uses the msg_fetch_literal() code that handles multiple chunks. However, haven't seen a problem when the single big chunk is fetched -- the docx attachment opens OK.

But I have found a reliable way to "force" chunking: set mime_parts_on_demand (2 places) both to false. Then put the emails with big docx attachments into a folder that has no offline storage. Each time you clear cache and access the messages they are fetched in multiple chunks. I see no problem opening the docx file or or seeing an even larger png attachment when fetched in chunks.

I also set the chunk size smaller than the default, 65535 down to 4097, so that many more chunks occur and have no problem opening attachments.

I will try some more, but at this point I will ask the SUMO reporter to send me a sample email that supposedly won't open.

Jorg, I send a sample email to the tbtest account that you can try to open if you please. It is subject "huge docx".

That document opens fine, but then the inbox is synchronised for offline use. As you said, there must be very special circumstances that trigger the bug (if it is indeed related to the change here).

Maybe the way to experiment is to add some debug to the code to see which if/else branch runs and whether reverting the change causes a change in behaviour.

We made this change for a reason, see comment #5 (although maybe not 100% convincing).

(In reply to Jorg K (GMT+1) from comment #19)

We made this change for a reason, see comment #5 (although maybe not 100% convincing).

I tried to correlate what I saw at one time to acemann's report. I probably should have asked aceman for a sample email when he reported this.

The problem is that on the last or only chunk of a message, some servers (e.g., dovecot, openwave) end like this with 2 separate lines:

the last literal data\r\n
)\r\n

and others (e.g., gmail) like this with one line:

the last literal data)\r\n

In the later case, with 60.5 the closing ')' on the literal data was never removed and caused the problem. It is correctly removed with revert in 60.5.1.

(More reported info for this bug can be found starting at probably unrelated bug 570914 comment 96 and in bug 1527632 that I have marked as duplicate of this.)

Thanks for looking into it. We love server specific problems :-(

At this point I probably don't suggest changing this again, at least until the original bug that this was supposed to fix is seen again. However, for possible future use, here is another way to fix the problem.

The problem with the original patch for this bug is that on the last (or only) chunk for a part or full download, if the amount downloaded exceeds the expected chunk size, as it will when the last line includes the terminating ')', the code to remove the')'is skipped causing the')'to look like another literal and be displayed.

With this diff, if the last line of the last (or only) chunk contain the terminating ')'and possibly a missing\n, the last line will still be repaired but thefNextChunksStartsWithNewLinebool will be reset to false for the next part or full download and avoid the assert.

I will continue to watch for the subject assert failure and other possible problems with and without this diff in place.

Attachment #9019931 - Attachment is obsolete: true

Gene, quite a few problems reported where for when this fix was out in the wild. Any progress?

I can confirm that 60.5.1 fixed this.

(In reply to Jorg K (GMT+1) from comment #29)

Gene, quite a few problems reported where for when this fix was out in the wild. Any progress?

See comment 25. Otherwise, don't know what progress you're asking about. I have only seen the problem as described in comment 1 on an imported *.eml I probably corrupted using vi editor.

OK, well, yes, we need to fix the assert failure locally. Now that we backed this out, is it back, Aceman?

Flags: needinfo?(acelists)
Attached file imap_debug_stuff.txt

I hit this one just now too.

Pretty fuzzy STR:
I had a couple of gmail IMAP accounts configured. One was all synced, the other was still downloading messages. I was clicking wildly on messages, using 'n', 'p', cursor keys (trying to trigger another bug :-)... and hit this assert.

I was in the debugger at the time, so I've attached a stacktrace and dumped the contents of *this, just in case.

Ben, Thanks for the report. Was the gmail account downloading messages a newly set-up account so all messages were being downloaded? Or was it just a couple of new messages?

Also, can you possibly attach or at least describe the email that seemed to cause the assert:

$1 = 0x7fffd3985ab0 "Content-Type: multipart/alternative; boundary=0016e6d99fb959a7930482a13924\r\n"

Maybe the boundary number is unique in the account so you might be able to find it in the mbox file. If you don't want to attach it, maybe you can tell me its size and number and type of attachments.

Ben, see comment 34

(bug 1264302 has dependencies where I want to asses the effect of chunking patches, so temporarily making this block bug 1264302)

Blocks: 1264302
Flags: needinfo?(benc)
Summary: MOZ_ASSERT(strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n', "Expect '\n' as only character in this line") - triggered again → MOZ_ASSERT(strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n', "Expect '\n' as only character in this line") - triggered again during last (or only) chunk
Blocks: 628646

The!aNewassertion error described in comment 16 above occurred several times tonight while attempting to sync a gmail account. Had to commented it out to get it to work. However, no luck duplicating the problem observed in comment 33.

I got it again:
Assertion failure: strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n' (Expect '\n' as only character in this line), at comm/mailnews/imap/src/nsImapServerResponseParser.cpp:2803

But the call stack in gdb does not look useful so I'm not pasting it here.

Flags: needinfo?(acelists)

:aceman: Can you attach or at least describe (how big, mime structure etc) the email you were accessing when this happened? Also, may be helpful to know if the folder it resides in has offline store. Thanks.

It crashed while receiving the email (and showing the new mail notification with message excerpt, but it was completely empty without any text).

I don't know which message it is, but the TB window froze with statusbar saying downloading message 5 of 6. It may be the last message I see in the INBOX file. It is created in Outlook with the usual formatting mess.
I can't disclose the full message so if you could be more specific on which parts are needed.

Yes, the Inbox folder is set for "offline use".
Viewing that assumed last message again the TB does NOT cause a crash.

Parts probably not that important. So it was downloading 6 new messages. Can you tell the size (including any attachments) of the messages? You can add a Size and Order Received column and then sort by order received. Then tell me size and the order number (actually imap UID) for each of the 6 messages. That might help.

related to crash bug, so -> critical

Severity: major → critical

The crashing MOZ_ASSERT is actually trying to report problems in the data. I will revisit this but maybe it should be a warning of some sort (MOZ_WARN -- not sure if that exists) or remove from the MOZ_ASSERT. Or maybe my logic is wrong and it is falsely seeing a problem. TBD real soon.

Flags: needinfo?(gds)

Only asserting in debug builds. We've got various versions with this issue out in the wild without complaints. So it can't be critical ... but since "major" is gone, "critical" is the only choice that's left, unless we downgrade to "normal".

(In reply to Jorg K (GMT+2) from comment #43)

Only asserting in debug builds.

Yeah, I knew that.

We've got various versions with this issue out in the wild without complaints.

Is "in the wild" a release or just a beta?

So it can't be critical ... but since "major" is gone, "critical" is the only choice that's left, unless we downgrade to "normal".

Whatever you and Wayne think is fine by me...

Flags: needinfo?(gds)

(sorry about the delay - I've been mostly overseas and offline for the last couple of months)

I managed to track down the offending email from Comment 33 (and it still triggers today).
I can't see anything wrong with the email itself, certainly nothing that would obviously cause the google IMAP server to send it out so oddly.

In any case, I think hitting this in a release build causes the offending email to be skipped, which seems like a real bug-in-the-wild, not just an academic debug-build-only issue...

Next up: I'll try and replicate the bug in a more manageable-sized account (my gmail is huge and takes ages to sync down to TB). Then I can see what's really happening. Lots of IMAP logging in my near future, I think.

For reference:
it was tricky getting hold of the crashing email, but here's what ended up working for me:

I couldn't grep for it locally, because it didn't download onto my local drive (even when going back to a release build, or commenting out the assert).
In the end, I used the gmail archive facility to download everything, then grep through the mbox they sent me.

I wasn't sure if the mbox was being sanitised (by adding a trailing eol, say). So now I knew what email it was, I went back into the gmail webapp, found it by title/author, and saved the email source individually.

Ben, Thanks for finding the email. Are you saying if I append the attached .eml to an gmail imap folder and fetch it, I will see the problem? Haven't tried it yet, but I wlll.

(In reply to gene smith from comment #47)

Ben, Thanks for finding the email. Are you saying if I append the attached .eml to an gmail imap folder and fetch it, I will see the problem? Haven't tried it yet, but I wlll.

Well, that's the thing. I don't think it will exhibit the problem. I can't see anything wrong with that email which would cause problems.
Anyway, I'm just going to try it myself now on a smaller account, and try and work up a contained, replicable case. Will report back.

Flags: needinfo?(benc)

As I rather expected: importing the offending into a minimal gmail account (<10 messages in inbox) doesn't cause the assert to trigger when downloading messages.

Worse still, I can't get the assert to trigger any more on my regular gmail account! So I've lost even my slow & tedious replicable case :-(

I need to tie up some other overdue stuff, but this one really irks me, so I'll probably have another look at it. Probably a matter of reading the IMAP spec and the parsing code side by side and guessing what could have gone wrong...

The other day while working on another bug I actually saw the assert hit. Looked at the last stuff fetched in wireshark, which was running, and everything looked OK. Didn't see any split \r\n or orphaned \n. Only thing that looked a bit strange was that on each fetch the last "line" of data in a response didn't end in \r\n and the line was continued in the next response to the fetch. Checked other gmail responses and responses on other servers and they do the same thing, so seems OK. So tb joins lines correctly that are continued in the next response.

Added your (Ben's) "bad" email to the mix and still haven't seen the assert again. Also, been running under gdb with breakpoint on the MOZ_ASSERT line and haven't hit it again

Comment on attachment 9043793 [details] [diff] [review]
better-lastChunk-fix-then-original.diff

I'm pretty sure this is wrong so obsoleting it. Testing a new patch now. (O.T.: Still having problem with build. Have to restart it one or more times before the linker (ld.lld) starts running after compile completes. Otherwise just sits and does nothing. Is this a known problem? Can't find anything under `build system' in bugzilla.)
Attachment #9043793 - Attachment is obsolete: true
Attached patch 1494764-review-changes-v0.patch (obsolete) — Splinter Review

This should fix the problem with MOZ_ASSERT causing a crash with DEBUG defined. Now the condition that would have caused the assert (expecting a line with just a \n but seeing something else) is treated as a normal line and still processed and just a NS_WARNING is produced (prints only when DEBUG defined).

Also, when this is the first chunk of a new message, header or part, even if the previous chunk ended in \r so that this first chunk is expected to start with \n but definitely won't, this is now correctly handled using the origin parameter and won't cause the regression found in comment 10.

Note: Other than having a message end with just \r instead of \r\n, I was unable to determine how to cause the MOZ_ASSERT to trigger in the release version of this file. I did see it happen once but was unable to see a problem in wireshark (didn't have debugger running or save an imap log when it occurred). Anyhow, now if the problem occurs, the code will make a good faith effort to still process the offending line rather than just ignoring it.

Attachment #9082187 - Flags: review?(jorgk)
Comment on attachment 9082187 [details] [diff] [review]
1494764-review-changes-v0.patch

No functional changed. Want to eliminate some duplicate code. Will test again and submit update.
Attachment #9082187 - Attachment is obsolete: true
Attachment #9082187 - Flags: review?(jorgk)
Attached patch 1494764-review-changes-v1.patch (obsolete) — Splinter Review

No functional changes from obsoleted v0 patch. Got rid of duplicate calls to HandleMessageDownLoadLine(), both having same parameters. Added two "#ifdef DEBUG" enabled IMAP logging calls that may be useful to determine why the old MOZ_ASSERT was sometimes hitting. (Except for last chunk of a message ending in just \r, still don't know why.) Re-tested all conditions to make sure it still works correctly.

Attachment #9082506 - Flags: review?(jorgk)

Also, when this is the first chunk of a new message, header or part, even if the previous chunk ended in \r so that this first chunk is expected to start with \n but definitely won't, this is now correctly handled using the origin parameter and won't cause the regression found in comment 10.

Beyond message corruption, could this potentially cause other chunking issues, and crashes? (such as the "unexplained" bug 1565517)

Flags: needinfo?(gds)
OS: Linux → All

(In reply to Wayne Mery (:wsmwk) from comment #55)

Beyond message corruption, could this potentially cause other chunking issues, and crashes? (such as the "unexplained" bug 1565517)

Wayne, I don't a relation. I've never seen that crash although I've been cc'd on it and saw Jorg's attempted fix. I don't see where it even gets into msg_fetch_literal() where the whole message or chunks of a message are handled. Any idea what it takes to duplicate the unexplained bug?

Flags: needinfo?(gds)

Looks plausible. I needed to do a bit of reformatting. I'll ask a question in the next comment. BTW, you only need to add "clang-format off" if it does something you don't like, so try without it first.

Attachment #9082506 - Attachment is obsolete: true
Attachment #9082506 - Flags: review?(jorgk)
Attachment #9082909 - Flags: review+
Comment on attachment 9082909 [details] [diff] [review]
1494764-review-changes-v1.patch - reformatted

Review of attachment 9082909 [details] [diff] [review]:
-----------------------------------------------------------------

Should we remove the #ifdef DEBUG that surrounds the logging? Other code doesn't do that:
https://searchfox.org/comm-central/search?q=mozilla%3A%3ALogLevel%3A%3ADebug&case=false&regexp=false&path=

::: mailnews/imap/src/nsImapServerResponseParser.cpp
@@ +2685,5 @@
>    // as many octets as we requested, this must be the last or only chunk
>    bool lastChunk = (!chunk || (numberOfCharsInThisChunk !=
>                                 fServerConnection.GetCurFetchSize()));
>  
>  #ifdef DEBUG

Why the conditional compile? Don't we always want to see the logging? If there's a user in the field with a problem, how are we going to get a debug version to them?

@@ +2814,5 @@
> +            // process the line. This should theoretically not occur but rarely,
> +            // and for yet to be determined reasons, it does. Logging may help.
> +            NS_WARNING(
> +                "'\\n' is not the only character in this line as expected!");
> +#ifdef DEBUG

And here.

@@ +2822,5 @@
> +#endif
> +          } else {
> +            // Discard the line containing only \n.
> +            processTheLine = false;
> +#ifdef DEBUG

And here.

The #ifdef DEBUG at the top always had it conditional. Not sure why that was deemed only needed by users with DEBUG builds so I left it as-is. The other two, I don't know, I just kind of thought they might only be of interest for development builds with DEBUG defined, since that's who saw the original problem, AFAIK. Anyhow, ok to remove the #ifdef DEBUG's on all 3 if you prefer. Maybe, if this fixes it, which I think it will, someone can go in a take out the 2nd two and save a few bytes.

I have no idea what the "// clang-format on/off" things do. But someone put them on the top #ifdef DEBUG so I just did the same thing on the ones I added.

O.T: Thanks for the "push to try" advice. I haven't tried what you said but I will later.

OK, I'll take the #ifdef DEBUG out.

// clang-format off switched clang-formatting off, to manually fine-tune the formatting in some places. Where you added it, it wasn't necessary.

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/0e190d24849f
Removed MOZ_ASSERT but now still process line where it would occur. r=jorgk

Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: Thunderbird 65.0 → Thunderbird 70.0
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/c0c1b35d4b8b
Follow-up: tweak clang-format directive. r=me
Comment on attachment 9082909 [details] [diff] [review]
1494764-review-changes-v1.patch - reformatted

Time for an uplift.
Attachment #9082909 - Flags: approval-comm-esr68+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: