MOZ_ASSERT(strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n', "Expect '\n' as only character in this line") - triggered again during last (or only) chunk
Categories
(MailNews Core :: Networking: IMAP, defect)
Tracking
(thunderbird_esr6871+ fixed, thunderbird70 fixed)
People
(Reporter: aceman, Assigned: gds)
References
()
Details
Attachments
(3 files, 4 obsolete files)
7.89 KB,
text/plain
|
Details | |
12.23 KB,
message/rfc822
|
Details | |
6.21 KB,
patch
|
jorgk-bmo
:
review+
jorgk-bmo
:
approval-comm-esr68+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1451137 +++ I just let a testing IMAP account sync messages and regularly get this TB crash: Assertion failure: strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n' (Expect ' ' as only character in this line), at mailnews/imap/src/nsImapServerResponseParser.cpp:3209 It is supposed to be fixed in bug 1451137, but I still get it on TB trunk.
Assignee | ||
Comment 1•6 years ago
|
||
A few months ago I also saw this error on trunk. I have a fix for it stashed away but haven't revisited it yet or submitted a patch. I think it was caused by fetching a series of messages, each as a single chunk, and on one message the last line of the email was corrupted so that it ended in just \r and not \r\n. This caused the next email to be fetched to be assumed to have just a \n for the first line that triggers the assert. I will test this again, duplicate it and fix it.
Assignee | ||
Comment 2•6 years ago
|
||
I re-duplicated the problem I saw and tested the patch and it fixes the assert that I saw. If the assert still occurs after this patch applied, I will need more info such as the source for the email that causes the assert and other steps to causes the problem, if possible. The problem I was seeing was because the last line of the email stored on the server ended in just \r and not \r\n. So this was being treated as a split \r\n and I was expecting to see the \n as the first line of the next chunk. But this was last chunk of message A. When I looked at message B, the assert hit because I was expecting to see just \n but instead see a normal full line.
Assignee | ||
Updated•6 years ago
|
Comment 3•6 years ago
|
||
Comment on attachment 9019931 [details] [diff] [review] 1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch Looks reasonable. I'll run the unit tests for you locally.
Comment 4•6 years ago
|
||
What's the functional problem mentioned in the commit comment? Does this need backport to TB 60.x?
Assignee | ||
Comment 5•6 years ago
|
||
The problem is that without the assert to cause a crash the next email opened after the one that ends with just \r fails to display. I didn't test that this time but I will to to be sure. Will let you know ASAP.
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/7a29a83fb1ab Ignore missing \n on last line of last chunk. r=jorgk
Updated•6 years ago
|
Comment on attachment 9019931 [details] [diff] [review] 1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch Review of attachment 9019931 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. I have not seen the crash since this was landed. It may also happen the problem message is no longer on the server. I'll speak up if it happens again.
Comment 8•5 years ago
|
||
Comment on attachment 9019931 [details] [diff] [review]
1494764-ignore-corrupted-eol-on-last-line-of-last-chunk.patch
Time to move this to ESR 60.
Comment 9•5 years ago
|
||
TB 60.5.0 ESR:
https://hg.mozilla.org/releases/comm-esr60/rev/88bb91a8fb572660dcec0c60f3c95480544dbde7
Comment 10•5 years ago
|
||
Hmm, could this be causing problems: https://support.mozilla.org/en-US/questions/1249414 ?
We're doing a TB 60.5.1 next week, so I can revert that change.
Comment 11•5 years ago
|
||
Note that this is essentially a one line change:
- if (charsReadSoFar > numberOfCharsInThisChunk)
+ if (!lastChunk && (charsReadSoFar > numberOfCharsInThisChunk))
The rest of the changes are comments and a string change.
Comment 12•5 years ago
|
||
Could you just build it Jorg and I can ask him to test?
Comment 13•5 years ago
|
||
That's possible, but I think it's pretty obvious: Change in chunking causes chunking issue. I can do a build, but I need to know the platform. Win, Mac, Linux, 32 or 64 bits? I've offered try builds to SUMO people before and they refused since Windows blocks those binaries.
Assignee | ||
Comment 14•5 years ago
|
||
Oops, thought I was referring to this bug when I referenced the earlier bug with a similar title: bug 1451137. Anyhow, I can check this again with large binary non-inline attachments (e.g., pdf) since I don't see that the support reporter's file types are important and I don't have the microsoft apps to produce them.
Re: https://support.mozilla.org/en-US/questions/1249414
Comment 15•5 years ago
|
||
Any chance to get an answer here quickly. I'm getting the code for TB 60.5.1 ready right now and I need to know whether to pull this change out or not.
Assignee | ||
Comment 16•5 years ago
|
||
No luck duplicating the problem and about ready to call it a day. Had the following info queued up to post when I just now saw your message come in. If you want to back it out I guess it's OK and I will still try again tomorrow to duplicate and fix the problem.
Found a large docx file (19M) that I attached and save to a message in an imap folder (no offline storage) but it fetches it in one big chunk and not multiple 98k chunks. Opens OK in Libre Office.
Another test message with an abstract art picture from the original signature bug opens fine with "close to trunk" version pulled fairly recently.
But I'm still a not sure when chunking actually occurs. I think it happen when you are fetching the whole "not modified" message and you are storing the whole thing. If individual parts are accessed, they are fetched completely and not in chunks.
I did see this ram caching error several times:
Assertion failure: !aNew (Logic error: Trying to read part from entire message which doesn't exist),
at /home/gene/mozilla/comm/mailnews/imap/src/nsImapProtocol.cpp:9422
Seems to occur when I copy the large message from one folder to another and then try to open it before it has completely loaded. I think this is a known problem that I can fix with my prototype ram cache changes that I haven't yet submitted.
Comment 17•5 years ago
|
||
(In reply to Matt from comment #12)
Could you just build it Jorg and I can ask him to test?
Comment posted at SUMO to try
https://queue.taskcluster.net/v1/task/AcRd5mSOROO3yWDvEyrmSQ/runs/0/artifacts/public/build/install/sea/target.installer.exe
Assignee | ||
Comment 18•5 years ago
|
||
Still no luck duplicating the problem. Like I've observed in the past(e.g., bug 1216951 comment29), I rarely see spontaneous fetching in multiple chunks, but typically see the fetch occur as a single large "chunk", even for attachments that exceed the chunk threshold, 98k. But not sure that a single chunk might not be a problem too since it still uses the msg_fetch_literal() code that handles multiple chunks. However, haven't seen a problem when the single big chunk is fetched -- the docx attachment opens OK.
But I have found a reliable way to "force" chunking: set mime_parts_on_demand (2 places) both to false. Then put the emails with big docx attachments into a folder that has no offline storage. Each time you clear cache and access the messages they are fetched in multiple chunks. I see no problem opening the docx file or or seeing an even larger png attachment when fetched in chunks.
I also set the chunk size smaller than the default, 65535 down to 4097, so that many more chunks occur and have no problem opening attachments.
I will try some more, but at this point I will ask the SUMO reporter to send me a sample email that supposedly won't open.
Jorg, I send a sample email to the tbtest account that you can try to open if you please. It is subject "huge docx".
Comment 19•5 years ago
|
||
That document opens fine, but then the inbox is synchronised for offline use. As you said, there must be very special circumstances that trigger the bug (if it is indeed related to the change here).
Maybe the way to experiment is to add some debug to the code to see which if/else branch runs and whether reverting the change causes a change in behaviour.
We made this change for a reason, see comment #5 (although maybe not 100% convincing).
Assignee | ||
Comment 20•5 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #19)
We made this change for a reason, see comment #5 (although maybe not 100% convincing).
I tried to correlate what I saw at one time to acemann's report. I probably should have asked aceman for a sample email when he reported this.
Comment 21•5 years ago
|
||
Backout from all branches:
C-C: https://hg.mozilla.org/comm-central/rev/f4c292c1c0c36e18ff5c69bb8c94678381ced29c
C-B: https://hg.mozilla.org/releases/comm-beta/rev/5ce15c6f7f465d93a0acc02ebd73527033c1c7bb
C-ESR60: https://hg.mozilla.org/releases/comm-esr60/rev/f4619c9515b086c39b47e63a0e8f213563775171
Looks like this is causing chunking errors, see bug 570914 comment #98.
Assignee | ||
Comment 23•5 years ago
|
||
The problem is that on the last or only chunk of a message, some servers (e.g., dovecot, openwave) end like this with 2 separate lines:
the last literal data\r\n
)\r\n
and others (e.g., gmail) like this with one line:
the last literal data)\r\n
In the later case, with 60.5 the closing ')'
on the literal data was never removed and caused the problem. It is correctly removed with revert in 60.5.1.
(More reported info for this bug can be found starting at probably unrelated bug 570914 comment 96 and in bug 1527632 that I have marked as duplicate of this.)
Comment 24•5 years ago
|
||
Thanks for looking into it. We love server specific problems :-(
Assignee | ||
Comment 25•5 years ago
|
||
At this point I probably don't suggest changing this again, at least until the original bug that this was supposed to fix is seen again. However, for possible future use, here is another way to fix the problem.
The problem with the original patch for this bug is that on the last (or only) chunk for a part or full download, if the amount downloaded exceeds the expected chunk size, as it will when the last line includes the terminating ')'
, the code to remove the')'
is skipped causing the')'
to look like another literal and be displayed.
With this diff, if the last line of the last (or only) chunk contain the terminating ')'
and possibly a missing\n
, the last line will still be repaired but thefNextChunksStartsWithNewLine
bool will be reset to false for the next part or full download and avoid the assert.
I will continue to watch for the subject assert failure and other possible problems with and without this diff in place.
Comment 29•5 years ago
|
||
Gene, quite a few problems reported where for when this fix was out in the wild. Any progress?
Comment 30•5 years ago
|
||
I can confirm that 60.5.1 fixed this.
Assignee | ||
Comment 31•5 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #29)
Gene, quite a few problems reported where for when this fix was out in the wild. Any progress?
See comment 25. Otherwise, don't know what progress you're asking about. I have only seen the problem as described in comment 1 on an imported *.eml I probably corrupted using vi editor.
Comment 32•5 years ago
•
|
||
OK, well, yes, we need to fix the assert failure locally. Now that we backed this out, is it back, Aceman?
Comment 33•5 years ago
|
||
I hit this one just now too.
Pretty fuzzy STR:
I had a couple of gmail IMAP accounts configured. One was all synced, the other was still downloading messages. I was clicking wildly on messages, using 'n', 'p', cursor keys (trying to trigger another bug :-)... and hit this assert.
I was in the debugger at the time, so I've attached a stacktrace and dumped the contents of *this, just in case.
Assignee | ||
Comment 34•5 years ago
|
||
Ben, Thanks for the report. Was the gmail account downloading messages a newly set-up account so all messages were being downloaded? Or was it just a couple of new messages?
Also, can you possibly attach or at least describe the email that seemed to cause the assert:
$1 = 0x7fffd3985ab0 "Content-Type: multipart/alternative; boundary=0016e6d99fb959a7930482a13924\r\n"
Maybe the boundary number is unique in the account so you might be able to find it in the mbox file. If you don't want to attach it, maybe you can tell me its size and number and type of attachments.
Comment 35•5 years ago
|
||
Ben, see comment 34
(bug 1264302 has dependencies where I want to asses the effect of chunking patches, so temporarily making this block bug 1264302)
Assignee | ||
Comment 36•5 years ago
•
|
||
The!aNew
assertion error described in comment 16 above occurred several times tonight while attempting to sync a gmail account. Had to commented it out to get it to work. However, no luck duplicating the problem observed in comment 33.
Reporter | ||
Comment 37•5 years ago
|
||
I got it again:
Assertion failure: strlen(fCurrentLine) == 1 && fCurrentLine[0] == '\n' (Expect '\n' as only character in this line), at comm/mailnews/imap/src/nsImapServerResponseParser.cpp:2803
But the call stack in gdb does not look useful so I'm not pasting it here.
Assignee | ||
Comment 38•5 years ago
•
|
||
:aceman: Can you attach or at least describe (how big, mime structure etc) the email you were accessing when this happened? Also, may be helpful to know if the folder it resides in has offline store. Thanks.
Reporter | ||
Comment 39•5 years ago
|
||
It crashed while receiving the email (and showing the new mail notification with message excerpt, but it was completely empty without any text).
I don't know which message it is, but the TB window froze with statusbar saying downloading message 5 of 6. It may be the last message I see in the INBOX file. It is created in Outlook with the usual formatting mess.
I can't disclose the full message so if you could be more specific on which parts are needed.
Yes, the Inbox folder is set for "offline use".
Viewing that assumed last message again the TB does NOT cause a crash.
Assignee | ||
Comment 40•5 years ago
|
||
Parts probably not that important. So it was downloading 6 new messages. Can you tell the size (including any attachments) of the messages? You can add a Size and Order Received column and then sort by order received. Then tell me size and the order number (actually imap UID) for each of the 6 messages. That might help.
Assignee | ||
Comment 42•5 years ago
|
||
The crashing MOZ_ASSERT is actually trying to report problems in the data. I will revisit this but maybe it should be a warning of some sort (MOZ_WARN -- not sure if that exists) or remove from the MOZ_ASSERT. Or maybe my logic is wrong and it is falsely seeing a problem. TBD real soon.
Comment 43•5 years ago
|
||
Only asserting in debug builds. We've got various versions with this issue out in the wild without complaints. So it can't be critical ... but since "major" is gone, "critical" is the only choice that's left, unless we downgrade to "normal".
Assignee | ||
Comment 44•5 years ago
•
|
||
(In reply to Jorg K (GMT+2) from comment #43)
Only asserting in debug builds.
Yeah, I knew that.
We've got various versions with this issue out in the wild without complaints.
Is "in the wild" a release or just a beta?
So it can't be critical ... but since "major" is gone, "critical" is the only choice that's left, unless we downgrade to "normal".
Whatever you and Wayne think is fine by me...
Comment 45•5 years ago
|
||
(sorry about the delay - I've been mostly overseas and offline for the last couple of months)
I managed to track down the offending email from Comment 33 (and it still triggers today).
I can't see anything wrong with the email itself, certainly nothing that would obviously cause the google IMAP server to send it out so oddly.
In any case, I think hitting this in a release build causes the offending email to be skipped, which seems like a real bug-in-the-wild, not just an academic debug-build-only issue...
Next up: I'll try and replicate the bug in a more manageable-sized account (my gmail is huge and takes ages to sync down to TB). Then I can see what's really happening. Lots of IMAP logging in my near future, I think.
Comment 46•5 years ago
|
||
For reference:
it was tricky getting hold of the crashing email, but here's what ended up working for me:
I couldn't grep for it locally, because it didn't download onto my local drive (even when going back to a release build, or commenting out the assert).
In the end, I used the gmail archive facility to download everything, then grep through the mbox they sent me.
I wasn't sure if the mbox was being sanitised (by adding a trailing eol, say). So now I knew what email it was, I went back into the gmail webapp, found it by title/author, and saved the email source individually.
Assignee | ||
Comment 47•5 years ago
|
||
Ben, Thanks for finding the email. Are you saying if I append the attached .eml to an gmail imap folder and fetch it, I will see the problem? Haven't tried it yet, but I wlll.
Comment 48•5 years ago
|
||
(In reply to gene smith from comment #47)
Ben, Thanks for finding the email. Are you saying if I append the attached .eml to an gmail imap folder and fetch it, I will see the problem? Haven't tried it yet, but I wlll.
Well, that's the thing. I don't think it will exhibit the problem. I can't see anything wrong with that email which would cause problems.
Anyway, I'm just going to try it myself now on a smaller account, and try and work up a contained, replicable case. Will report back.
Comment 49•5 years ago
|
||
As I rather expected: importing the offending into a minimal gmail account (<10 messages in inbox) doesn't cause the assert to trigger when downloading messages.
Worse still, I can't get the assert to trigger any more on my regular gmail account! So I've lost even my slow & tedious replicable case :-(
I need to tie up some other overdue stuff, but this one really irks me, so I'll probably have another look at it. Probably a matter of reading the IMAP spec and the parsing code side by side and guessing what could have gone wrong...
Assignee | ||
Comment 50•5 years ago
•
|
||
The other day while working on another bug I actually saw the assert hit. Looked at the last stuff fetched in wireshark, which was running, and everything looked OK. Didn't see any split \r\n or orphaned \n. Only thing that looked a bit strange was that on each fetch the last "line" of data in a response didn't end in \r\n and the line was continued in the next response to the fetch. Checked other gmail responses and responses on other servers and they do the same thing, so seems OK. So tb joins lines correctly that are continued in the next response.
Added your (Ben's) "bad" email to the mix and still haven't seen the assert again. Also, been running under gdb with breakpoint on the MOZ_ASSERT line and haven't hit it again
Assignee | ||
Comment 51•5 years ago
|
||
Comment on attachment 9043793 [details] [diff] [review] better-lastChunk-fix-then-original.diff I'm pretty sure this is wrong so obsoleting it. Testing a new patch now. (O.T.: Still having problem with build. Have to restart it one or more times before the linker (ld.lld) starts running after compile completes. Otherwise just sits and does nothing. Is this a known problem? Can't find anything under `build system' in bugzilla.)
Assignee | ||
Comment 52•5 years ago
|
||
This should fix the problem with MOZ_ASSERT causing a crash with DEBUG defined. Now the condition that would have caused the assert (expecting a line with just a \n but seeing something else) is treated as a normal line and still processed and just a NS_WARNING is produced (prints only when DEBUG defined).
Also, when this is the first chunk of a new message, header or part, even if the previous chunk ended in \r so that this first chunk is expected to start with \n but definitely won't, this is now correctly handled using the origin parameter and won't cause the regression found in comment 10.
Note: Other than having a message end with just \r instead of \r\n, I was unable to determine how to cause the MOZ_ASSERT to trigger in the release version of this file. I did see it happen once but was unable to see a problem in wireshark (didn't have debugger running or save an imap log when it occurred). Anyhow, now if the problem occurs, the code will make a good faith effort to still process the offending line rather than just ignoring it.
Assignee | ||
Comment 53•5 years ago
|
||
Comment on attachment 9082187 [details] [diff] [review] 1494764-review-changes-v0.patch No functional changed. Want to eliminate some duplicate code. Will test again and submit update.
Assignee | ||
Comment 54•5 years ago
|
||
No functional changes from obsoleted v0 patch. Got rid of duplicate calls to HandleMessageDownLoadLine(), both having same parameters. Added two "#ifdef DEBUG" enabled IMAP logging calls that may be useful to determine why the old MOZ_ASSERT was sometimes hitting. (Except for last chunk of a message ending in just \r, still don't know why.) Re-tested all conditions to make sure it still works correctly.
Comment 55•5 years ago
|
||
Also, when this is the first chunk of a new message, header or part, even if the previous chunk ended in \r so that this first chunk is expected to start with \n but definitely won't, this is now correctly handled using the origin parameter and won't cause the regression found in comment 10.
Beyond message corruption, could this potentially cause other chunking issues, and crashes? (such as the "unexplained" bug 1565517)
Assignee | ||
Comment 56•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #55)
Beyond message corruption, could this potentially cause other chunking issues, and crashes? (such as the "unexplained" bug 1565517)
Wayne, I don't a relation. I've never seen that crash although I've been cc'd on it and saw Jorg's attempted fix. I don't see where it even gets into msg_fetch_literal() where the whole message or chunks of a message are handled. Any idea what it takes to duplicate the unexplained bug?
Comment 57•5 years ago
|
||
Looks plausible. I needed to do a bit of reformatting. I'll ask a question in the next comment. BTW, you only need to add "clang-format off" if it does something you don't like, so try without it first.
Comment 58•5 years ago
|
||
Comment on attachment 9082909 [details] [diff] [review] 1494764-review-changes-v1.patch - reformatted Review of attachment 9082909 [details] [diff] [review]: ----------------------------------------------------------------- Should we remove the #ifdef DEBUG that surrounds the logging? Other code doesn't do that: https://searchfox.org/comm-central/search?q=mozilla%3A%3ALogLevel%3A%3ADebug&case=false®exp=false&path= ::: mailnews/imap/src/nsImapServerResponseParser.cpp @@ +2685,5 @@ > // as many octets as we requested, this must be the last or only chunk > bool lastChunk = (!chunk || (numberOfCharsInThisChunk != > fServerConnection.GetCurFetchSize())); > > #ifdef DEBUG Why the conditional compile? Don't we always want to see the logging? If there's a user in the field with a problem, how are we going to get a debug version to them? @@ +2814,5 @@ > + // process the line. This should theoretically not occur but rarely, > + // and for yet to be determined reasons, it does. Logging may help. > + NS_WARNING( > + "'\\n' is not the only character in this line as expected!"); > +#ifdef DEBUG And here. @@ +2822,5 @@ > +#endif > + } else { > + // Discard the line containing only \n. > + processTheLine = false; > +#ifdef DEBUG And here.
Assignee | ||
Comment 59•5 years ago
|
||
The #ifdef DEBUG at the top always had it conditional. Not sure why that was deemed only needed by users with DEBUG builds so I left it as-is. The other two, I don't know, I just kind of thought they might only be of interest for development builds with DEBUG defined, since that's who saw the original problem, AFAIK. Anyhow, ok to remove the #ifdef DEBUG's on all 3 if you prefer. Maybe, if this fixes it, which I think it will, someone can go in a take out the 2nd two and save a few bytes.
I have no idea what the "// clang-format on/off" things do. But someone put them on the top #ifdef DEBUG so I just did the same thing on the ones I added.
O.T: Thanks for the "push to try" advice. I haven't tried what you said but I will later.
Comment 60•5 years ago
|
||
OK, I'll take the #ifdef DEBUG out.
// clang-format off switched clang-formatting off, to manually fine-tune the formatting in some places. Where you added it, it wasn't necessary.
Comment 61•5 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/0e190d24849f
Removed MOZ_ASSERT but now still process line where it would occur. r=jorgk
Updated•5 years ago
|
Comment 62•5 years ago
|
||
Pushed by mozilla@jorgk.com: https://hg.mozilla.org/comm-central/rev/c0c1b35d4b8b Follow-up: tweak clang-format directive. r=me
Comment 63•5 years ago
|
||
Comment on attachment 9082909 [details] [diff] [review] 1494764-review-changes-v1.patch - reformatted Time for an uplift.
Comment 64•5 years ago
|
||
TB 68.3 ESR:
https://hg.mozilla.org/releases/comm-esr68/rev/1a7e9ae905c44101df6e3e075478733f1eb75b88
https://hg.mozilla.org/releases/comm-esr68/rev/bdc4480b897346a57f468a71af3b75cf8fb8ca75
Description
•