Closed Bug 58114 Opened 25 years ago Closed 25 years ago

Straighten out the MIME encoded-word unfolding TAB mess

Categories

(MailNews Core :: MIME, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla0.9

People

(Reporter: nhottanscp, Assigned: jgmyers)

References

Details

(Keywords: intl, Whiteboard: [nsbeta1+])

Attachments

(7 files)

When a message header is folded onto multiple lines, a tab can be inserted after the CR LF. The current code, MIME_StripContinuations, depends on the presence of CR LF to remove the tab: http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimehdrs.cpp#856 An IMAP server may return headers with the CR LF removed but the tab left in place. MIME_StripContinuations does not remove the tab in that case. We need a better line-unfolding implementation in the client.
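The failure mode described above can be illustrated with a minimal Python sketch. `strip_after_crlf` is a hypothetical stand-in written for this illustration, not the actual MIME_StripContinuations code; it only assumes, as the report states, that the tab is removed when (and only when) it follows CR LF.

```python
import re


def strip_after_crlf(header: str) -> str:
    """Hypothetical stand-in: remove CR LF together with the SP/TAB run after it.

    Whitespace that is not preceded by CR LF is left untouched.
    """
    return re.sub(r"\r\n[ \t]+", "", header)


# Header as it appears in a local folder: folded with CR LF + TAB.
folded = "Subject: =?UTF-8?B?YQ==?=\r\n\t=?UTF-8?B?Yg==?="

# Header as an IMAP server might return it (per the report): CR LF gone, TAB kept.
imap_style = "Subject: =?UTF-8?B?YQ==?=\t=?UTF-8?B?Yg==?="

local_result = strip_after_crlf(folded)      # TAB removed along with CR LF
imap_result = strip_after_crlf(imap_style)   # TAB survives, no CR LF to key on
```

With this stand-in, the local copy loses the tab while the IMAP copy keeps it, which matches the report that the problem shows up only for IMAP messages.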
Blocks: 38340
Blocks: 53231
Status: NEW → ASSIGNED
Target Milestone: --- → M23
Component: Internationalization → MIME
Keywords: intl
Reassign to putterman. This is a MIME/mail backend issue.
Assignee: nhotta → putterman
Status: ASSIGNED → NEW
Keywords: nsbeta1
marking nsbeta1+ and moving to mozilla0.8. Reassigning to ducarroz. When we were talking about this in the triage meeting, we were told this happens in the subject field of the compose window.
Assignee: putterman → ducarroz
Whiteboard: [nsbeta1+]
Target Milestone: --- → mozilla0.8
*** Bug 38340 has been marked as a duplicate of this bug. ***
*** Bug 53231 has been marked as a duplicate of this bug. ***
QA contact to ji.
QA Contact: momoi → ji
moving to mozilla0.9
Target Milestone: mozilla0.8 → mozilla0.9
John, I think we are ready to move on this bug. We have some test cases which show the following problems: 1. Extraneous spaces are inserted into a quoted header (e.g. Subject line) when a reply message is being composed. Some Japanese test cases are available. 2. The extraneous spaces show up in the window title of a mail message in the message view window. If these are the test cases you have in mind, we can upload them here. If these 2 are not enough, can you tell us what else we should be looking at?
Comment by nhotta from Bug 51453 imported into this bug. ---- ------- Additional Comments From nhotta@netscape.com 2001-01-16 17:44 ------- I have asked IQA to prepare test data and attach it to bug 58114. For code review, please ask ducarroz, he is the module owner of libmime (and bug 58114). ------- End of Additional Comments From nhotta@netscape.com 2001-01-16 17:44 -------
This bug has a definite target fix version and both the international and mail teams agree on its importance. If jgmyers can provide a patch proposal, that will be great. IQA will be providing some test cases.
This test case set contains 2 msgs with a long subject line. Copy the file into your local mail folder and copy it to an IMAP server. Look in 3 different places for breakage. 1. Window title on the View window and Composer window. You will see an extraneous dot where there is a line break. 2. Reply/quoted subject header. You will see an extraneous space inserted. 3. Task menu -- at the bottom, you see the subject of the message being displayed. You will see a vertical bar here.
This test case contains 1 msg. It shows the problem of extraneous space in the IMAP thread pane display only. I don't know why nhotta's patch does not work in this case. But it does exhibit 2 out of the 3 problems mentioned above, i.e. 1 and 2. Because of the location of the break, we cannot tell if problem 3 exists. The problem is visible only with IMAP messages.
Given the nature of the fix for this bug, it would be a good idea to run the international smoketest test suite (5 msgs). Copy the file below into the Local folder and copy it from there to an IMAP account. Check the header display in: 1 -- Latin 1; 2, 3 -- Japanese headers; 4 -- UTF-8 (Japanese) header; 5 -- Latin 1 with Euro symbol.
It seems that Bug 64948 might be a duplicate of this bug. If so, please mark that one a duplicate of this one. It could also be that the fix proposed for this problem can fix the problem reported in Bug 51453.
If anything, bug 64948 blocks this bug. Bug 64948 applies even for TAB characters not preceded by CRLF. This bug as originally stated is INVALID. Per RFC 822, a CR LF TAB sequence is semantically equivalent to a single TAB character. Code which removes the TAB character as a consequence of unfolding the line violates the standard. There is a separate rule in the MIME encoded-word specification that states that when (after applying the line unfolding rule) there are two encoded-words separated by a sequence consisting entirely of TAB and/or SP characters, the TAB/SP characters are removed. The current code violates this requirement of the standard as it only does this when such TAB/SP characters are preceded by CRLF. I suggest this bug be renamed something like "Straighten out the MIME encoded-word unfolding TAB mess".
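The two rules described in this comment can be sketched in Python as two separate passes. This is an illustration under stated assumptions, not the patch attached to this bug: the function names are hypothetical, and the `ENCODED_WORD` pattern is a simplified approximation of the RFC 2047 grammar.

```python
import re

# Simplified encoded-word pattern: =?charset?B-or-Q?payload?=  (an approximation)
ENCODED_WORD = r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?="

# A run of SP/TAB that sits directly between two encoded-words.
_BETWEEN = re.compile(rf"({ENCODED_WORD})[ \t]+(?={ENCODED_WORD})")


def unfold(header: str) -> str:
    """RFC 822 unfolding: drop CR LF when SP/TAB follows, keep the SP/TAB itself."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


def drop_ws_between_encoded_words(header: str) -> str:
    """Encoded-word rule: remove SP/TAB runs only between two encoded-words."""
    return _BETWEEN.sub(r"\1", header)


def normalize(header: str) -> str:
    """Apply unfolding first, then the encoded-word whitespace rule."""
    return drop_ws_between_encoded_words(unfold(header))
```

Because the second pass does not look for CR LF at all, it treats `word TAB word` the same whether or not a server already removed the line break, while whitespace next to ordinary text such as "Mail" is preserved.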
Technically, the two messages in the 01/16/01 18:54 attachment each contain illegal MIME in the Subject: header. The first message contains the string: =?ISO-2022-JP?B?GyRCTFwkTjlUJE5OYyRHJDkhIzFROGwkThsoQg==?=Mail This violates RFC 2047 section 5 (1): "However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'." There is, however, no problem in having the code go ahead and interpret this as an encoded-word followed by "Mail". There is, however, a semantically meaningful SP followed by a semantically meaningful TAB between "Mail" and the following encoded-word. Similarly, the second message contains the technically illegal string: =?GB2312?B?tcTN+NW+?=is
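A lenient reader of the kind suggested here ("an encoded-word followed by 'Mail'") could be sketched as follows. This is a Python illustration only: `split_lenient` and `decode_b_word` are hypothetical helper names, the pattern is a simplified approximation of the RFC 2047 grammar, and only the B (base64) encoding is handled.

```python
import base64
import re

# Simplified encoded-word pattern (an approximation of RFC 2047), captured so
# that re.split() keeps the matched encoded-words in its result.
ENCODED_WORD = re.compile(r"(=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=)")


def split_lenient(value: str):
    """Split a header value into ("encoded-word", s) and ("text", s) tokens.

    Encoded-words are recognised even when they touch adjacent text with no
    linear-white-space in between.
    """
    tokens = []
    for piece in ENCODED_WORD.split(value):
        if not piece:
            continue  # re.split() yields empty strings at the edges
        kind = "encoded-word" if ENCODED_WORD.fullmatch(piece) else "text"
        tokens.append((kind, piece))
    return tokens


def decode_b_word(word: str) -> str:
    """Decode a B-encoded encoded-word; other encodings are out of scope here."""
    _, charset, encoding, payload, _ = word.split("?")
    if encoding.upper() != "B":
        raise ValueError("only B encoding is handled in this sketch")
    return base64.b64decode(payload).decode(charset)
```

For the second string quoted above, `split_lenient("=?GB2312?B?tcTN+NW+?=is")` yields one encoded-word token followed by the text token "is", so the trailing ASCII is kept as literal text rather than causing the whole value to be shown undecoded.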
Changed the summary line per jgmyers' suggestion. Given the discussions that had gone on in other bugs referred to in this bug, it makes sense to try more general fixes to straighten out the issues surrounding these extraneous character displays. nhotta checked in a workaround solution for M18 (NS6) due to the shortness of time available, but the general solution in this bug should be able to supersede that patch if that is the best thing to do. For bug management reasons, we need to choose one bug for the TAB messes and fix all known problems in it. We are proposing that that bug be this one. We could choose Bug 64948 for that purpose, but this bug has already gone through triaging. Thus it would be preferable to change the summary. That is done now.
Summary: Remove tab properly for line unfolding message headers → Straighten out the MIME encoded-word unfolding TAB mess
> There is, however, no problem in having the code go ahead and interpret this as
> an encoded-word followed by "Mail". There is, however, a semantically meaningful
> SP followed by a semantically meaningful TAB between "Mail" and the following
> encoded-word.
> Similarly, the second message contains the technically illegal string:
Both of these messages were composed using a current Mozilla build. When I composed these msgs, I created a subject line like the following: xxxxxyyyyyzzz Mail aaaaa where the sequences 'xxxxxyyyyyzzz' and 'aaaaa' are in Japanese while 'Mail' is in English. Apparently it got transformed into: xxxxxyyyyyzzzMail aaaaa losing the SP in front of 'Mail' in the process. I may not have inserted any spaces surrounding 'is' in the 2nd Chinese example. I tried Communicator 4 with the same subject lines and the behavior there seems consistent -- if I had 2 SPs surrounding the English word, I got 2 SPs in the received msgs. If I did not have any SPs, then there were no SPs in the received msgs. Since there seems to be no way to tell the user to insert spaces before and after the English word when the user is mixing language scripts without any spaces, it would seem to be the responsibility of the mail program to insert the spaces where required by RFC 2047. If so, I will file another bug requesting that we conform to RFC 2047. John, please confirm if a new bug should be filed.
> This violates RFC 2047 section 5 (1): "However, an 'encoded-word' that appears in
> a header field defined as '*text' MUST be separated from any adjacent
> 'encoded-word' or 'text' by 'linear-white-space'."
Let me make some additional comments on this. If you look at CJ mixed-script messages, I suspect that the majority of mail programs do not follow this part of RFC 2047 when generating MIME headers. Communicator 4.x does not, and neither does Mozilla nor Outlook Express. In Japanese or Chinese, space is usually not used to separate words, thus it is perfectly OK to have 'ZZZMailYYY' where ZZZ and YYY are in Japanese while the 'Mail' part is in ASCII. Now we can follow RFC 2047 and insert a space between ZZZ & Mail and Mail & YYY. But in that case, how can a decoding agent know whether that space was there originally or was inserted by the mail program? I get the sense that this portion of the RFC was written with no thought given to languages like Japanese. Factually speaking, there are tons of messages with this illegal sequence in Japanese or Chinese. So at least we have to decode it properly whether or not RFC 2047 is violated. Otherwise we will break the display of a large number of messages.
> There is, however, no problem in having the code go ahead and interpret this as
> an encoded-word followed by "Mail".
Given the foregoing discussion, I agree that this is what we need to do. I would like to suggest that if we want to conform to RFC 2047 in generating message headers, then we should file a separate bug and not deal with it here. But in filing a new bug, we need to debate whether this part of RFC 2047 is truly enforceable in Japanese and Chinese mixed-script messages. Taka, maybe you have some thoughts on this particular question?
I'm working on this. The squares that appeared the last time I proposed a patch were probably caused by bug 23635, so marking this bug dependent on that.
Assignee: ducarroz → jgmyers
Depends on: 23635, 64948
Attached patch: proposed fix
Blocks: 65702
Who should be reviewing the proposed fix?
Status: NEW → ASSIGNED
Keywords: review
Please ask ducarroz for the review, he is the current module owner of MIME.
Apart from the following typo:
- Content-Disposition: XXX; filename=NAME (RFC 1521/1806)
+ Content-Disposition: XXX; filename=NaAME (RFC 1521/1806)
it looks good. I suppose you did a lot of testing, right? If so, R=ducarroz
I've only been able to test against the mailboxes attached to this bug. I don't have facilities to test i18n attachment filenames. Kat?
sr=sspitzer (don't forget about the typo that ducarroz mentioned.)
so, is there agreement about the patch fixing the cases it's supposed to correct?
Patch has been tested against all attached test cases. I believe we have agreement.
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Checked the 2001-02-08-06-mtrunk build. There are still some problems displaying the messages in the 01/16/01 18:54 attachment, although there are no problems displaying the other test cases attached to this report. The first message in the 01/16/01 18:54 attachment has a space between ASCII "Mail" and the following Japanese character, but in the window title a dot is displayed in place of the space. And in the reply compose window for this message, an extra space is inserted between "Mail" and the Japanese character. I'll attach a screen shot for this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The window title weirdness is bug 64948.
Then I'll leave this bug as "fixed", and will verify this after bug 64948 gets fixed.
Status: REOPENED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Per the 822/MIME specs, there is a SPACE and then a TAB after the "Mail". In the mail compose window, the TAB is displaying as a tab stop.
QA contact to marina, could you verify this after 64948 gets fixed? Thanks.
QA Contact: ji → marina
Esther, I'll verify this bug after you verify bug 64948.
Product: MailNews → Core
No longer depends on: 64948
Product: Core → MailNews Core