Closed Bug 38192 Opened 24 years ago Closed 24 years ago

IMAP: Displays line-folding Tab character unnecessarily in Subject headers

Categories

(MailNews Core :: Internationalization, defect, P4)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 38340

People

(Reporter: marina, Assigned: nhottanscp)

Details

(Whiteboard: [nsbeta3+][PDTP4])

Attachments

(8 files)

***** observed with 2000-05-04-12 build *****
Steps to reproduce:
- open a new mail composition window;
- paste some text into the Subject header;
- use Backspace to delete some chars and replace them with different ones;
- send and get the message;
//note: a question mark is added where you began to type the new chars
re-assigning to the compose guru.
Assignee: mscott → ducarroz
I'm not able to reproduce this using the 2000-05-04-12m16 commercial build on
NT 4.0 in either the plain text or the HTML composition window. I tried
HTML-attributed text, plain text, pasted text with spaces, pasted text without
spaces, etc.

If this is still reproducible for you, Marina, maybe you could detail what kind
of text string you're pasting into subject and what format you're sending it in.
Maybe there's some detail I'm missing.
QA Contact: lchiang → laurel
Yes, it is reproducible. You have to type the text in the Subject header, not
in the mail body, so there is no dependency on plain text or HTML compose.
Perhaps it depends on the content of the text being pasted.  Marina - can you 
include that here?
Component: Mail Back End → Composition
It is 100% reproducible with 2000-05-04-09 (the first morning build) and I
cannot reproduce it with the second one (2000-05-04-12). So we can mark it as
WORKSFORME and I'll keep my eye on it (if it is seen again I'll reopen).
My bad, it is reproducible in the second build, but it looks like an i18n
problem only (I cannot reproduce it with us-ascii data), so I am changing the
description and component.
Now with new tests I'm changing the steps to reproduce it:
Steps to reproduce:
- open new mail;
- type a string containing non-ascii chars and ending with a comma (this looks
like an important condition) into the body;
- now select the typed text and copy/paste it into the Subject header several
times (meaning do not add any space, just hit Ctrl+V several times);
- send and get the message;
//note: question marks in the thread
Component: Composition → Internationalization
Summary: Using backspace after copy/paste in Subject header adds question mark (?) → IMAP : copy/paste in Subject header in the thread adds question mark (?)in certain conditions
*** Bug 38340 has been marked as a duplicate of this bug. ***
This is because the MIME decoder does not strip off the tabs which are inserted
when a MIME header is long.
Reassign to me, I have a fix.
Assignee: ducarroz → nhotta
Status: NEW → ASSIGNED
Target Milestone: --- → M16
QA Contact: laurel → marina
fix checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I'm seeing it again in today's build 2000-05-12, so reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
accepting
Status: REOPENED → ASSIGNED
This happens only for IMAP. Usually, the MIME decoder strips off CR, LF and TAB
by calling MIME_StripContinuations().
In the IMAP case, the input string to the decoder does not have CR and LF but
does have TAB. Those TABs are not removed, since MIME_StripContinuations() does
not strip anything when only a TAB appears in the string. That's why we see
question marks in the thread pane.
I don't know why, only in the IMAP case, headers are fed to the MIME decoder
with TAB but without CR and LF.
Reassign to mscott, cc to bienvenu.
Assignee: nhotta → mscott
Status: ASSIGNED → NEW
Attached file call stack
Target Milestone: M16 → ---
*** Bug 38340 has been marked as a duplicate of this bug. ***
I think we can live with this for beta2.
Keywords: nsbeta3
Target Milestone: --- → M20
yes, i guess we can
I corrected the summary because this problem is
a matter of displaying the line-folding Tab character
unnecessarily. It hampers reading of the headers
if you happen to have messages in languages which contain
frequent line breaks, e.g. CJK. Because these languages
take 2 bytes to represent 1 character, line folding in
the RFC 822 data is quite common.
Mozilla displays the Tab instead of eating it.
Depending on what language the message is in, the Tab may display
as a question mark, a vertical bar, and possibly as some other
character.
In the case of a "?", it could also cause an additional problem
since a normal statement could be taken as a question.

I'll attach an image of this problem under Japanese Windows,
where the Tab shows as a solid vertical bar.
Summary: IMAP : copy/paste in Subject header in the thread adds question mark (?)in certain conditions → IMAP: Displays line-folding Tab character unnecessarily in Subject headers
QA contact to momoi.
QA Contact: marina → momoi
Per i18n/mail triage meeting, this bug is now marked
as [nsbeta3+]. 

The problem is highly visible in quite a few messages in Asian
and other languages. In some cases, the intrusion of "?" will alter
the meaning of the message header itself.
Whiteboard: [nsbeta3+]
Oh wait, according to the image Kat attached, this bug is in the thread pane not
the message pane. Or does it show up corrupted in both??

Status: NEW → ASSIGNED
Target Milestone: M20 → M18
This example renders correctly in the subject field in the message pane. It just
isn't showing up correctly in the thread pane. This bug may end up going to
putterman. Still investigating.  Seems like when we get it out of the hdr db
it's corrupted...when mime reads it directly out of the incoming message (for
the message pane) it looks correct. 
reassigning to me.
Assignee: mscott → putterman
Status: ASSIGNED → NEW
re-assigning back to me. I was still investigating when I cc'ed putterman.
Didn't mean it was his bug.

Turns out the problem is that the imap server doesn't include the CRLFs when it
puts together the envelope response for the message. All I see in the subject
part of the envelope response is tabs.

I'm not sure what we can do to fix this problem short of scanning through each
header before we add it to the db and replacing any tab with CRLF tab ourselves.
That seems like it could cause other problems though....
Assignee: putterman → mscott
arrgghh we keep crashing every time I try to paste in the log data. Let's try
this again....

unfortunately the log file truncates the data before things get interesting. But
the envelope response starts to show the subject value and you can see that
there are no CRLFs in it. Just the tabs.

334[3788790]: dredd.mcom.com:S-Work/clienteng:CreateNewLineFromSocket: * 178
FETCH (FLAGS (\Recent \Seen) UID 454 RFC822.SIZE 1260 ENVELOPE ("Fri, 12 May
2000 13:54:42 -0700" "=?ISO-8859-1?Q?L=C0T=

However when we go to actually fetch the message you can see that the CRLFs are
preserved (I crash when I try to paste that part of the log so I won't do so now).

I'm not sure if we can fix this problem on the client. Maybe David B. has some
ideas.
Naoki, I talked about this problem with John Myers over lunch today. According
to RFC 2047, I think this is a bug in the MIME decoder's strip continuations
routine.

 Any amount of linear-space-white between 'encoded-word's,
           even if it includes a CRLF followed by one or more SPACEs,
           is ignored for the purposes of display.

If I'm reading this RFC correctly, I think CRLF followed by a tab is the
equivalent of any other form of white space (like just a tab). It's legal for
the imap server to treat a CRLF followed by a tab as being equivalent to just a
tab when it generates the envelope response for header data.

I think the mime decoder's strip continuations routine should properly strip
whitespace continuations that may not be of the form CRLF followed by a tab.
The strip continuations routine used by MIME decoder (MIME_StripContinuations 
in mimehdrs.cpp) is shared by other code in libmime and inherited from 4.x.
So what is the difference between IMAP and local mail? It's working fine for
local mail.
It's also broken for local mail, it's just that it's unlikely to find cases 
which trigger the broken behavior.
What does that mean? I see the same message header broken in IMAP showing fine 
after I copied to a local folder.
Naoki, the difference is that in local mail the subject is built from the actual
message body which we have downloaded locally. The real message body uses CRLF
tabs in the subject of your test message.

In the imap case, we extract the subject for the thread pane from the envelope
response we get from the server. The imap server has removed the CRLFs as it's
entitled to do according to this RFC because for mime encoded headers, CRLF TAB
== TAB. i.e. any white space is supposed to be ignored.....
Given a message with

Subject: =?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=

The spec says this means "ab", not "a b".  See the examples in section 8.

Actually, the CR LF TAB == TAB rule comes from RFC 822.  It applies to all 
headers, MIME or not.
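
A minimal sketch of the RFC 822 rule just mentioned, in C. The helper name
unfold_header() is made up for illustration and is not part of libmime; it only
drops a CRLF that is followed by a SPACE or TAB. The folded form found in the
message itself and the already-unfolded form seen in the IMAP envelope end up
as the same string, with the TAB still in it:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from libmime): unfold a header value in place.
 * Per RFC 822 section 3.1.1, CRLF followed by SPACE or TAB is equivalent
 * to the SPACE or TAB alone, so only the CRLF is dropped. */
void unfold_header(char *s)
{
    char *out = s;
    const char *in = s;

    while (*in != '\0') {
        if (in[0] == '\r' && in[1] == '\n' &&
            (in[2] == ' ' || in[2] == '\t')) {
            in += 2;            /* skip CRLF, keep the following white space */
            continue;
        }
        *out++ = *in++;
    }
    *out = '\0';
}

/* Self-check: the folded and the pre-unfolded form give the same result. */
void unfold_header_demo(void)
{
    char folded[] = "one\r\n\ttwo";   /* as stored in the message */
    char envelope[] = "one\ttwo";     /* as sent in an envelope response */

    unfold_header(folded);
    unfold_header(envelope);
    assert(strcmp(folded, envelope) == 0);
}
```

Either way the consumer of the unfolded value still receives a TAB, so the
decoder has to cope with a TAB that has no CRLF in front of it.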

I think 4.x does not fold a line for non MIME headers so we see the problem only 
for MIME encoded headers.
Scott, I am still not sure why this appears in 6.0 and not in 4.x but anyway if 
we want to change the strip continuation function you want to talk to Jeff or 
Rich (adding them to cc).
because in 4.x we didn't use the envelope command. We do in 6.0
On Japanese Windows, I see some extraneous musical 1/8th note symbols 
when I left mouse-click on any of the headers in the message envelope of 
a message, whether it's local or IMAP. This is the extra pop-up headers layer 
you can see without going into the View Page source window. 
This extraneous symbol is not a manifestation of the same thing, is it?

The extra character generally shows up in the "Received from: " line which
tends to be long. 
Under US Windows, you see a vertical bar in IMAP or local mail messages.
Keywords: mail2
Elevating priority since we couldn't convince ourselves to cut this today.
Priority: P3 → P2
.
Status: NEW → ASSIGNED
This bug needs to be re-assigned to either I18N or someone who knows something
about how to fix MIME_StripContinuations.

This isn't an important bug for me given the remaining 14 beta2+ bugs I
currently have so I'm pretty sure it's going to get cut unless someone else can
pick up the ball. 
Frank, can we do something about this?
To recap the specifications:

In RFC 822 section 3.1.1, it specifies that a CR LF followed by a SPACE or TAB 
is semantically equivalent to the SPACE or TAB.

In RFC 2047 6.2, it states that any linear white space that separates a pair of 
adjacent encoded-words is ignored.

So after decoding an encoded-word, the code should look ahead to see if the 
encoded-word is followed by a sequence of (up to 76) spaces or tabs (ignoring 
any CRLFs before such spaces or tabs) and then a syntactically valid 
encoded-word.  If so, it should eat all of the spaces and tabs.
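
The look-ahead described above could be sketched roughly as follows. This is
an illustration only: encoded_word_end() and skip_space_between_encoded_words()
are hypothetical names, not libmime functions, and the sketch is simplified (it
treats CR and LF as skippable white space wherever they appear and does not
enforce the 76-character limit or the full encoded-word grammar).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: if p points at something shaped like an RFC 2047
 * encoded-word ("=?charset?X?text?=", X being B or Q), return a pointer
 * just past its closing "?=", otherwise NULL. */
const char *encoded_word_end(const char *p)
{
    const char *q, *text, *end, *c;

    if (p[0] != '=' || p[1] != '?')
        return NULL;
    q = strchr(p + 2, '?');                 /* end of the charset part */
    if (q == NULL || q == p + 2)
        return NULL;
    if (q[1] != 'B' && q[1] != 'b' && q[1] != 'Q' && q[1] != 'q')
        return NULL;
    if (q[2] != '?')
        return NULL;
    text = q + 3;
    end = strstr(text, "?=");
    if (end == NULL)
        return NULL;
    for (c = p + 2; c < end; c++)           /* no white space allowed inside */
        if (is_ws(*c))
            return NULL;
    return end + 2;
}

/* p points just past the "?=" of an encoded-word that was just decoded.
 * Return where decoding should resume: past the white space if it is
 * followed by another encoded-word, otherwise p unchanged. */
const char *skip_space_between_encoded_words(const char *p)
{
    const char *q = p;

    while (is_ws(*q))
        q++;
    if (q != p && encoded_word_end(q) != NULL)
        return q;
    return p;
}

void skip_space_demo(void)
{
    const char *gap = "\t=?ISO-8859-1?Q?b?=";
    const char *text = " foo =?not-an-encoded-word";

    assert(skip_space_between_encoded_words(gap) == gap + 1);  /* TAB eaten */
    assert(skip_space_between_encoded_words(text) == text);    /* kept */
}
```

With this shape, a lone TAB between two encoded-words is eaten whether or not
a CRLF preceded it, while white space in front of ordinary text is left alone.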

I think in intl_decode_mime_part2_str() we want to move the lines:

  /* skip strings don't need conversion */
    strncpy(output_p, begin, p - begin);
    output_p += p - begin;

into an if statement which checks to see if there is a non-whitespace character 
between begin and p.

We then want to walk up all the callers, removing their calls to 
MIME_StripContinuations().
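
A sketch of what that guarded copy might look like. The surrounding decoder
loop is not shown in this bug, so copy_unencoded_run() and has_non_whitespace()
are hypothetical stand-ins that reuse the variable names from the quoted lines;
this is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if [begin, end) contains any character other than
 * SPACE, TAB, CR or LF, and 0 otherwise. */
int has_non_whitespace(const char *begin, const char *end)
{
    const char *c;

    for (c = begin; c < end; c++)
        if (*c != ' ' && *c != '\t' && *c != '\r' && *c != '\n')
            return 1;
    return 0;
}

/* Hypothetical stand-in for the copy step: copy the unencoded run
 * [begin, p) to output_p only when it is more than white space, and
 * return the (possibly advanced) output pointer. */
char *copy_unencoded_run(char *output_p, const char *begin, const char *p)
{
    if (has_non_whitespace(begin, p)) {
        memcpy(output_p, begin, (size_t)(p - begin));
        output_p += p - begin;
    }
    return output_p;
}

void copy_unencoded_run_demo(void)
{
    char buf[16];
    const char *gap = "\t";
    const char *text = " for ";

    assert(copy_unencoded_run(buf, gap, gap + 1) == buf);       /* skipped */
    assert(copy_unencoded_run(buf, text, text + 5) == buf + 5); /* copied */
}
```

As written, the sketch drops every all-whitespace run, including one that
comes before the first encoded-word; whether that matters depends on how the
caller sets begin and p.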
Now that I think about it, the "begin = p + 2;" should probably be "begin = p;".  
This would have been a bug in the original code.  The test case for the bug is

Subject: =?ISO-8859-1?Q?a?=  foo =?not-an-encoded-word 

The "=?" would be incorrectly stripped.
>This would have been a bug in the original code.
Yes, reproducible with both 4.x and 6.0.

I applied the patch and it fixed some cases like Japanese but the problem still 
can be seen with the following case.

Subject: =?ISO-8859-1?Q?L=C0T=CD=D1=2D1?=,=?ISO-8859-1?Q?=E2=20t=E8st?= for 
	=?ISO-8859-1?Q?L=C0T=CD=D1=2D1L=C0T=CD=D1=2D1?=,
	=?ISO-8859-1?Q?=E2=20t=E8st?= for 
	=?ISO-8859-1?Q?L=C0T=CD=D1=2D1L=C0T=CD=D1=2D1?=,
	=?ISO-8859-1?Q?=E2=20t=E8st?= for 
	=?ISO-8859-1?Q?L=C0T=CD=D1=2D1L=C0T=CD=D1=2D1?=,
	=?ISO-8859-1?Q?=E2=20t=E8st?= for =?ISO-8859-1?Q?L=C0T=CD=D1=2D1?=
I've noticed several problems with the current code, I'll take a stab at 
rewriting it.

Could you attach that latest test case?  I think it's an invalid test case, but
I have to see the exact composition of whitespace to be sure.
Assignee: mscott → jgmyers
Status: ASSIGNED → NEW
Attached file zipped test case
In the zipped test case, each continuation line uses a TAB which is semantically 
meaningful.
...because there is unencoded text before the tab.
Attached patch proposed fix
I will be out of office for a conference rest of the week so I cannot 
review/test the change.
The change is quite extensive. I am not sure we can take this change when we 
have less than a week before the beta3 freeze.
I was expecting the change would be local in intl_decode_mime_part2_str(). Is 
that possible to change only intl_decode_mime_part2_str()?
There is a lot of dry rot around intl_decode_mime_part2_str(), including a few 
crasher bugs.

For example, nsMimeConverter::DecodeMimePartIIStr(const nsCString& header, 
nsCString& charset, nsString &decodedString) differed from 
nsMimeConverter::DecodeMimePartIIStr(const nsString &Header, nsString& charset, 
PRUnichar **decodedString) in that the former treated "charset" as an IN 
parameter and the latter treated "charset" as an IN/OUT parameter.  At least one 
caller to the latter appears to incorrectly assume the value of "charset" would 
not be changed by the routine.

I don't think you can afford not to take a fix for 51453.
PDT thinks this is a P4.
Priority: P2 → P4
Whiteboard: [nsbeta3+] → [nsbeta3+][PDTP4]
I'll take this bug. Since the current problem is specific to the thread pane
display, we can remove tabs just before adding the string to RDF.
John, please localize your change for bug 51453 and attach a diff to 51453.
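
The patch itself is not quoted in this bug, so the following is only a guess
at the general shape of "remove tabs just before adding the string": a
hypothetical helper, remove_tabs(), that deletes every TAB from the decoded
subject in place.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the checked-in patch: delete every TAB from a
 * NUL-terminated string in place before it is handed on for display. */
void remove_tabs(char *s)
{
    char *out = s;
    const char *in;

    for (in = s; *in != '\0'; in++) {
        if (*in != '\t')
            *out++ = *in;
    }
    *out = '\0';
}

void remove_tabs_demo(void)
{
    char subject[] = "first,\tsecond";

    remove_tabs(subject);
    assert(strcmp(subject, "first,second") == 0);
}
```

Note that a blunt filter like this also removes a TAB that follows unencoded
text, which is the case described earlier in this bug as semantically
meaningful.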
Assignee: jgmyers → nhotta
This problem is not specific to thread pane display.  It is merely most commonly 
seen in thread pane display.

nhotta said he has a smaller patch which can also fix this issue. He will
attach the patch here later.

John:
>This problem is not specific to thread pane display.  
We understand, but where else?

With 4 days left before 0 bugs on nsbeta3, we should probably take the safest
fix instead of the finest fix, to limit risk.
I applied John's patch and tested briefly on WinNT.
* many of the long subjects still have the problem
* a new problem in the envelope view (now it has the same problem as the thread
pane)
* a new problem in the local folder thread pane (now it has the same problem as
imap)

*** Bug 52115 has been marked as a duplicate of this bug. ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Checked in my patch. It fixes the problem reported in this bug.
Please file separate bugs for any other MIME decoder problem.
The problems I have in bug 52115 are *not* fixed in build 2000091208.
My patch did not change the decoder. I think you want to keep your bug 
separated from this bug.
To marina for verification.
(I'm sorry but I'm on sabbatical for a while.)
QA Contact: momoi → marina
The problem described in this bug report is fixed, verifying
Status: RESOLVED → VERIFIED
I will reopen this bug because I do see some new problems with line folding. It
is not in the thread pane (which was originally reported here) but in the window
title, and it shows up under the Tasks menu. Now we see a bar inserted there
where you paste without a space; attaching a screen shot.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attached image this is a screen shot
Remaining problem is a dup of bug 38340


*** This bug has been marked as a duplicate of 38340 ***
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
verifying as dup
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core