Open Bug 64948 Opened 25 years ago Updated 3 years ago

inconsistent display of TAB characters in subjects & thread pane

Categories

(MailNews Core :: Backend, defect)

Tracking

(Not tracked)

People

(Reporter: jgmyers, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: comments 20 thru 25 are irrelevant)

Subject: headers containing TAB characters are displayed differently in the caption and message panes. A message with Subject: tab<TAB>separated where <TAB> is a tab character is displayed as "tabseparated" in the caption pane and "tab separated" in the message pane. I believe the latter is better.
note to self, caption pane == thread pane. accepting.
Status: NEW → ASSIGNED
add to the cc list
fixed. "tab<TAB>separated" will be "tab separated" in the thread pane. note: "test<tab>test<tab><tab>test" will be "test test  test" in the thread pane; in the message pane, you'd see "test test test". I'm using ReplaceChar('\t',' '). the message pane looks different because we don't strip any white space there (so the tabs remain), and because it is in an iframe, we treat it like html, so runs of white space are collapsed. if that is a problem, log a new bug. marking fixed.
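The difference between the two panes described above can be sketched in a few lines of Python. This is only an illustration, not the Mozilla code: `thread_pane_subject` and `html_rendered_subject` are hypothetical names, the first mimicking a one-for-one `ReplaceChar('\t',' ')` and the second approximating how HTML rendering collapses runs of white space.

```python
import re


def thread_pane_subject(subject: str) -> str:
    # Hypothetical stand-in for ReplaceChar('\t', ' '):
    # every tab becomes exactly one space, so two tabs become two spaces.
    return subject.replace("\t", " ")


def html_rendered_subject(subject: str) -> str:
    # Rough approximation of HTML white-space handling:
    # any run of spaces/tabs/line breaks is shown as a single space.
    return re.sub(r"[ \t\r\n]+", " ", subject)


subject = "test\ttest\t\ttest"
print(repr(thread_pane_subject(subject)))    # 'test test  test'
print(repr(html_rendered_subject(subject)))  # 'test test test'
```

The raw header value is the same in both cases; only the display step differs, which is why the two panes can disagree.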
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Couldn't a similar fix with something like: ReplaceChar('\r',' ') fix bug 23635 ?
Blocks: 58114
I'll go look into that other bug. thanks gemal.
actually, my fix only fixes the Subject header case. all tabs in headers need to be treated as spaces. re-opening while I investigate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
re-assign to ducarroz. jgmyers has provided patches that would fix this bug the right way, and we could then remove my hack (of calling ReplaceChar()). re-assigning to ducarroz for him to sort out. (thanks for all the help and info jgmyers.)
Assignee: sspitzer → ducarroz
Status: REOPENED → NEW
I haven't provided patches to fix this bug, just bug 23635.
CC'ed to myself.
Now also happening in the thread pane.
Summary: inconsistent display of TAB characters in subjects → inconsistent display of TAB characters in subjects & thread pane
*** Bug 72569 has been marked as a duplicate of this bug. ***
*** Bug 72889 has been marked as a duplicate of this bug. ***
This bug, having crept into the thread pane since the mailnews perf landing, will seriously annoy users of non-ASCII characters. Nominating.
*** Bug 72889 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
*** Bug 104447 has been marked as a duplicate of this bug. ***
There was code in nsMsgMessageDataSource.cpp (around line 126) which removed tabs. The file seems to have been removed (I cannot find it in LXR). http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/mailnews/base/src/nsMsgMessageDataSource.cpp So the code had to be copied to a new file for the thread pane. But I am not clear why we cannot remove TAB along with CR and LF when unfolding the header.
> But I am not clear why we cannot remove TAB along with CR and LF when
> unfolding the header.
Because RFC 2822 (and 822) clearly and unambiguously specify otherwise.
But why? The CR, LF, TAB sequence is put when folding a line, so it is more natural to remove them all together when unfolding a line.
Per the RFCs, one folds a line by adding CRLF before an existing TAB or SPACE. It is incorrect to add a TAB when folding lines. This is bug 73403
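A minimal Python sketch of the folding rule as stated in this comment, under the assumption that a header is handled as a plain string: `fold` only inserts CRLF in front of white space that is already there, and `unfold` only removes a CRLF that is followed by SPACE or TAB. The function names and the 78-character limit are choices made for this illustration.

```python
import re

CRLF = "\r\n"


def unfold(header: str) -> str:
    # Remove only the CRLF; the SPACE or TAB that follows it is kept as-is.
    return re.sub(r"\r\n(?=[ \t])", "", header)


def fold(header: str, limit: int = 78) -> str:
    # Insert CRLF in front of an *existing* SPACE or TAB.
    # No white space is ever added, so unfold(fold(x)) == x.
    lines = []
    rest = header
    while len(rest) > limit:
        # Last existing white space at index 1..limit (index 0 is skipped
        # so a continuation line is never split at its own leading WSP).
        cut = max(rest.rfind(" ", 1, limit + 1), rest.rfind("\t", 1, limit + 1))
        if cut <= 0:
            break  # no existing white space here: this sketch leaves the line long
        lines.append(rest[:cut])
        rest = rest[cut:]
    lines.append(rest)
    return CRLF.join(lines)


print(repr(unfold("Subject: foo\r\n\tbar")))  # 'Subject: foo\tbar'
```

Because nothing is added and nothing but CRLF is removed, a TAB written by the sender survives unfolding, and a folder that inserts its own TAB changes the header value.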
But what if no existing TAB or SPACE in the subject? Can the next line start without having TAB or SPACE?
Please correct me if I am wrong, but here's my take on the 2 relevant RFCs on this issue: RFC 2822 (replacing RFC 822) for ASCII headers and RFC 2047 for non-ASCII headers.

======================================
RFC 2822: Internet Message Format
The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example the header field:
FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space
      obs-FWS
WSP = (SP, ASCII value 32) and (HTAB, ASCII value 9)
======================================

From this, if the header contains ASCII text to the left of where it folds, then you simply remove a CRLF for unfolding.
** This is as jgmyers mentions above.
RFC 2822 does not cover non-ASCII headers -- this is left to RFC 2047, which I quote below:

======================================
RFC 2047: Message Header Extensions for Non-ASCII Text
2 Syntax of encoded-words:
An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.
5 Use of encoded-words in message headers:
Ordinary ASCII text and 'encoded-word's may appear together in the same header field. However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.
6.2 Display of 'encoded-word's
When displaying a particular header field that contains multiple 'encoded-word's, any 'linear-white-space' that separates a pair of adjacent 'encoded-word's is ignored. (This is to allow the use of multiple 'encoded-word's to represent long strings of unencoded text, without having to separate 'encoded-word's where spaces occur in the unencoded text.)

*** Definitions from RFC 822 used in RFC 2047 ***
RFC 822 (referenced by RFC 2047): [obsoleted] Internet Message Format
linear-white-space = 1*([CRLF] LWSP-char) ; semantics = SPACE
                                          ; CRLF => folding
LWSP-char = SPACE / HTAB ; semantics = SPACE
SPACE = <ASCII SP, space> ; ( 40, 32.)
HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
======================================

For non-ASCII encoded words used in headers, it seems from the above that there are 2 steps.
1. First, an encoding agent needs to **insert** a linear-white-space, which is defined in RFC 822 as an optional [CRLF] followed by either SPACE or TAB. This insertion occurs between an encoded word and (another encoded word or an unencoded text).
2. Next, if the header line exceeds 75 chars, then insert CRLF in front of SPACE. (SPACE here I believe is ambiguous between the real Space token and the semantic SPACE as defined in RFC 822, which would include TAB also.)
I assume that line folding can occur by inserting CRLF in front of an existing Space or TAB created by the process in step 1.
Following the above folding generation logic, unfolding then will have to:
A. remove a CRLF in front of TAB or Space. (or CRLF)
B. Then ignore any remaining linear-white-space(s) between an encoded word and another encoded word or unencoded text in displaying/decoding the MIME words.
From the above explanation, I think we can reply to nhotta's question above:
> But what if no existing TAB or SPACE in the subject? Can
> the next line start without having TAB or SPACE?
If there is more than 1 encoded word or an encoded word followed by unencoded text in the header, the encoding mail agent must have inserted a TAB or a Space. If my reading of RFC 2047 is correct, the situation posed by the question above cannot possibly arise.
Kat, you are reading a rule into RFC 2047 that simply isn't there. RFC 2047 only specifies the removal of linear-white-space between two encoded-words. It says nothing about removing white space between an encoded-word and following unencoded text, there is no such rule.
> Kat, you are reading a rule into RFC 2047 that simply
> isn't there. RFC 2047 only specifies the removal of
> linear-white-space between two encoded-words. It says
> nothing about removing white space between an
> encoded-word and following unencoded text, there is no
> such rule
John, thanks for this correction. Upon 2nd reading of sections 5 and 6.2, there seems to be an asymmetry. An agent will insert linear-white-space between an encoded word and 'text' by section 5, but section 6.2 does not say to remove it. So, then item #B above should read:
B. Then ignore any remaining linear-white-space(s) between an encoded word and another encoded word in displaying/decoding the MIME words.
So the items I listed above should cover textbook cases of folding and unfolding. I guess we are allowing for non-textbook cases in the code, right? Like some mail agents forgetting to insert any linear-white-space between encoded words, the use of CRLF only for folding without a trailing Space or Tab, etc. In decoding/displaying, we are being generous to some extent. Is that right?
What section 5 is getting at is that encoded-words must (in some situations) be surrounded by whitespace. One cannot encode as:
J=?iso-8859-1?q?=E4?=rnefors
The only way to encode this is to put the surrounding non-whitespace characters inside the encoded-word. If one instead did:
J =?iso-8859-1?q?=E4?= rnefors
that would be syntactically legal, but the spaces surrounding the encoded-word would be semantically meaningful.
There are cases where an encoded-word can legally appear adjacent to a non-whitespace character. In all these cases, the adjacent character is a special. These characters include the parentheses of a comment and the delimiters around a phrase. Since the decoder cannot in general even know if an arbitrary header is structured or unstructured, the Mozilla decoder will simply decode anything that matches the syntax of an encoded-word, regardless of whether or not it is surrounded by whitespace.
The header:
From: =?ISO-8859-1?Q?J=E4rnefors?= Olle <jarnefo@example.com>
is entirely legitimate and clearly has a semantically meaningful space before "Olle". To remove linear-white-space between an encoded-word and unencoded text would, in violation of the spec, produce output other than what was intended by the sender.
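The display rule discussed here can be illustrated with a small Python sketch. It is not the Mozilla decoder; `decode_header_value` and the two regular expressions are written for this example, and the header is assumed to be already unfolded. White space is dropped only between two adjacent encoded-words, while white space next to unencoded text is left alone, and anything matching the encoded-word syntax is decoded.

```python
import base64
import quopri
import re

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
# White space that sits between the end of one encoded-word and the start of the next.
BETWEEN_WORDS = re.compile(r"(\?=)[ \t\r\n]+(?==\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=)")


def _decode_word(match: re.Match) -> str:
    charset, encoding, text = match.group(1), match.group(2).upper(), match.group(3)
    if encoding == "B":
        raw = base64.b64decode(text)
    else:
        # header=True makes "_" decode to a space, as the Q encoding requires.
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")


def decode_header_value(value: str) -> str:
    # Drop white space only between adjacent encoded-words (RFC 2047, 6.2) ...
    value = BETWEEN_WORDS.sub(r"\1", value)
    # ... then decode every encoded-word, whatever surrounds it.
    return ENCODED_WORD.sub(_decode_word, value)


print(decode_header_value("=?ISO-8859-1?Q?J=E4rnefors?= Olle <jarnefo@example.com>"))
# Järnefors Olle <jarnefo@example.com>
print(decode_header_value("=?iso-8859-1?q?J=E4rne?= =?iso-8859-1?q?fors?="))
# Järnefors
```

The space before "Olle" survives because it separates an encoded-word from ordinary text, while the space between the two encoded-words in the second call is treated as a mere separator.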
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Keywords: nsbeta1
Target Milestone: mozilla0.9.9 → ---
Keywords: nsbeta1 → nsbeta1-
Target Milestone: --- → mozilla1.2
*** Bug 72276 has been marked as a duplicate of this bug. ***
This is quite ugly and visible for non-ascii users, can we get some traction here from the l10n people? I'd prefer to see this fixed before 1.0 (nominating).
Keywords: mozilla1.0
to answer your question, please see Taka's comment #27 in bug 73403.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030924
Long silence on this bug. I still see this problem: the Subject is folded with a tab character at the beginning of a line (arguably this is bad wrapping, but it is not wrong to have it, it would only be wrong to change it), then the thread pane displays the subject unfolded but with a dotted square where the tab was. Just for the thread pane display we should display a blank.
pi
I am also seeing the behaviour described in comment 29 in Thunderbird 0.3 under Windows 2000. In my case, this is happening for a lot of messages coming from people with either Mac or Windows Outlook Express, which seems to wrap long subject lines by replacing a space between words with <CR><LF><TAB>. The result just looks ugly.
Product: Browser → Seamonkey
There are quite a few bugs that complain about this problem, and it is not limited to unfolded headers. Moving this to the backend, and adding dependencies for the Moz and TB front end bugs.
Blocks: 240924, 271312
No longer blocks: 58114
Severity: normal → minor
Component: MailNews: Main Mail Window → MailNews: Backend
OS: Linux → All
Product: Mozilla Application Suite → Core
Hardware: PC → All
Whiteboard: comments 20 thru 25 are irrelevant
Target Milestone: mozilla1.2alpha → ---
*** Bug 251325 has been marked as a duplicate of this bug. ***
*** Bug 346976 has been marked as a duplicate of this bug. ***
I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP. In my case, mailman is what is indenting the wrapped subject lines with TAB characters. mailman is very popular software! Here is a dump of an example message header:
0002020   C   c   :  sp  nl   S   u   b   j   e   c   t   :  sp   [   B
0002040   u   i   l   d   b   o   t   ]  sp   B   u   i   l   d   B   o
0002060   t  sp   S   U   C   C   E   S   S  nl  ht   (   B   l   d   _
0002100   g   s   s   d   k   _   g   s   s   d   k   _   t   e   s   t
0002120   _   2   0   0   6   _   0   9   _   1   2   _   0   0   _   0
0002140   1   _   0   0   _   I   n   c   r   e   m   e   n   t   a   l
0002160   _   2   )  nl   X   -   B   e   e   n   T   h   e   r   e   :
Assignee: ducarroz → nobody
Status: ASSIGNED → NEW
QA Contact: esther → backend
vseerror@lehigh.edu, did you mean to mark this bug as new, and remove the assigned owner?
yes, most certainly. outdated (non)assignment
Product: Core → MailNews Core
No longer blocks: 593337
Depends on: 593337
This bug's phenomenon seems to have morphed into Bug 553280 (which I duped to Bug 593337) / Bug 593337 as of Tb 3.0 (and Tb 3.1) / Sm 2.0, and Bug 593337 is already WORKSFORME with recent trunk builds. Please see Bug 593337 Comment #15 for the check results with trunk builds (Tb 3.2pre/Sm 2.2pre). The following can be said:
1. This bug in the thread pane never occurs with Tb 3.0/Sm 2.0 (this bug is fixed there), because the Tab is removed when the Subject is displayed in the thread pane. And Tb 3.0/Sm 2.0 introduced Bug 553280/Bug 593337.
2. Bug 553280/Bug 593337 is fixed in Tb 3.2pre/Sm 2.2pre.
(In reply to comment #34)
> I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP.
>
> In my case, mailman is what is indenting the wrapped subject lines with TAB
> characters. mailman is very popular software!
I had the same problem with mail from mailman in Thunderbird 3.1.x but it's fixed for me in Thunderbird 5. In Thunderbird 5 the tab character is replaced with a space character in the thread pane's subject line as I would expect it to be.
I side with those who follow RFC 822 and say that any white space including TABs should be replaced with a single space when unfolding lines. To reproduce the issue in Icedove 3.1.15: edit the message headers outside the mail app while it is not running, so that the Subject line has a CRLF followed by a TAB followed by other words:
Subject: foo<CRLF>
<TAB>bar
Then open the message in Icedove. The thread pane will show the subject as "foobar", and I believe this goes against RFC 822. The message pane will unfold the line into "foo bar", and I think this agrees with the RFC. Cheers.
I understand the concern in comment 20 as saying that splitting long subject lines without white space with CRLF will render their unfolded content non-identical to the original. It seems RFC 2047 allows to work around this issue by Q- or B-encoding the complete line, splitting it with CRLFs and putting the encoding prefixes at the beginning of each continuation line.
(In reply to Ilguiz Latypov from comment #43)
> I side with those who follow RFC 822 and say that any white space including
> TABs should be replaced with a single space when unfolding lines.
This is not what RFC 822 says.
Thanks for the correction. I see that the RFC prescribes removing only the CRLF and leaving the following character, such as a TAB, intact. (It seems that in my case, some intermediate MTA generated a TAB instead of a space). It would be nice if Thunderbird displayed the TAB in a uniform way. Currently, I see no white space in the thread pane and a single space character in the message pane. Cheers.
Severity: minor → S4