Open
Bug 64948
Opened 24 years ago
Updated 2 years ago
inconsistent display of TAB characters in subjects & thread pane
Categories
(MailNews Core :: Backend, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: jgmyers, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: comments 20 thru 25 are irrelevant)
Subject: headers containing TAB characters are displayed differently in the caption and message panes. A message with "Subject: tab<TAB>separated", where <TAB> is a tab character, is displayed as "tabseparated" in the caption pane and "tab separated" in the message pane. I believe the latter is better.
Comment 1•24 years ago
note to self, caption pane == thread pane. accepting.
Status: NEW → ASSIGNED
Comment 2•24 years ago
add to the cc list
Comment 3•24 years ago
fixed. "tab<TAB>separated" will be "tab separated" in the thread pane.
note, "test<tab>test<tab><tab>test" will be "test test  test" (two spaces before the last word) in the thread pane; in the message pane, you'd see "test test test".
I'm using ReplaceChar('\t',' '). the message pane looks different because we don't strip any white space (so the tabs remain), and then, because it is in an iframe, we treat it like html, so runs of white space are collapsed. if that is a problem, log a new bug. marking fixed.
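To illustrate the difference between the two panes, here is a minimal sketch in Python (not the Mozilla C++ code). The names `thread_pane_subject` and `html_collapsed` are made up for this example, and the second function only approximates how HTML rendering collapses white space.

```python
import re


def thread_pane_subject(subject: str) -> str:
    # Same idea as ReplaceChar('\t', ' '): each tab becomes exactly one space.
    return subject.replace("\t", " ")


def html_collapsed(subject: str) -> str:
    # Rough stand-in for HTML rendering: any run of white space shows as one space.
    return re.sub(r"[ \t\r\n]+", " ", subject)


subject = "test\ttest\t\ttest"
print(repr(thread_pane_subject(subject)))  # two tabs become two spaces
print(repr(html_collapsed(subject)))       # two tabs collapse to one space
```

The two results differ only where tabs are adjacent, which is why the panes agree for "tab<TAB>separated" but not for the second example.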
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 4•24 years ago
Couldn't a similar fix, something like ReplaceChar('\r',' '), fix bug 23635?
Comment 5•24 years ago
I'll go look into that other bug. thanks gemal.
Comment 6•24 years ago
actually, my fix only fixes the Subject header case. all tabs in headers need to be treated as spaces. re-opening while I investigate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7•24 years ago
jgmyers has provided patches that would fix this bug the right way, and we could then remove my hack (of calling ReplaceChar()). re-assigning to ducarroz for him to sort out. (thanks for all the help and info, jgmyers.)
Assignee: sspitzer → ducarroz
Status: REOPENED → NEW
Reporter
Comment 10•23 years ago
Now also happening in the thread pane.
Summary: inconsistent display of TAB characters in subjects → inconsistent display of TAB characters in subjects & thread pane
Reporter
Comment 11•23 years ago
*** Bug 72569 has been marked as a duplicate of this bug. ***
Reporter
Comment 12•23 years ago
*** Bug 72889 has been marked as a duplicate of this bug. ***
Reporter
Comment 13•23 years ago
This bug, having crept into the thread pane since the mailnews perf landing, will seriously annoy users of non-ASCII characters. Nominating.
Keywords: mozilla0.9, nsCatFood
Reporter
Comment 14•23 years ago
*** Bug 72889 has been marked as a duplicate of this bug. ***
Updated•23 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
Reporter
Comment 15•23 years ago
*** Bug 104447 has been marked as a duplicate of this bug. ***
Comment 16•23 years ago
There was code in nsMsgMessageDataSource.cpp (around line 126) that removed tabs. The file seems to have been removed (I cannot find it in LXR). http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/mailnews/base/src/nsMsgMessageDataSource.cpp So the code has to be copied to a new file for the thread pane. But I am not clear why we cannot remove TAB along with CR and LF when unfolding the header.
Reporter
Comment 17•23 years ago
> But I am not clear why we cannot remove TAB along with CR and LF when
> unfolding the header.
Because RFC 2822 (and 822) clearly and unambiguously specify otherwise.
Comment 18•23 years ago
But why? The CR, LF, TAB sequence is inserted when folding a line, so it is more natural to remove them all together when unfolding a line.
Reporter
Comment 19•23 years ago
Per the RFCs, one folds a line by adding CRLF before an existing TAB or SPACE. It is incorrect to add a TAB when folding lines. This is bug 73403.
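The folding rule described here can be sketched in Python as follows. This is only an illustration under stated assumptions: `fold` and `unfold` are hypothetical helper names, the 78-character default is simply the line length RFC 2822 recommends, and real header generators handle more cases (for example encoded-words).

```python
import re


def fold(value: str, limit: int = 78) -> str:
    """Insert CRLF before an existing SPACE or TAB; never add white space."""
    lines = []
    rest = value
    while len(rest) > limit:
        # Last SPACE or TAB that keeps the current line within the limit.
        # Searching from index 1 avoids producing an empty line.
        cut = max(rest.rfind(" ", 1, limit + 1), rest.rfind("\t", 1, limit + 1))
        if cut == -1:
            break  # no existing white space, so this part cannot be folded
        lines.append(rest[:cut])
        rest = rest[cut:]  # the next line starts with the existing SPACE/TAB
    lines.append(rest)
    return "\r\n".join(lines)


def unfold(value: str) -> str:
    """Remove each CRLF that is followed by SPACE or TAB; keep that character."""
    return re.sub(r"\r\n(?=[ \t])", "", value)
```

Because folding only inserts CRLF and unfolding only removes it, `unfold(fold(s))` gives back `s` unchanged, including any TAB that was already in the subject.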
Comment 20•23 years ago
But what if there is no existing TAB or SPACE in the subject? Can the next line start without a TAB or SPACE?
Comment 21•23 years ago
Please correct me if I am wrong, but here's my take on the 2 RFCs relevant to this issue: RFC 2822 (replacing RFC 822) for ASCII headers and RFC 2047 for non-ASCII headers.
======================================
RFC 2822: Internet Message Format
The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example the header field:
FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space
      obs-FWS
WSP = (SP, ASCII value 32) and (HTAB, ASCII value 9)
======================================
From this, if the header contains ASCII text to the left of where it folds, then you simply remove a CRLF for unfolding.
** This is as jgmyers mentions above.
RFC 2822 does not cover non-ASCII headers -- this is left to RFC 2047, which I quote below:
======================================
RFC 2047: Message Header Extensions for Non-ASCII Text
2 Syntax of encoded-words:
An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.
5 Use of encoded-words in message headers:
Ordinary ASCII text and 'encoded-word's may appear together in the same header field. However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.
6.2 Display of 'encoded-word's
When displaying a particular header field that contains multiple 'encoded-word's, any 'linear-white-space' that separates a pair of adjacent 'encoded-word's is ignored. (This is to allow the use of multiple 'encoded-word's to represent long strings of unencoded text, without having to separate 'encoded-word's where spaces occur in the unencoded text.)
*** Definitions from RFC 822 used in RFC 2047 ***
RFC 822 (referenced by RFC 2047): [obsoleted] Internet Message Format
linear-white-space = 1*([CRLF] LWSP-char) ; semantics = SPACE ; CRLF => folding
LWSP-char = SPACE / HTAB ; semantics = SPACE
SPACE = <ASCII SP, space> ; ( 40, 32.)
HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
======================================
For non-ASCII encoded words used in headers, it seems from the above that there are 2 steps.
1. First an encoding agent needs to **insert** a linear-white-space, which is defined in RFC 822 as an optional [CRLF] followed by either SPACE or TAB. This insertion occurs between an encoded word and (another encoded word or an unencoded text).
2. Next, if the header line exceeds 75 chars, then insert CRLF in front of SPACE. (SPACE here I believe is ambiguous between the real Space token and the semantic SPACE as defined in RFC 822, which would include TAB also.)
I assume that line folding can occur by inserting CRLF in front of an existing Space or TAB created by the process in step 1.
Following the above folding generation logic, unfolding then will have to:
A. remove a CRLF in front of TAB or Space. (or CRLF)
B. Then ignore any remaining linear-white-space(s) between an encoded word and another encoded word or unencoded text in displaying/decoding the MIME words.
Comment 22•23 years ago
From the above explanation, I think we can reply to
nhotta's question above:
> But what if no existing TAB or SPACE in the subject? Can
> the next line start without having TAB or SPACE?
If there is more than 1 encoded word or an encoded word
followed by unencoded text in the header, the encoding mail
agent must have inserted a TAB or a Space. If my reading of
RFC 2047 is correct, this situation posed by the question
above cannot possibly arise.
Reporter
Comment 23•23 years ago
Kat, you are reading a rule into RFC 2047 that simply isn't there. RFC 2047 only specifies the removal of linear-white-space between two encoded-words. It says nothing about removing white space between an encoded-word and following unencoded text; there is no such rule.
Comment 24•23 years ago
> Kat, you are reading a rule into RFC 2047 that simply
> isn't there. RFC 2047 only specifies the removal of
> linear-white-space between two encoded-words. It says
> nothing about removing white space between an
> encoded-word and following unencoded text, there is no
> such rule
John, thanks for this correction. Upon 2nd reading of
sections 5 and 6.2, there seems to be an asymmetry.
An agent will insert linear-white-space between
an encoded word and 'text' by section 5, but section 6.2
does not say to remove it. So, then item #B above should
read:
B. Then ignore any remaining linear-white-space(s)
between an encoded word and another encoded word in
displaying/decoding the MIME words.
So the items I listed above should cover textbook cases
of folding and unfolding.
I guess we are allowing for non-textbook cases in the
code, right? Like some mail agents forgetting to insert
any linear-white-space between encoded words, the use of CRLF
only for folding without a trailing Space or Tab, etc. In
decoding/displaying, we are being generous to some extent.
Is that right?
Reporter
Comment 25•23 years ago
What section 5 is getting at is that encoded-words must (in some situations) be surrounded by whitespace. One cannot encode as:
J=?iso-8859-1?q?=E4?=rnefors
The only way to encode this is to put the surrounding non-whitespace characters inside the encoded-word. If one instead did:
J =?iso-8859-1?q?=E4?= rnefors
that would be syntactically legal, but the spaces surrounding the encoded-word would be semantically meaningful.
There are cases where an encoded-word can legally appear adjacent to a non-whitespace character. In all these cases, the adjacent character is a special. These characters include the parentheses of a comment and the delimiters around a phrase. Since the decoder cannot in general even know if an arbitrary header is structured or unstructured, the Mozilla decoder will simply decode anything that matches the syntax of an encoded-word, regardless of whether or not it is surrounded by whitespace.
The header:
From: =?ISO-8859-1?Q?J=E4rnefors?= Olle <jarnefo@example.com>
is entirely legitimate and clearly has a semantically meaningful space before "Olle". To remove linear-white-space between an encoded-word and unencoded text would, in violation of the spec, produce output other than what was intended by the sender.
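The display rule discussed in the last few comments can be sketched in Python. This is not the Mozilla decoder; `decode_header_value` and the regular expressions are assumptions made for this example. It decodes anything that matches the encoded-word syntax, drops white space only between two adjacent encoded-words, and leaves white space next to unencoded text alone.

```python
import base64
import binascii
import re

# RFC 2047 encoded-word syntax: =?charset?encoding?encoded-text?=
_WORD = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
# White space sitting between two adjacent encoded-words.
GAP = re.compile(r"(" + _WORD + r")[ \t\r\n]+(?=" + _WORD + r")")


def _decode_word(match):
    charset = match.group(1)
    encoding = match.group(2).lower()
    text = match.group(3)
    if encoding == "b":
        raw = base64.b64decode(text)
    else:
        # header=True makes "_" decode to a space, as the Q encoding requires.
        raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")


def decode_header_value(value: str) -> str:
    # Step 1: ignore white space only between adjacent encoded-words.
    value = GAP.sub(r"\1", value)
    # Step 2: decode every encoded-word; everything else is kept as-is.
    return ENCODED_WORD.sub(_decode_word, value)


print(decode_header_value(
    "From: =?ISO-8859-1?Q?J=E4rnefors?= Olle <jarnefo@example.com>"))
```

With this sketch the space before "Olle" survives, while two encoded-words separated only by white space are joined.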
Updated•23 years ago
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Updated•23 years ago
Comment 26•23 years ago
*** Bug 72276 has been marked as a duplicate of this bug. ***
Comment 27•23 years ago
This is quite ugly and visible for non-ascii users, can we get some traction here from the l10n people? I'd prefer to see this fixed before 1.0 (nominating).
Keywords: mozilla1.0
Comment 28•23 years ago
To answer your question, please see Taka's comment #27 in bug 73403.
Comment 29•21 years ago
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030924
Long silence on this bug. I still see this problem: the Subject is folded with a tab character at the beginning of a line (arguably this is bad wrapping, but it is not wrong to have it; it would only be wrong to change it), and the thread pane then displays the subject unfolded but with a dotted square where the tab was. Just for the thread pane display we should display a blank.
pi
Comment 30•21 years ago
I am also seeing the behaviour described in comment 29 in Thunderbird 0.3 under Windows 2000. In my case, this is happening for a lot of messages coming from people with either Mac or Windows Outlook Express, which seems to wrap long subject lines by replacing a space between words with <CR><LF><TAB>. The result just looks ugly.
Updated•20 years ago
Product: Browser → Seamonkey
Comment 31•19 years ago
There are quite a few bugs that complain about this problem, and it is not limited to unfolded headers. Moving this to the backend, and adding dependencies for the Moz and TB front end bugs.
Comment 32•19 years ago
*** Bug 251325 has been marked as a duplicate of this bug. ***
Comment 33•18 years ago
*** Bug 346976 has been marked as a duplicate of this bug. ***
Comment 34•18 years ago
I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP. In my case, mailman is what is indenting the wrapped subject lines with TAB characters. mailman is very popular software! Here is a dump of an example message header:
0002020 C c : sp nl S u b j e c t : sp [ B
0002040 u i l d b o t ] sp B u i l d B o
0002060 t sp S U C C E S S nl ht ( B l d _
0002100 g s s d k _ g s s d k _ t e s t
0002120 _ 2 0 0 6 _ 0 9 _ 1 2 _ 0 0 _ 0
0002140 1 _ 0 0 _ I n c r e m e n t a l
0002160 _ 2 ) nl X - B e e n T h e r e :
Updated•17 years ago
Assignee: ducarroz → nobody
Status: ASSIGNED → NEW
QA Contact: esther → backend
Comment 35•17 years ago
vseerror@lehigh.edu, did you mean to mark this bug as new, and remove the assigned owner?
Comment 36•17 years ago
yes, most certainly. outdated (non)assignment
Assignee
Updated•16 years ago
Product: Core → MailNews Core
Updated•14 years ago
Comment 40•14 years ago
This bug's phenomenon seems to have morphed into Bug 553280 (which I duped to Bug 593337) and Bug 593337 as of Tb 3.0 (and Tb 3.1)/Sm 2.0, and Bug 593337 is already WORKSFORME with recent trunk builds. Please see Bug 593337 Comment #15 for check results with trunk builds (Tb 3.2pre/Sm 2.2pre).
The following can be said:
1. This bug never occurs at the thread pane with Tb 3.0/Sm 2.0 (this bug is fixed there), because the Tab is removed when the Subject is displayed at the thread pane. Instead, Tb 3.0/Sm 2.0 introduced Bug 553280/Bug 593337.
2. Bug 553280/Bug 593337 is fixed by Tb 3.2pre/Sm 2.2pre.
Comment 41•13 years ago
(In reply to comment #34)
> I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP.
>
> In my case, mailman is what is indenting the wrapped subject lines with TAB
> characters. mailman is very popular software!
I had the same problem with mail from mailman in Thunderbird 3.1.x but it's fixed for me in Thunderbird 5. In Thunderbird 5 the tab character is replaced with a space character in the thread pane's subject line as I would expect it to be.
Comment 43•13 years ago
I side with those who follow RFC 822 and say that any white space including TABs should be replaced with a single space when unfolding lines.
To reproduce the issue in Icedove 3.1.15: edit message headers outside the mail app while it is not running so that the Subject line has a CRLF followed by a TAB followed by other words.
Subject: foo<CRLF>
<TAB>bar
Try opening the message in Icedove. The thread pane will show the subject as "foobar", and I believe this goes against RFC 822. The message pane will unfold the line into "foo bar", and I think this agrees with the RFC.
Cheers.
Comment 44•13 years ago
I understand the concern in comment 20 to be that splitting long subject lines that contain no white space with CRLF will render their unfolded content non-identical to the original. It seems RFC 2047 allows working around this issue by Q- or B-encoding the complete line, splitting it with CRLFs, and putting the encoding prefixes at the beginning of each continuation line.
Reporter
Comment 45•13 years ago
(In reply to Ilguiz Latypov from comment #43)
> I side with those who follow RFC 822 and say that any white space including
> TABs should be replaced with a single space when unfolding lines.
This is not what RFC 822 says.
Comment 46•13 years ago
Thanks for the correction. I see that the RFC prescribes removing only the CRLF and leaving the following character, such as TAB, intact. (It seems that in my case, some intermediate MTA generated a TAB instead of a space.) It would be nice if Thunderbird displayed the TAB in a uniform way. Currently, I see no white space in the thread pane and a single space character in the message pane. Cheers.
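One way to get a uniform display, sketched in Python under the assumption that both panes would share a single helper (the name `display_subject` is hypothetical, and this is not how Thunderbird is implemented): unfold per the RFC first, then show each remaining TAB as a space for display only, leaving the stored header untouched.

```python
import re


def unfold(value: str) -> str:
    # RFC unfolding: drop a CRLF that is followed by SPACE or TAB, keep the WSP.
    return re.sub(r"\r\n(?=[ \t])", "", value)


def display_subject(raw_value: str) -> str:
    # Display-only step: render each remaining TAB as one space.
    return unfold(raw_value).replace("\t", " ")


raw = "foo\r\n\tbar"  # the example from comment 43
print(repr(unfold(raw)))           # 'foo\tbar'
print(repr(display_subject(raw)))  # 'foo bar'
```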
Updated•2 years ago
Severity: minor → S4