Open Bug 64948 Opened 24 years ago Updated 2 years ago

inconsistent display of TAB characters in subjects & thread pane

Categories

(MailNews Core :: Backend, defect)

defect

Tracking

(Not tracked)

People

(Reporter: jgmyers, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: comments 20 thru 25 are irrelevant)

Subject: headers containing TAB characters are displayed differently in the 
caption and message panes.  A message with

Subject: tab<TAB>separated

where <TAB> is a tab character is displayed as "tabseparated" in the caption 
pane and "tab separated" in the message pane.  I believe the latter is better.
note to self, caption pane == thread pane.

accepting.
Status: NEW → ASSIGNED
add to the cc list
fixed.

"tab<TAB>separated" will be "tab separated" in the thread pane.

note, "test<tab>test<tab><tab>test" will be
"test test  test" in the thread pane

in the message pane, you'd see "test test test"

I'm using ReplaceChar('\t',' ').

the message pane looks different because we don't strip any white space (so the
tabs remain) and then because it is in an iframe, we treat it like html so white
space is ignored.
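A minimal Python sketch of the two behaviours described above, assuming the thread pane simply applies the ReplaceChar('\t',' ') substitution and the message pane behaves like ordinary HTML whitespace collapsing. The function names thread_pane_text and html_rendered_text are hypothetical, not Mozilla APIs.

```python
import re


def thread_pane_text(subject: str) -> str:
    # Model of the ReplaceChar('\t', ' ') fix: each TAB becomes exactly one
    # space; runs of TABs are NOT collapsed.
    return subject.replace("\t", " ")


def html_rendered_text(subject: str) -> str:
    # Rough model of HTML rendering in the message-pane iframe: any run of
    # spaces, TABs, CRs or LFs is shown as a single space.
    return re.sub(r"[ \t\r\n]+", " ", subject)


if __name__ == "__main__":
    raw = "test\ttest\t\ttest"
    print(repr(thread_pane_text(raw)))    # 'test test  test'
    print(repr(html_rendered_text(raw)))  # 'test test test'
```

The difference in the last example (two spaces versus one) comes entirely from the collapsing step, not from how the header is parsed.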

if that is a problem, log a new bug.

marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Couldn't a similar fix, with something like ReplaceChar('\r',' '), fix bug 23635?
Blocks: 58114
I'll go look into that other bug.  thanks gemal.
actually, my fix only fixes the Subject header case. All tabs in headers
need to be treated as spaces.

re-opening while I investigate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
re-assign to ducarroz.  jgmyers has provided patches that would fix this bug the
right way, so we could remove my hack (of calling ReplaceChar()).

re-assign to ducarroz for him to sort out.

(thanks for all the help and info jgmyers.)
Assignee: sspitzer → ducarroz
Status: REOPENED → NEW
I haven't provided patches to fix this bug, just bug 23635.
CC'ed to myself.
Now also happening in the thread pane.
Summary: inconsistent display of TAB characters in subjects → inconsistent display of TAB characters in subjects & thread pane
*** Bug 72569 has been marked as a duplicate of this bug. ***
*** Bug 72889 has been marked as a duplicate of this bug. ***
This bug, having crept into the thread pane since the mailnews perf landing, 
will seriously annoy users of non-ASCII characters.  Nominating.

Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
*** Bug 104447 has been marked as a duplicate of this bug. ***
There was code in nsMsgMessageDataSource.cpp (around line 126) that removed
tabs. The file seems to have been removed (I cannot find it in LXR).
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/mailnews/base/src/nsMsgMessageDataSource.cpp
So the code had to be copied to a new file for thread pane.

But I am not clear why we cannot remove TAB along with CR and LF when unfolding
the header.
> But I am not clear why we cannot remove TAB along with CR and LF when
> unfolding the header.

Because RFC 2822 (and 822) clearly and unambiguously specify otherwise.
But why? The CR, LF, TAB sequence is inserted when folding a line, so it is more
natural to remove them all together when unfolding a line.
Per the RFCs, one folds a line by adding CRLF before an existing TAB or SPACE.  
It is incorrect to add a TAB when folding lines.  This is bug 73403.
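To illustrate the folding rule, here is a Python sketch with two hypothetical helpers (not Mozilla code): fold inserts CRLF before an existing space or TAB, while replace_space_fold replaces a space with CRLF TAB, the style some clients are reported to use later in this bug. The 78-character default is RFC 2822's recommended line length, used here as an assumption.

```python
def fold(line: str, limit: int = 78) -> str:
    """Fold per RFC 2822: insert CRLF *before* an existing space or TAB.

    No whitespace is added or removed, so deleting every CRLF restores the
    original line exactly.
    """
    pieces = []
    while len(line) > limit:
        # Last WSP at index 1..limit; index 0 is skipped so a continuation
        # line is never cut again at its own leading WSP.
        cut = max(line.rfind(" ", 1, limit + 1), line.rfind("\t", 1, limit + 1))
        if cut < 1:
            break  # no legal fold point: leave the line long
        pieces.append(line[:cut])
        line = line[cut:]  # the WSP stays at the start of the next line
    pieces.append(line)
    return "\r\n".join(pieces)


def replace_space_fold(line: str, limit: int = 78) -> str:
    """The incorrect style: *replace* one space with CRLF TAB (single fold)."""
    cut = line.rfind(" ", 0, limit + 1)
    if cut < 0:
        return line
    return line[:cut] + "\r\n\t" + line[cut + 1:]


if __name__ == "__main__":
    subject = " ".join(["word"] * 20)  # 99 characters
    good = fold(subject)
    bad = replace_space_fold(subject)
    # Removing each CRLF (unfolding) round-trips only for the correct fold.
    print(good.replace("\r\n", "") == subject)  # True
    print(bad.replace("\r\n", "") == subject)   # False: a TAB replaced a space
```

Only the first style survives a round trip; unfolding the CRLF TAB version leaves a TAB where the sender typed a space.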
But what if no existing TAB or SPACE in the subject? Can the next line start
without having TAB or SPACE?

Please correct me if I am wrong, but here's my take on
the 2 RFCs relevant to this issue: RFC 2822 (replacing RFC 822)
for ASCII headers and RFC 2047 for non-ASCII headers.

======================================
RFC 2822: Internet Message Format

The general rule is that wherever this standard allows for folding white space
(not simply WSP characters), a CRLF may be inserted before any WSP.  For example
the header field:

FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS
WSP             =       SP / HTAB               ; SP is ASCII 32, HTAB is ASCII 9

======================================

From this, if the header contains ASCII text to the left
of where it folds, then you simply remove the CRLF when
unfolding.
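A minimal sketch of RFC 2822 unfolding in Python (the helper name unfold is hypothetical). The TAB that follows the removed CRLF is kept; turning it into a space, or dropping it, is a separate display decision.

```python
import re

# RFC 2822 section 2.2.3: "Unfolding is accomplished by simply removing any
# CRLF that is immediately followed by WSP."  The WSP itself is kept.
_FOLD_POINT = re.compile(r"\r\n(?=[ \t])")


def unfold(value: str) -> str:
    return _FOLD_POINT.sub("", value)


if __name__ == "__main__":
    print(repr(unfold("foo\r\n\tbar")))  # 'foo\tbar'  (the TAB survives)
    print(repr(unfold("foo\r\n bar")))   # 'foo bar'
```

A CRLF that is not followed by a space or TAB is not a fold point; it ends the header field, so it is left alone.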

** This is as jgmyers mentions above.

RFC 2822 does not cover non-ASCII headers -- this is
left to RFC 2047, which I quote below:

======================================
RFC 2047: Message Header Extensions for Non-ASCII Text

2 Syntax of encoded-words:

An 'encoded-word' may not be more than 75 characters long, including 'charset',
'encoding', 'encoded-text', and delimiters. If it is desirable to encode more
text than will fit in an 'encoded-word' of 75 characters, multiple
'encoded-word's (separated by CRLF SPACE) may be used.

5 Use of encoded-words in message headers:

 Ordinary ASCII text and 'encoded-word's may appear together in the same header
field. However, an 'encoded-word' that appears in a header field defined as
'*text' MUST be separated from any adjacent 'encoded-word' or 'text' by
'linear-white-space'.

6.2 Display of 'encoded-word's

When displaying a particular header field that contains multiple
'encoded-word's, any 'linear-white-space' that separates a pair of adjacent
'encoded-word's is ignored. (This is to allow the use of multiple
'encoded-word's to represent long strings of unencoded text, without having to
separate 'encoded-word's where spaces occur in the unencoded text.)

*** Definitions from RFC 822 used in RFC 2047 ***

RFC 822 (referenced by RFC 2047): [obsoleted] Internet Message Format

linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                 ; CRLF => folding
LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
SPACE       =  <ASCII SP, space> ; (     40,      32.)
HTAB        =  <ASCII HT, horizontal-tab> ; (     11,       9.)
======================================

For non-ASCII encoded words used in headers, it seems 
from the above there are 2-steps.

1. First an encoding agent needs to **insert** a linear-white-space,
   which is defined in RFC 822 as an optional [CRLF] followed
   by either SPACE or TAB. This insertion occurs between
   an encoded word and (another encoded word or an unencoded
   text).
2. Next, if the header line exceeds 75 chars, then insert
   CRLF in front of SPACE. (SPACE here I believe is ambiguous
   between the real Space token and the semantic SPACE as defined in 
   RFC 822, which would include TAB also.)
   I assume that line folding can occur by inserting CRLF
   in front of an existing Space or TAB created by the process in
   step 1.

Following the above folding generation logic, unfolding
then will have to:

A. Remove a CRLF in front of TAB or Space. (or CRLF)
B. Then ignore any remaining linear-white-space(s)
   between an encoded word and another encoded word or
   unencoded text when displaying/decoding the MIME words.

From the above explanation, I think we can reply to
nhotta's question above:

> But what if no existing TAB or SPACE in the subject? Can 
> the next line start without having TAB or SPACE?

If there is more than 1 encoded word or an encoded word 
followed by unencoded text in the header, the encoding mail 
agent must have inserted a TAB or a Space. If my reading of 
RFC 2047 is correct, this situation posed by the question
above cannot possibly arise.
Kat, you are reading a rule into RFC 2047 that simply isn't there.  RFC 2047 
only specifies the removal of linear-white-space between two encoded-words.  It 
says nothing about removing white space between an encoded-word and following 
unencoded text; there is no such rule.
> Kat, you are reading a rule into RFC 2047 that simply 
> isn't there.  RFC 2047 only specifies the removal of 
> linear-white-space between two encoded-words.  It says 
> nothing about removing white space between an 
> encoded-word and following unencoded text, there is no 
> such rule

John, thanks for this correction. Upon 2nd reading of
sections 5 and 6.2, there seems to be an asymmetry. 
An agent will insert linear-white-space between 
an encoded word and 'text' by section 5, but section 6.2
does not say to remove it. So, then item #B above should 
read:

B. Then ignore any remaining linear-white-space(s)
   between an encoded word and another encoded word when
   displaying/decoding the MIME words.
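A simplified Python sketch of decoding that follows the corrected rule B: whitespace is dropped only between two adjacent encoded-words, while whitespace next to unencoded text is kept. It assumes an unstructured field, ignores the 75-character limit and RFC 2231 language tags, and decodes anything that matches the encoded-word syntax regardless of surrounding whitespace. All names are hypothetical.

```python
import base64
import binascii
import re

# =?charset?encoding?encoded-text?= (simplified RFC 2047 section 2 syntax).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    if encoding in "Qq":
        # header=True makes '_' decode to a space, as the Q encoding requires.
        raw = binascii.a2b_qp(text, header=True)
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset)


def decode_unstructured(value: str) -> str:
    """Decode encoded-words; drop whitespace only *between two* of them."""
    out = []
    pos = 0
    after_word = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # Section 6.2: linear-white-space between adjacent encoded-words is
        # ignored. Any other gap (plain text, or whitespace next to it) stays.
        if not (after_word and gap.strip(" \t\r\n") == ""):
            out.append(gap)
        out.append(decode_word(*m.groups()))
        pos = m.end()
        after_word = True
    out.append(value[pos:])  # text after the last encoded-word
    return "".join(out)


if __name__ == "__main__":
    print(decode_unstructured("=?ISO-8859-1?Q?J=E4r?=\r\n =?ISO-8859-1?Q?nefors?="))
    print(decode_unstructured("=?ISO-8859-1?Q?J=E4rnefors?= Olle"))
```

The first call shows a word split across a fold being rejoined; the second keeps the space before the unencoded "Olle".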

So the items I listed above should cover textbook cases
of folding and unfolding.

I guess we are allowing for non-textbook cases in the
code, right? Like some mail agents forgetting to insert
any linear-white-space between encoded words, the use of CRLF
alone for folding without a trailing Space or Tab, etc. In 
decoding/displaying, we are being generous to some extent.
Is that right?
What section 5 is getting at is that encoded-words must (in some situations) be 
surrounded by whitespace.  One cannot encode as:

J=?iso-8859-1?q?=E4?=rnefors

The only way to encode this is to put the surrounding non-whitespace characters 
inside the encoded-word.  If one instead did:

J =?iso-8859-1?q?=E4?= rnefors

That would be syntactically legal, but the spaces surrounding the encoded-words 
would be semantically meaningful.
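The section 5 requirement can be written as a small check for '*text' fields such as Subject. This is a hypothetical lint-style helper, not part of any decoder, and it does not apply to structured fields, where specials such as the parentheses of a comment may legitimately touch an encoded-word.

```python
import re

ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def encoded_words_delimited(text: str) -> bool:
    """True if every encoded-word is bounded by whitespace or the field edges.

    Intended only for unstructured ('*text') fields such as Subject.
    """
    for m in ENCODED_WORD.finditer(text):
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        if not (before.isspace() and after.isspace()):
            return False
    return True


if __name__ == "__main__":
    print(encoded_words_delimited("J=?iso-8859-1?q?=E4?=rnefors"))   # False
    print(encoded_words_delimited("=?iso-8859-1?q?J=E4rnefors?="))   # True
```

The passing form is the one described above: the neighbouring letters move inside the encoded-word instead of being separated from it by spaces.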

There are cases where an encoded-word can legally appear adjacent to a 
non-whitespace character.  In all these cases, the adjacent character is a 
special.  These characters include the parentheses of a comment and the 
delimiters around a phrase.  Since the decoder cannot in general even know if an 
arbitrary header is structured or unstructured, the Mozilla decoder will simply 
decode anything that matches the syntax of an encoded-word, regardless of 
whether or not it is surrounded by whitespace.

The header:

From: =?ISO-8859-1?Q?J=E4rnefors?= Olle <jarnefo@example.com>

is entirely legitimate and clearly has a semantically meaningful space before 
"Olle".  To remove linear-whitespace between an encoded-word and unencoded text 
would, in violation of the spec, produce output other than what was intended by 
the sender.
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Keywords: nsbeta1
Target Milestone: mozilla0.9.9 → ---
Keywords: nsbeta1nsbeta1-
Target Milestone: --- → mozilla1.2
*** Bug 72276 has been marked as a duplicate of this bug. ***
This is quite ugly and visible for non-ascii users, can we get some traction
here from the l10n people? I'd prefer to see this fixed before 1.0 (nominating).
Keywords: mozilla1.0
To answer your question, please see Taka's comment #27 in bug 73403.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030924

Long silence on this bug. I still see this problem: Subject is folded with a tab
character at the beginning of a line (arguably this is bad wrapping, but it is
not wrong to have it, it would only be wrong to change it), then the thread pane
displays the subject unfolded but with a dotted square where the tab was. Just
for the thread pane display we should display a blank.

pi
I am also seeing the behaviour described in comment 29 in Thunderbird 0.3 under 
Windows 2000.

In my case, this is happening for a lot of messages coming from people with 
either Mac or Windows Outlook Express, which seems to wrap long subject lines by 
replacing a space between words with <CR><LF><TAB>.  The result just looks ugly.
Product: Browser → Seamonkey
There are quite a few bugs that complain about this problem, and it is not 
limited to unfolded headers.  Moving this to the backend, and adding 
dependencies for the Moz and TB front end bugs.
Blocks: 240924, 271312
No longer blocks: 58114
Severity: normal → minor
Component: MailNews: Main Mail Window → MailNews: Backend
OS: Linux → All
Product: Mozilla Application Suite → Core
Hardware: PC → All
Whiteboard: comments 20 thru 25 are irrelevant
Target Milestone: mozilla1.2alpha → ---
*** Bug 251325 has been marked as a duplicate of this bug. ***
*** Bug 346976 has been marked as a duplicate of this bug. ***
I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP.

In my case, mailman is what is indenting the wrapped subject lines with TAB characters. mailman is very popular software!

Here is a dump of an example message header:

0002020   C   c   :  sp  nl   S   u   b   j   e   c   t   :  sp   [   B
0002040   u   i   l   d   b   o   t   ]  sp   B   u   i   l   d   B   o
0002060   t  sp   S   U   C   C   E   S   S  nl  ht   (   B   l   d   _
0002100   g   s   s   d   k   _   g   s   s   d   k   _   t   e   s   t
0002120   _   2   0   0   6   _   0   9   _   1   2   _   0   0   _   0
0002140   1   _   0   0   _   I   n   c   r   e   m   e   n   t   a   l
0002160   _   2   )  nl   X   -   B   e   e   n   T   h   e   r   e   :
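Assuming the bytes in the dump are the stored header field (note the bare LF line endings rather than CRLF), a Python sketch of what the thread pane could display: unfold by removing only the line break, then show the remaining TAB as a blank. The helper list_view_subject is hypothetical, and treating a bare LF like CRLF is an assumption about locally stored mail.

```python
import re

# Reconstructed from the od -c dump above: the stored Subject field uses
# bare LF line endings and folds with a leading TAB.
RAW = (b"Subject: [Buildbot] BuildBot SUCCESS\n"
       b"\t(Bld_gssdk_gssdk_test_2006_09_12_00_01_00_Incremental_2)\n")


def list_view_subject(field: bytes) -> str:
    text = field.decode("ascii")
    # Unfold: remove a line break (CRLF, or a bare LF as stored on disk)
    # only when WSP follows it; the WSP itself is kept.
    text = re.sub(r"\r?\n(?=[ \t])", "", text)
    _name, _sep, value = text.partition(":")
    # Separate display step: show any TAB left over from unfolding as a blank.
    return value.strip().replace("\t", " ")


if __name__ == "__main__":
    print(list_view_subject(RAW))
```

This yields "SUCCESS (Bld_..." with a blank, rather than "SUCCESS(Bld_..." or a placeholder glyph for the TAB.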
Assignee: ducarroz → nobody
Status: ASSIGNED → NEW
QA Contact: esther → backend
vseerror@lehigh.edu, did you mean to mark this bug as new and remove the assigned owner?
yes, most certainly.  outdated (non)assignment
Product: Core → MailNews Core
No longer blocks: 593337
Depends on: 593337
This bug's symptom seems to have morphed into Bug 553280 (which I duped to Bug 593337) / Bug 593337 as of Tb 3.0 (and Tb 3.1) / Sm 2.0, and Bug 593337 is already WORKSFORME with recent trunk builds.
Please see Bug 593337 Comment #15 for the check results with trunk builds (Tb 3.2pre/Sm 2.2pre).
The following can be said:
1. This bug never occurs in the thread pane with Tb 3.0/Sm 2.0 (this bug is fixed),
   because the Tab is removed when the Subject is displayed in the thread pane.
   However, Bug 553280/Bug 593337 first appeared in Tb 3.0/Sm 2.0.
2. Bug 553280/Bug 593337 is fixed in Tb 3.2pre/Sm 2.2pre.
(In reply to comment #34)
> I am seeing this problem using Thunderbird 1.5.0.5 on Windows XP.
> 
> In my case, mailman is what is indenting the wrapped subject lines with TAB
> characters. mailman is very popular software!

I had the same problem with mail from mailman in Thunderbird 3.1.x but it's fixed for me in Thunderbird 5. In Thunderbird 5 the tab character is replaced with a space character in the thread pane's subject line as I would expect it to be.
I side with those who follow RFC 822 and say that any white space including TABs should be replaced with a single space when unfolding lines.

To reproduce the issue in Icedove 3.1.15: edit message headers outside the mail app while it is not running so that the Subject line has a CRLF followed by a TAB followed by other words.

  Subject: foo<CRLF>
  <TAB>bar

Try opening the message in Icedove.  The thread pane will show the subject as "foobar", and I believe this goes against RFC 822.  The message pane will unfold the line into "foo bar", and I think this agrees with the RFC.  Cheers.
I understand the concern in comment 20 as saying that splitting long subject lines that contain no white space with CRLF will make their unfolded content differ from the original.  It seems RFC 2047 allows working around this issue by Q- or B-encoding the complete line, splitting it with CRLFs and putting the encoding prefixes at the beginning of each continuation line.
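A rough Python sketch of that workaround, assuming UTF-8 and B encoding (the comment allows either Q or B). Each encoded-word stays within the 75-character limit of RFC 2047 section 2, words are joined with CRLF SPACE, and because whitespace between adjacent encoded-words is ignored on display, decoding restores the original text exactly, even where it had no spaces. The function names are hypothetical, and the decoder only handles this encoder's output.

```python
import base64
import re

PREFIX, SUFFIX = "=?UTF-8?B?", "?="
MAX_WORD = 75  # RFC 2047 section 2: an encoded-word is at most 75 characters


def _b64(s: str) -> str:
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def encode_long_subject(text: str) -> str:
    """Split text into B-encoded words joined by CRLF SPACE.

    Each word holds whole characters only, so no UTF-8 sequence is split.
    """
    chunks, chunk = [], ""
    for ch in text:
        if chunk and len(PREFIX + _b64(chunk + ch) + SUFFIX) > MAX_WORD:
            chunks.append(chunk)
            chunk = ch
        else:
            chunk += ch
    if chunk:
        chunks.append(chunk)
    return "\r\n ".join(PREFIX + _b64(c) + SUFFIX for c in chunks)


_WORD = re.compile(r"=\?UTF-8\?B\?([A-Za-z0-9+/=]*)\?=")


def decode_long_subject(value: str) -> str:
    """Decode output of encode_long_subject only (not a general decoder).

    The value is nothing but encoded-words separated by folding whitespace,
    which section 6.2 says to ignore, so only the payloads are concatenated.
    """
    return "".join(base64.b64decode(p).decode("utf-8") for p in _WORD.findall(value))


if __name__ == "__main__":
    original = "Bld_gssdk_gssdk_test_2006_09_12_00_01_00_Incremental_2_J\u00e4rnefors" * 2
    folded = encode_long_subject(original)
    print(folded)
    print(decode_long_subject(folded) == original)  # True
```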
(In reply to Ilguiz Latypov from comment #43)
> I side with those who follow RFC 822 and say that any white space including
> TABs should be replaced with a single space when unfolding lines.

This is not what RFC 822 says.
Thanks for the correction.  I see that the RFC prescribes removing only the CRLF and leaving the following character, such as a TAB, intact.  (It seems that in my case, some intermediate MTA generated a TAB instead of a space.)

It would be nice if Thunderbird displayed the TAB in a uniform way.  Currently, I see no white space in the thread pane and a single space character in the message pane.  Cheers.
Severity: minor → S4