Open Bug 377599 Opened 17 years ago Updated 17 years ago

From: header displayed wrong when contains 0xB0 character

Categories

(SeaMonkey :: MailNews: Message Display, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: nelson, Unassigned)

Details

Attachments

(3 files)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a4pre) Gecko/20070410 SeaMonkey/1.5a

In a newsgroup I read regularly, there appeared a message with a From: 
header that looked like this:

> From: "° some name °" <someaddr@domain.tld>
         ^           ^ 
where the two characters marked above are each a single byte containing hex B0.
A copy of that message is attached. 

When I view that message in the newsgroup, the message is displayed with a default character set of UTF-8.  

When that message is viewed with the UTF-8 character set, the From: header 
in the message header pane looks like this:
 From: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
where each character shown as an "X" above actually looks like a rectangle
containing 4 hex characters.  Here is an ASCII art approximation:
    +-----+
    | F F |
    | F D |
    +-----+

I see two things wrong with that:

1) Not only are the two 0xB0 characters replaced by that odd FFFD character,
but the entire From line is replaced with them.  

2) There are FAR MORE characters in that displayed From: header than there 
are in the actual From: header in the message.  The displayed width of 
the From: header does not even begin to approximate the actual length of 
the real address in the message's From: header.  

IMO, that From: header *SHOULD* be displayed something like this:
  From: "? some name ?" <someaddr@domain.tld>

That is, the invalid characters (if that's what they are) should each be 
replaced with a single replacement character, such as "?", and the valid characters in the rest of the header should be displayed as is.

I copied this message to a local mail folder, and when I view it there, 
the default character set is different.  It is western (ISO-8859-1).
In that character set, the 0xB0 character displays as a degree symbol.
This different character set behavior surprised me, and raises these questions:

Is the default character set for displayed messages a per-folder setting?
or per-server setting? 
Or a single pref for all folders/servers? (as I thought)
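
For readers following along, here is a minimal Python sketch of why the same byte shows up as a degree symbol in one folder and as an invalid character in the other. The `raw` value is a stand-in built from the header quoted above, not the attached message itself, and this says nothing about how SeaMonkey decodes headers internally.

```python
# Stand-in for the raw header bytes described in this report (an assumption,
# not the attached message): two single 0xB0 bytes around the display name.
raw = b'From: "\xb0 some name \xb0" <someaddr@domain.tld>'

# ISO-8859-1 maps every byte to the code point with the same value,
# so 0xB0 becomes U+00B0 (DEGREE SIGN).
latin1 = raw.decode("iso-8859-1")
print(latin1)

# In UTF-8, bytes of the form 10xxxxxx are continuation bytes and are only
# valid after a lead byte. 0xB0 is 10110000, so on its own it is invalid.
is_continuation = (0xB0 & 0xC0) == 0x80

# A strict UTF-8 decode therefore rejects the header outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```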
> When that message is viewed with the UTF-8 character set,

It's not a charset, just an encoding. ;-)

> the From: header in the message header pane Looks like this:
>  From: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> where each character shown as an "X" above actually looks like a rectangle
> containing 4 hex characters.  Here is an ASCII art approximation:
>     +-----+
>     | F F |
>     | F D |
>     +-----+

Maybe you could provide a screenshot?
(This sounds like you probably don't have a Unicode font available.)

> 1) Not only are the two 0xB0 characters replaced by that odd FFFD character,
> but the entire From line is replaced with them.

U+FFFD is the correct character to show if 0xB0 is found in a UTF-8 string; it is the replacement character, meaning "invalid character" (a lone 0xB0 byte is invalid in UTF-8). It usually looks like a white question mark inside a black diamond.

Showing only lots of U+FFFD surely is a bug.

> IMO, that From: header *SHOULD* be displayed something like this:
>   From: "? some name ?" <someaddr@domain.tld>
> 
> That is, the invalid characters (if that's what they are) should each be 
> replaced with a single replacement character, such as "?", and the valid
> characters in the rest of the header should be displayed as is.

Exactly.
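
As an illustration of the expected behaviour only (not of the mailnews code), Python's built-in UTF-8 decoder with `errors="replace"` does exactly this: each invalid byte becomes one U+FFFD and every valid character is left alone. The `raw` bytes are an assumed stand-in for the header in the test message.

```python
# Assumed stand-in for the raw header bytes from the test message.
raw = b'From: "\xb0 some name \xb0" <someaddr@domain.tld>'

# Decode as UTF-8, substituting U+FFFD for each byte that cannot be decoded.
shown = raw.decode("utf-8", errors="replace")
print(shown)

# Exactly two replacement characters, one per stray 0xB0 byte.
replacement_count = shown.count("\ufffd")

# The displayed string has the same number of characters as the raw header
# has bytes, since each invalid byte maps to exactly one replacement.
same_length = len(shown) == len(raw)
```

The point of the sketch is the length check: a one-for-one substitution cannot make the header longer, which is why a From: line filled with replacement characters looks like a separate bug.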
I *think* I have numerous Unicode fonts, but it's not clear to me how to 
select one for utf-8 use.  SM prefs let me select fonts for various 
languages, but UTF-8 is not one of the languages for which I can select
a font (naturally enough, I suppose). 

I am seeing this problem on two different systems.  But the actual character
being displayed repeatedly in the From heading is different on the two.
I can get a (better) screen capture from the other system, shortly.
I suspect these are the characters you were expecting to see.
The question is: why are they repeated all the way across the From header ?
A similar result is observed with your test data when Default Character Encoding = ISO-2022-JP.
What happens when the "character encoding" setting under Tools/Options is changed from UTF-8 to windows-1252 or iso-8859-1?
 - Tools/Options/Display/Formatting/Fonts :
   - Character Encodings :
     - Incoming Mail : western(windows-1252) or iso-8859-1

"View/Character Encoding" is applied to the message body and the message header pane, but doesn't seem to apply to the thread pane (mail list pane) when the header is not encoded.
Since the From: header has no encoding, the RFC defines its charset as us-ascii, but Tb seems to always use the "Tools/Options/Character Encoding" setting in such cases, for the user's convenience.
As far as I remember, an enhancement to header display in the thread pane (a simpler way to force a specific charset or display option) has already been requested for invalidly encoded headers or invalid header data like in your case. But I can't recall the bug number...
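
To make "header is not encoded" concrete, here is a small sketch using Python's standard `email.header.decode_header`. An RFC 2047 encoded-word carries its own charset label; a plain header carries none, so the reader has to fall back on a default. Both header strings below are made-up examples, not taken from the attached message.

```python
from email.header import decode_header

# Hypothetical header values for illustration only.
plain = '"some name" <someaddr@domain.tld>'
encoded = '=?ISO-8859-1?Q?=B0_some_name_=B0?= <someaddr@domain.tld>'

# No encoded-word present: the charset comes back as None, i.e. the header
# itself gives no hint about how to interpret non-ASCII bytes.
plain_parts = decode_header(plain)

# Encoded-word present: the first part carries its declared charset.
encoded_parts = decode_header(encoded)
name_bytes, name_charset = encoded_parts[0]
name = name_bytes.decode(name_charset)
print(name_charset, name)
```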
(In reply to comment #0)
> Is the default character set for displayed messages a per-folder setting?
> or per-server setting? 
> Or a single pref for all folders/servers? (as I thought)

Sorry, I missed an important part of your comment.
The "Default Character Encoding" setting has already been changed from per-account to per-folder.
  Context menu of folder/news-group => Properties => General Information tab  