
utf-16 text file attached incorrectly decoded for in-line display

NEW
Unassigned

Status

MailNews Core
Attachments
14 years ago
a year ago

People

(Reporter: Olivier Vit (just a reporter), Unassigned)

Tracking

(Blocks: 1 bug, {intl})

Trunk
x86
Windows 2000
Bug Flags:
blocking1.7 -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526 MultiZilla/1.5.0.4h
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526

Unicode text (French, English) in an attached file is displayed inline as
Chinese/Japanese characters (mixed with some ISO Latin characters).

Reproducible: Always
Steps to Reproduce:
1. Send a Unicode text file as an attachment to yourself.

Actual Results:  
When received, the attachment is displayed inline as a mix of Western characters
and Asian characters (Japanese or Chinese, I don't know).
Fortunately, when you save the attachment, you get it back as correct text.

Expected Results:  
Display the text content
Or do not display inline

Saving the file makes it possible to access the data with other software :-(
(Reporter)

Comment 1

14 years ago
Created attachment 149397 [details]
sample text in unicode U-DOS (done with ultraedit 10)

zipped to preserve the file encoding
(Reporter)

Comment 2

14 years ago
Created attachment 149398 [details]
screen shot
(Reporter)

Comment 3

14 years ago
Created attachment 149399 [details]
message source, base64 encoding
(Reporter)

Updated

14 years ago
Flags: blocking1.7?

Comment 4

14 years ago
We are down to the point where we are just trying to fix regressions. Is this a
regression? If so, any idea when it might have been introduced?
I'm not sure if this is a recent regression.  
I doubt it, but I'll compare against 1.4 and 1.6.

some notes:

1)  when sending this file from OE6, they send as "Content-Disposition: attachment;"
2)  when sending this file from 1.7, we send as "Content-Disposition: inline;"

3)

when OE receives the attachment version, it seems to handle it best.   it
doesn't show it inline, and if you go to open the attachment, it opens it into
an external txt viewer:  notepad

4)

when 1.7 receives the attachment version, we try to show it inline.  my guess is
that we know we can render text/plain inline, so we try.  but we do a poor job
possibly because there is no charset (or something) on the attachment?

5)

when I double click on the attachment in 1.7, and we load the attachment (in the
browser), it seems to display correctly.  (just inline is bad)

6) maybe we are supposed to be specifying the charset when we attach a unicode
file, like this?

Content-Type: text/plain; charset=<something>
	name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
	filename="spip-boucles-fake.txt"
Status: UNCONFIRMED → NEW
Ever confirmed: true
simon and jshin know way more about this than me, perhaps they have thoughts.
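
The labeling idea from note 6 can be sketched with Python's standard `email` package (this is not Mozilla's compose code). The file name is taken from the sample headers; the body text, the attachment content, and the choice of `utf-16` as the charset label are assumptions made for illustration.

```python
from email.message import EmailMessage

# Hypothetical attachment content: UTF-16 text with a BOM, as in the report.
data = "bonjour\r\n".encode("utf-16")

msg = EmailMessage()
msg["Subject"] = "test"
msg.set_content("see attachment")  # main body, plain text

# Attach the raw bytes as text/plain and label the charset explicitly, so the
# receiver does not have to guess or fall back to the charset of the body.
msg.add_attachment(
    data,
    maintype="text",
    subtype="plain",
    filename="spip-boucles-fake.txt",
    params={"charset": "utf-16"},
)

part = next(msg.iter_attachments())
print(part["Content-Type"])
print(part["Content-Disposition"])
print(part.get_content_charset())
```

The sketch sends the part as an attachment rather than inline, and the payload is base64-encoded because raw bytes are passed in. Whether a receiving client honors the label is a separate question.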

Comment 7

14 years ago
(In reply to comment #5)
> I'm not sure if this is a recent regression.  
> I doubt it, but I'll compare against 1.4 and 1.6.

I also doubt it, but anyway your testing result would be nice to have.

> 6) maybe we are supposed to be specifying the charset when we attach a unicode
> file, like this?

Yeah, we may have to, but we have to come up with a way that doesn't burden
Mom'n'Pop users who've got little clue about 'charset'. Currently, I think it's
assumed that text attachment has the same character encoding as the main body of
the message. It doesn't hold in cases like this but in the majority of the cases
it holds (although it may change as time goes by). 

This case has another twist. The attached file in the sample message uploaded
here is in UTF-16 with BOM at the beginning, which is why notepad has no trouble
opening it (it detects the BOM and does the right thing).  If it's a web page,
we'd have no trouble because for a web page we subject it to multiple mechanisms
to determine the character encoding, one of which is BOM detection. For mail
attachment(in text), I'm not sure what exactly we do. 

In summing up, there are two aspects in this bug. One is how to add the least
'obstructive' UI(or if possible, automatic way) to figure out the character
encoding of a text attachment and add that information explicitly to 'text/*'
attachment (especially text/plain). The other is how to figure out the character
encoding of an unlabelled text attachment (which may be different from those of
other 'text' parts of the same message).
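
For the second aspect, BOM detection on an unlabelled text part could look roughly like the Python sketch below. It is not the detector Mozilla uses for web pages; `sniff_bom`, `decode_attachment`, and the ISO-8859-1 fallback (standing in for "the charset of the main body") are hypothetical names and choices.

```python
import codecs

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def decode_attachment(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode an unlabelled text part: trust a BOM if present, else use fallback."""
    hit = sniff_bom(data)
    if hit is None:
        return data.decode(fallback, errors="replace")
    name, skip = hit
    return data[skip:].decode(name, errors="replace")


print(repr(decode_attachment(b"\xff\xfeq\x00s\x00")))
```

One known weakness of this ordering: a UTF-16LE file whose first character after the BOM is U+0000 would be mistaken for UTF-32LE. Such files are rare in practice, but a real implementation would need to decide how to handle them.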

Keywords: intl
>> I'm not sure if this is a recent regression.  
>> I doubt it, but I'll compare against 1.4 and 1.6.
>
>I also doubt it, but anyway your testing result would be nice to have.

I tested both 1.4 and 1.6 release bits, and this bug exists there as well.

since it is not a recent regression, blocking 1.7-.

jshin, thanks for the info.  
Flags: blocking1.7? → blocking1.7-
see also possibly related bug #236941.
Depends on: 236941

Comment 10

14 years ago
Well, this bug doesn't depend on bug 236941. It might be argued that it's
related to that bug remotely, but that's about it.  
No longer depends on: 236941
(Reporter)

Comment 11

14 years ago
Just a comment: does the fact that 'we' display part of the content as
Chinese/Japanese characters show that Unicode is recognized as the charset, but
not the right flavor of it (UTF-8 vs. UTF-16, for example)?
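
One mechanism that turns plain Latin text into ideographs, sketched in Python: if UTF-16 code units are read with the wrong byte order, an ASCII letter stored as 0x00NN is read as 0xNN00, and many of those values fall in the CJK range. This only illustrates the effect; nothing in this bug establishes that it is what the mail client actually does.

```python
import unicodedata

text = "qs"
raw = text.encode("utf-16-le")      # bytes 71 00 73 00
swapped = raw.decode("utf-16-be")   # same bytes read with the opposite byte order

for ch in swapped:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

A decoder that starts one byte late on UTF-16LE data would produce a similar effect, since the high and low bytes of each code unit end up swapped.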

Comment 12

13 years ago
Could people interested in this bug report please take
a look at Bug 241821 ?

My original subject/phrasing for 241821 is not
appropriate now that I understand the nature of the problem.

I agree now with "do not show the attachment inline"
sentiment.

I wonder what others think.

 
Product: MailNews → Core
(Reporter)

Comment 13

13 years ago
Still occurring in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2)
Gecko/20050306


From - Mon Mar 14 11:23:10 2005
X-Account-Key: account3
X-UIDL: 1110795605.16178.mrelay4-1
X-Mozilla-Status: 0001
X-Mozilla-Status2: 10000000
Return-Path: <olivier.vit@free.fr>
Delivered-To: online.fr-olivier.vit@free.fr
Received: (qmail 16077 invoked from network); 14 Mar 2005 10:20:04 -0000
Received: from florius.duke-interactive.net (62.39.136.162)
  by mrelay4-1.free.fr with SMTP; 14 Mar 2005 10:20:04 -0000
Received: from aph-aug-103-2-1-4.w193-252.abo.wanadoo.fr ([193.252.201.4]
helo=mail.duke-interactive.com)
	by florius.duke-interactive.net with smtp (Exim 4.44)
	id 1DAmg8-0004d9-3b
	for olivier.vit@free.fr; Mon, 14 Mar 2005 11:20:04 +0100
Received: from [10.42.10.79] (helo=[10.42.10.79])
	by mail.duke-interactive.com with esmtp (Exim 3.35 #1 (Debian))
	id 1DAmfu-0003JW-00
	for <olivier.vit@free.fr>; Mon, 14 Mar 2005 11:19:50 +0100
Message-ID: <423565F0.6020908@free.fr>
Date: Mon, 14 Mar 2005 11:22:40 +0100
From: Olivier Vit <olivier.vit@free.fr>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050306
MIME-Version: 1.0
To: undisclosed-recipients:;
Subject: test utf8
Content-Type: multipart/mixed;
 boundary="------------050801070906090602040700"
X-Duke-MailScanner: Found to be clean

This is a multi-part message in MIME format.
--------------050801070906090602040700
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit



--------------050801070906090602040700
Content-Type: text/plain;
 name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="spip-boucles-fake.txt"

//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA
cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA
cQBzAGQAcwBxAGQADQAKAA==
--------------050801070906090602040700--
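
The encoding of the attached part can be checked directly from the message source above. This Python sketch decodes the base64 body and looks at the leading bytes; only standard-library calls are used, and the variable names are arbitrary.

```python
import base64
import codecs

# Base64 body of the attachment part, copied from the message source above.
payload = (
    "//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA"
    "cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA"
    "cQBzAGQAcwBxAGQADQAKAA=="
)
raw = base64.b64decode(payload)

print(raw[:2].hex())                        # fffe
has_le_bom = raw.startswith(codecs.BOM_UTF16_LE)
text = raw.decode("utf-16")                 # the "utf-16" codec consumes the BOM
print(has_le_bom, repr(text[:18]))
```

The payload starts with FF FE, the little-endian UTF-16 byte order mark, and the rest decodes to ASCII letters and CRLF line breaks stored two bytes per character. The part's headers carry no charset parameter.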

Comment 14

13 years ago
Looking at the sample message, I am not (initially) seeing the Asian characters 
displayed in the attachment.  (The test would have been more useful if composed 
of some actual text, rather than random characters.)

I have Auto-detect OFF.  When I select the message, TB's encoding menu shows  
ISO-8859-1  selected (which is my default encoding); the inline'd attachment is 
shown as:
  ÿþ
which I think is the "Little Endian" flag for 16-bit encoded messages.  If I 
save the attachment and open it in MS Word, it identifies the file as "little-
endian Unicode."  Word does not say UTF-16, so I'm not sure exactly which 
encoding this attachment has; but looking at it in hex, the text seems to be 
simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.

If I add all the various Unicode encoding varieties to my "custom list" and 
select them all in turn, none of them display correctly.  Naturally, all of the 
16- and 32-bit varieties display the message body as question-marks; selecting 
any of the UTF-16 varieties, the attachment text is misdisplayed but partly 
legible, and I see one ideogram character in the attachment text.

Note: If I tweak the message to add an explicit "charset=utf-16" to the 
attachment's headers, the attachment displays inline but not quite right, 
appearing just as it did when I selected UTF-16 from the encoding menu for the 
original message; so the problem is not (entirely) that the charset is missing.


Bug 238152 is about doing something to specify a correct charset when attaching 
a text/plain file.
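
The `ÿþ` seen under ISO-8859-1 is consistent with the two BOM bytes being rendered as Latin-1 characters, which a short Python sketch can show. The byte string below is a hand-written stand-in for the start of the attachment. The idea that the interleaved NUL bytes cut the inline display short is a guess, not something confirmed in this bug.

```python
# Hypothetical start of a UTF-16LE file: BOM, CR LF, then "qs".
raw = b"\xff\xfe\r\x00\n\x00q\x00s\x00"

as_latin1 = raw.decode("iso-8859-1")   # every byte maps to exactly one character
print(repr(as_latin1[:2]))             # the BOM bytes FF FE shown as Latin-1
print(as_latin1.count("\x00"))         # NUL characters interleaved with the text

as_utf16 = raw.decode("utf-16")        # BOM-aware decode
print(repr(as_utf16))
```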
Severity: critical → normal
OS: Windows XP → Windows 2000
Summary: utf text file attached displayed in-line as chinese/japanese text (big5) → utf-16 text file attached incorrectly decoded for in-line display
(Reporter)

Comment 15

13 years ago
I recently noticed that on Windows 2000 it doesn't display Asian characters, just
blanks.
Maybe some of this behaviour is specific to XP?

Comment 16

13 years ago
 
> save the attachment and open it in MS Word, it identifies the file as "little-
> endian Unicode."  Word does not say UTF-16, so I'm not sure exactly which 
> encoding this attachment has; but looking at it in hex, the text seems to be 
> simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.

That is UTF-16LE (little endian) :-)
 
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
QA Contact: attachments
(Assignee)

Updated

9 years ago
Product: Core → MailNews Core
Blocks: 604284
See Also: → bug 139686