Open
Bug 244829
Opened 21 years ago
Updated 3 years ago
utf-16 text file attached incorrectly decoded for in-line display
Categories
(MailNews Core :: Attachments, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: olivier.vit, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intl)
Attachments
(4 files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526 MultiZilla/1.5.0.4h
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526
Unicode text (French, English) in an attached file is displayed inline as
Chinese/Japanese characters (sometimes mixed with ISO Latin characters).
Reproducible: Always
Steps to Reproduce:
1. Send a Unicode text file as an attachment to yourself.
Actual Results:
2. When received, it is displayed inline as a mix of Western characters and
Asian characters (Japanese or Chinese, I don't know).
3. Fortunately, when you save the file attachment, you get it back as text!
Expected Results:
Display the text content,
or do not display it inline.
Saving the file allows the data to be accessed with other software :(-
Reporter
Comment 1•21 years ago
zipped to preserve the file encoding
Reporter
Comment 2•21 years ago
Reporter
Comment 3•21 years ago
Reporter
Updated•21 years ago
Flags: blocking1.7?
Comment 4•21 years ago
We are down to the point where we are just trying to fix regressions. Is this a
regression? If so, any idea when it might have been introduced?
Comment 5•21 years ago
I'm not sure if this is a recent regression.
I doubt it, but I'll compare against 1.4 and 1.6.
Some notes:
1) When sending this file from OE6, they send it as "Content-Disposition: attachment;".
2) When sending this file from 1.7, we send it as "Content-Disposition: inline;".
3) When OE receives the attachment version, it seems to handle it best. It
doesn't show it inline, and if you open the attachment, it opens in
an external text viewer: Notepad.
4) When 1.7 receives the attachment version, we try to show it inline. My guess is
that we know we can render text/plain inline, so we try, but we do a poor job,
possibly because there is no charset (or something) on the attachment?
5) When I double-click on the attachment in 1.7 and we load the attachment (in the
browser), it seems to display correctly. (Only the inline display is bad.)
6) Maybe we are supposed to specify the charset when we attach a Unicode
file, like this?
Content-Type: text/plain; charset=<something>
name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
filename="spip-boucles-fake.txt"
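To make note 6 concrete, here is a minimal Python sketch (not Mozilla's code) that builds a message whose text/plain attachment carries an explicit charset parameter. The value "utf-16", the sample text and the subject line are assumptions chosen for the example; the note above leaves the actual charset value open.

```python
import codecs
from email.message import EmailMessage

# Hypothetical file contents: UTF-16LE with a leading BOM, as Notepad would save it.
text = "Bonjour, ceci est un test.\r\n"
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

msg = EmailMessage()
msg["Subject"] = "test utf-16 attachment"
msg.set_content("See attached file.")

# Label the attachment's encoding explicitly instead of leaving the reader to guess.
msg.add_attachment(
    data,
    maintype="text",
    subtype="plain",
    filename="spip-boucles-fake.txt",
    params={"charset": "utf-16"},  # assumed value, for illustration only
)

part = next(msg.iter_attachments())
print(part["Content-Type"])
print(part["Content-Disposition"])
```

A receiving client that honours the label can then decode the part with the declared charset rather than the charset of the message body.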
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 6•21 years ago
simon and jshin know way more about this than me, perhaps they have thoughts.
Comment 7•21 years ago
(In reply to comment #5)
> I'm not sure if this is a recent regression.
> I doubt it, but I'll compare against 1.4 and 1.6.
I also doubt it, but anyway your testing result would be nice to have.
> 6) maybe we are supposed to be specifying the charset when we attach a unicode
> file, like this?
Yeah, we may have to, but we have to come up with a way that doesn't burden
Mom'n'Pop users who have little clue about 'charset'. Currently, I think it's
assumed that a text attachment has the same character encoding as the main body of
the message. That doesn't hold in cases like this one, but it does hold in the
majority of cases (although that may change as time goes by).
This case has another twist. The attached file in the sample message uploaded
here is in UTF-16 with a BOM at the beginning, which is why Notepad has no trouble
opening it (it detects the BOM and does the right thing). If it were a web page,
we'd have no trouble, because we subject a web page to multiple mechanisms
to determine the character encoding, one of which is BOM detection. For a mail
attachment (as text), I'm not sure what exactly we do.
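As an illustration of the BOM detection mentioned above, the following Python sketch checks the leading bytes of an unlabelled text part before falling back to a default charset. The function names and the ISO-8859-1 fallback are hypothetical; this is not a description of what the mail code actually does.

```python
import codecs

# Longest signatures first: the UTF-32LE BOM begins with the UTF-16LE BOM bytes.
# (A UTF-16LE file whose first character is NUL would be misread as UTF-32LE;
# this sketch accepts that ambiguity.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (codec name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_text_part(data, fallback="iso-8859-1"):
    """Decode an unlabelled text part, preferring the BOM over the fallback."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback, errors="replace")


print(decode_text_part(b"\xff\xfeh\x00i\x00"))
```

The fallback parameter stands in for whatever default applies today, such as the charset of the message body.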
To sum up, there are two aspects to this bug. One is how to add the least
obtrusive UI (or, if possible, an automatic way) to figure out the character
encoding of a text attachment and add that information explicitly to a 'text/*'
attachment (especially text/plain). The other is how to figure out the character
encoding of an unlabelled text attachment (which may be different from that of
other 'text' parts of the same message).
Keywords: intl
Comment 8•21 years ago
>> I'm not sure if this is a recent regression.
>> I doubt it, but I'll compare against 1.4 and 1.6.
>
>I also doubt it, but anyway your testing result would be nice to have.
I tested both 1.4 and 1.6 release bits, and this bug exists there as well.
since it is not a recent regression, blocking 1.7-.
jshin, thanks for the info.
Flags: blocking1.7? → blocking1.7-
Comment 10•21 years ago
Well, this bug doesn't depend on bug 236941. It might be argued that it's
related to that bug remotely, but that's about it.
No longer depends on: 236941
Reporter
Comment 11•21 years ago
Just a comment: doesn't the fact that 'we' display part of the content as
Chinese/Japanese characters show that Unicode is recognized as the charset, but
not the right flavor of it (UTF-8 vs. UTF-16, for example)?
Comment 12•21 years ago
Could people interested in this bug report please take
a look at Bug 241821 ?
My original subject/phrasing for 241821 is not
appropriate now that I understand the nature of the problem.
I agree now with "do not show the attachment inline"
sentiment.
I wonder what others think.
Updated•21 years ago
Product: MailNews → Core
Reporter
Comment 13•20 years ago
Still occurring in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2)
Gecko/20050306
From - Mon Mar 14 11:23:10 2005
X-Account-Key: account3
X-UIDL: 1110795605.16178.mrelay4-1
X-Mozilla-Status: 0001
X-Mozilla-Status2: 10000000
Return-Path: <olivier.vit@free.fr>
Delivered-To: online.fr-olivier.vit@free.fr
Received: (qmail 16077 invoked from network); 14 Mar 2005 10:20:04 -0000
Received: from florius.duke-interactive.net (62.39.136.162)
by mrelay4-1.free.fr with SMTP; 14 Mar 2005 10:20:04 -0000
Received: from aph-aug-103-2-1-4.w193-252.abo.wanadoo.fr ([193.252.201.4]
helo=mail.duke-interactive.com)
by florius.duke-interactive.net with smtp (Exim 4.44)
id 1DAmg8-0004d9-3b
for olivier.vit@free.fr; Mon, 14 Mar 2005 11:20:04 +0100
Received: from [10.42.10.79] (helo=[10.42.10.79])
by mail.duke-interactive.com with esmtp (Exim 3.35 #1 (Debian))
id 1DAmfu-0003JW-00
for <olivier.vit@free.fr>; Mon, 14 Mar 2005 11:19:50 +0100
Message-ID: <423565F0.6020908@free.fr>
Date: Mon, 14 Mar 2005 11:22:40 +0100
From: Olivier Vit <olivier.vit@free.fr>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050306
MIME-Version: 1.0
To: undisclosed-recipients:;
Subject: test utf8
Content-Type: multipart/mixed;
boundary="------------050801070906090602040700"
X-Duke-MailScanner: Found to be clean
This is a multi-part message in MIME format.
--------------050801070906090602040700
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
--------------050801070906090602040700
Content-Type: text/plain;
name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="spip-boucles-fake.txt"
//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA
cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA
cQBzAGQAcwBxAGQADQAKAA==
--------------050801070906090602040700--
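For reference, the base64 payload of the attachment above can be inspected with a few lines of Python. This is only a sketch for examining the sample; the variable names are arbitrary.

```python
import base64

# Base64 body of the attachment, copied from the sample message above.
payload = (
    "//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA"
    "cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA"
    "cQBzAGQAcwBxAGQADQAKAA=="
)

raw = base64.b64decode(payload)
print(raw[:2].hex())         # the first two bytes of the file

# The generic "utf-16" codec reads the BOM, picks the byte order and drops the BOM.
text = raw.decode("utf-16")
print(repr(text))
```

The file starts with the bytes FF FE, the little-endian byte order mark, and the rest decodes cleanly as UTF-16 to plain ASCII letters and CRLF line breaks.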
Comment 14•20 years ago
Looking at the sample message, I am not (initially) seeing the Asian characters
displayed in the attachment. (The test would have been more useful if composed
of some actual text, rather than random characters.)
I have Auto-detect OFF. When I select the message, TB's encoding menu shows
ISO-8859-1 selected (which is my default encoding); the inline'd attachment is
shown as:
ÿþ
which I think is the "Little Endian" flag for 16-bit encoded messages. If I
save the attachment and open it in MS Word, it identifies the file as "little-
endian Unicode." Word does not say UTF-16, so I'm not sure exactly which
encoding this attachment has; but looking at it in hex, the text seems to be
simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.
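The "ÿþ" seen above is what the two BOM bytes look like when read as ISO-8859-1. A short Python check, offered as an illustration rather than a diagnosis of the display code:

```python
import codecs

bom = codecs.BOM_UTF16_LE          # b"\xff\xfe"
shown = bom.decode("iso-8859-1")   # what a Latin-1 reading makes of those two bytes
print(shown)

# Read one byte at a time, each ASCII letter is followed by a NUL byte.
body = "qs".encode("utf-16-le").decode("iso-8859-1")
print(repr(body))
```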
If I add all the various Unicode encoding varieties to my "custom list" and
select them all in turn, none of them display correctly. Naturally, all of the
16- and 32-bit varieties display the message body as question-marks; selecting
any of the UTF-16 varieties, the attachment text is misdisplayed but partly
legible, and I see one ideogram character in the attachment text.
Note: If I tweak the message to add an explicit "charset=utf-16" to the
attachment's headers, the attachment displays inline but not quite right,
appearing just as it did when I selected UTF-16 from the encoding menu for the
original message; so the problem is not (entirely) that the charset is missing.
Bug 238152 is about doing something to specify a correct charset when attaching
a text/plain file.
Severity: critical → normal
OS: Windows XP → Windows 2000
Summary: utf text file attached displayed in-line as chinese/japanese text (big5) → utf-16 text file attached incorrectly decoded for in-line display
Reporter
Comment 15•20 years ago
I recently noticed that on Windows 2000 it doesn't display Asian characters, just
blanks.
Maybe some of this behaviour is specific to XP?
Comment 16•20 years ago
> save the attachment and open it in MS Word, it identifies the file as "little-
> endian Unicode." Word does not say UTF-16, so I'm not sure exactly which
> encoding this attachment has; but looking at it in hex, the text seems to be
> simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.
That is UTF-16LE (little endian) :-)
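A small Python sketch shows what UTF-16LE looks like for ASCII text, and one way ideographs can appear from it: reading the same bytes with the opposite byte order. Whether this is what the inline display actually does is an assumption, not something established in this bug.

```python
sample = "qs"

# UTF-16LE stores each ASCII character as its byte value followed by 0x00.
le_bytes = sample.encode("utf-16-le")
print(le_bytes)

# Reading the same bytes as big-endian swaps the two halves of every code unit,
# which turns ASCII letters into code points in the CJK ideograph range.
wrong = le_bytes.decode("utf-16-be")
print([f"U+{ord(c):04X}" for c in wrong])
```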
Comment 17•18 years ago
sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Updated•17 years ago
QA Contact: attachments
Assignee
Updated•17 years ago
Product: Core → MailNews Core
Comment 18•6 years ago
tbird 60.6.1 (64-bit)
This bug appears in the inline display of an attached UTF-16 .txt file. The attachment itself seems to be fine, but the inline display shows the first line properly (the first line is in all caps), followed by Asian characters.
Updated•3 years ago
Severity: normal → S3