Open Bug 244829 Opened 21 years ago Updated 3 years ago

utf-16 text file attached incorrectly decoded for in-line display

Tracking

(Not tracked)

Status:

NEW

People

(Reporter: olivier.vit, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl)

Attachments

(4 files)

sample text in unicode U-DOS (done with ultraedit 10) 21 years ago Olivier Vit (just a reporter) 198 bytes, application/zip
screen shot 21 years ago Olivier Vit (just a reporter) 12.38 KB, image/gif
message source, base64 encoding 21 years ago Olivier Vit (just a reporter) 1.46 KB, text/plain
Screenshot on 2019-04-25 at 19:10:18.png 6 years ago be1310 576.29 KB, image/png

Olivier Vit (just a reporter)

Reporter

Description

21 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526 MultiZilla/1.5.0.4h
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040526

Displays Unicode text (French, English) in an attached file as Chinese/Japanese inline (even mixed with ISO-Latin characters).

Reproducible: Always

Steps to Reproduce:
1. Send a Unicode text file as an attachment to yourself.

Actual Results:
2. When received, it is displayed inline with a mix of Western characters and Asian characters (Japanese or Chinese, I don't know).
3. Fortunately, when you save the file attachment, you get it back as text!

Expected Results:
Display the text content, or do not display it inline. Saving the file lets you access the data with another program :-(

Olivier Vit (just a reporter)

Reporter

Comment 1

21 years ago

Attached file sample text in unicode U-DOS (done with ultraedit 10)

zipped to preserve the file encoding

Olivier Vit (just a reporter)

Reporter

Comment 2

21 years ago

Attached image screen shot

Olivier Vit (just a reporter)

Reporter

Comment 3

21 years ago

Attached file message source, base64 encoding

Olivier Vit (just a reporter)

Reporter

Updated

21 years ago

Flags: blocking1.7?

chris hofmann

Comment 4

21 years ago

down to the point where we are just trying to fix regressions. is this a regression? and if so any ideas when it might have been introduced?

(not reading, please use seth@sspitzer.org instead)

Comment 5

21 years ago

I'm not sure if this is a recent regression. I doubt it, but I'll compare against 1.4 and 1.6.

Some notes:
1) When sending this file from OE6, they send it as "Content-Disposition: attachment;".
2) When sending this file from 1.7, we send it as "Content-Disposition: inline;".
3) When OE receives the attachment version, it seems to handle it best. It doesn't show it inline, and if you go to open the attachment, it opens it in an external txt viewer: Notepad.
4) When 1.7 receives the attachment version, we try to show it inline. My guess is that we know we can render text/plain inline, so we try. But we do a poor job, possibly because there is no charset (or something) on the attachment?
5) When I double-click on the attachment in 1.7 and we load the attachment (in the browser), it seems to display correctly. (Just inline is bad.)
6) Maybe we are supposed to be specifying the charset when we attach a Unicode file, like this?

Content-Type: text/plain; charset=<something>;
 name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="spip-boucles-fake.txt"
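A minimal sketch of what note 6 describes, using Python's standard `email` package as a stand-in (this is not Mozilla's compose code): the file bytes are attached unchanged, base64-encoded, and the charset is declared on the attachment's own Content-Type. The `make_text_attachment` helper and the hard-coded `utf-16` label are hypothetical; picking that label automatically is the hard part.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def make_text_attachment(data: bytes, filename: str, charset: str) -> MIMEBase:
    """Wrap raw file bytes as a text/plain part with an explicit charset label."""
    # Keyword parameters become Content-Type parameters:
    #   Content-Type: text/plain; charset="utf-16"; name="..."
    part = MIMEBase("text", "plain", charset=charset, name=filename)
    part.set_payload(data)        # bytes are kept exactly as they were on disk
    encoders.encode_base64(part)  # base64 body + Content-Transfer-Encoding header
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part


# Illustrative data: a BOM-prefixed UTF-16LE file like the one attached to this bug.
sample = b"\xff\xfe" + "qsmdlmqsl qs\r\n".encode("utf-16-le")

msg = MIMEMultipart()
msg.attach(MIMEText("See attachment.", "plain", "iso-8859-1"))
msg.attach(make_text_attachment(sample, "spip-boucles-fake.txt", "utf-16"))
print(msg.as_string())
```

A receiving client can then read the charset from the part itself instead of assuming the body's charset applies to the attachment.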

Status: UNCONFIRMED → NEW

Ever confirmed: true

(not reading, please use seth@sspitzer.org instead)

Comment 6

21 years ago

simon and jshin know way more about this than me, perhaps they have thoughts.

Jungshik Shin

Comment 7

21 years ago

(In reply to comment #5)
> I'm not sure if this is a recent regression.
> I doubt it, but I'll compare against 1.4 and 1.6.

I also doubt it, but anyway your testing result would be nice to have.

> 6) maybe we are supposed to be specifying the charset when we attach a unicode
> file, like this?

Yeah, we may have to, but we have to come up with a way that doesn't burden Mom'n'Pop users who have little clue about 'charset'. Currently, I think it's assumed that a text attachment has the same character encoding as the main body of the message. That doesn't hold in cases like this, but it holds in the majority of cases (although that may change as time goes by).

This case has another twist. The attached file in the sample message uploaded here is in UTF-16 with a BOM at the beginning, which is why Notepad has no trouble opening it (it detects the BOM and does the right thing). If it were a web page, we'd have no trouble, because for a web page we apply multiple mechanisms to determine the character encoding, one of which is BOM detection. For a mail attachment (in text), I'm not sure exactly what we do.

In summary, there are two aspects to this bug. One is how to add the least 'obtrusive' UI (or, if possible, an automatic way) to figure out the character encoding of a text attachment and add that information explicitly to 'text/*' attachments (especially text/plain). The other is how to figure out the character encoding of an unlabelled text attachment (which may differ from that of the other 'text' parts of the same message).
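The second aspect can be sketched as a small precedence rule for unlabelled attachments. This assumes, as the decode step of the WHATWG Encoding Standard does for web content, that a byte order mark overrides any label; the function names and the final fallback to the body charset are hypothetical and are not what Mozilla mail currently implements.

```python
import codecs
from typing import Optional, Tuple

# BOMs checked by the "BOM sniffing" step of the WHATWG Encoding Standard.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def guess_attachment_charset(
    data: bytes, declared: Optional[str], body_charset: str
) -> Tuple[str, int]:
    """Return (charset, BOM length) for a text/* attachment.

    Hypothetical precedence: BOM, then the part's own charset parameter,
    then the charset of the message body (today's assumption).
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    if declared:
        return declared, 0
    return body_charset, 0


def decode_attachment(data: bytes, declared: Optional[str], body_charset: str) -> str:
    charset, bom_len = guess_attachment_charset(data, declared, body_charset)
    # Skip the BOM so it is not rendered; replace undecodable bytes instead of failing.
    return data[bom_len:].decode(charset, errors="replace")


# Unlabelled UTF-16LE attachment in a message whose body is ISO-8859-1.
sample = b"\xff\xfe" + "qsmdlmqsl qs\r\n".encode("utf-16-le")
print(guess_attachment_charset(sample, None, "iso-8859-1"))  # ('utf-16-le', 2)
print(repr(decode_attachment(sample, None, "iso-8859-1")))   # 'qsmdlmqsl qs\r\n'
```

Without a BOM this still falls back to the body charset, so it only helps files like the one attached here, not unlabelled legacy-encoded text.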

Keywords: intl

(not reading, please use seth@sspitzer.org instead)

Comment 8

21 years ago

>> I'm not sure if this is a recent regression.
>> I doubt it, but I'll compare against 1.4 and 1.6.
>
>I also doubt it, but anyway your testing result would be nice to have.

I tested both 1.4 and 1.6 release bits, and this bug exists there as well. Since it is not a recent regression, blocking 1.7-.

jshin, thanks for the info.

Flags: blocking1.7? → blocking1.7-

(not reading, please use seth@sspitzer.org instead)

Comment 9

21 years ago

Comment 10

21 years ago

Well, this bug doesn't depend on bug 236941. It might be argued that it's related to that bug remotely, but that's about it.

No longer depends on: 236941

Olivier Vit (just a reporter)

Reporter

Comment 11

21 years ago

Just a comment: doesn't the fact that 'we' display part of the content as Chinese/Japanese characters show that Unicode is recognized as the charset, but not the right flavor of it (UTF-8 vs. UTF-16, for example)?
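One way to test that guess (nothing in this bug confirms it): take ASCII text stored as UTF-16LE and read it with the opposite byte order. The two bytes of every 16-bit unit get swapped, and every lowercase ASCII letter (0x61 to 0x7A) lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF). A small Python sketch:

```python
import unicodedata

# ASCII text stored as UTF-16LE: each character is its ASCII byte followed by 0x00.
le_bytes = "qs".encode("utf-16-le")
print(le_bytes)  # b'q\x00s\x00'

# Reading the same bytes big-endian swaps the halves of every 16-bit unit,
# so 'q' (U+0071) becomes U+7100 and 's' (U+0073) becomes U+7300.
misread = le_bytes.decode("utf-16-be")
for ch in misread:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Decoding as UTF-8 instead keeps the letters and interleaves NUL characters.
print(repr(le_bytes.decode("utf-8")))  # 'q\x00s\x00'
```

Since a UTF-8 reading keeps the Latin letters, a UTF-16 byte-order mix-up fits the Chinese/Japanese symptom better than a UTF-8/UTF-16 confusion.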

ISHIKAWA, Chiaki

Comment 12

21 years ago

Could people interested in this bug report please take a look at bug 241821? My original subject/phrasing for bug 241821 is not appropriate now that I understand the nature of the problem. I now agree with the "do not show the attachment inline" sentiment. I wonder what others think.

Myk Melez [:myk] [@mykmelez]

Updated

21 years ago

Product: MailNews → Core

Olivier Vit (just a reporter)

Reporter

Comment 13

20 years ago

Still occurring in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050306

From - Mon Mar 14 11:23:10 2005
X-Account-Key: account3
X-UIDL: 1110795605.16178.mrelay4-1
X-Mozilla-Status: 0001
X-Mozilla-Status2: 10000000
Return-Path: <olivier.vit@free.fr>
Delivered-To: online.fr-olivier.vit@free.fr
Received: (qmail 16077 invoked from network); 14 Mar 2005 10:20:04 -0000
Received: from florius.duke-interactive.net (62.39.136.162)
  by mrelay4-1.free.fr with SMTP; 14 Mar 2005 10:20:04 -0000
Received: from aph-aug-103-2-1-4.w193-252.abo.wanadoo.fr ([193.252.201.4] helo=mail.duke-interactive.com)
  by florius.duke-interactive.net with smtp (Exim 4.44)
  id 1DAmg8-0004d9-3b
  for olivier.vit@free.fr; Mon, 14 Mar 2005 11:20:04 +0100
Received: from [10.42.10.79] (helo=[10.42.10.79])
  by mail.duke-interactive.com with esmtp (Exim 3.35 #1 (Debian))
  id 1DAmfu-0003JW-00
  for <olivier.vit@free.fr>; Mon, 14 Mar 2005 11:19:50 +0100
Message-ID: <423565F0.6020908@free.fr>
Date: Mon, 14 Mar 2005 11:22:40 +0100
From: Olivier Vit <olivier.vit@free.fr>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050306
MIME-Version: 1.0
To: undisclosed-recipients:;
Subject: test utf8
Content-Type: multipart/mixed;
  boundary="------------050801070906090602040700"
X-Duke-MailScanner: Found to be clean

This is a multi-part message in MIME format.
--------------050801070906090602040700
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit


--------------050801070906090602040700
Content-Type: text/plain;
  name="spip-boucles-fake.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="spip-boucles-fake.txt"

//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA
cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA
cQBzAGQAcwBxAGQADQAKAA==
--------------050801070906090602040700--
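To see what the attachment actually contains, the base64 body of the spip-boucles-fake.txt part in the message source can be decoded directly. This sketch uses Python's standard `base64` module and its generic `utf-16` codec, which reads the BOM to choose the byte order and then drops it; it only inspects the bytes and does not reproduce Thunderbird's inline rendering.

```python
import base64

# Body of the spip-boucles-fake.txt part, copied from the message source.
ATTACHMENT_B64 = (
    "//4NAAoADQAKAHEAcwBtAGQAbABtAHEAcwBsACAAcQBzAA0ACgANAAoADQAKAA0ACgBxAGQA"
    "cQBzAGQAawBzAHEAZAANAAoAcQBzAGQAawBzAHEAagBkACAAcwBxAA0ACgBkAHEAcwBrAGQA"
    "cQBzAGQAcwBxAGQADQAKAA=="
)

raw = base64.b64decode(ATTACHMENT_B64)
print(raw[:4])  # b'\xff\xfe\r\x00': the BOM FF FE, then CR stored as 0D 00

# The generic "utf-16" codec sees FF FE, decodes little-endian, and strips the BOM.
text = raw.decode("utf-16")
for line in text.splitlines():
    print(repr(line))
```

The decoded lines are short runs of random letters separated by CRLFs, with no Asian characters in the data itself, so the garbling happens only at display time.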

Mike Cowperthwaite

Comment 14

20 years ago

Looking at the sample message, I am not (initially) seeing the Asian characters displayed in the attachment. (The test would have been more useful if composed of some actual text, rather than random characters.)

I have Auto-detect OFF. When I select the message, TB's encoding menu shows ISO-8859-1 selected (which is my default encoding); the inline'd attachment is shown as:

ÿþ

which I think is the "Little Endian" flag for 16-bit encoded messages. If I save the attachment and open it in MS Word, it identifies the file as "little-endian Unicode." Word does not say UTF-16, so I'm not sure exactly which encoding this attachment has; but looking at it in hex, the text seems to be simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.

If I add all the various Unicode encoding varieties to my "custom list" and select them all in turn, none of them display correctly. Naturally, all of the 16- and 32-bit varieties display the message body as question marks; selecting any of the UTF-16 varieties, the attachment text is misdisplayed but partly legible, and I see one ideogram character in the attachment text.

Note: if I tweak the message to add an explicit "charset=utf-16" to the attachment's headers, the attachment displays inline but not quite right, appearing just as it did when I selected UTF-16 from the encoding menu for the original message; so the problem is not (entirely) that the charset is missing.

Bug 238152 is about doing something to specify a correct charset when attaching a text/plain file.
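Both observations in this comment can be checked byte by byte. ISO-8859-1 maps each byte to the code point with the same value, so the little-endian BOM (0xFF 0xFE) shows up as the two letters 'ÿþ'. What Word calls "little-endian Unicode" here is UTF-16LE, which stores each 7-bit ASCII value in the low byte, first, followed by a 0x00 high byte. A small Python sketch:

```python
import unicodedata

BOM_UTF16_LE = b"\xff\xfe"

# ISO-8859-1 maps byte 0xNN to code point U+00NN, so the BOM turns into two letters.
rendered = BOM_UTF16_LE.decode("iso-8859-1")
print([unicodedata.name(ch) for ch in rendered])
# ['LATIN SMALL LETTER Y WITH DIAERESIS', 'LATIN SMALL LETTER THORN']

# UTF-16LE: 7-bit ASCII value in the low byte, stored first, then a 0x00 high byte.
print("qs".encode("utf-16-le").hex(" "))  # 71 00 73 00
```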

Severity: critical → normal

OS: Windows XP → Windows 2000

Summary: utf text file attached displayed in-line as chinese/japanese text (big5) → utf-16 text file attached incorrectly decoded for in-line display

Olivier Vit (just a reporter)

Reporter

Comment 15

20 years ago

I recently noticed that on Win 2000 it doesn't display Asian characters, just blanks. Maybe this behaviour is specific to XP?

Jungshik Shin

Comment 16

20 years ago

> save the attachment and open it in MS Word, it identifies the file as
> "little-endian Unicode." Word does not say UTF-16, so I'm not sure exactly
> which encoding this attachment has; but looking at it in hex, the text seems
> to be simple 7-bit ASCII values encoded, lo-byte first, in 16 bits.

That is UTF-16LE (little endian) :-)

(not reading, please use seth@sspitzer.org instead)

Comment 17

18 years ago

sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.

Assignee: sspitzer → nobody

Wayne Mery (:wsmwk)

Updated

17 years ago

QA Contact: attachments

Nobody; OK to take it and work on it

Assignee

Updated

17 years ago

Product: Core → MailNews Core

WADA:World Anti-bad-Duping Agency

Updated

11 years ago

Blocks: 604284

WADA:World Anti-bad-Duping Agency

Updated

11 years ago

Comment 18

6 years ago

Attached image Screenshot on 2019-04-25 at 19:10:18.png

tbird 60.6.1 (64-bit)
This bug appears in my inline display of an attached UTF-16 .txt file. The attachment itself seems to be fine, but the inline display of the .txt file shows the first line properly (the first line is in all caps), followed by Asian characters.

Alfred Peters [:infofrommozilla]

Updated

•

5 years ago

Updated

•

3 years ago

Severity: normal → S3
