Closed Bug 153855 Opened 22 years ago Closed 18 years ago

Composer does not display a UTF-16 signature in the right way.

Categories

(MailNews Core :: Internationalization, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.8.1beta2

People

(Reporter: tjibbe, Assigned: masayuki)

Details

(Keywords: intl, verified1.8.1)

Attachments

(2 files, 4 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1a) Gecko/20020611
BuildID:    20002061104

When using a UTF-8 encoded file as signature, composer only shows the first
three characters of the file, as if the signature file is not converted.

Reproducible: Always
Steps to Reproduce:
1. Write a signature file and save it as UTF-8.
2. Tell Mozilla to use this file for your signature.
3. Compose a message.

Actual Results:  With my signature file, the following three lines are added in
the composer window:

-- 
ÿþM


Expected Results:  It should display as:

-- 
Met vriendelijke groeten,
Tjibbe Steneker.
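The actual result is consistent with the UTF-16LE file being read as single-byte ISO-8859-1 text: the BOM bytes FF FE become "ÿþ", 0x4D becomes "M", and the 0x00 byte that follows would end the string if it were handled as a NUL-terminated C string. The minimal C++ sketch below reproduces that reading. The helper name `DecodeLatin1UntilNul` is hypothetical, and the input is assumed to be the first bytes of "Met" saved as UTF-16LE with a BOM; it illustrates the symptom, not Mozilla's actual code path.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode bytes as ISO-8859-1 (each byte maps to the
// code point with the same value) and stop at the first NUL byte, the way a
// NUL-terminated C string would. Returns UTF-32 code points.
std::u32string DecodeLatin1UntilNul(const std::vector<std::uint8_t>& bytes) {
  std::u32string out;
  for (std::uint8_t b : bytes) {
    if (b == 0x00) break;  // a C string would end here
    out.push_back(static_cast<char32_t>(b));
  }
  return out;
}
```

For the bytes `FF FE 4D 00 65 00 74 00`, this yields U+00FF U+00FE U+004D, i.e. "ÿþM".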

The character "ij" is only available in the Unicode charset.
Attached file My signature file.
I just attached the signature file I use.
As per bug 135762, please replace &307; in the original report with the U+0133
(LATIN SMALL LIGATURE IJ) character.
This happens on Linux, too. The signature is interpreted as ISO-8859-1 for some
reason, even when I start the program with a UTF-8 locale.

This seems to be some old source code that has never been modified. =P
Dave Oftedal, bug 52248 appears to address the BSD (and presumably Linux/*nix)
sig encoding issue; there is some discussion there of workarounds that set
the system locale to UTF-8.  (This is for plain-text sigs; HTML sigs are bug
138008.)
See also bug 180985, which appears to be about using UTF-8 (or other encoding) 
for the filename of the sig.
I could not find a dupe specific to Windows for plain-text UTF-8 sigs, so
I'm confirming.  Attachment 1 still fails, as described, even if the default
encoding is UTF-8, with 1.4-RC1.
Status: UNCONFIRMED → NEW
Component: Composition → Internationalization
Ever confirmed: true
Keywords: intl
It's still visible in Thunderbird version 0.9 (20041103)

Do we have a chance to get this into the aviary branch? 
Flags: blocking-aviary1.0?
Product: MailNews → Core
Too late for the 1.0 train now, since there is no patch yet and this is not a
stopper.
Flags: blocking-aviary1.0? → blocking-aviary1.0-
This will be fixed by the latest patch for bug 201071.
Assignee: ducarroz → masayuki
Depends on: 201071
Flags: blocking-thunderbird2?
OS: Windows 98 → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.8.1alpha2
Status: NEW → ASSIGNED
I also have this problem with Thunderbird 1.5 (20051201).
I have a UTF-8 text file containing simplified Chinese characters.
When I use UTF-8 as the default encoding for my outgoing emails, the
UTF-8 becomes "double-converted", meaning the signature is
converted even though it already has the right encoding. This behaviour
makes it impossible to add special characters to your signature.
I can also reproduce this behaviour with German umlauts from
ISO-8859-1: if I add them to my UTF-8 file, they are also "double-converted".
A possible fix would be to give people the ability to set
the signature encoding for each mail account. If the signature
encoding differs from the email encoding, the signature would
be converted.
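One plausible reading of "double-converted" is that bytes which are already UTF-8 get treated as ISO-8859-1 and encoded to UTF-8 a second time. A C++ sketch of that second step, using the umlaut "ü" (UTF-8 `C3 BC`) as the example; `Latin1ToUtf8` is a hypothetical name, not a Mozilla function, and this is an assumption about the mechanism rather than a trace of Thunderbird's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treat every byte as an ISO-8859-1 character and
// encode it as UTF-8. Applying this to text that is *already* UTF-8 is one
// way to produce the "double-converted" result described above.
std::string Latin1ToUtf8(const std::string& latin1) {
  std::string out;
  for (unsigned char c : latin1) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));  // ASCII is unchanged
    } else {
      // U+0080..U+00FF need two UTF-8 bytes: 110xxxxx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```

`Latin1ToUtf8("\xC3\xBC")` returns the four bytes `C3 83 C2 BC`, which display as "Ã¼" instead of "ü".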
Flags: blocking-thunderbird2? → blocking-thunderbird2+
This is fixed on both trunk and 1.8 branch by bug 201071.
Now, we can use UTF-8 signature file on all platforms.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
(In reply to comment #10)
> This is fixed on both trunk and 1.8 branch by bug 201071.
> Now, we can use UTF-8 signature file on all platforms.

Masayuki Nakano, are you sure this is fixed?  Using today's Thunderbird trunk build (3a1-0418), Win2K, and the sig attached to this bug, I'm getting the same original symptom.
Mike:

The attached file is not UTF-8; it is encoded as UTF-16LE.
The signature file attached by the reporter is in UTF-16LE, so this bug is not about UTF-8 signature files but about UTF-16 signature files. On non-Windows platforms, the chance that somebody makes a signature file in UTF-16 is low, but on Windows, it's more likely.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Composer does not display a UTF-8 signature in the right way. → Composer does not display a UTF-16 signature in the right way.
Jungshik:

Do you have an idea of a way which checks whether the buffer is UTF-16?
Do we check only BOM?
Keywords: fixed1.8.1
(In reply to comment #14)

> Do you have an idea of a way which checks whether the buffer is UTF-16?
> Do we check only BOM?

That's tough with a plain text file _without_ any formatting. On Windows, checking for a BOM at the beginning and looking for embedded 0x00's (especially '0x0D 0x00 0x0A 0x00' == CRLF or '0x20 0x00' falling at a 'word' boundary) may work in most cases. An additional check may be to see whether the buffer can be round-tripped to and from UTF-16 using the default code page. None of these is very strong when tested by itself (the last one is especially weak), but combined, they can be rather robust.
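A sketch of how the BOM and embedded-0x00 checks suggested above might be combined, in C++. It covers only little-endian data (the usual case on Windows), leaves out the code-page round-trip check, and uses an arbitrary threshold; the function name `LooksLikeUtf16LE` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heuristic: a BOM is decisive; otherwise require at least one
// little-endian space (20 00) or CRLF (0D 00 0A 00) starting at an even
// ("word") offset, and require that at least half of the 16-bit units have a
// 0x00 high byte, as is typical for mostly-ASCII UTF-16LE text.
bool LooksLikeUtf16LE(const std::vector<std::uint8_t>& buf) {
  if (buf.size() < 2 || buf.size() % 2 != 0) return false;
  if (buf[0] == 0xFF && buf[1] == 0xFE) return true;  // UTF-16LE BOM

  std::size_t units = buf.size() / 2;
  std::size_t zeroHighBytes = 0;  // units whose second byte is 0x00
  bool sawSpaceOrCrlf = false;
  for (std::size_t i = 0; i + 1 < buf.size(); i += 2) {
    if (buf[i + 1] == 0x00) ++zeroHighBytes;
    if (buf[i] == 0x20 && buf[i + 1] == 0x00) sawSpaceOrCrlf = true;
    if (i + 3 < buf.size() && buf[i] == 0x0D && buf[i + 1] == 0x00 &&
        buf[i + 2] == 0x0A && buf[i + 3] == 0x00) {
      sawSpaceOrCrlf = true;
    }
  }
  // The one-half threshold is an illustration, not a tuned value.
  return sawSpaceOrCrlf && zeroHighBytes * 2 >= units;
}
```

Each signal on its own can misfire, which is the point made above; the choice of signals and the threshold here are illustrative only.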

Anyway, I don't think it's a major bug, or even a 'normal' one.
Severity: major → minor
I don't think we should do more than look for a BOM.
Ah... Currently, we don't support UTF-16 encoding even if the signature file is in HTML format... We need to fix this issue...
Attached patch Patch rv1.0 (obsolete)
This patch only checks for a BOM. I think that's enough for supporting Windows, because Notepad always adds a BOM.
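A BOM-only check along these lines can be sketched in C++ as follows. It mirrors the condition quoted later in the review of rv1.3 (even file size, at least two bytes, a leading `FE FF` or `FF FE`) and returns the generic "UTF-16" label, as that revision does. `DetectUtf16ByBom` is a hypothetical name, and the real code reads from a file spec rather than a `std::vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: return "UTF-16" when the buffer has an even length and
// starts with a UTF-16 BOM in either byte order; otherwise return an empty
// string so the caller can fall back to its usual charset.
std::string DetectUtf16ByBom(const std::vector<std::uint8_t>& buf) {
  if (buf.size() >= 2 && buf.size() % 2 == 0) {
    bool bigEndianBom = buf[0] == 0xFE && buf[1] == 0xFF;
    bool littleEndianBom = buf[0] == 0xFF && buf[1] == 0xFE;
    if (bigEndianBom || littleEndianBom) {
      return "UTF-16";  // leave the byte order to the "UTF-16" decoder
    }
  }
  return "";
}
```

A UTF-8 BOM (`EF BB BF`) or an odd-length buffer is not reported as UTF-16.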
Attachment #219178 - Flags: review?(jshin1987)
Status: REOPENED → ASSIGNED
Comment on attachment 219178
Patch rv1.0

"UTF-16BE" and "UTF-16LE" means "There is no BOM". Although Mozilla uconv may perform the error recovering, it's invalid.
Can you use the "UTF-16" converter? It will auto-detect the endianness from the BOM.
(In reply to comment #19)
> "UTF-16BE" and "UTF-16LE" means "There is no BOM". Although Mozilla uconv may
> perform the error recovering, it's invalid.

Really? The patch works fine.
Severity: minor → major
(In reply to comment #20)
> Really?
See RFC 2781.
http://www.ietf.org/rfc/rfc2781.txt
| Systems labelling UTF-16BE text MUST NOT prepend a BOM to the text.
| Systems labelling UTF-16LE text MUST NOT prepend a BOM to the text.
> The patch works fine.
It works thanks to the error recovery. It doesn't mean it's correct. Is tag soup correct if Mozilla (or MSIE, or whatever else) can parse it?

(In reply to comment #21)
> uconv removes the BOM if it's on its head.
See also The Unicode Standard 4.0, chapter 3.
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G28070
| In UTF-16BE, an initial byte sequence <FE FF> is interpreted as U+FEFF ZERO WIDTH NO-BREAK SPACE.
| In UTF-16LE, an initial byte sequence <FF FE> is interpreted as U+FEFF ZERO WIDTH NO-BREAK SPACE.
That is, we aren't supposed to remove it. So I said "it's invalid".

You should not rely on the current behavior. uconv may become more strict someday.
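The labelling rules quoted above (RFC 2781 and chapter 3 of The Unicode Standard 4.0) can be illustrated with a small C++ decoder sketch. Under the "UTF-16" label, a leading `FE FF`/`FF FE` is a BOM that selects the byte order and is consumed; with no BOM it falls back to big-endian, as RFC 2781 recommends. Under "UTF-16BE"/"UTF-16LE", the byte order is fixed and a leading U+FEFF is kept as ZERO WIDTH NO-BREAK SPACE. `DecodeUtf16` is hypothetical and is not uconv's API; it returns UTF-16 code units and ignores a trailing odd byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder following RFC 2781 labelling:
//  - "UTF-16": an initial BOM picks the byte order and is dropped;
//    without a BOM, big-endian is assumed.
//  - "UTF-16BE" / "UTF-16LE": fixed byte order; an initial FEFF is kept.
// Any other label is treated as big-endian. A trailing odd byte is ignored.
std::u16string DecodeUtf16(const std::vector<std::uint8_t>& buf,
                           const std::string& label) {
  bool bigEndian = (label != "UTF-16LE");
  std::size_t pos = 0;
  if (label == "UTF-16" && buf.size() >= 2) {
    if (buf[0] == 0xFE && buf[1] == 0xFF) {
      bigEndian = true;
      pos = 2;  // consume the BOM
    } else if (buf[0] == 0xFF && buf[1] == 0xFE) {
      bigEndian = false;
      pos = 2;  // consume the BOM
    }
  }
  std::u16string out;
  for (; pos + 1 < buf.size(); pos += 2) {
    std::uint16_t hi = bigEndian ? buf[pos] : buf[pos + 1];
    std::uint16_t lo = bigEndian ? buf[pos + 1] : buf[pos];
    out.push_back(static_cast<char16_t>((hi << 8) | lo));
  }
  return out;
}
```

With the input `FF FE 4D 00`, the "UTF-16" label yields just "M", while "UTF-16LE" yields U+FEFF followed by "M".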
Severity: major → minor
Attached patch Patch rv1.1 (obsolete)
Thank you, Kimura-san. I changed that part.
Attachment #219178 - Attachment is obsolete: true
Attachment #219300 - Flags: review?(jshin1987)
Attachment #219178 - Flags: review?(jshin1987)
(In reply to comment #22)
> (In reply to comment #20)
> > Really?
> See RFC 2781.
> http://www.ietf.org/rfc/rfc2781.txt

That RFC is about labelled encoding of MIME parts.  The file being read from Windows doesn't *have* a MIME label.  The "LE" and "BE" designations when talking about files are for the purposes of the people discussing it; the OS 
may have a standard way of doing it, but the file should have a BOM.  (And I 
see no point in keeping the BOM when inserting the sig into a message.)
(In reply to comment #24)
> That RFC is about labelled encoding of MIME parts.  The file being read from
> Windows doesn't *have* a MIME label.
uconv is also used for parsing MIME data. Therefore the meaning of encoding names should be much the MIME's one.

> The "LE" and "BE" designations when
> talking about files are for the purposes of the people discussing it;
To avoid confusion, we should not label "UTF-16, with BOM, big endian" as "UTF-16BE", even outside the MIME context.
Here are sample texts from RFC 2781:
|   Text labelled with UTF-16BE, without a BOM:
|   D8 08 DF 45 00 3D 00 52 00 61
|   Text labelled with UTF-16LE, without a BOM:
|   08 D8 45 DF 3D 00 52 00 61 00
|   Big-endian text labelled with UTF-16, with a BOM:
|   FE FF D8 08 DF 45 00 3D 00 52 00 61
|   Little-endian text labelled with UTF-16, with a BOM:
|   FF FE 08 D8 45 DF 3D 00 52 00 61 00
Notice that UTF-16 text with a BOM is never labelled UTF-16BE/UTF-16LE.
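All four RFC 2781 samples above carry the same code units: the surrogate pair D808 DF45, then 003D ("="), 0052 ("R") and 0061 ("a"); only the byte order and the presence of a BOM differ. Applying the standard UTF-16 surrogate-pair formula to the pair, as a worked check:

```latex
\begin{aligned}
c &= \texttt{0x10000} + (\texttt{0xD808} - \texttt{0xD800}) \cdot 2^{10} + (\texttt{0xDF45} - \texttt{0xDC00}) \\
  &= \texttt{0x10000} + \texttt{0x2000} + \texttt{0x345} = \texttt{0x12345}
\end{aligned}
```

So every sample decodes to U+12345 followed by "=Ra".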

> the OS may have a standard way of doing it, but the file should have a BOM. 
Then it should never be called "UTF-16BE" or "UTF-16LE".
All you have to do is prepend a BOM to everything on the disk and call it "UTF-16".

> (And I  see no point in keeping the BOM when inserting the sig into a message.)
It's critical for parsing XML documents. uconv is not only for signature parsing.
Correction:
> should be much
should match

Sorry for my poor English.
Comment on attachment 219300
Patch rv1.1

I found a bug. The BOM is inserted into the body of the message. We should remove the BOM if the encoding is UTF-16 or UTF-8.
Attachment #219300 - Flags: review?(jshin1987) → review-
Attached patch Patch rv1.2 (obsolete)
removing BOM.
Attachment #219300 - Attachment is obsolete: true
Attachment #219574 - Flags: review?(jshin1987)
Attached patch Patch rv1.2 (obsolete)
Sorry for the spam.
Attachment #219574 - Attachment is obsolete: true
Attachment #219577 - Flags: review?(jshin1987)
Attachment #219574 - Flags: review?(jshin1987)
Why doesn't the Unicode conversion remove the BOM?
Attached patch Patch rv1.3
I know why the BOM isn't removed. It happens only when the signature file is UTF-8: in that case, |CopyUTF8toUTF16| is used instead of the Unicode decoder.
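Per the comment above, the UTF-16 path apparently goes through the Unicode decoder, which drops the BOM, while the UTF-8 path uses |CopyUTF8toUTF16|, which leaves it in place, so a UTF-8 BOM (`EF BB BF`) has to be removed separately. A minimal C++ sketch of that step, applied to the raw bytes before conversion; `StripUtf8Bom` is a hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) from raw file
// contents. Otherwise a straight UTF-8 -> UTF-16 copy would turn those bytes
// into a U+FEFF character at the start of the signature text.
std::string StripUtf8Bom(const std::string& raw) {
  static const std::string kBom = "\xEF\xBB\xBF";
  if (raw.compare(0, kBom.size(), kBom) == 0) {
    return raw.substr(kBom.size());
  }
  return raw;
}
```

`StripUtf8Bom("\xEF\xBB\xBFHi")` returns "Hi"; input without a BOM is returned unchanged.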
Attachment #219577 - Attachment is obsolete: true
Attachment #219604 - Flags: review?(jshin1987)
Attachment #219577 - Flags: review?(jshin1987)
Target Milestone: mozilla1.8.1alpha2 → mozilla1.8.1beta1
Comment on attachment 219604
Patch rv1.3

Simon:

Could you review this? Originally, jshin was supposed to review this patch, but he is still busy. We need this patch for Tb2.0, so I need a reviewer for it soon. Could you check this?
Attachment #219604 - Flags: review?(jshin1987) → review?(smontagu)
Comment on attachment 219604
Patch rv1.3

I did review the patch, but I forgot to log in after pressing the submit button, although I thought I had (at home, Bugzilla asks me to log in for every single transaction I make).

>Index: mailnews/base/util/nsMsgI18N.cpp

>+             fSpec.GetFileSize() % 2 == 0 && fSpec.GetFileSize() >= 2 &&
>+             ((readBuf[0] == char(0xFE) && readBuf[1] == char(0xFF)) ||
>+              (readBuf[0] == char(0xFF) && readBuf[1] == char(0xFE)))) {
>+      sigEncoding.Assign("UTF-16");
>+    }

I'm not very happy about the above, but perhaps, it'd work almost all the time...
Attachment #219604 - Flags: review?(smontagu) → review?(jshin1987)
(In reply to comment #33)
> I'm not very happy about the above, but perhaps, it'd work almost all the
> time...

Do you have another idea?
Comment on attachment 219604
Patch rv1.3

r=jshin
Simon should be as good as me... 

(In reply to comment #33)
>> I'm not very happy about the above, but perhaps, it'd work almost all the
>> time...

>Do you have another idea?

We could make a more complicated check, but I guess we can just get away with this simple-minded check, given that UTF-16 is unlikely to be used on platforms other than Windows and that Notepad/WordPad on Windows use a similar method.
Attachment #219604 - Flags: review?(jshin1987) → review+
Comment on attachment 219604
Patch rv1.3

Scott:
Would you check this?
Attachment #219604 - Flags: superreview?(mscott)
Comment on attachment 219604
Patch rv1.3

If jshin is happy with this approach, then so am I.
Attachment #219604 - Flags: superreview?(mscott) → superreview+
Checked in. I'll request approval for the 1.8 branch.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
-> v.
Status: RESOLVED → VERIFIED
Comment on attachment 219604
Patch rv1.3

Let's get this into Tb2. This patch is needed by bug 201071, which is already checked in to the 1.8.1 branch. Of course, the risk is very low.

# What is the difference between approval-thunderbird2 and approval1.8.1? Do I need both approvals for check-in?
Attachment #219604 - Flags: approval1.8.1?
Attachment #219604 - Flags: approval-thunderbird2?
Comment on attachment 219604
Patch rv1.3

a=drivers, land on the branch
Attachment #219604 - Flags: approval1.8.1? → approval1.8.1+
Checked in to the 1.8 branch too.
Keywords: fixed1.8.1
-> v.1.8.1
Target Milestone: mozilla1.8.1beta1 → mozilla1.8.1beta2
Attachment #219604 - Flags: approval-thunderbird2?
Product: Core → MailNews Core
