Correct misuse of "UTF-16BE", "UTF-16LE", "UTF-32BE", and "UTF-32LE" charset labels

RESOLVED FIXED in mozilla1.9.2a1

Status

()

Core
Internationalization
RESOLVED FIXED
12 years ago
6 years ago

People

(Reporter: emk, Assigned: emk)

Tracking

({fixed1.9.1, late-l10n})

Trunk
mozilla1.9.2a1
x86
Windows XP
fixed1.9.1, late-l10n
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Description

12 years ago
These meanings are clearly defined in RFC 2781[1] and Unicode 4.0[2].
However some Mozilla codes don't obey the rule.

[1] http://www.ietf.org/rfc/rfc2781.txt
[2] http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G28070
(Assignee)

Comment 1

12 years ago
Created attachment 219878 [details] [diff] [review]
Patch rv 1.0

This patch does the following things:
1. Adding "UTF-32" converter (neither "UTF-32BE" nor "UTF-32LE" is equivalent to "UTF-32").
2. Use "UTF-16" or "UTF-32" when endianness is determined by BOM.
3. Renaming charset titles.
Assignee: smontagu → VYV03354
Status: NEW → ASSIGNED
Attachment #219878 - Flags: review?(jshin1987)

Comment 2

10 years ago
Masatoshi should this review also be switched?
QA Contact: amyy → i18n
(Assignee)

Updated

10 years ago
Attachment #219878 - Flags: review?(jshin1987) → review?(smontagu)
(Assignee)

Comment 3

10 years ago
Comment on attachment 219878 [details] [diff] [review]
Patch rv 1.0

Thanks for reminding me, Wayne. Switching the reviewer.
Comment on attachment 219878 [details] [diff] [review]
Patch rv 1.0

>Index: intl/uconv/ucvlatin/nsUTF32ToUnicode.cpp

>+NS_IMETHODIMP nsUTF32ToUnicode::Convert(const char * aSrc, 
>+                                        PRInt32 * aSrcLength, 
>+                                        PRUnichar * aDest, 
>+                                        PRInt32 * aDestLength)
>+{
>+  if (4 == mState) // Called for the first time.
>+  {
>+    NS_ASSERTION(*aSrcLength >= 4, "Too few bytes in input");

Please change this to 
+    if (*aSrcLength < 4)
+      return NS_ERROR_ILLEGAL_INPUT;
as in the current version of nsUCS2BEToUnicode.cpp


>+
>+    // check if BOM (0xFEFF) is at the beginning, remove it if found, and
>+    // set mEndian accordingly.
>+    if (0xFF == PRUint8(aSrc[0]) && 0xFE == PRUint8(aSrc[1]) &&
>+        0 == PRUint8(aSrc[2]) && 0 == PRUint8(aSrc[3])) {
>+      aSrc += 4;
>+      *aSrcLength -= 4;
>+      mState = 0;
>+      mEndian = kLittleEndian;

mEndian is only used in this method, so it doesn't need to be a member variable.

>Index: netwerk/streamconv/converters/nsUnknownDecoder.cpp

>   if (mBufferLen >= 4) {
>     const unsigned char* buf = (const unsigned char*)mBuffer;
>-    if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16BE
>-        (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16LE
>+    if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16, Big Endian
>+        (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16 or UCS-4, Little Endian
>         (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) || // UTF-8
>-        (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF) || // UCS-4BE
>-        (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFF && buf[3] == 0xFE)) { // UCS-4
>+        (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF)) { // UCS-4, Little Endian

should be "Big Endian"

It would be good to have test-cases for UTF-32 equivalent to test_bug340714.js, test_bug396637.js for UTF-16.

Also test_bug317216.js, which is not checked in yet, but is in attachment 288642 [details] [diff] [review] in the bug.
(Assignee)

Comment 5

10 years ago
> mEndian is only used in this method, so it doesn't need to be a member
> variable.
Really? mEndian will be initialized only when nsUTF32ToUnicode::Convert is called for the first time. How to determine the last argument for ConvertCommon (that is, an endianness) when it is called for the second time (or later)?
Ah. Yes, my mistake.
(Assignee)

Comment 7

10 years ago
Created attachment 342124 [details] [diff] [review]
unbitrotted, resolved review comments

ok. fixed other points.
Attachment #219878 - Attachment is obsolete: true
Attachment #342124 - Flags: review?(smontagu)
Attachment #219878 - Flags: review?(smontagu)
Comment on attachment 342124 [details] [diff] [review]
unbitrotted, resolved review comments

r+moa=smontagu with 2 nits:

>+++ b/intl/uconv/tests/unit/test_bug335531.js
>@@ -0,0 +1,228 @@
>+/* Test case for bug 335531
>+ *
>+ * Uses nsIConverterInputStream to decode UTF-16 text with all combinations
>+ * of UTF-16BE and UTF-16LE with and without BOM.
>+ *
>+ * Sample text is: "�B���u �����p�����|�y�r���u ���u�}���y ���������w�y �t�����s �~�p �t�����s�p, �{�p�w�t�p�� �~�u�����p�����|�y�r�p�� ���u�}���� �~�u�����p�����|�y�r�p ����-���r���u�}��."

Use UTF-8 for the text in comment.

>+++ b/intl/uconv/tests/unit/test_bug457886.js

This file doesn't belong in this patch.
Attachment #342124 - Flags: review?(smontagu) → review+
(Assignee)

Comment 9

10 years ago
Created attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]
Attachment #342124 - Attachment is obsolete: true
Attachment #342236 - Flags: superreview?(dbaron)
Attachment #342236 - Flags: review+
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]

sr=dbaron.  Sorry for the delay.

Beware of potential conflicts with bug 462458 (although you'll hopefully land first).
Attachment #342236 - Flags: superreview?(dbaron) → superreview+
Keywords: checkin-needed

Updated

10 years ago
Attachment #342236 - Flags: approval1.9.1b2?

Updated

10 years ago
Keywords: checkin-needed
Attachment #342236 - Flags: approval1.9.1b2?
Attachment #342236 - Flags: approval1.9.1b2-
Attachment #342236 - Flags: approval1.9.1?
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]

a191=beltzner
Attachment #342236 - Flags: approval1.9.1? → approval1.9.1+
(Assignee)

Updated

10 years ago
Keywords: checkin-needed
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]

http://hg.mozilla.org/mozilla-central/rev/78d662c2c878
Attachment #342236 - Attachment description: fixed nits → fixed nits [Checkin: Comment 12]
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [c-n: baking for 1.9.1]
Target Milestone: --- → mozilla1.9.2a1

Comment 13

10 years ago
How is this supposed to reach localizations?
Backed out as
http://hg.mozilla.org/mozilla-central/rev/024fa1c26e34
due to suspicion of causing a mochitest failure in test_bug399284.html

Failure is:
*** 38979 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug399284.html | Error thrown during test: uncaught exception: [Exception... "Operation is not supported"  code: "9" nsresult: "0x80530009 (NS_ERROR_DOM_NOT_SUPPORTED_ERR)"  location: "http://localhost:8888/tests/layout/base/tests/test_bug399284.html Line: 111"] - got 0, expected 1

Failing test is:
http://mxr.mozilla.org/mozilla-central/source/layout/base/tests/test_bug399284.html

Sample failure logs:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501399.1228505155.7238.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Keywords: checkin-needed
Whiteboard: [c-n: baking for 1.9.1]
Attachment #342236 - Attachment description: fixed nits [Checkin: Comment 12] → fixed nits [Backout: Comment 14]
It's possible that test_bug339284.html depends on the behaviour that this patch is fixing. I'll look into it.
Repushed with a fix to test_bug339284.html: http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a

(the problem was that test_bug339284.html iterates over all decoders, but has some special-casing for UTF-16-* and UTF-32-*, so the special-casing had to be extended to cover the new UTF-32 decoder).
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Attachment #342236 - Attachment description: fixed nits [Backout: Comment 14] → fixed nits [Checked in: Comment 16]
(In reply to comment #16)
> Repushed with a fix to test_bug339284.html:
> http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a

You probably want to attach this new patch and re-set the 1.9.1 whiteboard and keyword stuff...

Updated

10 years ago
Whiteboard: [c-n: baking for 1.9.1]
Created attachment 351709 [details] [diff] [review]
Patch checked in

Transferrinf r+sr; I can't transfer approval1.9.1+
Attachment #342236 - Attachment is obsolete: true
Attachment #351709 - Flags: superreview+
Attachment #351709 - Flags: review+
(Assignee)

Comment 19

10 years ago
(In reply to comment #13)
> How is this supposed to reach localizations?
We will need a localization patch like attachment 328242 [details] [diff] [review].
Keywords: late-l10n
Comment on attachment 351709 [details] [diff] [review]
Patch checked in

per comment18.
Attachment #351709 - Flags: approval1.9.1?
Comment on attachment 351709 [details] [diff] [review]
Patch checked in

a191=beltzner
Attachment #351709 - Flags: approval1.9.1? → approval1.9.1+
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/946ed6188172
Keywords: fixed1.9.1
Whiteboard: [c-n: baking for 1.9.1]

Comment 23

10 years ago
What is going on with the l10n side of this bug?
(Assignee)

Comment 24

10 years ago
Created attachment 352577 [details] [diff] [review]
l10n patch

like this?
Attachment #352577 - Flags: review?(l10n)
(Assignee)

Comment 25

10 years ago
(Note that charsetTitles.properties also needs to be changed, but I think it's a localizers' job.)

Comment 26

10 years ago
Comment on attachment 352577 [details] [diff] [review]
l10n patch

Yeah. We should've changed the key to get this change notified to localizers.

Is there an equivalent sed script? Might be easier to announce that than to live with the fallout of cross landing those patches on hg.

On a different note, why do we have this localizable in the first place?
Attachment #352577 - Flags: review?(l10n) → review+
(Assignee)

Comment 27

10 years ago
(In reply to comment #26)
> Is there an equivalent sed script?
I made this patch using the following sed command. Replace '*' with your language code:
sed 's/UTF-16BE, UTF-32LE/UTF-16BE, UTF-32, UTF-32LE/' */toolkit/chrome/global/intl.properties
> On a different note, why do we have this localizable in the first place?
Good point. I'll file a follw-up bug to sort out the problem.

Comment 28

10 years ago
Posted in .l10n now.
(Assignee)

Comment 29

10 years ago
(In reply to comment #27)
> I'll file a follw-up bug to sort out the problem.
Filed bug 470209.
(Assignee)

Updated

6 years ago
Depends on: 809934
You need to log in before you can comment on or make changes to this bug.