Closed
Bug 335531
Opened 19 years ago
Closed 16 years ago
Correct misuse of "UTF-16BE", "UTF-16LE", "UTF-32BE", and "UTF-32LE" charset labels
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9.2a1
People
(Reporter: emk, Assigned: emk)
References
Details
(Keywords: fixed1.9.1, late-l10n)
Attachments
(2 files, 3 obsolete files)
|
40.65 KB,
patch
|
smontagu
:
review+
smontagu
:
superreview+
beltzner
:
approval1.9.1+
|
Details | Diff | Splinter Review |
|
76.13 KB,
patch
|
Pike
:
review+
|
Details | Diff | Splinter Review |
These meanings are clearly defined in RFC 2781[1] and Unicode 4.0[2].
However some Mozilla codes don't obey the rule.
[1] http://www.ietf.org/rfc/rfc2781.txt
[2] http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G28070
| Assignee | ||
Comment 1•19 years ago
|
||
This patch does the following things:
1. Adding "UTF-32" converter (neither "UTF-32BE" nor "UTF-32LE" is equivalent to "UTF-32").
2. Use "UTF-16" or "UTF-32" when endianness is determined by BOM.
3. Renaming charset titles.
| Assignee | ||
Updated•17 years ago
|
Attachment #219878 -
Flags: review?(jshin1987) → review?(smontagu)
| Assignee | ||
Comment 3•17 years ago
|
||
Comment on attachment 219878 [details] [diff] [review]
Patch rv 1.0
Thanks for reminding me, Wayne. Switching the reviewer.
Comment 4•17 years ago
|
||
Comment on attachment 219878 [details] [diff] [review]
Patch rv 1.0
>Index: intl/uconv/ucvlatin/nsUTF32ToUnicode.cpp
>+NS_IMETHODIMP nsUTF32ToUnicode::Convert(const char * aSrc,
>+ PRInt32 * aSrcLength,
>+ PRUnichar * aDest,
>+ PRInt32 * aDestLength)
>+{
>+ if (4 == mState) // Called for the first time.
>+ {
>+ NS_ASSERTION(*aSrcLength >= 4, "Too few bytes in input");
Please change this to
+ if (*aSrcLength < 4)
+ return NS_ERROR_ILLEGAL_INPUT;
as in the current version of nsUCS2BEToUnicode.cpp
>+
>+ // check if BOM (0xFEFF) is at the beginning, remove it if found, and
>+ // set mEndian accordingly.
>+ if (0xFF == PRUint8(aSrc[0]) && 0xFE == PRUint8(aSrc[1]) &&
>+ 0 == PRUint8(aSrc[2]) && 0 == PRUint8(aSrc[3])) {
>+ aSrc += 4;
>+ *aSrcLength -= 4;
>+ mState = 0;
>+ mEndian = kLittleEndian;
mEndian is only used in this method, so it doesn't need to be a member variable.
>Index: netwerk/streamconv/converters/nsUnknownDecoder.cpp
> if (mBufferLen >= 4) {
> const unsigned char* buf = (const unsigned char*)mBuffer;
>- if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16BE
>- (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16LE
>+ if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16, Big Endian
>+ (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16 or UCS-4, Little Endian
> (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) || // UTF-8
>- (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF) || // UCS-4BE
>- (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFF && buf[3] == 0xFE)) { // UCS-4
>+ (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF)) { // UCS-4, Little Endian
should be "Big Endian"
It would be good to have test-cases for UTF-32 equivalent to test_bug340714.js, test_bug396637.js for UTF-16.
Also test_bug317216.js, which is not checked in yet, but is in attachment 288642 [details] [diff] [review] in the bug.
| Assignee | ||
Comment 5•17 years ago
|
||
> mEndian is only used in this method, so it doesn't need to be a member
> variable.
Really? mEndian will be initialized only when nsUTF32ToUnicode::Convert is called for the first time. How to determine the last argument for ConvertCommon (that is, an endianness) when it is called for the second time (or later)?
Comment 6•17 years ago
|
||
Ah. Yes, my mistake.
| Assignee | ||
Comment 7•17 years ago
|
||
ok. fixed other points.
Attachment #219878 -
Attachment is obsolete: true
Attachment #342124 -
Flags: review?(smontagu)
Attachment #219878 -
Flags: review?(smontagu)
Comment 8•17 years ago
|
||
Comment on attachment 342124 [details] [diff] [review]
unbitrotted, resolved review comments
r+moa=smontagu with 2 nits:
>+++ b/intl/uconv/tests/unit/test_bug335531.js
>@@ -0,0 +1,228 @@
>+/* Test case for bug 335531
>+ *
>+ * Uses nsIConverterInputStream to decode UTF-16 text with all combinations
>+ * of UTF-16BE and UTF-16LE with and without BOM.
>+ *
>+ * Sample text is: "�B���u �����p�����|�y�r���u ���u�}���y ���������w�y �t�����s �~�p �t�����s�p, �{�p�w�t�p�� �~�u�����p�����|�y�r�p�� ���u�}���� �~�u�����p�����|�y�r�p ����-���r���u�}��."
Use UTF-8 for the text in comment.
>+++ b/intl/uconv/tests/unit/test_bug457886.js
This file doesn't belong in this patch.
Attachment #342124 -
Flags: review?(smontagu) → review+
| Assignee | ||
Comment 9•17 years ago
|
||
Attachment #342124 -
Attachment is obsolete: true
Attachment #342236 -
Flags: superreview?(dbaron)
Attachment #342236 -
Flags: review+
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]
sr=dbaron. Sorry for the delay.
Beware of potential conflicts with bug 462458 (although you'll hopefully land first).
Attachment #342236 -
Flags: superreview?(dbaron) → superreview+
Keywords: checkin-needed
Updated•17 years ago
|
Attachment #342236 -
Flags: approval1.9.1b2?
Updated•17 years ago
|
Keywords: checkin-needed
Updated•17 years ago
|
Attachment #342236 -
Flags: approval1.9.1b2?
Attachment #342236 -
Flags: approval1.9.1b2-
Attachment #342236 -
Flags: approval1.9.1?
Comment 11•17 years ago
|
||
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]
a191=beltzner
Attachment #342236 -
Flags: approval1.9.1? → approval1.9.1+
| Assignee | ||
Updated•17 years ago
|
Keywords: checkin-needed
Comment 12•17 years ago
|
||
Comment on attachment 342236 [details] [diff] [review]
fixed nits
[Checked in: Comment 16]
http://hg.mozilla.org/mozilla-central/rev/78d662c2c878
Attachment #342236 -
Attachment description: fixed nits → fixed nits
[Checkin: Comment 12]
Updated•17 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [c-n: baking for 1.9.1]
Target Milestone: --- → mozilla1.9.2a1
Comment 13•17 years ago
|
||
How is this supposed to reach localizations?
Comment 14•17 years ago
|
||
Backed out as
http://hg.mozilla.org/mozilla-central/rev/024fa1c26e34
due to suspicion of causing a mochitest failure in test_bug399284.html
Failure is:
*** 38979 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug399284.html | Error thrown during test: uncaught exception: [Exception... "Operation is not supported" code: "9" nsresult: "0x80530009 (NS_ERROR_DOM_NOT_SUPPORTED_ERR)" location: "http://localhost:8888/tests/layout/base/tests/test_bug399284.html Line: 111"] - got 0, expected 1
Failing test is:
http://mxr.mozilla.org/mozilla-central/source/layout/base/tests/test_bug399284.html
Sample failure logs:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501399.1228505155.7238.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•16 years ago
|
Keywords: checkin-needed
Whiteboard: [c-n: baking for 1.9.1]
Updated•16 years ago
|
Attachment #342236 -
Attachment description: fixed nits
[Checkin: Comment 12] → fixed nits
[Backout: Comment 14]
Comment 15•16 years ago
|
||
It's possible that test_bug339284.html depends on the behaviour that this patch is fixing. I'll look into it.
Comment 16•16 years ago
|
||
Repushed with a fix to test_bug339284.html: http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a
(the problem was that test_bug339284.html iterates over all decoders, but has some special-casing for UTF-16-* and UTF-32-*, so the special-casing had to be extended to cover the new UTF-32 decoder).
Status: REOPENED → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Updated•16 years ago
|
Attachment #342236 -
Attachment description: fixed nits
[Backout: Comment 14] → fixed nits
[Checked in: Comment 16]
Comment 17•16 years ago
|
||
(In reply to comment #16)
> Repushed with a fix to test_bug339284.html:
> http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a
You probably want to attach this new patch and re-set the 1.9.1 whiteboard and keyword stuff...
Updated•16 years ago
|
Whiteboard: [c-n: baking for 1.9.1]
Comment 18•16 years ago
|
||
Transferrinf r+sr; I can't transfer approval1.9.1+
Attachment #342236 -
Attachment is obsolete: true
Attachment #351709 -
Flags: superreview+
Attachment #351709 -
Flags: review+
| Assignee | ||
Comment 19•16 years ago
|
||
(In reply to comment #13)
> How is this supposed to reach localizations?
We will need a localization patch like attachment 328242 [details] [diff] [review].
Comment 20•16 years ago
|
||
Attachment #351709 -
Flags: approval1.9.1?
Comment 21•16 years ago
|
||
Comment on attachment 351709 [details] [diff] [review]
Patch checked in
a191=beltzner
Attachment #351709 -
Flags: approval1.9.1? → approval1.9.1+
Comment 22•16 years ago
|
||
Keywords: fixed1.9.1
Whiteboard: [c-n: baking for 1.9.1]
Comment 23•16 years ago
|
||
What is going on with the l10n side of this bug?
| Assignee | ||
Comment 25•16 years ago
|
||
(Note that charsetTitles.properties also needs to be changed, but I think it's a localizers' job.)
Comment 26•16 years ago
|
||
Comment on attachment 352577 [details] [diff] [review]
l10n patch
Yeah. We should've changed the key to get this change notified to localizers.
Is there an equivalent sed script? Might be easier to announce that than to live with the fallout of cross landing those patches on hg.
On a different note, why do we have this localizable in the first place?
Attachment #352577 -
Flags: review?(l10n) → review+
| Assignee | ||
Comment 27•16 years ago
|
||
(In reply to comment #26)
> Is there an equivalent sed script?
I made this patch using the following sed command. Replace '*' with your language code:
sed 's/UTF-16BE, UTF-32LE/UTF-16BE, UTF-32, UTF-32LE/' */toolkit/chrome/global/intl.properties
> On a different note, why do we have this localizable in the first place?
Good point. I'll file a follw-up bug to sort out the problem.
Comment 28•16 years ago
|
||
Posted in .l10n now.
| Assignee | ||
Comment 29•16 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•