Closed Bug 335531 Opened 19 years ago Closed 16 years ago

Correct misuse of "UTF-16BE", "UTF-16LE", "UTF-32BE", and "UTF-32LE" charset labels

Categories

(Core :: Internationalization, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: emk, Assigned: emk)

References

Details

(Keywords: fixed1.9.1, late-l10n)

Attachments

(2 files, 3 obsolete files)

These meanings are clearly defined in RFC 2781[1] and Unicode 4.0[2]. However some Mozilla codes don't obey the rule. [1] http://www.ietf.org/rfc/rfc2781.txt [2] http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G28070
Attached patch Patch rv 1.0 (obsolete) — Splinter Review
This patch does the following things: 1. Adding "UTF-32" converter (neither "UTF-32BE" nor "UTF-32LE" is equivalent to "UTF-32"). 2. Use "UTF-16" or "UTF-32" when endianness is determined by BOM. 3. Renaming charset titles.
Assignee: smontagu → VYV03354
Status: NEW → ASSIGNED
Attachment #219878 - Flags: review?(jshin1987)
Masatoshi should this review also be switched?
QA Contact: amyy → i18n
Attachment #219878 - Flags: review?(jshin1987) → review?(smontagu)
Comment on attachment 219878 [details] [diff] [review] Patch rv 1.0 Thanks for reminding me, Wayne. Switching the reviewer.
Comment on attachment 219878 [details] [diff] [review] Patch rv 1.0 >Index: intl/uconv/ucvlatin/nsUTF32ToUnicode.cpp >+NS_IMETHODIMP nsUTF32ToUnicode::Convert(const char * aSrc, >+ PRInt32 * aSrcLength, >+ PRUnichar * aDest, >+ PRInt32 * aDestLength) >+{ >+ if (4 == mState) // Called for the first time. >+ { >+ NS_ASSERTION(*aSrcLength >= 4, "Too few bytes in input"); Please change this to + if (*aSrcLength < 4) + return NS_ERROR_ILLEGAL_INPUT; as in the current version of nsUCS2BEToUnicode.cpp >+ >+ // check if BOM (0xFEFF) is at the beginning, remove it if found, and >+ // set mEndian accordingly. >+ if (0xFF == PRUint8(aSrc[0]) && 0xFE == PRUint8(aSrc[1]) && >+ 0 == PRUint8(aSrc[2]) && 0 == PRUint8(aSrc[3])) { >+ aSrc += 4; >+ *aSrcLength -= 4; >+ mState = 0; >+ mEndian = kLittleEndian; mEndian is only used in this method, so it doesn't need to be a member variable. >Index: netwerk/streamconv/converters/nsUnknownDecoder.cpp > if (mBufferLen >= 4) { > const unsigned char* buf = (const unsigned char*)mBuffer; >- if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16BE >- (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16LE >+ if ((buf[0] == 0xFE && buf[1] == 0xFF) || // UTF-16, Big Endian >+ (buf[0] == 0xFF && buf[1] == 0xFE) || // UTF-16 or UCS-4, Little Endian > (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) || // UTF-8 >- (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF) || // UCS-4BE >- (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFF && buf[3] == 0xFE)) { // UCS-4 >+ (buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF)) { // UCS-4, Little Endian should be "Big Endian" It would be good to have test-cases for UTF-32 equivalent to test_bug340714.js, test_bug396637.js for UTF-16. Also test_bug317216.js, which is not checked in yet, but is in attachment 288642 [details] [diff] [review] in the bug.
> mEndian is only used in this method, so it doesn't need to be a member > variable. Really? mEndian will be initialized only when nsUTF32ToUnicode::Convert is called for the first time. How to determine the last argument for ConvertCommon (that is, an endianness) when it is called for the second time (or later)?
Ah. Yes, my mistake.
ok. fixed other points.
Attachment #219878 - Attachment is obsolete: true
Attachment #342124 - Flags: review?(smontagu)
Attachment #219878 - Flags: review?(smontagu)
Comment on attachment 342124 [details] [diff] [review] unbitrotted, resolved review comments r+moa=smontagu with 2 nits: >+++ b/intl/uconv/tests/unit/test_bug335531.js >@@ -0,0 +1,228 @@ >+/* Test case for bug 335531 >+ * >+ * Uses nsIConverterInputStream to decode UTF-16 text with all combinations >+ * of UTF-16BE and UTF-16LE with and without BOM. >+ * >+ * Sample text is: "�B���u �����p�����|�y�r���u ���u�}���y ���������w�y �t�����s �~�p �t�����s�p, �{�p�w�t�p�� �~�u�����p�����|�y�r�p�� ���u�}���� �~�u�����p�����|�y�r�p ����-���r���u�}��." Use UTF-8 for the text in comment. >+++ b/intl/uconv/tests/unit/test_bug457886.js This file doesn't belong in this patch.
Attachment #342124 - Flags: review?(smontagu) → review+
Attachment #342124 - Attachment is obsolete: true
Attachment #342236 - Flags: superreview?(dbaron)
Attachment #342236 - Flags: review+
Comment on attachment 342236 [details] [diff] [review] fixed nits [Checked in: Comment 16] sr=dbaron. Sorry for the delay. Beware of potential conflicts with bug 462458 (although you'll hopefully land first).
Attachment #342236 - Flags: superreview?(dbaron) → superreview+
Attachment #342236 - Flags: approval1.9.1b2?
Keywords: checkin-needed
Attachment #342236 - Flags: approval1.9.1b2?
Attachment #342236 - Flags: approval1.9.1b2-
Attachment #342236 - Flags: approval1.9.1?
Comment on attachment 342236 [details] [diff] [review] fixed nits [Checked in: Comment 16] a191=beltzner
Attachment #342236 - Flags: approval1.9.1? → approval1.9.1+
Keywords: checkin-needed
Attachment #342236 - Attachment description: fixed nits → fixed nits [Checkin: Comment 12]
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [c-n: baking for 1.9.1]
Target Milestone: --- → mozilla1.9.2a1
How is this supposed to reach localizations?
Backed out as http://hg.mozilla.org/mozilla-central/rev/024fa1c26e34 due to suspicion of causing a mochitest failure in test_bug399284.html Failure is: *** 38979 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug399284.html | Error thrown during test: uncaught exception: [Exception... "Operation is not supported" code: "9" nsresult: "0x80530009 (NS_ERROR_DOM_NOT_SUPPORTED_ERR)" location: "http://localhost:8888/tests/layout/base/tests/test_bug399284.html Line: 111"] - got 0, expected 1 Failing test is: http://mxr.mozilla.org/mozilla-central/source/layout/base/tests/test_bug399284.html Sample failure logs: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501399.1228505155.7238.gz http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228501326.1228505807.8492.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Keywords: checkin-needed
Whiteboard: [c-n: baking for 1.9.1]
Attachment #342236 - Attachment description: fixed nits [Checkin: Comment 12] → fixed nits [Backout: Comment 14]
It's possible that test_bug339284.html depends on the behaviour that this patch is fixing. I'll look into it.
Repushed with a fix to test_bug339284.html: http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a (the problem was that test_bug339284.html iterates over all decoders, but has some special-casing for UTF-16-* and UTF-32-*, so the special-casing had to be extended to cover the new UTF-32 decoder).
Status: REOPENED → RESOLVED
Closed: 17 years ago16 years ago
Resolution: --- → FIXED
Attachment #342236 - Attachment description: fixed nits [Backout: Comment 14] → fixed nits [Checked in: Comment 16]
(In reply to comment #16) > Repushed with a fix to test_bug339284.html: > http://hg.mozilla.org/mozilla-central/rev/557ccb3ff23a You probably want to attach this new patch and re-set the 1.9.1 whiteboard and keyword stuff...
Whiteboard: [c-n: baking for 1.9.1]
Attached patch Patch checked inSplinter Review
Transferrinf r+sr; I can't transfer approval1.9.1+
Attachment #342236 - Attachment is obsolete: true
Attachment #351709 - Flags: superreview+
Attachment #351709 - Flags: review+
(In reply to comment #13) > How is this supposed to reach localizations? We will need a localization patch like attachment 328242 [details] [diff] [review].
Comment on attachment 351709 [details] [diff] [review] Patch checked in a191=beltzner
Attachment #351709 - Flags: approval1.9.1? → approval1.9.1+
What is going on with the l10n side of this bug?
Attached patch l10n patchSplinter Review
like this?
Attachment #352577 - Flags: review?(l10n)
(Note that charsetTitles.properties also needs to be changed, but I think it's a localizers' job.)
Comment on attachment 352577 [details] [diff] [review] l10n patch Yeah. We should've changed the key to get this change notified to localizers. Is there an equivalent sed script? Might be easier to announce that than to live with the fallout of cross landing those patches on hg. On a different note, why do we have this localizable in the first place?
Attachment #352577 - Flags: review?(l10n) → review+
(In reply to comment #26) > Is there an equivalent sed script? I made this patch using the following sed command. Replace '*' with your language code: sed 's/UTF-16BE, UTF-32LE/UTF-16BE, UTF-32, UTF-32LE/' */toolkit/chrome/global/intl.properties > On a different note, why do we have this localizable in the first place? Good point. I'll file a follw-up bug to sort out the problem.
Posted in .l10n now.
(In reply to comment #27) > I'll file a follw-up bug to sort out the problem. Filed bug 470209.
Depends on: 809934
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: