Closed
Bug 562091
Opened 14 years ago
Closed 12 years ago
Make Unicode => EUC-KR converter identical to Unicode => UHC / Windows-949
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla19
People
(Reporter: jshin1987, Assigned: emk)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 1 obsolete file)
27.29 KB, patch (emk: review+)
In the past, we've been stricter than other browsers when it comes to EUC-KR.

1. ToUnicode direction: we're as lenient as other browsers in that we accept code points outside the 94x94 grid and treat them as Windows-949 (UHC). In Windows-949, those 2-byte sequences are used for the Hangul syllables outside KS X 1001 (8,822 of them in total). In addition, we can also convert Hangul syllables represented as 8-byte sequences as specified in KS X 1001.

2. FromUnicode direction: when the output encoding is EUC-KR (as opposed to Windows-949), we convert those 8,822 Hangul syllables to 8-byte sequences instead of the 2-byte sequences used in Windows-949.

We can leave the ToUnicode direction as it is now because there are some web pages containing 8-byte sequences (mainly generated by Firefox users who post to forums in EUC-KR; for instance, mozilla.or.kr has some postings with them).

However, for the FromUnicode direction, I think we have to give up being so strict about EUC-KR, especially considering that HTML5 stipulates that EUC-KR be treated synonymously with Windows-949. I see no problem at all with this change in Firefox (and other Gecko-based browsers). It might be problematic in some cases for Thunderbird, but I bet it should be OK in the vast majority of cases. Testing with Outlook/Outlook Express and popular web mail services in Korea may be necessary.
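To make the 2-byte versus 8-byte distinction in the description concrete, here is a minimal sketch using CPython's built-in `cp949` and `euc_kr` codecs. It is an illustration only and says nothing about Gecko's converter code. It assumes that CPython's `euc_kr` encoder emits the KS X 1001 8-byte sequence (Hangul filler plus three jamo) for syllables outside the 2,350 precomposed ones, and that U+B620 is such a syllable.

```python
# Illustration only: CPython's built-in codecs, not Gecko's converters.
# U+B620 is a Hangul syllable assumed to be outside KS X 1001's
# 2,350 precomposed syllables.
syllable = "\uB620"

# Windows-949 / UHC form: one 2-byte code outside the 94x94 grid.
uhc = syllable.encode("cp949")

# Strict EUC-KR form (assumed CPython behaviour): the 8-byte sequence of
# Hangul filler + initial + medial + final jamo, 2 bytes each.
strict = syllable.encode("euc_kr")

print(len(uhc), uhc.hex())
print(len(strict), strict.hex())

# Each form decodes back to the same syllable with its matching decoder.
roundtrip_uhc = uhc.decode("cp949")
roundtrip_strict = strict.decode("euc_kr")
```

The change proposed here amounts to making the encoder for the EUC-KR label produce the first form instead of the second.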
Reporter
Comment 1•14 years ago
HTML5 has this about the issue:
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html

---------------
When a user agent would otherwise use an encoding given in the first column of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row. When a byte or sequence of bytes is treated differently due to this encoding aliasing, it is said to have been misinterpreted for compatibility.

Character encoding overrides

Input encoding    Replacement encoding    References
EUC-KR            windows-949             [EUCKR] [WIN949]
GB2312            GBK                     [RFC1345] [GBK]
GB_2312-80        GBK                     [RFC1345] [GBK]
ISO-8859-1        windows-1252            [RFC1345] [WIN1252]
ISO-8859-9        windows-1254            [RFC1345] [WIN1254]
ISO-8859-11       windows-874             [ISO885911] [WIN874]
KS_C_5601-1987    windows-949             [RFC1345] [WIN949]
Shift_JIS         Windows-31J             [SHIFTJIS] [WIN31J]
TIS-620           windows-874             [TIS620] [WIN874]
US-ASCII          windows-1252            [RFC1345] [WIN1252]
--------------------

I filed bug 562096 for the other charsets (I filed EUC-KR vs. Windows-949 separately because it has an additional complication).
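The override rule quoted in comment 1 can be read as a plain lookup from the declared label to the encoding actually used. The sketch below transcribes the quoted table; the names `ENCODING_OVERRIDES` and `effective_encoding` are hypothetical, and the case-insensitive matching is an assumption made for illustration, not something the quoted text states.

```python
# Override table transcribed from the HTML5 draft text quoted in comment 1.
# Keys are lowercased so that lookup can be case-insensitive (an assumption).
ENCODING_OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "Windows-31J",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}


def effective_encoding(label: str) -> str:
    """Return the replacement encoding for a label, or the label unchanged.

    Hypothetical helper for illustration; not part of any browser API.
    """
    return ENCODING_OVERRIDES.get(label.strip().lower(), label)


print(effective_encoding("EUC-KR"))
print(effective_encoding("UTF-8"))
```

Under this reading, a page or form labelled EUC-KR is both decoded and encoded as windows-949, which is the behaviour this bug asks for in the FromUnicode direction.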
Assignee
Comment 2•12 years ago
Assignee
Comment 3•12 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=766886
Comment 4•12 years ago
Comment on attachment 680396
Remove the EUC-KR converter and rename x-windows-949 to EUC-KR

Review of attachment 680396:
-----------------------------------------------------------------

::: dom/plugins/base/nsPluginInstanceOwner.cpp
@@ +991,5 @@
>    {"x-mac-icelandic", "MacIceland"},
>    {"macintosh", "MacRoman"},
>    {"x-mac-romanian", "MacRomania"},
>    {"x-mac-ukrainian", "MacUkraine"},
> +  {"Shift_JIS", "MS932"},

This looks like part of another patch.
Attachment #680396 -
Flags: review?(smontagu) → review+
Assignee
Comment 5•12 years ago
Patch for checkin.
Attachment #680396 -
Attachment is obsolete: true
Attachment #681005 -
Flags: review+
Assignee
Updated•12 years ago
Keywords: checkin-needed
Comment 6•12 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd7a0ace6b0e
Keywords: checkin-needed
Comment 7•12 years ago
https://hg.mozilla.org/mozilla-central/rev/fd7a0ace6b0e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Comment 8•12 years ago
http://mxr.mozilla.org/l10n-mozilla-aurora/search?string=x-windows-949 shows some usage of x-windows-949 outside of charsetTitles.properties. Should we get bugs filed on replacing those?
Assignee
Comment 9•12 years ago
Sure. x-windows-949 should be removed from intl.charsetmenu.browser.more3, and it should be replaced with EUC-KR in the ko searchplugins.
Assignee
Comment 10•12 years ago
Actually, intl.charsetmenu.browser.more* should just be removed entirely because those properties are no longer localizable.
Assignee
Comment 11•12 years ago
Filed bug 812027 for the ko searchplugins. Removing intl.charsetmenu.browser.more* would have a low priority because the leftover garbage would be harmless.
Comment 12•12 years ago
Thanks, agreed.