Closed Bug 563283 Opened 10 years ago Closed 9 years ago

Perform Hankaku to Zenkaku conversion in ISO-2022-JP encoder

Categories

(Core :: Internationalization, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: smontagu, Assigned: smontagu)

References

Details

Attachments

(2 files, 2 obsolete files)

From bug 559489 comment 10:
I propose incorporating Hankaku-to-Zenkaku conversion into ISO-2022-JP encoder
(nsUnicodeToISO2022JP).
Rationales:
* Other browsers (at least Internet Explorer, Chrome, Safari for Win, and Opera
for Win) already support the conversion.
* Registration of CP50220 is in progress. It mandates the conversion on
sending.
  http://mail.apps.ietf.org/ietf/charsets/msg01882.html
  This charset is meant to be used as a replacement encoding for HTML5
"misinterpret for compatibility".
Attached patch Patch (obsolete) — Splinter Review
Attachment #448598 - Flags: review?(VYV03354)
Attached patch Patch v.2 (obsolete) — Splinter Review
Comment-only change from the previous attachment
Attachment #448602 - Flags: review?(VYV03354)
Attachment #448598 - Attachment is obsolete: true
Attachment #448598 - Flags: review?(VYV03354)
Comment on attachment 448602 [details] [diff] [review]
Patch v.2

>   while (src < srcEnd) {
>+    if (IS_HANKAKU(*src)) {
>+      bcr = srcEnd - src;
>+      bcw = destEnd - dest;
>+      res = ConvertHankaku(src, &bcr, dest, &bcw);
>+      dest += bcw;
>+      src += bcr;
>+      if (res != NS_OK) {
>+        break;
>+      }
>+    }
>+
>     for (i=0; i< SIZE_OF_TABLES ; i++) {
Could you move IS_HANKAKU check after the table search? Hankaku-kana becomes less and less used. This will have a negative impact on encoder's performance.
Attached patch Patch v.3Splinter Review
Fixed that, added some more tests, and fixed a bug which the tests exposed.
Attachment #448602 - Attachment is obsolete: true
Attachment #448809 - Flags: review?(VYV03354)
Attachment #448602 - Flags: review?(VYV03354)
Comment on attachment 448809 [details] [diff] [review]
Patch v.3

>+#define JIS_X_201_INDEX 2
JIS_X_0208_INDEX

>--- a/intl/unicharutil/src/nsHankakuToZenkaku.cpp
>+++ /dev/null
Please keep this until bug 559489 is fixed. Otherwise comm-central will break.
I just noticed that comm-central uses a hidden pref (mailnews.send_hankaku_kana) to disable hankaku conversion. Do you think we need to go on honoring this? If so I need to add code for it to the patch.
Attachment #450423 - Flags: review?(bugzilla)
(In reply to comment #6)
> I just noticed that comm-central uses a hidden pref
> (mailnews.send_hankaku_kana) to disable hankaku conversion. Do you think we
> need to go on honoring this? If so I need to add code for it to the patch.
The pref was broken by bug 49262 anyway. Now, nsUnicodeToISO2022JP does will convert hankaku kana to NCR which is useless in plain-text mail. I don't think it's worth keeping this rarely used, untested feature.
Comment on attachment 448809 [details] [diff] [review]
Patch v.3

Assuming JIS_X_201_INDEX is renamed to JIS_X_0208_INDEX.
Attachment #448809 - Flags: review?(VYV03354) → review+
Comment on attachment 450423 [details] [diff] [review]
comm-central patch

So how can I test that comm-central still works with both these patches applied? (sorry, I'm not that up on charset conversions).

Also, as you're removing the code that tries to support  mailnews.send_hankaku_kana then you should also remove it from mailnews.js.
1. Compose a new mail.
2. Change charset to ISO-2022-JP.
3. Type Hankaku kana (or copy & paste from test_bug563283.js) on compose window.
4. Save the mail as a draft.
5. Open the draft again.
Hankaku characters should be converted to corresponding Zenkaku characters if the conversion still works.
In case it's not obvious, if testing by copy and paste from test_bug563283.js, you can check the results by comparing the strings in the comments with the strings marked "equivalent to" below each one.

Also be careful not to copy the line below the comment "Hankaku preceded and followed by unencodable characters". If this is in a ISO-2022-JP message it will be silently re-encoded to UTF-8.

(In reply to comment #10)
> Also, as you're removing the code that tries to support 
> mailnews.send_hankaku_kana then you should also remove it from mailnews.js.

:S I don't see mailnews.send_hankaku_kana anywhere in mailnews.js
Comment on attachment 450423 [details] [diff] [review]
comm-central patch

That seems to work nicely. Thanks for the patch.
Attachment #450423 - Flags: superreview+
Attachment #450423 - Flags: review?(bugzilla)
Attachment #450423 - Flags: review+
http://hg.mozilla.org/comm-central/rev/c76d6e234cc5
http://hg.mozilla.org/mozilla-central/rev/596c193a9b67
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.