Closed Bug 19081 Opened 25 years ago Closed 15 years ago

ISO-2022-JP conversion bugs for single byte katakana

Categories

(MailNews Core :: Internationalization, defect, P4)

Tracking

(Not tracked)

RESOLVED WORKSFORME
Future

People

(Reporter: ji, Assigned: m_kato)

References

Details

(Keywords: intl)

Attachments

(4 files)

Build: 1999111709-M12
OS: RH6.0

Send out a mail with half-width katakana in the subject and message body. On receipt, the half-width katakana disappears in the thread pane, and in the message pane the string following the half-width katakana disappears as well. The same message displays correctly in 4.7.

Steps to reproduce:
1. Open the mail compose window and select the menu "View | Charset | Japanese (iso-2022-jp)".
2. Enter "カタカナｶﾀｶﾅ漢字" in the subject and message body, and send the mail to the testing account itself.
3. When the mail is received, the subject in the thread pane displays as カタカナ漢字, and in the message pane both the subject and the message body display only カタカナ.

Below is the "view source" with 4.7:

o: iqax1@netscape.com
Subject: =?ISO-2022-JP?B?GyRCJSslPyUrJUobKEq2wLbFGyRCNEE7ehsoQg==?=
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

<html><head></head> <body>カタカナｶﾀｶﾅ漢字</body> </html>
Status: NEW → ASSIGNED
Depends on: 5894
Set 5894 as the bug this depends on. Eventually the half-width katakana should be converted to full-width, but for now they should be turned into question marks (unmapped chars) instead of being removed.
I think we have ISO-2022-JP encoder/decoder problems for Hankaku Katakana. Here's what I got on sending the string (view this under Japanese (Auto-Detect)):

ハンカクﾊﾝｶｸ半角

in the Subject header and body.

1. Sent from 11/15/99 Linux M11 build:
-------------------------------------
Subject: =?ISO-2022-JP?B?GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==?=
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by netscape.com id UAA15287

=1B$B%O%s%+%/=1B(J=CA=DD=B6=B8=1B$BH>3Q=1B(B
-------------------------------------

2. Sent from 4.71 -- 11/15/99 Win32 build:
-------------------------------------
Subject: =?iso-2022-jp?B?GyRCJU8lcyUrJS8bKElKXTY4GyRCSD4zURsoQg==?=
Content-Type: text/html; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html> <ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html>
-------------------------------------

(Note: You can use a B64 decoder found here: http://kaze:8000/tools/base64decode.html)

The important fact is that 5.0 is not removing Hankaku data, it is just encoding it wrong. Note that 4.71 is generating "<ESC>(I" for 7-bit Hankaku but 5.0 is incorrectly generating "<ESC>(B" and then at the same time generating 8-bit JIS (correctly). (Note that Messaging server QP'ed it because 8-bit data were in the mail.) This is why Mozilla cannot display some part of the string sent from Mozilla.

So we have a few problems.

1. 5.0 cannot display its own Hankaku generation but can display the one sent from 4.71. On the other hand, 4.71 can display both types correctly.
--> This indicates we don't have a good JIS decoder which can tolerate these variations in Hankaku encodings. In particular we need to be able to decode "<ESC>(I", JIS7 Hankaku, and JIS8 Hankaku. We should also tolerate a mistake such as "<ESC>(B" followed by JIS7 Hankaku code points, just as 4.x does.

2. We need to be able to encode properly using "<ESC>(I" in case we offer a Hankaku-send option via prefs.js. This is the only encoding we should use in case we allow Hankaku in send.

I think we should fix these problems pretty soon, particularly the decoding problem, by improving the decoder and making it more tolerant.
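A rough sketch of the kind of tolerant decoding asked for in problem 1, written in Python for illustration and not taken from Mozilla's converter code. It accepts "<ESC>(I" followed by 7-bit Hankaku, and also 8-bit Hankaku bytes (0xA1..0xDF) regardless of the active escape sequence. The names `decode_tolerant` and `hankaku` are hypothetical, JIS X 0201 Roman is treated as plain ASCII, and JIS X 0208 pairs are decoded by borrowing Python's `euc_jp` codec, which is an assumption of this sketch.

```python
ESC = 0x1B

# Two bytes following ESC, and the character set each one selects.
DESIGNATIONS = {
    b"(B": "ascii",
    b"(J": "roman",     # JIS X 0201 Roman, handled like ASCII in this sketch
    b"(I": "kana7",     # JIS X 0201 Katakana, 7-bit
    b"$@": "jisx0208",
    b"$B": "jisx0208",
}


def hankaku(byte7: int) -> str:
    """Map a 7-bit JIS X 0201 Katakana code (0x21..0x5F) to U+FF61..U+FF9F."""
    return chr(0xFF61 + (byte7 - 0x21))


def decode_tolerant(data: bytes) -> str:
    out = []
    mode = "ascii"
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and data[i + 1:i + 3] in DESIGNATIONS:
            mode = DESIGNATIONS[data[i + 1:i + 3]]
            i += 3
        elif 0xA1 <= b <= 0xDF:
            # 8-bit Hankaku: tolerated whatever escape sequence is active.
            out.append(hankaku(b - 0x80))
            i += 1
        elif mode == "kana7" and 0x21 <= b <= 0x5F:
            out.append(hankaku(b))
            i += 1
        elif (mode == "jisx0208" and 0x21 <= b <= 0x7E
              and i + 1 < len(data) and 0x21 <= data[i + 1] <= 0x7E):
            # Set the high bit on both bytes to get the EUC-JP form.
            pair = bytes([b | 0x80, data[i + 1] | 0x80])
            out.append(pair.decode("euc_jp", errors="replace"))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            out.append("\ufffd")  # unmapped byte
            i += 1
    return "".join(out)


# The two byte strings quoted in this comment.
FROM_M11 = b"\x1b$B%O%s%+%/\x1b(J\xca\xdd\xb6\xb8\x1b$BH>3Q\x1b(B"
FROM_471 = b"\x1b$B%O%s%+%/\x1b(IJ]68\x1b$BH>3Q\x1b(B"
print(decode_tolerant(FROM_M11))
print(decode_tolerant(FROM_471))
```

With this kind of tolerance both variants come out as the same string, which is the behavior 4.71 is described as having.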
I need to make clear that the data string sent from 4.71:

<ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html>

should be represented more accurately as:

<ESC>$B%O%s%+%/<ESC>(IJ]68<ESC>$BH>3Q<ESC>(B</html>

where I use <ESC> in place of the escape character, which may not be displayed distinctly enough under Japanese encoding. By the way, the B-encoded Subject header by Mozilla is identical to the text data when it is decoded.
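For anyone without access to the B64 decoder mentioned earlier, the two Subject headers can be checked with a few lines of Python. This is only a convenience sketch; `decode_b_word` and `show` are hypothetical helper names, and the output uses the same <ESC> notation as above, with 8-bit bytes shown as \xNN.

```python
import base64
import re


def decode_b_word(encoded_word: str) -> bytes:
    """Return the raw bytes carried by an RFC 2047 '=?charset?B?...?=' word."""
    match = re.fullmatch(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=", encoded_word)
    if match is None:
        raise ValueError("not a B-encoded word")
    return base64.b64decode(match.group(2))


def show(raw: bytes) -> str:
    """Render ESC as <ESC>, printable ASCII as-is, other bytes as \\xNN."""
    out = []
    for b in raw:
        if b == 0x1B:
            out.append("<ESC>")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)


# Subject headers as quoted in the earlier comment.
MOZILLA_M11 = "=?ISO-2022-JP?B?GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==?="
COMMUNICATOR_471 = "=?iso-2022-jp?B?GyRCJU8lcyUrJS8bKElKXTY4GyRCSD4zURsoQg==?="

print(show(decode_b_word(MOZILLA_M11)))
print(show(decode_b_word(COMMUNICATOR_471)))
```

Decoded this way, the M11 header carries the same bytes as the quoted-printable body from the M11 build (8-bit Hankaku bytes following <ESC>(J), and the 4.71 header matches the corrected 4.71 string with <ESC>(I.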
I probably should clarify my position on display. I think we should display Hankaku in the text body when the data contains Hankaku. 4.x displays Zenkaku in the headers even though the real data are in Hankaku, but it displays Hankaku in the text body. I'm not sure why the headers don't use Hankaku, but we might want to copy this behavior. On send, Hankaku -> Zenkaku should be the default, but with a prefs.js option (no UI) to send Hankaku if needed.
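To make the Hankaku -> Zenkaku default on send concrete, here is a minimal Python sketch. It is not how 4.x or Mozilla does the conversion; it assumes that Unicode NFKC normalization, applied only to runs in the half-width katakana range U+FF61..U+FF9F, is an acceptable mapping. `hankaku_to_zenkaku` is a hypothetical name.

```python
import re
import unicodedata

# Runs of half-width katakana and half-width CJK punctuation.
HANKAKU_RUN = re.compile("[\uFF61-\uFF9F]+")


def hankaku_to_zenkaku(text: str) -> str:
    """Convert half-width katakana to full-width, leaving other text alone."""
    return HANKAKU_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


print(hankaku_to_zenkaku("\uff76\uff80\uff76\uff85\u6f22\u5b57"))
```

Normalizing whole runs, rather than single characters, lets a half-width voiced sound mark combine with the preceding kana (for example ｶ followed by ﾞ becomes ガ). Limiting NFKC to the Hankaku range avoids touching other characters, such as full-width Latin letters, that NFKC would otherwise change.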
I notice that there are a few other bugs filed earlier for Hankaku problems. Are problems 1 & 2 covered in these other bugs? It was my understanding that decoding with some tolerance should be working now.
Assignee: nhotta → cata
Status: ASSIGNED → NEW
I think the decoder problem should be a separate bug. We need to provide an easily reproducible case to cata. Momoi san, could you create an ISO-2022-JP string and attach it to the bug? Use hankaku "aiueo" (\uFF71\uFF72\uFF73\uFF74\uFF75). I couldn't do that because the 4.x editor converts hankaku to zenkaku when saving.
Attached a .txt file which contains "a-i-u-e-o" as described by nhotta encoded in ISO-2022-JP with the escape sequence ESC(I followed by 7-bit JIS. This is how we should encode Hankaku Katakana.
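For reference, test data of the kind described above can be produced with a short Python sketch like the one below. It is an illustration only, not the tool used to create the attachment. `encode_hankaku_7bit` is a hypothetical name, and the sketch handles only ASCII and half-width katakana (U+FF61..U+FF9F map to 7-bit codes 0x21..0x5F).

```python
ESC_KANA = b"\x1b(I"   # designate JIS X 0201 Katakana (7-bit)
ESC_ASCII = b"\x1b(B"  # return to ASCII


def encode_hankaku_7bit(text: str) -> bytes:
    """Encode ASCII plus half-width katakana using ESC ( I and 7-bit codes."""
    out = bytearray()
    in_kana = False
    for ch in text:
        cp = ord(ch)
        if 0xFF61 <= cp <= 0xFF9F:
            if not in_kana:
                out += ESC_KANA
                in_kana = True
            out.append(cp - 0xFF61 + 0x21)
        elif cp < 0x80:
            if in_kana:
                out += ESC_ASCII
                in_kana = False
            out.append(cp)
        else:
            raise ValueError(f"unsupported character U+{cp:04X}")
    if in_kana:
        out += ESC_ASCII  # always finish in ASCII
    return bytes(out)


# Hankaku a-i-u-e-o, as requested for the test case.
print(encode_hankaku_7bit("\uff71\uff72\uff73\uff74\uff75"))
```

The five kana land on the 7-bit codes 0x31..0x35, so the payload between the two escape sequences reads as "12345" when viewed as ASCII.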
I guess I messed up the MIME type. Please change the extension from .cgi to .txt when saving it.
Target Milestone: M13
Should this have [BETA] in the summary?
Target Milestone: M13 → M14
Change platform and OS to ALL
OS: Linux → All
Hardware: Other → All
Target Milestone: M14 → M15
I think I should own this, I rewrote the ISO-2022-JP decoder. Probably a bug in there.
Assignee: cata → ftang
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
The attached test cases render correctly under ISO-2022-JP now. Marking this fixed.
Hi ji, can you check this out on Linux, Windows, and Mac?
QA Contact: momoi → ji
I checked today's builds. On linux and windows, the half-width char is converted to corresponding full-width char both in thread pane and message view pane when sent out using ISO-2022-JP. And on mac, the half-width char is not converted to full-width and is displayed just as it is both in thread pane and message view pane. It looks like there are still some problems here.
Is this a display problem on Mac or a sending problem? Do you see the half-width kana in the Mac browser when your page is in full-width kana?
It seems to be a sending problem. When sending a mail containing half-width katakana from Mac, the half-width katakana is not converted to full-width when received on Mac or Windows. I also see this with 4.x on Mac.
Retested on Mac. The half-width katakana is converted to full-width. I might have done something wrong last time when I tested the Mac version. So now all three platforms convert half-width katakana to full-width. There are two issues left related to half-width katakana:
1. Changing the pref file to be able to send out half-width katakana, as we do with 4.x. Are we going to keep this feature for 5.0?
2. We may need to be good enough to display half-width katakana sent from other mail clients.
On the second issue, I agree that we want to display them correctly. As for the pref option, we can use it to test hankaku mail display.
Marking it fixed. The original problem is now gone. IQA now needs to develop more test cases for different pref settings.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
ftang, what do you want to do about Mozilla not being able to display the Hankaku string it sends out? What Hankaku encoding should we use? 4.7x can display what Mozilla sends out but Mozilla cannot. By the way, the Mozilla browser also cannot display ISO-2022-JP saved Hankaku characters created by its own composer. Can we do something better? Hankaku may not be used in Mail all that often, but we should be able to display them in the Browser. We need to deal with this, I think, either in this bug or in another bug. I'm re-opening this bug and will attach a test message, which displays OK under 4.72 but not under Mozilla even though it was created by Mozilla.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The bug was marked as FIXED because the attachment (posted on 11/19) displayed correctly. It has ESC(I for hankaku. The latest attachment (posted on 3/8) has ESC(B for hankaku. I don't think we should support displaying hankaku with ESC(B. So this is a problem of sending (converting to ISO-2022-JP from unicode). We should generate ESC(I instead of ESC(B. This is not a mail specific problem but usually ISO-2022-JP is used for mail and in mozilla it's only used when the pref (no UI) is on explicitly.
It looks like nsIUnicodeEncoder has a problem. It generates 8-bit data in the ESC ( B sequence... bad, bad behavior. Is the remaining problem a beta1 stopper? If so, please add beta1 to the keywords.
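One way to check encoder output for this kind of mistake is to scan for any byte with the high bit set, since ISO-2022-JP is meant to be a 7-bit encoding. The Python sketch below is only a test aid and is unrelated to the nsIUnicodeEncoder code itself. `find_8bit_bytes` is a hypothetical name, and the sample bytes are the M11 output quoted earlier in this bug.

```python
def find_8bit_bytes(data: bytes):
    """Return (offset, byte, active escape) for every byte >= 0x80."""
    hits = []
    active = "(B"  # ISO-2022-JP text starts out in ASCII
    i = 0
    while i < len(data):
        if (data[i] == 0x1B and i + 2 < len(data)
                and data[i + 1:i + 2] in (b"(", b"$")):
            # Remember the two bytes after ESC as the active designation.
            active = data[i + 1:i + 3].decode("latin-1")
            i += 3
            continue
        if data[i] >= 0x80:
            hits.append((i, data[i], active))
        i += 1
    return hits


FROM_M11 = b"\x1b$B%O%s%+%/\x1b(J\xca\xdd\xb6\xb8\x1b$BH>3Q\x1b(B"
FROM_471 = b"\x1b$B%O%s%+%/\x1b(IJ]68\x1b$BH>3Q\x1b(B"

for offset, byte, active in find_8bit_bytes(FROM_M11):
    print(f"offset {offset}: 0x{byte:02X} after ESC {active}")
print(find_8bit_bytes(FROM_471))
```

A correct encoder should produce an empty list here, as the 4.71 sample does.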
Status: REOPENED → ASSIGNED
After discussing this with ftang, we decided to do the following:
1. There should be a bug about correctly generating "<ESC>(I" for 7-bit Hankaku.
2. There should be a bug about interpreting different Hankaku escape sequences. momoi will investigate and file a separate bug on it.

About #1, this bug was from the beginning about Mozilla not generating the correct escape sequence. ji said it and I said it. I provided the example from 4.71 only to show you what should be generated, not to use it as a test case to see if Mozilla can display it. Mozilla could display "<ESC>(I" Hankaku even before this bug was filed. So, I think we should keep this bug for item #1 above. I'll file a new one for #2.
M16
Target Milestone: M15 → M16
change the summary from "Wrong handling about Japanese half-width katakana in the subject and message body" to "ISO-2022-JP conversion bugs for single byte katan"
Summary: Wrong handling about Japanese half-width katakana in the subject and message body → ISO-2022-JP conversion bugs for single byte katan
Keywords: beta2
reassign to bobj
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Reassigned to nhotta. Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17
Status: NEW → ASSIGNED
Keywords: nsbeta2
Keywords: beta2
Whiteboard: [nsbeta2+]
For mail/news, single byte katakana is only sent when a backend only pref is set. So I don't think this is critical for beta2 for mail/news. Is this bug for mail/news only or reproducible for form submission?
Summary: ISO-2022-JP conversion bugs for single byte katan → ISO-2022-JP conversion bugs for single byte katakana
Based on the assumption that we won't provide pref UI to turn that pref on, we should not mark this beta2.
Keywords: nsbeta2
Whiteboard: [nsbeta2+]
Target Milestone: M17 → M28
Keywords: intl
Priority: P3 → P4
Target Milestone: --- → Future
Product: MailNews → Core
Attachment #2971 - Attachment mime type: text/plain → text/plain; charset=iso-2022-jp
Product: Core → MailNews Core
QA Contact: ji → i18n
Not reproducible on 3.1. Half-width katakana is converted to full-width, so I am resolving this as WORKSFORME.
Assignee: nhottanscp → m_kato
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 15 years ago
Resolution: --- → WORKSFORME