Closed Bug 19081 Opened 21 years ago Closed 10 years ago

ISO-2022-JP conversion bugs for single byte katakana

Categories

(MailNews Core :: Internationalization, defect, P4)

Tracking

(Not tracked)

RESOLVED WORKSFORME
Future

People

(Reporter: ji, Assigned: m_kato)

References

Details

(Keywords: intl)

Attachments

(4 files)

Build: 1999111709-M12
OS: RH6.0

Send out a mail with half-width katakana in the subject and message body.
When receiving it, the half-width katakana disappears in the thread pane,
and in the message pane the string following the half-width katakana
disappears as well. The same message displays correctly in 4.7.

Steps to reproduce:
1. Open the mail compose window and select the menu "View | Charset | Japanese (iso-2022-jp)".
2. Enter "カタカナｶﾀｶﾅ漢字" in the subject and message body, and send the mail
to the testing account itself.
3. When you receive the mail, the thread pane displays the subject
as
   カタカナ漢字

and in the message pane, in both the subject and the message body, only
  カタカナ is displayed.

Below is the "view source" with 4.7:
To: iqax1@netscape.com
Subject: =?ISO-2022-JP?B?GyRCJSslPyUrJUobKEq2wLbFGyRCNEE7ehsoQg==?=
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

<html><head></head>
<body>カタカナｶﾀｶﾅ漢字</body>
</html>
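To see what is actually inside the encoded Subject above, the encoded word can be base64-decoded and split at its escape sequences. This is a minimal Python 3.9+ sketch, not Mozilla code: `decode_b_word` and `split_segments` are hypothetical helpers, and they only recognize the few escape sequences discussed in this bug. On this Subject the segments come out as ESC $ B with the full-width kana, then ESC ( J followed by four 8-bit bytes (0xB6 0xC0 0xB6 0xC5, the half-width katakana as JIS X 0201), then ESC $ B with the kanji, and finally ESC ( B. The 8-bit bytes are not valid in 7-bit ISO-2022-JP.

```python
import base64

# Escape sequences discussed in this bug, mapped to readable labels.
ESCAPES = {
    b"\x1b$B": "ESC $ B",  # JIS X 0208-1983 (Zenkaku)
    b"\x1b$@": "ESC $ @",  # JIS X 0208-1978
    b"\x1b(B": "ESC ( B",  # ASCII
    b"\x1b(J": "ESC ( J",  # JIS X 0201 Roman
    b"\x1b(I": "ESC ( I",  # JIS X 0201 Katakana (Hankaku)
}


def decode_b_word(word: str) -> bytes:
    """Return the raw bytes of an RFC 2047 '=?charset?B?...?=' encoded word."""
    _charset, encoding, text = word[2:-2].split("?")
    if encoding.upper() != "B":
        raise ValueError("only B (base64) encoded words are handled here")
    return base64.b64decode(text)


def split_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (designation, payload) pairs."""
    segments = []
    current = "initial (ASCII)"
    start = i = 0
    while i < len(data):
        label = ESCAPES.get(data[i:i + 3])
        if label is None:
            i += 1
            continue
        # Emit the run that ends here, skipping an empty initial run.
        if i > start or start > 0:
            segments.append((current, data[start:i]))
        current = label
        i += 3
        start = i
    segments.append((current, data[start:]))
    return segments


subject = "=?ISO-2022-JP?B?GyRCJSslPyUrJUobKEq2wLbFGyRCNEE7ehsoQg==?="
for designation, payload in split_segments(decode_b_word(subject)):
    print(designation, payload)
```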
Status: NEW → ASSIGNED
Depends on: 5894
Set 5894 as a dependency. Hankaku should eventually be converted to full-width,
but for now it should be turned into question marks (unmapped characters) instead of
being removed.
I think we have ISO-2022-JP encoder/decoder problems for Hankaku Katakana.

Here's what I got on sending the string (view this under Japanese (Auto-Detect)):

ハンカクﾊﾝｶｸ半角

in the Subject header and body.

1. Sent from 11/15/99 Linux M11 build:

-------------------------------------
Subject: =?ISO-2022-JP?B?GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==?=
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by netscape.com id UAA15287

=1B$B%O%s%+%/=1B(J=CA=DD=B6=B8=1B$BH>3Q=1B(B
-------------------------------------

2. Sent from 4.71 -- 11/15/99 Win32 build:

-------------------------------------
Subject: =?iso-2022-jp?B?GyRCJU8lcyUrJS8bKElKXTY4GyRCSD4zURsoQg==?=
Content-Type: text/html; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html>
-------------------------------------

(Note: You can use a B64 decoder found here: http://kaze:8000/tools/base64decode.html)

The important fact is that 5.0 is not removing Hankaku data; it is just encoding it wrong.
Note that 4.71 generates "<ESC>(I" for 7-bit Hankaku, but 5.0 incorrectly
generates a Roman designation ("<ESC>(J" in the output above) and then emits the Hankaku as 8-bit JIS (whose byte values are correct).
(Note that the Messaging Server converted the body to quoted-printable because the mail contained 8-bit data; see the X-MIME-Autoconverted header.)
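The two captures differ only in how the same JIS X 0201 katakana code points are written: JIS7 (bytes 0x21 to 0x5F, valid only after ESC ( I, as 4.71 sent) versus JIS8 (raw bytes 0xA1 to 0xDF, as the M11 build sent). Unicode's half-width katakana block U+FF61..U+FF9F follows the same order as JIS X 0201 0xA1..0xDF, so the mapping is a fixed offset. This is a minimal Python sketch. `hankaku_to_jis` is a hypothetical helper, not Mozilla's converter.

```python
HANKAKU_FIRST, HANKAKU_LAST = 0xFF61, 0xFF9F  # same order as JIS X 0201 0xA1..0xDF


def hankaku_to_jis(text: str, seven_bit: bool) -> bytes:
    """Map half-width katakana to JIS X 0201 katakana bytes.

    seven_bit=True  -> 0x21..0x5F, to be sent after ESC ( I (JIS7)
    seven_bit=False -> 0xA1..0xDF, raw 8-bit bytes (JIS8)
    """
    base = 0x21 if seven_bit else 0xA1
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if not HANKAKU_FIRST <= cp <= HANKAKU_LAST:
            raise ValueError(f"not half-width katakana: {ch!r}")
        out.append(cp - HANKAKU_FIRST + base)
    return bytes(out)
```

For U+FF8A U+FF9D U+FF76 U+FF78 (half-width ha-n-ka-ku) this gives J]68 in 7-bit form and 0xCA 0xDD 0xB6 0xB8 in 8-bit form, which match the two captures above.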

This is why Mozilla cannot display part of the string that Mozilla itself sent.

So we have a few problems.

1. 5.0 cannot display its own Hankaku generation but can display the one sent from 4.71.
   On the other hand, 4.71 can display both types correctly.

    --> This indicates we don't have a good JIS decoder that can tolerate these
          variations in Hankaku encodings.

         In particular we need to be able to decode "<ESC>(I", JIS7 Hankaku, and JIS8 Hankaku.
         We should also tolerate a mistake such as "<ESC>(B" followed by JIS7 Hankaku code points, just as
         4.x does.
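Here is a sketch of the decoding tolerance from item 1 in Python 3.9+. It accepts ESC ( I followed by JIS7 kana, and it also accepts JIS8 kana bytes (0xA1..0xDF) in any mode, so it covers what both 4.71 and the M11 build sent. It does not attempt the "<ESC>(B followed by JIS7 code points" case. After ESC ( B, bytes 0x21..0x5F cannot be told apart from ASCII, and recovering them would need a heuristic this sketch does not guess at. `decode_tolerant` is a hypothetical name. JIS X 0208 pairs are handed to Python's standard `iso2022_jp` codec, not to any Mozilla table.

```python
def decode_tolerant(data: bytes) -> str:
    """Decode ISO-2022-JP, tolerating JIS7 (ESC ( I) and JIS8 Hankaku."""
    mode = "ascii"
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x1B:  # escape sequence: switch character set
            seq = data[i:i + 3]
            if seq in (b"\x1b$B", b"\x1b$@"):  # ESC $ @ treated like ESC $ B here
                mode = "jisx0208"
            elif seq in (b"\x1b(B", b"\x1b(J"):  # Roman simplified to ASCII here
                mode = "ascii"
            elif seq == b"\x1b(I":
                mode = "kana"
            else:
                raise ValueError(f"unsupported escape sequence {seq!r} at {i}")
            i += 3
        elif 0xA1 <= b <= 0xDF:  # JIS8 Hankaku, accepted in any mode
            out.append(chr(b - 0xA1 + 0xFF61))
            i += 1
        elif mode == "kana" and 0x21 <= b <= 0x5F:  # JIS7 Hankaku after ESC ( I
            out.append(chr(b - 0x21 + 0xFF61))
            i += 1
        elif mode == "jisx0208" and 0x21 <= b <= 0x7E:  # two-byte Zenkaku
            pair = data[i:i + 2]
            out.append((b"\x1b$B" + pair + b"\x1b(B").decode("iso2022_jp"))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            out.append("\ufffd")  # other 8-bit bytes are not decodable here
            i += 1
    return "".join(out)
```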

2. We need to be able to encode properly using "<ESC>(I" in case we offer a Hankaku-send option
   via prefs.js. This is the only encoding we should use if we allow Hankaku on send.
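Here is a minimal Python 3.9+ sketch of item 2. The encoder is stateful: it writes ESC ( I before a run of half-width katakana, sends the run as JIS7, and switches back with ESC $ B or ESC ( B only when the character set actually changes. `encode_iso2022jp_hankaku` is hypothetical. It uses Python's `iso2022_jp` codec only to look up the two JIS X 0208 bytes of each Zenkaku character. ESC ( I is not part of the ISO-2022-JP repertoire defined in RFC 1468, so this output is only for the optional Hankaku-send case.

```python
HANKAKU_FIRST, HANKAKU_LAST = 0xFF61, 0xFF9F  # maps to JIS X 0201 0x21..0x5F


def encode_iso2022jp_hankaku(text: str) -> bytes:
    """Encode text as 7-bit ISO-2022-JP, sending Hankaku as ESC ( I + JIS7."""
    out = bytearray()
    mode = "ascii"  # ISO-2022-JP text starts (and must end) in ASCII

    def switch(new_mode: str, escape: bytes) -> None:
        # Emit a designation only when the character set actually changes.
        nonlocal mode
        if mode != new_mode:
            out.extend(escape)
            mode = new_mode

    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            switch("ascii", b"\x1b(B")
            out.append(cp)
        elif HANKAKU_FIRST <= cp <= HANKAKU_LAST:
            switch("kana", b"\x1b(I")
            out.append(cp - HANKAKU_FIRST + 0x21)
        else:
            # Borrow the stdlib codec only to find the two JIS X 0208 bytes;
            # for such characters it yields ESC $ B, two bytes, ESC ( B.
            encoded = ch.encode("iso2022_jp")
            if len(encoded) != 8 or not encoded.startswith(b"\x1b$B"):
                raise ValueError(f"not a JIS X 0208 character: {ch!r}")
            switch("jisx0208", b"\x1b$B")
            out.extend(encoded[3:5])
    switch("ascii", b"\x1b(B")  # return to ASCII at the end
    return bytes(out)
```

For the string ハンカク + half-width ha-n-ka-ku + 半角, this produces the corrected 4.71 sequence quoted in this report: <ESC>$B%O%s%+%/<ESC>(IJ]68<ESC>$BH>3Q<ESC>(B.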

I think we should fix these problems pretty soon, particularly the decoding problem, by improving
the decoder's tolerance.
I need to make clear that the data string sent from 4.71:

<ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html>

should be represented more accurately as:

<ESC>$B%O%s%+%/<ESC>(IJ]68<ESC>$BH>3Q<ESC>(B</html>

  where I use <ESC> in place of the escape character, which may not be displayed
  distinctly enough under Japanese encoding.

By the way, the B-encoded Subject header generated by Mozilla is identical to the body
text data once it is decoded.
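This can be checked directly on the M11 capture. Base64-decode the Subject's encoded word, undo the quoted-printable encoding of the body line, and compare the bytes. This Python sketch uses only the standard `base64` and `quopri` modules. The constant and function names are illustrative.

```python
import base64
import quopri

# Captured from the 11/15/99 Linux M11 message quoted in this report.
SUBJECT = "=?ISO-2022-JP?B?GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==?="
BODY_QP = b"=1B$B%O%s%+%/=1B(J=CA=DD=B6=B8=1B$BH>3Q=1B(B"


def subject_bytes(word: str) -> bytes:
    """Base64-decode the payload of a '=?charset?B?...?=' encoded word."""
    _charset, encoding, text = word[2:-2].split("?")
    if encoding.upper() != "B":
        raise ValueError("expected a B (base64) encoded word")
    return base64.b64decode(text)


def body_bytes(qp: bytes) -> bytes:
    """Undo the quoted-printable encoding applied by the mail server."""
    return quopri.decodestring(qp)


if __name__ == "__main__":
    print(subject_bytes(SUBJECT) == body_bytes(BODY_QP))  # True
```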
I should probably clarify my position on display. I think we should display Hankaku in the text body when the data
contains Hankaku. 4.x displays Zenkaku in the headers even though the real data are in Hankaku, but it displays
Hankaku in the text body. I'm not sure why the headers don't use Hankaku, but we might want to copy
this behavior.

On send, Hankaku -> Zenkaku should be the default, but with a prefs.js option (no UI) to send Hankaku if needed.
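Here is a minimal Python sketch of the default Hankaku to Zenkaku conversion on send. It applies Unicode NFKC normalization only to runs in the half-width katakana block (U+FF61..U+FF9F). A following voiced mark combines with the preceding kana (U+FF76 U+FF9E becomes U+30AC). Other text, including full-width Latin letters, is left untouched. NFKC is a substitute technique here and not necessarily what Mozilla's converter does. `hankaku_to_zenkaku` is a hypothetical name.

```python
import re
import unicodedata

# Half-width katakana block, including half-width punctuation and sound marks.
_HANKAKU_RUN = re.compile("[\uff61-\uff9f]+")


def hankaku_to_zenkaku(text: str) -> str:
    """Convert half-width katakana runs to full-width; leave the rest alone."""
    # NFKC maps each half-width kana to its full-width form and composes a
    # following (semi-)voiced mark with the preceding kana. A mark that has
    # no preceding kana in the run stays a lone combining character.
    return _HANKAKU_RUN.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)
```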
I notice that there are a few other bugs filed earlier for Hankaku problems. Are
problems 1 & 2 covered in those other bugs? It was my understanding that decoding
with some tolerance should be working now.
Assignee: nhotta → cata
Status: ASSIGNED → NEW
I think the decoder problem should be a separate bug.
We need to provide an easy reproducible case to cata.
Momoi san, could you create an ISO-2022-JP string and attach it to the bug?
Use hankaku "aiueo" (\uFF71\uFF72\uFF73\uFF74\uFF75); I couldn't do that myself because
the 4.x editor converts hankaku to zenkaku when saving.
Attached a .txt file which contains "a-i-u-e-o" as described
by nhotta encoded in ISO-2022-JP with the escape sequence
ESC(I followed by 7-bit JIS. This is how we should encode
Hankaku Katakana.
I guess I messed up the MIME type. Please change the extension
from .cgi to .txt when saving it.
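To reproduce the decoder side, it can help to produce the same a-i-u-e-o string in each Hankaku form discussed in this bug. This is a hypothetical Python helper, and the dictionary keys are illustrative names. The ESC ( J plus JIS8 variant copies the structure of the M11 output and is not a valid encoding.

```python
AIUEO = "\uff71\uff72\uff73\uff74\uff75"  # half-width a, i, u, e, o


def jis_x0201_kana(text: str, base: int) -> bytes:
    """Offset U+FF61.. to JIS X 0201 kana: base 0x21 for JIS7, 0xA1 for JIS8."""
    return bytes(ord(ch) - 0xFF61 + base for ch in text)


def make_test_cases() -> dict[str, bytes]:
    jis7 = jis_x0201_kana(AIUEO, 0x21)
    jis8 = jis_x0201_kana(AIUEO, 0xA1)
    return {
        "esc_i_jis7": b"\x1b(I" + jis7 + b"\x1b(B",  # how Hankaku should be sent
        "raw_jis8": jis8,  # bare 8-bit JIS X 0201
        "esc_j_jis8": b"\x1b(J" + jis8 + b"\x1b(B",  # structure of the M11 output
    }
```

Each value can be written to its own file opened in binary mode ("wb"). A .txt extension avoids the MIME-type problem described in the comment above.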
Target Milestone: M13
Should this have [BETA] in the summary?
Target Milestone: M13 → M14
Change platform and OS to ALL
OS: Linux → All
Hardware: Other → All
Target Milestone: M14 → M15
I think I should own this; I rewrote the ISO-2022-JP decoder, so it's probably a bug in
there.
Assignee: cata → ftang
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
The attached test cases render correctly under ISO-2022-JP now. Marking this fixed.
Hi ji, can you check this on Linux, Windows, and Mac?
QA Contact: momoi → ji
I checked today's builds. On Linux and Windows, the half-width characters
are converted to the corresponding full-width characters in both the thread pane and the
message view pane when sent using ISO-2022-JP. On Mac, however, the half-width
characters are not converted to full-width and are displayed as-is in both the
thread pane and the message view pane. It looks like there are still some problems
here.
Is this a display problem on Mac or a sending problem? Do you see the half-width
kana in the Mac browser when your page is in full-width kana?
It seems to be a sending problem.
When sending a mail containing half-width katakana from Mac, the half-width
katakana is not converted to full-width when received on Mac or Windows.
I can also see this with 4.x on Mac.
Retested on Mac. The half-width katakana is converted to full-width.
I might have done something wrong last time when I tested the Mac version.
So now all three platforms convert half-width katakana to full-width.
There are two issues left related to half-width katakana:
1. Change the pref file to be able to send out half-width katakana, as we do with 4.x.
   Are we going to keep this feature for 5.0?
2. We may need to display half-width katakana sent from other mail clients correctly.
For the second issue, I agree that we want to display them correctly.
As for the pref option, we can use it to test hankaku mail display.
Marking it fixed. The original problem is now gone. IQA now needs to develop more
test cases for different pref settings.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
ftang, what do you want to do about Mozilla not being
able to display the Hankaku string it sends out?
What Hankaku encoding should we use? 4.7x can display what
Mozilla sends out, but Mozilla cannot.
By the way, the Mozilla browser also cannot display Hankaku characters
saved as ISO-2022-JP by its own composer.
Can we do something better? Hankaku may not be used in
Mail all that often but we should be able to display 
them in Browser.
We need to deal with this, I think, either in this bug or
in another bug.
I'm reopening this bug and will attach a test message,
which displays OK under 4.72 but not under Mozilla, even though
it was created by Mozilla.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The bug was marked as FIXED because the attachment (posted on 11/19) displayed 
correctly. It has ESC(I for hankaku.
The latest attachment (posted on 3/8) has ESC(B for hankaku. I don't think we 
should support displaying hankaku with ESC(B.
So this is a problem of sending (converting to ISO-2022-JP from unicode). We 
should generate ESC(I instead of ESC(B.
This is not a mail specific problem but usually ISO-2022-JP is used for mail and 
in mozilla it's only used when the pref (no UI) is on explicitly.
It looks like nsIUnicodeEncoder has a problem: it generates 8-bit data inside the
ESC ( B sequence... bad, bad behavior.
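One quick way to catch this kind of encoder bug is to scan the output for bytes at or above 0x80 and report which designation was active. Such bytes never belong in 7-bit ISO-2022-JP. The M11 capture earlier in this report actually has ESC ( J, not ESC ( B, before its 8-bit bytes, and the check below reports whichever designation is in effect. This is a hypothetical Python sketch, not an existing Mozilla test.

```python
# Escape sequences discussed in this bug, mapped to readable labels.
ESCAPES = {
    b"\x1b$B": "ESC $ B",
    b"\x1b$@": "ESC $ @",
    b"\x1b(B": "ESC ( B",
    b"\x1b(J": "ESC ( J",
    b"\x1b(I": "ESC ( I",
}


def find_8bit_bytes(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, active designation) for every byte >= 0x80."""
    problems = []
    current = "ESC ( B (initial)"  # ISO-2022-JP starts in ASCII
    i = 0
    while i < len(data):
        label = ESCAPES.get(data[i:i + 3])
        if label is not None:
            current = label
            i += 3
            continue
        if data[i] >= 0x80:
            problems.append((i, current))
        i += 1
    return problems


m11_body = b"\x1b$B%O%s%+%/\x1b(J\xca\xdd\xb6\xb8\x1b$BH>3Q\x1b(B"
print(find_8bit_bytes(m11_body))
```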
Is the remaining problem a beta1 stopper? If so, please add beta1 to the keywords.
Status: REOPENED → ASSIGNED
After discussing this with ftang, we decided to do the following:

1. There should be a bug about correctly generating "<ESC>(I" for 
   7-bit Hankaku.

2. There should be a bug about interpreting different Hankaku
   escape sequences. momoi will investigate and file a
   separate bug on it.

About #1, this bug was from the beginning about Mozilla not 
generating the correct escape sequence. ji said it and I said it.
I provided the example from 4.71 only to show you what should
be generated, not as a test case to see if Mozilla
can display it. Mozilla could already display "<ESC>(I" Hankaku
before this bug was filed.

So, I think we should keep this bug for item #1 above.
I'll file a new one for #2.
M16
Target Milestone: M15 → M16
change the summary from "Wrong handling about Japanese half-width katakana in 
the subject and message body" to "ISO-2022-JP conversion bugs for single byte 
katan"
Summary: Wrong handling about Japanese half-width katakana in the subject and message body → ISO-2022-JP conversion bugs for single byte katan
Keywords: beta2
reassign to bobj
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Reassigned to nhotta.  Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17
Status: NEW → ASSIGNED
Keywords: nsbeta2
Keywords: beta2
Whiteboard: [nsbeta2+]
For mail/news, single byte katakana is only sent when a backend only pref is 
set. So I don't think this is critical for beta2 for mail/news.
Is this bug for mail/news only or reproducible for form submission?
Summary: ISO-2022-JP conversion bugs for single byte katan → ISO-2022-JP conversion bugs for single byte katakana
Based on the assumption that we won't provide pref UI to turn that pref on, we
should not mark this beta2.
Keywords: nsbeta2
Whiteboard: [nsbeta2+]
Target Milestone: M17 → M28
Keywords: intl
Priority: P3 → P4
Target Milestone: --- → Future
Product: MailNews → Core
Attachment #2971 - Attachment mime type: text/plain → text/plain; charset=iso-2022-jp
Product: Core → MailNews Core
QA Contact: ji → i18n
Not reproducible on 3.1. Half-width katakana is converted to full-width.
So I resolved this as WORKSFORME.
Assignee: nhottanscp → m_kato
Status: ASSIGNED → RESOLVED
Closed: 21 years ago → 10 years ago
Resolution: --- → WORKSFORME