Closed Bug 19081 Opened 25 years ago Closed 15 years ago

ISO-2022-JP conversion bugs for single byte katakana

Tracking

(Not tracked)

Status:

RESOLVED WORKSFORME

Milestone:

Future

People

(Reporter: ji, Assigned: m_kato)

References

Details

(Keywords: intl)

Attachments

(4 files)

ISO-20222-JP encoded Hankaku sample 25 years ago Katsuhiko Momoi 11 bytes, text/plain; charset=iso-2022-jp		Details
Mailbox file containing one message sent by 3/7/2000 build witrh Hankaku option ON. 25 years ago Katsuhiko Momoi 955 bytes, text/plain		Details
Image of 4.72 showing a message from Mozilla mail. Shows all the characters correctly. 25 years ago Katsuhiko Momoi 62.36 KB, image/jpeg		Details
The same msg as above not showing correctly in the header and text body cut off in the middle under Mozilla. 25 years ago Katsuhiko Momoi 64.83 KB, image/jpeg		Details

Reporter

Description

•

25 years ago

Build: 1999111709-M12
OS: RH6.0

Send a mail with half-width katakana in the subject and message body. When the mail is received, the half-width katakana disappears in the thread pane, and in the message pane the string following the half-width katakana disappears as well. The same message displays correctly in 4.7.

Steps to reproduce:
1. Open the mail compose window and select the menu "View | Charset | Japanese (iso-2022-jp)".
2. Enter "カタカナｶﾀｶﾅ漢字" in the subject and message body, and send the mail to the testing account itself.
3. When the mail is received, the subject in the thread pane displays as カタカナ漢字, and in the message pane both the subject and the message body display only カタカナ.

Below is the "view source" with 4.7:

To: iqax1@netscape.com
Subject: =?ISO-2022-JP?B?GyRCJSslPyUrJUobKEq2wLbFGyRCNEE7ehsoQg==?=
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

<html><head></head>
<body>カタカナｶﾀｶﾅ漢字</body>
</html>
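For reference, the Subject header in the view source above can be unpacked with a few lines of Python. This is only an inspection sketch: `decode_b_word` and `describe` are hypothetical helper names (nothing from Mozilla or 4.x), and the regular expression only recognizes three-byte escape sequences such as ESC $ B and ESC ( J.

```python
import base64
import re


def decode_b_word(word: str):
    """Split an RFC 2047 B-encoded word into (charset, raw bytes)."""
    m = re.fullmatch(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=", word)
    if m is None:
        raise ValueError("not a B-encoded word")
    return m.group(1), base64.b64decode(m.group(2))


def describe(raw: bytes):
    """Return the designations used and every byte with the high bit set."""
    escapes = re.findall(rb"\x1b[$(][@-Z]", raw)
    high = bytes(b for b in raw if b >= 0x80)
    # Drop the leading ESC so the designations print as "$B", "(J", ...
    return [e[1:].decode("ascii") for e in escapes], high


# Header copied from the view source in the description.
charset, raw = decode_b_word(
    "=?ISO-2022-JP?B?GyRCJSslPyUrJUobKEq2wLbFGyRCNEE7ehsoQg==?="
)
escapes, high = describe(raw)
print(charset, escapes, high.hex())
```

Run on that header, this reports the designations `$B`, `(J`, `$B`, `(B` and the four high-bit bytes `b6 c0 b6 c5`, so the half-width katakana travel as 8-bit bytes inside a header labelled ISO-2022-JP.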

nhottanscp

Updated

•

25 years ago

Status: NEW → ASSIGNED

Depends on: 5894

nhottanscp

Comment 1

•

25 years ago

Set 5894 as a dependent bug. Half-width katakana should eventually be converted to full-width. But for now they should be turned into question marks (unmapped chars) instead of being removed.

Katsuhiko Momoi

Comment 2

•

25 years ago

I think we have ISO-2022-JP encoder/decoder problems for Hankaku Katakana. Here's what I got on sending the string (view this under Japanese (Auto-Detect)): ハンカクﾊﾝｶｸ半角 in the Subject header and body.

1. Sent from 11/15/99 Linux M11 build:
-------------------------------------
Subject: =?ISO-2022-JP?B?GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==?=
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by netscape.com id UAA15287

=1B$B%O%s%+%/=1B(J=CA=DD=B6=B8=1B$BH>3Q=1B(B
-------------------------------------

2. Sent from 4.71 -- 11/15/99 Win32 build:
-------------------------------------
Subject: =?iso-2022-jp?B?GyRCJU8lcyUrJS8bKElKXTY4GyRCSD4zURsoQg==?=
Content-Type: text/html; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html>
-------------------------------------

(Note: You can use a B64 decoder found here: http://kaze:8000/tools/base64decode.html)

The important fact is that 5.0 is not removing Hankaku data, it is just encoding it wrong. Note that 4.71 is generating "<ESC>(I" for 7-bit Hankaku but 5.0 is incorrectly generating "<ESC>(B" and then at the same time generating 8-bit JIS (correctly). (Note that Messaging server QP'ed it because 8-bit data were in the mail.) This is why Mozilla cannot display some part of the string sent from Mozilla.

So we have a few problems.

1. 5.0 cannot display its own Hankaku generation but can display the one sent from 4.71. On the other hand, 4.71 can display both types correctly.
--> This indicates we don't have a good JIS decoder which can tolerate these variations in Hankaku encodings. In particular we need to be able to decode "<ESC>(I", JIS7 Hankaku, and JIS8 Hankaku. Also tolerate a mistake such as "<ESC>(B" followed by JIS7 Hankaku code points just as 4.x does.

2. We need to be able to encode properly using "<ESC>(I" in case we offer Hankaku-send option via prefs.js. This is the only encoding we should use in case we allow Hankaku in send.

I think we should fix these problems pretty soon, particularly the decoding problem and improvement and tolerance in decoding.
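The decoding tolerance asked for in point 1 can be sketched roughly as follows. This is a minimal Python illustration, not Mozilla's or 4.x's decoder: `tolerant_decode` is a hypothetical name, two-byte segments are handed to Python's standard `iso2022_jp` codec, JIS-Roman is treated like ASCII for simplicity, and the "ESC ( B followed by JIS7 Hankaku" case is deliberately not handled because those bytes cannot be told apart from plain ASCII without extra heuristics.

```python
import base64
import re

# Only the designations that appear in this bug are recognized.
ESC_SEQ = re.compile(rb"\x1b(\$B|\(B|\(J|\(I)")


def _decode_segment(seg: bytes, mode: str) -> str:
    if not seg:
        return ""
    if mode == "$B":
        # JIS X 0208: delegate to the standard codec by re-wrapping the segment.
        return (b"\x1b$B" + seg + b"\x1b(B").decode("iso2022_jp")
    chars = []
    for b in seg:
        if mode == "(I" and 0x21 <= b <= 0x5F:
            # 7-bit JIS X 0201 katakana, as sent by 4.71.
            chars.append(chr(0xFF61 + b - 0x21))
        elif 0xA1 <= b <= 0xDF:
            # 8-bit JIS X 0201 katakana, accepted in any single-byte mode.
            chars.append(chr(0xFF61 + b - 0xA1))
        else:
            # Simplification: ASCII and JIS-Roman are not distinguished.
            chars.append(chr(b))
    return "".join(chars)


def tolerant_decode(data: bytes) -> str:
    out, mode, pos = [], "(B", 0
    for m in ESC_SEQ.finditer(data):
        out.append(_decode_segment(data[pos:m.start()], mode))
        mode = m.group(1).decode("ascii")
        pos = m.end()
    out.append(_decode_segment(data[pos:], mode))
    return "".join(out)


# The two Subject payloads quoted in this comment.
from_mozilla = base64.b64decode("GyRCJU8lcyUrJS8bKErK3ba4GyRCSD4zURsoQg==")
from_471 = base64.b64decode("GyRCJU8lcyUrJS8bKElKXTY4GyRCSD4zURsoQg==")
print(tolerant_decode(from_mozilla) == tolerant_decode(from_471))
```

As quoted, the first payload designates ESC ( J before its 8-bit bytes and the second uses ESC ( I with 7-bit bytes; with the mapping above both decode to the same string.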

Katsuhiko Momoi

Comment 3

•

25 years ago

I need to make clear that the data string sent from 4.71: <ESC>$B%O%s%+%/<ESC>(IJ]68ESC$BH>3Q<ESC>(B</html> should be represented more accurately as: <ESC>$B%O%s%+%/<ESC>(IJ]68<ESC>$BH>3Q<ESC>(B</html> where I use <ESC> in place of the escape character, which may not be displayed distinctly enough under Japanese encoding. By the way, the B-encoded Subject header by Mozilla is identical to the text data when it is decoded.

Katsuhiko Momoi

Comment 4

•

25 years ago

I probably should clarify my position on display. I think we should display Hankaku in the text body when the data contains Hankaku. 4.x displays Zenkaku in the headers even though the real data are in Hankaku, but it displays Hankaku in the text body. I'm not sure why the headers don't use Hankaku but we might want to copy this behavior. On send, Hankaku -> Zenkaku should be the default, but with a prefs.js option (no UI) to send Hankaku if needed.
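As a rough illustration of the Hankaku -> Zenkaku default on send, here is a minimal Python sketch. It is an assumption-laden stand-in, not how Mozilla or 4.x performs the conversion: it relies on Unicode NFKC normalization, applied only to runs in the half-width katakana range U+FF61 to U+FF9F so that other characters are left alone, and `hankaku_to_zenkaku` is a hypothetical name.

```python
import re
import unicodedata

# Runs of half-width katakana and the half-width punctuation next to them.
_HANKAKU_RUN = re.compile("[\uFF61-\uFF9F]+")


def hankaku_to_zenkaku(text: str) -> str:
    """Fold half-width katakana to full-width; leave everything else as-is."""
    return _HANKAKU_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), text
    )


# The test string from the bug description.
print(hankaku_to_zenkaku("カタカナｶﾀｶﾅ漢字"))
```

Normalizing whole runs rather than single characters lets a base kana followed by a half-width voiced sound mark compose into one full-width character.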

Katsuhiko Momoi

Comment 5

•

25 years ago

I notice that there are a few other bugs filed earlier for Hankaku problems. Are problems 1 & 2 covered in these other bugs? It was my understanding that decoding with some tolerance should be working now.

nhottanscp

Updated

•

25 years ago

Assignee: nhotta → cata

Status: ASSIGNED → NEW

nhottanscp

Comment 6

•

25 years ago

I think the decoder problem should be a separate bug. We need to provide an easily reproducible case to cata. Momoi san, could you create an ISO-2022-JP string and attach it to the bug? Use hankaku "aiueo" (\uFF71\uFF72\uFF73\uFF74\uFF75). I couldn't do that because the 4.x editor converts hankaku to zenkaku when saving.

Katsuhiko Momoi

Comment 7

•

25 years ago

Attached a .txt file which contains "a-i-u-e-o" as described by nhotta encoded in ISO-2022-JP with the escape sequence ESC(I followed by 7-bit JIS. This is how we should encode Hankaku Katakana.
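The encoding described here, ESC ( I followed by 7-bit JIS, can be sketched in Python as below. This is an illustration under stated assumptions rather than the code used to build the attachment: `encode_hankaku_7bit` is a hypothetical name, it handles only ASCII and half-width katakana (U+FF61 to U+FF9F), and it always returns to ASCII with ESC ( B at the end.

```python
ESC_KATAKANA = b"\x1b(I"  # designate JIS X 0201 katakana
ESC_ASCII = b"\x1b(B"     # designate ASCII


def encode_hankaku_7bit(text: str) -> bytes:
    """Encode ASCII plus half-width katakana as 7-bit ISO-2022-JP-style bytes."""
    out = bytearray()
    in_kana = False
    for ch in text:
        cp = ord(ch)
        if 0xFF61 <= cp <= 0xFF9F:
            if not in_kana:
                out += ESC_KATAKANA
                in_kana = True
            # U+FF61..U+FF9F map onto 0x21..0x5F in the 7-bit katakana set.
            out.append(cp - 0xFF61 + 0x21)
        elif cp < 0x80:
            if in_kana:
                out += ESC_ASCII
                in_kana = False
            out.append(cp)
        else:
            raise ValueError(f"unsupported character in this sketch: {ch!r}")
    if in_kana:
        out += ESC_ASCII
    return bytes(out)


# Hankaku "aiueo" as requested by nhotta.
sample = encode_hankaku_7bit("\uFF71\uFF72\uFF73\uFF74\uFF75")
print(sample, len(sample))
```

For the five-character hankaku "aiueo" string this yields 11 bytes, which is consistent with the size listed for the attachment, although the attachment's actual contents are not reproduced in this report.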

Katsuhiko Momoi

Comment 8

•

25 years ago

Attached file ISO-20222-JP encoded Hankaku sample — Details

Katsuhiko Momoi

Comment 9

•

25 years ago

I guess I messed up the MIME type. Please change the extension from .cgi to .txt when saving it.

cata

Updated

•

25 years ago

Target Milestone: M13

bobj

Comment 10

•

25 years ago

Should this have [BETA] in the summary?

cata

Updated

•

25 years ago

Target Milestone: M13 → M14

Frank Tang

Comment 11

•

25 years ago

Change platform and OS to ALL

OS: Linux → All

Hardware: Other → All

bobj

Updated

•

25 years ago

Target Milestone: M14 → M15

Frank Tang

Comment 12

•

25 years ago

I think I should own this, I rewrote the ISO-2022-JP decoder. Probably a bug in there.

Assignee: cata → ftang

Status: ASSIGNED → NEW

Frank Tang

Updated

•

25 years ago

Status: NEW → ASSIGNED

Frank Tang

Comment 13

•

25 years ago

The attached test cases now render correctly under ISO-2022-JP. Marking this fixed.

Katsuhiko Momoi

Comment 14

•

25 years ago

Hi, ji, can you check this out on Linux, Windows, and Mac?

QA Contact: momoi → ji

Reporter

Comment 15

•

25 years ago

I checked today's builds. On linux and windows, the half-width char is converted to corresponding full-width char both in thread pane and message view pane when sent out using ISO-2022-JP. And on mac, the half-width char is not converted to full-width and is displayed just as it is both in thread pane and message view pane. It looks like there are still some problems here.

Frank Tang

Comment 16

•

25 years ago

Is this a display problem on Mac or a sending problem? Do you see half-width kana in the Mac browser when your page is in full-width kana?

Reporter

Comment 17

•

25 years ago

It seems to be a sending problem. When sending a mail containing half-width katakana from Mac, the half-width katakana is not converted to full-width when received on Mac or Windows. I can also see this with 4.x on Mac.

Reporter

Comment 18

•

25 years ago

Retested on Mac. The half-width katakana is converted to full-width. I might have done something wrong last time when I tested the Mac version. So now all three platforms convert half-width katakana to full-width. There are two issues left related to half-width katakana:
1. Changing the pref file to be able to send out half-width katakana, as we do with 4.x. Are we going to keep this feature for 5.0?
2. We may need to be able to display half-width katakana sent from other mail clients.

nhottanscp

Comment 19

•

25 years ago

On the second issue, I agree that we want to display them correctly. As for the pref option, we can use it to test hankaku mail display.

Frank Tang

Comment 20

•

25 years ago

Marking it fixed. The original problem is now gone. IQA now needs to develop more test cases for different pref settings.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

Katsuhiko Momoi

Comment 21

•

25 years ago

ftang, what do you want to do about Mozilla not being able to display the Hankaku string it sends out? What Hankaku encoding should we use? 4.7x can display what Mozilla sends out but Mozilla cannot. By the way, the Mozilla browser also cannot display ISO-2022-JP saved Hankaku characters created by its own composer. Can we do something better? Hankaku may not be used in Mail all that often but we should be able to display them in Browser. We need to deal with this, I think, either in this bug or in another bug. I'm re-opening this bug and will attach a test message, which displays OK under 4.72 but not under Mozilla even though it was created by Mozilla.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Katsuhiko Momoi

Comment 22

•

25 years ago

Attached file Mailbox file containing one message sent by 3/7/2000 build witrh Hankaku option ON. — Details

Katsuhiko Momoi

Comment 23

•

25 years ago

Attached image Image of 4.72 showing a message from Mozilla mail. Shows all the characters correctly. — Details

Katsuhiko Momoi

Comment 24

•

25 years ago

Attached image The same msg as above not showing correctly in the header and text body cut off in the middle under Mozilla. — Details

nhottanscp

Comment 25

•

25 years ago

The bug was marked as FIXED because the attachment (posted on 11/19) displayed correctly. It has ESC(I for hankaku. The latest attachment (posted on 3/8) has ESC(B for hankaku. I don't think we should support displaying hankaku with ESC(B. So this is a problem of sending (converting to ISO-2022-JP from unicode). We should generate ESC(I instead of ESC(B. This is not a mail specific problem but usually ISO-2022-JP is used for mail and in mozilla it's only used when the pref (no UI) is on explicitly.

Frank Tang

Comment 26

•

25 years ago

It looks like nsIUnicodeEncoder has a problem. It generates 8-bit data in the ESC ( B sequence... bad bad behavior. Is the remaining problem a beta1 stopper? If so, please add beta1 to the keywords.

Status: REOPENED → ASSIGNED

Katsuhiko Momoi

Comment 27

•

25 years ago

After discussing this with ftang, we decided to do the following:
1. There should be a bug about correctly generating "<ESC>(I" for 7-bit Hankaku.
2. There should be a bug about interpreting different Hankaku escape sequences. momoi will investigate and file a separate bug on it.

About #1, this bug was from the beginning about Mozilla not generating the correct escape sequence. ji said it and I said it. I provided the example from 4.71 only to show what should be generated, not to use it as a test case to see if Mozilla can display it. Mozilla could display "<ESC>(I" Hankaku even before this bug was filed. So, I think we should keep this bug for item #1 above. I'll file a new one for #2.

Frank Tang

Comment 28

•

25 years ago

M16

Target Milestone: M15 → M16

Frank Tang

Comment 29

•

25 years ago

change the summary from "Wrong handling about Japanese half-width katakana in the subject and message body" to "ISO-2022-JP conversion bugs for single byte katan"

Summary: Wrong handling about Japanese half-width katakana in the subject and message body → ISO-2022-JP conversion bugs for single byte katan

Frank Tang

Updated

•

25 years ago

Keywords: beta2

Frank Tang

Comment 30

•

25 years ago

reassign to bobj

Assignee: ftang → bobj

Status: ASSIGNED → NEW

bobj

Comment 31

•

25 years ago

Reassigned to nhotta. Tentatively set TM to M17.

Assignee: bobj → nhotta

Target Milestone: M16 → M17

nhottanscp

Updated

•

25 years ago

Status: NEW → ASSIGNED

leger

Updated

•

25 years ago

Keywords: nsbeta2

Jim Roskind

Updated

•

25 years ago

Keywords: beta2

Whiteboard: [nsbeta2+]

nhottanscp

Comment 32

•

25 years ago

For mail/news, single byte katakana is only sent when a backend only pref is set. So I don't think this is critical for beta2 for mail/news. Is this bug for mail/news only or reproducible for form submission?

Summary: ISO-2022-JP conversion bugs for single byte katan → ISO-2022-JP conversion bugs for single byte katakana

Frank Tang

Comment 33

•

25 years ago

Based on the assumption that we won't provide pref UI to turn that pref on, we should not mark this beta2.

Keywords: nsbeta2

Whiteboard: [nsbeta2+]

nhottanscp

Updated

•

24 years ago

Target Milestone: M17 → M28

Katsuhiko Momoi

Updated

•

24 years ago

Keywords: intl

Priority: P3 → P4

nhottanscp

Updated

•

24 years ago

Target Milestone: --- → Future

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Simon Montagu :smontagu

Updated

•

18 years ago

Attachment #2971 - Attachment mime type: text/plain → text/plain; charset=iso-2022-jp

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core

Phil Ringnalda (:philor)

Updated

•

16 years ago

QA Contact: ji → i18n

Makoto Kato [:m_kato]

Assignee

Comment 34

•

15 years ago

Not reproducible on 3.1. Half-width katakana is converted to full-width, so I am resolving this as WORKSFORME.

Assignee: nhottanscp → m_kato

Status: ASSIGNED → RESOLVED

Closed: 25 years ago → 15 years ago

Resolution: --- → WORKSFORME
