Closed Bug 7965 Opened 25 years ago Closed 22 years ago

[converter] need ISO-2022-CN converters

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: ftang, Assigned: masaki.katakai)

References

Details

(Keywords: fixedOEM)

Attachments

(2 files, 7 obsolete files)

Status: NEW → ASSIGNED
Target Milestone: M11
Summary: need ISO-2022-CN converters → [converter] need ISO-2022-CN converters
Target Milestone: M11 → M10
Target Milestone: M10 → M14
Change to M14 since it is Post Beta
Assignee: ftang → yueheng.xu
Status: ASSIGNED → NEW
You may want to split this bug into two, one for nsIUnicodeEncoder and one for
nsIUnicodeDecoder. Please mark the M build. Thanks.
Status: NEW → ASSIGNED
yueheng.xu: What is the status of the ISO-2022-CN converter?
Do we really need to support ISO-2022-CN? See the following email exchanges:
=======================================================================
Subject: software and usage of ISO-2022-CN
Date: Fri, 01 Oct 1999 17:30:12 -0700
From: ftang@netscape.com (Yung-Fong Tang)
Organization: Netscape
To: zhf@net.tsinghua.edu.cn, zhf@net.edu.cn, hdy@tsinghua.edu.cn,
    tckao@iiidns.iii.org.tw, chung@iiidns.iii.org.tw, MRC@CAC.Washington.EDU
CC: mozilla-i18n@mozilla.org, taka@netscape.com, momoi@netscape.com,
    erik@netscape.com




Dear RFC 1922 authors:

I think I asked this question a year ago, and I don't think any author has
replied yet. Let me ask again:
1. Is there any software that currently supports ISO-2022-CN? Can you list it?
(Mark C., is it true that your IMAP4 server supports ISO-2022-CN as a search
charset?)

2. Is there any known usage of ISO-2022-CN? Private email exchanges? Mailing
lists that use ISO-2022-CN? Newsgroups that use ISO-2022-CN?

3. Can ISO-2022-CN be replaced by GB2312 or BIG5 with Base64 in MIME? Why not?

Same question for ISO-2022-CN-EXT.

Thanks.
==========================================================================
Subject: re: software and usage of ISO-2022-CN
Date: Fri, 1 Oct 1999 17:31:25 -0700 (PDT)
From: Mark Crispin <MRC@CAC.Washington.EDU>
To: ftang@netscape.com
CC: zhf@net.tsinghua.edu.cn, zhf@net.edu.cn, hdy@tsinghua.edu.cn,
    tckao@iiidns.iii.org.tw, chung@iiidns.iii.org.tw, mozilla-i18n@mozilla.org,
    taka@netscape.com, momoi@netscape.com, erik@netscape.com




I've seen email messages in ISO-2022-CN, although typically it's been GB2312
encoded in ISO 2022 (no CNS 11643).

My IMAP server supports ISO-2022-CN, as well as GB2312/CN-GB, CN-GB12345,
BIG5/CN-BIG5.  Actually, I treat all ISO-2022-xx charsets the same; I have a
general ISO 2022 interpreter that will interpret just about any ISO 2022 text
you throw at it.

I support CNS 11643 planes 1 and 2, not 3-7 or 14.

I'm still trying to work out how CNS planes 3-7 relate to CNS plane 14 and
Unicode.  As far as I can tell, plane 14 is now CNS plane 3 (but are the
codepoints the same?  e.g., is CNS 0x32121 Unicode 0x4e28?); plane 4 is just a
repeat of the Han characters from Unicode 2.0 (albeit in different locations);
and I don't know anything about how planes 5-7 relate to Unicode.

==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Sat, 02 Oct 1999 11:55:23 +0800
From: 高天助 <tckao@iii.org.tw>
To: Mark Crispin <MRC@CAC.Washington.EDU>
CC: ftang@netscape.com, zhf@net.tsinghua.edu.cn, zhf@net.edu.cn,
    hdy@tsinghua.edu.cn, tckao@iiidns.iii.org.tw, chung@iiidns.iii.org.tw,
    mozilla-i18n@mozilla.org, taka@netscape.com, momoi@netscape.com,
    erik@netscape.com, Emily Hsu <emilyhsu@iii.org.tw>,
    "何建民 Ho" <hoho@iis.sinica.edu.tw>
References: 1




The first 6148 characters of CNS plane 14 have been moved to plane 3 and keep
the same code points. The last 171 characters of plane 14 have been distributed
into plane 4 according to stroke and radical order.
A few characters of planes 5-7 and 15 (used in personal names) have been encoded
in Unicode 2.0. We are now working to encode the other characters of planes 4-7
and 15 in plane 2 of ISO 10646, and these characters will be encoded in UTF-16
in Unicode in the future.
Some experts reviewed RFC 1922 four months ago in Taiwan and decided that Taiwan
should not endorse RFC 1922, based on the following rationale:
        1. Only one charset name, "big-5", is registered with ICANN, but RFC 1922
uses the name CN-Big5, which will cause more confusion for vendors and users.
        2. Taiwan will adopt Unicode (ISO 10646) in the future, and most
software already supports or is going to support Unicode, so these experts
suggested that Taiwan should not promote any other encoding method that would
confuse users.

        Based on the above reasons, we are considering withdrawing our
endorsement of RFC 1922.
        If you have any questions about the relation between CNS and Unicode, my
colleague Emily Hsu will help make it clearer.
T.C.Kao
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Fri, 1 Oct 1999 21:13:49 -0700 (PDT)
From: Mark Crispin <MRC@CAC.Washington.EDU>
To: 高天助 <tckao@iii.org.tw>
CC: ftang@netscape.com, zhf@net.tsinghua.edu.cn, hdy@tsinghua.edu.cn,
    tckao@iiidns.iii.org.tw, chung@iiidns.iii.org.tw, mozilla-i18n@mozilla.org,
    taka@netscape.com, momoi@netscape.com, erik@netscape.com,
    Emily Hsu <emilyhsu@iii.org.tw>, 何建民 Ho <hoho@iis.sinica.edu.tw>




On Sat, 02 Oct 1999 11:55:23 +0800, 高天助 wrote:
> The first 6148 characters of CNS plane 14 have been moved to plane 3 and keep
> the same code points. The last 171 characters of plane 14 have been
> distributed into plane 4 according to stroke and radical order.

Hmm.  I only have information about 4197 characters in plane 14; codepoints
0xE2121 through 0xE672A.  However, I see that some CNS plane 14 codepoints
aren't listed in my source data (the Unicode mappings file for CNS) which
suggests that these are CNS codepoints which do not have a corresponding
Unicode codepoint.  The first of these missing codepoints is 0xE2138.

So, which codepoint range went to plane 3 and which went to plane 4?

I support CNS planes 1 and 2 in my software, and I'd like to support the other
planes, but I need information about how to do this.  At this point, I'm
primarily interested in those characters which are in the BMP.

> Some experts reviewed RFC 1922 four months ago in Taiwan and decided that
> Taiwan should not endorse RFC 1922, based on the following rationale [...]

In my opinion, we should probably retire the names CN-BIG5 and CN-GB
immediately, in favor of GB2312 and BIG5.

I think that there is still a limited future for ISO-2022-CN as a transitional
mechanism, since we don't yet have full deployment of Unicode or 8-bit clean
links.  I have seen ISO-2022-CN used in email.

Nevertheless, I agree that we should be concentrating our efforts on Unicode.

> Based on the above reasons, we are considering withdrawing our endorsement
> of RFC 1922.

I recommend the following compromise position:
 1) withdraw endorsement of CN-BIG5 and CN-GB, on the grounds that these never
    became widely deployed and other names supersede them.
 2) since ISO-2022-CN-EXT and GB-12345, etc. never became widely deployed, no
    efforts should be expended on either of these.
 3) recommend that ISO-2022-CN only be used in transitional applications, and
    that most efforts should be made for Unicode based applications.
 4) the primary focus of activity is Unicode, and the fate of RFC 1922 should
    become "historical" (as opposed to "withdrawn").

What do you think of this?

-- Mark --
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Tue, 05 Oct 1999 19:30:42 +0800
From: 高天助 <tckao@iii.org.tw>
To: ftang@netscape.com
CC: Mark Crispin <MRC@CAC.Washington.EDU>, zhf@net.tsinghua.edu.cn,
    zhf@net.edu.cn, hdy@tsinghua.edu.cn, tckao@iiidns.iii.org.tw,
    chung@iiidns.iii.org.tw, mozilla-i18n@mozilla.org, taka@netscape.com,
    momoi@netscape.com, erik@netscape.com, Emily Hsu <emilyhsu@iii.org.tw>,
    "何建民 Ho" <hoho@iis.sinica.edu.tw>
References: 1 , 2 , 3

Yung-Fong Tang wrote:
>
> How about Big5-Plus now? Will you promote it? Or is it yet another dead spec?
>

    Big5-Plus was initiated by the Research, Development, and Evaluation
Commission of the Executive Yuan. As I understand it, it will be a dead spec;
no organization will promote it.
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Tue, 5 Oct 1999 09:02:15 -0700 (PDT)
From: Mark Crispin <MRC@CAC.Washington.EDU>
To: 高天助 <tckao@iii.org.tw>
CC: ftang@netscape.com, zhf@net.tsinghua.edu.cn, hdy@tsinghua.edu.cn,
    tckao@iiidns.iii.org.tw, chung@iiidns.iii.org.tw, mozilla-i18n@mozilla.org,
    taka@netscape.com, momoi@netscape.com, erik@netscape.com,
    Emily Hsu <emilyhsu@iii.org.tw>, 何建民 Ho <hoho@iis.sinica.edu.tw>




On Tue, 05 Oct 1999 19:43:11 +0800, 高天助 wrote:
> O.K., I think these statements are more realistic. Can you suggest what our
> next step should be?

In my opinion, nothing needs to be done.  It has already happened.  Since RFC
1922 is Informational, there's no need to take any specific document action.
It would be too much work to issue an update to RFC 1922 or make official
statements.

Informally, if someone asks, "what should I do about RFC 1922", tell them the
following recommendations on an informal basis:
 1) RFC 1922 is becoming an historical document due to the Internet-wide
    transition to Unicode.
 2) implement GB2312 EUC and BIG5, since that's what most people use.
 3) from RFC 1922, just implement ISO-2022-CN, at least to encode GB2312 in
    ISO 2022.  It is also a good idea to implement HZ.
 4) don't use the CN-xxx names from RFC 1922, but it is a good idea to
    recognize them as aliases.
 5) don't worry about anything else in RFC 1922.
 6) implement UTF-8 now, since this will become the preferred standard for
    non-English email.
 7) if you implement your application to use Unicode internally, it will be
    much easier to follow these recommendations.

A similar situation exists with Korean.  EUC-KR is preferred in Korea, but
sometimes you still see ISO-2022-KR.

This situation -- a gradual change to obsolescence -- happens to RFCs all the
time.  It's not a big problem.  Fortunately, since the transition to UTF-8 is
happening across the entire Internet, there's no special "Chinese problem"
caused by this either.  It's an "everybody's problem".

If someone implements a multi-lingual email application, it is not difficult
to include ISO-2022-CN support, since you need the ISO 2022 capability for
Japanese.
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Wed, 06 Oct 1999 04:49:49 -0700
From: ftang@netscape.com (Yung-Fong Tang)
Organization: Netscape
To: Mark Crispin <MRC@CAC.Washington.EDU>, ftang@netscape.com,
    chung@iiidns.iii.org.tw, erik@netscape.com, zhf@net.tsinghua.edu.cn,
    Emily Hsu <emilyhsu@iii.org.tw>, hdy@tsinghua.edu.cn, taka@netscape.com,
    tckao@iiidns.iii.org.tw, momoi@netscape.com, hoho@iis.sinica.edu.tw,
    lunde@adobe.com
Newsgroups: netscape.public.mozilla.i18n
References: 1 , 2




Add Ken Lunde to the cc. I will forward him all our discussion up to this
message.

Mark Crispin wrote:

> On Tue, 05 Oct 1999 19:43:11 +0800, 高天助 wrote:
> > O.K., I think these statements are more realistic. Can you suggest what our
> > next step should be?
>
> In my opinion, nothing needs to be done.  It has already happened.  Since RFC
> 1922 is Informational, there's no need to take any specific document action.
> It would be too much work to issue an update to RFC 1922 or make official
> statements.
>
> Informally, if someone asks, "what should I do about RFC 1922", tell them the
> following recommendations on an informal basis:
>  1) RFC 1922 is becoming an historical document due to the Internet-wide
>     transition to Unicode.

Agree

>
>  2) implement GB2312 EUC and BIG5, since that's what most people use.

Agree

>
>  3) from RFC 1922, just implement ISO-2022-CN, at least to encode GB2312 in
>     ISO 2022.  It is also a good idea to implement HZ.

I wonder whether we should even recommend that people implement the GB-encoded
part of ISO-2022-CN. I have not heard anyone from the PRC join this discussion
yet. Is ISO-2022-CN really used in the PRC? How popular is it? Or could we even
ignore that part? (These are questions, not statements.)

I totally agree it is a good idea to implement HZ. I have seen a lot of HZ data
on web sites and newsgroups. I have never seen any real, live ISO-2022-CN data,
not even the GB2312-encoded kind.

>
>  4) don't use the CN-xxx names from RFC 1922, but it is a good idea to
>     recognize them as aliases.

Yes, we (Netscape) have already had them as aliases in 4.x for 1 to 3 years.

>
>  5) don't worry about anything else in RFC 1922.
>  6) implement UTF-8 now, since this will become the preferred standard for
>     non-English email.
>  7) if you implement your application to use Unicode internally, it will be
>     much easier to follow these recommendations.
>
> A similar situation exists with Korean.  EUC-KR is preferred in Korea, but
> sometimes you still see ISO-2022-KR.

I think the situation is a little different for ISO-2022-KR. There are
popular Sendmail modifications that send ISO-2022-KR.

>
>
> This situation -- a gradual change to obsolescence -- happens to RFCs all the
> time.  It's not a big problem.  Fortunately, since the transition to UTF-8 is
> happening across the entire Internet, there's no special "Chinese problem"
> caused by this either.  It's an "everybody's problem".
>
> If someone implements a multi-lingual email application, it is not difficult
> to include ISO-2022-CN support, since you need the ISO 2022 capability for
> Japanese.

Strongly disagree. When we convert Unicode to ISO-2022-JP or ISO-2022-KR, it is
clear which byte combination we should convert to. But for ISO-2022-CN, the code
first needs to decide whether to encode with the GB sequence or the CNS
sequence. This is the most difficult part, i.e. "undoing CJK unification".
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Wed, 6 Oct 1999 04:52:07 -0700 (PDT)
From: Mark Crispin <MRC@CAC.Washington.EDU>
To: ftang@netscape.com
CC: chung@iiidns.iii.org.tw, erik@netscape.com, zhf@net.tsinghua.edu.cn,
    Emily Hsu <emilyhsu@iii.org.tw>, hdy@tsinghua.edu.cn, taka@netscape.com,
    tckao@iiidns.iii.org.tw, momoi@netscape.com, hoho@iis.sinica.edu.tw,
    lunde@adobe.com




Both Annie and I have received messages encoded in ISO-2022-CN.  Some people
are using it.

> > If someone implements a multi-lingual email application, it is not
> > difficult to include ISO-2022-CN support, since you need the ISO 2022
> > capability for Japanese.
>
> Strongly disagree. When we convert Unicode to ISO-2022-JP or ISO-2022-KR, it
> is clear which byte combination we should convert to. But for ISO-2022-CN,
> the code first needs to decide whether to encode with the GB sequence or the
> CNS sequence. This is the most difficult part, i.e. "undoing CJK unification".

That's only if you support sending mail in ISO-2022-CN.  You don't have to
worry about it if you only support reading ISO-2022-CN, and send mail with a
different charset.

In other words, treat it like incoming Shift-JIS; recognize it and display it
properly, but don't generate it.
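
For a read-only path like this, most of the work is tracking ISO 2022 state. The following is a minimal sketch under stated assumptions, not the code from any patch on this bug. It uses the designation and shift bytes defined in RFC 1922 (ESC $ ) A for GB 2312, ESC $ ) G for CNS 11643 plane 1, ESC $ * H for plane 2, SO, SI, and SS2 written as ESC N). The names `Charset`, `Segment`, and `SplitIso2022Cn` are hypothetical. It only splits input into runs per character set; mapping the two-byte codes to Unicode needs tables that are not shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Character set that a run of bytes belongs to.
enum class Charset { ASCII, GB2312, CNS_Plane1, CNS_Plane2 };

struct Segment {
  Charset charset;
  std::string bytes;  // raw 7-bit bytes, not yet mapped to Unicode
};

// Hypothetical helper: split ISO-2022-CN text into runs per character set.
std::vector<Segment> SplitIso2022Cn(const std::string& in) {
  const char SO = 0x0E, SI = 0x0F, ESC = 0x1B;
  Charset g1 = Charset::ASCII;  // set designated to G1 (valid once g1Set)
  bool g1Set = false;           // saw ESC $ ) A or ESC $ ) G on this line
  bool g2Set = false;           // saw ESC $ * H on this line
  bool shiftedOut = false;      // true between SO and SI
  std::vector<Segment> out;

  // Append bytes to the last segment if it has the same charset.
  auto emit = [&out](Charset cs, const std::string& b) {
    if (!out.empty() && out.back().charset == cs) out.back().bytes += b;
    else out.push_back(Segment{cs, b});
  };

  std::size_t i = 0;
  while (i < in.size()) {
    const char c = in[i];
    if (c == ESC) {
      if (in.compare(i, 4, "\x1B$)A") == 0) { g1 = Charset::GB2312; g1Set = true; i += 4; continue; }
      if (in.compare(i, 4, "\x1B$)G") == 0) { g1 = Charset::CNS_Plane1; g1Set = true; i += 4; continue; }
      if (in.compare(i, 4, "\x1B$*H") == 0) { g2Set = true; i += 4; continue; }
      // SS2 (ESC N): the next two bytes come from G2 (CNS plane 2).
      if (g2Set && i + 3 < in.size() && in[i + 1] == 'N') {
        emit(Charset::CNS_Plane2, in.substr(i + 2, 2));
        i += 4;
        continue;
      }
      // Unknown escape: fall through and pass the byte along as ASCII.
    }
    if (c == SO && g1Set) { shiftedOut = true; ++i; continue; }
    if (c == SI) { shiftedOut = false; ++i; continue; }
    if (c == '\n') {
      // Assumption from RFC 1922: designations do not carry across lines.
      g1Set = g2Set = shiftedOut = false;
    } else if (shiftedOut && i + 1 < in.size()) {
      emit(g1, in.substr(i, 2));  // one two-byte character from G1
      i += 2;
      continue;
    }
    emit(Charset::ASCII, std::string(1, c));
    ++i;
  }
  return out;
}
```

Because nothing is generated, the reader never has to choose between GB and CNS; it just follows whatever designations the sender announced.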

As a practical matter, I would implement sending ISO-2022-CN by sending GB
sequences unless there is no suitable GB codepoint, in which case I use the
CNS sequence.  There's too much shifting in ISO 2022 for it to be suitable for
100% CNS text.
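
The "GB first, CNS only as a fallback" rule can be sketched roughly as follows. This is an illustration under assumptions, not an encoder that exists in this bug: `Table` and `EncodeIso2022Cn` are hypothetical names, the lookup tables are supplied by the caller, only CNS plane 1 is handled, and the sketch assumes that re-designating G1 while shifted out is acceptable to the receiver.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table: Unicode code point -> two 7-bit bytes.
typedef std::map<char32_t, std::string> Table;

// Sketch of "GB first, CNS only as a fallback" for outgoing ISO-2022-CN.
std::string EncodeIso2022Cn(const std::u32string& text,
                            const Table& gb2312, const Table& cnsPlane1) {
  enum G1 { kNone, kGB2312, kCNS1 };
  G1 designated = kNone;    // what G1 currently holds on this line
  bool shiftedOut = false;  // true between SO and SI
  std::string out;

  // Announce the wanted set (if not already designated) and shift out.
  auto useG1 = [&](G1 want, const char* designation) {
    if (designated != want) { out += designation; designated = want; }
    if (!shiftedOut) { out += '\x0E'; shiftedOut = true; }  // SO
  };
  // Return to ASCII before emitting a single-byte character.
  auto shiftIn = [&]() {
    if (shiftedOut) { out += '\x0F'; shiftedOut = false; }  // SI
  };

  for (char32_t cp : text) {
    if (cp < 0x80) {
      shiftIn();
      out += static_cast<char>(cp);
      if (cp == U'\n') designated = kNone;  // re-announce on the next line
      continue;
    }
    Table::const_iterator it = gb2312.find(cp);
    if (it != gb2312.end()) { useG1(kGB2312, "\x1B$)A"); out += it->second; continue; }
    it = cnsPlane1.find(cp);
    if (it != cnsPlane1.end()) { useG1(kCNS1, "\x1B$)G"); out += it->second; continue; }
    shiftIn();
    out += '?';  // unmappable in this sketch
  }
  shiftIn();
  return out;
}
```

Each switch between the two sets costs a four-byte designation, which is one way to see why text that alternates often between GB and CNS characters grows quickly in this encoding.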
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Fri, 08 Oct 1999 09:39:30 -0700
From: ftang@netscape.com (Yung-Fong Tang)
Organization: Netscape/AOL
To: Mark Crispin <MRC@CAC.Washington.EDU>
CC: chung@iiidns.iii.org.tw, erik@netscape.com, zhf@net.tsinghua.edu.cn,
    Emily Hsu <emilyhsu@iii.org.tw>, hdy@tsinghua.edu.cn, taka@netscape.com,
    tckao@iiidns.iii.org.tw, momoi@netscape.com, hoho@iis.sinica.edu.tw,
    lunde@adobe.com
References: 1





Mark Crispin wrote:

> Both Annie and I have received messages encoded in ISO-2022-CN.  Some people
> are using it.
>

Back to my original question: which software generates them?

>
> > > If someone implements a multi-lingual email application, it is not
> > > difficult to include ISO-2022-CN support, since you need the ISO 2022
> > > capability for Japanese.
> >
> > Strongly disagree. When we convert Unicode to ISO-2022-JP or ISO-2022-KR, it
> > is clear which byte combination we should convert to. But for ISO-2022-CN,
> > the code first needs to decide whether to encode with the GB sequence or the
> > CNS sequence. This is the most difficult part, i.e. "undoing CJK unification".
>
> That's only if you support sending mail in ISO-2022-CN.  You don't have to
> worry about it if you only support reading ISO-2022-CN, and send mail with a
> different charset.

Sure, but I thought we SHOULD generate them if we want to support RFC 1922, right?

>
>
> In other words, treat it like incoming Shift-JIS; recognize it and display it
> properly, but don't generate it.
>
> As a practical matter, I would implement sending ISO-2022-CN by sending GB
> sequences unless there is no suitable GB codepoint, in which case I use the
> CNS sequence.  There's too much shifting in ISO 2022 for it to be suitable for
> 100% CNS text.

Seems reasonable after hearing the opinions from III, Taiwan, ROC.
==========================================================================
Subject: Re: software and usage of ISO-2022-CN
Date: Fri, 8 Oct 1999 11:58:52 -0700 (PDT)
From: Mark Crispin <MRC@CAC.Washington.EDU>
To: Yung-Fong Tang <ftang@netscape.com>
CC: chung@iiidns.iii.org.tw, erik@netscape.com, zhf@net.tsinghua.edu.cn,
    Emily Hsu <emilyhsu@iii.org.tw>, hdy@tsinghua.edu.cn, taka@netscape.com,
    tckao@iiidns.iii.org.tw, momoi@netscape.com, hoho@iis.sinica.edu.tw,
    lunde@adobe.com




On Fri, 08 Oct 1999 09:39:30 -0700, Yung-Fong Tang wrote:
> Mark Crispin wrote:
> > Both Annie and I have received messages encoded in ISO-2022-CN.  Some
> > people are using it.
> Back to my original question: which software generates them?

I don't know, or I would have told you.  As a guess, it may be a Chinese
version of Eudora.

We've configured Pine to send ISO-2022-CN mail.  But that isn't a matter of
"Pine generates ISO-2022-CN mail"; it's a matter of "Pine can be configured to
generate ISO-2022-CN mail via external filters."  That is, the Pine user
enters and edits GB2312 text in Pine's editor; when the message is sent Pine
follows a configuration file filter rule to pass it through ncf to convert it
to ISO-2022-CN, and sets the outgoing charset=ISO-2022-CN.

When sending 8-bit GB2312 mail, Pine is very likely to convert the whole thing
into BASE64.
======================================================================
Keywords: beta1
Cleared the beta1 keyword because we will not hold Beta1 for this, but
leaving targeted for M14.  If yueheng.xu@intel.com can get this completed,
reviewed and checked-in by 2/15, we'll take it for Beta1.
Keywords: beta1
Whiteboard: patch from mozilla submitted for review
Remove "patch from mozilla submitted for review". The patch they sent is not
for this bug but for an unreported GBK problem.
Whiteboard: patch from mozilla submitted for review
This is not M14. yueheng.xu@intel.com didn't make it, and we don't need this for
beta1. Move to M16.
Target Milestone: M14 → M16
QA Contact: teruko → ftang
I will do it later, since nobody has that content yet.
Target Milestone: M16 → M20
Reassign this back to ftang since yueheng.xu@intel.com is no longer working on
the Mozilla project. Mark it as Future since the authors of ISO-2022-CN think it
is no longer important to support ISO-2022-CN.
Assignee: yueheng.xu → ftang
Status: ASSIGNED → NEW
Target Milestone: M20 → Future
Mark it as assigned.
Status: NEW → ASSIGNED
watanabe@komadori.planet.sci.kobe-u.ac.jp:
Do you want to add ISO-2022-CN for us?
ftang: I think we need to add ISO-2022-CN support in Mozilla. We cannot read
mail from Solaris Chinese dtmail, which is encoded with the ISO-2022-CN and
ISO-2022-CN-EXT encodings.

I will add a patch to support the decoder for these encodings in the next few days.

My patch works on my machine now; I need to test it for several days.
Attached patch patch for nsISO2022CNToUnicode.h (obsolete) — Splinter Review
Attached patch patch for nsUCvCnModule.cpp (obsolete) — Splinter Review
Add Brian Yuan and Masaki to CC
ftang: Can you review the patches? Thanx.
*** Bug 159863 has been marked as a duplicate of this bug. ***
Keywords: review
Comment on attachment 93393 [details] [diff] [review]
patch for nsISO2022CNToUnicode.cpp

r=ftang
Attachment #93393 - Flags: review+
Comment on attachment 93394 [details] [diff] [review]
patch for nsISO2022CNToUnicode.h

r=ftang. Change mState_ASCII to eState_ASCII (same for the others): m is for
member variables, e is for enums. Same issue in the .cpp file.
Attachment #93394 - Flags: review+
Comment on attachment 93395 [details] [diff] [review]
patch for nsUCvCnModule.cpp

r=ftang
Attachment #93395 - Flags: review+
Attached patch Patch for iso-2022-cn decoder (obsolete) — Splinter Review
Remove old patches.  

Add a new patch which follows ftang's comments.
Attachment #93393 - Attachment is obsolete: true
Attachment #93394 - Attachment is obsolete: true
Attachment #93395 - Attachment is obsolete: true
Add updated patches to follow Alec's new converter structure
Attachment #94284 - Attachment is obsolete: true
Great work, Ervin!!

I've verified ISO-2022-CN email can be browsed now.
Comment on attachment 95224 [details] [diff] [review]
patchs to follow Alec's new converter structure

r=ftang
Attachment #95224 - Flags: review+
Comment on attachment 95224 [details] [diff] [review]
patchs to follow Alec's new converter structure

>Index: ucvcn/nsISO2022CNToUnicode.h
>===================================================================

>+public:
>+  nsISO2022CNToUnicode()
>+  {
>+    mState = eState_ASCII;
>+    mPlaneID = 0;

initialize these using the C++ preferred method:
nsISO2022CNToUnicode() : mState(eState_ASCII), mPlaneID(0) {}

this allows C++ to optimize initialization by doing a memcpy into the initial
structure
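
As a minimal sketch of the initializer-list form (the class name `DecoderStateSketch` and the extra enum values are hypothetical; only `eState_ASCII`, `mState`, and `mPlaneID` appear in the patch): members are initialized directly when the object is constructed, instead of being default-initialized and then assigned in the constructor body. For plain enum and integer members the difference is mostly stylistic, but the same form is required for const and reference members.

```cpp
// Hypothetical decoder states; only eState_ASCII is named in the patch.
enum State { eState_ASCII, eState_GB2312, eState_CNS };

class DecoderStateSketch {
 public:
  // Members are initialized here, in declaration order, before the
  // (empty) constructor body runs.
  DecoderStateSketch() : mState(eState_ASCII), mPlaneID(0) {}

  State state() const { return mState; }
  int planeID() const { return mPlaneID; }

 private:
  State mState;
  int mPlaneID;
};
```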

>+
>+  // Decoder handle
>+  nsIUnicodeDecoder *mGB2312_Decoder;
>+  nsIUnicodeDecoder *mEUCTW_Decoder;

you should use nsCOMPtrs for the mGB2312_Decoder and mEUCTW_Decoder
members...that way you won't have to release them in the destructor, and you
won't leak on failure.
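
The point about not leaking on failure can be illustrated without the Mozilla headers. The sketch below is an assumption-heavy stand-in, not nsCOMPtr itself: `Decoder`, `OwningPtr`, and `UseDecoders` are hypothetical names, and `OwningPtr` only imitates the one behavior that matters here, namely calling Release() automatically when the holder goes out of scope.

```cpp
// Minimal stand-in for an XPCOM-style reference-counted object.
struct Decoder {
  int refs;
  static int live;  // number of Decoder objects currently alive
  Decoder() : refs(0) { ++live; }
  ~Decoder() { --live; }
  void AddRef() { ++refs; }
  void Release() { if (--refs == 0) delete this; }
};
int Decoder::live = 0;

// Hypothetical holder that imitates one aspect of nsCOMPtr: it takes a
// reference on construction and drops it on destruction.
template <class T>
class OwningPtr {
 public:
  explicit OwningPtr(T* p) : mRaw(p) { if (mRaw) mRaw->AddRef(); }
  ~OwningPtr() { if (mRaw) mRaw->Release(); }
  OwningPtr(const OwningPtr&) = delete;
  OwningPtr& operator=(const OwningPtr&) = delete;
  T* get() const { return mRaw; }

 private:
  T* mRaw;
};

// Both decoders are released on every path, including the early return,
// without any explicit cleanup code.
bool UseDecoders(bool failEarly) {
  OwningPtr<Decoder> gb2312(new Decoder);
  OwningPtr<Decoder> euctw(new Decoder);
  if (failEarly) return false;  // simulated failure
  return gb2312.get() != nullptr && euctw.get() != nullptr;
}
```

With raw pointers, the early return would need its own Release calls, and forgetting one is exactly the kind of leak the review comment warns about.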
>+
>+    if(!mGB2312_Decoder) {// failed creating a delegate converter
>+       return NS_ERROR_UNEXPECTED;
>+    } else {
>+       PRInt32  srcLen = aSrcLength;
>+       PRInt32  destLen = *aDestLength;

else after a return makes no sense...

instead:

>    if(!mGB2312_Decoder) {// failed creating a delegate converter
>       return NS_ERROR_UNEXPECTED;
>    }
>
>    PRInt32  srcLen = aSrcLength;
>    PRInt32  destLen = *aDestLength;

and so forth

>+       if(res != NS_OK) {
>+          return NS_ERROR_UNEXPECTED;
>+       }

don't compare directly against NS_OK - use if (NS_FAILED(res))
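
A likely reason for this advice, sketched under assumptions: XPCOM result codes mark failure with the high bit, so a call can succeed with a code other than NS_OK (NS_OK_UDEC_MOREOUTPUT is one such code mentioned later in this bug). The block below uses stand-in names and illustrative values rather than the real Mozilla definitions; `Result`, `Failed`, `ForwardStrict`, and `ForwardChecked` are hypothetical.

```cpp
#include <cstdint>

// Stand-ins for XPCOM result codes; the specific values are illustrative.
typedef std::uint32_t Result;
const Result kOk = 0;
const Result kOkMoreOutput = 0x00500001;   // hypothetical non-zero success code
const Result kErrUnexpected = 0x8000FFFF;  // failure: high bit set

// Mirrors the idea behind NS_FAILED: failure means the high bit is set.
inline bool Failed(Result rv) { return (rv & 0x80000000u) != 0; }

// Comparing against kOk turns every non-zero success code into an error.
Result ForwardStrict(Result inner) {
  if (inner != kOk) return kErrUnexpected;
  return inner;
}

// Early return on real failures only; success codes pass through unchanged.
Result ForwardChecked(Result inner) {
  if (Failed(inner)) return kErrUnexpected;
  return inner;
}
```

In this sketch, `ForwardStrict(kOkMoreOutput)` reports an error for a call that actually succeeded, while `ForwardChecked` passes the "more output" code back to the caller.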

the decoder itself looks fine to me though.
Attachment #95224 - Flags: needs-work+
Attachment #95224 - Attachment is obsolete: true
Sorry, there are some typos in the comment lines of the previous patch.
Attachment #95362 - Attachment is obsolete: true
Comment on attachment 95364 [details] [diff] [review]
Correct some typo errors in comment lines of previous patch

sorry, a few other minor things:

first, this:
nsString tmpCharset;

this is a stack-based string, so you should be using nsAutoString. see
http://www.mozilla.org/projects/xpcom/string-guide.html#Concrete_Classes and 
the section on "Stack based strings" in
http://www.mozilla.org/projects/xpcom/string-quickref.html

second, I see this loop:
+    for (PRInt32 i=0; i<destLen; i++) {
+	aDest[i] = dest[i];
+    }

you should use memcpy() here instead of copying the characters one by one - it
will be significantly faster. In fact, I wonder why you don't just forward the
whole call to mEUCTW_Decoder->Convert() instead of creating temporary
variables, and then copying the values over... won't they also return
NS_OK_UDEC_MOREOUTPUT/etc? (same goes for the other converter)

I'm not marking needs-work because I'd like to hear why we need the temporary
variables. If there is a good reason, then I'll mark this patch sr=alecf as
long as you fix the nsAutoString stuff.
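
For the memcpy() suggestion, one detail is easy to get wrong: memcpy() counts bytes, while destLen counts 16-bit characters. A minimal sketch, assuming `char16_t` as a stand-in for PRUnichar and using the hypothetical names `Unichar` and `CopyUnits`:

```cpp
#include <cstddef>
#include <cstring>

typedef char16_t Unichar;  // stand-in for PRUnichar (a 16-bit code unit)

// Copy destLen code units from src to aDest in one call.
// The byte count is the element count times the element size.
void CopyUnits(Unichar* aDest, const Unichar* src, int destLen) {
  if (destLen > 0) {
    std::memcpy(aDest, src, static_cast<std::size_t>(destLen) * sizeof(Unichar));
  }
}
```

Forwarding the whole call to the delegate decoder, as the review also suggests, would make even this copy unnecessary.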
r=ftang

Thanks to Alec for the review. I hope this patch has no problems. :-)
Attachment #95364 - Attachment is obsolete: true
Comment on attachment 95521 [details] [diff] [review]
patch to follow Alec's instruction

excellent! sr=alecf (and r=ftang)
Attachment #95521 - Flags: superreview+
Attachment #95521 - Flags: review+
assigned to myself
Assignee: ftang → katakai
Status: ASSIGNED → NEW
patch checked into trunk. Mark this as fixed.

Also added branchOEM keyword for OEM branch.

Ervin, please verify the fix in trunk (I will also verify it) and
work on a patch for the OEM branch. I understand that
your patch, attachment 94284 [details] [diff] [review], for the old structure
needs to be updated according to alecf's comments.

Good work, Ervin. Thank you for the fast reviews, Frank and Alec!!
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: branchOEM
This busted HPUX:

Error 129: "./../ucvcn/nsISO2022CNToUnicode.h", line 45 # Redefinition of macro
'PMASK' differs from previous definition at ["/usr/include/sys/param.h", line 140].
    #define PMASK       0xa0

I added a |#undef PMASK| just before that line, which should fix it.
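
The shape of that fix, as a self-contained sketch: the first #define below is a stand-in for the system header's definition (its value here is made up), and `PlaneMask` is a hypothetical helper added only so the result can be observed.

```cpp
// Stand-in for a system header that already defines PMASK with a different
// value (the value here is hypothetical, not the one from sys/param.h).
#define PMASK 0x7f

// Drop any earlier definition before defining our own, so the preprocessor
// does not see a conflicting redefinition.
#ifdef PMASK
#undef PMASK
#endif
#define PMASK 0xa0

// Hypothetical helper that exposes the value now in effect.
inline int PlaneMask() { return PMASK; }
```

A prefixed macro name or a typed constant would avoid the clash with the system header altogether; the #undef is simply the smallest change that gets the build going again.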
Whiteboard: branchOEM → branchOEM+
checked in to OEM branch.
Great work, Ervin!! and Thanks for review, Frank and Alec.
Whiteboard: branchOEM+ → branchOEM+,fixedOEM
Keywords: review
Whiteboard: branchOEM+,fixedOEM → fixedOEM
Keywords: fixedOEM
Whiteboard: fixedOEM
(Dumb?!) question:
This patch contains a _decoder_. Where is the _encoder_ part for this encoding?