Closed Bug 128587 Opened 23 years ago Closed 23 years ago

EUC-KR decoder : a bug in 8byte seq. rep. of Hangul syllables

Categories: Core :: Internationalization
Type: defect
Priority: Not set
Severity: normal
Status: CLOSED FIXED
Target Milestone: mozilla1.0
Reporter: jshin1987, Assigned: jshin1987
Keywords: intl
Attachments: 1 file

There's a little typo (seemingly introduced about
two years ago) in uScanDecomposedHangulCommon() in
intl/uconv/src/uscan.c. Because of this typo,
the final consonant of every Hangul syllable
represented with an 8-byte sequence is turned into
the initial consonant of the same syllable.

Somehow I overlooked it and didn't notice
the problem while working on bug 88944.

I'm gonna attach my one-line patch.
Attached patch: patch
I think this is simple enough to go in for 0.9.9, but
it's all right to put this in a little later.
Keywords: intl
QA Contact: ruixu → teruko
Status: UNCONFIRMED → NEW
Ever confirmed: true
Roy,
Can you review the patch? This is a very simple patch to fix a typo.
After your review, I'll ask for sr.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment on attachment 72220 (patch)

I really don't understand the code, and the code is very scary,
but I trust that you are familiar with it.

I'll give r=yokoyama, and thank you for catching the typo.
Attachment #72220 - Flags: review+
Roy, thanks a lot for giving my patch a review and trusting me.  I'm
sorry for having asked you for a 'blank check' :-). Here's the belated
explanation of what that part of the code does in case super-reviewers
want it.

  KS X 1001 annotation 3.3 specifies that the 8822 Hangul syllables not
  listed in precomposed form in KS X 1001 (only 2350 of them are listed)
  be represented with an 8-byte sequence that begins with HANGUL FILLER
  (0xA4 0xD4 in EUC-KR encoding, 0x24 0x54 in ISO-2022-KR encoding).
  HANGUL FILLER should be followed by three pairs of octets. The first
  octet of each pair should be 0xA4 in EUC-KR encoding (0x24 in
  ISO-2022-KR encoding), and the second octets of the three pairs
  represent the leading consonant, the medial vowel and the trailing
  consonant, respectively.
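
  To make the byte layout concrete, here is a minimal sketch that builds
  such a sequence. build_decomposed_hangul() and its gr_mask parameter are
  hypothetical names made up for this comment, not code from uscan.c; the
  sketch assumes the ISO-2022-KR form is simply the EUC-KR form with the
  high bit of every octet cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of uscan.c: writes the 8-byte sequence for
 * one syllable. lead, vowel and trail are the second octets of the KS X 1001
 * row-4 jamo (either 0xA1..0xD4 or 0x21..0x54 is accepted). gr_mask is 0x80
 * for EUC-KR and 0x00 for ISO-2022-KR. */
static void build_decomposed_hangul(uint8_t lead, uint8_t vowel, uint8_t trail,
                                    uint8_t gr_mask, uint8_t out[8])
{
    const uint8_t row = (uint8_t)(0x24 | gr_mask);           /* 0xA4 or 0x24 */
    const uint8_t second[4] = { 0xD4, lead, vowel, trail };  /* filler first */
    size_t i;

    assert(out != NULL);
    for (i = 0; i < 4; i++) {
        out[2 * i]     = row;
        /* Normalize the high bit so both encodings share one code path. */
        out[2 * i + 1] = (uint8_t)((second[i] & 0x7F) | gr_mask);
    }
}
```

  For example, passing 0xA8, 0xC7, 0xB1 (the row-4 jamo for ssangtikeut, o
  and mieum) with gr_mask 0x80 yields A4 D4 A4 A8 A4 C7 A4 B1.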

  What uScanDecomposedHangulCommon() does is:

    - check if the *in buffer contains enough bytes (8 or more)
    - check if it begins with HANGUL FILLER,
      '0xA4 0xD4' (or 0x24 0x54) depending on 'mask' (in[0], in[1])
    - check if the following three pairs of octets have 0xA4 (or 0x24)
      in the first octet (in[2], in[4], in[6])
    - map the second byte of each pair (in[3], in[5], in[7])
      to indices for the leading consonant, medial vowel and trailing
      consonant using lMap and tMap (for the medial vowel, the
      mapping is linear without a gap, so no mapping table is
      necessary: line 811)
    - use LIndex, VIndex, and TIndex to convert the 8-byte sequence
      representation of a Hangul syllable to a Unicode code point
      and store the result in the output buffer *out
      (line 838 - 840)
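
  The steps above can be sketched as follows. This is an illustration
  written for this comment, not the code in uscan.c: the function name,
  the signature, the two index tables and the way 'mask' is applied are
  all assumptions, and treating a trailing HANGUL FILLER as "no final
  consonant" is a choice made here. The only part taken as given is the
  standard Unicode composition formula
  0xAC00 + (LIndex * 21 + VIndex) * 28 + TIndex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index tables keyed by (second octet & 0x7F) - 0x21, covering the 30
 * consonants of KS X 1001 row 4 (0xA4A1..0xA4BE). -1 marks a consonant that
 * cannot occupy that position. Reconstructed for this sketch. */
static const int8_t kLeadIndex[30] = {
     0,  1, -1,  2, -1, -1,  3,  4,  5, -1,
    -1, -1, -1, -1, -1, -1,  6,  7,  8, -1,
     9, 10, 11, 12, 13, 14, 15, 16, 17, 18
};
static const int8_t kTrailIndex[30] = {
     1,  2,  3,  4,  5,  6,  7, -1,  8,  9,
    10, 11, 12, 13, 14, 15, 16, 17, -1, 18,
    19, 20, 21, 22, -1, 23, 24, 25, 26, 27
};

/* Returns 1 and stores the code point in *out on success, 0 otherwise.
 * mask is 0x80 for EUC-KR and 0x00 for ISO-2022-KR (assumed convention). */
static int decode_decomposed_hangul(const uint8_t *in, size_t len,
                                    uint8_t mask, uint16_t *out)
{
    const uint8_t row = (uint8_t)(0x24 | mask);
    uint8_t b3, b5, b7;
    int l, v, t;

    assert(in != NULL && out != NULL);
    if (len < 8)
        return 0;                                   /* not enough input */
    if (in[0] != row || in[1] != (uint8_t)(0x54 | mask))
        return 0;                                   /* no HANGUL FILLER */
    if (in[2] != row || in[4] != row || in[6] != row)
        return 0;                                   /* wrong first octets */
    if (((in[3] ^ mask) | (in[5] ^ mask) | (in[7] ^ mask)) & 0x80)
        return 0;                                   /* high bit mismatch */

    b3 = in[3] & 0x7F;
    b5 = in[5] & 0x7F;
    b7 = in[7] & 0x7F;

    if (b3 < 0x21 || b3 > 0x3E)
        return 0;
    l = kLeadIndex[b3 - 0x21];
    if (b5 < 0x3F || b5 > 0x53)
        return 0;
    v = b5 - 0x3F;                    /* vowels are contiguous: no table */
    if (b7 == 0x54)
        t = 0;                        /* filler: no final consonant */
    else if (b7 < 0x21 || b7 > 0x3E)
        return 0;
    else
        t = kTrailIndex[b7 - 0x21];   /* note: in[7], not in[3] */
    if (l < 0 || t < 0)
        return 0;

    *out = (uint16_t)(0xAC00 + (l * 21 + v) * 28 + t);
    return 1;
}
```

  With this sketch, the EUC-KR bytes A4 D4 A4 A8 A4 C7 A4 B1 decode to
  U+B620.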

In line 833, when calculating TIndex, in[3] (the second byte of the
first pair of octets following HANGUL FILLER) was used instead of in[7]
(the second byte of the third pair of octets following HANGUL FILLER).
My patch fixes this typo.

Now I'll ask for sr.
Comment on attachment 72220 (patch)

well, I'm going to base this sr= on what looks like a very extensive analysis
of the problem :)
sr=alecf
Attachment #72220 - Flags: superreview+
Comment on attachment 72220 (patch)

a=scc
Attachment #72220 - Flags: approval+
fix checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
verified with 2002-03-25 trunk build
closing now..
Status: RESOLVED → CLOSED