Closed Bug 128587 Opened 23 years ago Closed 23 years ago

EUC-KR decoder : a bug in 8byte seq. rep. of Hangul syllables

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

CLOSED FIXED
mozilla1.0

People

(Reporter: jshin1987, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(1 file)

There's a small typo (apparently introduced about two years ago) in uScanDecomposedHangulCommon() in intl/uconv/src/uscan.c. Because of this typo, the final consonant of every Hangul syllable represented with an 8-byte sequence is replaced by that syllable's initial consonant. Somehow I overlooked it and didn't notice the problem while working on bug 88944. I'll attach my one-line patch.
Attached patch: patch
I think this is simple enough to go in for 0.9.9, but it's all right to put this in a little later.
Keywords: intl
QA Contact: ruixu → teruko
Status: UNCONFIRMED → NEW
Ever confirmed: true
Roy, can you review the patch? This is a very simple patch to fix a typo. After your review, I'll ask for sr.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment on attachment 72220 (patch): I really don't understand the code, and it looks very scary, but I trust you are familiar with it. I'll give /r=yokoyama, and thank you for catching the typo.
Attachment #72220 - Flags: review+
Roy, thanks a lot for reviewing my patch and trusting me. I'm sorry for having asked you for a 'blank check' :-). Here's the belated explanation of what that part of the code does, in case super-reviewers want it.

KS X 1001 annotation 3.3 specifies that the 8822 Hangul syllables not listed in precomposed form in KS X 1001 (only 2350 of them are listed) be represented with an 8-byte sequence that begins with HANGUL FILLER (0xA4 0xD4 in EUC-KR encoding, 0x24 0x54 in ISO-2022-KR encoding). HANGUL FILLER should be followed by three pairs of octets. The first octet of each pair should be 0xA4 in EUC-KR encoding (0x24 in ISO-2022-KR encoding), and the second octets of the three pairs represent the leading consonant, medial vowel and trailing consonant, respectively.

What uScanDecomposedHangulCommon() does is:
- check that the *in buffer contains enough bytes (8 or more)
- check that it begins with HANGUL FILLER, 0xA4 0xD4 (or 0x24 0x54), depending on 'mask' (in[0], in[1])
- check that the following three pairs of octets have 0xA4 (or 0x24) in the first octet (in[2], in[4], in[6])
- map the second byte of each pair (in[3], in[5], in[7]) to indices for the leading consonant, medial vowel and trailing consonant using lMap and tMap (for the medial vowel, the mapping is linear without a gap, so no mapping table is necessary: line 811)
- use LIndex, VIndex and TIndex to convert the 8-byte sequence representation of the Hangul syllable to a Unicode code point and store the result in the output buffer *out (lines 838 - 840)

In line 833, when calculating TIndex, in[3] (the second byte of the first pair of octets following HANGUL FILLER) was used instead of in[7] (the second byte of the third pair of octets following HANGUL FILLER). My patch fixes this typo. Now I'll ask for sr.
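For readers who want to see the steps above in code, here is a minimal standalone sketch in C. It is not the code from uscan.c: the function name decode_hangul8(), the 'hi' parameter (standing in for 'mask'), and the tables kLMap and kTMap are hypothetical, with the tables derived from the order of the consonant letters in row 4 of KS X 1001 and the standard Unicode Hangul composition formula. Treating a HANGUL FILLER in the trailing position as "no final consonant" is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables (not copied from uscan.c). Index = position of the
 * consonant letter in KS X 1001 row 4 (0 for 0xA4A1 ... 29 for 0xA4BE).
 * Value = Unicode leading/trailing consonant index, -1 if not usable there. */
static const int8_t kLMap[30] = {
     0,  1, -1,  2, -1, -1,  3,  4,  5, -1,
    -1, -1, -1, -1, -1, -1,  6,  7,  8, -1,
     9, 10, 11, 12, 13, 14, 15, 16, 17, 18
};
static const int8_t kTMap[30] = {
     1,  2,  3,  4,  5,  6,  7, -1,  8,  9,
    10, 11, 12, 13, 14, 15, 16, 17, -1, 18,
    19, 20, 21, 22, -1, 23, 24, 25, 26, 27
};

/* Offset of a jamo byte within row 4: 0..29 consonants, 30..50 vowels,
 * 51 HANGUL FILLER. 'hi' is 0x80 for EUC-KR and 0x00 for ISO-2022-KR. */
static int jamo_offset(unsigned char b, unsigned char hi)
{
    return (int)(b ^ hi) - 0x21;
}

/* Decodes one 8-byte decomposed syllable. Returns 1 and stores the
 * code point in *out on success, 0 on any malformed input. */
int decode_hangul8(const unsigned char *in, size_t len,
                   unsigned char hi, uint16_t *out)
{
    assert(in != NULL && out != NULL);
    if (len < 8)
        return 0;
    /* The sequence must start with HANGUL FILLER. */
    if (in[0] != (0x24 | hi) || in[1] != (0x54 | hi))
        return 0;
    /* Each of the three pairs must start with the row-4 lead byte. */
    if (in[2] != (0x24 | hi) || in[4] != (0x24 | hi) || in[6] != (0x24 | hi))
        return 0;

    int l = jamo_offset(in[3], hi);
    int v = jamo_offset(in[5], hi) - 30;   /* vowels map linearly */
    int t = jamo_offset(in[7], hi);        /* in[7], not in[3] */
    int tindex;

    if (l < 0 || l >= 30 || kLMap[l] < 0)
        return 0;
    if (v < 0 || v >= 21)
        return 0;
    if (t == 51)
        tindex = 0;                        /* assumption: filler = no final */
    else if (t < 0 || t >= 30 || kTMap[t] < 0)
        return 0;
    else
        tindex = kTMap[t];

    /* Standard Unicode Hangul syllable composition. */
    *out = (uint16_t)(0xAC00 + (kLMap[l] * 21 + v) * 28 + tindex);
    return 1;
}
```

With this sketch, the EUC-KR bytes A4 D4 A4 A8 A4 C7 A4 B1 decode to U+B620. Reading the trailing index from in[3] instead of in[7] reproduces the kind of error described above, because the final consonant is then derived from the initial one.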
Comment on attachment 72220 (patch): Well, I'm going to base this sr= on what looks like a very extensive analysis of the problem :) sr=alecf
Attachment #72220 - Flags: superreview+
Comment on attachment 72220 (patch): a=scc
Attachment #72220 - Flags: approval+
fix checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified with the 2002-03-25 trunk build. Closing now.
Status: RESOLVED → CLOSED