some korean(2byte character) broken when using euc-kr page encoding - there is some doubt about uscan.c




8 years ago
8 years ago


(Reporter: oskmkr, Assigned: smontagu)



Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox5-, firefox6- fixed)




(5 attachments)



8 years ago
User-Agent:       Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

fixed below ---- 

1. before
else if(! in[1] & 0x80) /* 2nd byte range check */

2. after
else if(!(in[1] & 0x80)) /* 2nd byte range check */

but there is not equal because operator priority !!

Reproducible: Always

Actual Results:  
there is no broken character in euc-kr pages.

o/s : windows 7 (kor)

Comment 1

8 years ago
I could not reproduce the issue on:
Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1

*Note: Reporter, can you be more clear on the steps to reproduce? Thanks.

Comment 2

8 years ago
Relevant Bug 576556

Comment 3

8 years ago
captured reported issue.

Comment 4

8 years ago

Comment 5

8 years ago
(In reply to comment #1)
> I could not reproduce the issue on:
> Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1
> *Note: Reporter, can you be more clear on the steps to reproduce? Thanks.

I attached capture image of this issue.

I wish that is helpful.


Comment 6

8 years ago
Confirmed on
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2pre) Gecko/20110523 Firefox/4.0.2pre ID:20110523030626
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2) Gecko/20110508 Firefox/5.0a2 ID:20110523042001
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110523 Firefox/6.0a1 ID:20110523030619

Regresson window(cached m-c hourly):
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705123310
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705221203

In local build(Windows7)
build from 362e3ce08ead + backed out b18119895f4a : Works as expected
build from 362e3ce08ead : Fails
Regressed by:
b18119895f4a	Daniel Holbert — Bug 576556: Add GCC-suggested parens in uconv.c to fix build warning. r=smontagu
Assignee: nobody → smontagu
Blocks: 576556
Component: General → Internationalization
Ever confirmed: true
Keywords: regression
OS: Windows 7 → All
Product: Firefox → Core
QA Contact: general → i18n

Comment 7

8 years ago
Posted file reduced html


8 years ago
Version: unspecified → Trunk


8 years ago
Attachment #534676 - Attachment mime type: text/html → text/html; charset=ksc5601
So, that cset was just:
which added a single set of parens that I thought GCC was suggesting ("suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~'")

In my initial reading, I'd thought that message meant that the "&" was _part_ of the operand of "!".  Hence, I stuck the close-paren after the end of the "&" expression.

But now when re-reading that warning, I see that that my initial reading was incorrect (and in fact, given that this cset changed behavior, the & was definitely *not* part of the operand of "!").

So, the close-paren I added just needs shifting back to before the "&".
I'm happy to take this & clean up after myself, but I can't get to it until tomorrowish. (The fix itself is trivial, but I want to make sure we get a reftest working right for this, too.)

smontagu (or anyone else), feel free to take this if you get to it first.

Comment 10

8 years ago
Posted file Reference html

Comment 11

8 years ago
There's something odd going on here -- your patch was the correct test according to the comment just above:

   * and the second byte can take any value with MSB = 1.

... but that is just wrong! The codepoint that is failing in the reduced testcase is 0xA45A.

The original condition |if (! in[1] & 0x80)| is equivalent to |if (in[1] == 0)|, so we were basically never rejecting any invalid codepoints (at least in this routine -- they might have failed or been mapped to replacement characters somewhere else).

Daniel, thanks for the offer to fix this, but I think I'd better take it and do an exhaustive check that all possible combinations are being correctly handled.

Comment 12

8 years ago
Posted patch PatchSplinter Review
This includes a fix for bug 462687 and a reftest. Todo: xpcshell test with full coverage.
Attachment #534860 - Flags: review?(VYV03354)
Attachment #534860 - Flags: review?(VYV03354) → review+
Release drivers are not going to track this for 5 or 6. If you want to make the case that it should land for either of those, please request approval on the patch with an explanation of the risks and benefits.

Comment 15

8 years ago
Last Resolved: 8 years ago
Flags: in-testsuite+
Resolution: --- → FIXED

Comment 16

8 years ago
Comment on attachment 534860 [details] [diff] [review]

This is a significant regression from 1.9.2: all Korean characters in the windows CP949 extension to EUC-KR (a total of 1942 codepoints) are not displayed correctly.

I've tested the patch with all possible double-byte combinations from 0x8001 to 0xFFFF, and there is no change in decoding between 1.9.2 and trunk with this patch, so I'm confident that it's a very safe change.
Attachment #534860 - Flags: approval-mozilla-beta?
Attachment #534860 - Flags: approval-mozilla-aurora?
Comment on attachment 534860 [details] [diff] [review]

too late for 5 Beta but we're OK with this going into Aurora for 6.
Attachment #534860 - Flags: approval-mozilla-beta?
Attachment #534860 - Flags: approval-mozilla-beta-
Attachment #534860 - Flags: approval-mozilla-aurora?
Attachment #534860 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.