Last Comment Bug 658952 - some korean(2byte character) broken when using euc-kr page encoding - there is some doubt about uscan.c
: some korean(2byte character) broken when using euc-kr page encoding - there i...
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: x86 All
: -- major (vote)
: ---
Assigned To: Simon Montagu :smontagu
:
Mentors:
http://cafe.naver.com/oskmcafetest
Depends on:
Blocks: 576556
  Show dependency treegraph
 
Reported: 2011-05-23 01:33 PDT by oskmkr
Modified: 2011-06-01 23:49 PDT (History)
7 users (show)
smontagu: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
-
fixed


Attachments
captured reported issue. (280.46 KB, image/jpeg)
2011-05-23 18:49 PDT, oskmkr
no flags Details
It's fine on the other browsers (270.69 KB, image/jpeg)
2011-05-23 18:50 PDT, oskmkr
no flags Details
reduced html (183 bytes, text/html; charset=ksc5601)
2011-05-23 22:10 PDT, Alice0775 White
no flags Details
Reference html (293 bytes, text/html; charset=ksc5601)
2011-05-23 23:25 PDT, Simon Montagu :smontagu
no flags Details
Patch (3.74 KB, patch)
2011-05-24 12:16 PDT, Simon Montagu :smontagu
VYV03354: review+
asa: approval‑mozilla‑aurora+
asa: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description oskmkr 2011-05-23 01:33:48 PDT
User-Agent:       Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

uscan.c
fixed below ---- 

1. before
else if(! in[1] & 0x80) /* 2nd byte range check */


2. after
else if(!(in[1] & 0x80)) /* 2nd byte range check */

but there is not equal because operator priority !!


Reproducible: Always


Actual Results:  
there is no broken character in euc-kr pages.


o/s : windows 7 (kor)
http://cafe.naver.com/oskmcafetest
Comment 1 AndreiD[QA] 2011-05-23 01:44:18 PDT
I could not reproduce the issue on:
Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1

*Note: Reporter, can you be more clear on the steps to reproduce? Thanks.
Comment 2 Alice0775 White 2011-05-23 06:27:28 PDT
Relevant Bug 576556
Comment 3 oskmkr 2011-05-23 18:49:13 PDT
Created attachment 534649 [details]
captured reported issue.

captured reported issue.
Comment 4 oskmkr 2011-05-23 18:50:18 PDT
Created attachment 534650 [details]
It's fine on the other browsers
Comment 5 oskmkr 2011-05-23 18:53:13 PDT
(In reply to comment #1)
> I could not reproduce the issue on:
> Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1
> 
> *Note: Reporter, can you be more clear on the steps to reproduce? Thanks.

I attached capture image of this issue.

I wish that is helpful.

Thanks.
Comment 6 Alice0775 White 2011-05-23 22:09:39 PDT
Confirmed on
http://hg.mozilla.org/releases/mozilla-2.0/rev/08cd5ee7b7c3
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2pre) Gecko/20110523 Firefox/4.0.2pre ID:20110523030626
http://hg.mozilla.org/releases/mozilla-aurora/rev/7d4b05121ada
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2) Gecko/20110508 Firefox/5.0a2 ID:20110523042001
http://hg.mozilla.org/mozilla-central/rev/11d04991cdd0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110523 Firefox/6.0a1 ID:20110523030619

Regresson window(cached m-c hourly):
Works:
http://hg.mozilla.org/mozilla-central/rev/883651094520
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705123310
Fails:
http://hg.mozilla.org/mozilla-central/rev/53dc58746f3e
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705221203
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=883651094520&tochange=53dc58746f3e


In local build(Windows7)
build from 362e3ce08ead + backed out b18119895f4a : Works as expected
build from 362e3ce08ead : Fails
Regressed by:
b18119895f4a	Daniel Holbert — Bug 576556: Add GCC-suggested parens in uconv.c to fix build warning. r=smontagu
Comment 7 Alice0775 White 2011-05-23 22:10:05 PDT
Created attachment 534676 [details]
reduced html
Comment 8 Daniel Holbert [:dholbert] 2011-05-23 22:46:46 PDT
So, that cset was just:
  http://hg.mozilla.org/mozilla-central/rev/b18119895f4a
which added a single set of parens that I thought GCC was suggesting ("suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~'")

In my initial reading, I'd thought that message meant that the "&" was _part_ of the operand of "!".  Hence, I stuck the close-paren after the end of the "&" expression.

But now when re-reading that warning, I see that that my initial reading was incorrect (and in fact, given that this cset changed behavior, the & was definitely *not* part of the operand of "!").

So, the close-paren I added just needs shifting back to before the "&".
Comment 9 Daniel Holbert [:dholbert] 2011-05-23 22:55:48 PDT
I'm happy to take this & clean up after myself, but I can't get to it until tomorrowish. (The fix itself is trivial, but I want to make sure we get a reftest working right for this, too.)

smontagu (or anyone else), feel free to take this if you get to it first.
Comment 10 Simon Montagu :smontagu 2011-05-23 23:25:02 PDT
Created attachment 534685 [details]
Reference html
Comment 11 Simon Montagu :smontagu 2011-05-24 04:37:03 PDT
There's something odd going on here -- your patch was the correct test according to the comment just above:

   * and the second byte can take any value with MSB = 1.

... but that is just wrong! The codepoint that is failing in the reduced testcase is 0xA45A.

The original condition |if (! in[1] & 0x80)| is equivalent to |if (in[1] == 0)|, so we were basically never rejecting any invalid codepoints (at least in this routine -- they might have failed or been mapped to replacement characters somewhere else).

Daniel, thanks for the offer to fix this, but I think I'd better take it and do an exhaustive check that all possible combinations are being correctly handled.
Comment 12 Simon Montagu :smontagu 2011-05-24 12:16:59 PDT
Created attachment 534860 [details] [diff] [review]
Patch

This includes a fix for bug 462687 and a reftest. Todo: xpcshell test with full coverage.
Comment 13 Simon Montagu :smontagu 2011-05-24 12:21:04 PDT
http://smontagu.damowmow.com/genEncodingTest.cgi?family=windows&codepage=949 passes with the patch.
Comment 14 Asa Dotzler [:asa] 2011-05-25 11:49:39 PDT
Release drivers are not going to track this for 5 or 6. If you want to make the case that it should land for either of those, please request approval on the patch with an explanation of the risks and benefits.
Comment 15 Simon Montagu :smontagu 2011-05-29 23:05:03 PDT
http://hg.mozilla.org/mozilla-central/rev/c77a430f6467
Comment 16 Simon Montagu :smontagu 2011-05-29 23:40:06 PDT
Comment on attachment 534860 [details] [diff] [review]
Patch

This is a significant regression from 1.9.2: all Korean characters in the windows CP949 extension to EUC-KR (a total of 1942 codepoints) are not displayed correctly.

I've tested the patch with all possible double-byte combinations from 0x8001 to 0xFFFF, and there is no change in decoding between 1.9.2 and trunk with this patch, so I'm confident that it's a very safe change.
Comment 17 Asa Dotzler [:asa] 2011-06-01 11:59:05 PDT
Comment on attachment 534860 [details] [diff] [review]
Patch

too late for 5 Beta but we're OK with this going into Aurora for 6.
Comment 18 Simon Montagu :smontagu 2011-06-01 23:49:41 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/436da90e737a

Note You need to log in before you can comment on or make changes to this bug.