some korean(2byte character) broken when using euc-kr page encoding - there is some doubt about uscan.c

RESOLVED FIXED

Status

()

Core
Internationalization
--
major
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: oskmkr, Assigned: smontagu)

Tracking

({regression})

Trunk
x86
All
regression
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox5-, firefox6- fixed)

Details

(URL)

Attachments

(5 attachments)

(Reporter)

Description

6 years ago
User-Agent:       Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

uscan.c
fixed below ---- 

1. before
else if(! in[1] & 0x80) /* 2nd byte range check */


2. after
else if(!(in[1] & 0x80)) /* 2nd byte range check */

but there is not equal because operator priority !!


Reproducible: Always


Actual Results:  
there is no broken character in euc-kr pages.


o/s : windows 7 (kor)
http://cafe.naver.com/oskmcafetest

Comment 1

6 years ago
I could not reproduce the issue on:
Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1

*Note: Reporter, can you be more clear on the steps to reproduce? Thanks.

Comment 2

6 years ago
Relevant Bug 576556
(Reporter)

Comment 3

6 years ago
Created attachment 534649 [details]
captured reported issue.

captured reported issue.
(Reporter)

Comment 4

6 years ago
Created attachment 534650 [details]
It's fine on the other browsers
(Reporter)

Comment 5

6 years ago
(In reply to comment #1)
> I could not reproduce the issue on:
> Mozilla/5.0 (Windows NT 6.1; rv:6.0a1) Gecko/20110522 Firefox/6.0a1
> 
> *Note: Reporter, can you be more clear on the steps to reproduce? Thanks.

I attached capture image of this issue.

I wish that is helpful.

Thanks.

Comment 6

6 years ago
Confirmed on
http://hg.mozilla.org/releases/mozilla-2.0/rev/08cd5ee7b7c3
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2pre) Gecko/20110523 Firefox/4.0.2pre ID:20110523030626
http://hg.mozilla.org/releases/mozilla-aurora/rev/7d4b05121ada
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.2) Gecko/20110508 Firefox/5.0a2 ID:20110523042001
http://hg.mozilla.org/mozilla-central/rev/11d04991cdd0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a1) Gecko/20110523 Firefox/6.0a1 ID:20110523030619

Regresson window(cached m-c hourly):
Works:
http://hg.mozilla.org/mozilla-central/rev/883651094520
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705123310
Fails:
http://hg.mozilla.org/mozilla-central/rev/53dc58746f3e
Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; en-US; rv:2.0b2pre) Gecko/20100705 Minefield/4.0b2pre ID:20100705221203
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=883651094520&tochange=53dc58746f3e


In local build(Windows7)
build from 362e3ce08ead + backed out b18119895f4a : Works as expected
build from 362e3ce08ead : Fails
Regressed by:
b18119895f4a	Daniel Holbert — Bug 576556: Add GCC-suggested parens in uconv.c to fix build warning. r=smontagu
Assignee: nobody → smontagu
Blocks: 576556
Status: UNCONFIRMED → NEW
Component: General → Internationalization
Ever confirmed: true
Keywords: regression
OS: Windows 7 → All
Product: Firefox → Core
QA Contact: general → i18n

Comment 7

6 years ago
Created attachment 534676 [details]
reduced html

Updated

6 years ago
tracking-firefox5: --- → ?
tracking-firefox6: --- → ?
Version: unspecified → Trunk
(Assignee)

Updated

6 years ago
Attachment #534676 - Attachment mime type: text/html → text/html; charset=ksc5601
So, that cset was just:
  http://hg.mozilla.org/mozilla-central/rev/b18119895f4a
which added a single set of parens that I thought GCC was suggesting ("suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~'")

In my initial reading, I'd thought that message meant that the "&" was _part_ of the operand of "!".  Hence, I stuck the close-paren after the end of the "&" expression.

But now when re-reading that warning, I see that that my initial reading was incorrect (and in fact, given that this cset changed behavior, the & was definitely *not* part of the operand of "!").

So, the close-paren I added just needs shifting back to before the "&".
I'm happy to take this & clean up after myself, but I can't get to it until tomorrowish. (The fix itself is trivial, but I want to make sure we get a reftest working right for this, too.)

smontagu (or anyone else), feel free to take this if you get to it first.
(Assignee)

Comment 10

6 years ago
Created attachment 534685 [details]
Reference html
(Assignee)

Comment 11

6 years ago
There's something odd going on here -- your patch was the correct test according to the comment just above:

   * and the second byte can take any value with MSB = 1.

... but that is just wrong! The codepoint that is failing in the reduced testcase is 0xA45A.

The original condition |if (! in[1] & 0x80)| is equivalent to |if (in[1] == 0)|, so we were basically never rejecting any invalid codepoints (at least in this routine -- they might have failed or been mapped to replacement characters somewhere else).

Daniel, thanks for the offer to fix this, but I think I'd better take it and do an exhaustive check that all possible combinations are being correctly handled.
(Assignee)

Comment 12

6 years ago
Created attachment 534860 [details] [diff] [review]
Patch

This includes a fix for bug 462687 and a reftest. Todo: xpcshell test with full coverage.
Attachment #534860 - Flags: review?(VYV03354)
(Assignee)

Comment 13

6 years ago
http://smontagu.damowmow.com/genEncodingTest.cgi?family=windows&codepage=949 passes with the patch.
Attachment #534860 - Flags: review?(VYV03354) → review+

Comment 14

6 years ago
Release drivers are not going to track this for 5 or 6. If you want to make the case that it should land for either of those, please request approval on the patch with an explanation of the risks and benefits.
tracking-firefox5: ? → -
tracking-firefox6: ? → -
(Assignee)

Comment 15

6 years ago
http://hg.mozilla.org/mozilla-central/rev/c77a430f6467
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
(Assignee)

Comment 16

6 years ago
Comment on attachment 534860 [details] [diff] [review]
Patch

This is a significant regression from 1.9.2: all Korean characters in the windows CP949 extension to EUC-KR (a total of 1942 codepoints) are not displayed correctly.

I've tested the patch with all possible double-byte combinations from 0x8001 to 0xFFFF, and there is no change in decoding between 1.9.2 and trunk with this patch, so I'm confident that it's a very safe change.
Attachment #534860 - Flags: approval-mozilla-beta?
Attachment #534860 - Flags: approval-mozilla-aurora?

Comment 17

6 years ago
Comment on attachment 534860 [details] [diff] [review]
Patch

too late for 5 Beta but we're OK with this going into Aurora for 6.
Attachment #534860 - Flags: approval-mozilla-beta?
Attachment #534860 - Flags: approval-mozilla-beta-
Attachment #534860 - Flags: approval-mozilla-aurora?
Attachment #534860 - Flags: approval-mozilla-aurora+
(Assignee)

Comment 18

6 years ago
http://hg.mozilla.org/releases/mozilla-aurora/rev/436da90e737a
status-firefox6: --- → fixed
You need to log in before you can comment on or make changes to this bug.