Closed
Bug 75707
Opened 23 years ago
Closed 23 years ago
Some BIG5 characters can not be displayed properly in Solaris Trunk
Categories
(Core :: Internationalization, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla0.9.3
People
(Reporter: eyan, Assigned: tetsuroy)
References
Details
(Keywords: intl, Whiteboard: r=bstell, patch is on the branch, vbranch)
Attachments
(6 files)
131.48 KB, text/plain
1.18 KB, patch
1.61 KB, patch
2.22 KB, text/plain
1.63 KB, patch
1.07 KB, patch
In Solaris Trunk, under the UTF-8 locales and the zh_TW.BIG5 locale, all BIG5 characters display OK except the range 0xa27e ---> 0xa2a7, which is displayed as NULL. Under the zh_TW/zh_TW.EUC locales, these characters display OK.
Comment 2•23 years ago
|
||
Katakai san, can you reproduce this?
Comment 3•23 years ago
|
||
Assigning to myself. Target moz0.9.1.
Assignee: nhotta → ftang
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → mozilla0.9.1
Updated•23 years ago
|
Status: NEW → ASSIGNED
Comment 4•23 years ago
|
||
Add Brian@Netscape in Cc
Updated•23 years ago
|
QA Contact: andreasb → ylong
Comment 5•23 years ago
|
||
Ftang - Is this important for nsbeta1? Is this a requirement from the Sun team? Please advise. Adding Lbaliman to the cc: list.
Keywords: intl
Comment 6•23 years ago
|
||
Sun says this is not a show-stopper.
Comment 7•23 years ago
|
||
OK . . . let's set the milestone at M0.9.2
Comment 9•23 years ago
|
||
This could be caused by the buggy ucvcn converters. Let's wait and see what happens after we land 75928.
Updated•23 years ago
|
Whiteboard: depend on 80772 expect date 5/17
Comment 11•23 years ago
|
||
I don't think this gets fixed by 80772. Taking that out of the status whiteboard and marking the target as ---.
Whiteboard: depend on 80772 expect date 5/17
Target Milestone: mozilla0.9.1 → ---
Comment 12•23 years ago
|
||
a27e maps to 256d
a2a1 maps to 256E
a2a2 maps to 2570
a2a3 maps to 256f
a2a4 maps to 2550
a2a5 maps to 255e
a2a6 maps to 256a
a2a7 maps to 2561
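For reference, the range 0xa27e ---> 0xa2a7 is exactly these eight code points, because Big5 trail bytes only run 0x40-0x7E and 0xA1-0xFE. A minimal Python sketch of the lookup, using the mapping values listed in this comment; the names `BIG5_TO_UNICODE` and `big5_range` are made up for illustration and are not taken from the Mozilla converter tables.

```python
import unicodedata

# Mapping copied from the comment above: Big5 code -> Unicode code point.
BIG5_TO_UNICODE = {
    0xA27E: 0x256D,
    0xA2A1: 0x256E,
    0xA2A2: 0x2570,
    0xA2A3: 0x256F,
    0xA2A4: 0x2550,
    0xA2A5: 0x255E,
    0xA2A6: 0x256A,
    0xA2A7: 0x2561,
}


def big5_range(first, last):
    """Yield two-byte Big5 codes from first to last, skipping invalid trail bytes."""
    for code in range(first, last + 1):
        trail = code & 0xFF
        if 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE:
            yield code


for code in big5_range(0xA27E, 0xA2A7):
    cp = BIG5_TO_UNICODE[code]
    print(f"0x{code:04X} -> U+{cp:04X} {unicodedata.name(chr(cp))}")
```

All eight targets fall in the Unicode box-drawing block, so a font without those glyphs could plausibly produce the blank display reported here; that is a guess, not something confirmed in this bug.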
Comment 13•23 years ago
|
||
Is this still a problem? Does the recent work done by bstell fix it?
Comment 14•23 years ago
|
||
Ervin, please update this bug report. Thanks.
Reporter | ||
Comment 15•23 years ago
|
||
Verified in Mozilla 2001060322: 0xa27e ---> 0xa2a7 is still displayed as NULL, and more characters are displayed incorrectly:
0xa1e3 displayed as '?'
0xa3be displayed as '?'
0xb145 displayed as ' E'
0xb3c4 displayed as '?'
0xb4b9 displayed as '?'
0xb5ae displayed as '?'
0xb6a3 displayed as '?'
0xb776 displayed as ' v'
0xb86b displayed as ' k'
0xbe4c displayed as ' L'
0xd166 displayed as ' f'
0xd25b displayed as ' ['
0xd350 displayed as ' P'
0xd5ee displayed as '?'
0xd6e3 displayed as '?'
0xd7d8 displayed as '?'
0xd8cd displayed as '?'
0xd9c2 displayed as '?'
0xdada displayed as '?'
0xdbac displayed as '?'
0xdca1 displayed as '?'
0xdd74 displayed as ' t'
0xde69 displayed as ' i'
0xe053 displayed as ' S'
0xe148 displayed as ' H'
0xe1fc displayed as '?'
0xe3e6 displayed as '?'
0xe5d0 displayed as '?'
0xe6c5 displayed as '?'
0xe7ba displayed as '?'
0xe7e1 displayed as '?'
0xf4e8 displayed as '?'
Comment 18•23 years ago
|
||
PDT+ based on the 6/11 PDT meeting.
Updated•23 years ago
|
Whiteboard: [PDT+]
Comment 19•23 years ago
|
||
I see a different result in my build. The following Big5 characters display as "\ufffd":
0xA15a
0xA1C3
0xA1C5
0xA1fe
0xA240
0xA2cc
0xA2ce
Also, the following Big5 character displays as blank:
0xA3bc (I think this is fine because the BIG5 standard displays this as blank)
The following displays as ?:
0xa3be
I also have 0xb145 displayed as "E", Be4C displayed as "L", CDD3 as "?", and dada as "?".
Updated•23 years ago
|
Whiteboard: [PDT+] → [PDT+]no progress yet.
Comment 20•23 years ago
|
||
I don't think this is a show stopper. Moving it to moz0.9.3.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Comment 21•23 years ago
|
||
Frank, if you don't think this is a showstopper, pls remove the PDT+ in the status whiteboard. Thanks.
Comment 22•23 years ago
|
||
per PDT triage mtg with montse, removing PDT+ from status summary.
Whiteboard: [PDT+]no progress yet. → no progress yet.
Comment 24•23 years ago
|
||
>I have the following Big5 characters display as "\ufffd"
>0xA15a
>0xA1C3
>0xA1C5
>0xA1fe
>0xA240
>0xA2cc
>0xA2ce
This is because the test case http://bugzilla.mozilla.org/showattachment.cgi?attach_id=30560 itself is buggy: it contains the 6-character string "\ufffd" instead of those Big5 code points.
On Windows, I can also see the following problem, which means this is a Big5-to-Unicode decoding issue:
0xa3be, Be4C, cdd3, dada, e7e1, f4e8
However, if I put these characters into a separate file, they display fine, which means it is a buffer issue.
Comment 25•23 years ago
|
||
I can reproduce this problem on Windows, and I am sure this is not a character-level conversion problem but a buffer-related conversion issue. jshin said he has seen a similar problem with Korean again recently. Adding jshin@pantheon.yale.edu and shanjian to the cc list.
Comment 26•23 years ago
|
||
Just for your reference, I reported the problem Frank mentioned in my comments on bug 26920 (under the jshin@pantheon.yale.edu account). As this bug is likely to be a buffer issue as well, it may as well be marked as dependent on bug 26920.
Comment 27•23 years ago
|
||
got a fix.
Comment 28•23 years ago
|
||
Comment 29•23 years ago
|
||
r=bstell@netscape.com
Comment 30•23 years ago
|
||
OK, so what happens is the following. If the last byte of a block is the first byte of a multibyte character, none of the uScan calls will succeed and done will be false. And if the previous byte (the byte before the last byte in the block) contains a value < 0x20, then we will treat it as a control code. The fix is simple: add a boolean value and set it to false, only set it to true if uScan succeeds, and check the boolean before we check med.
This bug is a data loss bug. The character will be lost if
1. it is a Traditional Chinese or Korean document
2. the first byte and last byte of a multibyte character are not received together
3. before the first byte of that multibyte character, there is a code point < 0x20, for example tab, CR, LF, etc.
This is a safe fix.
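To make the failure mode above easier to follow, here is a toy Python model of it. It is not the Mozilla converter code: only the names med, medIsValid and uScan come from this thread, while `scan`, `convert_block` and the `check_valid` switch are invented for illustration, and the encoding is reduced to "bytes below 0x80 are single-byte, anything else starts a two-byte character".

```python
def scan(data, pos):
    """Toy stand-in for uScan: return (value, length), or None on partial input."""
    lead = data[pos]
    if lead < 0x80:
        return lead, 1
    if pos + 1 < len(data):
        return (lead << 8) | data[pos + 1], 2
    return None  # lead byte is the last byte of the block


def convert_block(data, check_valid):
    """Decode one block; return (decoded values, number of bytes consumed)."""
    out = []
    pos = 0
    med = 0  # last scanned value; it survives across loop iterations
    while pos < len(data):
        med_is_valid = False
        result = scan(data, pos)
        if result is not None:
            med, length = result
            med_is_valid = True
        # Control-code path. Without the validity check, a failed scan
        # re-uses the stale med left over from the previous character.
        if (med_is_valid or not check_valid) and med < 0x20:
            out.append(med)
            pos += 1
            continue
        if not med_is_valid:
            break  # partial character: leave it for the next block
        out.append(med)
        pos += length
    return out, pos


block = b"\t\xb1"  # TAB, then the lead byte of 0xb145; the trail byte arrives later
print(convert_block(block, check_valid=False))  # old behaviour in this model
print(convert_block(block, check_valid=True))   # behaviour with the boolean check
```

In this model the unchecked version consumes both bytes, so the lead byte 0xb1 is gone and the trail byte 0x45 ('E') would arrive alone in the next block. The checked version stops after one byte and leaves 0xb1 unread.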
Comment 31•23 years ago
|
||
Please consider this as PDT+: data loss in Traditional Chinese and Korean.
Whiteboard: no progress yet. → r=bstell
Comment 32•23 years ago
|
||
I take it back. That patch is not complete. Wait for a complete fix.
Comment 33•23 years ago
|
||
shouldn't medIsValid be set to false right after it is consumed?
Comment 34•23 years ago
|
||
Comment 35•23 years ago
|
||
>shouldn't medIsValid be set to false right after it is consumed?
Basically, it IS.
In the new patch, if uScan never succeeds even once, we should return
INPUT_ERROR, as we did in ConvertTable. That means we have partial bytes in
the block, and we need the next block to complete the conversion.
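A rough Python sketch of the caller side of this idea: when the converter reports partial input, the unread bytes are kept and prepended to the next block. Everything here is hypothetical (`Status`, `convert_block`, `convert_stream`, and the toy two-byte encoding); `Status.NEED_MORE_INPUT` merely stands in for the INPUT_ERROR result mentioned above.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    NEED_MORE_INPUT = 1  # stand-in for the INPUT_ERROR result


def convert_block(data):
    """Toy converter: bytes < 0x80 are single-byte, others start a 2-byte char."""
    out = []
    pos = 0
    while pos < len(data):
        lead = data[pos]
        if lead < 0x80:
            out.append(lead)
            pos += 1
        elif pos + 1 < len(data):
            out.append((lead << 8) | data[pos + 1])
            pos += 2
        else:
            # Lead byte without its trail byte: report it, consume nothing more.
            return out, pos, Status.NEED_MORE_INPUT
    return out, pos, Status.OK


def convert_stream(blocks):
    """Feed blocks in order, carrying unread bytes over to the next block."""
    pending = b""
    out = []
    for block in blocks:
        data = pending + block
        values, consumed, status = convert_block(data)
        out.extend(values)
        pending = data[consumed:] if status is Status.NEED_MORE_INPUT else b""
    return out, pending


# 0xb145 is split across two blocks, right after a TAB.
print(convert_stream([b"\t\xb1", b"\x45"]))
```

Any bytes still pending after the last block mean the stream ended in the middle of a character.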
Comment 36•23 years ago
|
||
OK, I wrote a good CGI script to test the buffer condition. I put it under http://warp/u/ftang/utf8test/buffer.cgi. Try changing the encoding to Big5, EUC-KR, GB2312, and other multibyte encodings.
Whiteboard: r=bstell → r=bstell,pdt+
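The same kind of buffer test can be approximated offline. The Python sketch below is not the buffer.cgi script (its contents are not shown in this bug); it only uses Python's standard incremental decoders to feed a multibyte byte stream in blocks of every possible size and check that the result matches decoding the stream in one piece. The sample strings and function names are arbitrary choices for illustration.

```python
import codecs


def decode_in_blocks(data, encoding, block_size):
    """Decode data by feeding it to an incremental decoder block by block."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    pieces = []
    for start in range(0, len(data), block_size):
        pieces.append(decoder.decode(data[start:start + block_size]))
    pieces.append(decoder.decode(b"", final=True))  # flush; raises on a dangling lead byte
    return "".join(pieces)


# Each sample puts a TAB (a code point < 0x20) in front of a multibyte character.
SAMPLES = {
    "big5": "中文\t測試",
    "euc_kr": "한국어\t시험",
    "gb2312": "中文\t测试",
}


def find_mismatches():
    """Return (encoding, block_size) pairs where block-wise decoding differs."""
    mismatches = []
    for encoding, text in SAMPLES.items():
        data = text.encode(encoding)
        for block_size in range(1, len(data) + 1):
            if decode_in_blocks(data, encoding, block_size) != text:
                mismatches.append((encoding, block_size))
    return mismatches


print(find_mismatches())
```

A converter with the buffer problem discussed here would show up as a non-empty mismatch list at the block sizes that split a character.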
Comment 37•23 years ago
|
||
Comment 38•23 years ago
|
||
Comment 39•23 years ago
|
||
We need a separate patch to fix the Simplified Chinese converter.
Comment 40•23 years ago
|
||
r=nhotta
Comment 41•23 years ago
|
||
Comment 42•23 years ago
|
||
With this patch, we reduce the problem to the following cases:
1. HZ still has a problem, but HZ encoding is less important now.
2. When the block size is equal to 1, we have a problem with all multibyte encodings. Fixing it may need a big change. I feel it is too risky to fix, and the chance that we will hit a block size of only one byte is very, very small.
Comment 43•23 years ago
|
||
+ } else if(*src == (PRUint8) 0xa0) {
Fix the cast to be like the previous if test.
sr=sfraser on the last two patches.
Comment 44•23 years ago
|
||
r=nhotta for patch 07/11/01 12:35
Comment 45•23 years ago
|
||
*** Bug 88874 has been marked as a duplicate of this bug. ***
Comment 46•23 years ago
|
||
Two cases are fixed. The HZ case is not fixed yet, but its priority is lower. We still have a problem in the case of size=1, but that is really an edge case. BTW, IE looks horrible with small block sizes. :) As far as the m92 branch is concerned, we are done.
Comment 47•23 years ago
|
||
Spun the HZ problem off into bug 90411. Spun the general buffer problem with size=1 off into bug 90414.
Comment 48•23 years ago
|
||
Reassigning to yokoyama for trunk landing. Roy, close this bug after you land on the trunk. The other two problems are spun off as stated above.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Comment 49•23 years ago
|
||
Has the correct fix been landed on the branch yet? If so, please update the bug in some way - either take the PDT+ off, or add a comment saying we're done. Thanks.
Updated•23 years ago
|
Whiteboard: r=bstell,pdt+ → r=bstell,pdt+, patch is on the branch
Assignee | ||
Comment 50•23 years ago
|
||
landed to trunk.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 51•23 years ago
|
||
Taking off pdt+ since fix has been landed on the branch. Adding "vbranch" to confirm fix.
Whiteboard: r=bstell,pdt+, patch is on the branch → r=bstell, patch is on the branch, vbranch
Reporter | ||
Comment 52•23 years ago
|
||
Verified in Solaris Trunk (based on 2001.07.27): 0xa27e ---> 0xa2a7 is still displayed as blank, and some characters are still displayed incorrectly:
0xa3be displayed as '?'
0xb145 displayed as ' E'
0xb3c4 displayed as '?'
0xb5ae displayed as '?'
0xbe4c displayed as ' L'
0xcdd3 displayed as ' ?'
0xdada displayed as '?'
0xe7e1 displayed as '?'
0xf4e8 displayed as '?'
Others are OK now.