Closed Bug 75707 Opened 23 years ago Closed 23 years ago

Some BIG5 characters can not be displayed properly in Solaris Trunk

Categories

(Core :: Internationalization, defect, P3)

Sun
Solaris
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: eyan, Assigned: tetsuroy)

References

Details

(Keywords: intl, Whiteboard: r=bstell, patch is on the branch, vbranch)

Attachments

(6 files)

On the Solaris trunk build, under the UTF-8 locales and the zh_TW.BIG5 locale,
all BIG5 characters display correctly except the following range:
0xa27e through 0xa2a7 are displayed as NULL.

In the zh_TW and zh_TW.EUC locales, these characters display correctly.
Katakai san, can you reproduce this?
Assigning to myself. Target: moz0.9.1
Assignee: nhotta → ftang
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → mozilla0.9.1
Status: NEW → ASSIGNED
Add Brian@Netscape in Cc
QA Contact: andreasb → ylong
Ftang - Is this important for nsbeta1? Is this a requirement from the Sun team?
Please advise.

Adding Lbaliman to cc: list.
Keywords: intl
Sun says this is not a show-stopper. 
OK . . . let's set the milestone at M0.9.2
Changed QA contact to katakai@japan.sun.com
QA Contact: ylong → katakai
This could be caused by the buggy ucvcn converters. Let's wait and see what
happens after we land 75928.
Assign Ervin to QA contact
QA Contact: katakai → eyan
Depends on: 80772
Whiteboard: depend on 80772 expect date 5/17
I don't think this gets fixed by 80772. Taking that note out of the status
whiteboard and marking the target as ---.
Whiteboard: depend on 80772 expect date 5/17
Target Milestone: mozilla0.9.1 → ---
a27e map to 256d
a2a1 map to 256E
a2a2 map to 2570
a2a3 map to 256f
a2a4 map to 2550
a2a5 map to 255e
a2a6 map to 256a
a2a7 map to 2561
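The eight mappings listed above can be written down as a small lookup table, which is handy when checking a converter's output for this range. This is only a sketch: `kBig5BoxDrawing` and `LookupBoxDrawing` are hypothetical names, the table holds nothing beyond the eight pairs from this comment, and returning U+FFFD for an unknown code is an assumption of the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Big5 code -> Unicode code point, copied from the list in this comment.
// Only these eight entries are covered; this is not a full Big5 table.
static const std::map<std::uint16_t, std::uint16_t> kBig5BoxDrawing = {
    {0xA27E, 0x256D}, {0xA2A1, 0x256E}, {0xA2A2, 0x2570}, {0xA2A3, 0x256F},
    {0xA2A4, 0x2550}, {0xA2A5, 0x255E}, {0xA2A6, 0x256A}, {0xA2A7, 0x2561},
};

// Returns the mapped Unicode value, or U+FFFD (an assumption of this sketch)
// when the Big5 code is not one of the eight listed entries.
std::uint16_t LookupBoxDrawing(std::uint16_t big5) {
    auto it = kBig5BoxDrawing.find(big5);
    return it == kBig5BoxDrawing.end() ? 0xFFFD : it->second;
}
```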
Is this still a problem? Does the recent work done by bstell fix this problem?
Ervin,

please update this bug report.

Thanks.
Verified in Mozilla 2001060322:

0xa27e through 0xa2a7 are still displayed as NULL.

More characters are also displayed incorrectly:

0xa1e3  displayed as '?'
0xa3be  displayed as '?'
0xb145  displayed as '  E'
0xb3c4  displayed as '?'
0xb4b9  displayed as '?'
0xb5ae  displayed as '?'
0xb6a3  displayed as '?'
0xb776  displayed as '  v'
0xb86b  displayed as '  k'
0xbe4c  displayed as '  L'
0xd166  displayed as '  f'
0xd25b  displayed as '  ['
0xd350  displayed as '  P'
0xd5ee  displayed as '?'
0xd6e3  displayed as '?'
0xd7d8  displayed as '?'
0xd8cd  displayed as '?'
0xd9c2  displayed as '?'
0xdada  displayed as '?'
0xdbac  displayed as '?'
0xdca1  displayed as '?'
0xdd74  displayed as '  t'
0xde69  displayed as '  i'
0xe053  displayed as '  S'
0xe148  displayed as '  H'
0xe1fc  displayed as '?'
0xe3e6  displayed as '?'
0xe5d0  displayed as '?'
0xe6c5  displayed as '?'
0xe7ba  displayed as '?'
0xe7e1  displayed as '?'
0xf4e8  displayed as '?'
mark it as moz0.9.2
Target Milestone: --- → mozilla0.9.2
mark it as P3
Priority: -- → P3
PDT+ based on the 6/11 PDT meeting.
Whiteboard: [PDT+]
I see a different result in my build.
The following Big5 characters display as "\ufffd":
0xA15a
0xA1C3
0xA1C5
0xA1fe
0xA240
0xA2cc
0xA2ce

Also, the following Big5 character displays as blank:
0xA3bc (I think this is fine because the BIG5 standard displays this as blank)


The following displays as "?":
0xa3be

I also see:
0xb145 displayed as "E"
0xBe4C displayed as "L"
0xCDD3 displayed as "?"
0xdada displayed as "?"
Whiteboard: [PDT+] → [PDT+]no progress yet.
I don't think this is a showstopper. Moving it to moz0.9.3.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Frank, if you don't think this is a showstopper, pls remove the PDT+ in the
status whiteboard. Thanks.
per PDT triage mtg with montse, removing PDT+ from status summary.  
Whiteboard: [PDT+]no progress yet. → no progress yet.
mark as nsbranch
Keywords: nsBranch
>I have the following Big5 characters display as "\ufffd"
>0xA15a
>0xA1C3
>0xA1C5
>0xA1fe
>0xA240
>0xA2cc
>0xA2ce
This is because the test case
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=30560 itself is buggy.
It contains the six-character string "\ufffd" instead of those Big5 code points.

On Windows, I can also see the problem with the following characters, which
means this is a Big5-to-Unicode decoding issue:
0xa3be, 0xbe4c, 0xcdd3, 0xdada, 0xe7e1, 0xf4e8

However, if I put these characters into a separate file, they display fine,
which means it is a buffer issue.

I can reproduce this problem on Windows, and I am sure this is not a
character-level conversion problem but a buffer-related conversion issue.
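To illustrate what a buffer-related conversion issue means here: the same bytes decode fine as a whole, but a two-byte character can straddle the boundary between two input blocks. The sketch below checks whether a block ends on a lead byte whose trail byte has not arrived yet. `EndsWithDanglingLeadByte` is a hypothetical helper, not converter code, and it assumes that any byte >= 0x81 starts a two-byte character and that the block begins on a character boundary. The earlier reports fit this picture: 0xb145 was shown as "E" and 0xb776 as "v", and 0x45 and 0x76 are the ASCII codes of those letters, which suggests the trail byte was decoded alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the block ends with the first byte of a two-byte
// character. Assumption of this sketch: bytes >= 0x81 are lead bytes,
// and the block starts on a character boundary.
bool EndsWithDanglingLeadByte(const std::vector<std::uint8_t>& block) {
    std::size_t i = 0;
    while (i < block.size()) {
        if (block[i] >= 0x81) {
            if (i + 1 == block.size()) {
                return true;  // lead byte is the last byte of the block
            }
            i += 2;  // skip the complete two-byte character
        } else {
            i += 1;  // single-byte character
        }
    }
    return false;
}
```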
jshin said he has recently seen a similar problem with Korean again.
Adding jshin@pantheon.yale.edu and shanjian to the cc list.
Just for your reference, I reported the problem Frank mentioned
in my comments on bug 26920 (under the jshin@pantheon.yale.edu
account). As this bug is likely to be a buffer issue
as well, it may as well be marked as dependent on bug 26920.
got a fix.
OK, so what happens is the following.
If the last byte of a block is the first byte of a multibyte character, none of
the uScan calls will succeed and done will be false. And if the previous byte
(the byte before the last byte in the block) contains a value < 0x20, then we
will treat it as a control code.
The fix is simple: add a boolean value, set it to false, and only set it to
true if uScan succeeds. We check the boolean before we check med.

This is a data loss bug. The character will be lost if
1. it is in a Traditional Chinese or Korean document,
2. the first byte and last byte of the multibyte character are not received together, and
3. the byte before the first byte of that multibyte character is a code point < 0x20,
for example tab, CR, or LF.

This is a safe fix.
Please consider this for PDT+: data loss in Traditional Chinese and Korean.
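The failure conditions and the boolean guard described above can be reproduced in a toy model. This is a hedged reconstruction, not the converter source: `ToyScan`, `ToyConvert`, and `checkScanResult` are hypothetical names, the scanner only knows ASCII and complete two-byte pairs, and the exact way the real code consumed the stray byte is an assumption. What the sketch does show is how a stale `med` value below 0x20 lets a failed scan fall into the control-code path, and how checking the scan result first avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a table scanner: succeeds on one ASCII byte or
// on a complete two-byte sequence, and stores the scanned value in *med.
bool ToyScan(const std::uint8_t* src, std::size_t len,
             std::uint16_t* med, std::size_t* used) {
    if (src[0] < 0x80) { *med = src[0]; *used = 1; return true; }
    if (len < 2) return false;  // lead byte without its trail byte
    *med = static_cast<std::uint16_t>((src[0] << 8) | src[1]);
    *used = 2;
    return true;
}

// Converts one block and returns how many source bytes were consumed.
// With checkScanResult == false, a failed scan still reaches the
// control-code test, which sees the stale med left by the previous byte.
std::size_t ToyConvert(const std::vector<std::uint8_t>& block,
                       bool checkScanResult,
                       std::vector<std::uint16_t>* out) {
    std::uint16_t med = 0xFFFF;  // no valid scan result yet
    std::size_t pos = 0;
    while (pos < block.size()) {
        std::size_t used = 0;
        bool scanned = ToyScan(&block[pos], block.size() - pos, &med, &used);
        if (!scanned) {
            if (!checkScanResult && med < 0x20) {
                // Buggy path (assumed shape): the lead byte is consumed as
                // if it were a control code, so the character is lost.
                out->push_back(block[pos]);
                ++pos;
                continue;
            }
            break;  // partial character: leave it for the next block
        }
        out->push_back(med);
        pos += used;
    }
    return pos;
}
```

For a block holding LF (0x0A) followed by the lead byte 0xB1, the unguarded version consumes both bytes, while the guarded version consumes only the LF and leaves 0xB1 for the next block.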
Whiteboard: no progress yet. → r=bstell
I take it back. That patch is not complete. Wait for a complete fix.
shouldn't medIsValid be set to false right after it is consumed?
Attached patch: a real patch
>shouldn't medIsValid be set to false right after it is consumed?
Basically, it IS.

In the new patch, if uScan never succeeds even once, we return INPUT_ERROR, as
we do in ConvertTable. That means we have partial bytes in the block, and we
need the next block to complete the conversion.
OK, I wrote a good CGI script to test the buffer condition.
I put it at http://warp/u/ftang/utf8test/buffer.cgi
Try changing the encoding to Big5, EUC-KR, GB2312, and other multibyte encodings.
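The same kind of buffer test can be approximated offline by feeding one input to a decoder in blocks of every possible size and comparing the results. The sketch below is not the CGI script and not the Mozilla converter. `ToyStreamDecoder`, `DecodeInBlocks`, and `BlockSizeSweepPasses` are hypothetical names, the decoder treats any byte >= 0x81 as a lead byte, and it emits the raw two-byte value instead of a real Unicode mapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy two-byte decoder that remembers an unfinished lead byte between blocks.
class ToyStreamDecoder {
public:
    // Decodes one block, appending one 16-bit value per character to *out.
    void Feed(const std::vector<std::uint8_t>& block,
              std::vector<std::uint16_t>* out) {
        for (std::uint8_t b : block) {
            if (pendingLead_ != 0) {
                out->push_back(static_cast<std::uint16_t>((pendingLead_ << 8) | b));
                pendingLead_ = 0;
            } else if (b >= 0x81) {
                pendingLead_ = b;  // trail byte may arrive in the next block
            } else {
                out->push_back(b);
            }
        }
    }

private:
    std::uint8_t pendingLead_ = 0;
};

// Decodes input in blocks of blockSize bytes (blockSize must be > 0
// whenever input is not empty).
std::vector<std::uint16_t> DecodeInBlocks(const std::vector<std::uint8_t>& input,
                                          std::size_t blockSize) {
    ToyStreamDecoder decoder;
    std::vector<std::uint16_t> out;
    for (std::size_t pos = 0; pos < input.size(); pos += blockSize) {
        std::size_t end = std::min(input.size(), pos + blockSize);
        std::vector<std::uint8_t> block(input.begin() + pos, input.begin() + end);
        decoder.Feed(block, &out);
    }
    return out;
}

// Returns true if every block size from 1 to input.size() produces the
// same output as decoding the whole input as a single block.
bool BlockSizeSweepPasses(const std::vector<std::uint8_t>& input) {
    const std::vector<std::uint16_t> expected = DecodeInBlocks(input, input.size());
    for (std::size_t size = 1; size <= input.size(); ++size) {
        if (DecodeInBlocks(input, size) != expected) return false;
    }
    return true;
}
```

A decoder that drops or misreads a character split across a block boundary would fail the sweep at the block size that places the boundary inside that character.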
Whiteboard: r=bstell → r=bstell,pdt+
Attached patch: v3 of patch
We need a separate patch to fix the Simplified Chinese converter.
r=nhotta
With this patch, the problem is reduced to the following cases:
1. HZ still has a problem, but the HZ encoding is less important now.
2. When the block size is equal to 1, we have a problem with all multibyte
encodings. Fixing it may need a big change. I feel it is too risky to fix, and
the chance that we will hit a block size of only one byte is very, very small.
+      } else if(*src == (PRUint8) 0xa0) {

Fix the cast to be like the previous if test.
sr=sfraser on the last two patches.
r=nhotta for patch 07/11/01 12:35
*** Bug 88874 has been marked as a duplicate of this bug. ***
Two cases are fixed.
The HZ case is not fixed yet, but its priority is lower.
We still have a problem in the case of size=1, but that is really an edge case.
BTW, IE looks horrible with a small block size. :)
As far as the m92 branch is concerned, we are done.
Spinning off the HZ problem into bug 90411.
Spinning off the general buffer problem with size=1 into bug 90414.
Reassigning to yokoyama for trunk landing.
roy - close this bug after you land it on the trunk. The other two problems
have been spun off as stated above.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Has the correct fix been landed on the branch yet?  If so, please update the bug
in some way - either take the PDT+ off, or add a comment saying we're done.  Thanks.
Whiteboard: r=bstell,pdt+ → r=bstell,pdt+, patch is on the branch
Landed on the trunk.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Taking off pdt+ since fix has been landed on the branch.

Adding "vbranch" to confirm fix.
Whiteboard: r=bstell,pdt+, patch is on the branch → r=bstell, patch is on the branch, vbranch
Verified on the Solaris trunk (based on 2001.07.27):

0xa27e through 0xa2a7 are still displayed as blank.

Some characters are still displayed incorrectly:

0xa3be  displayed as '?'
0xb145  displayed as '  E'
0xb3c4  displayed as '?'
0xb5ae  displayed as '?'
0xbe4c  displayed as '  L'
0xcdd3  displayed as '  ?'
0xdada  displayed as '?'
0xe7e1  displayed as '?'
0xf4e8  displayed as '?'

Others are OK now.