Closed Bug 75707 Opened 23 years ago Closed 23 years ago

Some BIG5 characters can not be displayed properly in Solaris Trunk

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.3

People

(Reporter: eyan, Assigned: tetsuroy)

References

Details

(Keywords: intl, Whiteboard: r=bstell, patch is on the branch, vbranch)

Attachments

(6 files)

A file contain all BIG5 characters | 23 years ago | Ervin Yan | 131.48 KB, text/plain
add a boolean value to valid the med | 23 years ago | Frank Tang | 1.18 KB, patch
a real patch | 23 years ago | Frank Tang | 1.61 KB, patch
perl cgi which force buffer size to smaller. save it as buffer.cgi | 23 years ago | Frank Tang | 2.22 KB, text/plain
v3 of patch | 23 years ago | Frank Tang | 1.63 KB, patch
a seperate patch to solve gbk/gb2312 problem | 23 years ago | Frank Tang | 1.07 KB, patch

Ervin Yan

Reporter

Description

•

23 years ago

In the Solaris trunk build, under UTF-8 locales and the zh_TW.BIG5 locale, all BIG5 characters display OK except the following:
0xa27e through 0xa2a7 display as NULL.

In the zh_TW and zh_TW.EUC locales, these characters display correctly.

Ervin Yan

Reporter

Comment 1

•

23 years ago

Attached file A file contain all BIG5 characters

nhottanscp

Comment 2

•

23 years ago

Katakai san, can you reproduce this?

Frank Tang

Comment 3

•

23 years ago

Assigning to myself. Target: mozilla0.9.1.

Assignee: nhotta → ftang

Status: UNCONFIRMED → NEW

Ever confirmed: true

Target Milestone: --- → mozilla0.9.1

Frank Tang

Updated

•

23 years ago

Status: NEW → ASSIGNED

Masaki Katakai

Comment 4

•

23 years ago

Adding Brian@Netscape to Cc.

Andreas Becker

Updated

•

23 years ago

QA Contact: andreasb → ylong

Jaime Rodriguez, Jr.

Comment 5

•

23 years ago

Ftang: is this important for nsbeta1? Is this a requirement from the Sun team?
Please advise.

Adding Lbaliman to cc: list.

Keywords: intl

Linda Baliman

Comment 6

•

23 years ago

Sun says this is not a show-stopper.

Jaime Rodriguez, Jr.

Comment 7

•

23 years ago

OK, let's set the milestone to M0.9.2.

Teruko Kobayashi

Comment 8

•

23 years ago

Changed QA contact to katakai@japan.sun.com.

QA Contact: ylong → katakai

Frank Tang

Comment 9

•

23 years ago

This could be caused by the buggy ucvcn converters. Let's wait and see what happens
after we land 75928.

Masaki Katakai

Comment 10

•

23 years ago

Assigning Ervin as QA contact.

QA Contact: katakai → eyan

Frank Tang

Updated

•

23 years ago

Depends on: 80772

Frank Tang

Updated

•

23 years ago

Whiteboard: depend on 80772 expect date 5/17

Frank Tang

Comment 11

•

23 years ago

I don't think this gets fixed by 80772. Taking out that status whiteboard and
setting the target to ---.

Whiteboard: depend on 80772 expect date 5/17 → (removed)

Target Milestone: mozilla0.9.1 → ---

Frank Tang

Comment 12

•

23 years ago

0xA27E maps to U+256D
0xA2A1 maps to U+256E
0xA2A2 maps to U+2570
0xA2A3 maps to U+256F
0xA2A4 maps to U+2550
0xA2A5 maps to U+255E
0xA2A6 maps to U+256A
0xA2A7 maps to U+2561
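The eight codes above are the whole of the 0xa27e to 0xa2a7 range from the original report, because Big5 trail bytes only take the values 0x40-0x7E and 0xA1-0xFE, so 0xA27F through 0xA2A0 are not characters at all. The C++ sketch below enumerates the valid codes in such a range and records the expected Unicode values from this comment, all of which fall in the Box Drawing block (U+2500-U+257F). The helper names are illustrative only and are not Mozilla code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Big5 trail bytes are 0x40-0x7E and 0xA1-0xFE; 0x7F-0xA0 never follow a lead byte.
bool isBig5Trail(std::uint8_t b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

// Lists the valid Big5 codes from `first` to `last` inclusive. Assumes both
// codes share one lead byte, which holds for the range in this report.
std::vector<std::uint16_t> big5CodesInRange(std::uint16_t first, std::uint16_t last) {
    assert((first >> 8) == (last >> 8));
    std::vector<std::uint16_t> codes;
    for (std::uint32_t c = first; c <= last; ++c)
        if (isBig5Trail(static_cast<std::uint8_t>(c & 0xFF)))
            codes.push_back(static_cast<std::uint16_t>(c));
    return codes;
}

// Expected Unicode values for the failing range, copied from Comment 12.
const std::map<std::uint16_t, char16_t> kComment12Map = {
    {0xA27E, 0x256D}, {0xA2A1, 0x256E}, {0xA2A2, 0x2570}, {0xA2A3, 0x256F},
    {0xA2A4, 0x2550}, {0xA2A5, 0x255E}, {0xA2A6, 0x256A}, {0xA2A7, 0x2561},
};
```

Calling `big5CodesInRange(0xA27E, 0xA2A7)` yields exactly the eight keys of `kComment12Map`, so a correct decoder should produce one box-drawing character per code in that list.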

Frank Tang

Comment 13

•

23 years ago

Is this still a problem? Does the recent work done by bstell fix it?

Masaki Katakai

Comment 14

•

23 years ago

Ervin,

please update this bug report.

Thanks.

Ervin Yan

Reporter

Comment 15

•

23 years ago

Verified in Mozilla 2001060322:

0xa27e through 0xa2a7 are still displayed as NULL.

More characters are displayed incorrectly:

0xa1e3  displayed as '?'
0xa3be  displayed as '?'
0xb145  displayed as '  E'
0xb3c4  displayed as '?'
0xb4b9  displayed as '?'
0xb5ae  displayed as '?'
0xb6a3  displayed as '?'
0xb776  displayed as '  v'
0xb86b  displayed as '  k'
0xbe4c  displayed as '  L'
0xd166  displayed as '  f'
0xd25b  displayed as '  ['
0xd350  displayed as '  P'
0xd5ee  displayed as '?'
0xd6e3  displayed as '?'
0xd7d8  displayed as '?'
0xd8cd  displayed as '?'
0xd9c2  displayed as '?'
0xdada  displayed as '?'
0xdbac  displayed as '?'
0xdca1  displayed as '?'
0xdd74  displayed as '  t'
0xde69  displayed as '  i'
0xe053  displayed as '  S'
0xe148  displayed as '  H'
0xe1fc  displayed as '?'
0xe3e6  displayed as '?'
0xe5d0  displayed as '?'
0xe6c5  displayed as '?'
0xe7ba  displayed as '?'
0xe7e1  displayed as '?'
0xf4e8  displayed as '?'

Frank Tang

Comment 16

•

23 years ago

mark it as moz0.9.2

Target Milestone: --- → mozilla0.9.2

Frank Tang

Comment 17

•

23 years ago

mark it as P3

Priority: -- → P3

Frank Tang

Comment 18

•

23 years ago

PDT+ based on the 6/11 PDT meeting.

Frank Tang

Updated

•

23 years ago

Whiteboard: [PDT+]

Frank Tang

Comment 19

•

23 years ago

I see different results in my build.
The following Big5 characters display as "\ufffd":
0xA15a
0xA1C3
0xA1C5
0xA1fe
0xA240
0xA2cc
0xA2ce

Also, the following Big5 character displays as blank:
0xA3bc (I think this is fine, because the Big5 standard displays this as blank)

The following displays as "?":
0xa3be

I also see 0xb145 displayed as "E", 0xbe4c displayed as "L", 0xcdd3 as "?", and 0xdada as "?".

Frank Tang

Updated

•

23 years ago

Whiteboard: [PDT+] → [PDT+]no progress yet.

Frank Tang

Comment 20

•

23 years ago

I don't think this is a showstopper. Moving it to mozilla0.9.3.

Target Milestone: mozilla0.9.2 → mozilla0.9.3

lchiang

Comment 21

•

23 years ago

Frank, if you don't think this is a showstopper, pls remove the PDT+ in the
status whiteboard. Thanks.

lchiang

Comment 22

•

23 years ago

per PDT triage mtg with montse, removing PDT+ from status summary.

Whiteboard: [PDT+]no progress yet. → no progress yet.

Frank Tang

Comment 23

•

23 years ago

mark as nsbranch

Keywords: nsBranch

Frank Tang

Comment 24

•

23 years ago

>I have the following Big5 characters display as "\ufffd"
>0xA15a
>0xA1C3
>0xA1C5
>0xA1fe
>0xA240
>0xA2cc
>0xA2ce
This is because the test case
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=30560 itself is buggy:
it contains the 6-character string "\ufffd" instead of those Big5 code points.

On Windows I can also see the following problem, which means this is a
Big5-to-Unicode decoding issue:
0xa3be, 0xbe4c, 0xcdd3, 0xdada, 0xe7e1, 0xf4e8

However, if I put these characters into a separate file, they display fine,
which means it is a buffer issue.

Frank Tang

Comment 25

•

23 years ago

I can reproduce this problem on my Windows machine, and I am sure this is not a
character-level conversion problem but a buffer-related conversion issue.
jshin said he saw a similar problem with Korean again recently.
Adding jshin@pantheon.yale.edu and shanjian to the cc list.

Jungshik Shin

Comment 26

•

23 years ago

For reference, I reported the problem Frank mentioned in my comments added to
bug 26920 (under the jshin@pantheon.yale.edu account). As this bug is likely to
be a buffer issue as well, it may as well be marked as dependent on bug 26920.

Frank Tang

Comment 27

•

23 years ago

got a fix.

Frank Tang

Comment 28

•

23 years ago

Attached patch add a boolean value to valid the med

kill this account

Comment 29

•

23 years ago

r=bstell@netscape.com

Frank Tang

Comment 30

•

23 years ago

OK, so here is what happens.
If the last byte of a block is the first byte of a multibyte character, none of
the uScan calls succeeds and done stays false. If, in addition, the previous byte
(the byte before the last byte in the block) has a value < 0x20, we treat the
lead byte as a control code, because med still holds the value left over from
that previous byte.
The fix is simple: add a boolean, set it to false, and set it to true only when
uScan succeeds. Then check the boolean before checking med.
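A minimal C++ model of the failure and the fix just described. Everything here is hypothetical: `scanOne` stands in for uScan, the table lookup is skipped, and where the control-code check sits is an assumption; only the names `med` and `medIsValid` come from this bug. In the buggy mode the check reads a stale `med`, so a lead byte at the end of a block that follows a tab is consumed as a control code, and its trail byte starts the next block on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for uScan: ASCII decodes to itself; a byte >= 0x80 is
// treated as a lead byte needing one trail byte (table lookup omitted, so the
// two-byte code itself is returned). Returns bytes consumed, or 0 on failure.
std::size_t scanOne(const std::uint8_t* src, std::size_t len, std::uint32_t* med) {
    if (len == 0) return 0;
    if (src[0] < 0x80) { *med = src[0]; return 1; }
    if (len < 2) return 0;  // lead byte is the last byte of the block
    *med = (std::uint32_t(src[0]) << 8) | src[1];
    return 2;
}

struct Result {
    std::vector<std::uint32_t> out;  // decoded characters
    std::size_t consumed;            // bytes of the block actually used
};

// One decoder call over a block. With fixed == false the control-code test
// reads `med` even when the scan failed, so a stale value (e.g. a tab) makes
// the lead byte look like a control code.
Result decodeBlock(const std::vector<std::uint8_t>& block, bool fixed) {
    Result r{{}, 0};
    std::uint32_t med = 0xFFFF;  // declared outside the loop, so it keeps its last value
    std::size_t i = 0;
    while (i < block.size()) {
        bool medIsValid = false;  // set only when the scan succeeds
        std::size_t n = scanOne(&block[i], block.size() - i, &med);
        if (n > 0) medIsValid = true;
        bool control = fixed ? (medIsValid && med < 0x20) : (med < 0x20);
        if (control) {            // pass a control code through, consume 1 byte
            r.out.push_back(med);
            i += 1;
            continue;
        }
        if (n == 0) break;        // partial character: leave it for the next block
        r.out.push_back(med);
        i += n;
    }
    r.consumed = i;
    return r;
}
```

In the buggy mode, `{0x09, 0xB1}` is fully consumed, so the next block begins with 0x45 and decodes as `E`, which matches the report of 0xb145 displayed as 'E'. The real converter may differ in detail.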

This is a data-loss bug. A character is lost if:
1. the document is Traditional Chinese or Korean;
2. the first and last bytes of a multibyte character are not received together; and
3. the byte before the first byte of that multibyte character is a code point < 0x20,
for example tab, CR, or LF.

This is a safe fix.
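The three conditions above can be turned into a small predictor for which characters of a test file are at risk at a given block size. This C++ sketch is illustrative only. It assumes Big5 lead bytes 0xA1-0xF9 (covering condition 1 for Traditional Chinese only). Following the stale-`med` explanation, it also assumes that the byte below 0x20 must sit in the same block as the lead byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumes Big5 lead bytes 0xA1-0xF9; EUC-KR would need 0xA1-0xFE instead.
bool isLeadByte(std::uint8_t b) { return b >= 0xA1 && b <= 0xF9; }

// Returns the two-byte codes in `data` that meet conditions 2 and 3 when the
// data arrives in blocks of `blockSize` bytes: the lead byte is the last byte
// of its block, and the byte before it, in the same block, is < 0x20.
std::vector<std::uint16_t> atRiskCharacters(const std::vector<std::uint8_t>& data,
                                            std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<std::uint16_t> hits;
    std::size_t i = 0;
    while (i + 1 < data.size()) {
        if (!isLeadByte(data[i])) { ++i; continue; }
        bool leadEndsBlock = (i + 1) % blockSize == 0;
        bool prevInSameBlock = i % blockSize != 0;
        if (leadEndsBlock && prevInSameBlock && data[i - 1] < 0x20)
            hits.push_back(static_cast<std::uint16_t>((data[i] << 8) | data[i + 1]));
        i += 2;  // skip the trail byte so it is never read as a lead byte
    }
    return hits;
}
```

With the bytes `09 B1 45 0A BE 4C` (tab, 0xB145, LF, 0xBE4C), 2-byte blocks flag only 0xB145 and 5-byte blocks flag only 0xBE4C. This shows why the same characters display correctly when moved into a separate file.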

Frank Tang

Comment 31

•

23 years ago

Please consider this PDT+: data loss in Traditional Chinese and Korean.

Whiteboard: no progress yet. → r=bstell

Frank Tang

Comment 32

•

23 years ago

I take it back. That patch is not complete. Wait for a complete fix.

selmer (gone)

Comment 33

•

23 years ago

shouldn't medIsValid be set to false right after it is consumed?

Frank Tang

Comment 34

•

23 years ago

Attached patch a real patch

Frank Tang

Comment 35

•

23 years ago

>shouldn't medIsValid be set to false right after it is consumed?
Basically, it IS.

In the new patch, if uScan never succeeds even once, we return INPUT_ERROR, as
we already do in ConvertTable. That means we have partial bytes in the block,
and we need the next block to complete the conversion.
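This comment describes a contract between the decoder and its caller. When a block ends in the middle of a character, the decoder consumes only the complete characters and reports that it needs more input. The leftover bytes are then completed by the next block. The C++ sketch below is one hedged way to express that contract. `Status::NeedMoreInput` is a hypothetical stand-in for INPUT_ERROR. Keeping the leftover bytes on the caller side is an assumption of the sketch, not a description of how Mozilla's converters store them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical status codes; NeedMoreInput plays the role of INPUT_ERROR here.
enum class Status { Ok, NeedMoreInput };

// Decodes every complete character in `src` (ASCII, or a byte >= 0x80 plus one
// trail byte; table lookup omitted). If the block ends on a lead byte, that
// byte is not consumed and NeedMoreInput is returned.
Status decodeSome(const std::vector<std::uint8_t>& src,
                  std::vector<std::uint32_t>& out, std::size_t* consumed) {
    std::size_t i = 0;
    while (i < src.size()) {
        std::uint8_t b = src[i];
        if (b < 0x80) { out.push_back(b); ++i; continue; }
        if (i + 1 == src.size()) { *consumed = i; return Status::NeedMoreInput; }
        out.push_back((std::uint32_t(b) << 8) | src[i + 1]);
        i += 2;
    }
    *consumed = i;
    return Status::Ok;
}

// Caller side: feeds `input` in blocks of `blockSize` bytes and keeps any
// unconsumed bytes so they are prepended to the next block.
std::vector<std::uint32_t> decodeInBlocks(const std::vector<std::uint8_t>& input,
                                          std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<std::uint32_t> out;
    std::vector<std::uint8_t> pending;
    for (std::size_t pos = 0; pos < input.size(); pos += blockSize) {
        std::size_t end = std::min(input.size(), pos + blockSize);
        pending.insert(pending.end(), input.begin() + static_cast<std::ptrdiff_t>(pos),
                       input.begin() + static_cast<std::ptrdiff_t>(end));
        std::size_t used = 0;
        Status s = decodeSome(pending, out, &used);
        pending.erase(pending.begin(), pending.begin() + static_cast<std::ptrdiff_t>(used));
        // NeedMoreInput must leave the partial character behind for the next round.
        assert(s == Status::Ok || !pending.empty());
        (void)s;
    }
    return out;
}
```

Because the caller re-feeds unconsumed bytes, decoding the same input with any block size yields the same characters. This is the property the buffer tests in this bug exercise.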

Frank Tang

Comment 36

•

23 years ago

OK, I wrote a good CGI script to test the buffer condition.
I put it at http://warp/u/ftang/utf8test/buffer.cgi
Try changing the encoding to Big5, EUC-KR, GB2312, and other multibyte encodings.

Whiteboard: r=bstell → r=bstell,pdt+

Frank Tang

Comment 37

•

23 years ago

Attached file perl cgi which force buffer size to smaller. save it as buffer.cgi

Frank Tang

Comment 38

•

23 years ago

Attached patch v3 of patch

Frank Tang

Comment 39

•

23 years ago

We need a separate patch to fix the Simplified Chinese converter.

nhottanscp

Comment 40

•

23 years ago

r=nhotta

Frank Tang

Comment 41

•

23 years ago

Attached patch a seperate patch to solve gbk/gb2312 problem

Frank Tang

Comment 42

•

23 years ago

With this patch, the problem is reduced to the following cases:
1. HZ still has the problem, but the HZ encoding is less important now.
2. When the block size is 1, all multibyte encodings have the problem. Fixing it
may need a big change. I feel it is too risky to fix, and the chance that we hit
a block size of only one byte is very, very small.

Simon Fraser [no longer active]

Comment 43

•

23 years ago

+      } else if(*src == (PRUint8) 0xa0) {

Fix the cast to be like the previous if test.
sr=sfraser on the last two patches.

nhottanscp

Comment 44

•

23 years ago

r=nhotta for patch 07/11/01 12:35

Frank Tang

Comment 45

•

23 years ago

*** Bug 88874 has been marked as a duplicate of this bug. ***

Frank Tang

Comment 46

•

23 years ago

Two cases are fixed.
The HZ case is not fixed yet, but its priority is lower.
We still have a problem in the size=1 case, but that is really an edge case.
BTW, IE looks horrible with a small block size. :)
As far as the m92 branch is concerned, we are done.

Frank Tang

Comment 47

•

23 years ago

Spinning the HZ problem off into 90411.
Spinning the general buffer problem with size=1 off into bug 90414.

Frank Tang

Comment 48

•

23 years ago

Reassigning to yokoyama for trunk landing.
Roy, close this bug after you land it on the trunk. The other two problems have
been spun off as stated above.

Assignee: ftang → yokoyama

Status: ASSIGNED → NEW

Roy Yokoyama

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

selmer (gone)

Comment 49

•

23 years ago

Has the correct fix been landed on the branch yet?  If so, please update the bug
in some way - either take the PDT+ off, or add a comment saying we're done.  Thanks.

chris hofmann

Updated

•

23 years ago

Whiteboard: r=bstell,pdt+ → r=bstell,pdt+, patch is on the branch

Roy Yokoyama

Assignee

Comment 50

•

23 years ago

Landed on trunk.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

rubydoo123

Comment 51

•

23 years ago

Taking off pdt+ since fix has been landed on the branch.

Adding "vbranch" to confirm fix.

Whiteboard: r=bstell,pdt+, patch is on the branch → r=bstell, patch is on the branch, vbranch

Ervin Yan

Reporter

Comment 52

•

23 years ago

Verified in Solaris trunk (based on 2001.07.27):

0xa27e through 0xa2a7 are still displayed as blank.

Some characters are still displayed incorrectly:

0xa3be  displayed as '?'
0xb145  displayed as '  E'
0xb3c4  displayed as '?'
0xb5ae  displayed as '?'
0xbe4c  displayed as '  L'
0xcdd3  displayed as '  ?'
0xdada  displayed as '?'
0xe7e1  displayed as '?'
0xf4e8  displayed as '?'

Others are OK now.
