Closed
Bug 131837
Opened 24 years ago
Closed 24 years ago
Space is not displayed correct in china.com
Categories
(Core :: Internationalization, defect, P3)
Tracking
VERIFIED
FIXED
mozilla1.0
People
(Reporter: amyy, Assigned: ftang)
Details
(Keywords: intl, topembed+, Whiteboard: [adt2] [need a=])
Attachments
(5 files, 2 obsolete files)
179.47 KB, image/jpeg
59.56 KB, text/html
180.92 KB, image/jpeg
1.20 KB, patch (nhottanscp: review+, kinmoz: superreview+, asa: approval+)
176.55 KB, image/jpeg
Spaces on the page http://chinese.china.com/zh_cn/ are not displayed correctly.
The page http://chinese.china.com/zh_tw/ has a similar problem; the bulleted dots
are not displayed correctly either.
IE doesn't have the problem.
| Reporter | ||
Comment 1•24 years ago
Updated•24 years ago
Component: Asian → Internationalization
Product: Tech Evangelism → Browser
Version: unspecified → other
Comment 2•24 years ago
I don't think this is an evangelism issue. This looks like a layout or
i18n issue. To take one relevant part from http://chinese.china.com/zh_cn/
as shown on ylong's attachment (view this under GB2312),
<td bgcolor="#FFF3C3" width="175">0xA3 0xA0<a href="http://shop4u.china.com/"
class="nounderline">商城</a>
What looks like a blank to the eyes actually contains 2 bytes that are not
displayed, i.e. 0xA3 and 0xA0. The latter is an nbsp. I don't know why 0xA3 is
there but it seems to be a legitimate lead byte in GB2312.
Should we display anything here? On my Windows 2000 (default locale set to
Japanese), it displays a middle dot rather than what looks like a Korean
character in ylong's attachment above.
I will take up this issue as an evangelism issue if we determine that there
is no way to deal with this in our code first.
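The byte-range observation in this comment can be checked with a minimal sketch. It assumes the usual two-byte layouts: GB2312 in its EUC-CN form uses lead bytes 0xA1-0xF7 and trail bytes 0xA1-0xFE, while GBK and the two-byte part of GB18030 use lead bytes 0x81-0xFE and trail bytes 0x40-0xFE except 0x7F. The function names are illustrative only and are not taken from the Mozilla converters.

```python
# Two-byte structure checks (illustrative helpers, not Mozilla code).
GB2312_LEAD = range(0xA1, 0xF8)    # 0xA1..0xF7
GB2312_TRAIL = range(0xA1, 0xFF)   # 0xA1..0xFE
GBK_LEAD = range(0x81, 0xFF)       # 0x81..0xFE


def is_gb2312_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) fits the EUC-CN two-byte layout."""
    return lead in GB2312_LEAD and trail in GB2312_TRAIL


def is_gbk_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) fits the GBK / GB18030 two-byte layout."""
    return lead in GBK_LEAD and 0x40 <= trail <= 0xFE and trail != 0x7F


if __name__ == "__main__":
    for name, check in (("GB2312", is_gb2312_pair), ("GBK", is_gbk_pair)):
        print(name, check(0xA3, 0xA0))
```

Under these ranges 0xA3 is a valid GB2312 lead byte, but 0xA0 falls just below the GB2312 trail range, so the pair is only structurally valid in the larger GBK/GB18030 layout.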
Comment 4•24 years ago
This page is encoded in GB2312. In some places, it contains a byte
sequence 0xA3 0xA0 (NBSP). Depending on platform and locale (I believe)
this is displayed differently.
| Reporter | ||
Comment 5•24 years ago
The page http://chinese.china.com/zh_tw/
shows a square rather than the Korean-looking character seen in
http://chinese.china.com/zh_cn/.
| Assignee | ||
Comment 6•24 years ago
looks pretty bad.
Comment 7•24 years ago
Momoi-san, what is the character 0xA3A0? Is it supposed to be shown as a space?
Is it frequently used as a space on many sites, or is it specific to this site?
Updated•24 years ago
Whiteboard: Need Info
Comment 8•24 years ago
> Momoi-san, what is the character 0xA3A0? Is it supposed to be shown as a space?
> Is it frequently used as a space on many sites, or is it specific to this site?
My guess is that they wanted to generate just an NBSP but somehow their
code generator appended a legitimate GB2312 lead byte. I am not as concerned
about this specific example as I am with a more general question. What is an
allowable range of errors in this encoding? Is there a reasonable
approach?
Comment 9•24 years ago
I checked the two sites today and they both use &nbsp;
http://chinese.china.com/zh_cn/
http://chinese.china.com/zh_tw/
Is using \uFFFD for illegal characters causing the inconsistent result?
If so, can we do something in the rendering code instead of handling it in each
converter?
| Reporter | ||
Comment 10•24 years ago
> I checked the two sites today and they both use &nbsp;
Nothing has changed for me today:
http://chinese.china.com/zh_cn/
It's displayed as a space on Win2000-CN, but WinXP-CN is still the same as before.
http://chinese.china.com/zh_tw/
Both WinXP-CN and Win2k-CN still display a square, as before.
Comment 11•24 years ago
> I checked the two sites today and they both use &nbsp;
Okay, I just searched for &nbsp; and it is used in the pages. I did not check whether
&nbsp; was used instead of 0xA3A0 (because the pages were shown without the
problem on my machine, EN W2K; does this bug only happen with Chinese Windows?).
I wonder why 0xA3A0 is used when &nbsp; is also used in the same page.
| Reporter | ||
Comment 12•24 years ago
> does this bug only happen with Chinese Windows?
For me, it seems to depend on the OS: Win2k-CN displays it as &nbsp;, but
on WinXP-CN it's displayed as 0xA3A0; both are SimpChinese Windows systems.
I used 0xA3A0 in the Composer HTML source page, and both Win2k-CN and WinXP-CN give me a
Korean-looking character.
| Reporter | ||
Comment 13•24 years ago
I just found that some other web pages have the same problem:
http://www.sohu.com/, http://www.163.com/, http://www.whaic.com/
| Assignee | ||
Comment 14•24 years ago
According to GB18030, 0xA3A0 is part of the "Double Bytes Private Use Area" and is
mapped to U+E5E5 (see page 88 of the GB18030 specification). In Unicode 3.0, U+E5E5
is part of the "Private Use Area" (U+E000 - U+F8FF).
I cannot fix this on the client side unless I break GB18030 support.
I think this IS an evangelism issue.
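The range claim above is easy to confirm with a small sketch. The helper name is hypothetical; the range bounds are the ones quoted in this comment, and `unicodedata.category` is Python's standard lookup for a code point's general category.

```python
import unicodedata

# BMP Private Use Area bounds as quoted in the comment above.
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF


def in_bmp_private_use(code_point: int) -> bool:
    """True if the code point lies in U+E000..U+F8FF."""
    return PUA_FIRST <= code_point <= PUA_LAST


if __name__ == "__main__":
    for cp in (0xE5E5, 0x3000):
        print(f"U+{cp:04X}", in_bmp_private_use(cp), unicodedata.category(chr(cp)))
```

A private-use code point has no assigned glyph, which would be consistent with different platforms and fonts showing different shapes for it.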
Comment 15•24 years ago
Regarding comment #14, do you know how IE (and Comm 4.x) manage not to
display anything for them? Are they ignoring them? What makes it possible
for them to deal with this on the client side? I need additional info to
even begin an evangelism effort.
| Assignee | ||
Comment 16•24 years ago
I think we should find out why 0xa3a0 exists.
I will go ahead and map 0xa3a0 to U+3000 instead of U+E5E5, but I really don't
think any reviewer or superreviewer will let me through.
Comment 17•24 years ago
> I think we should find out why 0xa3a0 exists.
That is precisely what I want to know. It is easy to write to
this web site but there are others like this as reported by
ylong. If we don't understand why these byte combinations creep
into these web pages, I don't have an effective strategy.
Let me ask around.
| Assignee | ||
Comment 18•24 years ago
Impact Platform: ALL
Impact language users: Simplified Chinese users
Probability of hitting the problem: HIGH; several major web sites have this
problem
Severity if the problem is hit, in the worst case: garbage displayed between words
where we should show a space.
Way to recover after hitting the problem: none.
Risk of the fix: LOW
Potential benefit of fixing this problem: None
| Assignee | ||
Comment 19•24 years ago
This is a hacky patch that trades one character in the Unicode private use
area for a space. It is hacky because it breaks the GB18030 standard. The good
part is that the character is listed in the user-defined area and there is very,
very little chance that a user will really use it for something else.
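The trade-off described here can be illustrated with a minimal sketch. This is not the attached patch (its contents are not shown in this thread); it only assumes a decoder that checks one override pair, 0xA3 0xA0, before falling back to a normal table. The names `LENIENT_OVERRIDES` and `decode_lenient` are hypothetical, and Python's built-in `gb2312` codec stands in for the real converter.

```python
# Hypothetical override table: the single private-use pair traded for a space.
LENIENT_OVERRIDES = {b"\xa3\xa0": "\u3000"}  # strict GB18030 would give U+E5E5


def decode_lenient(data: bytes) -> str:
    """Decode EUC-CN style bytes, mapping 0xA3 0xA0 to IDEOGRAPHIC SPACE."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # ASCII passes through unchanged.
            out.append(chr(byte))
            i += 1
            continue
        pair = data[i:i + 2]
        if pair in LENIENT_OVERRIDES:
            out.append(LENIENT_OVERRIDES[pair])
        else:
            # Stand-in for the normal table lookup; bad pairs become U+FFFD.
            out.append(pair.decode("gb2312", errors="replace"))
        i += 2
    return "".join(out)


if __name__ == "__main__":
    sample = b"\xa3\xa0" + "商城".encode("gb2312")
    print(repr(decode_lenient(sample)))
```

The cost of this approach is that a page that really intends the private-use character at 0xA3 0xA0 can no longer be decoded faithfully.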
| Assignee | ||
Comment 20•24 years ago
cc Sun's Chinese expert on this.
Bugzilla does not accept Ervin.Yan@sun.com in the cc list.
| Assignee | ||
Comment 21•24 years ago
>Impact language users: Simplified Chinese users
Sorry, I forgot to fill this in:
Impact language users: Simplified Chinese users, 55.5M, 9.8% of total Internet users
(see http://www.glreach.com/globstats/index.php3 )
| Assignee | ||
Comment 22•24 years ago
nhotta - can you r= this one?
Comment 23•24 years ago
Do those sites use 0xA3A0 for an ideographic space or a non-breaking space?
Which one do we want to map to, \u3000 or \u00A0?
In any case, you need to add a comment.
Updated•24 years ago
Attachment #77758 -
Flags: review+
Comment 24•24 years ago
Comment on attachment 77758
patch1.0
r=nhotta
please add a comment
| Assignee | ||
Updated•24 years ago
Whiteboard: adt2 → [adt2]
| Assignee | ||
Comment 25•24 years ago
Attachment #77758 -
Attachment is obsolete: true
| Assignee | ||
Comment 26•24 years ago
Attachment #78256 -
Attachment is obsolete: true
Comment 27•24 years ago
Comment on attachment 78257
real patch v2
r=nhotta
Attachment #78257 -
Flags: review+
| Assignee | ||
Comment 28•24 years ago
Hmm... I have some doubts about this patch now. If we take it, we need to make
sure we won't convert U+3000 back to e5e5 when we send mail, post a form, or
save HTML pages.
| Assignee | ||
Comment 29•24 years ago
>we need to make sure we won't convert U+3000 back to e5e5 when
> we send mail out or post form or save html
I tested this carefully; it won't.
Asking for sr=.
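The round-trip concern can be sketched as a check that encoding U+3000 yields the standard ideographic-space bytes 0xA1 0xA1 rather than the private-use pair 0xA3 0xA0. This uses Python's built-in codecs as a stand-in, so it illustrates the property being tested and says nothing about the Mozilla encoders themselves; `space_bytes` is a hypothetical helper.

```python
IDEOGRAPHIC_SPACE = "\u3000"
PRIVATE_USE_PAIR = b"\xa3\xa0"


def space_bytes(codec: str) -> bytes:
    """Encode U+3000 with one of Python's built-in Chinese codecs."""
    return IDEOGRAPHIC_SPACE.encode(codec)


if __name__ == "__main__":
    for codec in ("gb2312", "gbk", "gb18030"):
        encoded = space_bytes(codec)
        print(codec, encoded.hex(), encoded != PRIVATE_USE_PAIR)
```

Because the override only affects decoding, outgoing mail, form posts, and saved pages keep using the standard mapping as long as the encoder table is left untouched.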
Comment 30•24 years ago
Attachment #78257 -
Flags: superreview+
Comment 31•24 years ago
Please check this into the trunk. After it's been tested, please update the bug.
Comment 32•24 years ago
I still see the problem with the 20020416 trunk build on SC WinXP; perhaps the fix
was not checked in before the 20020416 trunk build?
| Reporter | ||
Comment 33•24 years ago
Frank's check-in (7:56AM) was after the 04-16-06 win32 build (7:26AM).
I'm waiting for a newer trunk build to verify it.
| Reporter | ||
Comment 34•24 years ago
I checked it on the 04-17 trunk build on Mac 10.1.3 and Linux RH7.2 (no win32 trunk
build is available right now); the spaces in SimpChinese pages are displayed fine.
However, the bulleted dots still show as squares in
http://chinese.china.com/zh_tw/, which is a different character code point. I'll
file a separate bug for that later.
Comment 35•24 years ago
adding adt1.0.0+. Please check in to the branch as soon as possible and add the
fixed1.0.0 keyword.
Comment 36•24 years ago
Comment on attachment 78257
real patch v2
a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78257 -
Flags: approval+
| Assignee | ||
Comment 37•24 years ago
Checked into the m1.0 branch. Marking it as fixed1.0.0.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Keywords: adt1.0.0+ → fixed1.0.0
Resolution: --- → FIXED
| Reporter | ||
Comment 38•24 years ago
Verified fixed on 04-22 branch build on all platforms.
Status: RESOLVED → VERIFIED