Closed Bug 131837 Opened 24 years ago Closed 24 years ago

Space is not displayed correct in china.com

Categories

(Core :: Internationalization, defect, P3)

x86
Windows XP
defect

Tracking

VERIFIED FIXED
mozilla1.0

People

(Reporter: amyy, Assigned: ftang)

Details

(Keywords: intl, topembed+, Whiteboard: [adt2] [need a=])

Attachments

(5 files, 2 obsolete files)

Spaces on the page http://chinese.china.com/zh_cn/ are not displayed correctly. The page http://chinese.china.com/zh_tw/ has a similar problem; the bulleted dots are also not displayed correctly. IE doesn't have the problem.
Component: Asian → Internationalization
Product: Tech Evangelism → Browser
Version: unspecified → other
I don't think this is an evangelism issue. This looks like a layout or i18n issue. To take one relevant part from http://chinese.china.com/zh_cn/ as shown on ylong's attachment (view this under GB2312):
<td bgcolor="#FFF3C3" width="175">0xA3 0xA0<a href="http://shop4u.china.com/" class="nounderline">É̳Ç</a>
What looks like a blank to the eye actually contains 2 bytes that are not displayed, i.e. 0xA3 and 0xA0. The latter is an nbsp. I don't know why 0xA3 is there, but it seems to be a legitimate lead byte in GB2312. Should we display anything here? On my Windows 2000 (default locale set to Japanese), it displays a middle dot rather than what looks like a Korean character in ylong's attachment above. I will take this up as an evangelism issue only if we first determine that there is no way to deal with it in our code.
Frank? Shanjian?
Assignee: momoi → ftang
This page is encoded in GB2312. In some places, it contains a byte sequence 0xA3 0xA0 (NBSP). Depending on platform and locale (I believe) this is displayed differently.
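To check whether a page contains this byte pair, one can walk the raw bytes while keeping track of double-byte boundaries. The following is only a rough sketch: `find_a3a0` and the sample bytes are made up for illustration, and it assumes that every byte >= 0x80 starts a two-byte character, which is a simplification of real GB2312 validation.

```python
def find_a3a0(data: bytes) -> list[int]:
    """Return offsets of the pair 0xA3 0xA0 found at character boundaries."""
    offsets = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII character.
            i += 1
        elif i + 1 < len(data):
            # Simplification: any high byte is a lead byte with one trail byte.
            if b == 0xA3 and data[i + 1] == 0xA0:
                offsets.append(i)
            i += 2
        else:
            # Dangling lead byte at the end of the input.
            i += 1
    return offsets


# Hypothetical fragment modelled on the markup quoted in this bug.
sample = b'<td>\xa3\xa0<a>link</a>'
print(find_a3a0(sample))
```

Stepping two bytes at a time matters: a naive substring search would also report 0xA3 0xA0 when 0xA3 is really the trail byte of the preceding character.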
Keywords: intl
QA Contact: ruixu → ylong
The page http://chinese.china.com/zh_tw/ shows a square rather than the Korean-looking character seen in http://chinese.china.com/zh_cn/.
looks pretty bad.
Status: NEW → ASSIGNED
Keywords: nsbeta1
Priority: -- → P3
Target Milestone: --- → mozilla1.0
Momoi-san, what is the character 0xA3A0? Is it supposed to be shown as a space? Is it frequently used as a space on many sites, or is this site-specific?
Whiteboard: Need Info
> Momoi-san, what is the character 0xA3A0? Is it supposed to be shown as a space?
> Is it frequently used as a space on many sites, or is this site-specific?
My guess is that they wanted to generate just NBSP but somehow their code generator prepended a legitimate GB2312 lead byte. I am not as concerned about this specific example as I am with a more general question. What is an allowable range of errors in this encoding? Is there a reasonable approach?
I checked the two sites today and they both use &nbsp;:
http://chinese.china.com/zh_cn/
http://chinese.china.com/zh_tw/
Is using \uFFFD for illegal characters causing the inconsistent result? If so, can we handle this in the rendering code instead of in each converter?
> I checked the two sites today and they both use &nbsp;
Nothing has changed for me today:
http://chinese.china.com/zh_cn/ is displayed with a space on Win2000-CN, but WinXP-CN is still the same as before.
http://chinese.china.com/zh_tw/ is still displayed with a square on both WinXP-CN and Win2k-CN, as before.
> I checked the two sites today and they both use &nbsp;
Okay, I just searched for &nbsp; and it is used on the pages. I did not check whether &nbsp; was used instead of 0xA3A0 (because the pages were shown without the problem on my machine, EN W2K; does this bug only happen with Chinese Windows?). I wonder why 0xA3A0 is used when &nbsp; is also used on the same page.
> does this bug only happen with Chinese Windows?
For me, it seems to depend on the OS: Win2k-CN displays it as &nbsp;, but WinXP-CN displays it as 0xA3A0; both are Simplified Chinese Windows systems. When I used 0xA3A0 in the Composer HTML source page, both Win2k-CN and WinXP-CN gave me a Korean-looking character.
I just found that some other web pages have the same problem: http://www.sohu.com/, http://www.163.com/, http://www.whaic.com/
According to GB18030, 0xA3A0 is part of the "Double Bytes Private Use Area" and is mapped to U+E5E5 (see page 88 of the GB18030 specification). In Unicode 3.0, U+E5E5 is part of the "Private Use Area" (U+E000 - U+F8FF). I cannot fix this on the client side unless I break GB18030 support. I think this IS an evangelism issue.
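Because U+E5E5 is a private-use code point, it has no standard glyph, which would be consistent with the platform-dependent rendering reported earlier in this bug. As a small sketch (the helper name `in_bmp_private_use` is made up here), Python's `unicodedata` module can be used to compare it with the two space characters discussed in this bug:

```python
import unicodedata


def in_bmp_private_use(ch: str) -> bool:
    """True if ch falls in the Private Use Area U+E000 - U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


for ch in ("\ue5e5", "\u3000", "\u00a0"):
    # General category "Co" means private use; "Zs" means space separator.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), in_bmp_private_use(ch))
```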
Regarding comment #14, do you know how IE (and Comm 4.x) manage not to display anything for them? Are they ignoring them? What makes it possible for them to deal with this on the client side? I need additional info to even begin an evangelism effort.
I think we should find out why 0xa3a0 exists. I will go ahead and map 0xa3a0 to U+3000 instead of U+E5E5, but I really don't think any reviewer or superreviewer will let me through.
Keywords: nsbeta1 → nsbeta1+
Whiteboard: Need Info → adt2
> I think we should find out why 0xa3a0 exists.
That is precisely what I want to know. It is easy to write to this web site, but there are others like it, as reported by ylong. If we don't understand why these byte combinations creep into these web pages, I don't have an effective strategy. Let me ask around.
Impact Platform: ALL
Impact language users: Simplified Chinese users
Probability of hitting the problem: HIGH; several major web sites have this problem
Severity if hit the problem in the worst case: garbage displayed between words where a space should be shown
Way to recover after hitting the problem: none
Risk of the fix: LOW
Potential benefit of fix this problem: None
Attached patch patch1.0 (obsolete)
This is a hacky patch that trades one character in the Unicode private use area for a space. It is hacky because it breaks the GB18030 standard. The good part is that the character is in the user-defined area, so there is very little chance that a user will really use it for something else.
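The trade-off can be sketched roughly as follows. This is not the attached patch: `TOY_TABLE` and `decode_pair` are hypothetical stand-ins for a real two-byte decode table and converter, shown only to illustrate giving up one private-use mapping in favor of a space.

```python
REPLACEMENT = "\ufffd"
PUA_A3A0 = "\ue5e5"            # GB18030 mapping for 0xA3A0, as cited in this bug
IDEOGRAPHIC_SPACE = "\u3000"

# Hypothetical, tiny stand-in for a real two-byte decode table.
TOY_TABLE = {
    (0xA1, 0xA1): IDEOGRAPHIC_SPACE,  # standard code for the ideographic space
    (0xA3, 0xA0): PUA_A3A0,           # strict GB18030 private-use mapping
}


def decode_pair(lead: int, trail: int, lenient: bool = True) -> str:
    """Decode one two-byte character; in lenient mode 0xA3A0 becomes a space."""
    ch = TOY_TABLE.get((lead, trail), REPLACEMENT)
    if lenient and ch == PUA_A3A0:
        # The hack: sacrifice this one private-use character.
        return IDEOGRAPHIC_SPACE
    return ch


print(repr(decode_pair(0xA3, 0xA0)))                 # lenient behavior
print(repr(decode_pair(0xA3, 0xA0, lenient=False)))  # strict behavior
```

The `lenient` flag exists only to make the difference visible side by side; the comments in this bug describe an unconditional remapping.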
CCing Sun's Chinese expert on this. Bugzilla does not accept Ervin.Yan@sun.com in the CC list.
> Impact language users: Simplified Chinese users
Sorry, I forgot to fill this in.
Impact language users: Simplified Chinese users, 55.5M, 9.8% of total internet users (see http://www.glreach.com/globstats/index.php3)
nhotta- can you r= this one ?
Do those sites use 0xA3A0 for an ideographic space or a non-breaking space? Which one do we want to map to, \u3000 or \u00A0? In any case, you need to add a comment.
Attachment #77758 - Flags: review+
Comment on attachment 77758 (patch1.0): r=nhotta, please add a comment
Whiteboard: adt2 → [adt2]
Attached patch patch v2 with comments (obsolete)
Attachment #77758 - Attachment is obsolete: true
Attached patch real patch v2
Attachment #78256 - Attachment is obsolete: true
Comment on attachment 78257 (real patch v2): r=nhotta
Attachment #78257 - Flags: review+
hum... I have some doubt about this patch now. we need to make sure we won't convert U+3000 back to e5e5 when we send mail out or post form or save html pages if we take this patch.
> we need to make sure we won't convert U+3000 back to e5e5 when
> we send mail out or post form or save html
I tested this carefully; it won't. Asking for sr=
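One way to see why the reverse direction is safe: in the standard tables, U+3000 already has its own code, 0xA1A1, so an encoder built from them would not produce 0xA3A0 when given an ideographic space. The sketch below uses Python's built-in codecs purely as an illustration; it says nothing about how Mozilla's own converters were tested, and `encode_space` is a made-up helper.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def encode_space(encoding: str = "gb2312") -> bytes:
    """Encode the ideographic space with one of Python's built-in codecs."""
    return IDEOGRAPHIC_SPACE.encode(encoding)


for name in ("gb2312", "gb18030"):
    # Prints the bytes each codec emits for U+3000.
    print(name, encode_space(name).hex())
```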
Attachment #78257 - Flags: superreview+
Keywords: adt1.0.0, approval
Whiteboard: [adt2] → [adt2] [need a=]
Please check this into the trunk. After it's been tested, please update the bug.
I still see the problem with the 20020416 trunk on SC WinXP. Perhaps the fix was not checked in to the 20020416 trunk?
Frank's check-in (7:56 AM) was after the 04-16-06 win32 build (7:26 AM). I'm waiting for a newer trunk build to verify it.
I checked it on the 04-17 trunk build on Mac 10.1.3 and Linux RH7.2 (no win32 trunk build is available right now); the spaces in Simplified Chinese pages are displayed fine. However, the bulleted dots still show as squares in http://chinese.china.com/zh_tw/, which is a different character code point. I'll file a separate bug for that later.
adding adt1.0.0+. Please check in to the branch as soon as possible and add the fixed1.0.0 keyword.
Keywords: adt1.0.0 → adt1.0.0+
Comment on attachment 78257 (real patch v2): a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78257 - Flags: approval+
Keywords: topembed+
Checked into the m1.0 branch. Marking it as fixed1.0.0.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Keywords: adt1.0.0+ → fixed1.0.0
Resolution: --- → FIXED
Verified fixed on 04-22 branch build on all platforms.
Status: RESOLVED → VERIFIED