Closed Bug 116882 Opened 23 years ago Closed 23 years ago

A middle dot character is not displayed on this page

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 2000
defect

Tracking

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: momoi, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(2 files, 1 obsolete file)

** Observed with 2001-12-22 Win32 trunk build ** On the above page, there is one character which is not displayed properly with Mozilla under Shift_JIS encoding. It looks like the character has the codepoint 0x81. (There is a similar bug filed, Bug 116880, but in that bug the codepoint for the problem character is 0x86 0xA6.) Neither NN4 nor IE 5.5 has a problem displaying this character.
Keywords: intl, nsbeta1
over to Mr.Li.
Assignee: yokoyama → shanjian
The character in question is 0x81, which is followed by 0x20. 0x8120 is not a legal Shift_JIS byte sequence. It is very strange to see that both IE and Netscape 4.x replace such a sequence with 0x8145, which is the middle dot. But anyway, I don't think this is a Mozilla problem. I believe Mozilla's behavior is better than that of both IE and Netscape 4.x. Why replace an illegal byte sequence with 0x8145? (I tried another byte sequence, 0x8136, which was also replaced by 0x8145.)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
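For readers following the byte-level argument, here is a minimal sketch of the structural check being described. It assumes the usual Shift_JIS double-byte layout (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC). The function names are hypothetical and are not taken from the Mozilla converter or charset detector.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Lead byte of a two-byte Shift_JIS character: 0x81-0x9F or 0xE0-0xFC.
constexpr bool IsSjisLeadByte(std::uint8_t b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Trail byte of a two-byte Shift_JIS character: 0x40-0x7E or 0x80-0xFC.
constexpr bool IsSjisTrailByte(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// True only if the pair is structurally legal. This says nothing about
// whether a character is actually assigned to that pair.
constexpr bool IsLegalSjisPair(std::uint8_t lead, std::uint8_t trail) {
  return IsSjisLeadByte(lead) && IsSjisTrailByte(trail);
}
```

Under this check, 0x81 0x20 and 0x81 0x36 are rejected because the second byte is below 0x40, while 0x81 0x45 (the middle dot) is accepted.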
Sorry, I cannot tell which character you are referring to.
> I believe mozilla's behavior is better than both IE
> and Netscape4.x. Why replace illegal byte sequence to 0x8145?

Windows applications that use the Windows OS converters map this codepoint to the middle dot character. I am sorry, but this is expected on Windows. The character is apparently fairly widely used, right or wrong. If you use Notepad, Word, and other Windows applications, you see the same character, not a "not found" character as we do in Mozilla. How are we going to convince Windows users that what they see in every other application is wrong?

Let me re-open this for reconsideration and provide additional facts.

ftang: If you want to see which character we are referring to, just open the URL with Mozilla and compare it with NN4 or IE5/6. You will see one character shown as a question mark in Mozilla but as a middle-dot character in other browsers and applications.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Kat, I am not convinced yet. Is this kind of practice common? Did the user do this intentionally? I mean, when they put in 0x81, is a middle dot what they want? If MS just took 0x81 and mapped it to the middle dot, that would be easy to understand as a "feature". But mapping a range of code points to one character does not make much sense. Can you tell me how such a page is created?
momoi, please attach a screenshot here (and circle the character). I cannot see that ? mark.
In comment 5, shanjian said:
> Is this kind of practice common? Did user do this intentionally? ...
> Can you tell me how such page is created?

Yes, this is the question we should be asking before we decide on this bug. Let me dig around a bit more before making a decision one way or the other. I suspect this is an intentional character.
Let's try to fix the SJIS to Unicode conversion to map 0x8120 to U+30FB so we have backward compatibility. Reassigning back to ftang and marking it as M1.0.
Assignee: shanjian → ftang
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
As I mentioned in my previous comment, at least 0x8120 and 0x8136 are mapped to U+30FB. I believe all byte sequences from 0x8120 to 0x813f are mapped to U+30FB, and probably an even larger range. Adding such a nonsense conversion just for this page does not make any sense, unless momoi's investigation shows that this is a common practice and many web pages are doing it. In our charset detector, 0x8120 to 0x813f are illegal byte sequences. That may confuse some users when they switch the detector on and off.
nsbeta1+ per i18n triage
Keywords: nsbeta1 → nsbeta1+
let's fix this.
Status: NEW → ASSIGNED
p3
Priority: -- → P3
move to m0.9.9
Target Milestone: mozilla1.0 → mozilla0.9.9
Let's merge this bug into 116882. Basically, we want to be compatible with IE6 on error handling to reduce the risk of site compatibility problems. What I found by looking at IE6 is the following:
a. IE6 treats 0xfd - 0xff as single bytes and converts them into U+F8F1 - U+F8F3. We currently treat them as 2-byte characters and convert to U+FFFD.
b. If the lead byte is in the legal Shift_JIS range but the 2nd byte is in an illegal range, IE6 treats it as a two-byte character and converts it to U+30FB. We currently treat it as a single-byte character and convert it to U+FFFD.
c. For a valid Shift_JIS sequence, if the character has no definition, IE6 maps it to U+30FB but we map it to U+FFFD.
We need to fix all three of the above so we have IE6 parity in error handling. Also, I wrote a cgi which generates legal Shift_JIS according to the Nadin book, and also invalid Shift_JIS. I posted it at http://warp/u/ftang/utf8test/sjis.cgi and will try to push it out to http://people.netscape.com/ftang/testscript/sjis/sjis.cgi
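The three rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: `DecodeSjisLikeIE6` and `SjisLookup` are hypothetical names, the mapping table is supplied by the caller as a callback that returns 0 for an unassigned pair, and the handling of a lead byte at the very end of the input (emit U+FFFD) is an assumption that the comment does not cover.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical lookup supplied by the caller: returns the UTF-16 code unit
// for a structurally valid Shift_JIS pair, or 0 if nothing is assigned.
using SjisLookup = std::function<char16_t(std::uint8_t, std::uint8_t)>;

constexpr char16_t kMiddleDot = 0x30FB;    // KATAKANA MIDDLE DOT
constexpr char16_t kReplacement = 0xFFFD;  // REPLACEMENT CHARACTER

inline bool IsLead(std::uint8_t b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

inline bool IsTrail(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

std::u16string DecodeSjisLikeIE6(const std::string& in,
                                 const SjisLookup& lookup) {
  std::u16string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const std::uint8_t b = static_cast<std::uint8_t>(in[i]);
    if (b < 0x80) {
      // Single-byte range, passed through unchanged in this sketch.
      out.push_back(static_cast<char16_t>(b));
    } else if (b >= 0xA1 && b <= 0xDF) {
      // Half-width katakana: 0xA1-0xDF map to U+FF61-U+FF9F.
      out.push_back(static_cast<char16_t>(0xFF61 + (b - 0xA1)));
    } else if (b >= 0xFD) {
      // Rule (a): 0xFD-0xFF are single bytes mapped to U+F8F1-U+F8F3.
      out.push_back(static_cast<char16_t>(0xF8F1 + (b - 0xFD)));
    } else if (IsLead(b)) {
      if (i + 1 >= in.size()) {
        // Assumption: a truncated pair at end of input becomes U+FFFD.
        out.push_back(kReplacement);
        break;
      }
      const std::uint8_t t = static_cast<std::uint8_t>(in[++i]);
      if (!IsTrail(t)) {
        // Rule (b): legal lead byte, illegal second byte. Both bytes are
        // consumed and a middle dot is emitted.
        out.push_back(kMiddleDot);
      } else {
        // Rule (c): structurally valid pair with no assigned character.
        const char16_t c = lookup(b, t);
        out.push_back(c != 0 ? c : kMiddleDot);
      }
    } else {
      // 0x80 and 0xA0 are left as U+FFFD in this sketch.
      out.push_back(kReplacement);
    }
  }
  return out;
}
```

With a lookup that knows only 0x82 0xA0 (U+3042), the input bytes 0x81 0x20 decode to a single U+30FB, which matches what the reporter sees in IE and NN4 on the page in question.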
*** Bug 116880 has been marked as a duplicate of this bug. ***
Attached patch: patch v1 (obsolete)
add nhotta and shanjian to the list.
Comment on attachment 70437: patch v1. r=shanjian. (I suggest removing the break in original line 147.)
Attachment #70437 - Flags: review+
+ // IE convert fc-ff as single byte and convert to
+ // U+f8f1 to U+f8f3
+ if((0xfd == *src) || (0xfe == *src) || (0xff == *src))
+ {
+ *dest++ = (PRUnichar) 0xf8f1 +
+ (*src - (unsigned char)(0xfd));

Does this mean a mapping like this?
0xfd -> 0xf8f1
0xfe -> 0xf8f2
0xff -> 0xf8f3
But the comment says fc-ff (includes fc).

So is the IE6 behavior of mapping to 0x30fb (case c) specific to Shift_JIS, or is there similar behavior for EUC-JP?
> Does this mean, mapping like this?
> 0xfd -> 0xf8f1
> 0xfe -> 0xf8f2
> 0xff -> 0xf8f3
> But the comment says fc-ff (includes fc).

Good catch, it is fd-ff, not fc. Sorry. I will change the comment.

> So is the IE6 behavior to map 0x30fb (the case c) specific to Shift_JIS or the
> similiar behavior for EUC-JP?

Not sure; I need to develop more tests. Let's fix it one by one. Opened bug 127275 for the EUC-JP issue.
Attachment #70437 - Attachment is obsolete: true
Attached patch: patch v2
nhotta or shanjian, please r=
Comment on attachment 70941: patch v2. r=nhotta
Attachment #70941 - Flags: review+
Blocks: 104148
Attachment #70941 - Flags: superreview+
Blocks: 104060
No longer blocks: 104148
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
No longer blocks: 104060
Status: RESOLVED → VERIFIED
Verified as fixed in 0329 Win32 trunk and 0402 0.9.9ec Win32 build.