Closed Bug 116882 Opened 18 years ago Closed 18 years ago

A middle dot character is not displayed on this page

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 2000
defect

Tracking

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: momoi, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(2 files, 1 obsolete file)

** Observed with 2001-12-22 Win32 trunk build **

On the above page, there is one character which is not displayed
properly with Mozilla under Shift_JIS encoding. 

It looks like the character has the codepoint 0x81.

(There is a similar bug filed -- Bug 116880. But in that bug the codepoint for
  the problem character is 0x86 0xA6.) 

Neither NN4 nor IE 5.5 has a problem displaying this character.
Keywords: intl, nsbeta1
Over to Mr. Li.
Assignee: yokoyama → shanjian
The character in question is 0x81, which is followed by 0x20. 0x8120 is not a 
legal Shift_JIS byte sequence. It is very strange that both IE and Netscape 4.x 
replace such a sequence with 0x8145, which is the middle dot. But anyway, I don't think 
this is a Mozilla problem. I believe Mozilla's behavior is better than that of both IE 
and Netscape 4.x. Why replace an illegal byte sequence with 0x8145? (I tried another 
byte sequence, 0x8136, which was also replaced by 0x8145.)
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
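The claim that 0x8120 and 0x8136 are structurally illegal while 0x8145 is fine can be checked with a small sketch like the one below. It assumes the commonly cited Shift_JIS byte ranges as used by Windows code page 932 (lead 0x81-0x9F or 0xE0-0xFC, trail 0x40-0x7E or 0x80-0xFC). `SjisPairKind` and `ClassifySjisPair` are hypothetical names for illustration, not Mozilla code, and the check is structural only: it does not consult a character table.

```cpp
#include <cstdint>

// Hypothetical classification of a two-byte candidate sequence.
enum class SjisPairKind { NotALeadByte, IllegalTrailByte, WellFormed };

// Structural check only. Assumed ranges (code page 932 style):
//   lead:  0x81-0x9F, 0xE0-0xFC
//   trail: 0x40-0x7E, 0x80-0xFC
SjisPairKind ClassifySjisPair(std::uint8_t lead, std::uint8_t trail) {
  const bool leadOk =
      (lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC);
  if (!leadOk) {
    return SjisPairKind::NotALeadByte;
  }
  const bool trailOk =
      (trail >= 0x40 && trail <= 0x7E) || (trail >= 0x80 && trail <= 0xFC);
  return trailOk ? SjisPairKind::WellFormed : SjisPairKind::IllegalTrailByte;
}
```

Under these assumptions, (0x81, 0x20) and (0x81, 0x36) come back as IllegalTrailByte, while (0x81, 0x45) is WellFormed.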
Sorry, I cannot tell which character you refer to.
> I believe Mozilla's behavior is better than that of both IE 
> and Netscape 4.x. Why replace an illegal byte sequence with 0x8145?

Windows applications that use the Windows OS converters
map this codepoint to the middle dot character. I am sorry,
but this is expected on Windows. The character is apparently
fairly widely used -- right or wrong. If you use Notepad, 
Word, or other Windows applications, you see the same
character, not the "not found" character that Mozilla shows.

How are we going to convince Windows users that what they
see in every other application is wrong? 

Let me re-open this for re-consideration and let me provide
additional facts.

ftang: If you want to see which character we are referring to,
just open the URL with Mozilla and compare it with NN4 or IE5/6.
You will see one character with a question mark with Mozilla
but expressed with a middle-dot character in other browsers
and applications.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Kat, I am not convinced yet. Is this kind of practice common? Did the user 
do this intentionally? I mean, when they put 0x81, is what they want a middle dot?
If MS just took 0x81 and mapped it to the middle dot, that would be easy to understand 
as a "feature". But mapping a range of code points to one character does not 
make much sense. Can you tell me how such a page is created? 
momoi, please attach a screenshot here (and circle the character). I cannot see
that ? mark. 
In comment 5, shanjian said:

> Is this kind of practice common? Did user do this intentionally? 
...
>Can you tell me how such page is created? 

Yes, this is the question we should be asking before we
decide on this bug. Let me dig around a bit more before
making a decision one way or the other. I suspect this is
an intentional character.

Let's try to fix the SJIS to Unicode conversion to map 0x8120 to U+30FB so we have
backward compatibility?
Reassigning back to ftang and marking it as M1.0.
Assignee: shanjian → ftang
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
As I mentioned in my previous comment, at least 0x8120 and 0x8136 are mapped to 
U+30FB. I believe all byte sequences from 0x8120 to 0x813f are mapped to U+30FB, and
probably an even larger range. Adding such a nonsense conversion just for this page does
not make any sense, unless momoi's investigation shows that this is a common practice and
many web pages are doing it. In our charset detector, 0x8120 to 0x813f are illegal byte
sequences. That may confuse some users when they switch the detector on and off.
nsbeta1+ per i18n triage
Keywords: nsbeta1 → nsbeta1+
let's fix this.
Status: NEW → ASSIGNED
p3
Priority: -- → P3
move to m0.9.9
Target Milestone: mozilla1.0 → mozilla0.9.9
Let's merge this bug into 116882. Basically, we want to be compatible with IE6 in
error handling to reduce the risk of site compatibility problems.

What I found by looking at IE6 is the following:
a. IE6 treats 0xfd - 0xff as single bytes and converts them to U+F8F1 - U+F8F3. We
currently treat them as two-byte characters and convert them to U+FFFD.
b. If a lead byte is in the legal Shift_JIS range but the second byte is in an illegal
range, IE6 treats it as a two-byte character and converts it to U+30FB. We currently
treat it as a single-byte character and convert it to U+FFFD.
c. For a valid Shift_JIS sequence, if a character has no definition, IE6 maps it to
U+30FB but we map it to U+FFFD.

We need to fix all three of the above so we have IE6 parity in error handling.
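
The three cases above could be sketched roughly as follows. This is not the checked-in patch, only an illustration under stated assumptions: `DecodeSjisIE6Style` and `LookupSjisPair` are hypothetical names, the lookup is a three-entry stand-in for a real Shift_JIS table, and the handling of bytes not covered by cases a-c (0x80, 0xA0, or a lead byte at the very end of input) as U+FFFD is an assumption rather than observed IE6 behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a real Shift_JIS table: only three entries.
// Returns 0 when the pair has no definition.
static char16_t LookupSjisPair(std::uint8_t lead, std::uint8_t trail) {
  const unsigned code = (static_cast<unsigned>(lead) << 8) | trail;
  switch (code) {
    case 0x8140: return 0x3000;  // IDEOGRAPHIC SPACE
    case 0x8145: return 0x30FB;  // KATAKANA MIDDLE DOT
    case 0x82A0: return 0x3042;  // HIRAGANA LETTER A
    default:     return 0;       // no definition in this toy table
  }
}

static bool IsLead(std::uint8_t b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static bool IsTrail(std::uint8_t b) {
  return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

std::u16string DecodeSjisIE6Style(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t b = static_cast<std::uint8_t>(in[i]);
    if (b < 0x80) {
      // ASCII range passes through unchanged.
      out.push_back(static_cast<char16_t>(b));
      i += 1;
    } else if (b >= 0xA1 && b <= 0xDF) {
      // Half-width katakana: 0xA1..0xDF -> U+FF61..U+FF9F.
      out.push_back(static_cast<char16_t>(0xFF61 + (b - 0xA1)));
      i += 1;
    } else if (b >= 0xFD) {
      // Case a: 0xFD..0xFF are single bytes -> U+F8F1..U+F8F3.
      out.push_back(static_cast<char16_t>(0xF8F1 + (b - 0xFD)));
      i += 1;
    } else if (IsLead(b) && i + 1 < in.size()) {
      const std::uint8_t t = static_cast<std::uint8_t>(in[i + 1]);
      // Case b: illegal trail byte -> consume both bytes, emit U+30FB.
      // Case c: legal pair with no definition -> emit U+30FB.
      const char16_t u = IsTrail(t) ? LookupSjisPair(b, t) : 0;
      out.push_back(u != 0 ? u : static_cast<char16_t>(0x30FB));
      i += 2;
    } else {
      // Assumption: anything else (0x80, 0xA0, truncated lead) -> U+FFFD.
      out.push_back(static_cast<char16_t>(0xFFFD));
      i += 1;
    }
  }
  return out;
}
```

With this sketch, the byte pair 0x81 0x20 from the reported page decodes to a single U+30FB, and the 0x20 is consumed as the second byte, which matches the two-byte treatment described in case b.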

Also, I wrote a CGI that generates legal Shift_JIS according to the Nadin book, and also
invalid Shift_JIS. I posted it at http://warp/u/ftang/utf8test/sjis.cgi
I will try to push it out to
http://people.netscape.com/ftang/testscript/sjis/sjis.cgi

*** Bug 116880 has been marked as a duplicate of this bug. ***
Attached patch: patch v1 (obsolete)
add nhotta and shanjian to the list.
Comment on attachment 70437
patch v1

r=shanjian,
(I suggest removing the break in original line 147.)
Attachment #70437 - Flags: review+
+                   // IE convert fc-ff as single byte and convert to
+                   // U+f8f1 to U+f8f3
+                   if((0xfd == *src) || (0xfe == *src) || (0xff == *src))
+                   {
+                     *dest++ = (PRUnichar) 0xf8f1 + 
+                                   (*src - (unsigned char)(0xfd));

Does this mean, mapping like this? 
0xfd -> 0xf8f1
0xfe -> 0xf8f2
0xff -> 0xf8f3
But the comment says fc-ff (includes fc).

So is the IE6 behavior of mapping to 0x30fb (case c) specific to Shift_JIS, or is there
similar behavior for EUC-JP?
>Does this mean, mapping like this? 
>0xfd -> 0xf8f1
>0xfe -> 0xf8f2
>0xff -> 0xf8f3
>But the comment says fc-ff (includes fc).
Good catch, it is fd-ff, not fc. Sorry. I will change the comment.

>So is the IE6 behavior to map 0x30fb (the case c) specific to Shift_JIS or the
>similar behavior for EUC-JP?
Not sure; I need to develop more tests. Let's fix these one by one.
Opened bug 127275 for the EUC-JP issue.
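For reference, the corrected fd-ff mapping discussed above amounts to a fixed offset from 0xfd. A minimal sketch follows; `MapHighSingleByte` is a hypothetical helper name, not a function from the patch, and it is only meaningful for inputs in 0xFD-0xFF:

```cpp
#include <cstdint>

// Hypothetical helper: maps the single bytes 0xFD..0xFF to U+F8F1..U+F8F3.
// The caller is assumed to have checked that b is in 0xFD..0xFF.
constexpr char16_t MapHighSingleByte(std::uint8_t b) {
  return static_cast<char16_t>(0xF8F1 + (b - 0xFD));
}

// Compile-time confirmation of the table in the review comment.
static_assert(MapHighSingleByte(0xFD) == 0xF8F1, "0xfd -> U+F8F1");
static_assert(MapHighSingleByte(0xFE) == 0xF8F2, "0xfe -> U+F8F2");
static_assert(MapHighSingleByte(0xFF) == 0xF8F3, "0xff -> U+F8F3");
```

0xfc is deliberately outside this mapping; under the commonly cited code page 932 ranges it is still a lead byte.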
Attachment #70437 - Attachment is obsolete: true
Attached patch: patch v2
nhotta or shanjian, please r= 
Comment on attachment 70941
patch v2 

r=nhotta
Attachment #70941 - Flags: review+
Blocks: 104148
Comment on attachment 70941
patch v2 

sr=kin@netscape.com
Attachment #70941 - Flags: superreview+
Blocks: 104060
No longer blocks: 104148
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
No longer blocks: 104060
Status: RESOLVED → VERIFIED
Verified as fixed in 0329 Win32 trunk and 0402 0.9.9ec Win32 build.