Bug 116882
Opened 23 years ago
Closed 23 years ago
A middle dot character is not displayed on this page
Categories
(Core :: Internationalization, defect, P3)
VERIFIED
FIXED
mozilla0.9.9
People
(Reporter: momoi, Assigned: ftang)
Details
(Keywords: intl)
Attachments
(2 files, 1 obsolete file)
111.67 KB, image/jpeg
4.07 KB, patch (nhottanscp: review+, kinmoz: superreview+, roc: approval+)
** Observed with 2001-12-22 Win32 trunk build **
On the above page, there is one character which is not displayed
properly with Mozilla under Shift_JIS encoding.
It looks like the character has the codepoint 0x81.
(There is a similar bug filed -- Bug 116880. But in that bug the codepoint for
the problem character is 0x86 0xA6.)
Neither NN4 nor IE 5.5 has a problem displaying this character.
Comment 2•23 years ago
The character in question is 0x81, which is followed by 0x20. 0x8120 is not a
legal Shift_JIS byte sequence. It is very strange that both IE and Netscape 4.x
replace such a sequence with 0x8145, which is the middle dot. In any case, I
don't think this is a Mozilla problem. I believe Mozilla's behavior is better
than that of both IE and Netscape 4.x. Why replace an illegal byte sequence
with 0x8145? (I tried another byte sequence, 0x8136, which was also replaced by
0x8145.)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
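The legality claim in comment 2 can be checked mechanically. The following is a minimal sketch, not code from the Mozilla tree: it assumes the commonly documented Shift_JIS byte ranges (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC), and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Lead-byte ranges for two-byte Shift_JIS characters. 0xF0-0xFC is the
// vendor/user-defined extension area (an assumption of this sketch).
constexpr bool isSjisLeadByte(std::uint8_t b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Trail-byte ranges: 0x40-0x7E and 0x80-0xFC (0x7F is excluded).
constexpr bool isSjisTrailByte(std::uint8_t b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Hypothetical helper: true only if (lead, trail) is a well-formed pair.
constexpr bool isWellFormedSjisPair(std::uint8_t lead, std::uint8_t trail) {
    return isSjisLeadByte(lead) && isSjisTrailByte(trail);
}

// Checks the three byte pairs mentioned in comment 2.
inline void checkPairsFromComment2() {
    assert(!isWellFormedSjisPair(0x81, 0x20));  // trail byte below 0x40
    assert(!isWellFormedSjisPair(0x81, 0x36));  // trail byte below 0x40
    assert(isWellFormedSjisPair(0x81, 0x45));   // the middle dot
}
```

Both 0x20 and 0x36 fall below the lowest trail byte, 0x40, which is why a strict decoder rejects 0x8120 and 0x8136 while accepting 0x8145.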
Assignee
Comment 3•23 years ago
Sorry, I cannot tell which character you refer to.
Reporter
Comment 4•23 years ago
> I believe Mozilla's behavior is better than that of both IE
> and Netscape 4.x. Why replace an illegal byte sequence with 0x8145?
Windows applications that use the Windows OS converters
map this codepoint to the middle dot character. I am sorry,
but this is expected on Windows. The character is apparently
fairly widely used, right or wrong. If you use Notepad,
Word, and other Windows applications, you see the same
character, not the "not found" character that we show in Mozilla.
How are we going to convince Windows users that what they
see in every other application is wrong?
Let me re-open this for re-consideration and let me provide
additional facts.
ftang: If you want to see which character we are referring to,
just open the URL with Mozilla and compare it with NN4 or IE5/6.
You will see one character with a question mark with Mozilla
but expressed with a middle-dot character in other browsers
and applications.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 5•23 years ago
Kat, I am not convinced yet. Is this kind of practice common? Did the user
do this intentionally? I mean, when they put in 0x81, is the middle dot what
they want? If MS just took 0x81 and mapped it to the middle dot, that would be
easy to understand as a "feature". But mapping a whole range of code points to
one character does not make much sense. Can you tell me how such a page is
created?
Assignee
Comment 6•23 years ago
momoi, please attach a screenshot here (and circle the character). I cannot
see that ? mark.
Reporter
Comment 7•23 years ago
In comment 5, shanjian said:
> Is this kind of practice common? Did the user do this intentionally?
...
> Can you tell me how such a page is created?
Yes, this is the question we should be asking before we
decide on this bug. Let me dig around a bit more before
making a decision one way or the other. I suspect this is
an intentional character.
Reporter
Comment 8•23 years ago
Assignee
Comment 9•23 years ago
Let's try to fix the SJIS-to-Unicode conversion to map 0x8120 to U+30FB so that
we have backward compatibility?
Reassigning back to ftang and marking it as M1.0.
Assignee: shanjian → ftang
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
Comment 10•23 years ago
As I mentioned in my previous comment, at least 0x8120 and 0x8136 are mapped to
U+30FB. I believe all byte sequences from 0x8120 to 0x813F are mapped to U+30FB,
and probably an even larger range. Adding such a nonsensical conversion just for
this page does not make any sense, unless momoi's investigation shows that this
is a common practice and many web pages do it. In our charset detector, 0x8120
to 0x813F are illegal byte sequences. That may confuse some users when they
switch the detector on and off.
Assignee
Comment 15•23 years ago
Let's merge this bug into 116882. Basically, we want to be compatible with IE6
in error handling to reduce the risk of site compatibility problems.
What I found by looking at IE6 is the following:
a. IE6 treats 0xFD - 0xFF as single bytes and converts them to U+F8F1 - U+F8F3.
We currently treat them as two-byte characters and convert them to U+FFFD.
b. If the lead byte is in the legal Shift_JIS range but the second byte is in an
illegal range, IE6 treats the pair as a two-byte character and converts it to
U+30FB. We currently treat it as a single-byte character and convert it to
U+FFFD.
c. For a valid Shift_JIS sequence, if the character has no definition, IE6 maps
it to U+30FB but we map it to U+FFFD.
We need to fix all three of the above so that we have IE6 parity in error
handling.
Also, I wrote a CGI which generates legal Shift_JIS according to the Nadin book,
as well as invalid Shift_JIS. I posted it at http://warp/u/ftang/utf8test/sjis.cgi
I will try to push it out to
http://people.netscape.com/ftang/testscript/sjis/sjis.cgi
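The three IE6 behaviors (a), (b), and (c) listed in comment 15 can be sketched as a small decoder loop. This is an illustration under stated assumptions, not the attached patch: the `sketch` namespace, `decode`, and `tinyTable` are hypothetical names, the two-entry table merely stands in for the real Shift_JIS mapping table, and the handling of 0x80, 0xA0, and a truncated final lead byte is an assumption that this bug does not discuss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

namespace sketch {

constexpr char16_t kMiddleDot   = 0x30FB;  // KATAKANA MIDDLE DOT
constexpr char16_t kReplacement = 0xFFFD;  // REPLACEMENT CHARACTER

inline bool isLead(std::uint8_t b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}
inline bool isTrail(std::uint8_t b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Hypothetical stand-in for the real mapping table: only two entries.
inline const std::map<std::uint16_t, char16_t>& tinyTable() {
    static const std::map<std::uint16_t, char16_t> table = {
        {0x8145, 0x30FB},  // middle dot
        {0x82A0, 0x3042},  // HIRAGANA LETTER A
    };
    return table;
}

inline std::u16string decode(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const std::uint8_t b = static_cast<std::uint8_t>(in[i]);
        if (b < 0x80) {                       // ASCII range, passed through
            out.push_back(static_cast<char16_t>(b));
            i += 1;
        } else if (b >= 0xA1 && b <= 0xDF) {  // half-width katakana
            out.push_back(static_cast<char16_t>(0xFF61 + (b - 0xA1)));
            i += 1;
        } else if (b >= 0xFD) {               // case (a): 0xFD-0xFF, one byte each
            out.push_back(static_cast<char16_t>(0xF8F1 + (b - 0xFD)));
            i += 1;
        } else if (!isLead(b)) {              // 0x80 and 0xA0: assumption
            out.push_back(kReplacement);
            i += 1;
        } else if (i + 1 >= in.size()) {      // truncated pair: assumption
            out.push_back(kReplacement);
            i += 1;
        } else {
            const std::uint8_t t = static_cast<std::uint8_t>(in[i + 1]);
            char16_t ch = kMiddleDot;         // cases (b) and (c) both give U+30FB
            if (isTrail(t)) {
                const auto key = static_cast<std::uint16_t>((b << 8) | t);
                const auto it = tinyTable().find(key);
                if (it != tinyTable().end()) ch = it->second;
            }
            out.push_back(ch);
            i += 2;                           // case (b): the bad trail byte is consumed too
        }
    }
    return out;
}

}  // namespace sketch
```

With this sketch, `sketch::decode("\x81\x20")` yields the single character U+30FB, which matches what IE6 is described as doing in (b). The key design point is that an illegal trail byte is consumed together with its lead byte rather than being decoded again on its own.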
Assignee
Comment 16•23 years ago
*** Bug 116880 has been marked as a duplicate of this bug. ***
Assignee
Comment 17•23 years ago
Assignee
Comment 18•23 years ago
add nhotta and shanjian to the list.
Comment 19•23 years ago
Comment on attachment 70437
patch v1
r=shanjian
(I suggest removing the break at original line 147.)
Attachment #70437 - Flags: review+
Comment 20•23 years ago
+ // IE convert fc-ff as single byte and convert to
+ // U+f8f1 to U+f8f3
+ if((0xfd == *src) || (0xfe == *src) || (0xff == *src))
+ {
+ *dest++ = (PRUnichar) 0xf8f1 +
+ (*src - (unsigned char)(0xfd));
Does this mean, mapping like this?
0xfd -> 0xf8f1
0xfe -> 0xf8f2
0xff -> 0xf8f3
But the comment says fc-ff (includes fc).
So is the IE6 behavior of mapping to U+30FB (case c) specific to Shift_JIS, or
is there similar behavior for EUC-JP?
Assignee
Comment 21•23 years ago
>Does this mean, mapping like this?
>0xfd -> 0xf8f1
>0xfe -> 0xf8f2
>0xff -> 0xf8f3
>But the comment says fc-ff (includes fc).
Good catch, it is fd-ff, not fc. Sorry, I will change the comment.
>So is the IE6 behavior of mapping to U+30FB (case c) specific to Shift_JIS, or
>is there similar behavior for EUC-JP?
Not sure; I need to develop more tests. Let's fix these one by one.
Opened bug 127275 for the EUC-JP issue.
Assignee
Updated•23 years ago
Attachment #70437 - Attachment is obsolete: true
Assignee
Comment 22•23 years ago
Assignee
Comment 23•23 years ago
nhotta or shanjian, please r=
Comment 24•23 years ago
Comment on attachment 70941
patch v2
r=nhotta
Attachment #70941 - Flags: review+
Comment 25•23 years ago
Attachment #70941 - Flags: superreview+
Assignee
Updated•23 years ago
Comment on attachment 70941
patch v2
a=roc+moz for 0.9.9
Attachment #70941 - Flags: approval+
Keywords: mozilla0.9.9+
Assignee
Comment 27•23 years ago
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Updated•23 years ago
Status: RESOLVED → VERIFIED
Comment 28•23 years ago
Verified as fixed in 0329 Win32 trunk and 0402 0.9.9ec Win32 build.
Comment 29•16 years ago
Flags: in-testsuite+