Closed
Bug 245770
Opened 21 years ago
Closed 21 years ago
backslash rendered as yen in japanese locale
Categories
(Core :: Internationalization, defect)
VERIFIED
FIXED
People
(Reporter: glandium, Assigned: jshin1987)
References
Details
(Keywords: fixed-aviary1.0, fixed1.7.5)
Attachments
(3 files, 3 obsolete files)
348 bytes, text/html
8.10 KB, patch | smontagu: review+, dbaron: superreview+, mkaply: approval1.7.5+
5.29 KB, patch | smontagu: review+, dbaron: superreview+, asa: approval-aviary+, asa: approval1.7.5+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.6) Gecko/20040602 Firefox/0.8
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.6) Gecko/20040602 Firefox/0.8
In the attached page, both the backslash and yen characters are rendered as the yen character if the locale is Japanese, even though the page is definitely encoded in UTF-8, which provides two different codes for these characters. The rendering is correct in the English locale.
Reproducible: Always
Steps to Reproduce:
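Editorial illustration (not part of the original report or its attached testcase): a minimal Python sketch of why a UTF-8 page cannot confuse the two characters. The variable names are chosen here for illustration only.

```python
# U+005C REVERSE SOLIDUS and U+00A5 YEN SIGN have different UTF-8 byte
# sequences, so a UTF-8 document always keeps them apart.
backslash = "\u005c"
yen = "\u00a5"

backslash_bytes = backslash.encode("utf-8")
yen_bytes = yen.encode("utf-8")

print(backslash_bytes.hex())  # 5c
print(yen_bytes.hex())        # c2a5
```

Any yen sign drawn for the first character therefore comes from the rendering path, not from the document's bytes.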
Reporter
Comment 1•21 years ago
Updated•21 years ago
Assignee: firefox → smontagu
Component: General → Internationalization
Product: Firefox → Browser
QA Contact: firefox.general → amyy
Version: unspecified → Trunk
Assignee
Comment 2•21 years ago
Well, this is the notorious Yen vs. reverse solidus (backslash) issue. It's well known and was done on purpose somewhere in the 'layout' code. The same was the case for the Korean locale (with the WON sign), but I persuaded ftang to get rid of that years ago (see http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsTextTransformer.cpp#816 and bug 88050). I'd like to remove it for Japanese, too, but ftang wanted to keep it. The problem is that we really don't know what 0x5c in Shift_JIS and EUC-JP represents. An alternative to what we're doing is to replace 'backslash' with 'Yen' only when the locale is Japanese and the document charset is one of the legacy Japanese character encodings, so that UTF-* would preserve the distinction between backslash and Yen. I have yet to check whether this is feasible. For better tracking, assigning to myself.
Assignee: smontagu → jshin
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Comment 3•21 years ago
According to bug 4238 comment 29, the original code in nsTextTransformer was designed to apply only to legacy Japanese charsets, but I don't see from a quick look at the patches there how this was supposed to happen.
Assignee
Comment 4•21 years ago
It seems that nsPresContext::UpdateCharSet() in attachment 15040 was written to apply the JA-specific 'transformer' only to documents in Japanese legacy encodings [1]. I need to take a closer look at why it's also applied to UTF-8 documents under the Japanese locale.
[1] http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsPresContext.cpp#719
Reporter
Comment 5•21 years ago
Then, there's probably an issue with EUC-JP as well... because in EUC-JP, 0x5C is not *necessarily* the yen symbol... See http://sources.redhat.com/ml/libc-alpha/2000-10/msg00190.html for details. (for instance, converting the attached testcase back and forth to euc-jp through iconv gives two backslashes, converting back and forth to shift-jis gives two yen symbols)
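Editorial illustration of the ambiguity discussed here: a minimal Python sketch in which the same byte, 0x5C, decodes to different characters depending on whether the 7-bit range is read as ASCII or as the JIS X 0201 Roman set. The helper `decode_single_byte` and the override table are hypothetical names for this sketch; it does not reproduce what iconv or Mozilla actually do.

```python
# The two code points where JIS X 0201 Roman differs from ASCII.
JIS_X_0201_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}


def decode_single_byte(byte, jis_roman):
    """Decode one 7-bit byte as ASCII or as JIS X 0201 Roman (hypothetical helper)."""
    if not 0 <= byte < 0x80:
        raise ValueError("expected a 7-bit value")
    if jis_roman and byte in JIS_X_0201_ROMAN_OVERRIDES:
        return JIS_X_0201_ROMAN_OVERRIDES[byte]
    return chr(byte)


as_ascii = decode_single_byte(0x5C, jis_roman=False)
as_jis_roman = decode_single_byte(0x5C, jis_roman=True)
print(f"U+{ord(as_ascii):04X}", f"U+{ord(as_jis_roman):04X}")  # U+005C U+00A5
```

Which of the two readings a converter picks for a given legacy encoding is exactly the choice that differs between implementations.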
Assignee
Comment 6•21 years ago
(In reply to comment #5)
> Then, there's probably an issue with EUC-JP as well... because in EUC-JP, 0x5C
> is not *necessarily* the yen symbol...
Sure, I'm very well aware of the problem. I'd rather remove the 'transformation' altogether, as I wrote in bug 88050.
Comment 7•21 years ago
(In reply to comment #4)
> I need to take a closer look as to why it's also applied to UTF-8 documents
> under Japanese locale.
I think the problem is here: http://lxr.mozilla.org/seamonkey/source/intl/locale/src/nsLanguageAtomService.cpp#249 which comes from bug 39570.
Comment 8•21 years ago
We discussed this problem on Bugzilla-jp: http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=3595 Opinions on it are divided among Japanese people as well:
1. Mozilla should never replace 0x5C with U+5A.
2. Mozilla should replace 0x5C with U+5A only in documents encoded in Shift_JIS, EUC-JP, or ISO-2022-JP.
3. The user should be able to choose whether to replace it or not.
We have a question: why does Mozilla do this replacement? WinIE and Opera have no such behavior for 0x5C.
Comment 9•21 years ago
Sorry. U+5A -> U+A5.
Assignee
Comment 10•21 years ago
(In reply to comment #8)
Thanks for your input.
> 1. Mozilla should not replace 0x5C to U+5A always.
> 2. Mozilla should replace 0x5C to U+5A that documents only encoded by Shift_JIS
> and EUC-JP and IS0-2022-JP.
> 3. User should be able to choice that to replace or not.
>
> We have a question: Why Mozilla replace that.
> In WinIE and Opera, they have no behavior for 0x5C.
See bug 39570 and bug 88050. Anyway, I'm strongly in favor of option #1 because, unless Mozilla suddenly acquires 'near-human' intelligence :-), it's all but impossible to tell which character the author of a document meant by 0x5c, 'backslash' or Yen. In that case I think just leaving it as it is is better.
Comment 11•21 years ago
However, we think that this is a bug for UTF-* documents. If the OS locale is Japanese, 0x5C is always replaced with U+A5. Even if an element in a UTF-* encoded document matches :lang(en), Mozilla replaces it in that element. This behavior is wrong. The replacement of 0x5C with U+A5 should happen only in elements that match :lang(ja). This behavior also has another problem. In the HTML spec, the document character set is ISO 10646. In other words, \ and ¥ (or ¥) are different characters. In source code (e.g., Perl), replacing 0x5C with U+A5 is troublesome, because the user then cannot display a backslash. But Japanese fonts usually have the yen sign glyph at U+5C. I think the best choice of behavior is #3 of comment 8.
Assignee
Comment 12•21 years ago
(In reply to comment #11)
> The replacement of 0x5C to U+A5 should exist in the element that has :lang(ja).
I don't think that's the case. There is NO Japanese Unicode. Neither is there a non-Japanese Unicode. There's only one Unicode, and U+005C is 'Reverse Solidus', period. Note that what you wrote above is different from both option 3 and option 2.
> But, Japanese fonts usually have the yen sign glyph at U+5C.
Having a YEN sign glyph for U+005C is clearly a bug in those fonts. Microsoft should fix this bug in their fonts. Anyway, that's beside the point here.
> I think that the best choice of the behavior is #3 of comment 8.
If we do that, what should be the default?
Comment 13•21 years ago
> I don't think that's the case. There is NOT Japanese Unicode.
> Neither is there non-Japanese Unicode.
Yes. What I was going to say is this: if Mozilla replaces 0x5C with U+A5 in UTF-* documents, 0x5C should be replaced only in elements that match :lang(ja). Mozilla should not replace it in other elements. I think this behavior is a quirk for Japanese-language environments. In other words, it should not apply to documents written in other languages, even when they are displayed on a system whose locale is Japanese.
> Microsoft should fix their bug in their fonts.
That is not realistic. It would work for Unicode applications, but applications using the native encoding (CP932) would then be unable to display the yen sign.
> If we do that, what should be the default?
I think the default should be NOT to replace. momoi-san expressed the same opinion at http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=3595#c69 . This is because the most widely used UA is WinIE, and WinIE doesn't have this behavior.
Assignee
Comment 14•21 years ago
(In reply to comment #13)
> > I don't think that's the case. There is NOT Japanese Unicode.
> > Neither is there non-Japanese Unicode.
> Yes.
> I was going to say.
> If Mozilla replace 0x5C to U+A5 on UTF-*, 0x5C should be replaced only in the
> element that has :lang(ja).
> Mozilla should not replace it in the other element.
....
> Though that is displaied on the system that locale is Japan.
It sounds to me like what you wrote above is equivalent to saying that there are two versions of Unicode, Japanese and non-Japanese. Note that the reporter of this bug wants to get his backslash back even when the locale is JA (if the document is in UTF-8).
> > Microsoft should fix their bug in their fonts.
>
> It is not realistic.
> Because that is true on the Unicode applications.
> But on native code(CP932) applications, those applications cannot display yen sign.
It's realistic and possible. See my posting to the Unicode mailing list at http://www.unicode.org/mail-arch/unicode-ml/y2002-m10/0340.html (use 'unicode-ml' and 'unicode' as the username and password).
> > If we do that, what should be the default?
>
> I think that the default value is that is NOT replaced.
> And momoi-san said the same opinion on
> http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=3595#c69 .
> Becase the most famous UA is WinIE, and WinIE doesn't have this behavior.
Ok. That's easy enough. I'll do that over the weekend.
Status: NEW → ASSIGNED
Comment 15•21 years ago
momoi-san: If you have an opinion, please comment here.
> It sounds to me that what you wrote above is equivalent to saying that there
> are two versions of Unicode, Japanese and non-Japanese.
I said so about the rendered character, but not about the encoding.
# Sorry, I cannot speak English well.
> See my posting to the Unicode mailing list
# Sorry, I have not read it yet.
In Japan, many people recognize 0x5C as the yen sign. If this is fixed in the fonts, many Japanese people will find it strange. E.g., for many Japanese people the Windows path separator is the yen sign, not the backslash.
Assignee
Comment 16•21 years ago
I added layout.enable_japanese_specific_transform pref. entry. It's false by default.
Assignee
Comment 17•21 years ago
Comment on attachment 150722: patch
Asking for r/sr. I also got rid of 'eLanguageSpecificTransformType_Korean' because it's not used anywhere.
Attachment #150722 -
Flags: superreview?(dbaron)
Attachment #150722 -
Flags: review?(smontagu)
Comment 18•21 years ago
Comment on attachment 150722: patch
If the pref is set to true, will backslash still be replaced by Yen on Japanese locale even in UTF-8 documents or when it is specified by \?
Comment 19•21 years ago
We also recognize the problem. We don't hope that 0x5C is replaced to U+A5 in UTF-* document. However, there may also be those who desire the replacement. (the people's environment can be the reason.) So, we need this behavior in Japanese document.
Assignee
Comment 20•21 years ago
smontagu's comment prompted me to make the pref (layout.enable_japanese_specific_transform) only effective when the character encoding is one of Japanese legacy encodings (EUC-JP, Shift_JIS, ISO-2022-JP).
Attachment #150722 -
Attachment is obsolete: true
Assignee
Updated•21 years ago
Attachment #150722 -
Flags: superreview?(dbaron)
Attachment #150722 -
Flags: review?(smontagu)
Assignee
Comment 21•21 years ago
I've just realized that this is likely to result in a regression. A better patch is coming up soon.
Comment 22•21 years ago
It bothers me that this is still not identical to any of the options described in comment 8. Doesn't option 3 mean that when the pref is set, replacement will take place in all documents whatever the encoding?
Assignee
Comment 23•21 years ago
Instead of changing the behavior of 'UpdateCharset', I changed the condition for activating the Japanese-specific transform. It's now activated only when all of the following three conditions are satisfied:
1. mLangGroup is ja
2. the pref. entry is true
3. the charset is not one of the Unicode encodings
Actually, the check for the 3rd condition is not robust enough, because the raw charset name of a Unicode encoding does not always begin with 'UTF-'. I could invoke the charset alias resolution routine, but that seems expensive for little gain. Simon, what do you think?
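Editorial illustration of the three conditions listed in this comment: a minimal Python sketch, not the patch itself. The function name and parameters are hypothetical; the real check lives in Mozilla's C++ layout code.

```python
def japanese_transform_active(lang_group, pref_enabled, charset):
    """Return True only when all three conditions from the comment hold."""
    # Condition 3 as described: a simple, case-insensitive "UTF-" prefix test.
    is_unicode = charset.upper().startswith("UTF-")
    return lang_group == "ja" and pref_enabled and not is_unicode


print(japanese_transform_active("ja", True, "Shift_JIS"))  # True
print(japanese_transform_active("ja", True, "UTF-8"))      # False
print(japanese_transform_active("ja", False, "EUC-JP"))    # False
print(japanese_transform_active("ko", True, "EUC-KR"))     # False
```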
Attachment #150726 -
Attachment is obsolete: true
Comment 24•21 years ago
Simon: When we discussed this, our conclusion was that the replacement should not occur in documents in non-Japanese encodings. The reason is that when we view a document written in another language, we don't want the replacement.
Comment 25•21 years ago
Instead of that, why don't you remove the code in LookupCharset that starts
  if (langGroup == mUnicode) {
    langGroup = GetLocaleLanguageGroup(&res);
and do that in nsPresContext::UpdateCharSet() after setting the transform type?
Why do we want a pref? What's the right thing to do?
Reporter
Comment 27•21 years ago
Note that the condition 1 in comment 23 might not work always, because of bug 234485. (my guess)
Assignee
Comment 28•21 years ago
(In reply to comment #26)
> Why do we want a pref? What's the right thing to do?
There's no clear-cut answer because what '0x5c' means in Shift_JIS/EUC-JP/ISO-2022-JP is always ambiguous. I'd rather remove the replacement altogether, but some Japanese users want to keep that behavior for documents in one of the legacy Japanese encodings (but not for documents in Unicode; there's no ambiguity at all about the identity of U+005C), and the consensus among Japanese Mozilla users is that we need a pref which is off by default. Nakano-san answered Simon's questions (actually, I did too, unless my memory is failing me, but it seems my answer got thrown away).
re comment #25: I've just made that change. I'll upload it later today or tomorrow after testing it.
re comment #27: Even with that fixed, it wouldn't work. The behavior depends only on whether the current document encoding is Japanese and on the value of the pref. That is, xml:lang and lang don't play any role here. That shouldn't matter much, because I don't think there are many documents in the wild with 'lang=xx or xml:lang=xx' (where xx is not ja/ja_JP) that are encoded in Shift_JIS/EUC-JP/ISO-2022-JP. For Unicode-encoded documents, we want to leave U+005C alone no matter what, so we don't have to worry about them.
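Editorial illustration of the behavior described in this comment (the result depends only on the document encoding and the pref): a minimal Python sketch under that assumption. `render_text` and the charset set are hypothetical names; the actual transform is applied inside Mozilla's text rendering code, not as a string replacement like this.

```python
# The three legacy Japanese encodings named in the thread, lower-cased.
LEGACY_JAPANESE_CHARSETS = {"shift_jis", "euc-jp", "iso-2022-jp"}


def render_text(text, charset, pref_enabled=False):
    """Swap backslash for yen only for legacy Japanese encodings with the pref on."""
    if pref_enabled and charset.lower() in LEGACY_JAPANESE_CHARSETS:
        return text.replace("\u005c", "\u00a5")
    return text


path = "C:\\temp"
print(render_text(path, "Shift_JIS"))                     # pref off: unchanged
print(render_text(path, "Shift_JIS", pref_enabled=True))  # backslash shown as yen
print(render_text(path, "UTF-8", pref_enabled=True))      # Unicode: unchanged
```

Note that lang and xml:lang deliberately play no part in the sketch, matching the reply to comment #27.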
Assignee
Comment 29•21 years ago
changed per Simon's comment
Attachment #150730 -
Attachment is obsolete: true
Comment 30•21 years ago
Comment on attachment 150786: a new patch
r=smontagu. We will want to release-note this change in behaviour.
Attachment #150786 -
Flags: review+
Assignee
Comment 31•21 years ago
Comment on attachment 150786: a new patch
Thanks for the r. Asking for sr.
Attachment #150786 -
Flags: superreview?(dbaron)
Comment on attachment 150786: a new patch
I'm not convinced about the need for the pref, but sr=dbaron.
Attachment #150786 -
Flags: superreview?(dbaron) → superreview+
Comment 33•21 years ago
Please check in.
Assignee
Comment 34•21 years ago
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Assignee
Comment 35•21 years ago
Comment on attachment 150786: a new patch
Asking for a1.7.1 (considering that the 1.7.* branch will be long-lived, I think we have to make it consistent with 1.8 and later). I'll ask for aviary 1.0 approval separately.
risk: very low
affected users: anyone using Mozilla under the Japanese locale and anyone viewing documents in legacy Japanese encodings.
effect: turns off by default (a pref. was added to turn it on) the replacement of '0x5c' (backslash) with the Yen sign in documents in legacy Japanese encodings. In documents in any other encoding, U+005C is preserved regardless of the pref. value.
Attachment #150786 -
Flags: approval1.7.1?
Comment 36•21 years ago
What are all the other changes in this patch related to Korean?
Assignee
Comment 37•21 years ago
Virtually nothing. I just got rid of what should have been removed a long time ago. They haven't been used since bug 88050 was fixed.
Comment 38•21 years ago
Comment on attachment 150786: a new patch
a=mkaply for 1.7.1
Attachment #150786 -
Flags: approval1.7.1? → approval1.7.1+
Comment 39•21 years ago
I tested on 2004062109-trunk/WinXP. This patch works fine. Thank you to everyone involved in this bug.
Assignee
Comment 40•21 years ago
I'm sorry, I didn't realize that there were some changes (including deCOMization) between the 1.7 branch and the trunk in the files affected by my patch. I'll make a separate patch for the 1.7 branch and ask for r/sr/a. BTW, I found the following document about the conversion between Japanese encodings and Unicode. It also talks about the backslash vs. Yen problem. http://www.w3.org/TR/japanese-xml/
Reporter
Comment 41•20 years ago
Has the patch been applied to the aviary branch? With a checkout from 2 days ago, the bug is still there.
Comment 42•20 years ago
This bug still occurs in Firefox 1.0 PR. Jungshik Shin, when will the fix be applied to Firefox or Mozilla 1.7.x?
Assignee
Comment 43•20 years ago
I'm sorry I didn't get back to this earlier. Due to some changes in the language atom service and nsPresContext, attachment 150786 can't be applied to the 1.7/aviary 1.0 branches. This patch is rather similar to attachment 150730. I was sorta forced to take this approach.
Comment 44•20 years ago
(In reply to comment #43)
> This is rather similar to attachment 150730.
Jungshik Shin, do you mean that comment #25 from Simon can be ignored for the 1.7 branch and the aviary branch?
Assignee
Comment 45•20 years ago
(In reply to comment #44)
> (In reply to comment #43)
> > This is rather similar to attachment 150730.
> Jungshik Shin, do you mean that Comment #25 from Simon can be ignored when 1.7
> branch and Aviary branch?
As an end-user, you are not likely to see any problem. Simon's comment #25 is about avoiding the following test for Unicode encoding forms, |nsCRT::strncasecmp(aCharSet, "UTF-", 4))|, which is not as robust as we want it to be. I'd love to address it for the branch, but I couldn't come up with a clean way, so I ended up falling back to a less robust alternative.
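Editorial illustration of why the prefix test is considered less robust: a minimal Python sketch comparing a "UTF-" prefix check with a check made after alias resolution. The alias table below is a tiny hypothetical stand-in, not Mozilla's charset alias data, and both function names are invented for this sketch.

```python
# Hypothetical alias table; a real charset alias service knows far more names.
CHARSET_ALIASES = {
    "utf8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "ucs-2": "UTF-16LE",
    "iso-10646-ucs-2": "UTF-16LE",
}


def is_unicode_by_prefix(charset):
    """Mirror of the branch test: case-insensitive comparison of the first four characters."""
    return charset[:4].upper() == "UTF-"


def is_unicode_by_alias(charset):
    """Resolve the alias first, then apply the same prefix test to the canonical name."""
    canonical = CHARSET_ALIASES.get(charset.lower(), charset)
    return is_unicode_by_prefix(canonical)


for name in ("UTF-8", "utf8", "ISO-10646-UCS-2", "Shift_JIS"):
    print(name, is_unicode_by_prefix(name), is_unicode_by_alias(name))
```

The prefix test is cheap but misses raw names such as "utf8"; resolving aliases first catches them at the cost of an extra lookup, which is the trade-off weighed in comment 23.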
Assignee
Comment 46•20 years ago
Comment on attachment 160035: 1.7 branch and aviary 1.0 patch
I thought I had asked for r/sr, but apparently I hadn't. This is basically the same as what's been committed to the trunk, except that the test for a 'Unicode' encoding form is less robust (Simon's comment #25 was not addressed in this patch) because I couldn't find a clean way to do that in the branch.
Attachment #160035 -
Flags: superreview?(dbaron)
Attachment #160035 -
Flags: review?(smontagu)
Updated•20 years ago
Attachment #160035 -
Flags: review?(smontagu) → review+
Attachment #160035 -
Flags: superreview?(dbaron) → superreview+
Assignee
Comment 47•20 years ago
Comment on attachment 160035: 1.7 branch and aviary 1.0 patch
Asking for approval for the branches. The previous patch was already approved for the branch check-in, but it turned out that the branch needs a different patch.
Attachment #160035 -
Flags: approval1.7.x?
Attachment #160035 -
Flags: approval-aviary?
Comment 48•20 years ago
Comment on attachment 160035: 1.7 branch and aviary 1.0 patch
a=asa for branch checkins.
Attachment #160035 -
Flags: approval1.7.x?
Attachment #160035 -
Flags: approval1.7.x+
Attachment #160035 -
Flags: approval-aviary?
Attachment #160035 -
Flags: approval-aviary+
Comment 49•20 years ago
Jshin, has your patch been checked in to Firefox? The problem still occurs in both the "Firefox 1.0 PR release build" and the "Firefox 1.0 RC1 release build" (I tested on Win-2K). The rough changelog of Firefox 1.0RC also does not include this bug ( http://www.mozilla.org/projects/firefox/qa/changelog-rc1.html ). Since this bug's severity is neither blocker nor critical, it cannot be a blocker for Firefox 1.0. However, we Japanese will be happy if this bug is fixed in Firefox 1.0 in addition to the Mozilla trunk.
Assignee
Comment 50•20 years ago
Sorry and thanks. Somehow I landed the patch only in 1.7 branch. I've just asked for the approval for aviary-1.0 checkin (it seems like I need a new approval)
Comment 51•20 years ago
This bug still occurs in Firefox 1.0 Release Candidate 2 (release build, Win-2K). Jshin, will the fix not be applied to the final Firefox 1.0 release?
Assignee
Comment 52•20 years ago
I'm waiting for the re-approval.
Comment 53•20 years ago
Comment on attachment 160035: 1.7 branch and aviary 1.0 patch
a=asa for aviary checkin.
Assignee
Comment 54•20 years ago
checked into the av-1.0 branch
Keywords: fixed-aviary1.0, fixed1.7.x
Comment 55•20 years ago
Verified with the latest Firefox nightly trunk build (Win32, ZIP).
> Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a5) Gecko/20041104 Firefox/0.9.1+
Jshin, thanks for your effort.
Comment 57•20 years ago
Not fixed in View source window with 2004-11-07-12-0.11/Win32.
Assignee
Comment 58•20 years ago
What font is used to render view-source? If you use one of *broken* Japanese truetype fonts shipped with Japanese Windows, you can't tell because in those fonts, the glyph for U+005C (Reverse Solidus/Backslash) has the shape of 'Japanese Yen'. I tried to persuade MS engineers to fix this font issue, but failed. http://www.unicode.org/mail-arch/unicode-ml/y2002-m10/0340.html (username : unicode, password: unicode-ml)
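Editorial illustration of the point about fonts: since a glyph alone cannot tell the two characters apart with such fonts, one way to check what the text actually contains is to look at the code points. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(text):
    """List each character's code point and Unicode name, independent of any font."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# A backslash followed by a yen sign.
print(describe("\\\u00a5"))  # ['U+005C REVERSE SOLIDUS', 'U+00A5 YEN SIGN']
```

If view-source text copied into such a check reports U+005C, the character is a backslash and the yen shape comes from the font.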
Do we still need this pref? Do people actually use it? It's annoying to have special code just for this.