Closed Bug 356654 Opened 19 years ago Closed 14 years ago

Problems with backslash with some encodings when using native uconv module.

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: glandium, Assigned: smontagu)

References

()

Details

When using the native uconv module, which uses iconv for encoding conversions, JavaScript fails on some pages in certain encodings. For an example of such a failure, see http://www.jp.sonystyle.com/ The problem is that some encodings, such as Shift_JIS on the sonystyle page, don't really have a backslash character and have the yen sign at code point 0x5C instead. The same applies to some other encodings with the Japanese yen sign or the Korean won sign. When iconv converts such a character to UCS-2 for internal storage, it maps byte 0x5C to the real yen sign U+00A5. If some JavaScript then contains escaping (such as "a string with the \" character in it"), the string no longer contains a backslash and the script fails. The non-native uconv module does not have this problem because it maps byte 0x5C to U+005C, and the display code has special rules to display U+00A5 in place of U+005C (see bug #245770 for an example of a bug that appeared because of that special code). I'm wondering what the best fix would be... maybe replace U+00A5 with U+005C when the source encoding is Shift_JIS, with similar rules for all problematic conversions...
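To make the failure and the proposed workaround concrete, here is a minimal sketch in Python. It does not call iconv or any Mozilla code: the two decoder functions are hypothetical stand-ins that only handle ASCII-range bytes and differ solely in how they map byte 0x5C, and `restore_backslash` is an assumed name for the "replace U+00A5 back to U+005C" idea. It also assumes that 0x5C is the only byte that yields U+00A5 in the affected conversion.

```python
YEN_SIGN = "\u00a5"
BACKSLASH = "\\"  # U+005C


def decode_like_iconv_shift_jis(data: bytes) -> str:
    """Hypothetical stand-in for the iconv behaviour described above:
    ASCII-range bytes only, with byte 0x5C mapped to U+00A5."""
    if any(b >= 0x80 for b in data):
        raise ValueError("this sketch only handles ASCII-range bytes")
    return "".join(YEN_SIGN if b == 0x5C else chr(b) for b in data)


def decode_like_builtin_uconv(data: bytes) -> str:
    """Hypothetical stand-in for the non-native module:
    byte 0x5C stays U+005C."""
    if any(b >= 0x80 for b in data):
        raise ValueError("this sketch only handles ASCII-range bytes")
    return "".join(chr(b) for b in data)


def restore_backslash(text: str, source_charset: str) -> str:
    """Assumed workaround: undo the yen mapping, but only for Shift_JIS input."""
    if source_charset.strip().lower().replace("-", "_") == "shift_jis":
        return text.replace(YEN_SIGN, BACKSLASH)
    return text


# A script fragment with an escaped quote, as raw bytes from the page.
js_source = b'var s = "a string with the \\" character in it";'

broken = decode_like_iconv_shift_jis(js_source)
working = decode_like_builtin_uconv(js_source)
repaired = restore_backslash(broken, "Shift_JIS")
```

In `broken` the escape before the inner quote has become a yen sign, so a JavaScript parser would see the string literal end early. `repaired` is identical to `working`.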
Another solution could be to find alternatives in iconv for each of these problematic encodings: using sjis-open or sjis-win instead of Shift_JIS maps 0x5C to U+005C instead of U+00A5.
So, after a bit more investigation, it seems only Shift_JIS is affected. As for my previous suggestion about aliases, the sad thing is that Shift_JIS is not set through aliases but through special cases in nsCharsetAliasImpl, which, by the way, is used every 4KB of the same input stream, which is pretty useless. I'll file a new bug for the latter.
Forget it, it's just called a lot, not every 4KB...
It appears sjis-open and sjis-win are aliases of cp932, which is not exactly Shift_JIS but a superset of it. But I think it's safe to use it instead of Shift_JIS. I have a patch against the nsNativeUconvService.cpp file, but it applies on top of my patch for bug #331748. I'll wait for that one to be fixed first... (Let's add a dependency for that.)
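The substitution described above could look roughly like the following sketch. This is not the patch against nsNativeUconvService.cpp, which is not shown in this bug. `ICONV_OVERRIDES` and `iconv_name_for` are hypothetical names, and Python's own `cp932` codec is used only as a stand-in to show that a CP932 decoder keeps byte 0x5C as U+005C.

```python
# Hypothetical override table: the charset name the browser asks for,
# mapped to the name that would actually be handed to the converter.
ICONV_OVERRIDES = {
    "shift_jis": "CP932",
}


def iconv_name_for(charset: str) -> str:
    """Return the converter name to use for a requested charset.

    Names are normalised (case, '-' versus '_') before the lookup;
    charsets without an override are passed through unchanged.
    """
    key = charset.strip().lower().replace("-", "_")
    return ICONV_OVERRIDES.get(key, charset)


# Stand-in check with Python's cp932 codec: 0x5C decodes to a backslash.
decoded = b"\x5c".decode(iconv_name_for("Shift_JIS"))
```

Because cp932 is a superset of Shift_JIS, this accepts some byte sequences that a strict Shift_JIS decoder would reject. That is the trade-off the comment above considers safe.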
Status: NEW → ASSIGNED
Depends on: 331748
Assignee: smontagu → mh+mozilla
Status: ASSIGNED → NEW
Summary: Problems with backspace with some encodings when using native uconv module. → Problems with backslash with some encodings when using native uconv module.
Assignee: mh+mozilla → smontagu
Depends on: 644801
Native uconv is gone.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX