Closed
Bug 203264
Opened 21 years ago
Closed 17 years ago
"Hankaku kana" should be converted to zenkaku
Categories
(Other Applications :: ChatZilla, defect)
Other Applications
ChatZilla
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kazhik, Assigned: bugzilla-mozilla-20000923)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [cz-charset] [cz-0.9.79])
Attachments
(1 file, 1 obsolete file)
5.40 KB,
patch
|
Gijs
:
review+
|
Details | Diff | Splinter Review |
"Hankaku kana" in iso-2022-jp should be converted to zenkaku before sending, like many IRC clients being used in Japan and Mozilla Mail. http://lxr.mozilla.org/seamonkey/source/mailnews/base/util/nsMsgI18N.cpp#639
Comment 1•21 years ago
|
||
In RFC 2812 2.2, "No specific character set is specified. The protocol is based on a set of codes which are composed of eight (8) bits, making up an octet." Non-delimiter characters should be sent as is, in favor of original intention of sender. In other words, in IRC you should be able to send any binary data except for delimiter characters, anyway. IMHO such hankaku->zenkaku conversion should be done at message interpretation by receiver who expects Japanese strings by detecting Japanese encoding in given data, not sender side. This bug should be marked WONT or at least there should be configurable option to disable sender-side conversion. And more preferably, receiver-side conversion, which should be configurable too, may be nice enhancement.
Updated•20 years ago
|
Product: Core → Other Applications
Updated•19 years ago
|
Assignee | ||
Updated•19 years ago
|
Whiteboard: [cz-charset]
Assignee | ||
Comment 2•18 years ago
|
||
I'm happy to do this conversion on send, but only if someone can actually explain why it is done at all - it seems like a very odd thing to do. None of the existing code or associated bugs seems to help, so can someone explain what this conversion is for and why is it necessary?
Comment 3•18 years ago
|
||
See http://en.wikipedia.org/wiki/Katakana#Computer_encoding Since ISO-2022-JP does not encode the Hankaku kana (half-width katakana) forms, text input using these characters will be converted to question marks by the ISO-2022-JP encoder. In mailnews we prevent this happening by converting to zenkaku (full-width katakana) before encoding.
Updated•18 years ago
|
Assignee | ||
Comment 4•18 years ago
|
||
Ah, that makes sense. Let's see what we can do...
Assignee | ||
Comment 5•17 years ago
|
||
Turning off this conversion results in ? being sent instead, meaning that the data is lost long before the receiver gets it, so I don't propose to give any option to disable this. However, unfortunately we cannot do this like mailnews can; nsITextTransform is not scriptable, making the use of the conversion component impossible. I'm not sure I feel comfortable duplicating the code into JS, although that seems like the only option now.
Assignee | ||
Comment 6•17 years ago
|
||
Well, it's possible in JavaScript, that's for sure. I don't like having to do a string concat for each character, though, but I can't find a more efficient way to build the result string. Note that I opted to keep the old MessageManager behaviour by default and simply turn on the conversion when used in ChatZilla. People had a habit of re-using some of our useful libraries. :)
Assignee: rginda → silver
Status: NEW → ASSIGNED
Attachment #266268 -
Flags: review?(gijskruitbosch+bugs)
Comment 7•17 years ago
|
||
Comment on attachment 266268 [details] [diff] [review] Convert halfwidth hankaku to fullwith zenkaku >RCS file: /cvsroot/mozilla/extensions/irc/js/lib/message-manager.js,v >@@ -46,12 +46,17 @@ function MessageManager(entities) > this.ucConverter = > Components.classes[UC_CTRID].getService(nsIUnicodeConverter); > this.defaultBundle = null; > this.bundleList = new Array(); > // Provide a fallback so we don't break getMsg and related constants later. > this.entities = entities || {}; >+ >+ // ISO-2022-JP (often used for Japanese on IRC) doesn't contain any >+ // support for "halfwith" hankaku, so we support the option to convert it to >+ // "fullwith" zenkaku. This does not affect any other encoding at this time. >+ this.enableHankakuToZenkaku = false; > } Nit: halfwidth, fullwidth Why not put that boolean on the prototype? That would be cleaner in my opinion. <snip> >+ // If it is in hankaku - do basic mapping first. Nit: colon, comma or nothing here - not a dash. >+ dest = basicMapping[src - HANKAKU_BASE1]; >+ >+ // If is some char could be modifier, and the next char >+ // is a modifier, modify it and eat one byte. >+ Nit: This isn't valid English. I see you've copied it from the original code, but please fix this so it makes sense. Also, please remove this blank line. :-) >+ if (i < msg.length - 1) >+ { >+ if ((MOD_NIGORI == srcMod) && >+ (((src >= NIGORI_MIN1) && (src <= NIGORI_MAX1)) || >+ ((src >= NIGORI_MIN2) && (src <= NIGORI_MAX2)))) >+ { >+ dest += NIGORI_MODIFIER; >+ i++; >+ } >+ else if ((MOD_MARU == srcMod) && >+ ((src >= MARU_MIN) && (src <= MARU_MAX))) Nit: lose a set of braces here, they're all AND operators. Also line up the remaining braces. <snip> r=me with the nits fixed and the boolean on the prototype
Attachment #266268 -
Flags: review?(gijskruitbosch+bugs) → review+
Assignee | ||
Comment 8•17 years ago
|
||
Re-written all the comments in the conversion code, and also spotted and removed some trailing whitespace.
Attachment #266268 -
Attachment is obsolete: true
Attachment #267810 -
Flags: review?(gijskruitbosch+bugs)
Hmm, charsetalias.properties says there are a few aliases for iso-2022-jp (e.g. csiso2022jp). If I use /charset csiso2022jp, fromUnicode does see its |charset| param as "csiso2022jp" (not iso-2022-jp), but I _think_ the same encoder/decoder gets used. Would a follow-up patch be needed to handle that?
Assignee | ||
Comment 10•17 years ago
|
||
Yeah, probably; let's not hold up this patch on that for now. One thing that concerns me is how we could cope generally, e.g. does the converter return the canonical name or the one used, could we read chatsetalias.properties ourselves, etc.?
Comment 11•17 years ago
|
||
Comment on attachment 267810 [details] [diff] [review] Updated per nits + // We get the following character as well, since it can be a modifier. Please move that comment to the line before the if statement. Please file a followup bug about Mook's comment as necessary. Otherwise, r=gijs.
Attachment #267810 -
Flags: review?(gijskruitbosch+bugs) → review+
Comment 12•17 years ago
|
||
(In reply to comment #9) > Hmm, charsetalias.properties says there are a few aliases for iso-2022-jp (e.g. > csiso2022jp). If I use /charset csiso2022jp, fromUnicode does see its > |charset| param as "csiso2022jp" (not iso-2022-jp), but I _think_ the same > encoder/decoder gets used. > > Would a follow-up patch be needed to handle that? > To handle what exactly? nsIScriptableUnicodeConverter does charset alias resolution when setting its charset, but doesn't expose the canonical name of the charset in use. You can do your own alias->canonical name conversion with GetCharsetAlias() from nsICharsetConverterManager.
Assignee | ||
Comment 13•17 years ago
|
||
We're invoking our own code for the hankaku to zenkaku conversion based on the character encoding we've been set to use; if the user specifies an alias to ISO-2022-jp, we should still invoke said code. That's all in bug 383878.
Assignee | ||
Comment 14•17 years ago
|
||
Checked in --> FIXED.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Whiteboard: [cz-charset] → [cz-0.9.79]
Comment 15•17 years ago
|
||
Changing multiple bugs' whiteboards should be smarter. :-( Sorry for bugspam.
Whiteboard: [cz-0.9.79] → [cz-charset] [cz-0.9.79]
You need to log in
before you can comment on or make changes to this bug.
Description
•