Closed
Bug 41564
Opened 24 years ago
Closed 23 years ago
Internationalize ChatZilla to handle different language scripts
Categories
(Other Applications :: ChatZilla, defect, P3)
Other Applications
ChatZilla
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.0.1
People
(Reporter: m_kato, Assigned: oliver)
References
Details
Attachments
(2 files, 6 obsolete files)
5.81 KB,
patch
2.30 KB,
patch
The current implementation supports only Latin-1 encoding, but Japanese IRC uses ISO-2022-JP encoding. Please support multiple encodings for I18N.
Comment 1•24 years ago
I'm going to confirm this bug, summarize the current status of Mozilla chat, and make some recommendations for what the internationalization specs should be.
1. Currently, we can deal only with Latin-1 (ISO-8859-1) characters in the chat window. We should follow the practice used elsewhere in Mozilla: send out Unicode and expect to receive Unicode.
2. Doing #1 will simplify character encoding issues among Mozilla ChatZilla users.
3. There are many existing IRC clients geared toward a single language. As people communicate across continents using ChatZilla and talk to other chat clients that might not know Unicode, we should have a Character Coding menu like the one in the Messenger or Browser components. In fact, you can copy the menu from there; ask the i18n people how to do this. Then, when the Character Coding menu is set to Japanese (ISO-2022-JP), for example, use that encoding both to send chat data and to interpret incoming data. This way we will be able to deal with legacy clients.
4. Currently, we are not handling CJK input methods correctly. We commit any entry when CR is pressed. In CJK input, pressing CR means different things depending on the IME status. If the IME is in the candidate state, pressing CR means "commit to canvas" but NOT send out the data. When the IME is not in the candidate state, pressing CR means send the data out.
Someone familiar with CJK IMEs should be able to fix this quickly. Let's make ChatZilla into a great multilingual chat tool!
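Point 4 boils down to a single rule: Enter commits the IME composition while a candidate is pending, and sends the line only otherwise. Below is a minimal sketch of that rule, written against plain event-like objects so it runs outside a browser. It assumes a modern DOM in which `KeyboardEvent.isComposing` is true during composition and some browsers report `keyCode` 229 for keys the IME consumes. The function name `shouldSendOnEnter` is hypothetical, not ChatZilla code.

```javascript
// Hypothetical helper: decide whether an Enter keypress should send the
// input line, or be left to the IME to commit the current composition.
// Assumes event objects shaped like a DOM KeyboardEvent (key, isComposing, keyCode).
function shouldSendOnEnter(event) {
    if (event.key !== "Enter")
        return false;
    // While an IME composition is active, Enter commits the candidate
    // into the input box; it must not send the line to the server.
    if (event.isComposing)
        return false;
    // Some browsers report keyCode 229 for keys the IME has processed.
    if (event.keyCode === 229)
        return false;
    return true;
}

// Usage with plain objects standing in for DOM events:
console.log(shouldSendOnEnter({ key: "Enter", isComposing: true, keyCode: 13 }));  // false
console.log(shouldSendOnEnter({ key: "Enter", isComposing: false, keyCode: 13 })); // true
```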
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: rginda → momoi
Comment 2•24 years ago
There are other issues related to this, and m_kato has raised them in the mozilla-i18n group. Kato-san, please send that message to rginda, who may not have seen it.
Reporter
Updated•24 years ago
Status: NEW → ASSIGNED
Reporter
Comment 3•24 years ago
Robert, sorry, my mistake. Please assign this to me; I have code for a fix.
Reporter
Updated•24 years ago
Status: NEW → ASSIGNED
Reporter
Comment 5•24 years ago
TODO plan:
o create a scriptable Unicode converter interface (bug 54857)
o add a new command "/charset <character-set>"
Depends on: 54857
Comment 6•24 years ago
*MASS SPAM* Changing QA contact on all open or unverified ChatZilla bugs to me, David Krause, as I am now the QA contact for this component.
QA Contact: momoi → David
Comment 7•24 years ago
David, the fact that you're the default contact does not mean that you should take over all the bugs. Some bugs can be reassigned to appropriate people. This bug should be QA'ed by an international contact with machines and environments set up for that task. For now, I'm changing it to ji@netscape.com. We may assign this to someone at Mozilla.org Japan. Maybe Koike-san can take over this one?
QA Contact: david → ji
Comment 8•24 years ago
Kato-san said he would fix this by 1.0.
QA Contact: ji → kazhik
Target Milestone: --- → mozilla1.0
Comment 9•24 years ago
Whoops, sorry about that. I just did a "Change all bugs at once" and didn't notice that I was removing a QA contact other than rginda. I'll try to be more careful next time. It is true that I am not the best person to QA this type of bug. Sorry again.
Comment 10•23 years ago
Reassigning to Furukawa-san.
Assignee: m_kato → oliver
Status: ASSIGNED → NEW
Comment 11•23 years ago
In my experience, ChatZilla currently doesn't allow the user to change fonts. As a result, even if Russian text (significant for me) is transferred correctly, I see only garbage because of the Latin-1 font. Could someone please change the Summary field? I am not sure whether I am allowed to.
Comment 12•23 years ago
OK. I changed the summary line to include language scripts other than Japanese. We need to handle the different language scripts that Mozilla supports.
Summary: Cannot use Japanese via IRC chat → Internationalize ChatZilla to handle different language scripts
Comment 13•23 years ago
*** Bug 102757 has been marked as a duplicate of this bug. ***
Updated•23 years ago
Blocks: patchmaker
Updated•23 years ago
No longer blocks: patchmaker
Comment 14•23 years ago
*** Bug 111216 has been marked as a duplicate of this bug. ***
Comment 15•23 years ago
Maruyama-san (mal@mozilla.gr.jp) and I have made a minimal patch to handle non-ASCII characters in ChatZilla. Users can set their default charset like this:
user_pref("extensions.irc.charset", "iso-2022-jp");
We don't have a UI or command to switch charsets yet.
Comment 16•23 years ago
That's great, thanks for the contribution. I'll check this into the chatzilla 0.8.5 branch, which will hopefully land in a week or so.
Depends on: 103386
Comment 17•23 years ago
> user_pref("extensions.irc.charset", "iso-2022-jp");
Mistake. This should be:
user_pref("extensions.irc.default_charset", "iso-2022-jp");
Comment 18•23 years ago
Do we need to encode strings from the .properties file too?
Updated•23 years ago
Attachment #59093 -
Attachment is obsolete: true
Comment 19•23 years ago
I've reworked the patch a bit to integrate better with the existing codebase. I've already checked this into the CHATZILLA_0_8_5_BRANCH, and will respin an xpi for <http://www.hacksrus.com/~ginda/chatzilla/>. The first xpi to have this code will be 0.8.5-pre23, look for it in an hour or so. The 0.8.5 branch will hopefully land early next week, so please test this out asap.
Comment 20•23 years ago
I have installed the new 0.8.5-pre23 from the URL you posted. Mozilla already had an older ChatZilla client. How can I find out which version of ChatZilla I'm running now? Is it the new one (was it updated by the xpi)?
The problem is that it doesn't show Russian/Ukrainian (koi8-r, koi8-u). I have put both
user_pref("extensions.irc.charset", "koi8-r");
user_pref("extensions.irc.default_charset", "koi8-r");
into prefs.js.
Network for testing: ForestNet, irc.ForestNet.org
Command: /list (see the topics)
Change the server-side charset with "/quote codepage koi8" (for koi8-u and koi8-r)
Comment 21•23 years ago
> Do we need to encode strings from the .properties file too?
What sort of strings are you talking about? Localizable strings, or just some settings for charset?
Comment 22•23 years ago
OK, let me CC tao about this. It looks like chatzilla.jar contains localizable .dtd files. chatzilla.jar is self-contained, as is venkman.jar. How does localization work in this type of case? Shouldn't it follow the localization convention, like en-US.jar (en-US.chatzilla.jar, for example)? So, it would make sense to put the default charset of the ChatZilla client you're shipping into the chatzilla.properties file, but that should be easily discoverable by localizers. Suggestions?
Comment 23•23 years ago
If the resource is locale-specific, we need to put it in properties files packaged in a locale-specific jar such as en-US-irc.jar, so localizers can easily translate irc into other languages.
Comment 24•23 years ago
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment 25•23 years ago
kat:
>> Do we need to encode strings from the .properties file too?
>
> What sort of string are you talking about? Localizable strings? or just
> some settings for charset.
I was asking whether we need to pass strings we get from a string bundle through the decoder. Now that I have an idea of what's going on here, I see we don't, because they are already in Unicode.
I've just landed the branch and branded it 0.8.5-rc1 (release candidate one). I'd like to get malvin's problem cleared up before Mozilla 0.9.7. Any debugging help would be greatly appreciated.
In 0.8.5-rc1, users should be able to switch the charset on the fly by typing '/eval setCharset("iso-2022-jp");' in ChatZilla. This setting should be persisted in prefs for the next session.
Comment 26•23 years ago
There are two problems in 0.8.5-rc2.
(1) The second outgoing message after executing "/eval setCharset()" isn't converted.

    function ucConvertOutgoingMessage (msg)
    {
        if (client.ucConverter)
            return client.ucConverter.ConvertFromUnicode(msg);
        return msg;
    }

If you create an instance of ucConverter every time you send a message, as in my first patch, ConvertFromUnicode() works well.
(2) Outgoing messages are always displayed as garbage in the message display area.

    if (!client.eventPump.getHook("uc-hook"))
    {
        client.eventPump.addHook ([{type: "privmsg", set: "server"}],
                                  ucConvertIncomingMessage, "uc-hook");
    }

This doesn't work for outgoing messages.
Comment 27•23 years ago
Patch fixing Koike's problem (2): the raw (UTF-8) characters are sent to the display area.
Comment 28•23 years ago
Making a "ucConvert" class fixes problem (1)!
Comment 29•23 years ago
Outbound conversions need to be done in more places than just sayToCurrentTarget. Doing them in filterOutput (as I had done) was too early, and resulted in us trying to display encoded text (instead of Unicode) in the output window. I've added a fromUnicode() function and called it at every site I can think of that sends plain text to the server (I may have missed some call sites).
I'm not sure why only the first outbound message was converted (possibly because of my createInstance vs. getService mixup), but it seems to be fixed now. I think adding a new class for this is a little too much, and re-creating the XPCOM component for every message processed is *definitely* too much.
I've tested this patch on irc.forestnet.org as described by malvin in comment #20, and it looks like it works. I see Cyrillic characters in the topics, and when I paste those characters into a private message to myself they appear correctly at both ends. I'll post this to hacksrus as pre3 for further testing.
Thanks to everyone who has commented and attached patches to this bug; I wouldn't have been able to fix this without your help.
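The split described above keeps every string in Unicode until the moment it is written to the socket, so the output window never sees encoded text. Here is a minimal sketch of that split. It assumes a hypothetical `sendPrivmsg` helper and an injected `encode` function. ISO-8859-1 serves as the stand-in encoder only because it fits in a few lines, and none of these names are ChatZilla's.

```javascript
// Stand-in encoder: Unicode string -> array of ISO-8859-1 bytes.
// Characters outside Latin-1 become "?" (0x3F), a common fallback choice.
function encodeLatin1(text) {
    const bytes = [];
    for (const ch of text) {
        const cp = ch.codePointAt(0);
        bytes.push(cp <= 0xff ? cp : 0x3f);
    }
    return bytes;
}

// Hypothetical send path: show the Unicode text locally, then encode
// only the copy that goes to the server.
function sendPrivmsg(target, text, encode, display, writeBytes) {
    display(text);                                    // output window gets Unicode
    const line = "PRIVMSG " + target + " :" + text + "\r\n";
    writeBytes(encode(line));                         // wire gets encoded bytes
}

const shown = [];
const sent = [];
sendPrivmsg("#test", "café", encodeLatin1,
            (t) => shown.push(t),
            (b) => sent.push(...b));
console.log(shown[0]);            // café
console.log(sent.includes(0xe9)); // true ("é" is 0xE9 in Latin-1)
```

Because only `writeBytes` ever sees encoded data, adding a new outbound call site means calling the encoder there, which matches the concern above about missing call sites.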
Attachment #59819 -
Attachment is obsolete: true
Attachment #61005 -
Attachment is obsolete: true
Attachment #61013 -
Attachment is obsolete: true
Comment 30•23 years ago
rc3, not pre3. rc3 is now available on www.hacksrus.com/~ginda/chatzilla/. Please test it out, I'd like to check it in by tomorrow (which is the 0.9.7 close.)
Comment 31•23 years ago
rc3 doesn't convert the second outgoing message. But every message is displayed fine in local window.
Comment 32•23 years ago
Creating a new instance of the Unicode converter for every message isn't a good solution, but it is the only way we know of right now. I think we should adopt it as the temporary fix for 0.9.7.
Comment 33•23 years ago
I'm sorry, but I'm not sure I agree. Creating a new encoder for every message sent will just hide the real problem, which I'd much prefer to solve and get onto the 0.9.7 branch. The koi8 encoder seems to work for me. I attach to forestnet, /list #moldova, and paste some of the characters from the topic into the input box. I can /msg those characters to myself multiple times, and they always look the same. Could it be that the ISO-2022-JP encoder leaves itself in a bad state after encoding the first message? I'm trying to verify this, but nothing obvious goes wrong when I pass two ASCII messages through it. Can you name an IRC server that has iso-2022-jp users, so I can see the problem for myself?
Comment 34•23 years ago
/attach moznet, then /join #mozillazine-jp. If we put an ASCII character before the Japanese characters, every outgoing message is converted correctly.
Comment 35•23 years ago
An ad-hoc patch for iso-2022-jp. "iso-2022-jp" is stateful: once the STATE of ucConverter switches to a non-ASCII charset, it won't change back until the next ASCII character.
Trick: a dummy call, "client.ucConverter.ConvertFromUnicode('a');", changes the STATE back to the ASCII charset.
A matter of concern: is ucConverter synchronized?
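The state problem can be reproduced with a toy encoder. The sketch below is a simplified model, not the real ISO-2022-JP converter. It emits readable tokens instead of bytes, treats every non-ASCII character as JIS X 0208 without mapping it to an actual code, and uses a hypothetical class name, `ToyIso2022JpEncoder`. It shows why the second message loses its `ESC $ B` prefix and why the dummy `'a'` conversion restores it.

```javascript
// Toy model of a stateful ISO-2022-JP encoder. It outputs tokens such as
// "<ESC $ B>" instead of real bytes and does not map characters to JIS codes.
class ToyIso2022JpEncoder {
    constructor() {
        this.state = "ascii";    // the designation currently in effect
    }

    convert(text) {
        let out = "";
        for (const ch of text) {
            const wanted = ch.codePointAt(0) < 0x80 ? "ascii" : "jis";
            if (wanted !== this.state) {
                // Emit a designation only when the state actually changes.
                out += wanted === "ascii" ? "<ESC ( B>" : "<ESC $ B>";
                this.state = wanted;
            }
            out += ch;
        }
        return out;              // note: the state is NOT reset at the end
    }
}

const enc = new ToyIso2022JpEncoder();
console.log(enc.convert("日本"));  // <ESC $ B>日本
console.log(enc.convert("語"));    // 語  (no designation: state carried over)

// The dummy-conversion trick: converting "a" switches the state back to ASCII,
// so the next message starts with its own designation again.
enc.convert("a");
console.log(enc.convert("語"));    // <ESC $ B>語
```

This also explains the observation in comment 34: a message that starts with an ASCII character forces a state change, so the encoder emits fresh designations.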
Comment 36•23 years ago
    client.ucConverter.ConvertFromUnicode('a');
    return client.ucConverter.ConvertFromUnicode(msg);

The following code also works:

    client.ucConverter.charset = client.CHARSET;
    return client.ucConverter.ConvertFromUnicode(msg);
Comment 37•23 years ago
shoji, what does it mean for the converter to be "synchronized"?
Comment 38•23 years ago
When ucConverter.convert(From|To)Unicode() is called by TWO (or more) callers simultaneously, strings and STATE changes will be mixed. fromUnicode() and toUnicode() must lock ucConverter. In Java style:

    fromUnicode(msg) {
        ...
        synchronized (client.ucConverter) {
            client.ucConverter.fromUnicode("a");
            return client.ucConverter.fromUnicode(msg);
        }
    }

# oops.., I made a mistake in the ad-hoc patch.
# client.ucConverter.ConvertFromUnicode('a') is called in toUnicode()..
# It must be client.ucConverter.ConvertToUnicode('a')
Comment 39•23 years ago
I think ConvertFromUnicode() should add "ESC ( B" at the end of the returned string. Then JavaScript code doesn't have to care about STATE.
Comment 40•23 years ago
shom: We have no locking/synchronization constructs in XPCOM or JavaScript, so the converter would have to provide its own synchronization API. More likely, the converter should synchronize itself, so the caller doesn't have to worry about the details. kazhik: what is "ESC ( B" in bytes? What does that sequence mean, and is it valid for all encodings, or just iso-2022-jp?
Comment 41•23 years ago
"ESC ( B" marks the beginning of ASCII characters. An ISO-2022-JP string in ChatZilla begins with "ESC $ B" and ends with no escape sequence, so the text that follows it is assumed to be ISO-2022-JP as well.
Comment 42•23 years ago
It seems nsScriptableUnicodeConverter::ConvertFromUnicode() should call mEncoder->Finish() after mEncoder->Convert().
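In bytes, `ESC ( B` is 0x1B 0x28 0x42 and `ESC $ B` is 0x1B 0x24 0x42. Whether the converter does it through Finish() or the caller appends `ESC ( B` itself, the goal is the same: every encoded message should end in the ASCII state. Below is a minimal sketch of that check, assuming the message is already an array of ISO-2022-JP bytes. `finalState` and `terminateInAscii` are hypothetical helpers that recognize only `ESC ( B`, `ESC ( J`, and `ESC $ B`.

```javascript
const ESC = 0x1b;

// Scan ISO-2022-JP bytes and report which designation is in effect at the end.
// Only three designations are recognized in this sketch:
//   ESC ( B -> ASCII, ESC ( J -> JIS X 0201 Roman, ESC $ B -> JIS X 0208.
function finalState(bytes) {
    let state = "ascii";   // ISO-2022-JP text starts in ASCII
    for (let i = 0; i + 2 < bytes.length; i++) {
        if (bytes[i] !== ESC)
            continue;
        const a = bytes[i + 1], b = bytes[i + 2];
        if (a === 0x28 && b === 0x42) state = "ascii";
        else if (a === 0x28 && b === 0x4a) state = "roman";
        else if (a === 0x24 && b === 0x42) state = "jisx0208";
        i += 2;            // skip the rest of the escape sequence
    }
    return state;
}

// Append ESC ( B if the message would otherwise leave the receiver
// in a non-ASCII state.
function terminateInAscii(bytes) {
    return finalState(bytes) === "ascii" ? bytes : bytes.concat([ESC, 0x28, 0x42]);
}

// "ESC $ B" followed by the JIS X 0208 code for "あ" (0x24 0x22).
const msg = [ESC, 0x24, 0x42, 0x24, 0x22];
console.log(finalState(msg));                    // jisx0208
console.log(finalState(terminateInAscii(msg)));  // ascii
```

A converter-side Finish() is the cleaner home for this, as suggested above, because every caller then gets self-contained messages without knowing about ISO-2022 state.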
Comment 43•23 years ago
The escape sequence seems to have done the trick for the problems I saw with iso-2022-jp. In this patch, I am assuming that ESC ( B is the ASCII sequence for all iso-2022 encodings; can anyone verify that this is a valid assumption? I'll post this as rc5 in a minute.
Attachment #61183 -
Attachment is obsolete: true
Attachment #61316 -
Attachment is obsolete: true
Comment 44•23 years ago
> In this patch, I am assuming that ESC(B is the ASCII sequence for
> all iso-2022 encodings, can anyone verify that this is this a valid
> assumption.
It is. The final "B" is not unique in ISO-2022 encodings, but "ESC ( B" is unique to ANSI X3.4-1986 (= ASCII).
Comment 45•23 years ago
I've checked the latest patch into the trunk. kazhik, mEncoder->Finish() sounds like the right fix to me too, will you file an i18n bug with a patch?
Comment 46•23 years ago
I filed bug 114923 for the nsScriptableUnicodeConverter::ConvertFromUnicode() problem.
Comment 47•23 years ago
We need a command to change the charset, for example:
/charset iso-2022-jp
/charset euc-kr
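Here is a minimal sketch of how such a command might be parsed. The handler name `handleCharsetCommand`, the `setCharset` callback, and the short list of accepted names are all hypothetical. A real client would ask its converter whether it supports the requested charset rather than consult a fixed list.

```javascript
// Hypothetical list of charsets this sketch accepts; a real client would
// query its converter service rather than hard-code names.
const KNOWN_CHARSETS = ["utf-8", "iso-8859-1", "iso-2022-jp", "euc-kr", "koi8-r", "koi8-u"];

// Parse "/charset <name>" and hand the normalized name to setCharset.
// Returns a status string for display in the message window.
function handleCharsetCommand(input, setCharset) {
    const match = /^\/charset(?:\s+(\S+))?\s*$/i.exec(input.trim());
    if (!match)
        return "not a /charset command";
    if (!match[1])
        return "usage: /charset <character-set>";
    const name = match[1].toLowerCase();
    if (!KNOWN_CHARSETS.includes(name))
        return "unknown charset: " + name;
    setCharset(name);
    return "charset set to " + name;
}

let current = "iso-8859-1";
console.log(handleCharsetCommand("/charset ISO-2022-JP", (c) => { current = c; }));
// charset set to iso-2022-jp
```

Lower-casing the name keeps the stored pref consistent no matter how the user types it.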
Comment 48•23 years ago
I've just landed the "client.ucConverter.charset = client.CHARSET;" fix, along with the /charset command, on the trunk. Provided everything works as expected, we'll have charset support in ChatZilla for 0.9.7! Thanks again to all who helped out. I'll mark this bug as fixed; please reopen it if there are any problems.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 49•23 years ago
The charsets aren't working properly. I can't see right-to-left language support here, for example Hebrew! I can only see it backwards with both charsets (ISO-5559-5 should show it backwards; Windows-1255 should switch it so DCBA becomes ABCD). The same thing happens with Hebrew input: I can only input in one direction. I'm not sure which one it is; it may be a problem in the display or in the input, but I see it backwards! ChatZilla should use the same charset methods that the Mozilla browser uses. I would like this bug to be reopened.
Comment 50•23 years ago
m_vitaly, I think you should open another bug for bidi support in Chatzilla. We need specialists in that area to diagnose what needs to happen for Chatzilla to support Hebrew, Arabic and other bidi languages. This bug put in basic charset support in Chatzilla and we should leave it at that. When you file a new bug, in addition to rginda@netscape.com, CC also mkaply@us.ibm.com and smontagu@netscape.com.
Comment 51•22 years ago
By the way, some IRC servers support a "codepage" command. I know about RusNet and ForestNet servers for certain (http://www.rus.net.ua and http://www.ForestNet.Org). The user must type "/quote codepage koi8u" to get the KOI8-U charset (ForestNet), or "/quote codepage cp1251" for the "windows-1251" charset. It would be great to have this feature in ChatZilla, so that it sends this command to the server when the user changes the client charset. What do you think?
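Here is a minimal sketch of the proposed behavior: when the client charset changes, also send the server a `codepage` command if the server is known to support it. The mapping holds only the two pairs given above (KOI8-U to `koi8u`, windows-1251 to `cp1251`). The function name and the `serverSupportsCodepage` flag are hypothetical, and other servers may name their codepages differently (comment 20 uses `koi8` on ForestNet).

```javascript
// Client charset -> server codepage name, taken from the examples above.
// Other networks may use different names; treat this table as an assumption.
const CODEPAGE_FOR_CHARSET = {
    "koi8-u": "koi8u",
    "windows-1251": "cp1251",
};

// Return the raw IRC line to send after a charset change (what "/quote"
// would send), or null if the server lacks CODEPAGE or no mapping is known.
function codepageCommandFor(charset, serverSupportsCodepage) {
    if (!serverSupportsCodepage)
        return null;
    const codepage = CODEPAGE_FOR_CHARSET[charset.toLowerCase()];
    return codepage ? "codepage " + codepage : null;
}

console.log(codepageCommandFor("KOI8-U", true));  // codepage koi8u
```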
Comment 52•20 years ago
I believe the updated IRC standard supports character encoding negotiation between client and server. So ChatZilla should support this whenever the server does, get rid of cryptic config file editing, and have it "Just Work(tm)" ;-) Here are some useful links:
http://www.irc.org/tech_docs/005.html
http://www.irc.org/tech_docs/draft-brocklesby-irc-isupport-03.txt
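The linked documents describe the `005` (RPL_ISUPPORT) numeric, where a server advertises `TOKEN` or `TOKEN=value` parameters after the target nick. Below is a minimal sketch of parsing those tokens. Whether a server advertises a charset at all, and under what name, is an assumption here: the sketch looks for a hypothetical `CHARSET` token and falls back to the user's setting, so the manual configuration would still be needed.

```javascript
// Parse the parameters of a 005 (RPL_ISUPPORT) line into a Map.
// Example line: ":irc.example.net 005 nick CHANTYPES=# NETWORK=Example :are supported by this server"
function parseIsupport(line) {
    const tokens = new Map();
    // Drop the trailing ":..." text, which is human-readable, not a parameter.
    const trailing = line.indexOf(" :", 1);
    const head = trailing === -1 ? line : line.slice(0, trailing);
    const parts = head.split(" ").filter((p) => p.length > 0);
    // parts[0] = ":server", parts[1] = "005", parts[2] = target nick
    if (parts[1] !== "005")
        return tokens;
    for (const param of parts.slice(3)) {
        const eq = param.indexOf("=");
        if (eq === -1)
            tokens.set(param, true);               // flag-style token, e.g. SAFELIST
        else
            tokens.set(param.slice(0, eq), param.slice(eq + 1));
    }
    return tokens;
}

const isupport = parseIsupport(
    ":irc.example.net 005 nick NETWORK=Example SAFELIST CHARSET=koi8-r :are supported by this server");
// Hypothetical: fall back to the user's configured charset if no token is sent.
const serverCharset = isupport.get("CHARSET") || "iso-8859-1";
console.log(serverCharset);   // koi8-r
```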
Updated•20 years ago
Product: Core → Other Applications