Closed
Bug 41564
Opened 25 years ago
Closed 24 years ago
Internationalize ChatZilla to handle different language scripts
Categories
(Other Applications Graveyard :: ChatZilla, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla1.0.1
People
(Reporter: m_kato, Assigned: oliver)
Attachments
(2 files, 6 obsolete files)
patch, 5.81 KB
patch, 2.30 KB
The current implementation supports only Latin-1 encoding, but Japanese IRC
uses ISO-2022-JP encoding.
Please support multiple encodings for I18N.
Comment 1•25 years ago
I'm going to confirm this bug, summarize the current status of Mozilla chat below, and
make some recommendations as to what the specs for internationalization should be.
1. Currently, we are able to deal with only Latin-1 (ISO-8859-1) characters in the chat window.
We should follow the practice used elsewhere in Mozilla: send out Unicode and expect to receive
Unicode.
2. #1 will simplify character encoding issues among Mozilla ChatZilla users.
3. There are many existing IRC clients geared toward a single language. As people communicate
across continents using ChatZilla and talk to other chat clients that might not know Unicode,
we should have a Character Coding menu like the one in the Messenger or Browser
components. In fact, you can copy the menu from there. Ask the i18n people how to do this.
Then, when the Character Coding menu is set to, for example, Japanese (ISO-2022-JP), use
that encoding both to send chat data and to interpret incoming data.
This way you will be able to deal with legacy clients.
4. Currently, we are not handling the CJK input method correctly. We now commit any entry when
CR is pressed. In CJK, pressing CR means different things depending on the IME status.
If the IME is in the candidate state, pressing CR means "commit to canvas" but NOT send out the data.
When the IME is not in the candidate state, pressing CR means to send the data out, etc.
Someone familiar with CJK IME should be able to fix this quickly.
Let's make Chatzilla into a great multilingual chat tool!
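The Enter-key rule in point 4 can be sketched as follows. This is only an illustration, not ChatZilla's code: it assumes a modern DOM-style keyboard event exposing `key` and `isComposing` (the Mozilla event model of the time differed), and `shouldSendOnEnter` is a hypothetical helper name.

```javascript
// Hypothetical helper: decide whether an Enter keypress should send the line.
// Assumes a KeyboardEvent-like object with `key` and `isComposing` properties.
function shouldSendOnEnter(event) {
  if (event.key !== "Enter") {
    return false; // not Enter: nothing to decide
  }
  // While the IME is composing (for example, a candidate list is open),
  // Enter commits the candidate to the input box and must not send the message.
  return !event.isComposing;
}
```

A keydown handler would call this helper and send the input line only when it returns true.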
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: rginda → momoi
Comment 2•25 years ago
There are other issues concerning this and m_kato has raised it in
mozilla-i18n group. Kato-san, please send that message to rginda
who may not have seen it.
| Reporter | ||
Updated•25 years ago
Status: NEW → ASSIGNED
| Reporter | ||
Comment 3•25 years ago
Robert, sorry, my mistake.
Please assign this to me. I have code for a fix.
| Reporter | ||
Updated•25 years ago
Status: NEW → ASSIGNED
| Reporter | ||
Comment 5•25 years ago
TODO plan:
o create a scriptable Unicode converter interface (bug 54857)
o add a new command "/charset <character-set>"
Depends on: 54857
Comment 6•25 years ago
*MASS SPAM*
Changing QA contact on all open or unverified ChatZilla bugs to me, David
Krause, as I am now the QA contact for this component.
QA Contact: momoi → David
Comment 7•25 years ago
David, the fact that you're the default contact does not
mean that you should take over all the bugs. Some bugs
can be re-assigned to appropriate people.
This bug should be QA'ed by an international contact
with machines and environments set up for that task.
For now changing it to ji@netscape.com.
We may assign this to someone in Mozilla.org Japan.
Maybe Koike-san can take over this one?
QA Contact: david → ji
Comment 8•25 years ago
Kato-san said he would fix this by 1.0.
QA Contact: ji → kazhik
Target Milestone: --- → mozilla1.0
Comment 9•25 years ago
Whoops, sorry about that. I just did a "Change all bugs at once" and didn't
notice that I was removing a QA contact other than rginda. I'll try to be more careful
next time. It is true that I am not the best person to QA this type of bug.
Sorry again.
Comment 10•24 years ago
Reassigning to Furukawa-san.
Assignee: m_kato → oliver
Status: ASSIGNED → NEW
Comment 11•24 years ago
In my experience, ChatZilla currently doesn't allow the user to change fonts,
so even if Russian (significant for me) is transferred correctly, I see
only garbage because of the Latin-1 font. Could someone please change the Summary
field? I am not sure whether I am allowed to.
Comment 12•24 years ago
OK. I changed the summary line to include language scripts other
than Japanese. We need to deal with the different language scripts
that Mozilla supports.
Summary: Cannot use Japanese via IRC chat → Internationalize ChatZilla to handle different language scripts
Comment 13•24 years ago
*** Bug 102757 has been marked as a duplicate of this bug. ***
Updated•24 years ago
Blocks: patchmaker
Updated•24 years ago
No longer blocks: patchmaker
Comment 14•24 years ago
*** Bug 111216 has been marked as a duplicate of this bug. ***
Comment 15•24 years ago
Maruyama-san(mal@mozilla.gr.jp) and I have made a minimum patch to
handle non-ASCII characters in ChatZilla.
user_pref("extensions.irc.charset", "iso-2022-jp");
Users can set their default charset like this.
We don't have UI or command to switch charset yet.
Comment 16•24 years ago
That's great, thanks for the contribution. I'll check this into the chatzilla
0.8.5 branch, which will hopefully land in a week or so.
Depends on: 103386
Comment 17•24 years ago
> user_pref("extensions.irc.charset", "iso-2022-jp");
Mistake. This should be:
user_pref("extensions.irc.default_charset", "iso-2022-jp");
Comment 18•24 years ago
Do we need to encode strings from the .properties file too?
Updated•24 years ago
Attachment #59093 -
Attachment is obsolete: true
Comment 19•24 years ago
I've reworked the patch a bit to integrate better with the existing codebase.
I've already checked this into the CHATZILLA_0_8_5_BRANCH, and will respin an
xpi for <http://www.hacksrus.com/~ginda/chatzilla/>. The first xpi to have
this code will be 0.8.5-pre23, look for it in an hour or so. The 0.8.5 branch
will hopefully land early next week, so please test this out asap.
Comment 20•24 years ago
I have installed the new 0.8.5-pre23 from the URL you posted.
Mozilla already had an older ChatZilla client.
How can I find out which version of ChatZilla I'm running now? Is it the new one (was it
updated by the xpi)?
The problem is that it doesn't show Russian/Ukrainian (koi8-r, koi8-u).
I have put both
user_pref("extensions.irc.charset", "koi8-r");
user_pref("extensions.irc.default_charset", "koi8-r");
into prefs.js.
Network for testing: ForestNet - irc.ForestNet.org
Command: /list (see topics)
Change the server-side charset: "/quote codepage koi8" (for koi8-u and koi8-r)
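To illustrate why the charset setting matters for a report like this one, the sketch below decodes the same bytes once as KOI8-R and once as Latin-1. It is a minimal illustration under stated assumptions: the decoder covers only ASCII and the two KOI8-R letter blocks (0xC0 to 0xFF), and `decodeKoi8rLetters`, `decodeLatin1`, and the sample bytes are hypothetical and not taken from ChatZilla.

```javascript
// The 32 lowercase Cyrillic letters that occupy bytes 0xC0..0xDF in KOI8-R.
const KOI8R_LOWER = "юабцдефгхийклмнопярстужвьызшэщчъ";

// Partial KOI8-R decoder: ASCII and letters only (a sketch, not a full table).
function decodeKoi8rLetters(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII is unchanged
    } else if (b >= 0xc0 && b <= 0xdf) {
      out += KOI8R_LOWER[b - 0xc0]; // lowercase block
    } else if (b >= 0xe0) {
      out += KOI8R_LOWER[b - 0xe0].toUpperCase(); // uppercase block
    } else {
      out += "\ufffd"; // 0x80..0xBF is not covered by this sketch
    }
  }
  return out;
}

// Latin-1 maps each byte directly to the code point of the same value.
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

const wire = [0xf0, 0xd2, 0xc9, 0xd7, 0xc5, 0xd4];
console.log(decodeKoi8rLetters(wire)); // Привет
console.log(decodeLatin1(wire)); // ðÒÉ×ÅÔ
```

The second line of output is the kind of garbage a Latin-1-only client shows for KOI8-R text.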
Comment 21•24 years ago
> Do we need to encode strings from the .properties file too?
What sort of string are you talking about? Localizable strings? or just
some settings for charset.
Comment 22•24 years ago
OK, let me CC tao about this. It looks like chatzilla.jar contains
localizable .dtd files. chatzilla.jar is self-contained, as is venkman.jar.
How does localization work in this type of case? Should it not follow
the localization convention like en-US.jar? en-US.chatzilla.jar, for
example. So, it would make sense to insert the default charset of the
chatzilla client you're shipping into chatzilla.properties file. But that
should be easily discoverable by localizers. Suggestions?
Comment 23•24 years ago
If the resource is locale-specific, we need to put them in properties files
packaged in locale-specific jar such as en-US-irc.jar so localizers can easily
translate irc to other languages.
Comment 24•24 years ago
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1
(you can query for this string to delete spam or retrieve the list of bugs I've
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment 25•24 years ago
kat:
>> Do we need to encode strings from the .properties file too?
>
> What sort of string are you talking about? Localizable strings? or just
> some settings for charset.
I was asking if we need to pass strings we get from a string bundle through the
decoder. Now that I've got an idea what's going on here, I see we don't because
they are already in unicode.
I've just landed the branch, and branded it 0.8.5-rc1 (release candidate one.)
I'd like to get malvin's problem cleared up before mozilla 0.9.7. Any debugging
help would be greatly appreciated.
In 0.8.5-rc1, users should be able to switch the charset on the fly by typing
'/eval setCharset("iso-2022-jp");' in chatzilla. This setting should be
persisted in prefs for the next session.
Comment 26•24 years ago
There are two problems in 0.8.5-rc2.
(1) The second outgoing message after executing "/eval setCharset()"
isn't converted.
function ucConvertOutgoingMessage (msg)
{
    if (client.ucConverter)
        return client.ucConverter.ConvertFromUnicode(msg);
    return msg;
}
If you create an instance of ucConverter every time you send a message,
as in my first patch, ConvertFromUnicode() works well.
(2) Outgoing messages are always displayed as garbage in the message display area.
if (!client.eventPump.getHook("uc-hook"))
{
    client.eventPump.addHook ([{type: "privmsg", set: "server"}],
                              ucConvertIncomingMessage, "uc-hook");
}
This doesn't work for outgoing messages.
Comment 27•24 years ago
A patch that fixes Koike's problem (2).
Raw characters (UTF-8) are sent to the display area.
Comment 28•24 years ago
Made a "ucConvert" class; with it, problem (1) is fixed!
Comment 29•24 years ago
outbound conversions need to be done in more places than just
sayToCurrentTarget. Doing them in filterOutput (as I had done) was too early,
and resulted in us trying to display encoded text (instead of unicode) in the
output window. I've added a fromUnicode() function and called it at each site
that sends plain text to the server that I can think of (I may have missed some
call sites.)
I'm not sure why only the first outbound message was converted (possibly
because of my createInstance vs. getService mixup) but it seems to be fixed
now.
I think adding a new class for this is a little too much, and re-creating the
xpcom component for every message processed is *definitely* too much.
I've tested this patch on irc.forestnet.org as described by malvin in comment
#20, and it looks like it works to me. I see cyrillic characters in the
topics, and when I paste those characters in a private message to myself they
appear at both ends.
I'll post this to hacksrus as pre3 for further testing.
Thanks to everyone who has commented and attached patches to this bug, I
wouldn't have been able to fix this without your help.
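The approach described in this comment (keep Unicode for the local display and encode only where text leaves for the server) might look roughly like the sketch below. It is a hypothetical illustration rather than the checked-in patch: `makeSender`, `sendRaw`, and `display` are invented names, and the `fromUnicode` passed in here is a stand-in that merely tags the text so both paths are visible.

```javascript
// Hypothetical sketch: encode at the send site, display the Unicode original.
// `fromUnicode`, `sendRaw`, and `display` are injected stand-ins, not ChatZilla APIs.
function makeSender({ fromUnicode, sendRaw, display }) {
  return function say(target, text) {
    // The wire gets the charset-encoded form of the message...
    sendRaw("PRIVMSG " + target + " :" + fromUnicode(text));
    // ...while the local output window gets the untouched Unicode string.
    display(text);
  };
}

// Usage with trivial stand-ins that record what each path received.
const sent = [];
const shown = [];
const say = makeSender({
  fromUnicode: (s) => "<encoded:" + s + ">",
  sendRaw: (line) => sent.push(line),
  display: (s) => shown.push(s),
});
say("#test", "привет");
```

Encoding earlier than the send site would hand the encoded text to the display path as well, which is the garbage-in-the-output-window symptom reported earlier in the thread.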
Attachment #59819 -
Attachment is obsolete: true
Attachment #61005 -
Attachment is obsolete: true
Attachment #61013 -
Attachment is obsolete: true
Comment 30•24 years ago
rc3, not pre3.
rc3 is now available on www.hacksrus.com/~ginda/chatzilla/. Please test it out,
I'd like to check it in by tomorrow (which is the 0.9.7 close.)
Comment 31•24 years ago
rc3 doesn't convert the second outgoing message. But every message is
displayed fine in local window.
Comment 32•24 years ago
Creating an instance of Unicode converter isn't a good solution.
But that is the only way we know now. I think we should adopt it
as the temporary fix for 0.9.7.
Comment 33•24 years ago
I'm sorry, but I'm not sure I agree. Creating a new encoder for every message
sent will just hide the real problem, which I'd much prefer to solve and get on
the 0.9.7 branch.
The koi8 encoder seems to work for me. I attach to forestnet, /list #moldova,
and paste some of the characters from the topic into the input box. I can /msg
those characters to myself multiple times, and they always look the same.
Could it be that the ISO-2022-JP encoder leaves itself in a bad state after
encoding the first message? I'm trying to verify this, but nothing obvious goes
wrong when I pass two ASCII messages through it. Can you name an IRC server
which has iso-2022-jp users so I can see the problem for myself?
Comment 34•24 years ago
/attach moznet, /join #mozillazine-jp.
If we put an ASCII character at the beginning of Japanese characters,
every outgoing message is converted correctly.
Comment 35•24 years ago
An ad-hoc patch for iso-2022-jp.
"iso-2022-jp" has several STATEs.
Once the STATE of ucConverter becomes a non-ASCII charset,
it won't change until the next ASCII char.
Trick: a dummy call "client.ucConverter.ConvertFromUnicode('a');" changes the
STATE to the ASCII charset.
A matter of concern: is ucConverter synchronized?
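The state problem described here can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the Mozilla converter: `makeToyEncoder` is a hypothetical encoder that handles only ASCII and hiragana (JIS X 0208 row 4), and, like the behaviour reported in this bug, it keeps its shift state between calls.

```javascript
const ESC = "\x1b";

// Toy stateful encoder: handles ASCII and hiragana (JIS X 0208 row 4) only.
function makeToyEncoder() {
  let mode = "ascii"; // the shift state persists across convert() calls
  return {
    convert(text) {
      let out = "";
      for (const ch of text) {
        const cp = ch.codePointAt(0);
        if (cp < 0x80) {
          if (mode !== "ascii") { out += ESC + "(B"; mode = "ascii"; }
          out += ch;
        } else if (cp >= 0x3041 && cp <= 0x3093) {
          if (mode !== "jis") { out += ESC + "$B"; mode = "jis"; }
          // Row 4 of JIS X 0208: first byte 0x24, second byte 0x20 + cell number.
          out += String.fromCharCode(0x24, 0x20 + (cp - 0x3040));
        } else {
          throw new RangeError("toy encoder cannot encode: " + ch);
        }
      }
      return out;
    },
  };
}

const enc = makeToyEncoder();
const first = enc.convert("あ"); // ESC $ B, then the two JIS bytes
const second = enc.convert("あ"); // no ESC $ B: the encoder is still in JIS mode
enc.convert("a"); // the dummy ASCII call resets the state
const third = enc.convert("あ"); // ESC $ B is emitted again
```

In this model `second` lacks the leading escape sequence, so a receiver that treats every IRC line as starting in ASCII would misread it, while `third` is well-formed again after the dummy ASCII call.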
Comment 36•24 years ago
client.ucConverter.ConvertFromUnicode('a');
return client.ucConverter.ConvertFromUnicode(msg);
The following code also works.
client.ucConverter.charset = client.CHARSET;
return client.ucConverter.ConvertFromUnicode(msg);
Comment 37•24 years ago
shoji, what does it mean for the converter to be "synchronized"?
Comment 38•24 years ago
When ucConverter.convert(From|To)Unicode() is called by TWO (or more) callers
simultaneously, strings and STATE changers will be mixed.
fromUnicode() and toUnicode() must lock ucConverter.
In Java style:
fromUnicode(msg) {
    ...
    synchronized (client.ucConverter) {
        client.ucConverter.fromUnicode("a");
        return client.ucConverter.fromUnicode(msg);
    }
}
# Oops, I made a mistake in the ad-hoc patch.
# client.ucConverter.ConvertFromUnicode('a') is called in toUnicode().
# It must be client.ucConverter.ConvertToUnicode('a').
Comment 39•24 years ago
I think ConvertFromUnicode() should add "ESC ( B" at the end of
the returned string. Then JavaScript code doesn't have to care about
STATE.
Comment 40•24 years ago
shom: We have no locking/synchronization constructs in xpcom or javascript, so the
converter would have to provide its own synchronization api. More likely, the
converter should synchronize itself, so the caller doesn't have to worry about
the details.
kazhik: what is "ESC ( B" in bytes? What does that sequence mean, and is it
valid for all encodings, or just iso-2022-jp?
Comment 41•24 years ago
"ESC ( B" marks the beginning of ASCII characters.
An ISO-2022-JP string in ChatZilla begins with "ESC $ B" and
ends with no escape sequence, so the following text is assumed
to be ISO-2022-JP.
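As for what these sequences are in bytes, a small sketch can print them. The helper name `toHexBytes` is hypothetical; the escape sequences themselves are the ones named in this comment.

```javascript
// Render each character of a string as a two-digit hex byte
// (valid for code points below 0x100, which covers these sequences).
function toHexBytes(s) {
  return Array.from(s, (ch) =>
    ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(toHexBytes("\x1b(B")); // 1B 28 42  (switch to ASCII)
console.log(toHexBytes("\x1b$B")); // 1B 24 42  (switch to JIS X 0208)
```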
Comment 42•24 years ago
It seems nsScriptableUnicodeConverter::ConvertFromUnicode()
should call mEncoder->Finish() after mEncoder->Convert().
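The idea of finishing the encoder after each conversion can be sketched in script form. This is an illustration only, not the C++ fix: `convertWholeMessage` is a hypothetical wrapper, and `stub` is a minimal stand-in encoder that tracks a shift state without re-encoding any characters.

```javascript
// Hypothetical wrapper: always flush the encoder after converting a message,
// so every outgoing line ends in the ASCII state.
function convertWholeMessage(encoder, msg) {
  return encoder.convert(msg) + encoder.finish();
}

// Minimal stand-in encoder that only tracks whether it is "shifted".
const stub = {
  shifted: false,
  convert(msg) {
    const nonAscii = /[^\x00-\x7f]/.test(msg);
    const prefix = nonAscii && !this.shifted ? "\x1b$B" : "";
    this.shifted = nonAscii;
    return prefix + msg; // a real encoder would also re-encode the characters
  },
  finish() {
    const tail = this.shifted ? "\x1b(B" : "";
    this.shifted = false;
    return tail;
  },
};
```

With the flush in place, two consecutive calls on the same non-ASCII text produce identical output, because each call starts from the ASCII state.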
Comment 43•24 years ago
The escape sequence seems to have done the trick for the problems I saw with
iso-2022-jp. In this patch, I am assuming that ESC(B is the ASCII sequence for
all iso-2022 encodings. Can anyone verify that this is a valid assumption?
I'll post this as rc5 in a minute.
Attachment #61183 -
Attachment is obsolete: true
Attachment #61316 -
Attachment is obsolete: true
Comment 44•24 years ago
> In this patch, I am assuming that ESC(B is the ASCII sequence for
> all iso-2022 encodings, can anyone verify that this is this a valid
> assumption.
It is. The final "B" is not unique in ISO-2022 encodings but
"ESC ( B" is unique to ANSI X3.4-1986 (=ASCII).
Comment 45•24 years ago
I've checked the latest patch into the trunk.
kazhik, mEncoder->Finish() sounds like the right fix to me too, will you file an
i18n bug with a patch?
Comment 46•24 years ago
I posted bug 114923 for nsScriptableUnicodeConverter::ConvertFromUnicode()
problem.
Comment 47•24 years ago
We need a command to change charset.
/charset iso-2022-jp
/charset euc-kr
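Recognizing such a command in an input line could be sketched like this. `parseCharsetCommand` is a hypothetical helper, not ChatZilla's command dispatcher, and lowercasing the charset name is an assumption made for the example.

```javascript
// Hypothetical parser for a "/charset <name>" input line; not ChatZilla's code.
// Returns the lowercased charset name, or null if the line is not that command.
function parseCharsetCommand(line) {
  const match = /^\/charset\s+(\S+)\s*$/i.exec(line);
  return match ? match[1].toLowerCase() : null;
}

console.log(parseCharsetCommand("/charset iso-2022-jp")); // iso-2022-jp
console.log(parseCharsetCommand("/join #mozillazine-jp")); // null
```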
Comment 48•24 years ago
I've just landed the client.ucConverter.charset = client.CHARSET; fix, along
with the /charset command on the trunk.
Provided everything works as expected, we'll have charset support in chatzilla
for 0.9.7! Thanks again to all who helped out.
I'll mark this bug as fixed; please reopen if there are any problems.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 49•23 years ago
The charsets aren't working properly.
I can't see support for right-to-left languages here, for example Hebrew!
I can only see it backwards (with both charsets:
ISO-5559-5 - should show it backwards
Windows-1255 - should switch it so DCBA will be ABCD).
The same thing happens with Hebrew input.
I can only input in one direction. I'm not sure whether the problem is in
the display or in the input, but I see it backwards!
ChatZilla should use the same charset methods that the Mozilla browser uses.
I would like this bug to be reopened.
Comment 50•23 years ago
m_vitaly, I think you should open another bug for bidi support
in Chatzilla. We need specialists in that area to diagnose what needs
to happen for Chatzilla to support Hebrew, Arabic and other bidi languages.
This bug put in basic charset support in Chatzilla and we should leave it
at that.
When you file a new bug, in addition to rginda@netscape.com, CC also
mkaply@us.ibm.com and smontagu@netscape.com.
Comment 51•23 years ago
By the way...
Some IRC servers support a "codepage" command. I know this for certain about the RusNet and ForestNet
servers (http://www.rus.net.ua and http://www.ForestNet.Org).
So the user must type "/quote codepage koi8u" to get the KOI8-U charset (ForestNet), or
"/quote codepage cp1251" for the "windows-1251" charset.
It would be just great to have this feature in ChatZilla (so that it sends this
command to the server if the user changes the client charset).
What do you think?
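A sketch of that suggestion, under stated assumptions: the two mappings come from the examples in this comment, anything else is server-specific and deliberately left out, and `codepageLineFor` is a hypothetical helper rather than an existing ChatZilla function.

```javascript
// Mapping taken from the two examples in the comment above; other entries
// would be server-specific and are deliberately not guessed here.
const CODEPAGE_BY_CHARSET = new Map([
  ["koi8-u", "koi8u"],
  ["windows-1251", "cp1251"],
]);

// Hypothetical helper: the raw line to send when the client charset changes,
// or null when no codepage name is known for that charset.
function codepageLineFor(charset) {
  const name = CODEPAGE_BY_CHARSET.get(charset.toLowerCase());
  return name ? "codepage " + name : null;
}

console.log(codepageLineFor("KOI8-U")); // codepage koi8u
console.log(codepageLineFor("iso-2022-jp")); // null
```

Returning null for unknown charsets lets the caller skip the command on servers or charsets where no mapping is known.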
Comment 52•21 years ago
I believe the updated IRC standard supports character encoding negotiation
between client and server. So chatzilla should support this whenever the server
supports it and get rid of cryptic config file editing and have it "Just
Work(tm)" ;-)
Here are some useful links :
http://www.irc.org/tech_docs/005.html
http://www.irc.org/tech_docs/draft-brocklesby-irc-isupport-03.txt
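For reference, reading the tokens from a 005 (RPL_ISUPPORT) reply could be sketched as below. This is a simplified illustration: `parseIsupport` is a hypothetical helper, the server name and nick are made up, and the presence of a CHARSET token is an assumption, since not every server advertises one.

```javascript
// Parse the parameters of one RPL_ISUPPORT (005) line into a Map.
// Sketch only: message tags, value escapes, and "-TOKEN" negation are not handled.
function parseIsupport(line) {
  const tokens = new Map();
  const parts = line.split(" ");
  // Layout assumed: ":server 005 nick TOKEN[=VALUE] ... :trailing text"
  if (parts[1] !== "005") {
    return tokens;
  }
  for (const part of parts.slice(3)) {
    if (part.startsWith(":")) break; // start of the human-readable trailer
    if (part === "") continue;
    const eq = part.indexOf("=");
    if (eq === -1) {
      tokens.set(part, true); // flag-style token without a value
    } else {
      tokens.set(part.slice(0, eq), part.slice(eq + 1));
    }
  }
  return tokens;
}

// Hypothetical sample line; the CHARSET token here is an assumption.
const sample =
  ":irc.example.org 005 nick CHARSET=koi8-r NICKLEN=30 SAFELIST :are supported by this server";
console.log(parseIsupport(sample).get("CHARSET")); // koi8-r
```

A client could use such a token, when present, to pick its default charset instead of requiring a preference edit.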
Updated•21 years ago
Product: Core → Other Applications
Updated•9 months ago
Product: Other Applications → Other Applications Graveyard