Internationalize ChatZilla to handle different language scripts

RESOLVED FIXED in mozilla1.0.1

Status

Other Applications
ChatZilla
P3
normal
RESOLVED FIXED
18 years ago
13 years ago

People

(Reporter: m_kato, Assigned: Ryoichi Furukawa)

Tracking

Trunk
mozilla1.0.1

Details

Attachments

(2 attachments, 6 obsolete attachments)

(Reporter)

Description

18 years ago
The current implementation supports Latin-1 encoding only, but Japanese IRC 
uses ISO-2022-JP.

Please support multiple encodings for I18N.

Comment 1

18 years ago
I'm going to confirm this bug and summarize the current status of Mozilla chat below and
make some recommendations as to what the specs for internationalization should be.

1. Currently, we can handle only Latin-1 (ISO-8859-1) characters in the chat window.
   We should follow the practice used elsewhere in Mozilla: send Unicode and expect to receive
   Unicode.
2. #1 will simplify character encoding issues among Mozilla ChatZilla users.
3. Many existing IRC clients are geared toward a single language. As people communicate
    across continents using ChatZilla and talk to other chat clients that might not know Unicode,
    we should have a Character Coding menu like the one in the Messenger or Browser
    components. In fact, you can copy the menu from there. Ask the i18n people how to do this.
    Then, when the Character Coding menu is set to Japanese (ISO-2022-JP), for example, use
    that encoding both to send chat data and to interpret incoming data.
    This way you will be able to deal with legacy clients.

4. Currently, we are not handling the CJK input method correctly. We commit every entry when
    CR is pressed. In CJK, pressing CR means different things depending on the IME status.
    If the IME is in the candidate state, pressing CR means "commit to canvas" but NOT send the data.
    When the IME is not in the candidate state, pressing CR means send the data, and so on.
    Someone familiar with CJK IMEs should be able to fix this quickly.
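
The IME rule in item 4 could be sketched roughly as follows. This is only a minimal sketch that assumes a modern DOM-style keyboard event with `key` and `isComposing` fields, which is not necessarily what ChatZilla had available; `shouldSendOnEnter` is a hypothetical helper name.

```javascript
// Hypothetical helper: decide whether an Enter key press should send the
// chat line. Assumes a DOM-style KeyboardEvent with `key` and `isComposing`.
function shouldSendOnEnter(event) {
  if (event.key !== "Enter") {
    return false; // only Enter can trigger a send
  }
  // While the IME is composing (for example, a candidate is being chosen),
  // Enter commits the candidate to the input box and must not send.
  return !event.isComposing;
}

// Plain objects stand in for real keyboard events here.
console.log(shouldSendOnEnter({ key: "Enter", isComposing: true }));  // false
console.log(shouldSendOnEnter({ key: "Enter", isComposing: false })); // true
```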

Let's make Chatzilla into a great multilingual chat tool!
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: rginda → momoi

Comment 2

18 years ago
There are other issues concerning this and m_kato has raised it in 
mozilla-i18n group. Kato-san, please send that message to rginda
who may not have seen it.

Updated

18 years ago
Depends on: 27805
(Reporter)

Updated

18 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 3

18 years ago
Robert, sorry, my mistake.
Please assign this to me. I have code for a fix.

Comment 4

18 years ago
reassigning to m_kato
Assignee: rginda → m_kato
Status: ASSIGNED → NEW
(Reporter)

Updated

18 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 5

18 years ago
TODO plan:
o create a scriptable Unicode converter interface. (bug 54857)
o add a new command "/charset <character-set>"
Depends on: 54857

Comment 6

18 years ago
*MASS SPAM*

Changing QA contact on all open or unverified ChatZilla bugs to me, David
Krause, as I am now the QA contact for this component.
QA Contact: momoi → David

Comment 7

17 years ago
David, the fact that you're the default contact does not 
mean that you should take over all the bugs. Some bugs
can be re-assigned to appropriate people. 
This bug should be QA'ed by an international contact
with machines and environments set up for that task.

For now changing it to ji@netscape.com.
We may assign this to someone in Mozilla.org Japan.
Maybe Koike-san can take over this one?
QA Contact: david → ji

Comment 8

17 years ago
Kato-san said he would fix this by 1.0.
QA Contact: ji → kazhik
Target Milestone: --- → mozilla1.0

Comment 9

17 years ago
Whoops, sorry about that. I just did a "Change all bugs at once" and didn't
notice that I was removing a QA contact other than rginda.  I'll try to be more careful
next time.  It is true that I am not the best person to QA this type of bug.
Sorry again.

Comment 10

17 years ago
Reassigning to Furukawa-san.
Assignee: m_kato → oliver
Status: ASSIGNED → NEW

Comment 11

17 years ago
My experience is currently that ChatZilla doesn't allow the user to change fonts,
so even if Russian (significant for me) is transferred correctly, I see
only garbage because of the latin1 font. Could someone please change the Summary
field? I am not sure whether I am allowed to.

Comment 12

17 years ago
OK. I changed the summary line to include other language scripts
than Japanese. We need to deal with different language scripts
Mozilla provides support for.
Summary: Cannot use Japanese via IRC chat → Internationalize ChatZilla to handle different language scripts

Comment 13

17 years ago
*** Bug 102757 has been marked as a duplicate of this bug. ***

Updated

17 years ago
Blocks: 104624

Comment 14

17 years ago
*** Bug 111216 has been marked as a duplicate of this bug. ***

Comment 15

17 years ago
Created attachment 59093 [details] [diff] [review]
patch

Maruyama-san(mal@mozilla.gr.jp) and I have made a minimum patch to
handle non-ASCII characters in ChatZilla.

user_pref("extensions.irc.charset", "iso-2022-jp");

Users can set their default charset like this. 

We don't have UI or command to switch charset yet.

Comment 16

17 years ago
That's great, thanks for the contribution.  I'll check this into the chatzilla
0.8.5 branch, which will hopefully land in a week or so.
Depends on: 103386

Comment 17

17 years ago
> user_pref("extensions.irc.charset", "iso-2022-jp");

Mistake. This should be:

user_pref("extensions.irc.default_charset", "iso-2022-jp");

Comment 18

17 years ago
Do we need to encode strings from the .properties file too?

Updated

17 years ago
Attachment #59093 - Attachment is obsolete: true

Comment 19

17 years ago
Created attachment 59819 [details] [diff] [review]
new patch

I've reworked the patch a bit to integrate better with the existing codebase.
I've already checked this into the CHATZILLA_0_8_5_BRANCH, and will respin an
xpi for <http://www.hacksrus.com/~ginda/chatzilla/>.  The first xpi to have
this code will be 0.8.5-pre23, look for it in an hour or so.  The 0.8.5 branch
will hopefully land early next week, so please test this out asap.

Comment 20

17 years ago
I have installed the new 0.8.5-pre23 from the URL you posted.
Mozilla already had an older ChatZilla client installed.
How can I find out which version of ChatZilla I'm running now? Is it the new one (was it
updated by the xpi)?
The problem is that it doesn't show Russian/Ukrainian (koi8-r, koi8-u).
I have put both
user_pref("extensions.irc.charset", "koi8-r");
user_pref("extensions.irc.default_charset", "koi8-r");
into prefs.js

Network for test: ForestNet - irc.ForestNet.org
command /list (see topics)
change server side charset "/quote codepage koi8" (for koi8-u and koi8-r)

Comment 21

17 years ago
> Do we need to encode strings from the .properties file too?

What sort of string are you talking about? Localizable strings? or just
some settings for charset.

Comment 22

17 years ago
OK, let me CC tao about this. It looks like chatzilla.jar contains
localizable .dtd files. chatzilla.jar is self-contained as is venkman.jar.
How does localization work in this type of case? Should it not follow
the localization convention like en-US.jar? en-US.chatzilla.jar, for
example. So, it would make sense to insert the default charset of the
chatzilla client you're shipping into chatzilla.properties file. But that
should be easily discoverable by localizers. Suggestions?

Comment 23

17 years ago
If the resource is locale-specific, we need to put them in properties files
packaged in locale-specific jar such as en-US-irc.jar so localizers can easily
translate irc to other languages.

Comment 24

16 years ago
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1

Comment 25

16 years ago
kat:
>> Do we need to encode strings from the .properties file too?
>
> What sort of string are you talking about? Localizable strings? or just
> some settings for charset.

I was asking if we need to pass strings we get from a string bundle through the
decoder.  Now that I've got an idea what's going on here, I see we don't because
they are already in unicode.

I've just landed the branch, and branded it 0.8.5-rc1 (release candidate one.) 
I'd like to get malvin's problem cleared up before mozilla 0.9.7.  Any debugging
help would be greatly appreciated.

In 0.8.5-rc1, users should be able to switch the charset on the fly by typing
'/eval setCharset("iso-2022-jp");' in chatzilla.  This setting should be
persisted in prefs for the next session.

Comment 26

16 years ago
There are two problems in 0.8.5-rc2.

(1) Second outgoing message after executing "/eval setCharset()"
 isn't converted.

function ucConvertOutgoingMessage (msg)
{
    if (client.ucConverter)
        return client.ucConverter.ConvertFromUnicode(msg);

    return msg;
}

If you create an instance of ucConverter every time you send message,
as in my first patch, ConvertFromUnicode() works well.


(2) Outgoing messages are always displayed as garbage in message display area.

if (!client.eventPump.getHook("uc-hook"))
{
    client.eventPump.addHook ([{type: "privmsg", set: "server"}],
        ucConvertIncomingMessage, "uc-hook");
}

This doesn't work for outgoing messages.

Comment 27

16 years ago
Created attachment 61005 [details] [diff] [review]
(2) easy fixed patch

A patch fixing Koike's problem (2).
Raw characters (UTF-8) are sent to the display area.

Comment 28

16 years ago
Created attachment 61013 [details] [diff] [review]
modified previous patch

Made a "ucConvert" class, which fixes problem (1)!

Comment 29

16 years ago
Created attachment 61183 [details] [diff] [review]
latest patch

Outbound conversions need to be done in more places than just
sayToCurrentTarget.  Doing them in filterOutput (as I had done) was too early,
and resulted in us trying to display encoded text (instead of unicode) in the
output window.  I've added a fromUnicode() function and called it at each site
that sends plain text to the server that I can think of (I may have missed some
call sites.)
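
The ordering described above (show Unicode locally, encode only on the way to the server) might look like the sketch below. It is not the patch itself: `sendLine`, `showLocal`, and `writeToServer` are hypothetical names, and UTF-8 via the standard `TextEncoder` stands in for whatever charset converter the client is configured with.

```javascript
// Stand-in encoder: UTF-8 via the standard TextEncoder. In ChatZilla this
// role is played by the configured charset converter, not by TextEncoder.
function fromUnicode(text) {
  return new TextEncoder().encode(text);
}

// Hypothetical send path: the local view gets the original Unicode string,
// and only the copy that goes to the server is encoded.
function sendLine(text, showLocal, writeToServer) {
  showLocal(text);                  // never display encoded bytes
  writeToServer(fromUnicode(text)); // encode at the last moment
}

const shown = [];
const sent = [];
sendLine("привет", (s) => shown.push(s), (bytes) => sent.push(bytes));
console.log(shown[0]);       // the readable Unicode string
console.log(sent[0].length); // 12 bytes for 6 Cyrillic letters in UTF-8
```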

I'm not sure why only the first outbound message was converted (possibly
because of my createInstance vs. getService mixup) but it seems to be fixed
now.

I think adding a new class for this is a little too much, and re-creating the
xpcom component for every message processed is *definitely* too much.

I've tested this patch on irc.forestnet.org as described by malvin in comment
#20, and it looks like it works to me.  I see Cyrillic characters in the
topics, and when I paste those characters in a private message to myself they
appear at both ends.

I'll post this to hacksrus as pre3 for further testing.

Thanks to everyone who has commented and attached patches to this bug, I
wouldn't have been able to fix this without your help.
Attachment #59819 - Attachment is obsolete: true
Attachment #61005 - Attachment is obsolete: true
Attachment #61013 - Attachment is obsolete: true

Comment 30

16 years ago
rc3, not pre3.

rc3 is now available on www.hacksrus.com/~ginda/chatzilla/.  Please test it out,
I'd like to check it in by tomorrow (which is the 0.9.7 close.)

Comment 31

16 years ago
rc3 doesn't convert the second outgoing message. But every message is 
displayed fine in local window.

Comment 32

16 years ago
Creating an instance of Unicode converter isn't a good solution.
But that is the only way we know now. I think we should adopt it 
as the temporary fix for 0.9.7.

Comment 33

16 years ago
I'm sorry, but I'm not sure I agree.  Creating a new encoder for every message
sent will just hide the real problem, which I'd much prefer to solve and get on
the 0.9.7 branch.

The koi8 encoder seems to work for me.  I attach to forestnet, /list #moldova,
and paste some of the characters from the topic into the input box.  I can /msg
those characters to myself multiple times, and they always look the same.

Could it be that the ISO-2022-JP encoder leaves itself in a bad state after
encoding the first message?  I'm trying to verify this, but nothing obvious goes
wrong when I pass two ASCII messages through it.  Can you name an irc server
which has 2022-jp users so I can see the problem for myself?

Comment 34

16 years ago
/attach moznet, /join #mozillazine-jp.

If we put an ASCII character at the beginning of Japanese characters,
every outgoing message is converted correctly.

Comment 35

16 years ago
Created attachment 61316 [details] [diff] [review]
ad-hoc patch for iso-2022-jp

An ad-hoc patch for iso-2022-jp.

"iso-2022-jp" has several STATEs.
Once the STATE of ucConverter becomes a non-ascii charset,
it won't change until the next ascii char.

Trick: A dummy exec "client.ucConverter.ConvertFromUnicode('a');" changes the
STATE to ascii charset.

A matter of concern: Is ucConverter synchronized?
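
The STATE behaviour described above can be reproduced with a toy model. This is a sketch only, not Mozilla's converter: `ToyStatefulEncoder` and its "?" placeholder output for non-ASCII characters are assumptions made purely for illustration.

```javascript
const ESC = "\x1b";
const TO_JIS = ESC + "$B";   // designate JIS X 0208
const TO_ASCII = ESC + "(B"; // designate ASCII

// Toy model of a stateful ISO-2022-JP style encoder. It remembers its mode
// between calls, which is why a second message can go out with no escape.
class ToyStatefulEncoder {
  constructor() {
    this.mode = "ascii";
  }
  convert(text) {
    let out = "";
    for (const ch of text) {
      const want = ch.codePointAt(0) < 0x80 ? "ascii" : "jis";
      if (want !== this.mode) {
        out += want === "jis" ? TO_JIS : TO_ASCII;
        this.mode = want;
      }
      // A real encoder emits JIS byte pairs; "?" is a placeholder here.
      out += want === "ascii" ? ch : "?";
    }
    return out;
  }
}

const enc = new ToyStatefulEncoder();
const first = enc.convert("日本"); // starts with ESC $ B
const second = enc.convert("語");  // no escape: encoder is still in JIS mode
enc.convert("a");                  // the dummy call forces ASCII mode
const third = enc.convert("語");   // starts with ESC $ B again
console.log(first.startsWith(TO_JIS), second.startsWith(TO_JIS), third.startsWith(TO_JIS));
// prints: true false true
```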

Comment 36

16 years ago
     client.ucConverter.ConvertFromUnicode('a');
     return client.ucConverter.ConvertFromUnicode(msg);

The following code also works.

     client.ucConverter.charset = client.CHARSET;
     return client.ucConverter.ConvertFromUnicode(msg);

Comment 37

16 years ago
shoji, what does it mean for the converter to be "synchronized"?

Comment 38

16 years ago
When ucConverter.Convert(From|To)Unicode() is called by TWO (or more) callers
simultaneously, strings and STATE changes will be mixed.

fromUnicode() and toUnicode() must lock ucConverter.

in Java style,

fromUnicode(msg) {
  ...
  synchronized (client.ucConverter) {
    client.ucConverter.fromUnicode("a");
    return client.ucConverter.fromUnicode(msg);
  }
}

# oops.., I made a mistake in adhoc patch.
# client.ucConverter.ConvertFromUnicode('a') is called in toUnicode()..
# It must be client.ucConverter.ConvertToUnicode('a')


Comment 39

16 years ago
I think ConvertFromUnicode() should add "ESC ( B" at the end of 
the returned string. Then JavaScript code doesn't have to care about
STATE.

Comment 40

16 years ago
shom: We have no locking/synchronization constructs in xpcom or javascript, the
converter would have to provide its own synchronization API.  More likely, the
converter should synchronize itself, so the caller doesn't have to worry about
the details.

kazhik: what is "ESC ( B" in bytes?  What does that sequence mean, and is it
valid for all encodings, or just iso-2022-jp?

Comment 41

16 years ago
"ESC ( B" means the beginning of ASCII characters.

An ISO-2022-JP string in ChatZilla begins with "ESC $ B" and 
ends without an escape sequence, so the following text is also assumed 
to be ISO-2022-JP.
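
In bytes, each of the two escape sequences is three bytes long. The sketch below prints them and shows one way a caller could append the ASCII designation. `endInAscii` and `hexBytes` are hypothetical helpers, and the check is deliberately naive: it only looks at the last escape sequence in the string.

```javascript
const ESC = "\x1b";
const TO_JIS = ESC + "$B";   // bytes 0x1B 0x24 0x42
const TO_ASCII = ESC + "(B"; // bytes 0x1B 0x28 0x42

// Format a string's code units as hex bytes (all are below 0x80 here).
function hexBytes(s) {
  return Array.from(s, (c) => c.charCodeAt(0).toString(16).padStart(2, "0")).join(" ");
}

// Hypothetical helper: if the last escape sequence in an encoded message
// is not the ASCII designation, append it so the message ends in ASCII mode.
function endInAscii(encoded) {
  const lastEsc = encoded.lastIndexOf(ESC);
  if (lastEsc === -1) {
    return encoded; // no escapes at all: already plain ASCII
  }
  const lastSeq = encoded.slice(lastEsc, lastEsc + 3);
  return lastSeq === TO_ASCII ? encoded : encoded + TO_ASCII;
}

console.log(hexBytes(TO_JIS));   // 1b 24 42
console.log(hexBytes(TO_ASCII)); // 1b 28 42
// "F|" stands in for the two bytes of one double-byte character.
console.log(hexBytes(endInAscii(TO_JIS + "F|")));
```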


Comment 42

16 years ago
It seems nsScriptableUnicodeConverter::ConvertFromUnicode()
should call mEncoder->Finish() after mEncoder->Convert().
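
The convert-then-finish idea can be modelled as below. This is a toy model under stated assumptions, not the Mozilla encoder: `ToyEncoder`, `convert`, and `finish` are hypothetical stand-ins for the Convert()/Finish() pair named above, and "?" is a placeholder for real JIS bytes.

```javascript
const ESC = "\x1b";

// Toy stateful encoder with an explicit finish step.
class ToyEncoder {
  constructor() {
    this.inJis = false;
  }
  convert(text) {
    let out = "";
    for (const ch of text) {
      const needJis = ch.codePointAt(0) >= 0x80;
      if (needJis !== this.inJis) {
        out += ESC + (needJis ? "$B" : "(B");
        this.inJis = needJis;
      }
      out += needJis ? "?" : ch; // "?" is a placeholder for real JIS bytes
    }
    return out;
  }
  // Emit whatever is needed to return to the initial (ASCII) state.
  finish() {
    if (!this.inJis) {
      return "";
    }
    this.inJis = false;
    return ESC + "(B";
  }
}

// Wrapper: every message is self-contained because the encoder is
// finished after each conversion.
function convertFromUnicode(encoder, msg) {
  return encoder.convert(msg) + encoder.finish();
}

const encoder = new ToyEncoder();
const msg1 = convertFromUnicode(encoder, "語");
const msg2 = convertFromUnicode(encoder, "語");
console.log(msg1 === msg2); // true: the second message is encoded like the first
```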


Comment 43

16 years ago
Created attachment 61399 [details] [diff] [review]
more patching

The escape sequence seems to have done the trick for the problems I saw with
iso-2022-jp.  In this patch, I am assuming that ESC(B is the ASCII sequence for
all iso-2022 encodings. Can anyone verify that this is a valid assumption?


I'll post this as rc5 in a minute.
Attachment #61183 - Attachment is obsolete: true
Attachment #61316 - Attachment is obsolete: true

Comment 44

16 years ago
> In this patch, I am assuming that ESC(B is the ASCII sequence for
> all iso-2022 encodings, can anyone verify that this is this a valid
> assumption.

It is. The final "B" is not unique in ISO-2022 encodings but 
"ESC ( B" is unique to ANSI X3.4-1986 (=ASCII).

Comment 45

16 years ago
I've checked the latest patch into the trunk.

kazhik, mEncoder->Finish() sounds like the right fix to me too, will you file an
i18n bug with a patch?

Comment 46

16 years ago
I posted bug 114923 for nsScriptableUnicodeConverter::ConvertFromUnicode()
problem.

Comment 47

16 years ago
Created attachment 61614 [details] [diff] [review]
patch for "charset" command

We need a command to change charset.

/charset iso-2022-jp
/charset euc-kr
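
For illustration only, parsing such a command might look like the sketch below. It is not the attached patch: `parseCharsetCommand` and the `KNOWN_CHARSETS` allow-list are hypothetical, and a real client would ask its converter which charsets it supports.

```javascript
// Illustrative allow-list (an assumption, not ChatZilla's actual list).
const KNOWN_CHARSETS = new Set(["iso-8859-1", "iso-2022-jp", "euc-kr", "koi8-r", "utf-8"]);

// Hypothetical parser for an input line such as "/charset euc-kr".
// Returns the normalized charset name, or null if the line is not a
// valid /charset command or names an unknown charset.
function parseCharsetCommand(line) {
  const match = /^\/charset\s+(\S+)\s*$/i.exec(line);
  if (!match) {
    return null;
  }
  const name = match[1].toLowerCase();
  return KNOWN_CHARSETS.has(name) ? name : null;
}

console.log(parseCharsetCommand("/charset iso-2022-jp")); // iso-2022-jp
console.log(parseCharsetCommand("/charset EUC-KR"));      // euc-kr
console.log(parseCharsetCommand("/charset klingon"));     // null
```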

Comment 48

16 years ago
I've just landed the client.ucConverter.charset = client.CHARSET; fix, along
with the /charset command on the trunk.

Providing everything works as expected, we'll have charset support in chatzilla
for 0.9.7!  Thanks again to all who helped out.

I'll mark this bug as fixed, please reopen if there are any problems.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Comment 49

16 years ago
The charsets aren't working properly.

I can't see support for right-to-left languages here, for example Hebrew!
I can only see it backwards (with both charsets:
ISO-5559-5 - should show it backwards
Windows-1255 - should switch it so DCBA will be ABCD).

Same thing with Hebrew input.
I can only input one way. I'm not sure which one it is; it may be a problem in
the display or in the input, but I see it backwards!

Chatzilla should use the same charset methods that mozilla browser uses.

I would like this bug to be reopened.

Comment 50

16 years ago
m_vitaly, I think you should open another bug for bidi support
in Chatzilla. We need specialists in that area to diagnose what needs
to happen for Chatzilla to support Hebrew, Arabic and other bidi languages.
This bug put in basic charset support in Chatzilla and we should leave it
at that.
When you file a new bug, in addition to rginda@netscape.com, CC also
mkaply@us.ibm.com and smontagu@netscape.com.

Updated

16 years ago
Blocks: 128773

Comment 51

16 years ago
By the way...
Some IRC servers support the "codepage" command. I know that RusNet and ForestNet
servers do for certain (http://www.rus.net.ua and http://www.ForestNet.Org).

So the user must type "/quote codepage koi8u" to get the KOI8-U charset (ForestNet), or 
"/quote codepage cp1251" for the "windows-1251" charset.
It would be just great to have this feature in ChatZilla (so that it sends this
command to the server when the user changes the client charset).

What do you think?
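
A rough sketch of that idea, assuming only the two codepage names mentioned above (other networks may use different names); `codepageLineFor` is a hypothetical hook, not an existing ChatZilla function:

```javascript
// Mapping from client charset names to the server-side codepage names
// mentioned above. Anything not listed here is treated as unknown.
const CODEPAGE_NAMES = new Map([
  ["koi8-u", "koi8u"],
  ["windows-1251", "cp1251"],
]);

// Hypothetical hook: build the raw line to send when the client charset
// changes, or null when no codepage name is known for that charset.
function codepageLineFor(charset) {
  const name = CODEPAGE_NAMES.get(charset.toLowerCase());
  return name ? "codepage " + name : null;
}

console.log(codepageLineFor("KOI8-U"));       // codepage koi8u
console.log(codepageLineFor("windows-1251")); // codepage cp1251
console.log(codepageLineFor("iso-2022-jp"));  // null
```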

Comment 52

14 years ago
I believe the updated IRC standard supports character encoding negotiation
between client and server. So chatzilla should support this whenever the server
supports it and get rid of cryptic config file editing and have it "Just
Work(tm)" ;-)

Here are some useful links :

http://www.irc.org/tech_docs/005.html
http://www.irc.org/tech_docs/draft-brocklesby-irc-isupport-03.txt
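
As a sketch of what reading such a server announcement could involve, the function below parses the tokens of a 005 numeric reply into a map. The sample line is made up, and it is an assumption that a given server advertises a CHARSET token at all; `parseIsupport` is a hypothetical name.

```javascript
// Parse the parameters of a 005 reply into a Map. Tokens look like NAME or
// NAME=VALUE; the part starting with ":" is human-readable trailing text.
// Assumed layout: ":server 005 nick TOKEN... :trailing text"
function parseIsupport(line) {
  const tokens = new Map();
  const parts = line.split(" ");
  if (parts.length < 4 || parts[1] !== "005") {
    return tokens; // not a 005 reply in the assumed layout
  }
  for (const part of parts.slice(3)) {
    if (part.startsWith(":")) {
      break; // start of the trailing text
    }
    if (part === "") {
      continue; // tolerate repeated spaces
    }
    const eq = part.indexOf("=");
    if (eq === -1) {
      tokens.set(part, true);
    } else {
      tokens.set(part.slice(0, eq), part.slice(eq + 1));
    }
  }
  return tokens;
}

// Made-up sample line for illustration.
const sample = ":irc.example.org 005 nick CHARSET=koi8-u NICKLEN=30 SAFELIST :are supported by this server";
const supported = parseIsupport(sample);
console.log(supported.get("CHARSET")); // koi8-u
```

A client could then switch its charset to the advertised value when the token is present and fall back to the user's setting otherwise.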
Product: Core → Other Applications