Closed Bug 41564 Opened 24 years ago Closed 23 years ago

Internationalize ChatZilla to handle different language scripts

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla1.0.1

People

(Reporter: m_kato, Assigned: oliver)

References

Details

Attachments

(2 files, 6 obsolete files)

patch, 23 years ago, Koike Kazuhiko, 3.28 KB, patch
new patch, 23 years ago, Robert Ginda, 4.48 KB, patch
(2) easy fixed patch, 23 years ago, Tsukasa Maruyama, 1.28 KB, patch
modified prevous patch, 23 years ago, Tsukasa Maruyama, 2.60 KB, patch
latest patch, 23 years ago, Robert Ginda, 4.39 KB, patch
ad-hoc patch for iso-2022-jp, 23 years ago, Shoji Matsumoto, 2.13 KB, patch
more patching, 23 years ago, Robert Ginda, 5.81 KB, patch
patch for "charset" command, 23 years ago, Koike Kazuhiko, 2.30 KB, patch

Makoto Kato [:m_kato]

Reporter

Description

•

24 years ago

The current implementation supports only the Latin-1 encoding, but Japanese IRC
uses the ISO-2022-JP encoding.

Please support multiple encodings for I18N.

Katsuhiko Momoi

Comment 1

•

24 years ago

I'm going to confirm this bug, summarize the current status of Mozilla chat below, and
make some recommendations as to what the specs for internationalization should be.

1. Currently, we can handle only Latin-1 (ISO-8859-1) characters in the chat window.
We should follow the practice used elsewhere in Mozilla: send out Unicode and expect to receive
Unicode.
2. #1 will simplify character encoding issues among Mozilla Chatzilla users.
3. Many existing IRC clients are geared toward a single language. As people communicate
across continents using Chatzilla and talk to other chat clients that might not know Unicode,
we should have a Character Coding menu like the one in the Messenger or Browser
components. In fact, you can copy the menu from there. Ask the i18n people how to do this.
Then, when the Character Coding menu is set to Japanese (ISO-2022-JP), for example, use
that encoding both to send chat data and to interpret incoming data.
This way you will be able to deal with legacy clients.

4. Currently, we are not handling the CJK input method correctly. We now commit any entry when
CR is pressed. In CJK, pressing CR means different things depending on the IME state.
If the IME is in the candidate state, pressing CR means "commit to canvas" but NOT "send out the data".
When the IME is not in the candidate state, pressing CR means "send the data out", etc.
Someone familiar with CJK IME should be able to fix this quickly.
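
The rule in item 4 can be sketched as follows. This is only an illustration, assuming DOM-style keyboard events that expose `key` and `isComposing`; the helper name `shouldSendOnEnter` is hypothetical and is not part of ChatZilla.

```javascript
// Hypothetical helper: decide whether an Enter key press should send the line.
// Assumes DOM-style keyboard events that carry `key` and `isComposing`.
function shouldSendOnEnter(event) {
  if (event.key !== "Enter") {
    return false; // only Enter can trigger a send
  }
  // While the IME is composing (for example, showing candidates), Enter
  // commits the candidate to the input box and must not send anything.
  return !event.isComposing;
}

// Plain objects stand in for real events here.
const whileComposing = shouldSendOnEnter({ key: "Enter", isComposing: true });
const afterCommit = shouldSendOnEnter({ key: "Enter", isComposing: false });
console.log(whileComposing, afterCommit);
```

A real input handler would call such a check from its key listener and fall through to the normal send path only when it returns true.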

Let's make Chatzilla into a great multilingual chat tool!

Status: UNCONFIRMED → NEW

Ever confirmed: true

QA Contact: rginda → momoi

Katsuhiko Momoi

Comment 2

•

24 years ago

There are other issues concerning this, and m_kato has raised them in the
mozilla-i18n group. Kato-san, please send that message to rginda,
who may not have seen it.

David Krause

Updated

•

24 years ago

Depends on: 27805

Makoto Kato [:m_kato]

Reporter

Updated

•

24 years ago

Status: NEW → ASSIGNED

Makoto Kato [:m_kato]

Reporter

Comment 3

•

24 years ago

Robert, sorry, my mistake.
Please assign this to me. I have code for a fix.

Robert Ginda

Comment 4

•

24 years ago

reassigning to m_kato

Assignee: rginda → m_kato

Status: ASSIGNED → NEW

Makoto Kato [:m_kato]

Reporter

Updated

•

24 years ago

Status: NEW → ASSIGNED

Makoto Kato [:m_kato]

Reporter

Comment 5

•

24 years ago

TODO plan:
o create a scriptable Unicode converter interface. (bug 54857)
o add a new command "/charset <character-set>"

Depends on: 54857

David Krause

Comment 6

•

24 years ago

*MASS SPAM*

Changing QA contact on all open or unverified ChatZilla bugs to me, David
Krause, as I am now the QA contact for this component.

QA Contact: momoi → David

Katsuhiko Momoi

Comment 7

•

24 years ago

David, the fact that you're the default contact does not 
mean that you should take over all the bugs. Some bugs
can be re-assigned to appropriate people. 
This bug should be QA'ed by an international contact
with machines and environments set up for that task.

For now changing it to ji@netscape.com.
We may assign this to someone in Mozilla.org Japan.
Maybe Koike-san can take over this one?

QA Contact: david → ji

Koike Kazuhiko

Comment 8

•

24 years ago

Kato-san said he would fix this by 1.0.

QA Contact: ji → kazhik

Target Milestone: --- → mozilla1.0

David Krause

Comment 9

•

24 years ago

Whoops, sorry about that. I just did a "Change all bugs at once" and didn't
notice that I was removing a QA contact other than rginda.  I'll try to be more careful
next time.  It is true that I am not the best person to QA this type of bug.
Sorry again.

Koike Kazuhiko

Comment 10

•

23 years ago

Reassigning to Furukawa-san.

Assignee: m_kato → oliver

Status: ASSIGNED → NEW

Nikolai Prokoschenko

Comment 11

•

23 years ago

My current experience is that ChatZilla doesn't allow the user to change fonts.
As a result, even if Russian (significant for me) is transferred correctly, I see
only garbage because of the latin1 font. Could someone please change the Summary
field? I am not sure whether I am allowed to.

Katsuhiko Momoi

Comment 12

•

23 years ago

OK. I changed the summary line to include language scripts other
than Japanese. We need to deal with the different language scripts
that Mozilla supports.

Summary: Cannot use Japanese via IRC chat → Internationalize ChatZilla to handle different language scripts

Chase Tingley

Comment 13

•

23 years ago

*** Bug 102757 has been marked as a duplicate of this bug. ***

Neil Marshall

Updated

•

23 years ago

Blocks: patchmaker

Gervase Markham [:gerv]

Updated

•

23 years ago

No longer blocks: patchmaker

Chase Tingley

Comment 14

•

23 years ago

*** Bug 111216 has been marked as a duplicate of this bug. ***

Koike Kazuhiko

Comment 15

•

23 years ago

Attached patch: patch (obsolete)

Maruyama-san(mal@mozilla.gr.jp) and I have made a minimum patch to
handle non-ASCII characters in ChatZilla.

user_pref("extensions.irc.charset", "iso-2022-jp");

Users can set their default charset like this. 

We don't have a UI or a command to switch the charset yet.

Robert Ginda

Comment 16

•

23 years ago

That's great, thanks for the contribution.  I'll check this into the chatzilla
0.8.5 branch, which will hopefully land in a week or so.

Depends on: 103386

Koike Kazuhiko

Comment 17

•

23 years ago

> user_pref("extensions.irc.charset", "iso-2022-jp");

Mistake. This should be:

user_pref("extensions.irc.default_charset", "iso-2022-jp");

Robert Ginda

Comment 18

•

23 years ago

Do we need to encode strings from the .properties file too?

Robert Ginda

Updated

•

23 years ago

Attachment #59093 - Attachment is obsolete: true

Robert Ginda

Comment 19

•

23 years ago

Attached patch: new patch (obsolete)

I've reworked the patch a bit to integrate better with the existing codebase.
I've already checked this into the CHATZILLA_0_8_5_BRANCH, and will respin an
xpi for <http://www.hacksrus.com/~ginda/chatzilla/>.  The first xpi to have
this code will be 0.8.5-pre23, look for it in an hour or so.  The 0.8.5 branch
will hopefully land early next week, so please test this out asap.

Malx

Comment 20

•

23 years ago

I have installed the new 0.8.5-pre23 from the URL you posted.
Mozilla already had an older ChatZilla client installed.
How can I find out which version of Chatzilla I'm running now? Is it the new one (was it
updated by the xpi)?
The problem is that it doesn't show Russian/Ukrainian (koi8-r, koi8-u).
I have put both
user_pref("extensions.irc.charset", "koi8-r");
user_pref("extensions.irc.default_charset", "koi8-r");
into prefs.js.

Network for testing: ForestNet - irc.ForestNet.org
Command: /list (see topics)
Change the server-side charset: "/quote codepage koi8" (for koi8-u and koi8-r)

Katsuhiko Momoi

Comment 21

•

23 years ago

> Do we need to encode strings from the .properties file too?

What sort of strings are you talking about? Localizable strings, or just
some settings for the charset?

Katsuhiko Momoi

Comment 22

•

23 years ago

OK, let me CC tao about this. It looks like chatzilla.jar contains
localizable .dtd files. chatzilla.jar is self-contained, as is venkman.jar.
How does localization work in this type of case? Should it not follow
the localization convention like en-US.jar? en-US.chatzilla.jar, for
example. So, it would make sense to insert the default charset of the
chatzilla client you're shipping into the chatzilla.properties file. But that
should be easily discoverable by localizers. Suggestions?

tao

Comment 23

•

23 years ago

If the resource is locale-specific, we need to put it in properties files
packaged in a locale-specific jar such as en-US-irc.jar so localizers can easily
translate irc to other languages.

Asa Dotzler [:asa]

Comment 24

•

23 years ago

Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)

Target Milestone: mozilla1.0 → mozilla1.0.1

Robert Ginda

Comment 25

•

23 years ago

kat:
>> Do we need to encode strings from the .properties file too?
>
> What sort of string are you talking about? Localizable strings? or just
> some settings for charset.

I was asking if we need to pass strings we get from a string bundle through the
decoder.  Now that I've got an idea what's going on here, I see we don't because
they are already in unicode.

I've just landed the branch, and branded it 0.8.5-rc1 (release candidate one.) 
I'd like to get malvin's problem cleared up before mozilla 0.9.7.  Any debugging
help would be greatly appreciated.

In 0.8.5-rc1, users should be able to switch the charset on the fly by typing
'/eval setCharset("iso-2022-jp");' in chatzilla.  This setting should be
persisted in prefs for the next session.

Koike Kazuhiko

Comment 26

•

23 years ago

There are two problems in 0.8.5-rc2.

(1) The second outgoing message after executing "/eval setCharset()"
 isn't converted.

function ucConvertOutgoingMessage (msg)
{
    if (client.ucConverter)
        return client.ucConverter.ConvertFromUnicode(msg);

    return msg;
}

If you create an instance of ucConverter every time you send message,
as in my first patch, ConvertFromUnicode() works well.


(2) Outgoing messages are always displayed as garbage in message display area.

if (!client.eventPump.getHook("uc-hook"))
{
    client.eventPump.addHook ([{type: "privmsg", set: "server"}],
        ucConvertIncomingMessage, "uc-hook");
}

This doesn't work for outgoing messages.

Tsukasa Maruyama

Comment 27

•

23 years ago

Attached patch: (2) easy fixed patch (obsolete)

A patch that fixes Koike's problem (2).
It sends the raw characters (UTF-8) to the display area.

Tsukasa Maruyama

Comment 28

•

23 years ago

Attached patch: modified prevous patch (obsolete)

I made a "ucConvert" class, and with it problem (1) is fixed!

Robert Ginda

Comment 29

•

23 years ago

Attached patch: latest patch (obsolete)

Outbound conversions need to be done in more places than just
sayToCurrentTarget.  Doing them in filterOutput (as I had done) was too early,
and resulted in us trying to display encoded text (instead of unicode) in the
output window.  I've added a fromUnicode() function and called it at each site
that sends plain text to the server that I can think of (I may have missed some
call sites.)
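
The split described above, Unicode for the local display and encoded text only on the wire, might look roughly like this. Everything here is a hypothetical stand-in (`makeClient`, `say`, the `encode` callback); it is a sketch of the idea, not the code in the patch.

```javascript
// Hypothetical sketch: encode as late as possible, at the point of sending.
// `encode` stands in for whatever fromUnicode() does in the real client.
function makeClient(encode) {
  const displayed = []; // what the local output window would show
  const wire = [];      // what would be written to the server socket
  return {
    displayed,
    wire,
    say(target, text) {
      // Local echo keeps the original Unicode text.
      displayed.push(text);
      // Only the copy bound for the server goes through the encoder.
      wire.push("PRIVMSG " + target + " :" + encode(text));
    },
  };
}

// A visibly lossy stand-in encoder makes the two paths easy to tell apart.
const client = makeClient((text) => text.replace(/[^\x00-\x7f]/g, "?"));
client.say("#test", "caf\u00e9");
console.log(client.displayed[0], "|", client.wire[0]);
```

Encoding inside a display filter instead would push the `?`-mangled string into `displayed`, which corresponds to the garbage in the output window reported earlier.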

I'm not sure why only the first outbound message was converted (possibly
because of my createInstance vs. getService mixup) but it seems to be fixed
now.

I think adding a new class for this is a little too much, and re-creating the
xpcom component for every message processed is *definitely* too much.

I've tested this patch on irc.forestnet.org as described by malvin in comment
#20, and it looks like it works to me.  I see cyrillic characters in the
topics, and when I paste those characters in a private message to myself they
appear at both ends.

I'll post this to hacksrus as pre3 for further testing.

Thanks to everyone who has commented and attached patches to this bug, I
wouldn't have been able to fix this without your help.

Attachment #59819 - Attachment is obsolete: true

Attachment #61005 - Attachment is obsolete: true

Attachment #61013 - Attachment is obsolete: true

Robert Ginda

Comment 30

•

23 years ago

rc3, not pre3.

rc3 is now available on www.hacksrus.com/~ginda/chatzilla/.  Please test it out,
I'd like to check it in by tomorrow (which is the 0.9.7 close.)

Koike Kazuhiko

Comment 31

•

23 years ago

rc3 doesn't convert the second outgoing message. But every message is 
displayed fine in local window.

Koike Kazuhiko

Comment 32

•

23 years ago

Creating an instance of Unicode converter isn't a good solution.
But that is the only way we know now. I think we should adopt it 
as the temporary fix for 0.9.7.

Robert Ginda

Comment 33

•

23 years ago

I'm sorry, but I'm not sure I agree.  Creating a new encoder for every message
sent will just hide the real problem, which I'd much prefer to solve and get on
the 0.9.7 branch.

The koi8 encoder seems to work for me.  I attach to forestnet, /list #moldova,
and paste some of the characters from the topic into the input box.  I can /msg
those characters to myself multiple times, and they always look the same.

Could it be that the ISO-2022-JP encoder leaves itself in a bad state after
encoding the first message?  I'm trying to verify this, but nothing obvious goes
wrong when I pass two ASCII messages through it.  Can you name an irc server
which has iso-2022-jp users so I can see the problem for myself?

Koike Kazuhiko

Comment 34

•

23 years ago

/attach moznet, /join #mozillazine-jp.

If we put an ASCII character at the beginning of Japanese characters,
every outgoing message is converted correctly.

Shoji Matsumoto

Comment 35

•

23 years ago

Attached patch: ad-hoc patch for iso-2022-jp (obsolete)

An ad-hoc patch for iso-2022-jp.

"iso-2022-jp" has several STATEs.
Once the STATE of ucConverter becomes a non-ascii charset,
it won't change until the next ascii char.

Trick: A dummy call to "client.ucConverter.ConvertFromUnicode('a');" changes the
STATE back to the ascii charset.

A matter of concern: Is ucConverter synchronized?
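
The state problem and the dummy-call trick can be reproduced with a toy model. This is a sketch under loud assumptions: `makeToyEncoder` is hypothetical, it is not the Mozilla converter, and every non-ASCII character is written as the placeholder `##` instead of real JIS bytes. Only the escape-sequence bookkeeping is modelled.

```javascript
// Toy model of a stateful ISO-2022-JP style encoder. It is NOT a real
// converter: every non-ASCII character becomes the placeholder "##".
const ESC = "\x1b";

function makeToyEncoder() {
  let state = "ascii"; // persists between calls, like a shared converter
  return {
    convertFromUnicode(text) {
      let out = "";
      for (const ch of text) {
        const wanted = ch.codePointAt(0) < 0x80 ? "ascii" : "jis";
        if (wanted !== state) {
          // An escape sequence is emitted only when the state changes.
          out += wanted === "ascii" ? ESC + "(B" : ESC + "$B";
          state = wanted;
        }
        out += wanted === "ascii" ? ch : "##";
      }
      return out; // note: nothing switches back to ASCII at the end
    },
  };
}

const enc = makeToyEncoder();
const first = enc.convertFromUnicode("\u65e5\u672c");  // begins with ESC $ B
const second = enc.convertFromUnicode("\u65e5\u672c"); // no escape at all
enc.convertFromUnicode("a");                            // dummy call, back to ASCII
const third = enc.convertFromUnicode("\u65e5\u672c");  // begins with ESC $ B again
console.log(JSON.stringify([first, second, third]));
```

In this model the second message goes out with no leading escape sequence, so a receiver that starts each line in ASCII would misread it. That would also explain why a leading ASCII character makes every message convert correctly.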

Koike Kazuhiko

Comment 36

•

23 years ago

     client.ucConverter.ConvertFromUnicode('a');
     return client.ucConverter.ConvertFromUnicode(msg);

The following code also works.

     client.ucConverter.charset = client.CHARSET;
     return client.ucConverter.ConvertFromUnicode(msg);

Robert Ginda

Comment 37

•

23 years ago

shoji, what does it mean for the converter to be "synchronized"?

Shoji Matsumoto

Comment 38

•

23 years ago

When ucConverter.convert(From|To)Unicode() is called by TWO (or more) callers
simultaneously, the strings and STATE changes will be mixed.

fromUnicode() and toUnicode() must lock ucConverter.

in Java style,

fromUnicode(msg) {
  ...
  synchronized (client.ucConverter) {
    client.ucConverter.fromUnicode("a");
    return client.ucConverter.fromUnicode(msg);
  }
}

# oops.., I made a mistake in adhoc patch.
# client.ucConverter.ConvertFromUnicode('a') is called in toUnicode()..
# It must be client.ucConverter.ConvertToUnicode('a')

Koike Kazuhiko

Comment 39

•

23 years ago

I think ConvertFromUnicode() should add "ESC ( B" at the end of 
the returned string. Then JavaScript code doesn't have to care about
STATE.

Robert Ginda

Comment 40

•

23 years ago

shom: We have no locking/synchronization constructs in xpcom or javascript, so the
converter would have to provide its own synchronization api.  More likely, the
converter should synchronize itself, so the caller doesn't have to worry about
the details.

kazhik: what is "ESC ( B" in bytes?  What does that sequence mean, and is it
valid for all encodings, or just iso-2022-jp?

Koike Kazuhiko

Comment 41

•

23 years ago

"ESC ( B" means the beginning of ASCII characters.

An ISO-2022-JP string in ChatZilla begins with "ESC $ B" and
ends with no escape sequence. So the text that follows it is assumed
to be ISO-2022-JP as well.
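
In bytes, "ESC ( B" is 0x1B 0x28 0x42 and "ESC $ B" is 0x1B 0x24 0x42, since ESC is 0x1B and the other two characters are plain ASCII. A minimal sketch of appending the return-to-ASCII sequence on the script side follows. The helper names (`endsInAscii`, `closeIso2022Jp`, `toHex`) are hypothetical, and the sample payload is just illustrative double-byte data.

```javascript
const ESC = "\x1b";
const TO_ASCII = ESC + "(B"; // switch back to ASCII

// True when the last escape sequence in the encoded text selects ASCII, or
// when there is no escape sequence at all (the initial state is ASCII).
function endsInAscii(encoded) {
  const i = encoded.lastIndexOf(ESC);
  if (i === -1) {
    return true;
  }
  return encoded.slice(i, i + 3) === TO_ASCII;
}

// Append ESC ( B only when the text would otherwise end in another state.
function closeIso2022Jp(encoded) {
  return endsInAscii(encoded) ? encoded : encoded + TO_ASCII;
}

// Render a byte string as space-separated hex for inspection.
function toHex(s) {
  return Array.from(s, (c) => c.charCodeAt(0).toString(16).padStart(2, "0")).join(" ");
}

const open = ESC + "$B" + "F|K\\"; // sample payload after ESC $ B
console.log(toHex(TO_ASCII));
console.log(toHex(closeIso2022Jp(open)));
```

The check looks only at the last escape sequence, which is enough here because ESC never appears inside the double-byte data itself.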

Koike Kazuhiko

Comment 42

•

23 years ago

It seems nsScriptableUnicodeConverter::ConvertFromUnicode()
should call mEncoder->Finish() after mEncoder->Convert().
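
Seen from script, the same idea is to flush the encoder after every message. A minimal sketch, assuming a converter object that exposes `ConvertFromUnicode()` plus a `Finish()` method returning any pending terminator. Both `encodeMessage` and the fake converter below are hypothetical and exist only to make the example self-contained.

```javascript
// Hypothetical wrapper: convert, then append whatever the encoder needs to
// return to its initial state.
function encodeMessage(converter, msg) {
  const body = converter.ConvertFromUnicode(msg);
  return body + converter.Finish();
}

// Crude fake converter for illustration only. It treats any text containing
// a non-ASCII character as one double-byte run and writes "##" per character.
const fakeConverter = {
  pending: false,
  ConvertFromUnicode(text) {
    this.pending = /[^\x00-\x7f]/.test(text);
    if (!this.pending) {
      return text;
    }
    return "\x1b$B" + "##".repeat(Array.from(text).length);
  },
  Finish() {
    const tail = this.pending ? "\x1b(B" : "";
    this.pending = false; // the encoder is back in its initial state
    return tail;
  },
};

console.log(JSON.stringify(encodeMessage(fakeConverter, "\u65e5\u672c")));
console.log(JSON.stringify(encodeMessage(fakeConverter, "hi")));
```

With the flush done in one place, call sites would not need dummy conversions or charset resets to clear the state.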

Robert Ginda

Comment 43

•

23 years ago

Attached patch: more patching

The escape sequence seems to have done the trick for the problems I saw with
iso-2022-jp.  In this patch, I am assuming that ESC(B is the ASCII sequence for
all iso-2022 encodings. Can anyone verify that this is a valid assumption?


I'll post this as rc5 in a minute.

Attachment #61183 - Attachment is obsolete: true

Attachment #61316 - Attachment is obsolete: true

Katsuhiko Momoi

Comment 44

•

23 years ago

> In this patch, I am assuming that ESC(B is the ASCII sequence for
> all iso-2022 encodings, can anyone verify that this is this a valid
> assumption.

It is. The final "B" is not unique in ISO-2022 encodings but 
"ESC ( B" is unique to ANSI X3.4-1986 (=ASCII).

Robert Ginda

Comment 45

•

23 years ago

I've checked the latest patch into the trunk.

kazhik, mEncoder->Finish() sounds like the right fix to me too, will you file an
i18n bug with a patch?

Koike Kazuhiko

Comment 46

•

23 years ago

I posted bug 114923 for the nsScriptableUnicodeConverter::ConvertFromUnicode()
problem.

Koike Kazuhiko

Comment 47

•

23 years ago

Attached patch: patch for "charset" command

We need a command to change charset.

/charset iso-2022-jp
/charset euc-kr

Robert Ginda

Comment 48

•

23 years ago

I've just landed the client.ucConverter.charset = client.CHARSET; fix, along
with the /charset command on the trunk.

Provided everything works as expected, we'll have charset support in chatzilla
for 0.9.7!  Thanks again to all who helped out.

I'll mark this bug as fixed. Please reopen if there are any problems.

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

m_vitaly

Comment 49

•

23 years ago

The charsets aren't working properly.

I can't see support for right-to-left languages here, for example Hebrew!
I can only see it backwards (with both charsets:
ISO-5559-5 - should show it backwards
Windows-1255 - should switch it so DCBA will be ABCD).

Same thing with Hebrew input.
I can only input in one direction. I'm not sure whether the problem is in
the display or in the input, but I see it backwards!

Chatzilla should use the same charset methods that mozilla browser uses.

I would like this bug to be reopened.

Katsuhiko Momoi

Comment 50

•

23 years ago

m_vitaly, I think you should open another bug for bidi support
in Chatzilla. We need specialists in that area to diagnose what needs
to happen for Chatzilla to support Hebrew, Arabic and other bidi languages.
This bug put in basic charset support in Chatzilla and we should leave it
at that.
When you file a new bug, in addition to rginda@netscape.com, CC also
mkaply@us.ibm.com and smontagu@netscape.com.

m_vitaly

Updated

•

23 years ago

Blocks: 128773

Malx

Comment 51

•

22 years ago

By the way...
Some IRC servers support a "codepage" command. I know about the RusNet and ForestNet
servers for certain. (http://www.rus.net.ua and http://www.ForestNet.Org).

So the user must type "/quote codepage koi8u" to get the KOI8-U charset (ForestNet), or
"/quote codepage cp1251" for the "windows-1251" charset.
It would be just great to have this feature in ChatZilla (so that it sends this
command to the server when the user changes the client charset).

What do you think?
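
A rough sketch of that suggestion: when the client charset changes, look up the matching server-side name and build the raw command. The function name `codepageCommandFor` is hypothetical, and the table holds only the two pairs mentioned in this comment. Other networks may use different names or have no such command.

```javascript
// Charset-to-codepage names taken from the comment above. Servers that do
// not implement "codepage" would simply get no command.
const CODEPAGE_BY_CHARSET = new Map([
  ["koi8-u", "koi8u"],
  ["windows-1251", "cp1251"],
]);

// Returns the raw line to send (what "/quote" would pass through), or null
// when no server-side name is known for the charset.
function codepageCommandFor(charset) {
  const name = CODEPAGE_BY_CHARSET.get(charset.toLowerCase());
  return name ? "codepage " + name : null;
}

console.log(codepageCommandFor("KOI8-U"));
console.log(codepageCommandFor("windows-1251"));
console.log(codepageCommandFor("iso-2022-jp"));
```

Returning null for unknown charsets keeps the client from sending a command the server may not understand.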

David Balažic

Comment 52

•

20 years ago

I believe the updated IRC standard supports character encoding negotiation
between client and server. So chatzilla should support this whenever the server
supports it and get rid of cryptic config file editing and have it "Just
Work(tm)" ;-)

Here are some useful links :

http://www.irc.org/tech_docs/005.html
http://www.irc.org/tech_docs/draft-brocklesby-irc-isupport-03.txt
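
For illustration, a numeric 005 reply can be split into tokens as below. This is a minimal sketch: `parseIsupport` is a hypothetical name, the sample line is made up, and the `CHARSET` token in it is an assumption rather than something every server advertises. The linked documents describe what a given server may actually send.

```javascript
// Parse one "005" line into a Map of token -> value (true for bare tokens).
// Returns null when the line is not a 005 reply.
function parseIsupport(line) {
  const parts = line.split(" ");
  const i = parts[0].startsWith(":") ? 1 : 0; // skip the optional prefix
  if (parts[i] !== "005") {
    return null;
  }
  const tokens = new Map();
  // parts[i + 1] is the recipient nick; tokens start after it.
  for (const part of parts.slice(i + 2)) {
    if (part.startsWith(":")) {
      break; // trailing human-readable text begins here
    }
    if (part === "") {
      continue;
    }
    const eq = part.indexOf("=");
    if (eq === -1) {
      tokens.set(part, true);
    } else {
      tokens.set(part.slice(0, eq), part.slice(eq + 1));
    }
  }
  return tokens;
}

// Made-up sample line; the CHARSET token here is an assumption.
const sample =
  ":irc.example.org 005 nick NICKLEN=30 SAFELIST CHARSET=utf-8 :are supported by this server";
const supported = parseIsupport(sample);
console.log(supported.get("CHARSET"));
```

A client could consult such a map after connecting and fall back to its configured charset when no encoding-related token is present.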

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Core → Other Applications
