Closed Bug 208363 Opened 21 years ago Closed 18 years ago

mozilla -compose cannot interpret UTF-8 String correctly

Categories

(SeaMonkey :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joey.shen, Assigned: smontagu)

References

Details

(Keywords: fixed-seamonkey1.1b)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030507
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030507

The mozilla -compose command line allows the user to set the body and subject
of the message compose window, but it currently cannot correctly display bodies
and subjects encoded in UTF-8.

Reproducible: Always

Steps to Reproduce:
1. Run the following command line:
$ mozilla -compose body="some UTF-8 String"

Actual Results:  
If the UTF-8 string contains, for example, Chinese characters, those characters
are not displayed correctly.

Expected Results:  
All characters encoded in UTF-8 are displayed correctly.

Not sure which charset the mozilla command line assumes. The String objects
and conversions in nsAppRunner are rather old-fashioned. I will submit a
patch that handles all arguments from the command line as UTF-8 strings.
This may not be a good enough solution; I am posting it for further discussion.
The -compose command line works fine with the patch.
The key point here is whether treating all command-line arguments as UTF-8
strings will affect the behavior of other existing commands.
Personally, I think making UTF-8 the default "commandline charset" is reasonable.
Attachment #124965 - Flags: superreview?(dmose)
Attachment #124965 - Flags: review?(smontagu)
Adding an i18n expert to the cc list.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 124965 [details] [diff] [review]
accept commandline arguments as UTF-8 String

I'm not 100% sure this is the right fix.  Should we really be using the system
native charset rather than UTF-8?  However, it seems unlikely to be worse than
what we've got now, so if darin and smontagu are OK with it,
sr=dmose@mozilla.org
Attachment #124965 - Flags: superreview?(dmose) → superreview+
Comment on attachment 124965 [details] [diff] [review]
accept commandline arguments as UTF-8 String

We should definitely be using the system default charset. What we do now
(implicitly assuming ISO-8859-1) is wrong, but forcing UTF-8 may cause a
regression for some people.
Attachment #124965 - Flags: review?(smontagu) → review-
yeah, assuming UTF-8 is way wrong.  you should use mozilla's uconv library to
convert from the platform/native charset to unicode.  see nsIUnicodeEncoder,
nsIPlatformCharset, and nsICharsetConverterManager2.
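
For illustration only, here is a minimal Python sketch of the difference between implicitly assuming ISO-8859-1 and decoding with the platform charset. It is not Mozilla's uconv API: `decode_native` is a hypothetical helper, and `locale.getpreferredencoding` merely stands in for whatever nsIPlatformCharset would report.

```python
import locale
from typing import Optional


def decode_native(raw: bytes, platform_charset: Optional[str] = None) -> str:
    """Decode raw command-line bytes using the platform charset.

    platform_charset is an assumption for this sketch; when omitted, the
    charset of the current locale is used.
    """
    charset = platform_charset or locale.getpreferredencoding(False)
    # Undecodable bytes become U+FFFD instead of raising an exception.
    return raw.decode(charset, errors="replace")


# Bytes as a UTF-8 terminal would place them in argv.
raw = "你好".encode("utf-8")

# Implicit ISO-8859-1 assumption: every byte becomes one Latin-1 character,
# so the two Chinese characters turn into six unrelated characters.
garbled = raw.decode("iso-8859-1")

# Decoding with the charset the bytes were actually written in.
correct = decode_native(raw, "utf-8")
```

The same bytes give different text depending on the charset assumed, which is why the conversion needs to use the platform charset rather than a fixed one.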
OK, I understand. But I think there should be some way for the command line to
accept a UTF-8 string directly. Suppose the system default charset is
ISO-8859-1 and I pass a UTF-8 string containing Chinese characters: the charset
information would be lost.

What about a switch like -utf8? With this switch, the command line would accept
arguments as UTF-8 strings. Or maybe there should be a switch with an argument,
like: -charset iso-8859-1 or -charset utf8.

What's your opinion?
if you want to pass a URL with UTF-8 characters through the command line then
if the platform charset is not UTF-8 you really must URL escape the non-ASCII
bytes.  i don't think it is worth it to add a switch to specify the command line
charset.  the command line charset should simply match the platform charset
(i.e., whatever nsIPlatformCharset would provide).
as discussed with smontagu, the command line has a well-understood contract that
arguments are in the platform charset. If an application is passing around UTF8
command line arguments, then it is the application that is broken.

However, we have our own bugs as this conversation describes. We should be going
through the nsICharsetConverterManager to convert from the platform charset when
processing an argument. One approach to reduce the amount of code in
nsAppRunner.cpp is to make nsICommandLineService give out unicode or UTF8
strings, thereby shielding the consumers of nsICommandLineService from any kind
of string conversions.

I'd personally vote for nsICommandLineService to give out UTF8 (rather than
unicode) and use jshin's nsIUTF8ConverterService that he'll be landing soon in
bug 162765. That bug also has UTF8 detection routines which we could use to
detect UTF8 if it is (incorrectly) passed in on the command line.
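
As a rough illustration of the "detect UTF8, otherwise convert from the platform charset" idea, here is a small Python sketch. It does not reproduce the routines from bug 162765; `decode_argument` and its `platform_charset` parameter are hypothetical names used only for this example.

```python
def decode_argument(raw: bytes, platform_charset: str) -> str:
    """Decode one command-line argument.

    Strict UTF-8 decoding is tried first; if the bytes are not valid UTF-8,
    the platform charset (an assumed input here) is used instead.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return raw.decode(platform_charset, errors="replace")


# Valid UTF-8 is recognised even though the platform charset is Latin-1.
from_utf8 = decode_argument("你好".encode("utf-8"), "iso-8859-1")

# b"caf\xe9" is not valid UTF-8, so the platform charset is used.
from_latin1 = decode_argument("café".encode("iso-8859-1"), "iso-8859-1")
```

One caveat with this kind of detection: some byte sequences are valid in both charsets (pure ASCII decodes identically, but for example the Latin-1 bytes for "Ã©" are also valid UTF-8 for "é"), so the guess can be wrong for such input.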
In my case, I launched the mozilla -compose command line from Java code, where
the body and subject arguments are encoded in UTF-8. I don't think it's
incorrect for a Java application to pass UTF-8 strings. Limiting the command
line to the platform charset would badly affect multi-language applications
that try to launch mozilla from the command line.

Maybe I should make the question much simpler: how could I pass, say, Chinese
characters to mozilla on the command line when the platform charset is, for
example, ISO-8859?
Any suggestions?

BTW: the command line I used looks like:
$ mozilla -compose to=somebody@somewhere,subject=hi,body=Hello
Joe: have you tried URL escaping the non-ASCII bytes?

mozilla -P test -compose to=somebody@somewhere,subject=hi%20how%20are%20you,body=Hello

results in a subject that reads "hi how are you"

i'm not sure exactly how moz will treat unescaped bytes, but perhaps it can be
made to work.
Another workaround for the command line is to find (or write) a tool which
converts UTF-8 to URL-encoding, and then use a command line like

./mozilla -compose body=$(urlencode <random utf-8 string>)

But if you are doing this from Java you can do the URL-encoding internally and
you won't need that :-)
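
A possible sketch of such a urlencode helper, written in Python for brevity. The function names `urlencode_utf8` and `compose_argument` are made up for this example, and whether the compose handler decodes every resulting escape sequence is an assumption that would need checking.

```python
from urllib.parse import quote, unquote


def urlencode_utf8(text: str) -> str:
    """Percent-encode text as UTF-8 bytes, leaving only unreserved ASCII as-is."""
    return quote(text, safe="", encoding="utf-8")


def compose_argument(to: str, subject: str, body: str) -> str:
    """Build the value passed after -compose (hypothetical helper).

    The address is left unescaped, as in the examples in this bug; subject
    and body are percent-encoded so that only ASCII reaches the command line.
    """
    return "to={},subject={},body={}".format(
        to, urlencode_utf8(subject), urlencode_utf8(body)
    )


argument = compose_argument("somebody@somewhere", "hi how are you", "你好")
# A caller would then run something like: mozilla -compose <argument>
```

Because the result is pure ASCII, it is the same whatever the platform charset is; the receiving side still has to unescape the bytes and interpret them as UTF-8.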
looks like the compose command line handler only recognizes certain escape
sequences like %20, but not others.  i'm not sure why.  we need to ask someone
on the mailnews team.
I just came across this bug while looking for other bugs :-). Anyway, I'll try 
to implement what Alec suggested in comment #9. 
 
Keywords: intl
*** Bug 229016 has been marked as a duplicate of this bug. ***
*** Bug 291294 has been marked as a duplicate of this bug. ***
Following discussion on #seamonkey, we probably don't want to go to the trouble of modifying nsICommandLineService if it's soon going to be superseded by the toolkit command line handler. This patch just NS_CopyNativeToUnicodes the command line arguments.
Attachment #124965 - Attachment is obsolete: true
Attachment #230481 - Flags: superreview?(neil)
Attachment #230481 - Flags: review?(jshin1987)
Assignee: law → general
Component: Cmd-line Features → General
Product: Core → Mozilla Application Suite
QA Contact: bugzilla → general
Assignee: general → smontagu
Comment on attachment 230481 [details] [diff] [review]
Convert command line arguments from native charset to unicode

This is the right thing to do by comparison with DoCommandLines().
Attachment #230481 - Flags: superreview?(neil)
Attachment #230481 - Flags: superreview+
Attachment #230481 - Flags: review?(jshin1987)
Attachment #230481 - Flags: review+
Checked in.
Status: NEW → RESOLVED
Closed: 18 years ago
Flags: blocking-seamonkey1.1a?
Resolution: --- → FIXED
I wouldn't block the release on this, but I think it's fine if it goes in :)
Flags: blocking-seamonkey1.1a? → blocking-seamonkey1.1a-
Attachment #230481 - Flags: approval-seamonkey1.1a+
This doesn't work on the branch, there's a dependency (bug 58523?)
Backing out for now.
Comment on attachment 230481 [details] [diff] [review]
Convert command line arguments from native charset to unicode

Rerequesting approval bearing in mind that this patch depends on one in bug 305949 or 312287 (I can't remember which).
Attachment #230481 - Flags: approval-seamonkey1.1b?
Attachment #230481 - Flags: approval-seamonkey1.1b? → approval-seamonkey1.1b+