Closed Bug 73446 Opened 23 years ago Closed 20 years ago

Need to know how to convert between local encoding and UCS2, e.g., Need NS_ConvertUCS2ToLocalEncoding() and NS_ConvertLocalEncodingToUCS2()

Categories

(Core :: Internationalization, defect)

RESOLVED FIXED
Future

People

(Reporter: roland.mainz, Assigned: ftang)

Details

(Keywords: intl)

(Based on discussion on IRC #mozilla with scc):
There is a need for a function which converts a "PRUnichar *" to a "char *" in
the system's local encoding - whatever it is (en_US.UTF-8, ja.UTF-8, xzy.UTF-123).

An example use would be a function which gets a document title for printing
(which is a PRUnichar *title) and wants to feed it to "/usr/bin/lp -t $MYTITLE"
- which is called via a function like lpPrintSetTitle( x, y, const char *title )
where "title" must be in the system's locale...
On scc's request - making it a blocker for bug 73009.
Making it block bug 72087 ("Xprint major revamp") as the current method to get a
char * from a PRUnichar * is more than silly (quoting it here just for fun as an
example of "how not to do this" =:-) :
-- snip --
// stolen from mozilla/webshell/embed/xlib/qt/QMozillaContainer.cpp
// helper fuction for BeginLoadURL, ProgressLoadURL, EndLoadURL
// XXX Dont forget to delete this 'C' String since we create it here
static char* makeCString( const PRUnichar* aString )
{
        int     len = 0;
        const PRUnichar*        ptr = aString;
        
        while ( *ptr ) len++, ptr++;

        char    *cstring = new char[ ++len ];

        // just cast down to a character
        while ( len >= 0 )
        {
                cstring[len] = ( char )aString[len];
                len--;
        }
        return cstring;
}
-- snip --
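
To make the problem with the cast-down approach concrete, here is a minimal sketch. NarrowByCast is a hypothetical stand-in for the quoted helper (not Mozilla code), and char16_t stands in for PRUnichar. It keeps only the low 8 bits of each UCS2 code unit, so anything outside Latin-1 silently turns into an unrelated byte.

```cpp
#include <cassert>
#include <string>

// Hypothetical re-creation of the cast-down approach, for illustration only.
// char16_t stands in for PRUnichar.
std::string NarrowByCast(const char16_t* text)
{
    assert(text != nullptr);
    std::string out;
    for (; *text; ++text) {
        // The cast keeps the low 8 bits and drops the high 8 bits.
        out.push_back(static_cast<char>(*text));
    }
    return out;
}
```

Plain ASCII survives, which is why the bug is easy to miss: NarrowByCast(u"Title") gives "Title", but U+3042 (HIRAGANA LETTER A) comes out as the single byte 0x42, i.e. "B".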

Blocks: 72087, 73009
1. The reverse direction, too (please :-)
2. It would be useful to have a function like NS_ConvertUCS2ToCOMPOUND_TEXT
(and the reverse). I assume the function would be identical to
NS_ConvertUCS2ToLocalEncoding()/NS_ConvertLocalEncodingToUCS2() - but a set of
wrapper functions would be useful to indicate that these functions are
_specifically_ for handling the X11 COMPOUND_TEXT datatype (maybe there are
differences... and having a "special" function which catches possible exceptions
from normal LocalEncoding behaviour would be useful in such cases).
Summary: RFE: Need NS_ConvertUCS2ToLocalEncoding() → RFE: Need NS_ConvertUCS2ToLocalEncoding() and NS_ConvertLocalEncodingToUCS2()
All he wants is the knowledge of how to get a UCS2 string into the appropriate
encoding for the current locale (and perhaps vice versa); I'm sure this
functionality already exists in i18n land, and someone just needs to explain how
to use it.  The names he suggests in the summary are just based on his knowledge
of existing string routines.  He doesn't necessarily need the conversion
functionality in that form ... that's just the only way he knew how to ask for it.

So how do you do this?
Assignee: scc → nhotta
Severity: enhancement → normal
Component: String → Internationalization
QA Contact: scc → andreasb
Summary: RFE: Need NS_ConvertUCS2ToLocalEncoding() and NS_ConvertLocalEncodingToUCS2() → Need to know how to convert between local encoding and UCS2, e.g., Need NS_ConvertUCS2ToLocalEncoding() and NS_ConvertLocalEncodingToUCS2()
No longer blocks: 73009
I think a similar thing has already been taken care of by nsILocalFile, which
converts between the OS file system charset and UCS2. So that may be used if it
fits your requirement.

nsICharsetConverterManager2 is the interface for charset conversion.
Using lxr, you should be able to find many examples which use that interface.
I am looking for a function which explicitly says "I am converting UCS2 to
X11_COMPOUND_TEXT" (and back). Basically (except that there _may_ be
exceptions...) COMPOUND_TEXT is the same as Xserver's local encoding ($LANG).
Does nsICharsetConverterManager2 support this?
I am not familiar with "COMPOUND_TEXT" (or UNIX in general).
I searched lxr and found "COMPOUND_TEXT" in widget/src/gtk/nsClipboard.cpp.
I believe charset conversion is also happening there.

What you can do now is
1. Call nsIPlatformCharset to find the charset of the current system. You need to
pass a parameter saying which charset you are asking for, because it is possible
that in the future your clipboard encoding may be different from your window
manager encoding.
2. Use that charset to find an nsIUnicodeDecoder or nsIUnicodeEncoder, and then
you can convert between PRUnichar* and char*.
3. We want to keep the string class a low-level facility w/o a dependency on
the unicode converters for now.
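
A rough sketch of steps 1 and 2, written against POSIX instead of the XPCOM interfaces so that it stands alone: nl_langinfo(CODESET) stands in for nsIPlatformCharset (without the selector parameter) and iconv stands in for nsIUnicodeEncoder. PlatformCharset and EncodeUTF16 are hypothetical names, char16_t stands in for PRUnichar, and on some platforms iconv needs an extra link flag such as -liconv.

```cpp
#include <cassert>
#include <cerrno>
#include <clocale>
#include <cstddef>
#include <iconv.h>
#include <langinfo.h>
#include <stdexcept>
#include <string>
#include <vector>

// Step 1 (stand-in for nsIPlatformCharset): name of the locale's charset.
// Note: this switches LC_CTYPE to the environment's locale as a side effect.
std::string PlatformCharset()
{
    std::setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

// Step 2 (stand-in for nsIUnicodeEncoder): encode UTF-16 text into the
// charset the caller names explicitly. Hypothetical helper, not a Mozilla API.
std::string EncodeUTF16(const std::u16string& text, const char* charset)
{
    assert(charset != nullptr);

    // Serialize the code units as little-endian bytes so the result does
    // not depend on the host byte order.
    std::vector<char> in;
    in.reserve(text.size() * 2);
    for (char16_t unit : text) {
        in.push_back(static_cast<char>(unit & 0xFF));
        in.push_back(static_cast<char>(unit >> 8));
    }

    iconv_t cd = iconv_open(charset, "UTF-16LE");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported charset");

    std::string out;
    char buf[256];
    char* inPtr = in.data();
    std::size_t inLeft = in.size();
    while (inLeft > 0) {
        char* outPtr = buf;
        std::size_t outLeft = sizeof buf;
        std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;
        out.append(buf, sizeof buf - outLeft);
        // E2BIG only means buf was full; anything else is a real failure.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("text not representable in charset");
        }
    }

    // Flush any pending shift sequence (matters for stateful encodings).
    char* outPtr = buf;
    std::size_t outLeft = sizeof buf;
    iconv(cd, nullptr, nullptr, &outPtr, &outLeft);
    out.append(buf, sizeof buf - outLeft);
    iconv_close(cd);
    return out;
}
```

For the printing case this would be called as EncodeUTF16(title, PlatformCharset().c_str()). The charset stays an explicit argument, so a caller that knows better (a message label, a clipboard format) can pass something else.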

Regarding your comment that
>COMPOUND_TEXT is the same as Xserver's local encoding ($LANG).

This is a false statement. COMPOUND_TEXT is a universal encoding scheme which
is locale independent. We currently have no function to convert to compound
text, but we may one day. The selector we pass to nsIPlatformCharset is designed
for exactly this. One day we may need to use COMPOUND_TEXT as the clipboard
format but ISO-8859-2 for the window title.

Compound text uses ISO-2022 escape sequences to switch between charsets.

The reason I don't want to make a convenience function to convert unicode to
the local encoding is that there are a lot of cases in which the encoding is NOT
the local encoding (for example, if I view an rfc822 message, the charset is
whatever is labeled in the message, not the one your locale specifies). An
implicit function which does not take a charset as a parameter will encourage
API designers to ignore passing charset information from top to bottom. Not a
perfect argument, but it is much easier to change an implementation than an
interface.
Target Milestone: --- → Future
> Regarding your comment that
> COMPOUND_TEXT is the same as Xserver's local encoding ($LANG).

> This is a false statement. 

That means that
http://lxr.mozilla.org/mozilla/source/widget/src/xlib/nsClipboard.cpp#147
is _wrong_, right ?

> COMPOUND_TEXT is a universal encoding scheme
> which is locale independent. We currently have no function to convert to
> compound text, but we may one day. The selector we pass to
> nsIPlatformCharset is designed for exactly this. One day we may need to use
> COMPOUND_TEXT as the clipboard format but ISO-8859-2 for the window title.

What about implementing a function which claims to convert from/to COMPOUND_TEXT
but is currently only a dummy ? IMHO it would be better to have something _now_
which offers the "correct" API instead of forcing the programmers to introduce
all their own "workarounds"... which need to be _found_ and removed later...
Reassign to ftang.
Assignee: nhotta → ftang
mark all future new as assigned after move from erik to ftang
Status: NEW → ASSIGNED
Switching qa contact to teruko for now.
Keywords: intl
QA Contact: andreasb → teruko
nsNativeCharsetUtils.cpp was checked in on 2002-06-10 (in xpcom/io). Is this
bug still valid now that it is implemented?

http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsNativeCharsetUtils.cpp
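
For the reverse direction asked for earlier in this bug (native bytes to UCS2), here is a minimal standalone sketch of that kind of conversion using POSIX iconv. It is not the code in nsNativeCharsetUtils.cpp; DecodeToUTF16 is a hypothetical name, char16_t stands in for PRUnichar, and the caller is assumed to know the charset of the bytes (for example from nl_langinfo(CODESET)).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a Mozilla API: decode bytes in the named charset
// into UTF-16 code units.
std::u16string DecodeToUTF16(const std::string& bytes, const char* charset)
{
    assert(charset != nullptr);

    iconv_t cd = iconv_open("UTF-16LE", charset);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported charset");

    std::string in = bytes;  // iconv wants a mutable input pointer
    char* inPtr = &in[0];
    std::size_t inLeft = in.size();

    std::u16string out;
    char buf[256];
    while (inLeft > 0) {
        char* outPtr = buf;
        std::size_t outLeft = sizeof buf;
        std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;

        // The output is little-endian UTF-16: reassemble byte pairs into
        // code units independent of the host byte order.
        std::size_t produced = sizeof buf - outLeft;
        for (std::size_t i = 0; i + 1 < produced; i += 2) {
            unsigned lo = static_cast<unsigned char>(buf[i]);
            unsigned hi = static_cast<unsigned char>(buf[i + 1]);
            out.push_back(static_cast<char16_t>(lo | (hi << 8)));
        }

        // E2BIG only means buf was full; anything else is malformed input.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("input is not valid in charset");
        }
    }
    iconv_close(cd);
    return out;
}
```

The same byte 0xE9 decodes to U+00E9 when labeled ISO-8859-1 but is an error when labeled UTF-8, which is why the charset has to come from whoever produced the bytes.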
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED