Closed Bug 332646 Opened 18 years ago Closed 13 years ago

Investigate using native uconv on Win32

Categories

(Core :: Internationalization, defect)

x86
Windows XP
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: dougt, Unassigned)

I wrote a native Unicode converter for Windows CE which also works on Win32. This allows us to remove a lot of code+data from Firefox (on ARM I saved .67MB). This might be something we could enable on the trunk since we have dropped Windows 98 support. To create a build, just add this line to your mozconfig:

ac_add_options --enable-native-uconv

The code for this converter lives here:


http://lxr.mozilla.org/mozilla/source/intl/uconv/native/nsWinCEUConvService.cpp
This is what I sent to Doug in response to his email seeking my opinion (with emphasis added around 'consistently')

 I'm not sure what you had in mind when you wrote 'This might be
 something we could enable on the trunk since we have dropped Windows
 98 support.'. When we interact with the OS, we rely on OS APIs
 (nsNativeCharsetUtils.cpp), but we need our converters (unless there's
 an acute need to save memory footprint as in minimo) when dealing with
 web pages/forms/mail messages that come from outside and that we send
 out to the wild because our converters do more than what the OS APIs
 can do and do things a little differently. They also enable us to
 handle all sorts of character encodings **consistently** across platforms.
 In short, I don't see any connection between dropping support for
 Win98 and using the native uconv.

 Please, 'enlighten' me if I'm missing anything and I'd be glad to
 stand corrected.
(In reply to comment #1)

>  In short, I don't see any connection between dropping support for
>  Win98 and using the native uconv.

Ok. I can see some connection in that WideCharToMultiByte and MultiByteToWideChar on Win 2k or later support a lot more encodings than on Win 9x/ME. Still, I don't like some of their converters and I prefer ours to theirs. Moreover, I'm loath to give up the consistency across platforms. Nor do I want to let go of our control over the way the incoming data stream is interpreted and outgoing data is encoded.

thanks for your response.  are there any encoding/decoding tests that we can run to see if there are significant differences between the native uconv and the mozilla converters?
(In reply to comment #3)
> are there any encoding/decoding tests that we can
> run to see if there are significant differences between the native uconv and
> the mozilla converters?

http://smontagu.damowmow.com/encodingtest.html
I don't like using native uconv either.
If the native converters don't behave the same on every version of Windows (including future releases), we won't be happy...
maybe this isn't so much "I am demanding that we use native uconv", but rather "is there a way to reduce code+data bloat in the uconv code?"  your help and ideas in making improvements in this area are important.  What can we do?
By the way, does MultiByteToWideChar emit UTF-16 or UCS-2? Testcase: http://www.i18nguy.com/unicode-plane1-utf8.html
MultiByteToWideChar emits UTF-16 from Windows 2000 upward. It might break on 9x, but it will be very hard to display non-BMP characters on those anyway.
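
To make the UTF-16 versus UCS-2 distinction concrete, here is a minimal Python sketch (not taken from the bug, and not calling any Windows API). The helper name `utf16_code_units` is hypothetical. It shows that a Plane 1 character needs two 16-bit code units in UTF-16, which a UCS-2-only converter could not represent.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if code_point < 0x10000:
        # BMP characters fit in a single 16-bit unit (same as UCS-2).
        return [code_point]
    # Supplementary characters are split into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


# U+10400 is a Plane 1 character, the kind the testcase page exercises.
plane1_char = "\U00010400"
units = utf16_code_units(ord(plane1_char))

# Cross-check against Python's own UTF-16 encoder.
encoded = plane1_char.encode("utf-16-le")
codec_units = [
    int.from_bytes(encoded[i:i + 2], "little")
    for i in range(0, len(encoded), 2)
]
```

A converter that really emits UTF-16 should produce the same pair of code units as `codec_units` for such input; one limited to UCS-2 would have to drop or replace the character.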

670 KB just for encodings is really a lot. It might be worth thinking of a way to do better on this point.
dougt: what sort of performance improvement do you see from this change?  is it just a codesize savings?  does that translate to performance in this case?  i tend to agree with jshin+masayuki+smontagu.  i18n consistency across ff builds is important, so this change sounds risky.
i have not measured perf.  I will post an engineering build shortly.

Also, I am not advocating breaking consistency for the sake of it.  See comment #6.
not sure what it means, but a build running native uconv against the tests yields these failures:

iso-8859-3 28 codepoint(s) failed
iso-8859-6 48 codepoint(s) failed
iso-8859-7 5 codepoint(s) failed
iso-8859-8 32 codepoint(s) failed
iso-8859-10 46 codepoint(s) failed
iso-8859-11 87 codepoint(s) failed
iso-8859-13 56 codepoint(s) failed
iso-8859-14 32 codepoint(s) failed
iso-8859-16 40 codepoint(s) failed

Shift-JIS  3 codepoint(s) failed
windows-936 5 codepoint(s) failed

I didn't run anything past the Windows-949 Korean testcase.  
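
For readers who want to reproduce this kind of per-codepoint comparison outside the browser, here is a rough Python sketch. It is not the method used by the test page above; it only compares two of Python's own single-byte codecs as stand-ins for "native" and "Mozilla" converters, and the function name `count_mismatches` is hypothetical.

```python
def count_mismatches(codec_a, codec_b):
    """Count byte values that two single-byte codecs decode differently.

    Hypothetical helper: both arguments are Python codec names used
    as stand-ins for two independent converter implementations.
    """
    mismatches = 0
    for value in range(256):
        raw = bytes([value])
        # errors="replace" maps undefined bytes to U+FFFD instead of raising,
        # so an unmapped byte in one codec still counts as a difference.
        decoded_a = raw.decode(codec_a, errors="replace")
        decoded_b = raw.decode(codec_b, errors="replace")
        if decoded_a != decoded_b:
            mismatches += 1
    return mismatches


# Example: ISO-8859-1 and windows-1252 disagree only in the 0x80-0x9F range.
latin1_vs_cp1252 = count_mismatches("latin_1", "cp1252")
print("latin_1 vs cp1252:", latin1_vs_cp1252, "codepoint(s) differ")
```

A count of zero means the two converters agree on every byte; anything else needs a look at which bytes differ and why.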

A build with --enable-native-uconv doesn't get a scriptable unicode converter, which is pretty essential for various code. Lack of it completely breaks ChatZilla and Venkman, for example (bug 327835, bug 327827).
(In reply to comment #11)
> Shift-JIS  3 codepoint(s) failed
> windows-936 5 codepoint(s) failed
Native uconv randomly fails when converting multibyte charsets because MultiByteToWideChar can't hold state between calls.
At least, you will have to use IMultiLanguage. (I don't know whether WinCE supports IMultiLanguage)
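
The failure mode described in this comment can be illustrated with a small Python sketch. It does not call MultiByteToWideChar or IMultiLanguage; it uses Python's Shift_JIS codec as a stand-in, and the function names are hypothetical. The stateless version decodes each buffer on its own, while the stateful version carries a pending lead byte over to the next buffer.

```python
import codecs


def decode_stateless(chunks, encoding):
    """Decode every chunk independently, as a stateless API call would."""
    return "".join(chunk.decode(encoding, errors="replace") for chunk in chunks)


def decode_stateful(chunks, encoding):
    """Decode chunks with an incremental decoder that remembers partial input."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush: report any byte still pending at the end of the stream.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


text = "日本語"
data = text.encode("shift_jis")  # three 2-byte characters, 6 bytes total

# Split in the middle of the second character, as a network buffer might.
chunks = [data[:3], data[3:]]

stateless_result = decode_stateless(chunks, "shift_jis")
stateful_result = decode_stateful(chunks, "shift_jis")
```

Whether the stateless path fails depends on where the buffer boundary happens to fall, which matches the "randomly fails" observation: the same data decodes correctly when it arrives in one piece.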
(In reply to comment #13)

> At least, you will have to use IMultiLanguage. (I don't know whether WinCE
> supports IMultiLanguage)

That's not a good idea either, if it means we have to rely on the presence of MS IE. It seems like the trunk build already does (it uses Mlang), but IMHO, we should try to get rid of that dependency.


On trunk the minimum system requirements are Win2k, so we can rely on any components which are shipped with that.
ideally, this code would only depend on stuff that Windows CE 4.2 has, so that I don't get left implementing something on my own.
IMultiLanguage requires Windows CE .NET 4.0 and later per MSDN.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wceielng/html/cerefimultilanguageiunknown.asp

Nonetheless, if we don't use IMultiLanguage, I propose WONTFIXing this bug.
1. Both intl owners oppose using native uconv.
2. The code size savings will be small unless we use native uconv for multibyte
   charsets (namely GBK, UHC, and Shift_JIS).
(In reply to comment #15)
> On trunk the minimum system requirements are Win2k, so we can rely on any
> components which are shipped with that.

What I was implying was that if we rely on MS IE's presence (in this case 'Mlang'), the whole point of developing firefox is sort of moot in a sense. To me, for us to depend on Mlang looks like MS IE depending on our intl library.


if you mark WONTFIX, please open a new bug to address my comment #6.
mass reassigning to nobody.
Assignee: dougt → nobody
QA Contact: amyy → i18n
Depends on: 644801
Native uconv is gone.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX